FastText encoding issues in Windows using PowerShell piping

Skylar Sadlier

20 Jul 2019 • 1 min read

I wrote a pre-processing script in Python to clean up my data before giving it to FastText. I was running it on Windows, and instead of writing the file output code I decided to use piping in PowerShell (redirecting the script's output to a file via the > character).
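One way to sidestep shell redirection entirely is to have the script write its own output with an explicit encoding. Here is a minimal sketch, not taken from the original script: the `clean_line` step, the `preprocess` function, and the file names are all hypothetical stand-ins.

```python
def clean_line(line: str) -> str:
    """Hypothetical cleaning step: lowercase and collapse runs of whitespace."""
    return " ".join(line.lower().split())


def preprocess(src_path: str, dst_path: str) -> None:
    # Explicit encodings on both ends, so the shell's defaults never matter.
    # newline="\n" stops Windows from turning "\n" into "\r\n" on write.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in src:
            cleaned = clean_line(line)
            if cleaned:  # skip lines that are empty after cleaning
                dst.write(cleaned + "\n")
```

Calling something like `preprocess("raw.txt", "train.txt")` would then produce a UTF-8 file no matter which shell launched the script.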

Anyway, my nearest neighbor queries were only returning single characters, and during training the word counts were completely wrong. I searched the internet for a solution but didn't find anyone else having this issue, so I'm documenting it here so others can find it in the future.

I noticed that if I opened the result in Sublime, copied the text to a new tab, and saved it, my issue would go away. This led me to believe the problem was with my file and not with FastText. The strange thing was that I checked the file's encoding in both IntelliJ and Sublime, and both said it was UTF-8. My co-worker told me to try Notepad++ (I hadn't used it in years), so I downloaded it, checked the encoding, and it showed the file was not UTF-8 (I can't remember off the top of my head exactly what the encoding was, but it was definitely not UTF-8). It bothered me a little that I had to go through two editors before I found the correct encoding, but that's life for you.
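For context, in Windows PowerShell 5.1 the > operator writes UTF-16 LE with a byte order mark (BOM) by default. That would fit what Notepad++ reported, although the exact encoding of this file was never recorded. A quick way to check without relying on an editor is to look at the first few bytes of the file. The sketch below is a minimal BOM check (the `sniff_bom` and `sniff_file` helpers are hypothetical names). It also shows how UTF-16 interleaves NUL bytes with ASCII text, which is a plausible reason a byte-oriented tool would see single characters instead of words.

```python
import codecs

# Longer BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_file(path: str):
    """Check only the first four bytes of a file for a BOM."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))


# What "hello world" looks like on disk as UTF-16 LE with a BOM.
sample = codecs.BOM_UTF16_LE + "hello world".encode("utf-16-le")
print(sniff_bom(sample))  # utf-16-le
print(sample[:8])         # b'\xff\xfeh\x00e\x00l\x00'
```

A result of `None` does not prove the file is UTF-8; it only means no BOM was found.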

The Fix

You can fix the PowerShell encoding using this command:

$PSDefaultParameterValues = @{'Out-File:Encoding' = 'utf8'}

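Redirecting output with > in PowerShell will now use UTF-8 (you can set it to any other encoding Out-File supports). The setting only lasts for the current session, so it has to be run every time PowerShell is started, unless you add it to your PowerShell profile.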
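If a file has already been written in the wrong encoding, it can be converted instead of regenerated. This is a minimal sketch, assuming the existing file is UTF-16 with a BOM (the Windows PowerShell 5.1 default for >); the `reencode_to_utf8` function and any file names passed to it are hypothetical.

```python
def reencode_to_utf8(src_path: str, dst_path: str,
                     src_encoding: str = "utf-16") -> None:
    # The "utf-16" codec reads the BOM to pick the byte order, then drops it.
    # newline="" on both sides passes line endings through unchanged.
    with open(src_path, encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

The output is UTF-8 without a BOM, and the original file is left untouched so the two can be compared.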
