How to check a file's Unicode encoding with no tools

Posted: 5.12.2008 19.44.50 (EET/GMT+2)

Sometimes when working with XML documents and/or web applications, you need to know whether an XML, HTML, ASPX or text file is Unicode encoded. This is of course very easy if you have a text editor, Visual Studio or another development tool available, but sometimes you don't. How, then, can you detect a file's Unicode encoding and the presence of a byte order mark (BOM) without any tools except those that Windows has to offer?

Luckily, there are at least two options: one is to use Notepad, and the second is to use the "type" command in the command shell (cmd.exe or "DOS prompt"). Here's how to do it.

First, Notepad. To detect the Unicode encoding, open the file with Notepad. Then choose File/Save As to open the Save As dialog box. This dialog box has a field called Encoding, which by default shows the value that corresponds to the file's current encoding. For example, it typically shows UTF-8 for a file that starts with a UTF-8 BOM, or ANSI if there's no Unicode BOM in the file.

The second option is to use the "type" command, which is available in the Windows Command Prompt. Simply type the file in question to the console. Since type doesn't interpret Unicode BOMs, the BOM bytes are printed as ordinary characters: if the file's first two or three characters are garbage (sort of), you know that the file is in fact Unicode encoded. The count comes from the size of the BOM, which is two bytes in UTF-16 and three bytes in UTF-8. Unicode BOMs are discussed, for example, at www.unicode.org.
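If a scripting language does happen to be installed on the machine, the same check can be done by reading the first few bytes of the file and comparing them with the known BOM signatures. The following is only a minimal sketch in Python, not something Windows ships with; the function names detect_bom and detect_file_bom are made up for this illustration, and the BOM constants come from Python's standard codecs module.

```python
import codecs

# Known BOM signatures. The UTF-32 entries come first because the
# UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(first_bytes):
    """Return the name of the encoding whose BOM starts first_bytes, or None."""
    for bom, name in BOMS:
        if first_bytes.startswith(bom):
            return name
    return None


def detect_file_bom(path):
    """Read the first four bytes of the file at path and check them for a BOM."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

Calling detect_file_bom with the path of the file in question returns the encoding name, or None when the file has no BOM. Note that None only means there is no BOM; the file could still be, say, UTF-8 saved without one. A file that really is UTF-16 but begins with a NUL character would also be reported as UTF-32 LE by this simple check.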

You could call this the "Unicode tip of the week".