Finding information about code pages common in .NET and Windows development
Posted: (EET/GMT+2)
I'm currently involved in a project that makes good use of older hardware, and in this particular solution, data is stored in various formats and in different code pages. Windows has good support for many different code pages, and this extends to .NET-based applications. My application is written in C#, which means I'm using the System.Text.Encoding class (example here) to do the job.
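To give an idea of what this looks like in practice, here is a minimal sketch of decoding legacy bytes with System.Text.Encoding. The class name CodePageDemo and the sample bytes are made up for illustration. The sketch assumes .NET Core 3.0 or later (or .NET 5+), where the legacy code pages have to be registered first. On the classic .NET Framework they are built in, so the RegisterProvider line can be dropped there.

```csharp
using System;
using System.Text;

public static class CodePageDemo
{
    // Decodes raw bytes using the given code page number (for example 850).
    public static string Decode(byte[] data, int codePage)
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where code pages
        // such as 437, 850 and 1252 are only available after this call.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        return Encoding.GetEncoding(codePage).GetString(data);
    }

    public static void Main()
    {
        // Hypothetical sample data: the bytes for "Häme" in code page 850,
        // where the byte 0x84 stands for the letter 'ä'.
        byte[] raw = { 0x48, 0x84, 0x6D, 0x65 };
        Console.WriteLine(Decode(raw, 850));
    }
}
```

Going the other way works the same: Encoding.GetEncoding(850).GetBytes("...") turns a string back into bytes for that code page.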
However, finding information about these different code pages can be tricky, so I wanted to share a couple of good links on MSDN. Firstly, there's a list of the code pages supported by Windows, i.e. the Windows code pages, the OEM code pages and the ISO code pages.
Next, there's a list of code page identifiers, which contains the code page numbers as well as the corresponding .NET names.
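The number and the name are both usable from code. Here is a small sketch (the class name CodePageNames is my own invention) that looks up an encoding by its number and by its name. I'm assuming here that the WebName property is the closest match to the name shown in the identifier list, and that the code runs on .NET Core 3.0+ / .NET 5+, hence the provider registration.

```csharp
using System;
using System.Text;

public static class CodePageNames
{
    // Returns "<number>: <name>" for the given code page number.
    public static string Describe(int codePage)
    {
        // Assumption: .NET Core 3.0+ / .NET 5+. Not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding enc = Encoding.GetEncoding(codePage);
        return enc.CodePage + ": " + enc.WebName;
    }

    // Looks up the code page number for a name such as "ibm850".
    public static int NumberFromName(string name)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(name).CodePage;
    }

    public static void Main()
    {
        foreach (int cp in new[] { 437, 850, 1252 })
            Console.WriteLine(Describe(cp));

        Console.WriteLine(NumberFromName("ibm850"));
    }
}
```

The exact casing of the returned name may differ between .NET versions, so it is safer to compare names case-insensitively.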
Here's a list of common Windows code pages, especially here in Europe:
- OEM 437, i.e. IBM US code page
- OEM 850, i.e. OEM Multilingual Latin 1; Western European (DOS) code page
- Windows 1252, i.e. ANSI Latin 1; Western European (Windows) code page
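Why does it matter which of these three is in use? The same byte can mean a different character in each, and the same character can map to a different byte. The sketch below shows this with two helper methods. The class and method names are hypothetical, and the provider registration again assumes .NET Core 3.0+ / .NET 5+.

```csharp
using System;
using System.Text;

public static class CodePageCompare
{
    private static Encoding Get(int codePage)
    {
        // Assumption: .NET Core 3.0+ / .NET 5+. Not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage);
    }

    // Encodes a single character to its byte in a single-byte code page.
    // Characters missing from the code page typically come back as '?'.
    public static byte EncodeChar(char c, int codePage)
    {
        return Get(codePage).GetBytes(new[] { c })[0];
    }

    // Decodes a single byte to its character in a single-byte code page.
    public static char DecodeByte(byte b, int codePage)
    {
        return Get(codePage).GetString(new[] { b })[0];
    }

    public static void Main()
    {
        foreach (int cp in new[] { 437, 850, 1252 })
        {
            // 'ä' (U+00E4) is 84h in both OEM code pages but E4h in Windows 1252.
            byte b = EncodeChar('\u00E4', cp);

            // Byte 9Bh is a cent sign in 437, 'ø' in 850 and '›' in 1252.
            char c = DecodeByte(0x9B, cp);

            Console.WriteLine("{0}: U+00E4 -> {1:X2}h, 9Bh -> U+{2:X4}", cp, b, (int)c);
        }
    }
}
```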
Since it's nice to remind oneself about older things in the IT (or should I say computing?) world, here's a list of ASCII control characters and their names.
- 00h = NUL: NULL
- 01h = SOH: START OF HEADING
- 02h = STX: START OF TEXT
- 03h = ETX: END OF TEXT
- 04h = EOT: END OF TRANSMISSION
- 05h = ENQ: ENQUIRY
- 06h = ACK: ACKNOWLEDGE
- 07h = BEL: BELL
- 08h = BS: BACKSPACE
- 09h = HT: HORIZONTAL TABULATION
- 0Ah = LF: LINE FEED
- 0Bh = VT: VERTICAL TABULATION
- 0Ch = FF: FORM FEED
- 0Dh = CR: CARRIAGE RETURN
- 0Eh = SO: SHIFT OUT
- 0Fh = SI: SHIFT IN
- 10h = DLE: DATA LINK ESCAPE
- 11h = DC1: DEVICE CONTROL ONE
- 12h = DC2: DEVICE CONTROL TWO
- 13h = DC3: DEVICE CONTROL THREE
- 14h = DC4: DEVICE CONTROL FOUR
- 15h = NAK: NEGATIVE ACKNOWLEDGE
- 16h = SYN: SYNCHRONOUS IDLE
- 17h = ETB: END OF TRANSMISSION BLOCK
- 18h = CAN: CANCEL
- 19h = EM: END OF MEDIUM
- 1Ah = SUB: SUBSTITUTE
- 1Bh = ESC: ESCAPE
- 1Ch = FS: FILE SEPARATOR
- 1Dh = GS: GROUP SEPARATOR
- 1Eh = RS: RECORD SEPARATOR
- 1Fh = US: UNIT SEPARATOR
When the JSON syntax started taking hold after XML, many preferred JSON because it was less verbose. Well, had those guys (and gals) behind JSON looked at the beginning of the ASCII table, with its ready-made file, group, record and unit separators, we could have had an even more compact way of expressing things.
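Just for fun, here is a rough sketch of what such a format could look like, using RS (1Eh) between records and US (1Fh) between fields. This is a toy of my own and not any standard: the class name AsciiSeparators is made up, and there is no escaping, so it assumes the field values never contain the separator characters themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsciiSeparators
{
    public const char RecordSeparator = '\u001E'; // RS, 1Eh
    public const char UnitSeparator = '\u001F';   // US, 1Fh

    // Joins fields with US and records with RS.
    // Assumption: no field contains RS or US (no escaping is done).
    public static string Serialize(IEnumerable<string[]> records)
    {
        return string.Join(
            RecordSeparator.ToString(),
            records.Select(fields => string.Join(UnitSeparator.ToString(), fields)));
    }

    // Splits the text back into records and fields.
    public static List<string[]> Deserialize(string text)
    {
        return text.Split(RecordSeparator)
                   .Select(record => record.Split(UnitSeparator))
                   .ToList();
    }

    public static void Main()
    {
        var records = new List<string[]>
        {
            new[] { "437", "OEM United States" },
            new[] { "850", "OEM Multilingual Latin 1" }
        };

        string packed = Serialize(records);
        Console.WriteLine(packed.Length);            // only two separator characters of overhead per record
        Console.WriteLine(Deserialize(packed)[1][1]);
    }
}
```

Compared to JSON there are no quotes, commas or brackets at all, at the price of the data no longer being readable in a plain text editor.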
Happy hacking!