Reading Office 2007 XML file formats from C#
Posted: (EET/GMT+2)
Now that Office 2007 is here with its XML-based file formats (such as .docx for Word and .xlsx for Excel), the question of how to read these files from .NET/C# code quickly arises.
Luckily, MSDN has a new article to demonstrate this. The article shows how to unpack the ZIP file format using the classes in the System.IO.Packaging namespace, and then use regular XML techniques to read data from an Excel spreadsheet. There are also Code Snippets available to extend Visual Studio's IntelliSense.
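To give an idea of what this looks like, here is a minimal sketch of the approach, not the code from the MSDN article. It opens an .xlsx file as a package, follows the package-level "officeDocument" relationship to the main workbook part, and reads the worksheet names with ordinary XML calls. The class and method names (WorkbookReader, ReadSheetNames) are made up for illustration, and the sketch assumes the project references WindowsBase.dll, which is where System.IO.Packaging lives in .NET Framework 3.0.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;
using System.Xml;

// Hypothetical helper class, named here for illustration only.
public static class WorkbookReader
{
    // Relationship type that leads from the package root to the main document part.
    public const string OfficeDocumentRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // XML namespace used by the SpreadsheetML workbook part.
    public const string SpreadsheetNs =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns the worksheet names declared in the workbook part of an .xlsx file.
    public static List<string> ReadSheetNames(string path)
    {
        var names = new List<string>();

        // The .xlsx file is a ZIP archive; Package hides the ZIP details.
        using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read))
        {
            foreach (PackageRelationship rel in
                     package.GetRelationshipsByType(OfficeDocumentRelType))
            {
                // Resolve the relationship target against the package root.
                Uri partUri = PackUriHelper.ResolvePartUri(
                    new Uri("/", UriKind.Relative), rel.TargetUri);
                PackagePart workbookPart = package.GetPart(partUri);

                // From here on it is regular XML processing.
                var doc = new XmlDocument();
                using (Stream stream = workbookPart.GetStream(FileMode.Open, FileAccess.Read))
                {
                    doc.Load(stream);
                }

                var ns = new XmlNamespaceManager(doc.NameTable);
                ns.AddNamespace("x", SpreadsheetNs);

                foreach (XmlNode sheet in doc.SelectNodes("//x:sheets/x:sheet", ns))
                {
                    names.Add(sheet.Attributes["name"].Value);
                }

                break; // A package should have only one main document part.
            }
        }

        return names;
    }
}
```

Calling WorkbookReader.ReadSheetNames with the path to an .xlsx file should return the sheet names in the order they appear in the workbook part. Reading the actual cell values takes more work, because each worksheet is stored in its own part and text values are kept in a separate shared strings part.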
Of course, you will also need to understand the file formats. Luckily, the specifications are public and have been published as the ECMA-376 standard. The specification documents are available for free in five parts. It is interesting to note that the Office versions of the documents are much smaller than their PDF counterparts. Coincidence?
eWeek also just reported that the specifications will probably get an ISO standard stamp before the cold autumn evenings are here.