The circle of life – ** .Net – Training – MOSS **

Office 12 Open XML file format

Posted by Clayton James on August 23, 2006

This is a follow-up to my previous post about working in the Office 12 (O12) space at Tech.Ed.

Working with the XML file format.

This is one feature that Office developers and system integrators will love. Everything under the covers is XML, not binary! When you save a Word or Excel document in O12 it will have an extension like .docx or .xlsx. If you add a .zip extension to the file and open it with WinRAR, you will see all of the .xml files arranged in various folders. Together these files form what is termed a package. A Word package consists of three root folders.

1/ word: This holds the document content along with the themes and styles, spread over subfolders and .xml files.

2/ docProps: This stores the core document properties.

3/ _rels: This stores the relationships between the .xml files, which are used to merge those files when the document is opened.

Some .xml files contain the data of the document, some contain the author, version, title, etc., and others manage the relationships between the .xml files so that everything is merged together at runtime. This is very different from the binary format (which is still available for backward compatibility) and is Microsoft's way of stating that this is the future: XML = interoperability = integration.
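To see the package layout from code rather than from WinRAR, the file can be opened as an ordinary zip archive. This is only a rough sketch: it assumes .NET 4.5 or later (where System.IO.Compression.ZipFile is available, which is newer than O12 itself), and PackageLister and the command-line argument are names made up for illustration.

```csharp
using System;
using System.IO.Compression; // on .NET Framework, reference System.IO.Compression.FileSystem as well

static class PackageLister
{
    static void Main(string[] args)
    {
        // args[0]: path to a .docx or .xlsx file (hypothetical input)
        using (ZipArchive archive = ZipFile.OpenRead(args[0]))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // FullName includes the folder, so entries under word/,
                // docProps/ and _rels/ are easy to tell apart.
                Console.WriteLine("{0} ({1} bytes)", entry.FullName, entry.Length);
            }
        }
    }
}
```

No renaming to .zip is needed here, because the archive reader looks at the file contents and not at the extension.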

This structure also means that you don't need to run an Office desktop application on the server to modify a document. No one likes installing client apps on a server.
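As a small illustration of changing a document with no Office installation involved, the sketch below updates the title stored in the package's core properties through System.IO.Packaging. It is a minimal example under assumptions, not a full server-side workflow: TitleStamper and both command-line arguments are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Packaging; // lives in WindowsBase.dll on .NET Framework

static class TitleStamper
{
    static void Main(string[] args)
    {
        string path = args[0];     // hypothetical: path to an existing .docx
        string newTitle = args[1]; // hypothetical: the title to write

        // Open the package for writing; changes are saved when it is disposed.
        using (Package package = Package.Open(path, FileMode.Open, FileAccess.ReadWrite))
        {
            package.PackageProperties.Title = newTitle;
            package.PackageProperties.Modified = DateTime.UtcNow;
        }
    }
}
```

The same pattern should work from a service or a web application, since nothing in it depends on Word being present.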

So to get properties, say the title and author, out of these packages you need to reference the System.IO.Packaging namespace (in the WindowsBase assembly) and do the following.

Package package = Package.Open(fullFileName, FileMode.Open, FileAccess.Read);

// Get core properties
XmlDataDocument doc = new XmlDataDocument();
Uri uri = new Uri("/docProps/core.xml", UriKind.Relative);
PackagePart propsPart = package.GetPart(uri);

// Load the part's XML into the document
doc.Load(propsPart.GetStream());

// Register the prefixes used in the XPath queries below
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("cp", "http://schemas.openxmlformats.org/package/2006/metadata/core-properties");
nsmgr.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");

titleTextBox.Text = doc.SelectSingleNode("cp:coreProperties/dc:title", nsmgr).InnerText;
authorTextBox.Text = doc.SelectSingleNode("cp:coreProperties/dc:creator", nsmgr).InnerText;
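For the core properties specifically there is also a shortcut: System.IO.Packaging exposes them through Package.PackageProperties, so the XPath can be skipped when only the title and author are needed. A minimal sketch, assuming a console program (CorePropertiesReader and the command-line argument are made-up names):

```csharp
using System;
using System.IO;
using System.IO.Packaging; // lives in WindowsBase.dll on .NET Framework

static class CorePropertiesReader
{
    static void Main(string[] args)
    {
        // args[0]: path to a .docx or .xlsx file (hypothetical input)
        using (Package package = Package.Open(args[0], FileMode.Open, FileAccess.Read))
        {
            PackageProperties props = package.PackageProperties;

            // Title and Creator are null when the document does not set them.
            Console.WriteLine("Title:  {0}", props.Title ?? "(none)");
            Console.WriteLine("Author: {0}", props.Creator ?? "(none)");
        }
    }
}
```

Unlike SelectSingleNode(...).InnerText, this does not throw when a property is missing, which is why the null fallback is there.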

So as you can see, it is all XML under the covers, and you can traverse the XML nodes to retrieve any data you want. The example above retrieves the properties of a Word document, but by changing the part URI, the namespace and the SelectNodes arguments you can retrieve data from the document body as well. The next example lists all of the content in the document that has a Heading2 style applied and displays it in a list box.

// Get contents
doc = new XmlDataDocument();
uri = new Uri("/word/document.xml", UriKind.Relative);
PackagePart docPart = package.GetPart(uri);
doc.Load(docPart.GetStream());

nsmgr = new XmlNamespaceManager(doc.NameTable);
// Namespace of the released format; O12 beta builds used .../wordprocessingml/2006/3/main
nsmgr.AddNamespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

foreach (XmlNode node in doc.SelectNodes("//w:pStyle", nsmgr))
{
    if (node.Attributes["w:val"].Value == "Heading2")
    {
        System.Xml.XPath.XPathNavigator nav = node.CreateNavigator();
        // w:pStyle sits inside w:pPr, so step up twice to reach the paragraph (w:p)
        nav.MoveToParent();
        nav.MoveToParent();
        // Value joins all text inside the paragraph; headingListBox is an assumed control name
        headingListBox.Items.Add(nav.Value);
    }
}


Sharing/Integrating data between applications has never been so easy 🙂


One Response to “Office 12 Open XML file format”

  1. rajiv said

    This helped. Thanks.
