Office

TechEd, Open XML, and HDR Photography

While at TechEd in Orlando, FL, last week, I had lunch with Doug Mahugh. We talked about the upcoming ODF support in Office 2007 SP2, the new features in the Open XML SDK, Altova's new support for Open XML diff/merge in DiffDog, the creation of Open XML from StyleVision, and data integration and mapping for Open XML in MapForce, as well as various other XML-related topics.

We also talked about some other industry topics and eventually got to chatting about HDR (high dynamic range) photography. Doug sent me links to a few of his recent photos, and this one impressed me the most.

I couldn't resist and got the software the same day. However, since I had left my Canon SLR camera at home for this trip, I wasn't able to test-drive HDR imaging until I got back home today:

1X5F2686_7_8

Obviously, this isn't a particularly exciting scene - I just shot from our balcony toward the end of the cul-de-sac. I used automatic exposure bracketing of ±2 EV, loaded all three images into Photomatix, and then played with some of the tone-mapping settings to create vibrant and surreal colors.
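
For anyone new to bracketing: each EV (exposure value) step doubles or halves the exposure, so a ±2 EV bracket spans a factor of 16 between the darkest and the brightest frame. The small Python sketch below works through that arithmetic. It assumes the camera varies only the shutter speed (as in aperture-priority mode) around a hypothetical metered exposure of 1/125 s, and the function name 'bracket' is purely illustrative.

```python
from fractions import Fraction


def bracket(base_time: Fraction, ev_steps: list[int]) -> list[Fraction]:
    """Return the shutter time for each EV offset relative to base_time.

    +1 EV doubles the exposure time, -1 EV halves it.
    """
    return [base_time * Fraction(2) ** ev for ev in ev_steps]


if __name__ == "__main__":
    base = Fraction(1, 125)  # hypothetical metered exposure time in seconds
    steps = [-2, 0, 2]       # automatic exposure bracketing of ±2 EV
    for ev, t in zip(steps, bracket(base, steps)):
        print(f"{ev:+d} EV: {t} s")
```

Run as a script, it prints 1/500 s, 1/125 s, and 4/125 s (roughly 1/31 s) for the three frames that go into the HDR merge.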

But I still like the result quite a bit - it makes me want to go out and take some HDR photos of Marblehead harbor and experiment with other local scenes where the high dynamic range can come into play nicely.

Inbox Zero

This is a bit off-topic and might even be old news for some, but I recently stumbled across this video of a great talk on e-mail productivity, titled "Inbox Zero", by Merlin Mann. For further information, see his series of blog postings on the same topic on 43folders.com.

This closely mirrors my own approach to e-mail. The main difference is that once I'm done processing a message, I file it into one of several hierarchical folders instead of a single big archive folder, primarily for easier retrieval from a mobile device.

Another productivity tip for e-mail: keep your replies short and sweet. Maybe as short as five.sentenc.es? I haven't managed to adopt that one yet...

OOXML vs. ODF - the "battle" is heating up as we get closer to the ISO BRM date

I have written about the Burton Group's report "What's Up, .DOC?" on the XML Aficionado blog before, and it didn't take long for the ODF Alliance to publish a scathing rebuttal to it. Ironically, that rebuttal was published in PDF format, not ODF...

Before we look at what happens next, it may be time for a short review of the acronyms and abbreviations commonly used in these reports, discussions, and related blogs:

OOXML: Office Open XML, an XML-based file format specification for electronic productivity application documents, such as spreadsheets, charts, presentations, and word processing documents. Originally developed by Microsoft, it is already an Ecma standard and is widely used because Microsoft Office 2007 implements it. It is currently being considered for ISO standardization as DIS-29500.
ODF: Open Document Format, a file format for electronic office documents, originally developed by Sun for the OpenOffice.org office suite and later standardized through OASIS and ISO.
ISO: International Organization for Standardization
BRM: Ballot Resolution Meeting, the ISO meeting at which the comments received during the preceding Fast Track vote and letter ballot phase are resolved. During the meeting, the national bodies and the submitting organization (Ecma) may agree on a set of revisions to the originally submitted standard text. The DIS-29500 BRM is scheduled for February 25-29, 2008, in Geneva.
DIS-29500: the official ISO name and standard number for OOXML
OASIS: Organization for the Advancement of Structured Information Standards, a non-profit consortium that defines open standards for the global information society
Ecma: originally the European Computer Manufacturers Association, now renamed Ecma International - European association for standardizing information and communication systems
XML: eXtensible Markup Language, as defined by the W3C in 1998. Probably the most important standard of them all, because both OOXML and ODF are built on top of XML. If you don't know it already, you should definitely learn XML... :)

 

So what's new with the OOXML vs. ODF debate now that we are only two weeks away from the ISO BRM? Earlier this week, the Burton Group responded to the ODF Alliance's rebuttal in a series of three postings by Guy Creese on the Collaboration and Content Strategies Blog, and you can find them here: Part 1, Part 2, and Part 3. In this response, the Burton Group addresses each criticism from the ODF Alliance point by point.

Also, Slashdot reported this week on the Ecma response to the ISO comments and the recent blog post from Russel Ossendryver (an open source and ODF advocate) criticizing the Ecma response.

If you prefer demos to reading thousands of pages of specifications, you may find these recently posted YouTube videos interesting: one shows Open XML on the iPhone, and another shows native Open XML support on Mac OS X. Both videos show OOXML support on Apple's platforms, yet Martin Bekkelund (a proponent of Norway's "no" vote on DIS-29500) writes on his blog today about some headaches he's had with OOXML on the Mac and on his iPhone. I was curious about his claim that opening a .DOCX on the iPhone produces an error message, so I had to try it myself - and I am happy to report that every .DOCX attachment I have received on my iPhone (running the 1.1.3 software) displays beautifully and works pretty much exactly as shown in the YouTube video above.

More commentary and further information can be found on Michael Desmond's blog, as well as in previous OOXML-related posts on this XML Aficionado blog. Also keep in mind that the best way to learn OOXML is to start experimenting with it; I recently wrote a longer article, Content reuse with Open XML and XSLT, that shows how easily this can be done using the built-in OOXML support in your favorite XML Editor.

One thing is certain: everybody will be watching the outcome of the ISO BRM very closely...

Content reuse with Open XML and XSLT

While Open XML may not yet be an ISO standard, it is already standardized by Ecma and - even more important - Office 2007 stores all documents in Open XML by default. As a result, there is an abundance of documents whose content you can now reuse much more easily and productively than ever before. So instead of waiting for the ISO vote or paying too much attention to the political battles being fought around it, I want to show you how you can take advantage of Open XML (sometimes also called OOXML or Office Open XML) today.

This is the first article in a series of blog postings that I plan to write about practical Open XML tips & tricks, so I encourage you to subscribe to my XML Aficionado blog (via RSS or via e-mail), if you haven't already done so. This will ensure that you get future articles from this series automatically as soon as I post them.

So let's look at an Open XML document in our favorite XML Editor. For this example I am going to use a WordprocessingML document (.docx) that I created with Microsoft Office Word 2007. When I open the .docx file in XMLSpy, I immediately see the contents of the package file, which is structured according to the Open Packaging Conventions.

That's a fancy way of saying that it is a ZIP file that contains specific files and directories that make up the content, structure, styles, relationships, and other parts of the document. Using XMLSpy's built-in capability to open any ZIP-formatted archive, I can directly browse any directory structures inside the ZIP package, add new files to the package, or open any existing XML file contained in the package:
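
Because the package is an ordinary ZIP archive, any ZIP library can list its parts, not just XMLSpy. Here is a minimal Python sketch using only the standard library. 'list_package_parts' and the in-memory demo package are hypothetical helpers, and the demo contains only three of the parts that Word 2007 actually writes.

```python
import io
import zipfile


def list_package_parts(docx) -> list[str]:
    """Return the names of all parts stored in an OPC package (a ZIP archive).

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as pkg:
        return pkg.namelist()


def make_demo_package() -> io.BytesIO:
    """Build a tiny, hypothetical .docx-like package in memory.

    Real Word 2007 files contain more parts (styles, settings, docProps, ...).
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        pkg.writestr("[Content_Types].xml",
                     '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"/>')
        pkg.writestr("_rels/.rels",
                     '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"/>')
        pkg.writestr("word/document.xml",
                     '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"/>')
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # With a real file you would pass its path, e.g. list_package_parts("MyDocument.docx").
    for name in list_package_parts(make_demo_package()):
        print(name)
```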

OOXML1

For the purpose of reusing the content from this WordprocessingML example file, I am going to open the 'document.xml' file in the 'word' directory, which contains the main content of the document.

As soon as I double-click the file in the ZIP archive, the XML opens in a separate window just like any other XML document, and I can use the powerful grid view or text view features of XMLSpy to view or edit the XML data (it is sometimes useful to invoke the pretty-print function in text view to make the file easier to read):

OOXML2

This is, of course, a live editing view, so you can not only view the Open XML data, but also make changes to the XML and save them back into the package file.

But now let's look at how we can easily reuse content from this Open XML document using XSLT. XMLSpy ships with a few Open XML example documents as well as example XSLT stylesheets for just that purpose. Let's look at the 'docx2html.xslt' stylesheet, which takes a WordprocessingML document and extracts all paragraphs to turn them into HTML. This example stylesheet is by no means intended to be a fully featured .docx-to-HTML conversion tool. Instead, it serves as a blueprint for reusing content from a .docx file and will hopefully be a starting point for your own stylesheet development.

At the core of that XSLT stylesheet we need an <xsl:for-each> loop to iterate over all the <docx:p> elements, which it turns into simple HTML <p> paragraphs. The text inside the paragraphs is grouped into runs of characters that share common attributes, so we need an inner <xsl:for-each> loop to iterate over those <docx:r> elements and extract the text from their <docx:t> child elements. Thus the most basic form of content reuse, which only extracts the text of all paragraphs, looks like this:

XSLT1
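
For comparison, the same nested-loop logic (paragraphs, then runs, then text elements) can be sketched in Python with xml.etree.ElementTree. This is not the XMLSpy example stylesheet, just an illustration that assumes runs are direct children of the paragraph. Runs nested inside other elements, such as hyperlinks, would be skipped. The 'w' prefix here plays the role of 'docx' in the XSLT and is bound to the WordprocessingML main namespace. The sample document is made up.

```python
import html
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (bound to the docx: prefix in the XSLT example)
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraphs_to_html(document_xml: str) -> str:
    """Emit one HTML <p> per w:p, concatenating the w:t text of its runs."""
    root = ET.fromstring(document_xml)
    out = []
    for p in root.iter(f"{{{W}}}p"):          # outer loop: every paragraph (like //docx:p)
        text = "".join(
            t.text or ""
            for r in p.findall("w:r", NS)       # inner loop: the paragraph's runs
            for t in r.findall("w:t", NS)       # text elements inside each run
        )
        out.append(f"<p>{html.escape(text)}</p>")  # escape like an XSLT serializer would
    return "\n".join(out)


SAMPLE = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:rPr><w:b/></w:rPr><w:t>Open XML</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:document>"""

if __name__ == "__main__":
    print(paragraphs_to_html(SAMPLE))
```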

Once we have constructed those loops, we can start to think about perhaps extracting and reusing some style information. To do that, we now emit a <span> HTML element for every <docx:r> run of characters and give it a style attribute, whose value will depend on the <docx:rPr> element, so we use <xsl:apply-templates> to decide what HTML style we want to apply to the <span> elements:

XSLT2

The corresponding templates for the three most common styles (bold, italic, underline) are trivially easy to construct and look like this:

XSLT3
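
The same run-to-<span> mapping can be sketched in Python as well. The helper names are hypothetical. The sketch treats <w:b/> and <w:i/> as switched on unless their w:val is "0", "false", or "off", and treats <w:u> as on unless its w:val is "none". Like the simple stylesheet, it ignores formatting inherited from paragraph or character styles, which a fuller conversion would have to resolve.

```python
import html
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
OFF_VALUES = {"0", "false", "off"}  # w:val values that switch a toggle property off


def _is_on(rpr, tag: str) -> bool:
    """Check whether run property `tag` (b, i or u) is switched on in <w:rPr>."""
    if rpr is None:
        return False
    el = rpr.find(f"w:{tag}", NS)
    if el is None:
        return False
    val = el.get(f"{{{W}}}val")
    if tag == "u":
        return val != "none"  # for underline, w:val names the style; "none" means off
    return val not in OFF_VALUES  # a missing w:val means "on"


def run_style(run) -> str:
    """Build a CSS style string from a run's bold/italic/underline properties."""
    rpr = run.find("w:rPr", NS)
    css = []
    if _is_on(rpr, "b"):
        css.append("font-weight:bold")
    if _is_on(rpr, "i"):
        css.append("font-style:italic")
    if _is_on(rpr, "u"):
        css.append("text-decoration:underline")
    return ";".join(css)


def paragraph_to_html(p) -> str:
    """Turn one w:p element into a <p> with one styled <span> per run."""
    spans = []
    for r in p.findall("w:r", NS):
        text = "".join(t.text or "" for t in r.findall("w:t", NS))
        spans.append(f'<span style="{run_style(r)}">{html.escape(text)}</span>')
    return "<p>" + "".join(spans) + "</p>"


SAMPLE_P = (
    f'<w:p xmlns:w="{W}">'
    '<w:r><w:t>plain </w:t></w:r>'
    '<w:r><w:rPr><w:b/><w:i/></w:rPr><w:t>bold italic</w:t></w:r>'
    '<w:r><w:rPr><w:u w:val="single"/></w:rPr><w:t>underlined</w:t></w:r>'
    '</w:p>'
)

if __name__ == "__main__":
    print(paragraph_to_html(ET.fromstring(SAMPLE_P)))
```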

With just a few lines of XSLT and a few templates we have already written a stylesheet that extracts the basic paragraphs and most important styles from a WordprocessingML document and turns them into HTML that can be viewed in the browser view - here is the result produced from running the above XSLT stylesheet on the example WordprocessingML document that you can find in the XMLSpy examples directory:

OOXML4

Similarly, it is quite easy to extend the stylesheet to extract meta information, other styles, or image information from the WordprocessingML document and reuse the content for any modern application scenario, from web publishing via HTML, RSS, or social media formats to mobile web applications and beyond.
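
As one example of such an extension, document metadata lives in a separate part of the package, conventionally 'docProps/core.xml', which uses Dublin Core elements such as dc:title and dc:creator. The Python sketch below assumes that conventional part name rather than resolving it through the package relationships. The helper names and the demo values are made up.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def core_properties(docx, part: str = "docProps/core.xml") -> dict[str, str]:
    """Read a few Dublin Core properties from the package's core-properties part.

    Assumes the conventional part name; a robust version would locate it via
    the core-properties relationship in _rels/.rels.
    """
    with zipfile.ZipFile(docx) as pkg:
        root = ET.fromstring(pkg.read(part))
    props = {}
    for key in ("title", "subject", "creator"):
        el = root.find(f"dc:{key}", NS)
        if el is not None and el.text:
            props[key] = el.text
    return props


def make_demo_package() -> io.BytesIO:
    """Hypothetical in-memory package containing only a core-properties part."""
    core = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:title>Quarterly Report</dc:title>"
        "<dc:creator>Jane Doe</dc:creator>"
        "</cp:coreProperties>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("docProps/core.xml", core)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(core_properties(make_demo_package()))
```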

"But wait! How can I apply an XSLT stylesheet to an XML document that is stored within a ZIP file?", you might ask.

You can, of course, extract all the XML files using a regular ZIP expander, but there is a much better solution: when you use the document() function in XSLT 2.0 within XMLSpy or with our royalty-free XSLT engine AltovaXML, you can directly access files contained in a ZIP archive by using the "|zip" pipe operator within the filename, e.g. "MyDocument.docx|zip\_rels\.rels" will address the Relationship file ".rels" in the archive directory "\_rels" inside the ZIP package with the file named "MyDocument.docx".
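
Outside XMLSpy, the equivalent of addressing "MyDocument.docx|zip\_rels\.rels" is to open the archive and read its "_rels/.rels" entry. Note that ZIP entry names use forward slashes. The Python sketch below follows the package-level relationship of type officeDocument to locate the main document part instead of hard-coding 'word/document.xml'. 'main_part_name' and the demo package are hypothetical helpers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                   "relationships/officeDocument")


def main_part_name(docx) -> str:
    """Follow the package-level _rels/.rels to find the main document part."""
    with zipfile.ZipFile(docx) as pkg:
        # ZIP entry names use "/" as the separator, even on Windows.
        rels = ET.fromstring(pkg.read("_rels/.rels"))
    for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOCUMENT:
            # Package-level targets are relative to the package root.
            return rel.get("Target").lstrip("/")
    raise KeyError("no officeDocument relationship in _rels/.rels")


def make_demo_package() -> io.BytesIO:
    """Hypothetical minimal package whose .rels points at word/document.xml."""
    rels = (
        f'<Relationships xmlns="{REL_NS}">'
        f'<Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" Target="word/document.xml"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("_rels/.rels", rels)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(main_part_name(make_demo_package()))
```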

The benefits of using XSLT to reuse content from Open XML documents are obvious: because XSLT is one of the core XML standards from the W3C, you can apply all your existing XML, XPath, and XSLT know-how and take advantage of the excellent tool support available for these standards. For example, you can easily develop and debug your XSLT stylesheet using the powerful XSLT debugger in XMLSpy, which lets you single-step through the transformation, set breakpoints on XSLT instructions or even on data nodes in your Open XML document, view the partially generated output, and inspect the state of the XSLT processor in detail as the output document is constructed:

OOXML3

Using the XSLT Debugger eliminates a lot of the pain that is normally associated with XSLT stylesheet development and allows for a very iterative approach to creating and improving stylesheets that facilitate content reuse and repurposing.

To sum it up, reusing content from Open XML documents for a variety of web applications, mobile scenarios, or social media and Web 2.0 contexts is very easy and can be achieved with standard XML-related technologies, such as XSLT.

For additional information on Open XML and how to take advantage of all the content that is now already available in that format, please refer to the following sites:

Binary and OOXML office formats - interesting news

Brian Jones has a nice post on (a) making the documentation for binary Office file formats available more easily and (b) mapping from binary formats to OOXML:

The binary formats documentation will be publicly available by February 15, 2008, and the file formats will also be covered by the Microsoft Open Specification Promise.

Regarding mapping from binary to OOXML formats (and back), Microsoft has promised to start an open source project on SourceForge for that purpose.

This also made it onto TechMeme today...

OOXML and ODF report from Burton Group: "What's Up, .DOC?"

Burton Group has released a new report this week, "What’s Up, .DOC? ODF, OOXML, and the Revolutionary Implications of XML in Productivity Applications". Written by Peter O'Kelly (blog) and Guy Creese (blog), the report provides a deep and insightful analysis of the current state of ODF, OOXML, and other document formats (W3C, PDF) and of the history of those formats. It then cuts through the FUD and standardization games to arrive at a set of projections regarding the success of OOXML and ODF, as well as a set of practical recommendations.

I've read the report already, but am not going to spoil the fun for you and reveal all the conclusions here - the report is well worth your time and you should read it yourself! However, I will say this much: the report validates some of my thinking on the subject that I have expressed in various previous blog posts on OOXML here.

There is a quick overview of the report on the Burton Group's Collaboration and Content Strategies blog, and you can download the entire report for free after filling out a registration form.

Other early blog reactions to the new report are here:

And I am sure more blog reactions will follow next week.

As always, if you are interested in working with Office Open XML (OOXML) files, a great place to get started is to look at the OOXML support in XMLSpy and to download a free evaluation version of Altova's XML Editor and give it a try.

Office Open XML fails to win ISO approval (so far)

This week Microsoft failed to win ISO approval for the Office Open XML (OOXML) standard in the 2nd round of the ISO standardization process. While the Wall Street Journal published a critical article about this, it is by no means the end of the road, nor is it even a major setback: the ISO process commonly requires multiple rounds, and the 3rd round (expected in early 2008) will very likely see Office Open XML become an ISO standard.

As Burton Group analyst Peter O'Kelly observes in his blog: "FWIW I still expect Open XML to become an ISO standard -- and it's reassuring to see the spec/design improved by the standardization process." - I couldn't agree more.

Irrespective of the timeline of this ISO standardization process, I expect OOXML to quickly become a de facto standard as more and more corporations and enterprises upgrade from Office XP or 2003 to Office 2007 and start generating tons of documents in the new OOXML formats (which are the default for new Office 2007 files).

Having all these Word documents, Excel spreadsheets, etc. available in OOXML lays the groundwork for a huge increase in content reuse and repurposing, because the data - being XML - can now be accessed through other applications, transformed through stylesheets, and integrated with other enterprise data.

To get started with OOXML, software developers can simply open OOXML documents in XMLSpy and start working with the XML data inside, then use XMLSpy to create and debug XSLT 2.0 stylesheets that directly access the OOXML data through the AltovaXML processing engine.