Office Open XML

Injunction to prevent Microsoft from selling Word due to XML

It appears that there are just a handful of truly ubiquitous PC applications out there that exist on almost every single computer on the planet, and Microsoft Word is certainly among them. So it is an interesting twist that a Texas judge issued an injunction yesterday ruling that, starting in 60 days, Microsoft can no longer sell Word 2003 and Word 2007 because of the way they handle XML data.

In a lawsuit filed in 2007, Canada-based i4i alleged that Word infringed its 1998 patent No. 5,787,449 on a method for reading XML.

If you are like me and prefer to read all the legal details yourself, here are the relevant links (thanks to The Microsoft Blog for the PDFs):

Needless to say, Microsoft will either appeal the injunction, try to invalidate the patent, or (most likely) settle and write a big check to i4i. But this case shows yet another aspect of the broken patent system when it comes to software patents, especially in relation to standards such as XML.

It is interesting to note that the patent appears to deal primarily with representing arbitrary documents in XML, which seems slightly ludicrous given the long history of SGML prior to 1998. Also interesting is that the injunction doesn't just talk about XML in general, but specifically mentions .DOCX (i.e. Office Open XML), which is the default storage format in Word 2007.

More coverage on techmeme.com

Amazon Kindle 2 Review

My Kindle 2 arrived from Amazon today! It appears that I am lucky in this respect, because Amazon had originally announced the ship date as February 25th, and most people are still waiting for their unit to show up. But I had ordered mine literally within 10 minutes of the announcement, so I guess being an early adopter finally paid off...

As I had promised a few weeks ago, I am providing a review of the new Kindle 2 as a follow-up to my popular original Kindle Review from November 2007. Just like the previous review, this one is based on unpacking the Kindle 2 and working with the device for about 2-3 hours. I plan to add information about long-term issues such as battery life in a future blog post once I have used the Kindle 2 for several days.

Unpacking the Kindle 2 is fun. Just like with the Kindle 1, the packaging is well designed and resembles a shipping box with a "tear here to open" strip on one side. The package contains the Kindle itself, a thin "Read me" brochure, and the charging cable. For those who hate even the shortest of manuals, the screen of the device shows instructions to plug it in and then push the power switch on top of the unit.

Once you turn the Kindle 2 on, you immediately get to read the User's Guide on the screen, or you can skip ahead and press the Home button to get to your main library page.

Before I talk about the improvements in the software, let's take a look at all the improvements in the hardware of the device compared to the Kindle 1:

  • The Kindle 2 looks much more polished and refined and gets rid of some of the edginess of the original unit. It feels more "solid" and less flimsy, which may also be due to the fact that it is about 10g heavier (468g with book cover for the Kindle 2 compared to 458g for the Kindle 1).
  • The Kindle 2 now locks into place in the book cover / sleeve that you can order from Amazon. The original Kindle fell out of that cover far too often, so this is a great improvement.
  • Another annoying "feature" of the Kindle 1 is now a thing of the past, too: accidental clicks on the Next or Prev buttons. The buttons on the Kindle 2 are still on the very edge of the unit, but the buttons now have their pivot point on the outside edge and need to be clicked inward, which completely prevents accidental clicking. Very clever design change!
  • The new Kindle 2 gets rid of the shiny, silvery, and rather strange LCD sidebar that the old unit used to show a selection cursor on the page or within a menu. Since the new display is much faster and more responsive, the selection feedback is now shown directly on the main screen.
  • The Kindle 2 has a better position for the power switch (top left of the unit) and gets rid of the clumsy wireless on/off hardware switch on the back of the unit, too.
  • It comes with a better power adapter (micro-USB plug on the Kindle; the charger cable can plug into either a desktop USB port or a wall outlet), which is similar to how Apple's iPhone charger works.
  • I'm lucky to be in a Sprint 3G network coverage area, so I found the unit to have much faster downloads using Amazon's 3G Whispernet (available only in areas where 3G EVDO service exists). This was especially noticeable when I downloaded all my previous purchases to the new device.
  • The new 16-level grayscale display is great, especially for viewing web content such as Wikipedia, newspapers, or blogs. It's probably not the most important feature, but it is certainly nice to have and much easier on the eyes than the old display when rendering images.
  • I never really liked the hardware on/off switch on the back or the sleep mode on the old Kindle, but both are now much more user-friendly and consistent: waking from sleep mode now uses the power button instead of "Alt-AA" and is much more responsive; pushing the power button briefly puts the Kindle into sleep mode (an artwork screen saver is shown); and holding the power button for 4-5 seconds turns the Kindle off.

In addition to these hardware changes, the Kindle 2 also appears to offer improved software with several usability enhancements. Some of these are network features, and I assume they will be made available as an upgrade for the old units, too, but I haven't heard any details about such an upgrade yet. Anyway, here are the software enhancements that I found notable:

  • The first positive surprise was how easy it was to migrate books from my old Kindle to the new one. There are essentially two upgrade paths. You can simply turn on the new Kindle and open "Archived Items" from the home page, which shows all previously purchased books in your Amazon account so that you can download them right there. Alternatively, you can use your computer to go to the Amazon.com website and use the "Manage your Kindle" page to view a list of all your previously purchased Kindle books and send them to the new Kindle from that list.
  • The Kindle 2 apparently has a faster processor, which lets it ship with built-in Text-to-Speech software. You can turn this on from the font-size menu or from the main menu, and you can customize the reading speed as well as choose a male or female voice. A nice touch is that the Kindle turns the pages automatically while Text-to-Speech is active, so you can follow the text as it is being read to you. A neat feature, although the Text-to-Speech engine makes the usual pronunciation errors...
  • Another neat feature is the ability to sync devices if you have more than one Kindle. This lets you read a book on one device and then continue from the exact same page on another device, as long as both are linked to your Amazon account.
  • The search function now offers six choices: search my items (i.e. all books, documents, and subscriptions stored locally on the Kindle); search the Kindle Store; search Google; search the dictionary; search Wikipedia; and go to web, which lets you enter a URL. The same choices are also available directly from the address bar in the built-in browser, which seems to have gotten some usability improvements.

So much for the positive experiences with the new Kindle 2. But not everything is perfect and there are a few disappointments that I experienced when playing with the device on the first day:

Mainly, the built-in browser still leaves much to be desired. It is not quite clear to me why it is not built on WebKit, like Safari or Chrome, to provide proper rendering of HTML pages. If a device like the iPhone, with less than half the Kindle's screen size, can render web pages beautifully and accurately, then why can't the Kindle? This is a very bothersome oversight, especially since open source browser engines are readily available in the form of Firefox or WebKit.

Another issue: it is certainly great that one can shop in the Kindle Store on Amazon.com using the Kindle, which allows you to buy new books on the road and has been a feature of the Kindle 1 from the start. But the world has changed since November 2007! On my iPhone I can use the Amazon.com iPhone app today and shop all of Amazon.com, not just the Kindle Store. Why can I not order a DVD from my Kindle or shop for new electronic gadgets? It doesn't make any sense to limit shopping on the Kindle to Kindle books only...

Also, Amazon has unfortunately failed to address the following points that I had raised in my initial Kindle 1 Review over a year ago:

  • It is great that I can send PDF and Word docs to my Kindle via my personalized kindle.com e-mail address. But that is not enough. When I place annotations, notes, and highlights in such documents on my Kindle, I now want to be able to e-mail them back to my office e-mail address and I want to see those comments, annotations, notes, and highlights back in the Word or PDF doc so that I can send it to others in the company. This would allow me to use the Kindle for actually reviewing business documents – it would be fantastic!!!
  • How can I get additional blogs on the Kindle? I am happy to pay extra, but I want to be able to enter any RSS feed URL into my Amazon account and create a Kindle blog feed for it. Blog authors can now sign up with Amazon to publish their blogs on the Kindle, but as a consumer I would like to be able to pick a niche blog and pay for it myself, and that still doesn't work.
  • It would be nice if Amazon could integrate some Social Networking aspects into the Kindle. How many of my friends are reading books on it? What are they reading? How can I post comments about a book to my blog? How can I tell my friends about comments I have on a book?

Last, but not least, I wanted to test whether the Kindle 2 can now receive and process Office Open XML (OOXML) documents via the personalized e-mail address, and I was indeed able to receive, read, and review a DOCX document in WordprocessingML that I had created from an XML source with Altova StyleVision 2009.

So the overall verdict is: definitely a great improvement over the first-generation Kindle, and still one of the best eBook readers in my opinion. But it leaves a few things to be desired, especially in the iPhone age...

Is it worth upgrading from the Kindle 1? I would say only if you have kids or other family members to whom you plan to give the Kindle 1. The improvements over the Kindle 1 are certainly nice, but they are more incremental than revolutionary.


UPDATE: The Kindle's Secret has been revealed by XKCD:


Avenue Q and French Open XML

I'm spending a few days in New York with the family and we just saw Avenue Q tonight - absolutely fantastic. I haven't laughed so hard since ... well ... since ... uhm ... probably since seeing Spamalot two years ago.
In an unrelated story, I just saw that Julien Chable has recently published three French articles on his blog about Open XML and using Altova products like XMLSpy and DiffDog:

TechEd, Open XML, and HDR Photography

While being at TechEd in Orlando, FL, last week, I had lunch with Doug Mahugh and we talked about the upcoming ODF support in Office 2007 SP2, the new features in the Open XML SDK, Altova's new support for Open XML diff/merge in DiffDog, creation of Open XML from StyleVision, and data integration and mapping for Open XML in MapForce, as well as various other XML-related topics.

We also talked about some other industry topics and finally came to chat about HDR (high dynamic range) photography. Doug sent me a few links to some of his recent photos, and this one impressed me the most.

I couldn't help it and had to get the software the same day. However, as I had left my Canon SLR camera at home for this trip, I wasn't able to test-drive HDR imaging until I got back home today:

1X5F2686_7_8

Obviously, this isn't a particularly exciting scene; I just shot from our balcony towards the end of the cul-de-sac. I used automatic exposure bracketing of ±2 EV, loaded all three images into Photomatix, and then played with some of the tone-mapping settings to create vibrant and surreal colors.
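For anyone new to bracketing: ±2 means one frame at the metered exposure plus one frame two stops darker and one two stops brighter. Each stop halves or doubles the light, so the darkest and brightest frames differ by a factor of 2^4 = 16. Here is a minimal Python sketch of that arithmetic, assuming the camera keeps aperture and ISO fixed and varies only the shutter speed; the function name and the 1/125 s metered value are purely illustrative:

```python
from fractions import Fraction


def bracketed_shutter_speeds(metered, step_ev=2):
    """Return (under, metered, over) exposure times for a 3-frame bracket.

    Assumes aperture and ISO stay fixed, so each EV stop halves or doubles
    the exposure time; step_ev=2 corresponds to a +-2 bracket.
    """
    factor = 2 ** step_ev  # 2 stops -> 4x less or more light
    return metered / factor, metered, metered * factor
```

For a hypothetical metered exposure of Fraction(1, 125) this gives 1/500 s, 1/125 s, and 4/125 s; the camera rounds the last value to its nearest standard speed (1/30 s).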

But I still like the result quite a bit - it makes me want to go out and take some HDR photos of Marblehead harbor and experiment with other local scenes where the high dynamic range can come into play nicely.

Creating Open XML (OOXML) Spreadsheet Documents

As Office Open XML (OOXML) gains more widespread adoption, and now that it is an ISO standard, developers will want to know how easily they can create Open XML documents directly from their own applications, e.g. spreadsheet documents that are compatible with Excel 2007. Most approaches require quite a bit of hand-coding and close attention to the details of the Open XML specification. What I want to show you today on the XML Aficionado blog is how to use MapForce to auto-generate all the source code (for example in C#) that produces the desired .xlsx document, so that you can integrate it into your applications (and use it royalty-free within your organization).

I will use a very simple example to demonstrate how you can turn some raw sales data in an arbitrary XML format:

SalesDataXML

into a pretty business graph in Excel 2007:

SalesDataGraph

For such a simple use case you could, of course, simply open the XML file in Excel 2007 directly, but I am deliberately keeping the example simple to illustrate the process. The true power of this approach is that you can work with very complex data in a visual and intuitive manner, and that you can auto-generate the source code that implements the mapping as part of your application to automate such processes.
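To give a feel for what the hand-coding route involves, here is a minimal Python sketch using only the standard library. It is not the code MapForce generates; it simply writes the five parts a bare-bones SpreadsheetML package needs (content types, package relationships, workbook, workbook relationships, and one worksheet), storing text as inline strings so that no shared-strings part is required. Function names are hypothetical, and a production-quality writer would also add styles and document properties:

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Namespace and relationship URIs from ECMA-376 (transitional)
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'


def column_letter(index):
    """Convert a 0-based column index to an Excel column name (0 -> A, 26 -> AA)."""
    name = ""
    index += 1
    while index:
        index, rem = divmod(index - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def sheet_xml(rows):
    """Build a worksheet part: numbers become <v> values, everything else inline strings."""
    xml_rows = []
    for r, row in enumerate(rows, start=1):
        cells = []
        for c, value in enumerate(row):
            ref = f"{column_letter(c)}{r}"
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                cells.append(f'<c r="{ref}"><v>{value}</v></c>')
            else:
                text = escape(str(value))
                cells.append(f'<c r="{ref}" t="inlineStr"><is><t>{text}</t></is></c>')
        xml_rows.append(f'<row r="{r}">{"".join(cells)}</row>')
    return f'<worksheet xmlns="{MAIN_NS}"><sheetData>{"".join(xml_rows)}</sheetData></worksheet>'


def build_xlsx(rows, sheet_name="Sheet1"):
    """Return the bytes of a minimal one-sheet .xlsx package."""
    name = escape(sheet_name, {'"': "&quot;"})
    parts = {
        "[Content_Types].xml": (
            f'<Types xmlns="{CT_NS}">'
            '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
            '<Default Extension="xml" ContentType="application/xml"/>'
            '<Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>'
            '<Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>'
            '</Types>'),
        "_rels/.rels": (
            f'<Relationships xmlns="{PKG_REL_NS}">'
            f'<Relationship Id="rId1" Type="{DOC_REL_NS}/officeDocument" Target="xl/workbook.xml"/>'
            '</Relationships>'),
        "xl/workbook.xml": (
            f'<workbook xmlns="{MAIN_NS}" xmlns:r="{DOC_REL_NS}">'
            f'<sheets><sheet name="{name}" sheetId="1" r:id="rId1"/></sheets></workbook>'),
        "xl/_rels/workbook.xml.rels": (
            f'<Relationships xmlns="{PKG_REL_NS}">'
            f'<Relationship Id="rId1" Type="{DOC_REL_NS}/worksheet" Target="worksheets/sheet1.xml"/>'
            '</Relationships>'),
        "xl/worksheets/sheet1.xml": sheet_xml(rows),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for part_name, xml in parts.items():
            package.writestr(part_name, XML_DECL + xml)
    return buffer.getvalue()
```

Writing the result to disk is then just a matter of saving the returned bytes to a file with an .xlsx extension. Even this stripped-down version shows how much package plumbing a code generator takes off your hands.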

So let's open MapForce and insert the XML data file into our working surface where we are going to define the mapping:

MapForceXMLfile

Next we insert an Open XML spreadsheet document into the working surface of our mapping project. We can either insert an empty spreadsheet, or use an example document previously created in Excel to indicate which sheets, data ranges, or labels should receive our data:

MapForceExcelTarget

Now it is time to define how the source XML data should be mapped to the target OpenXML document. This particular mapping is just one example - MapForce lets you map between any combination of XML, relational database, EDI, flat-file (e.g. legacy text files), and OpenXML spreadsheet documents. In our case we are going to convert from start-date/end-date ranges in the XML source to months in the OpenXML document and from states to regions:
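The actual mapping rules live in the MapForce design, but the kind of logic involved is easy to sketch in code. The following Python example is illustrative only: the `<sale>` element and its attribute names, the state-to-region table, and the rule that each record is booked to the month in which its date range starts are all assumptions, not the contents of the MapForce example files:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import date

# Hypothetical state-to-region table; the real example may group states differently
REGIONS = {"MA": "Northeast", "NY": "Northeast", "CA": "West", "WA": "West", "TX": "South"}


def month_of(iso_date):
    """Turn an ISO date such as 2008-01-15 into a month key such as 2008-01."""
    d = date.fromisoformat(iso_date)
    return f"{d.year}-{d.month:02d}"


def map_sales(xml_text):
    """Aggregate <sale> records into totals keyed by (region, month).

    Assumption: each record is booked to the month in which its
    start-date/end-date range begins; the end date is not used here.
    """
    totals = defaultdict(float)
    for sale in ET.fromstring(xml_text).iter("sale"):
        region = REGIONS.get(sale.get("state"), "Other")
        month = month_of(sale.get("start"))
        totals[(region, month)] += float(sale.get("amount"))
    return dict(totals)
```

A visual mapping tool expresses the same lookup and grouping as connections between source and target nodes, which is what makes it manageable once the data gets more complex than this.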

SalesDataMapping

Once you've defined the whole mapping, this is how your project will look in MapForce - note that underneath the blue-gradient working surface the "Mapping" tab is the one that is presently selected, because I've just defined my mapping between the input and output files:

MapForceScreenshot

To test my mapping before I auto-generate the program code, I can click on the "Output" tab underneath the working surface, and MapForce opens Excel 2007 embedded within the same application frame to show me the result produced by my mapping:

SalesDataExcelOutput

This Excel table is then used to produce the graph that I showed earlier.

Now I want to auto-generate C# code for my data integration project that will automate the generation of these Excel 2007 Open XML documents. So the next step is to check the code-generation settings to ensure that the code targets the correct development environment, in my case Visual Studio 2008. MapForce supports many other environments and can also generate C++ or Java code in addition to C#.

CSharpSetting

OK, now we are ready to generate code. All that is required is the corresponding command on the File menu: all the source-code files are placed in a designated output directory, and a matching Visual Studio solution file is generated as well:

MapForceCodeGen

The auto-generated source-code can now be integrated into any application and can be used royalty-free within your organization to automate the creation of Open XML (OOXML) spreadsheet documents.

If you would like to experiment a bit more with this example yourself, you can find all the files used here in the MapForceExamples directory when you download the free 30-day evaluation version of MapForce.

Also, keep in mind that you can use Excel 2007 files (or any other OpenXML spreadsheet documents) in MapForce both as input and output files, so you can create data integration applications and mapping or conversion code for any possible scenario that involves OOXML spreadsheet data, XML, EDI, or relational databases.

Creating Open XML documents from XML and database data

The latest release of StyleVision, 2008r2, gives users important new functionality for creating advanced stylesheets that publish XML and database data in Word 2007, which uses the new Open XML (OOXML) format, as well as simpler processes for publishing the same source content in other formats. And, to further ease the transition for developers and designers working with OOXML, we have just reduced the price of StyleVision considerably. As adoption of Open XML increases, StyleVision developers will be ready with a powerful tool for publishing XML and database data in what is sure to become the predominant end-user document format, now that Open XML has been approved as an ISO standard.

Here is how the process works:

  1. Open your existing XML document or connect to an existing relational database to populate the source pane in StyleVision:
    Sources
  2. Drag & drop elements from the source pane into the design pane and apply styles to them, thereby creating a meta stylesheet for producing the desired output formatting:
    DragDrop
  3. Click on one of the preview tabs underneath the design pane to preview the output in any of the supported output formats (Open XML for Word 2007, HTML, PDF, and RTF) - all outputs are automatically created from one and the same visual design:
    OpenXMLpreview
  4. Save the generated output file(s) as well as the specific stylesheets that have been auto-generated to render your data in the desired output formats again and again...

StyleVision can access data from database tables or views, or you can directly enter a SQL SELECT statement to query only particular data from a database. This makes StyleVision ideal for flexible database reporting, too.
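Conceptually, the SQL option boils down to turning a query result into structured data that a stylesheet can then render. Here is a minimal Python sketch of that step using the standard sqlite3 module; the table, column, and element names are hypothetical, and this is not how StyleVision connects to databases internally:

```python
import sqlite3
import xml.etree.ElementTree as ET


def sample_connection():
    """Create an in-memory database with a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("Northeast", 150), ("West", 70), ("South", 120)])
    return conn


def query_to_xml(conn, sql, params=(), root_tag="rows", row_tag="row"):
    """Run a SELECT statement and return the result set as an XML string.

    Each row becomes a <row> element and each column a child element named
    after the column, so column names must be valid XML names.
    """
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, record):
            ET.SubElement(row, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, query_to_xml(sample_connection(), "SELECT region, amount FROM sales WHERE amount > ? ORDER BY region", (100,)) returns only the Northeast and South rows, which is the kind of targeted query the SELECT option is meant for.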

If you are interested in further details, you can read more about the new features of StyleVision 2008r2 here.

New BIG "minor" release of Altova tools

It's called Version 2008 Release 2, but in reality it should be a new major version. Our "problem" is that each year has 12 months whereas our talented engineers are practically cranking out a new major version every 5-6 months. So we have to call one of them the major release and the other one a minor release - but this one is BIG!

We've updated all the tools in the popular Altova MissionKit bundle with tons of new features and usability enhancements that our customers have asked for. I am most excited about the following, which provide big benefits to our users:

  • Very Large File Support: XMLSpy 2008r2 contains a number of advanced optimizations for working with very large files. These result in a reduction of memory consumption by up to 75-80% compared to the previous version when opening and validating XML documents in Text View. This means that you can now open and work with files that are about 4-5 times larger than those supported in the past!!
  • Extended Open XML (OOXML) Support: XMLSpy was the first XML Editor to directly support Open XML in April 2007 and today we are introducing more Open XML support in these products:
    • MapForce 2008r2 now directly supports SpreadsheetML and allows the user to place any Excel 2007 document inside a mapping project to directly transform data from EDI, XML, databases, web services, and legacy text files to Excel 2007 and vice-versa. This new support for Open XML and Excel 2007 is, of course, also available in the automatic code-generation capabilities of MapForce, allowing developers to generate application code for recurring data transformation scenarios in Java, C# and C++.
    • StyleVision 2008r2 now directly supports Open XML output in Word 2007 (WordprocessingML), allowing the user to generate multiple rich output formats from one single stylesheet design. StyleVision supports the generation of stylesheets via an easy-to-use drag-and-drop interface from XML documents as well as from databases, and it is the ultimate report designer that can produce output in HTML, PDF, RTF, and Open XML from one visual design. In addition, it lets developers create Authentic forms from the same design to facilitate XML-based data entry across an organization with no deployment cost.
    • DiffDog 2008r2 now supports detailed XML differencing between Open XML documents, including the ability to directly edit and merge changes across those files. In addition, the directory comparison feature now also supports ZIP file types so that directories and ZIP archives can be compared as well.
  • Expanded Modeling Capabilities: UModel 2008r2 now supports the OMG's BPMN (Business Process Modeling Notation) and is also the first UML tool to ship full support for C# 3.0 and Visual Basic 9.0 - including accurate parsing of new language constructs in these programming languages that directly support XML. UModel does, of course, also continue to fully support Java 6.0 and provides full reverse-engineering and round-tripping for all the above languages.
  • Better Integration Through Global Resources: developers using multiple Altova tools - for example as parts of the MissionKit bundle - can now take advantage of increased integration between these tools. The new Global Resources feature lets a developer define directories, databases, and ancillary files in one central location and those are shared between all applications. In addition, a developer can define multiple deployment scenarios (e.g. test, staging, production) for their XML projects, and also directly connect the output of one application to become the input for another.

The above list has just a few of the highlights that I find most exciting. More details and all the other cool new features can be found on the "What's New" page on the Altova website. There is also a press release being issued today about the new version.

I will also be covering some of these features in more detail on this XML Aficionado blog in the next couple of days - stay tuned...

Open XML is now an ISO Standard

The official press release came out of the ISO offices on April 2nd, and Open XML (OOXML) is now an ISO standard with the official designation IS 29500.

Microsoft issued a press release today stating that 86% of all voting bodies and 75% of P-members approved the standard, both above the required thresholds of 75% and 66.7%, respectively.
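To make the two criteria concrete: a DIS passes when at least two-thirds of the voting P-members approve and no more than a quarter of all votes cast are negative (abstentions are not counted), which is equivalent to the 66.7% and 75% approval thresholds mentioned above. A minimal Python check that takes the percentages from the press release as inputs; the function name is hypothetical, and Fraction keeps the two-thirds boundary exact:

```python
from fractions import Fraction


def dis_approved(p_member_approval, overall_approval):
    """Check both DIS approval thresholds.

    p_member_approval: share of voting P-members in favour (needs >= 2/3)
    overall_approval:  share of all voting national bodies in favour
                       (needs >= 3/4, i.e. at most a quarter against)
    """
    return p_member_approval >= Fraction(2, 3) and overall_approval >= Fraction(3, 4)
```

With the reported values, dis_approved(0.75, 0.86) is True; each threshold on its own would have been enough to block the standard if missed.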

See also the following blog and media reactions today:

There is also an interesting story floating around that Norway allegedly seeks to reverse its Open XML vote to No - but that seems to be irrelevant given the high margins that the tallied outcome has over the minimum requirements for approval as a standard.

To get an early start working with Open XML (IS 29500), check out Altova's support for Open XML in our XMLSpy XML Editor.

You are also invited to read all previous articles on Open XML on this XML Aficionado blog - especially my January 30 tutorial post on Content reuse with Open XML and XSLT.

ISO BRM ended in Geneva today

The ISO Ballot Resolution Meeting (BRM) on DIS-29500, better known as Office Open XML (OOXML), came to a close in Geneva today. I was not there, so I suggest you read some first impressions in these blogs:

One thing is clear: the five days in Geneva were not nearly enough to discuss all the proposed dispositions exhaustively. But that doesn't really matter, because the BRM was never intended to produce a final vote on the standard.

That will happen in the next step of the process: now we get to wait 30 days while the ISO member countries cast their official votes on the adoption of OOXML as an ISO standard.

For a quick summary of the acronyms surrounding OOXML and the ISO process, see my previous post on XML Aficionado.

OOXML vs. ODF - the "battle" is heating up as we get closer to the ISO BRM date

I wrote about the Burton Group's report "What's Up, .DOC?" before on the XML Aficionado blog, and it didn't take long for the ODF Alliance to write a scathing rebuttal to the Burton Group report. Ironically, that rebuttal was published in PDF format, not ODF...

Before we take a look at what happens next, maybe it is time for a short review of the various acronyms and abbreviations that are commonly used in these reports, discussions, and in related blogs:

  • OOXML: Office Open XML, an XML-based file format specification for electronic productivity application documents, such as spreadsheets, charts, presentations, and word processing documents. Originally developed by Microsoft, it is already an Ecma standard and widely used due to its implementation in Microsoft Office 2007. It is currently in the process of being proposed as an ISO standard.
  • ODF: Open Document Format, a file format for electronic office documents, originally developed by Sun for the OpenOffice.org office suite and later standardized through OASIS and ISO.
  • ISO: International Organization for Standardization.
  • BRM: Ballot Resolution Meeting, the ISO process in which the comments received during the previous ISO Fast Track vote and letter ballot phase are resolved, and during which the national bodies and the submitting entity (Ecma) may agree on a set of revisions to the originally submitted standard text. The DIS-29500 BRM is scheduled for February 25-29, 2008, in Geneva.
  • DIS-29500: the official ISO name and standard number for OOXML.
  • OASIS: Organization for the Advancement of Structured Information Standards, a non-profit consortium that defines open standards for the global information society.
  • Ecma: originally the European Computer Manufacturers Association; the organization is now called Ecma International - European association for standardizing information and communication systems.
  • XML: eXtensible Markup Language as defined by the W3C in 1998. Probably the most important standard of them all, because both OOXML and ODF are built on top of XML. If you don't know it already, you should definitely learn XML... :)


So what's new with the OOXML vs. ODF debate now that we are only two weeks away from the ISO BRM? Earlier this week, the Burton Group responded to the ODF Alliance's rebuttal in a series of three postings by Guy Creese on the Collaboration and Content Strategies Blog, and you can find them here: Part 1, Part 2, and Part 3. In this response, the Burton Group addresses each criticism from the ODF Alliance point by point.

Also, Slashdot reported this week on the Ecma response to the ISO comments and the recent blog post from Russel Ossendryver (an open source and ODF advocate) criticizing the Ecma response.

If you prefer demos over reading thousands of pages of specifications, you may find these recently posted YouTube videos interesting: a video of Open XML on the iPhone, as well as a video of native Open XML support on Mac OS X. Both videos show support of OOXML on Apple's platforms, yet Martin Bekkelund (a proponent of Norway's "no" vote on DIS-29500) writes on his blog today about some headaches he's had with OOXML on the Mac and his iPhone. I was curious about his claim that a .DOCX on the iPhone produces an error message, so I had to try it myself, and I am happy to report that every .DOCX attachment received on my iPhone (running the 1.1.3 software) displays beautifully and works pretty much exactly as shown in the YouTube video above.

More commentary and further information can be found on Michael Desmond's blog, as well as in previous OOXML-related posts on this XML Aficionado blog. Also keep in mind that the best way to learn OOXML is to start experimenting with it, and I recently wrote a longer article on Content reuse with Open XML and XSLT to show exactly how easily it can be done using the built-in OOXML support in your favorite XML Editor.

One thing is certain: everybody will be watching the outcome of the ISO BRM very closely...

Content reuse with Open XML and XSLT

While Open XML may not yet be an ISO standard, it is already standardized by Ecma and, even more importantly, all documents created by Office 2007 are stored in Open XML by default, so there is an abundance of documents whose content you can now reuse much more easily and productively than ever before. So instead of waiting for the ISO vote or paying too much attention to all the political battles being fought around it, I want to show you how you can take advantage of Open XML (sometimes also called OOXML or Office Open XML) today.

This is the first article in a series of blog postings that I plan to write about practical Open XML tips & tricks, so I encourage you to subscribe to my XML Aficionado blog (via RSS or via e-mail), if you haven't already done so. This will ensure that you get future articles from this series automatically as soon as I post them.

So let's look at an Open XML document in our favorite XML Editor. For this example I am going to use a WordprocessingML document (.docx) that I created with Microsoft Office Word 2007. When I open the .docx file in XMLSpy, I immediately see the contents of the package file, which is structured according to the Open Packaging Conventions.

That's a fancy way of saying that it is a ZIP file that contains specific files and directories that make up the content, structure, styles, relationships, and other parts of the document. Using XMLSpy's built-in capability to open any ZIP-formatted archive, I can directly browse any directory structures inside the ZIP package, add new files to the package, or open any existing XML file contained in the package:

OOXML1

For the purpose of reusing the content from this WordprocessingML example file, I am going to open the 'document.xml' file, which contains the content of the document.

As soon as I double-click the file in the ZIP archive, the XML is displayed in a separate window just like any other XML document and I can use the powerful grid view or text view features of XMLSpy to view or edit the XML data (sometimes it may be useful to invoke the pretty-print function in text view to make the file more easily readable):

OOXML2

This is, of course, a live editing view, so you can not only view the Open XML data but also make changes to the XML and save them back into the package file.
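If you want to poke around a package without an XML editor, any ZIP library will do, because a .docx really is just a ZIP archive. Here is a minimal Python sketch using only the standard library (zipfile and minidom, not XMLSpy) that lists the parts, pretty-prints one of them, and writes a modified part back. Python's zipfile cannot update a member in place, so the sketch copies the whole package into a new archive; the function names are hypothetical:

```python
import io
import zipfile
import xml.dom.minidom


def list_parts(package_bytes):
    """Return the sorted part names inside an OPC package (.docx, .xlsx, ...)."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return sorted(package.namelist())


def pretty_part(package_bytes, part_name):
    """Read one XML part and return it indented for easier reading."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        raw = package.read(part_name)
    return xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")


def replace_part(package_bytes, part_name, new_content):
    """Return a copy of the package with one part's content replaced.

    Every part is copied into a new archive in its original order, reusing
    each member's ZipInfo so names and compression settings are preserved.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src:
        with zipfile.ZipFile(out, "w") as dst:
            for info in src.infolist():
                if info.filename == part_name:
                    data = new_content
                else:
                    data = src.read(info.filename)
                dst.writestr(info, data)
    return out.getvalue()
```

To use it on a real document, read the file's bytes (for example with open(path, "rb")) and call pretty_part(data, "word/document.xml") to get a readable view of the main content part.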

But now let's look at how we can easily reuse content from this Open XML document using XSLT. XMLSpy ships with a few Open XML example documents as well as example XSLT stylesheets for just that purpose. Let's look at the 'docx2html.xslt' stylesheet, which takes a WordprocessingML document and extracts all paragraphs to turn them into HTML. This example stylesheet is by no means intended to be a full-featured .docx-to-HTML conversion tool. Instead, it serves as a blueprint for reusing content from a .docx file and will hopefully be a starting point for your own stylesheet development.

At the core of that XSLT stylesheet we need an <xsl:for-each> loop to iterate over all the <docx:p> elements, which it turns into simple HTML <p> paragraphs. The text inside each paragraph is grouped into runs of characters that share common attributes, so we need an inner <xsl:for-each> loop to iterate over those <docx:r> elements and extract the text from their <docx:t> child elements. Thus the most primitive form of content reuse, which only extracts the text of all paragraphs, looks like this:

[Screenshot: XSLT loops over the docx:p paragraphs and docx:r runs]
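For readers who want to see the same traversal outside of XSLT, here is a minimal Python sketch of the nested loops (stdlib only; this is not the docx2html.xslt stylesheet itself). It assumes that the stylesheet's docx: prefix is bound to the WordprocessingML main namespace shown below. The function name paragraphs_to_html is hypothetical. The sketch uses iter() rather than direct-child lookups, so it also picks up runs nested inside elements such as hyperlinks.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (assumed to be what the docx: prefix denotes).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def paragraphs_to_html(docx):
    """Return one HTML <p> per w:p in word/document.xml, concatenating
    the text of every w:t inside every w:r run of that paragraph."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    out = []
    for p in root.iter(W + "p"):                      # outer loop: paragraphs
        text = "".join(t.text or ""
                       for r in p.iter(W + "r")       # inner loop: runs
                       for t in r.iter(W + "t"))      # text elements of a run
        out.append("<p>" + html.escape(text) + "</p>")
    return "\n".join(out)
```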

Once we have constructed those loops, we can start extracting and reusing some style information. To do that, we now emit an HTML <span> element for every <docx:r> run of characters and give it a style attribute whose value depends on the run's <docx:rPr> (run properties) element. We use <xsl:apply-templates> on that element to decide which HTML style to apply to the <span>:

[Screenshot: XSLT emitting a <span> for each docx:r run, using xsl:apply-templates on docx:rPr]

The corresponding templates for the three most common styles (bold, italic, underline) are trivially easy to construct and look like this:

[Screenshot: XSLT templates for bold, italic, and underline]
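The run-to-span step can be sketched the same way in Python. This is an analogue of the templates, not a copy of them. The CSS property choices and the function names are my own assumptions. The sketch does respect one detail of WordprocessingML. Toggle properties such as <w:b/> and <w:i/> are on unless their w:val attribute turns them off ("0", "false", or "off"). An underline is present unless its w:val is "none".

```python
import html
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def _is_on(el):
    """A toggle property (w:b, w:i) is on unless w:val switches it off."""
    return el is not None and el.get(W + "val", "true") not in ("0", "false", "off")

def run_to_span(run):
    """Turn one w:r element into an HTML <span> whose style reflects its w:rPr."""
    rpr = run.find(W + "rPr")
    styles = []
    if rpr is not None:
        if _is_on(rpr.find(W + "b")):
            styles.append("font-weight:bold")
        if _is_on(rpr.find(W + "i")):
            styles.append("font-style:italic")
        u = rpr.find(W + "u")
        if u is not None and u.get(W + "val") != "none":
            styles.append("text-decoration:underline")
    text = "".join(t.text or "" for t in run.iter(W + "t"))
    return '<span style="%s">%s</span>' % (";".join(styles), html.escape(text))
```

Calling run_to_span for each run inside the paragraph loop of the previous sketch gives the styled version of the HTML output.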

With just a few lines of XSLT and a handful of templates, we have already written a stylesheet that extracts the basic paragraphs and most important styles from a WordprocessingML document and turns them into HTML that can be displayed in XMLSpy's Browser view. Here is the result of running the above XSLT stylesheet on the example WordprocessingML document that you can find in the XMLSpy examples directory:

[Screenshot: HTML output of docx2html.xslt in XMLSpy's Browser view]

Similarly, it is quite easy to extend the stylesheet to extract meta information, other styles, or image information from the WordprocessingML document and reuse the content for any modern application scenario, from web publishing via HTML, RSS, or social media formats to mobile web applications and beyond.
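As one example of extracting meta information, here is a minimal Python sketch that reads the Dublin Core title, creator, and subject from the package's core properties part. It assumes that the core properties are stored at the conventional path docProps/core.xml. Strictly speaking, a consumer should locate that part through the package relationships in _rels/.rels. The function name core_properties is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

def core_properties(docx):
    """Return selected Dublin Core properties from docProps/core.xml
    (assumed location); returns an empty dict if the part is missing."""
    with zipfile.ZipFile(docx) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    props = {}
    for name in ("title", "creator", "subject"):
        el = root.find(DC + name)
        if el is not None and el.text:
            props[name] = el.text
    return props
```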

"But wait! How can I apply an XSLT stylesheet to an XML document that is stored within a ZIP file?", you might ask.

You can, of course, extract all the XML files using a regular ZIP expander, but there is a much better solution: when you use the document() function in XSLT 2.0 within XMLSpy or with our royalty-free XSLT engine AltovaXML, you can directly access files contained in a ZIP archive by using the "|zip" pipe operator within the filename. For example, "MyDocument.docx|zip\_rels\.rels" addresses the relationships file ".rels" in the "_rels" directory inside the ZIP package named "MyDocument.docx".
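The "|zip" syntax is specific to XMLSpy and AltovaXML. Outside those tools, the same direct access without extracting anything to disk can be sketched in Python. The sketch below reads _rels/.rels straight from the package and follows the officeDocument relationship to find the main document part. It checks only the transitional relationship type URI, which is an assumption, and the function name main_document_part is hypothetical. Inside the archive, entry names use forward slashes.

```python
import zipfile
import xml.etree.ElementTree as ET

REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OFFICE_DOC = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/officeDocument")

def main_document_part(docx):
    """Read _rels/.rels directly from the ZIP package and return the name
    of the part that holds the main document, or None if not found."""
    with zipfile.ZipFile(docx) as package:
        rels = ET.fromstring(package.read("_rels/.rels"))
    for rel in rels.iter(REL + "Relationship"):
        if rel.get("Type") == OFFICE_DOC:
            # Targets are relative to the package root; some producers add a leading "/".
            return rel.get("Target").lstrip("/")
    return None
```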

The benefits of using XSLT to reuse content from Open XML documents are obvious: because XSLT is one of the core XML standards from the W3C, you can apply all your existing XML, XPath, and XSLT know-how and take advantage of the excellent tool support available for these standards. For example, you can easily develop and debug your XSLT stylesheet using the powerful XSLT debugger in XMLSpy, which allows you to single-step through the transformation, set breakpoints on XSLT instructions or even on data nodes in your Open XML document, view the partially generated output, and inspect the state of the XSLT processor in detail as the output document is constructed:

[Screenshot: the XSLT debugger in XMLSpy]

Using the XSLT Debugger eliminates a lot of the pain that is normally associated with XSLT stylesheet development and allows for a very iterative approach to creating and improving stylesheets that facilitate content reuse and repurposing.

To sum it up, reusing content from Open XML documents for a variety of web applications, mobile scenarios, or social media and Web 2.0 contexts is very easy and can be achieved with standard XML-related technologies, such as XSLT.

For additional information on Open XML and how to take advantage of all the content that is now already available in that format, please refer to the following sites:

Binary and OOXML office formats - interesting news

Brian Jones has a nice post on (a) making the documentation for binary Office file formats available more easily and (b) mapping from binary formats to OOXML:

The binary formats documentation will be available publicly by February 15, 2008 and the file formats will also be subject to the Microsoft Open Specification Promise.

Regarding mapping from binary to OOXML formats (and back), Microsoft promised to start an open source effort on SourceForge for that purpose.

This also made it onto TechMeme today...

OOXML and ODF report from Burton Group: "What's Up, .DOC?"

Burton Group has released a new report "What’s Up, .DOC? ODF, OOXML, and the Revolutionary Implications of XML in Productivity Applications" this week. Written by Peter O'Kelly and Guy Creese, the report provides a deep and insightful analysis of the current state of ODF, OOXML, and other document formats (W3C, PDF) and of the history of those formats. It then cuts through the FUD and standardization games to arrive at a set of projections regarding the success of OOXML and ODF, as well as a set of practical recommendations.

I've read the report already, but am not going to spoil the fun for you and reveal all the conclusions here - the report is well worth your time and you should read it yourself! However, I will say this much: the report validates some of my thinking on the subject that I have expressed in various previous blog posts on OOXML here.

There is a quick overview of the report on the Burton Group's Collaboration and Content Strategies blog, and you can download the entire report for free after filling out a registration form.

Other early blog reactions to the new report are here:

And I am sure more blog reactions will follow next week.

As always, if you are interested in working with Office Open XML (OOXML) files, a great place to get started is to look at the OOXML support in XMLSpy and to download a free evaluation version of Altova's XML Editor and give it a try.

OOXML Resolutions to ISO Comments - a closed process to create an open standard?

It appears that Microsoft has now provided 662 responses to the ISO comments on DIS 29500 (Office Open XML) through Ecma, but those responses are presently available only to members of the ISO voting organizations through password-protected access. This move is already drawing much criticism from the ODF camp.

Guess what: those responses are neither provided in ODF format nor in OOXML. They are 662 individual PDF files. How ironic is that...?

XML Aficionado in the News

Two articles hit the news today that are based in part on interviews with the XML Aficionado:

The former is a detailed analysis of the OOXML vs. ODF controversy from a developer's perspective, including interviews with Burton Group analyst Peter O'Kelly, Microsoft's Brian Jones, and me.

The latter is a review of the new Altova XMLSpy 2008 release (in German).

Google, Microsoft, and Yahoo - interesting synchronicity

I could not help but notice an interesting synchronicity between the various announcements and news items about these three firms over the last 2-3 weeks:


Sep 9, 2007: Microsoft fails to win ISO approval for OOXML. A review of the detailed country comments does, however, show that they are likely to succeed in the next round in March 2008.
Sep 16, 2007: Yahoo launches Mash, a new social networking site designed to compete with Facebook.
Sep 17, 2007: Google adds a slide-show/presentation application to Google Documents in an effort to increase competition with Microsoft Office.
Sep 24, 2007: The Wall Street Journal reports that Microsoft is in talks with Facebook to acquire a 5% stake in the social networking site.
Sep 27, 2007: Microsoft announces an updated search capability in its Live Search engine. Incidentally, it is also Google's 9th birthday.
Sep 30, 2007: Microsoft unveils its answer to Google Docs, called Office Live Workspaces.
Oct 1, 2007: Yahoo announces a new Search Assist function to improve Yahoo Search.
Oct 2, 2007: Steve Ballmer speaks in Europe and says that the craze for individual social networks such as Facebook risks being exposed as a "fad". UPDATE: Robert Scoble responds that Steve Ballmer doesn't "get" social networking.

Office, Social Networking, Search, Office, Social Networking, Search, ... — is it just me, or is there some kind of pattern here?

And it all seems to revolve around online advertising platforms. Hmmmm.

Office Open XML (OOXML) Update: Country comments tabulated & rated

Rick Jelliffe today posted a great table of comments and likely outcomes of the upcoming 3rd round of ISO standards voting on OOXML next spring, based on a review of each country's comments on the current DIS 29500 (= OOXML) proposal:

"I’ve just glanced over the 3549 or so comments put in by various national bodies for the recent ballot on DIS 29500. I’ve made a table listing the countries that commented, together with their votes and whether I think their issues could be resolved during the upcoming Ballot Resolution Meeting next year."
(see "Your country's comments rated!" for full article and table)

After looking through Rick's table, my previous conclusion remains unchanged: most comments can be addressed easily - and will even make the standard better - and I expect OOXML to become an ISO standard next year.

Office Open XML fails to win ISO approval (so far)

This week Microsoft failed to win ISO approval for the Office Open XML (OOXML) standard in the 2nd round of the ISO standardization process. While the Wall Street Journal published a critical article about it, this is by no means the end of the road, nor is it even a major setback: the ISO process commonly requires multiple rounds, and the 3rd round (expected in early 2008) will very likely see Office Open XML become an ISO standard.

As Burton Group analyst Peter O'Kelly observes in his blog: "FWIW I still expect Open XML to become an ISO standard -- and it's reassuring to see the spec/design improved by the standardization process." - I couldn't agree more.

Irrespective of the timeline of this ISO standardization process, I expect OOXML to quickly become a de-facto standard as more and more corporations and enterprises upgrade from Office XP or 2003 to Office 2007 and start generating tons of documents in the new OOXML formats (which are the default setting for new Office 2007 files).

Having all these Word documents, Excel spreadsheets, etc. available in OOXML lays the groundwork for a huge increase in content reuse and repurposing, because the data - being XML - can now be accessed through other applications, transformed through stylesheets, and integrated with other enterprise data.

To get started with OOXML, software developers can simply open OOXML documents in XMLSpy and start working with the XML data inside, then use XMLSpy to create and debug XSLT 2.0 stylesheets that directly access the OOXML data through the AltovaXML processing engine.