
It’s Not Like You Care About Your Documents

Recently, as part of the many antitrust/anti-competition legal actions they’re suffering under, Microsoft released specifications for the old Office binary file formats. As expected, they’re big and complex. Joel Spolsky (a former member of the Excel team) had some thoughts on their size and complexity:

With a little bit of digging, I’ll show you how those file formats got so unbelievably complicated, why it doesn’t reflect bad programming on Microsoft’s part, and what you can do to work around it.

The digging turns up reasons that make some sense: the limitations of older computers, feature creep, a complete lack of attention to the future. But it’s hard to see some of these reasons as “why it doesn’t reflect bad programming on Microsoft’s part”. Carelessness is common, sure, but we don’t call it a virtue because everybody does it.

And these are problems that should have been on someone’s radar at Microsoft. It’s one thing for a grunt programmer to hack a feature to meet a deadline; it’s another for the management to simply go along with it, or to not order a rethink when the problems come to light. When you read about hacks like the following, everything sounds nice and reasonable, until you remember what the end result is: that Microsoft Excel doesn’t have a standard format for storing and manipulating dates!

There are two kinds of Excel worksheets: those where the epoch for dates is 1/1/1900 (with a leap-year bug deliberately created for 1-2-3 compatibility that is too boring to describe here), and those where the epoch for dates is 1/1/1904. Excel supports both because the first version of Excel, for the Mac, just used that operating system’s epoch because that was easy, but Excel for Windows had to be able to import 1-2-3 files, which used 1/1/1900 for the epoch. It’s enough to bring you to tears. At no point in history did a programmer ever not do the right thing, but there you have it.
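Here is a minimal Python sketch of the two date systems. It turns a whole-day Excel date serial into a calendar date. The function name and its `epoch_1904` flag are hypothetical, and the sketch ignores the fractional part Excel uses for time of day. It assumes the usual conventions. In the 1900 system, serial 1 is 1 January 1900, and serial 60 is the nonexistent 29 February 1900 inherited from 1-2-3. In the 1904 system, serial 0 is 1 January 1904.

```python
from datetime import date, timedelta


def excel_serial_to_date(serial: int, epoch_1904: bool = False) -> date:
    """Hypothetical helper: convert a whole-day Excel serial to a date."""
    if epoch_1904:
        # 1904 system (the original Mac Excel): serial 0 is 1904-01-01.
        if serial < 0:
            raise ValueError("1904-system serials start at 0")
        return date(1904, 1, 1) + timedelta(days=serial)

    # 1900 system: serial 1 is 1900-01-01. This sketch rejects serial 0.
    if serial < 1:
        raise ValueError("1900-system serials start at 1")
    # Serial 60 is the fictitious 1900-02-29 kept for 1-2-3 compatibility.
    if serial == 60:
        raise ValueError("serial 60 is the nonexistent 1900-02-29")
    if serial < 60:
        # Before the phantom leap day, serials count from 1899-12-31.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After it, every serial is one day "too high", so count from 1899-12-30.
    return date(1899, 12, 30) + timedelta(days=serial)
```

The same stored number means different days depending on the workbook's flag. Serial 39448 is 1 January 2008 in the 1900 system but 2 January 2012 in the 1904 system, a shift of 1,462 days. Any program that reads the raw cells has to carry that flag along with every date.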

It may not have been the wrong decision, in the sense that it enabled them to ship, and shipping is everything in some circles. But as a design decision, how can anyone defend such inconsistency?

Business information technology was able to move forward in the early ’90s because older document formats like 1-2-3 and WordPerfect were simple enough to import easily into Microsoft Office. Today, when we talk about moving to open-source suites like OpenOffice or online systems like Google Docs, detractors left and right cite the pain of document conversion as a reason to hold back. But if Joel is right about the old binary formats, the pain of transition is like the pain of changing your oil: you can pay now, or you can pay a lot more later. Even Microsoft is having trouble opening its own files from long ago, with “long ago” being a period measured in years, not decades.

Maybe you didn’t write anything a decade ago you’d care to read again today; maybe you can’t imagine any of your stuff being worth reading a decade from now. Do you want to take that chance?

Thankfully, I was a geek, and kept most of my documents in plain text. Today, I take care to save important documents in formats and encodings designed for the long haul, like Unicode, ODF, and PDF. It helps that I avoid Microsoft software like the plague. (If you think they’ve changed since the bad old days, just surf the web in Firefox on Linux sometime, and see how many badly-rendered pages look much better when you switch their text encoding from Unicode to “Windows-1252”.)
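The encoding problem in the parenthetical is easy to reproduce. This minimal Python sketch assumes the common case behind those pages: text typed with Windows "smart quotes" and saved as Windows-1252, but labeled as UTF-8, which browser menus list as Unicode. The sample sentence is made up.

```python
# A made-up sentence with curly quotes, saved the way older Windows tools did.
text = "\u201cIt\u2019s fine,\u201d he said."
raw = text.encode("cp1252")  # the curly quotes become bytes 0x93, 0x92, 0x94

# A reader told the page is UTF-8 finds those bytes invalid on their own;
# with errors="replace", each one becomes U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")

# Switching the decoder to Windows-1252 recovers the original text.
as_cp1252 = raw.decode("cp1252")

print(as_utf8)
print(as_cp1252)
```

The bytes never change. Only the label used to interpret them does, and that label is exactly what a long-lived document cannot afford to get wrong.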

If you have a lot of Office documents, even if you’re happy with Office, you might consider whether you care about opening those documents ten years from now, and whether you’d rather take the time to future-proof them while you still can.

This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.

8 comments to It’s Not Like You Care About Your Documents

  • Kevin

    The funniest thing in the article is its definition of a workaround:

    the workaround for using MS Word to open an MS Word document is to use MS Word through its COM interface.

  • [...] about it. There are some interesting comments on the Office Binary formats on the web, including these by Jeff Licquia, and those he links to by Joel [...]

  • Hi. Can you tell me how to move a blog off free hosting? Here’s my blog html,
    Supposedly WordPress should move easily to a new host, but I keep getting database errors. I looked there, but I know nothing at all about PHP, and I really don’t want to bring in outside programmers. Could you suggest how to move the blog painlessly?

  • Hi. I’ve run into a problem: I bought an electric drill from an online store (магазин ру)
    and it broke on me that very same day, and they gave me no warranty. The package just arrived in the mail with a box, and everything inside was in Chinese. I wrote to the store, and they replied that I should take it to a service center, but I don’t have any documents, nothing. How can I bring this online store into line? Are there any authorities I can write to? Where should I write, and to whom? I made a mistake, yes; I should have bought it at a proper shopping center, but they simply didn’t have the model I needed, and it was cheaper at the online store. And now here I am, left with nothing. Please tell me what to do.

  • Well, what can I say: don’t buy from online stores anymore.

  • Hi! You write very interestingly. I think I’ll subscribe. Thanks for the info. If you need it, take a look at something interesting here.

  • Interesting post. How do I subscribe to the RSS feed?
