I wanted to take this opportunity to extend a very heartfelt thanks to John Holder, Mike Morse, Irfan Shaikh and Ramadan Mansoura for the absolutely outstanding, above-and-beyond support they provided us during the process of upgrading.

Putting aside for the moment that ZCS in and of itself is a terrific product, it's the extraordinarily high calibre of support that just continues to give us great comfort in running Zimbra.

Since we run SLES (SuSE Linux Enterprise Server), getting from 32-bit ZCS 4.5.11 on SLES9 to 64-bit ZCS 5.0.4 on SLES10 meant first moving from the physical server running SLES9/4.5.11 to another server running SLES10/4.5.11, and then upgrading the SLES10 ZCS installation to 5.0.4.

In all fairness to John, Mike, Irfan and Ramadan, this upgrade/migration was the least smooth we have experienced since we started evaluating Zimbra at version 3.x (our first Zimbra deployment was on 4.0.3). We started testing and prototyping this upgrade/migration more than two months ago (using Xen), but we still didn't catch all the tweaks needed to complete the process, even though we spent over 100 man-hours on internal testing before attempting the production upgrade/migration.

I appreciate that the jump from 4.5.x to 5.0.x is bigger than either the jump from 3.x to 4.0 or the jump from 4.0.x to 4.5.x. I also appreciate that Zimbra 5 is a much more complex product than any 3.x/4.x series version. In a previous life I managed a multi-million-dollar software development project, so I am sensitive to the challenges facing Zimbra here.

But now that we are done and I can see things with the benefit of hindsight, I would like to recommend to Zimbra that, going forward, additional resources be devoted to:
  • Documentation
  • Disclosure
  • QA Testing


What does all that mean?

The documentation to do our migration was pretty slim. The official docs recommended an in-place operating system upgrade. Although SUSE supports this, it's hard to roll back from an OS upgrade without a lot of downtime, which we were hoping to avoid.

So, we settled on moving to a new server (just in case). The Zimbra Certified wiki doc on 32-bit to 64-bit migration is very good, but there was little SuSE-specific information in there.

For example, we found consistently that on SLES10 you need a second root terminal window open. At one point, the ZCS installer complains about the permissions being wrong on /etc/sudoers, so we had to use the second window to change the permissions to enable the installer to continue.
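For anyone who wants to check this before starting the installer, here is a minimal Python sketch. It assumes the installer wants the conventional 0440 mode on /etc/sudoers; I did not record the exact mode it asked for, so treat `EXPECTED_MODE` and the function name `sudoers_mode_ok` as illustrative only.

```python
import os
import stat

# Assumption: 0440 is the conventional mode for /etc/sudoers.
# The exact mode the ZCS installer checks for is not confirmed here.
EXPECTED_MODE = 0o440


def sudoers_mode_ok(path="/etc/sudoers", expected=EXPECTED_MODE):
    """Return True if the permission bits of `path` equal `expected`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == expected
```

Calling `sudoers_mode_ok()` as root before running the installer would show whether the permissions need changing from that second terminal window.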

As regards disclosure, nowhere in the release notes, the certified wiki document, or elsewhere did we see any warnings about our commercial certificates not coming over as part of the upgrade/migration. Sure, these forums are full of posts from folks having certificate problems, but none of those posts seemed (at least in advance) like they would apply to our scenario. But they did, unfortunately.

With regard to QA, cyberdeath today opened a bug report regarding broken log file rotation on SLES10, along with the fix. Basically, the Zimbra log file rotation script tries to restart the syslog service without using the SUSE "rc[servicename] restart" convention. In addition, the /etc/sudoers file was missing the proper command for restarting the syslog service. Now, it takes waiting for just one attempted log file rotation to see the results of this bug (zimbra.log stays at 0 bytes and all your stats go away). Did no one at Zimbra keep a test SLES10 system up for a day and check the Admin console? I'm not trying to be harsh, but I'm having difficulty understanding how a bug like this could be around in the fourth release of the 5-series GA product.
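The symptom is easy to test for automatically. Below is a rough Python sketch of such a check, not anything Zimbra ships: it flags a log file that is empty and has not been written to for a while. The function name `log_looks_stalled` and the five-minute grace period are my own assumptions, and you would pass in wherever zimbra.log lives on your system.

```python
import os
import time


def log_looks_stalled(path, grace_seconds=300, now=None):
    """Return True if `path` exists, is 0 bytes, and was last modified
    more than `grace_seconds` ago.

    The grace period (an arbitrary assumption) avoids flagging a log
    that has only just been rotated and not yet written to.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        # A missing file is a different problem; not reported as stalled.
        return False
    now = time.time() if now is None else now
    return st.st_size == 0 and (now - st.st_mtime) > grace_seconds
```

Run once a day against a test box, a check like this would report the empty log after the first rotation.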

We always respected the high level of QA that Zimbra puts in ZCS, and I fully supported the delay in the release of Zimbra 5 (remember, it was originally supposed to be out late last year) even though it cost us money from prospective customers who didn't want to migrate their old systems until 5.0.x was in place.

So if I may be so bold as to make some specific recommendations...

First, I'd like to see Zimbra delay adding new enhancements to ZCS for a few agile development cycles, and undertake a relatively thorough code review. I know Zimbra does pretty solid unit testing, but when bugs like the log file rotation issue slip through, I confess it makes me a bit nervous about what else may have slipped through.

Second, I know there has been talk about a KB and keeping the wiki, but I believe there needs to be a dedicated person responsible for documentation at Zimbra. Look at the new documentation regarding branding for example: it tells me everything I could possibly want to know about skins, CSS, logos, etc.--except nowhere in the document does it say: "To create a new skin, we suggest copying an existing skin directory, renaming it, and customizing the files inside as per this document." The old 4.0/4.5 branding document had that in there. The fact that the 5.0.4 release notes had nothing about commercial certificates in the Known Issues section is also a problem. Someone IMHO needs to "own" Zimbra documentation--especially now that 5.0 is such a feature-rich product. Where is the 5.0 user manual for example?

Lastly, before releasing even a minor point upgrade, I would recommend that Zimbra P2V a few customer systems and test the upgrades on those machines in a VMware or Xen environment. The customers would need to agree to spend a few hours working with Zimbra support, but I think everyone would benefit, and such a program would IMHO likely have eliminated, for example, the need to pull 5.0.3.

I hope this post is taken in the supportive and constructive manner in which it is intended. We have no interest in ranting nor in pointing fingers! We really, really like ZCS and have been nothing but impressed with everyone at Zimbra with whom we have come into contact.

ZCS is a much, much more sophisticated and complex product than it was in the 4.0 days, and all I am suggesting, based on our upgrade/migration challenges, is that Zimbra's internal workflow processes could probably benefit from a fresh look now that 5.0 is out the door.

With best regards,
Mark