Ideally you'd be on the most current release when testing for bugs, but I know that's not always practical.
You might consider enabling authentication fallback to local passwords to work around this problem in the short term. Run the following command:
Code:
zmprov md sub.domain.com zimbraAuthFallbackToLocal TRUE
That will, of course, require the passwords to be kept in sync. There's currently no facility in Zimbra to do that, but I seem to remember there's an RFE in Bugzilla if you'd like to search for it and vote on it.
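To confirm the fallback setting actually took effect on the subdomain, something along these lines should work. It's a minimal sketch: `check_domain_auth` is a hypothetical helper name, `sub.domain.com` is the placeholder from the command above, and it assumes you run it as the zimbra user.

```shell
# Hypothetical helper (not a Zimbra command): print the auth-related
# attributes for a domain so you can confirm zimbraAuthFallbackToLocal is TRUE.
# Assumes it is run as the zimbra user so zmprov is on the PATH.
check_domain_auth() {
  local domain="$1"
  # 'zmprov gd <domain> <attr> ...' (getDomain) prints only the named attributes
  zmprov gd "$domain" zimbraAuthMech zimbraAuthFallbackToLocal
}

# Usage: check_domain_auth sub.domain.com
```

If `zimbraAuthFallbackToLocal` doesn't show as TRUE, the `zmprov md` command didn't apply to the domain you expected.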
I can't really understand why the top-level domain would authenticate but not a subdomain. After the next failure, try an ldapsearch on the subdomain before you reboot.
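One way to run that search, assuming you want to see how the subdomain is defined in Zimbra's own LDAP, is sketched below. `ldap_domain_lookup` is a hypothetical helper; the connection details come from Zimbra's local config, and the empty search base (`-b ""`) is an assumption about the directory layout, so adjust it if your tree differs.

```shell
# Hypothetical helper: look up a domain's entry in Zimbra's LDAP.
# Assumes it is run as the zimbra user; the empty search base is an assumption.
ldap_domain_lookup() {
  local domain="$1"
  local url dn pw
  # Read the LDAP connection details from Zimbra's local config (values only)
  url=$(zmlocalconfig -s -m nokey ldap_url)
  dn=$(zmlocalconfig -s -m nokey zimbra_ldap_userdn)
  pw=$(zmlocalconfig -s -m nokey zimbra_ldap_password)
  # Simple bind (-x) and return only the auth-related attributes of the domain
  ldapsearch -x -H "$url" -D "$dn" -w "$pw" -b "" \
    "(zimbraDomainName=$domain)" \
    zimbraAuthMech zimbraAuthLdapURL zimbraAuthFallbackToLocal
}

# Usage: ldap_domain_lookup sub.domain.com
```

Comparing the output for the subdomain with the same search for the parent domain should show whether the two are configured differently.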
Does the server have sufficient RAM? Which particular backup script are you running from that wiki page?