We still have not solved this but did identify and resolve a related issue. Along with the failed cron jobs, we kept seeing this in /var/log/secure:
Code:
May 8 04:55:01 zimbra01 crond[14829]: pam_loginuid(crond:session): set_loginuid failed
Apparently, pam_loginuid needs to write to /proc, which is not writable inside a vserver. We knew we had to disable it for sshd, but it must be called elsewhere as well, including by cron. We thus changed them all with:
/bin/sed -i -e "s/^session.*required.*pam_loginuid.so/# session\trequired\tpam_loginuid.so/g" /etc/pam.d/*
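For anyone who wants to try that substitution before touching the real /etc/pam.d, here is a minimal sketch that runs the same expression against a scratch copy and counts the pam_loginuid lines that are still active. The scratch directory and the two sample lines are made up for illustration, and it assumes GNU sed (for -i and for \t in the replacement).

```shell
#!/bin/sh
# Scratch directory standing in for /etc/pam.d (hypothetical sample content)
scratch=$(mktemp -d)
printf 'session    required     pam_loginuid.so\nsession    required     pam_limits.so\n' > "$scratch/crond"

# Same expression as the command above, applied in place to the scratch copy only
sed -i -e "s/^session.*required.*pam_loginuid.so/# session\trequired\tpam_loginuid.so/g" "$scratch"/*

# Count lines where pam_loginuid is still active (not commented out)
active=$(grep -c '^session.*pam_loginuid\.so' "$scratch/crond")
echo "active pam_loginuid lines: $active"

# Clean up the scratch copy
rm -rf "$scratch"
```

On the sample file this should report zero active lines, with the pam_limits.so entry left alone. The same grep pointed at /etc/pam.d/* would list any files that still call pam_loginuid.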
Now we have statistics and, I assume, backups, but we still have our huge problem of missing data.