
Thread: LDAP Stops after full backup

  1. #1
    fultonj, Senior Member
    Join Date: Feb 2008
    Location: Easton PA
    Posts: 63

    LDAP Stops after full backup

    My master Zimbra ldap-only server has stopped working each weekend for the last two weeks. I believe it is related to the full backup that is run each weekend:

    Code:
    0 1 * * 6 /opt/zimbra/bin/zmbackup 
    0 0 * * * /opt/zimbra/bin/zmbackup -del 1m
    It seems that the weekend backup starts more than one zmslapcat, which makes LDAP unresponsive to one of my store servers (I am using multiple store servers), so LMTP fails and users on that server cannot log in. LDAP then won't shut down until each slapcat is killed manually. While LDAP is down, I have no problem running a full backup with zmbackupldap, which produces the same LDIF files that I find on my secondary LDAP server, which is syncrepl'd from the master.
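    If overlapping backup runs on one host are what starts the extra slapcats, one way to rule that out is to have each run take an exclusive lock and skip itself when a previous run still holds it. A minimal sketch, assuming util-linux flock(1) is installed; `run_exclusive` and the lock path in the usage note are hypothetical, and wrapping zmbackup this way is untested here. It would not help if the extra slapcats are started from another server (for example a store server's backup), since the lock only exists on one host.

```sh
# run_exclusive LOCKFILE CMD [ARGS...]
# Hypothetical helper: run CMD only if no other process holds LOCKFILE.
# Returns 75 (EX_TEMPFAIL) without running CMD when the lock is busy,
# otherwise returns CMD's own exit status.
run_exclusive() {
  lockfile=$1
  shift
  (
    # Try to take an exclusive lock on fd 9 without waiting.
    if ! flock -n 9; then
      echo "run_exclusive: $lockfile is held by another run; skipping" >&2
      exit 75
    fi
    "$@"
  ) 9>"$lockfile"
}
```

    The same effect is available directly in a crontab line as `flock -n /tmp/zmbackup.lock /opt/zimbra/bin/zmbackup` (flock -n exits 1 rather than 75 when the lock is busy). Note that the Zimbra crontab section between ZIMBRASTART and ZIMBRAEND is marked DO NOT EDIT, so a wrapper like this would belong outside it.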

    I've read some of the code to track down what I think is happening: zmbackupldap calls zmslapcat, which seems to call OpenLDAP's standard slapcat command, whose man page says: "slapd(8) should not be running (at least, not in read-write mode) when you do this [slapcat] to ensure consistency of the database". I don't see any commands in zmbackupldap that shut down LDAP, though I could have overlooked something.

    /var/log/zimbra.log shows nothing out of the ordinary at the time of the incident until I initiated a shutdown, which produced many ZimbraLdapContext ServiceUnavailableException errors until I killed the slapcats and slapd was restarted.

    Questions:

    1. Does zmbackup stop Zimbra's LDAP for backups and is this relevant?

    2. Is there anywhere else that I should look?

  2. #2
    fultonj

    LDAP failure might be triggered by store server full backup

    I increased the LDAP log level on zldap0 to 16640, which the Zimbra wiki describes as "good for debug":

    LDAP - Zimbra :: Wiki

    Code:
    [zimbra@zldap0 ~]$ zmlocalconfig | grep ldap_log_level
    ldap_log_level = 49152
    [zimbra@zldap0 ~]$ zmlocalconfig -e ldap_log_level=16640
    [zimbra@zldap0 ~]$ zmlocalconfig | grep ldap_log_level
    ldap_log_level = 16640
    [zimbra@zldap0 ~]$
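    Both values are bitmasks of OpenLDAP debug levels. A short sketch that decodes them, using the level names from the loglevel table in OpenLDAP 2.4's slapd.conf(5) (older releases may name some bits differently; `decode_loglevel` is a hypothetical helper): 16640 is 256 + 16384 (stats and sync), and the previous 49152 is 16384 + 32768 (sync and none).

```sh
# decode_loglevel LEVEL
# Print the names of the OpenLDAP loglevel bits set in LEVEL, using the
# value:name pairs from the slapd.conf(5) loglevel table (OpenLDAP 2.4).
decode_loglevel() {
  local level=$1 out= pair bit
  for pair in 1:trace 2:packets 4:args 8:conns 16:BER 32:filter 64:config \
              128:ACL 256:stats 512:stats2 1024:shell 2048:parse \
              16384:sync 32768:none; do
    bit=${pair%%:*}
    # Keep the name if this bit is set in the requested level.
    if [ $(( level & bit )) -ne 0 ]; then
      out="$out ${pair#*:}"
    fi
  done
  echo "${out# }"
}
```

    For example, `decode_loglevel 16640` prints `stats sync`.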
    LDAP was restarted, which was the only service-affecting event. I then ran the backup script the same way cron runs it on weekends. It finished quickly and produced the expected LDIF file with no service interruption, but the logs dropped out for as long as the backup ran. It seems that logging stopped during the backup, because a log level that generated 93 lines in the 6 seconds before:

    Code:
    [zimbra@zldap0 ~]$ for x in 0 1 2 3 4 5 6; \
    do grep 05:28:0$x /var/log/zimbra.log | wc -l; done
    0
    3
    21
    17
    15
    7
    30
    [zimbra@zldap0 ~]$
    went silent for as long as the backup took:

    Code:
    Apr  1 05:28:06 zldap0 slapd[27922]: conn=332 fd=26 closed (connection lost)
    Apr  1 05:28:12 zldap0 slapd[27922]: conn=333 fd=26 ACCEPT from IP=139.147.11.131:55635 (IP=139.147.11.133:389)
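    The per-second loop above can be generalised to a whole minute, so that silent seconds show up as zeros instead of simply being absent. A sketch assuming the traditional syslog timestamp layout seen in these logs (`Apr  1 05:28:06 host ...`, so the time is the third whitespace-separated field); `count_per_second` is a hypothetical name.

```sh
# count_per_second LOGFILE HH:MM
# Print "HH:MM:SS COUNT" for all 60 seconds of the given minute, including
# seconds with no log lines. Assumes syslog-style "Mon DD HH:MM:SS" stamps.
count_per_second() {
  awk -v m="$2" '
    index($3, m ":") == 1 {
      # The seconds are the two characters after "HH:MM:".
      n[substr($3, length(m) + 2, 2) + 0]++
    }
    END {
      for (s = 0; s < 60; s++) printf "%s:%02d %d\n", m, s, n[s] + 0
    }' "$1"
}
```

    Running `count_per_second /var/log/zimbra.log 05:28` prints one line per second of that minute, and a run of zeros marks the gap. Lines from the same minute on different days are counted together, so trim the log to a single day first if it spans several.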
    Running by hand the slapcat command that ps showed during the original failure, i.e. the following:

    Code:
    /opt/zimbra/openldap/sbin/slapcat \
       -v -d 16640 \
       -f /opt/zimbra/conf/slapd.conf \
       -l /opt/zimbra/backup/sessions/fultonj_test_4_1_01/ldap.bak.1 \
       > /opt/zimbra/backup/sessions/fultonj_test_4_1_01/log.1
    produced nothing extra in /var/log/zimbra.log, plus a set of hex IDs in log.1 that map nearly one-to-one to the DNs in the LDIF file:

    Code:
    [zimbra@zldap0 fultonj_test_4_1_01]$ grep dn ldap.bak | wc -l
    7733
    [zimbra@zldap0 fultonj_test_4_1_01]$ wc -l log.1 
    7728 log.1
    [zimbra@zldap0 fultonj_test_4_1_01]$
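    `grep dn` counts every line containing the letters "dn" anywhere, including inside attribute values, so it can overcount entries. Anchoring the pattern to the start of the line counts only the lines that begin an LDIF entry (`dn:`, or `dn::` for base64-encoded DNs; folded continuation lines start with a space and are not matched). A sketch with a hypothetical function name that puts the two counts side by side:

```sh
# compare_counts LDIF LOG
# Print the number of LDIF entries (lines starting with "dn:") next to the
# number of lines in LOG, plus the difference between them.
compare_counts() {
  local entries lines
  entries=$(grep -c '^dn:' "$1")
  # Strip any padding some wc implementations add.
  lines=$(wc -l < "$2" | tr -d ' ')
  echo "entries=$entries log_lines=$lines difference=$((entries - lines))"
}
```

    For the test above, that would be `compare_counts ldap.bak.1 log.1`, using the file names from the slapcat command.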
    I'm not sure how to get more debugging data out of LDAP short of something extremely verbose like strace. I doubt it would be revealing anyway, since I can't seem to break the service by running a slapcat even while the server is up.

    My new conjecture is that the full backup run from the store server is what causes the problem: it leaves that store server broken until the slapcats are killed and LDAP is restarted on the LDAP server. I will run its full backup over the weekend and keep an eye on both it and the LDAP server. I will also remove the full backup from its crontab and run it by hand, since I'd rather choose when to bring the system down, during a scheduled maintenance window. I'll keep this log level on LDAP since I have enough disk space to hold the logs, though queries will be a little slower.

    I'll share my results on this page in hopes that they help someone else. Please post suggestions if you think I'm missing anything.

  3. #3
    quanah, Zimbra Employee
    Join Date: May 2007
    Location: Zimbra
    Posts: 1,185

    Quote Originally Posted by fultonj

    I've read some of the code to track down what I think is happening: zmbackupldap calls zmslapcat, which seems to call OpenLDAP's standard slapcat command, whose man page says: "slapd(8) should not be running (at least, not in read-write mode) when you do this [slapcat] to ensure consistency of the database". I don't see any commands in zmbackupldap that shut down LDAP, though I could have overlooked something.

    Questions:

    1. Does zmbackup stop Zimbra's LDAP for backups and is this relevant?
    It has been safe to run slapcat while slapd was running since around the OpenLDAP 2.2 release, IIRC. I think the problem is that you have more than one slapcat running at a time, and they are causing some sort of lock contention in the underlying database. Have you checked what db_stat is reporting when the server is locked up?

    Also, the ldap_log_level parameter is for the running slapd service, and does not have any effect on what slapcat will report. So I think it'd be fairly unlikely you would see anything logged, regardless of ldap_log_level setting.

    --Quanah
    Quanah Gibson-Mount
    Server Architect
    Zimbra, Inc
    --------------------
    Zimbra :: the leader in open source messaging and collaboration

  4. #4
    quanah

    Just to note, you can get information on how to use db_stat with Zimbra at:

    OpenLDAP & BDB perf wiki

    You'll want the output of db_stat -c, particularly for the lock information, but I'd read over that whole section to make sure the DB is properly tuned overall.
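    To compare db_stat -c snapshots taken before and during a backup, it helps to pull out just the lock counters that matter, such as lock conflicts and deadlocks. A sketch that reads db_stat -c output on stdin and prints the value for a given description prefix; the function name is hypothetical, and it assumes the "VALUE, whitespace, description" layout of the db_stat output pasted later in this thread (values containing a space, such as "1MB 368KB", are not handled).

```sh
# db_stat_value DESCRIPTION   (reads `db_stat -c` output on stdin)
# Print the value column of every line whose description starts with
# DESCRIPTION. Assumes single-word values such as "2437" or "159M".
db_stat_value() {
  awk -v d="$1" '{
    v = $1
    rest = $0
    sub(/^[^ \t]+[ \t]+/, "", rest)   # drop the value column
    if (index(rest, d) == 1) print v
  }'
}
```

    Usage, with the paths from this thread: `/opt/zimbra/sleepycat/bin/db_stat -c -h /opt/zimbra/openldap-data | db_stat_value 'Number of deadlocks'`.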

  5. #5
    fultonj

    Multiple slapcat contention

    Quote Originally Posted by quanah
    I think the problem is that you have more than one slapcat running at a time, and they are causing some sort of lock contention in the underlying database. Have you checked what db_stat is reporting when the server is locked up?
    Thanks for the suggestion. I have not been able to reproduce the lockup. While a full backup was running on the store server, I ran several full backups of the LDAP server and never saw a slapcat running, even with 'watch "ps axu | grep slap"', although an LDIF file was produced. The values returned by db_stat -c did not change during this time (output below). If it locks up again I will paste the output of the ps and db_stat commands here.

    Is there any reason that more than one slapcat would be running at the same time?

    Code:
    [zimbra@zldap0 sessions]$ /opt/zimbra/sleepycat/bin/db_stat -c -h /opt/zimbra/openldap-data 
    886     Last allocated locker ID.
    2147M   Current maximum unused locker ID.
    9       Number of lock modes.
    3000    Maximum number of locks possible.
    1500    Maximum number of lockers possible.
    1500    Maximum number of lock objects possible.
    19      Number of current locks.
    380     Maximum number of locks at any one time.
    91      Number of current lockers.
    106     Maximum number of lockers at any one time.
    19      Number of current lock objects.
    202     Maximum number of lock objects at any one time.
    159M    Total number of locks requested.
    159M    Total number of locks released.
    0       Total number of lock requests failing because DB_LOCK_NOWAIT was set.
    2437    Total number of locks not immediately available due to conflicts.
    0       Number of deadlocks.
    0       Lock timeout value.
    0       Number of locks that have timed out.
    0       Transaction timeout value.
    0       Number of transactions that have timed out.
    1MB 368KB       The size of the lock region..
    7055    The number of region locks granted after waiting.
    273M    The number of region locks granted without waiting.
    [zimbra@zldap0 sessions]$
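    `watch "ps axu | grep slap"` only samples every two seconds by default, so a slapcat that finishes quickly can be missed between refreshes, and the grep also matches its own command line. A sketch of a timestamped sampler with a shorter interval that matches the exact process name; the function names are hypothetical, and it assumes Linux, where each process's command name is in /proc/PID/comm (truncated to 15 characters, which is fine for "slapcat").

```sh
# count_procs NAME: number of processes whose command name is exactly NAME.
count_procs() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the 0.
  cat /proc/[0-9]*/comm 2>/dev/null | grep -cx -- "$1" || true
}

# sample_procs NAME [INTERVAL] [SAMPLES]: print a timestamped count per sample.
sample_procs() {
  local name=$1 interval=${2:-0.5} samples=${3:-120} i=0
  while [ "$i" -lt "$samples" ]; do
    printf '%s %s=%s\n' "$(date '+%H:%M:%S')" "$name" "$(count_procs "$name")"
    sleep "$interval"
    i=$((i + 1))
  done
}
```

    For example, `sample_procs slapcat 0.5 7200 > slapcat.log` would record a count every half second for about an hour, which can then be lined up against the db_stat snapshots.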

  6. #6
    quanah

    Hm, from your original post:

    LDAP then won't shut down until the slapcats are each manually killed.
    gives me the impression more than one slapcat is running?

    From your tests it sounds like slapcat exits very quickly, so I'm guessing you have a fairly small database. I see you are also running an older version of ZCS, which has two fewer OpenLDAP patches than the latest ZCS, though that may be unrelated.

    Is it possible a cronjob is starting two slapcats simultaneously on the server?

  7. #7
    fultonj

    Quote Originally Posted by quanah
    Hm, from your original post:
    Is it possible a cronjob is starting two slapcats simultaneously on the server?
    I have almost 5,000 users, so yes, it is relatively small. I doubt a second cron job started, since my zimbra crontab came directly from the install; I've pasted it below to be sure. None of the other users have crontabs.

    Code:
    [zimbra@zldap0 ~]$ crontab -l
    # ZIMBRASTART -- DO NOT EDIT ANYTHING BETWEEN THIS LINE AND ZIMBRAEND
    #
    # Log pruning
    #
    30 2 * * * find /opt/zimbra/log/ -type f -name \*.log\* -mtime +8 -exec rm {} \; > /dev/null 2>&1
    35 2 * * * find /opt/zimbra/log/ -type f -name \*.out.???????????? -mtime +8 -exec rm {} \; > /dev/null 2>&1
    #
    # Status logging
    #
    */2 * * * * /opt/zimbra/libexec/zmstatuslog
    */10 * * * * /opt/zimbra/libexec/zmdisklog
    #
    # Backups
    #
    # BACKUP BEGIN
    0 1 * * 6 /opt/zimbra/bin/zmbackup 
    0 0 * * * /opt/zimbra/bin/zmbackup -del 1m
    # BACKUP END
    #
    # crontab.ldap
    #
    #
    # ZIMBRAEND -- DO NOT EDIT ANYTHING BETWEEN THIS LINE AND ZIMBRASTART
    [zimbra@zldap0 ~]$
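    To check mechanically which backup jobs fire on a given day, the crontab can be filtered for zmbackup lines whose day-of-week field matches that day. A rough sketch with a hypothetical function name; it only understands `*` and comma-separated numbers in the day-of-week field (no ranges, steps, or day names) and ignores the day-of-month field.

```sh
# backup_jobs_on_day DOW   (reads crontab text on stdin; DOW is 0-6, 0 = Sunday)
# Print the zmbackup entries whose day-of-week field is "*" or lists DOW.
backup_jobs_on_day() {
  awk -v day="$1" '
    /^[ \t]*#/ || NF < 6 { next }   # skip comments and incomplete lines
    $0 !~ /zmbackup/ { next }       # only backup jobs
    {
      hit = ($5 == "*")
      n = split($5, days, ",")
      for (i = 1; i <= n; i++) if (days[i] == day) hit = 1
      if (hit) print
    }'
}
```

    For the crontab posted above, `crontab -l | backup_jobs_on_day 6` lists both the 01:00 full backup and the midnight `-del 1m` job, so on Saturdays the two run an hour apart.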
