
LDAP Cache Tuning in Messaging Server
Background
The topic of how to tune service.authcachesize and service.authcachettl surfaced recently. For various historical reasons, a customer had set the size far too large, so large that the behavior resembled a memory leak that took days or weeks to build up to a system-crippling level.
But this customer issue still leaves you wondering how to tune the cache parameters. The question is: How do you observe the effects of these cache settings, and how many different caches are in Messaging Server? Even if you limit the discussion to just caches of LDAP information, it is a very big topic.
Store auth cache
The auth cache controlled by the service.authcachesize and service.authcachettl configutil parameters is a per-process cache used to reduce LDAP requests for repeated logins by the same user. You can use the following dtrace script to monitor the usage of this cache in an individual imapd, popd, or mshttpd process on a 32-bit Solaris 10 Operating System (OS).
#!/usr/sbin/dtrace -qs

/*
 * Sample the auth cache counters each time digestcache_find is entered.
 * arg0 is treated as the address of the cache; the byte offsets below
 * are for the 32-bit build. Each aggregation is keyed by that address,
 * so the report has one row per cache.
 */
pid$target::digestcache_find:entry
{
        /* @cache_users[ustack(), arg0] = count(); */
        @cache_misses[arg0]   = max((int)*(int *)copyin(arg0+12,4));
        @cache_hits[arg0]     = max((int)*(int *)copyin(arg0+16,4));
        @cache_MinEnts[arg0]  = min((int)*(int *)copyin(arg0+20,4));
        @cache_MaxEnts[arg0]  = max((int)*(int *)copyin(arg0+20,4));
        @cache_expires[arg0]  = max((int)*(int *)copyin(arg0+24,4));
        @cache_overflow[arg0] = max((int)*(int *)copyin(arg0+28,4));
}

/* Stop after 20 seconds; the END clause then prints the report. */
profile:::tick-20s
{
        exit(0);
}

END
{
        /* printa("%k 0x%x %@d\n", @cache_users); */
        printf("\n cache %8s %8s %8s %8s %8s %8s\n", "misses", "hits", "minEnts", "maxEnts",
            "expires", "overflow");
        printa("0x%x %@8d %@8d %@8d %@8d %@8d %@8d\n", @cache_misses, @cache_hits, @cache_MinEnts,
            @cache_MaxEnts, @cache_expires, @cache_overflow);
}
As written above, the script waits 20 seconds, then prints the report and exits. You can change "tick-20s" to adjust the timeout, or press Ctrl+C to end the script early. If no activity occurs (that is, on an idle lab system), the script produces only the header line, because it samples the cache statistics structures only when the cache is used. So, if no use occurs, no results occur:
# ./authcache_stats.d -p 8043

cache      misses     hits  minEnts  maxEnts  expires overflow
#
If you log in to IMAP while the dtrace script is running, it might look like:
cache      misses     hits  minEnts  maxEnts  expires overflow
0x292058       11       20        3        3        8        0
0x2f3b70       13        8        1        1        5        0
If you uncomment the two references to @cache_users, the report will also show which routines use the cache and from that information you can more precisely determine which cache is which.
The misses, hits, expires, and overflows can only increase. But the number of entries can increase and decrease, so you record both the min and max values. Actually, if nEntries decreases, it is only for a brief time until the entry is reused, either because of an expire or an overflow. So, you could say that the number of entries will only increase, but you should be more interested in how much this number moves during the monitoring period.
The misses value is incremented both when you simply do not find what you were looking for in the cache and when you find the information but its time-to-live (TTL) has expired. In both cases, you have to go to LDAP to perform the lookup.
The hits value is straightforward: you found what you were looking for and used it.
The cache-hit rate (that is, hits/(hits+misses)) depends on the number of repeat logins during the TTL and on how logins are distributed across the processes for each service. In other words, fewer processes (for example, with 64-bit software) give a better cache-hit ratio.
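As a worked example of this ratio, the following Python sketch applies hits/(hits+misses) to the two rows of the sample report shown earlier. The function name `hit_rate` is ours and is not part of Messaging Server.

```python
def hit_rate(hits, misses):
    """Return hits / (hits + misses), or 0.0 when the cache saw no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


# (misses, hits) copied from the sample report above, keyed by cache address
samples = {"0x292058": (11, 20), "0x2f3b70": (13, 8)}

for cache, (misses, hits) in samples.items():
    print(f"{cache}: hit rate {hit_rate(hits, misses):.1%}")
```

On a lightly used lab system like this one the numbers mean little; the calculation is more useful on the production figures later in this section.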
The minEnts and maxEnts values are the lowest and highest values seen for the number of entries in the cache. These entries are not free. They might be under or over their TTL. You don't know that until you happen to look for the entry later. So this value only tells you that an item was put in the cache and is still there, not whether its TTL is still valid.
The expires value shows that you found what you were looking for in the cache, but the TTL had expired, so you had to perform the LDAP lookup again anyway.
The overflows value is the number of times you wanted to add an entry to the cache but the maximum number of entries had been reached, none were free, and the least frequently used entry needed to be freed.
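The way these counters relate to each other can be illustrated with a toy model in Python. This is only a sketch of the behavior described above, not the actual Messaging Server code: the class name, the method names, and the use of an ordered dictionary are our own, and for simplicity the sketch evicts the least recently used entry as a stand-in for the server's own reclaim policy.

```python
from collections import OrderedDict


class ToyAuthCache:
    """Toy model of the counters described above (not the real implementation)."""

    def __init__(self, max_entries, ttl):
        self.max_entries = max_entries
        self.ttl = ttl
        self.entries = OrderedDict()  # user -> time the entry was stored
        self.hits = self.misses = self.expires = self.overflow = 0

    def login(self, user, now):
        stored = self.entries.get(user)
        if stored is not None and now - stored < self.ttl:
            self.hits += 1
            self.entries.move_to_end(user)
            return "cache"
        # Not found, or found but past its TTL: both count as a miss.
        self.misses += 1
        if stored is not None:
            self.expires += 1  # the stale entry is refreshed in place
        elif len(self.entries) >= self.max_entries:
            self.overflow += 1
            self.entries.popitem(last=False)  # assumption: evict the oldest entry
        self.entries[user] = now
        self.entries.move_to_end(user)
        return "ldap"


cache = ToyAuthCache(max_entries=2, ttl=10)
for user, t in [("a", 0), ("a", 1), ("b", 2), ("c", 3), ("a", 4), ("c", 20)]:
    cache.login(user, t)
print(cache.hits, cache.misses, cache.expires, cache.overflow)
```

In this model every miss on a full cache is either an expire or an overflow, which is a useful sanity check when reading real statistics.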
The size of an entry depends on whether you are using 32-bit or 64-bit software. The main body of this cache is preallocated when the process starts. However, a nonfree entry in the first cache also contains an LDAP result structure. So the simplest, although not perfectly accurate, way to describe this value is to say 3 Kbytes per entry.
An overflows value is not a bad thing. It just means cache entries are being reclaimed from the least-frequently used list.
You want to balance memory use versus load on the LDAP server. To try to completely avoid the LDAP server, you would set the cache large enough to contain all the users on this system and the TTL to a very high value. Then you would never see any expires or overflows. Of course these settings could use a lot of memory and cause problems when you try to change passwords or user status. So you also need to balance the TTL against your need for updates in the LDAP service to be noticed.
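To put the memory side of this trade-off into numbers, here is a back-of-the-envelope estimate in Python. It uses the rough figure of 3 Kbytes per entry given above. The function name, the `caches_per_process` and `processes` parameters, and the one-million-entry setting are illustrative assumptions, not values taken from any customer configuration.

```python
BYTES_PER_ENTRY = 3 * 1024  # rough per-entry figure from the text, not exact


def auth_cache_bytes(entries, caches_per_process=1, processes=1):
    """Estimate memory used when every cache is completely full."""
    return entries * BYTES_PER_ENTRY * caches_per_process * processes


MIB = 1024 * 1024
# 10,000 is the default size; 1,000,000 is a hypothetical oversized setting.
for entries in (10_000, 1_000_000):
    per_cache = auth_cache_bytes(entries) / MIB
    print(f"{entries:>9} entries: about {per_cache:,.0f} MiB per full cache")
```

Because the cache is per process, the total scales with the number of imapd, popd, or mshttpd processes, and a very large setting can approach the address space of a 32-bit process once the cache fills.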
On the heavily loaded systems of the customer who had set service.authcachesize far too high, the default value of 10,000 was adequate. So the short version of this long story is that you don't need to tune this parameter. But it would be interesting to see some cache usage statistics from various customers if you want to paste yours here.
More recently, the following stats were captured from a customer with a completely different load profile. This imapd had been running about 20 hours:
cache      misses     hits  minEnts  maxEnts  expires overflow
0x255b70    60200    10960    10000    10000     8132    42068
0x1f4058    65096     6064    10000    10000     5350    49746
The hit rate (hits/(hits+misses)) is not good: about 15% and 8.5%, respectively. About 18 minutes later, the same process showed the following:
0x255b70    61957    11246    10000    10000     8406    43551
0x1f4058    66990     6213    10000    10000     5535    51455
The delta over those 18 minutes corresponds to a hit rate of about 14% and 7%:
cache      misses     hits  expires overflow
0x255b70     1757      286      274     1483
0x1f4058     1894      149      185     1709
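The interval figures can be reproduced from the two snapshots with a few lines of Python. This is a sketch using the numbers reported above; the names `FIELDS` and `interval_stats` are ours.

```python
FIELDS = ("misses", "hits", "minEnts", "maxEnts", "expires", "overflow")

# Two snapshots of the same imapd, taken about 18 minutes apart.
first = {
    "0x255b70": (60200, 10960, 10000, 10000, 8132, 42068),
    "0x1f4058": (65096, 6064, 10000, 10000, 5350, 49746),
}
later = {
    "0x255b70": (61957, 11246, 10000, 10000, 8406, 43551),
    "0x1f4058": (66990, 6213, 10000, 10000, 5535, 51455),
}


def interval_stats(before, after):
    """Return counter deltas and the hit rate for the interval between snapshots."""
    b = dict(zip(FIELDS, before))
    a = dict(zip(FIELDS, after))
    delta = {k: a[k] - b[k] for k in ("misses", "hits", "expires", "overflow")}
    lookups = delta["hits"] + delta["misses"]
    delta["hit_rate"] = delta["hits"] / lookups if lookups else 0.0
    return delta


for cache in first:
    print(cache, interval_stats(first[cache], later[cache]))
```

In both rows the misses for the interval equal expires plus overflow (274 + 1483 = 1757 and 185 + 1709 = 1894). That is what you would expect from a cache that is already full, where every miss either refreshes an expired entry or evicts another one.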
This customer was experiencing delays logging in to IMAP. A possible solution to this problem is to increase either the size or the TTL (or both), depending on how often the same users log in again within a relatively short time. But having eight imapd processes might defeat the auth cache. Examining the IMAP log files shows that, for example, one user logged in 11 times in seven minutes and the LDAP access log showed eight lookups to authenticate that user. During that time, only about 3500 different users were logging in. Why was that user looked up more than once? Because he was unlucky enough to hit all eight different imapd processes, in each of which his entry, if it existed, had expired.
So another consideration for tuning the auth cache is the number of processes. Setting numprocesses too high defeats the auth cache.
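A rough model shows why. Assume, purely for illustration, that each login lands on one of N processes uniformly at random (the real distribution of connections to processes may differ). The expected number of distinct processes touched by k logins is then N(1 - (1 - 1/N)^k), and the first login to reach each process is a guaranteed miss in that process's cache. The function name below is ours.

```python
def expected_processes_hit(num_processes, logins):
    """Expected number of distinct processes reached by `logins` logins,
    assuming each login is assigned to a process uniformly at random."""
    return num_processes * (1 - (1 - 1 / num_processes) ** logins)


# 11 logins by the same user, as in the customer example above
for n in (1, 2, 4, 8):
    print(f"{n} processes: about {expected_processes_hit(n, 11):.1f} forced LDAP lookups")
```

With eight processes, the model predicts about six unavoidable lookups for 11 logins, before any TTL expiry or overflow is taken into account. That is of the same order as the eight lookups seen in the LDAP access log.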
MTA LDAP Caches
- MTA: dispatcher.cnf LDAP domain, users (groups?) – Positive and negative
- Domain map – In MTA, store access daemons, and MMP
Cache information can be collected on a per-tcp_smtp_server process basis by running the xsta command, for example:
bash-2.05$ telnet server 25
Trying 129.158.87.191...
Connected to server.aus.sun.com.
Escape character is '^]'.
220 server.aus.sun.com -- Server ESMTP (Sun Java(tm) System Messaging Server 6.3-6.03 (built Mar 14 2008; 32bit))
xsta
<snip>
250-2.3.0 Alias cache statistics:
250-2.3.0 Hits 0
250-2.3.0 Misses 1
250-2.3.0 Adds 1
250-2.3.0 Deletes 0
250-2.3.0 Timeouts 0
250-2.3.0 Entries 1
250-2.3.0 Percent used 0.100100
250-2.3.0 Percent chains used 0.390625
250-2.3.0 Ave chain length 1.000000
250-2.3.0 Max chain length 1
250-2.3.0
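If you collect this output regularly, a small script can turn the statistics lines into numbers. The following Python sketch parses the alias cache lines from the session above. It assumes the response format is exactly as shown in that sample; the function name and the regular expression are ours, and other server versions may format the response differently.

```python
import re

# Statistics lines copied from the sample session above.
SAMPLE = """\
250-2.3.0 Alias cache statistics:
250-2.3.0 Hits 0
250-2.3.0 Misses 1
250-2.3.0 Adds 1
250-2.3.0 Deletes 0
250-2.3.0 Timeouts 0
250-2.3.0 Entries 1
250-2.3.0 Percent used 0.100100
250-2.3.0 Percent chains used 0.390625
250-2.3.0 Ave chain length 1.000000
250-2.3.0 Max chain length 1
250-2.3.0
"""

# A status prefix, a label, and a trailing number; heading lines do not match.
STAT_LINE = re.compile(r"250[- ]2\.3\.0\s+(.+?)\s+(\d+(?:\.\d+)?)$")


def parse_cache_stats(text):
    """Return {label: value} for every 'label number' statistics line."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


stats = parse_cache_stats(SAMPLE)
lookups = stats["Hits"] + stats["Misses"]
print(stats)
print("hit rate:", stats["Hits"] / lookups if lookups else 0.0)
```

Remember that the figures are per tcp_smtp_server process, so each process has to be queried separately.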
MMP auth cache
The Messaging Server 6.3 and 7.0 Messaging Multiplexor use the High performance User Lookup and Authentication (HULA) libraries for user authentication. The following dtrace script can be used on 32-bit Solaris 10 OS installations to determine various statistics about the cache performance.
<mmpcache_stats.d>
#!/usr/sbin/dtrace -qs

/*
 * On entry to user_incache, read the cache address from arg0+24 and the
 * current entry count from cache+12. Offsets are for the 32-bit build.
 */
pid$target::user_incache:entry
{
        self->hula_cache = (int)*(int *)copyin(arg0+24,4);
        this->hent = (int)*(int *)copyin((self->hula_cache)+12,4);
        @cache_minEnt[self->hula_cache] = min(this->hent);
        @cache_maxEnt[self->hula_cache] = max(this->hent);
        @cache_avgEnt[self->hula_cache] = avg(this->hent);
}

/* A zero return value is counted as a miss. */
pid$target::user_incache:return
/arg1 == 0/
{
        @cache_miss[self->hula_cache] = count();
}

/* Any nonzero return value is counted as a hit. */
pid$target::user_incache:return
/arg1 != 0/
{
        @cache_hit[self->hula_cache] = count();
}

/* Print the cumulative statistics every 20 seconds. */
profile:::tick-20s
{
        printf (" cache %8s %8s %8s %8s %8s\n", "minimum", "maximum", "average", "hits", "misses");
        printa ("0x%x %@8d %@8d %@8d %@8d %@8d\n", @cache_minEnt, @cache_maxEnt, @cache_avgEnt, @cache_hit, @cache_miss);
        printf ("\n");
}
</mmpcache_stats.d>
To run this script, provide the PID of the AService process to the script, for example:
./mmpcache_stats.d -p 26741
cache      minimum  maximum  average     hits   misses
0x839a24c        0        0        0        0        3
0x83968bc        4        4        4        3        0
Statistics are printed once every 20 seconds until you press Ctrl+C. The reporting interval is controlled by the "profile:::tick-20s" line.
The previous example shows two cache-line addresses. One is for the IMAP service and the other is for the POP service. The "hits" and "misses" fields are cumulative from the start of the script. A higher hit-to-miss ratio is preferred and is determined by the pattern of traffic, the size of the cache, and the TTL of the cache entries.
The "minimum", "maximum" and "average" fields describe the number of entries in the cache since the startup of the script.
The HULA cache mechanism operates differently from the store auth cache. When a cache lookup is performed, existing expired cache entries in the same hash "bucket" are removed, which helps to minimize the overall number of entries in the cache. Therefore, if you find that the number of entries is approaching the maximum cache size, increase the size to ensure a high hit-to-miss ratio.
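The purge-on-lookup behavior can be pictured with a small Python sketch. This is a toy model of the idea only, not the HULA implementation: the class, its methods, and the use of Python's built-in hash to choose a bucket are all our assumptions.

```python
class BucketCache:
    """Toy hash-bucket cache that drops expired entries during lookups."""

    def __init__(self, num_buckets, ttl):
        self.ttl = ttl
        self.buckets = [dict() for _ in range(num_buckets)]  # key -> time stored

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def add(self, key, now):
        self._bucket(key)[key] = now

    def lookup(self, key, now):
        bucket = self._bucket(key)
        # Remove every expired entry in this bucket, not only the one requested.
        for stale in [k for k, stored in bucket.items() if now - stored >= self.ttl]:
            del bucket[stale]
        return key in bucket

    def size(self):
        return sum(len(b) for b in self.buckets)


# A single bucket keeps the example deterministic.
cache = BucketCache(num_buckets=1, ttl=10)
cache.add("alice", now=0)
cache.add("bob", now=5)
print(cache.lookup("bob", now=12), cache.size())
```

Looking up "bob" at time 12 also removes the expired "alice" entry from the same bucket, so in a model like this the entry count tracks the users active within the TTL rather than growing steadily to the maximum.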
UWC LDAP cache
To be added.
Convergence LDAP cache
To be added.

