Directory Server Monitoring

Directory Server installations differ, so treat the following table as the minimum set of data points to monitor. Where values appear in the table, they are offered as guidance and are not necessarily best practice.

data point | sample rate | minimum useful data retention periods | nominal minimum value | nominal maximum value | actionable threshold | data retention period | comparison to baseline values
CPU utilization | 10m
Memory pressure | 10m
swap space | 10
disk space | 3m
I/O pressure | 3m
concurrent connections | 3m
queued connections | 5m
request queue backlog | 5m
entry cache hit ratio | 24h
database cache performance | 24h
response times | 5m
operations initiated vs completed | 24h
replication latency | 30m

System Resource Monitoring

CPU

CPU Pressure

CPU pressure occurs for a variety of reasons and is measured in different ways. One measure is CPU utilization, reported as a percentage of time spent in user space and in system space, as shown below:

$ sar -u 1 10
17:48:22  %usr   %sys   %idle
17:48:23    3     11     86
17:48:24    1      3     96
17:48:25    4      1     95
17:48:26    3      0     97

where %usr is user space utilization and %sys is system space utilization.
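As an illustration only, the sar output above could be reduced to a single "busy" percentage per sample by adding %usr and %sys. The following is a minimal sketch; the function name parse_sar_cpu is hypothetical, and it assumes the output has the shape shown above (a header line containing %usr, followed by one timestamped line per sample).

```python
def parse_sar_cpu(text):
    """Parse `sar -u` style output into one dict per sample line."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "%usr" in fields:
            # Header line: the first field is a timestamp, the rest are column names.
            columns = fields[1:]
            continue
        # Keep only timestamped lines that have one value per column.
        if columns is None or ":" not in fields[0] or len(fields) != len(columns) + 1:
            continue
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue
        row = dict(zip(columns, values))
        row["time"] = fields[0]
        rows.append(row)
    return rows


sample = """\
17:48:22  %usr   %sys   %idle
17:48:23    3     11     86
17:48:24    1      3     96
17:48:25    4      1     95
17:48:26    3      0     97
"""

rows = parse_sar_cpu(sample)
# Busy time is taken here as user space plus system space utilization.
busy = [row["%usr"] + row["%sys"] for row in rows]
for row, pct in zip(rows, busy):
    print(row["time"], pct)
```

Columns are looked up by name rather than by position, so the sketch should tolerate sar variants that print additional columns.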

CPU pressure can also manifest as a queue of processes that are ready to run but are waiting for a CPU. This is shown in the leftmost column (r) of vmstat output on a Solaris system, the run queue. In most cases, consistently non-zero values indicate a system under CPU pressure, and possibly on the road to CPU starvation. One could almost say that any system exhibiting consistently non-zero run queues is CPU starved relative to the load placed on the system.

A system that is not exhibiting CPU pressure:

solaris-devx / # vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd -- --   in   sy   cs us sy id
 0 0 0 235776 80648   1   6  1  0  0  0  1  0  1  0  0  263  220  324  1  5 95
 0 0 0 208744 54368   0  22 54  0  0  0  0  0  0  0  0  305  419  401  1  6 93
 0 0 0 208736 54388   0   0  0  0  0  0  0  0  0  0  0  270  282  338  1  6 93
 0 0 0 208736 54396   0   0  0  0  0  0  0  0  0  0  0  250  242  306  0  4 96
 0 0 0 208736 54396   0   0  0  0  0  0  0  0  0  0  0  254  198  309  0  4 96
 0 0 0 208736 54396   0   0  0  0  0  0  0  0  0  0  0  259  276  335  1  5 94
 0 0 0 208736 54400   0   0  0  0  0  0  0  0  0  0  0  279  205  340  0  5 95

And then that same system comes under severe CPU pressure seconds later:

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd -- --   in   sy   cs us sy id
 0 0 0 203200 47572   0   0  0  0  0  0  0  0  0  0  0  284  293  361  1  5 94
 0 0 0 203200 47688   0   0  0  0  0  0  0  0  0  0  0  262  240  323  0  5 95
 0 0 0 203200 47720   0   0  0  0  0  0  0  0  0  0  0  279  322  361  1  5 94
 0 0 0 164092 34208 619 3411 0  0  0  0  0  0  0  0  0  296 12805 1733 27 64 9
 32 0 0 170660 38920  2 1297 0  0  0  0  0  0  0  0  0  289 17470 3543 33 67 0
 2 0 0 156364 27356   1 2914 0  0  0  0  0  7  0  0  0  375 18106 2519 27 73 0
 110 0 0 145904 19976 6 2712 0  0  0  0  0  0  0  0  0  266 16687 2926 33 67 0
 9 0 0 135384 11296   0 1855 4  0 36  0 14715 1 0 0  0  272 15540 2210 24 76 0
 120 0 0 144496 19824 1 1811 12 0  0  0  0  2  0  0  0  291 18754 2828 36 64 0
 108 0 0 137624 14340 0 897  0  0 28  0 10136 0 0 0  0  268 17041 2679 22 78 0
 301 0 0 141616 19704 0 2064 0  0  0  0  0  0 24  0  0  396 20956 2741 26 74 0
 90 0 0 155932 32980  0 1341 0  0  0  0  0  0 381 0  0  425 22025 3937 25 75 0
 2 0 0 150372 28428   0 1600 0  0  0  0  0  0  0  0  0  279 21120 3810 29 71 0
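To judge whether the run queue is "consistently" non-zero rather than momentarily so, the leftmost column can be collected over a window of samples. The sketch below assumes vmstat output shaped like the listings above; the function names are hypothetical, and what fraction counts as "consistent" is left to the reader.

```python
def run_queue_samples(vmstat_text):
    """Return the run queue ('r', leftmost column) from each vmstat data line."""
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Data lines consist only of integers; the two header lines do not.
        if fields and all(f.isdigit() for f in fields):
            samples.append(int(fields[0]))
    return samples


def fraction_nonzero(samples):
    """Fraction of samples in which at least one process was waiting to run."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s > 0) / len(samples)


# A few lines taken from the listing above.
sample = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd -- --   in   sy   cs us sy id
 0 0 0 203200 47720   0   0  0  0  0  0  0  0  0  0  0  279  322  361  1  5 94
 0 0 0 164092 34208 619 3411 0  0  0  0  0  0  0  0  0  296 12805 1733 27 64 9
 32 0 0 170660 38920  2 1297 0  0  0  0  0  0  0  0  0  289 17470 3543 33 67 0
 2 0 0 156364 27356   1 2914 0  0  0  0  0  7  0  0  0  375 18106 2519 27 73 0
 110 0 0 145904 19976 6 2712 0  0  0  0  0  0  0  0  0  266 16687 2926 33 67 0
"""

samples = run_queue_samples(sample)
print(samples, fraction_nonzero(samples))
```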

The following table is from a suggestion by Adrian Cockcroft regarding CPU pressure and the run queue:

run queue rule | level | action
0 | White | Idle
0 < runQueuePerCPU < 3 | Green | No problem
3 <= runQueuePerCPU < 5 | Orange | Busy (warning)
5 <= runQueuePerCPU | Red | possible CPU starvation condition (alert)

In this table, runQueuePerCPU is (1st column of vmstat output)/(number of CPUs).
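The rule table can be expressed as a small function. This is a sketch rather than a definitive implementation: the function name is hypothetical, and the open-ended top band labeled Red follows the correction proposed in the comments on this page, so treat that band as an assumption.

```python
def classify_run_queue(run_queue, cpu_count):
    """Map a vmstat run-queue value to a (level, action) pair.

    run_queue is the first column of vmstat output and cpu_count is the
    number of CPUs, so the rule is applied to run_queue / cpu_count.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    if run_queue < 0:
        raise ValueError("run_queue cannot be negative")
    per_cpu = run_queue / cpu_count
    if per_cpu == 0:
        return ("White", "Idle")
    if per_cpu < 3:
        return ("Green", "No problem")
    if per_cpu < 5:
        return ("Orange", "Busy (warning)")
    # Assumption: 5 and above is Red, per the correction in the page comments.
    return ("Red", "possible CPU starvation condition (alert)")


# Run-queue values taken from the vmstat listing above, on a one-CPU system.
for r in (0, 2, 9, 32):
    print(r, classify_run_queue(r, cpu_count=1))
```

Dividing by the CPU count matters: a run queue of 6 would be an alert on a single-CPU system but only a warning on a two-CPU system.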

Processor Online

Check that processors are online with psrinfo:

solaris-devx / # psrinfo
0       on-line   since 09/20/2007 10:35:34
solaris-devx / # 
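A monitoring script might flag any processor whose state is not on-line. The sketch below assumes the default psrinfo output format shown above (processor id, state, then the time of the last state change). The function name is hypothetical, and the second line of the sample is invented for illustration.

```python
def offline_processors(psrinfo_text):
    """Return (processor_id, state) for each processor not reported as on-line."""
    problems = []
    for line in psrinfo_text.splitlines():
        fields = line.split()
        # Expect lines such as: "0  on-line  since 09/20/2007 10:35:34"
        if len(fields) >= 2 and fields[0].isdigit():
            if fields[1] != "on-line":
                problems.append((int(fields[0]), fields[1]))
    return problems


# The first line is from the listing above; the second is hypothetical.
sample = """\
0       on-line   since 09/20/2007 10:35:34
1       off-line  since 09/20/2007 10:35:40
"""

problems = offline_processors(sample)
print(problems)
```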

Memory Pressure

Memory availability is difficult to measure precisely. On Solaris, the amount of memory on the free list is easy to obtain from System Activity Reporting (sar) and from vmstat. The condition to avoid at all costs is paging. Swapping is even worse, but paging is bad enough.

The 12th column of vmstat output on Solaris indicates the level of paging activity. The column is labeled "sr", meaning "scan rate". If the number in this column is non-zero, the pager daemon is attempting to relieve memory pressure by paging pieces (pages, actually) of processes to backing store to satisfy requests from other processes. If a system is consistently paging, that system is being subjected to "memory pressure" and performance of processes on the system will be affected. Paging should be traced to root cause and corrected immediately.

A system that is not paging at the moment, although there has been some history of paging:

solaris-devx / # vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd -- --   in   sy   cs us sy id
 0 0 0 235696 80608   2   8  1  0  0  0  2  0  1  0  0  263  228  325  1  5 95
 0 0 0 267604 105760  6  25 12  0  0  0  0  0  0  0  0  273  247  350  0  5 95
 0 0 0 267604 105864  0   0  0  0  0  0  0  0  0  0  0  264  260  341  1  5 94
 0 0 0 267604 106052  0   0  0  0  0  0  0  0  0  0  0  261  221  319  0  5 95
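The scan-rate check can be automated along the following lines. This sketch locates the sr column by name from the vmstat header and, because the first line of vmstat output reports averages since boot rather than current activity, ignores the first sample when deciding whether the system is paging now. The function names are hypothetical.

```python
def scan_rates(vmstat_text):
    """Return the 'sr' (scan rate) value from each vmstat data line."""
    header = None
    rates = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "sr" in fields and "swap" in fields:
            header = fields  # the column-name line
            continue
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rates.append(int(fields[header.index("sr")]))
    return rates


def is_paging_now(rates):
    """True if any sample after the first (since-boot average) is non-zero."""
    return any(rate > 0 for rate in rates[1:])


sample = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd -- --   in   sy   cs us sy id
 0 0 0 235696 80608   2   8  1  0  0  0  2  0  1  0  0  263  228  325  1  5 95
 0 0 0 267604 105760  6  25 12  0  0  0  0  0  0  0  0  273  247  350  0  5 95
 0 0 0 267604 105864  0   0  0  0  0  0  0  0  0  0  0  264  260  341  1  5 94
 0 0 0 267604 106052  0   0  0  0  0  0  0  0  0  0  0  261  221  319  0  5 95
"""

rates = scan_rates(sample)
print(rates, is_paging_now(rates))
```

In the sample, the non-zero value in the first line reflects the history of paging since boot, while the later samples show no current scanning.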

Directory Server

readWaiters

The readWaiters attribute is maintained by Directory Server and indicates the number of connections that are pending but are not currently assigned to, or being serviced by, a thread. If this number is non-zero, Directory Server is unable to assign a thread to service a connection.

request-que-backlog

The request-que-backlog attribute is maintained by Directory Server and indicates the number of requests that are waiting to be processed by a thread. This number should be zero, or nearly zero. If it is consistently non-zero, requests are being held in a queue and LDAP clients will not receive a response until their queued requests are processed. To correct this, load-balance LDAP clients and/or increase the number of threads available to Directory Server.
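Once these attributes have been retrieved from the server (for example with an LDAP search whose result is printed as LDIF), a script can extract the counters and flag any that are non-zero. The sketch below is an assumption-laden illustration: the function name is hypothetical, the sample LDIF text is invented, and how and from which entry the attributes are read is left to the deployment.

```python
def parse_monitor_counters(ldif_text, names=("readWaiters", "request-que-backlog")):
    """Extract integer counters for the named attributes from LDIF-style text."""
    # LDAP attribute names are case-insensitive, so compare in lower case.
    wanted = {name.lower(): name for name in names}
    counters = {}
    for line in ldif_text.splitlines():
        attr, sep, value = line.partition(":")
        key = attr.strip().lower()
        if sep and key in wanted:
            try:
                counters[wanted[key]] = int(value.strip())
            except ValueError:
                pass  # ignore values that are not plain integers
    return counters


# Hypothetical search result, for illustration only.
sample = """\
dn: cn=monitor
readwaiters: 0
request-que-backlog: 3
"""

counters = parse_monitor_counters(sample)
backlogged = sorted(name for name, value in counters.items() if value > 0)
print(counters, backlogged)
```

A single non-zero reading is not necessarily a problem; as the text notes, it is a consistently non-zero value across samples that calls for action.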

Hit Ratio(s) in the Entry Cache(s)

Database Cache Efficiency

Response Time

Log Analyzer Tool

Replication

Replication Delay

Replication Conflict

SNMP

Links

Contributors

User | Edits | Comments | Labels
ff1959 | 30 | 0 | 11
bora_baysal | 0 | 0 | 1
chad.klunck | 0 | 0 | 1
IgorMinar | 0 | 1 | 0

  1. Dec 05, 2007

    IgorMinar says:

    Comment from Chad Klunck:

    The final line of the table, which reads:

    5 <= runQueuePerCPU < 5 | Orange | possible CPU starvation condition (alert)

    should probably read:

    5 <= runQueuePerCPU | Red | possible CPU starvation condition (alert)


The individuals who post here are part of the extended Sun Microsystems community and they might not be employed or in any way formally affiliated with Sun Microsystems. The opinions expressed here are their own, are not necessarily reviewed in advance by anyone but the individual authors, and neither Sun nor any other party necessarily agrees with them.

Copyright 1994-2009 Sun Microsystems, Inc.