Derived Values and Deletion Rules

Searching Sun Grid Engine 6.2

Sun Grid Engine Information Center
Accounting and Reporting Console (ARCo)
Index


Derived Values and Deletion Rules

Derived Values

At dbwriter startup, and in continuous mode once an hour, derived values are calculated. You can configure which values to calculate in an XML file, which is by default in $SGE_ROOT/dbwriter/database/<database_type>/dbwriter.xml. <database_type> defines the type of database being used; currently, Oracle, PostgreSQL and MySQL are supported. The path to the configuration file is passed to dbwriter during installation and is stored in the dbwriter.conf file as the value of the parameter DBWRITER_CALCULATION_FILE.

The configuration file uses an XML format, and contains rules for both derived values and deleted values (described in the next section).

Derived Values Format

The rules for derived values have the following format.

  1. The top-level start tag is <derive>. The <derive> tag has three required attributes:
    • object – Based on this attribute, the derived value is ultimately stored in one of: sge_host_values, sge_queue_values, sge_user_values, sge_group_values, sge_department_values, sge_project_values. The object is one of the following:
      • host
      • queue
      • project
      • department
      • user
      • group
    • interval – The time range specifying how often to calculate the derived values. The time range is one of the following:
      • day
      • hour
      • month
      • year
    • variable – This is the name of the variable to hold the calculated data.
  2. A second-level start tag describes the way that the value should be derived. This tag must be either <sql> or <auto>:
    • <sql> – This tag contains an SQL statement used for calculating the derived values. The exact syntax of the entries depends upon the type of database being used. The statement must produce the following columns:
      • time_start – Together with time_end, specifies the time period for the calculated value.
      • time_end
      • value – The calculated derived value.

        The SQL statement can contain the following placeholders. dbwriter replaces the placeholders for each query, based on a rule:
        • time_start – Start time for the query. dbwriter searches for the last previously calculated derived value from this rule, and uses this timestamp as the start time for the next query.
        • time_end – End time for the query. This timestamp specifies the end of the last passed time range.
          For example, if the time range is day, and if derived values are calculated at 00:30, 00:00 is taken as time_end.
        • _key_0, key_1, . . . , key_n_ – Components of the primary key for the specified object type.
          For example, the sge_hosts table has the primary h_hostname. If a rule is processed for the host object type, one query is executed per entry in the sge_hosts table, the _key_0_ placeholder in the SQL statement is replaced by the hostname.

          The sge_queue table has a composed primary key that is made up of q_qname and q_hostname.
    • <auto> – For certain simple derived values, this tag can be used instead of a full SQL query.This tag has two attributes:
      • function – which gives the aggregate function to apply to the variable. This can be any function valid for the type of database being used. Some typical functions are AVG, SUM, VALUE, COUNT, MIN or MAX.
      • variable – which can be any variable tracked in the following tables: sge_host_values, sge_queue_values, sge_user_values, sge_group_values, sge_department_values, sge_project_values the variable specified must be from the table indicated by the object attribute of the enclosing <derive> tag, for example, if the object is host, the variable must be found in sge_host_values.
  3. Two end tags that match the two start tags.

Derived Values Examples

Here is an example of a derivation rule using the <sql> tag. The sge_queue table has a composed primary key comprised of q_qname and q_hostname. For a rule specified for the queue object_type, a query will be made for each entry in the sge_queue table, the placeholders _key_0 will be replaced by the queue name and key_1_ will be replaced by the hostname.

<!--average queue utilization per hour-->
 <derive object="queue" interval="hour" variable="h_utilized">
    <sql>
         SELECT DATE_TRUNC( 'hour', qv_time_start) AS time_start, 
                DATE_TRUNC( 'hour', qv_time_start) + INTERVAL '1 hour' AS time_end, 
                AVG(qv_dvalue * 100 / qv_dconfig) AS value  
         FROM sge_queue_values 
         WHERE qv_variable = 'slots' AND 
               qv_parent = (SELECT q_id FROM sge_queue WHERE q_qname = __key_0__ AND q_hostname = __key_1__) AND 
               qv_time_start &lt;= '__time_end__' AND 
               qv_time_end &gt; '__time_start__' 
         GROUP BY time_start    
    </sql>
  </derive>

Here is an example when the rule above is processed by the dbwriter. A query will be made for each entry in the sge_queue table, the placeholders _key_0 will be replaced by the queue name and key_1_ will be replaced by the hostname. In this example, the results of these queries will be inserted in the sge_queue_values table, because object="queue".

SELECT DATE_TRUNC('hour', qv_time_start) AS time_start, 
       DATE_TRUNC( 'hour', qv_time_start) + INTERVAL '1 hour' AS time_end, 
       AVG(qv_dvalue * 100 / qv_dconfig) AS value  
FROM sge_queue_values 
WHERE qv_variable = 'slots' AND 
      qv_parent = (SELECT q_id FROM sge_queue WHERE q_qname = 'all.q' AND q_hostname = 'my.hostname') AND 
      qv_time_start <= '2008-05-21 00:00:00.0' AND 
      qv_time_end > '1970-01-01 01:00:00.0' 
GROUP BY time_start;    

Here is an example of a derivation rule using the <auto> tag.

<derive object="host" interval="day" variable="d_load">
   <auto function="AVG" variable="h_load" />
</derive>

Deleting Outdated Records

At dbwriter startup, and in continuous mode once an hour, outdated records will be deleted. You can configure how these records are calculated in an XML file, by default in $SGE_ROOT/dbwriter/database/<database_type>/dbwriter.xml. <database_type> is the type of database being used; currently, Oracle, PostgreSQL and MySQL are supported. The path to the configuration file is passed to dbwriter during installation and is stored in the dbwriter.conf file as the value of the parameter DBWRITER_CALCULATION_FILE.

Deletion Rules Format

The configuration file contains rules for both derived values and deleted values. Deletion rules are of the following format.

  • A top-level start tag <delete> with three attributes:
    • scope — which specifies the type of data to be deleted. Valid entries are:
      • job
      • job_log
      • share_log
      • host_values
      • queue_values
      • project_values
      • department_values
      • user_values
      • group_values
      • ar
        Based on this attribute, the values are deleted from the table with the same name with sge_ prepended.
    • time_range — which gives the unit of time_amount.
    • time_amount — which is the number of units (time_range) during which a record is kept.
  • An optional second-level start tag <sub_scope>, which specifies an additional condition for deletion. A subscope can be configured for all *_values scopes and the share_log scope.
  • One or two end tags matching the two start tags

For certain scopes, a sub-scope can be configured. The sub-scope specifies an additional condition for deletion. A sub-scope can be configured for all *_values scopes and for the share_log scope. The following rules apply:

  • If a sub-scope is configured for a *_values rule, it contains a space-separated list of variables to delete.
  • If a sub-scope is specified for the share_log, it contains a space-separated list of share-tree nodes to delete.
  • If sub-scope are used, you should always have a fall-back rule without sub-scope, which will delete all objects that are not explicitly named by the sub-scope.

Here is an example of a delete tag:

<?xml version="1.0" encoding="UTF-8"?>
<DbWriterConfig>
  <!-- keep host values for 2 years -->
  <delete scope="host_values" time_range="year" time_amount="2"/>

  <!-- keep queue values one month -->
  <delete scope="queue_values" time_range="month" time_amount="1">
    <sub_scope>slots</sub_scope>
    <sub_scope>state</sub_scope>
  </delete>
</DbWriterConfig>

Deletion Rules Examples

The following rule indicates that the four variables given in the subscope should be deleted from the table sge_host_values after 7 days.

<delete scope="host_values" time_range="day" time_amount="7">
        <sub_scope>np_load_avg</sub_scope>
        <sub_scope>cpu<sub_scope>
        <sub_scope>mem_free</sub_scope>
        <sub_scope>virtual_free</sub_scope>
</delete>

The following rule says to delete all variables from the table sge_host_values after two years:

<delete scope="host_values" time_range="year" time_amount="2"/>

The following rule says to delete all records for user fred after one month:

<delete scope="share_log" time_range="month" time_amount="1">
        <sub_scope>fred</sub_scope>
</delete>

Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.
  1. Nov 13

    weicfa says:

    Hi In our grid environment (about 500 servers , 3000 cpus , 12000 job every da...

    Hi

    In our grid environment (about 500 servers , 3000 cpus , 12000 job every day ), The size of host_values table (mysql database for ARCO )grow up very fast( 350 million row every day ) even I change deletion rule to 1 minute as follow.
    <delete scope="host_values" time_range="minute" time_amount="1">
    <sub_scope>np_load_avg</sub_scope>
    <sub_scope>cpu</sub_scope>
    <sub_scope>h_cpu</sub_scope>
    <sub_scope>h_load</sub_scope>
    <sub_scope>d_load</sub_scope>
    <sub_scope>h_jobs_finished</sub_scope>
    <sub_scope>h_job_finished</sub_scope>
    <sub_scope>mem_free</sub_scope>
    <sub_scope>mem_used</sub_scope>
    <sub_scope>virtual_free</sub_scope>
    </delete>
    I need to delete the obsolete data in host_values table manually , otherwise it will become a monster(over 100G for 2 months) and query speed will become very slow.
    The speed of deletion can not catch up the speed of data insert into host_values table. May be the root cause
    was the limitation of 500 line per deletion action(http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=107536).
    It's really bother gridengine user if we need to use accounting function .

    If there are detail descriptions document about how to disable data insert into database in host_values table or other tables. It will be very useful when user do not need those accounting data.

Sign up or Log in to add a comment or watch this page.


The individuals who post here are part of the extended Sun Microsystems community and they might not be employed or in any way formally affiliated with Sun Microsystems. The opinions expressed here are their own, are not necessarily reviewed in advance by anyone but the individual authors, and neither Sun nor any other party necessarily agrees with them.

Copyright 1994-2009 Sun Microsystems, Inc.
Powered by Atlassian Confluence
Sun Guidelines on Public Discourse Privacy Policy Terms of Use Trademarks Site Map Employment Investor Relations Contact