How Resources Are Matched to Requests

A Banking Analogy

As an analogy, imagine a large "money-center" bank in one of the world's capital cities. In the bank's lobby are dozens of customers waiting to be served. Each customer has different requirements. One customer wants to withdraw a small amount of money from his account. Arriving just after him is another customer, who has an appointment with one of the bank's investment specialists. She wants advice before she undertakes a complicated venture. Another customer in front of the first two customers wants to apply for a large loan, as do the eight customers in front of her.

Different customers with different needs require different types of service and different levels of service from the bank. Perhaps the bank on this particular day has many employees who can handle the one customer's simple withdrawal of money from his account. But at the same time the bank has only one or two loan officers available to help the many loan applicants. On another day, the situation might be reversed.

The effect is that customers must wait for service unnecessarily. Many of the customers could receive faster service if only their needs were immediately recognized and then matched to available resources.

If the Grid Engine system were the bank manager, the service would be organized differently:

  • On entering the bank lobby, customers would be asked to declare their name, their affiliations, and their service needs.
  • Each customer's time of arrival would be recorded.
  • Based on the information that the customers provided in the lobby, the bank would serve customers in the following order:
    1. Customers whose needs match suitable and immediately available resources
    2. Customers whose requirements have the highest priority
    3. Customers who were waiting in the lobby for the longest time
  • In a "Grid Engine system bank," one bank employee might be able to help several customers at the same time. The Grid Engine software would try to assign new customers to the least-loaded and most-suitable bank employee.
  • As bank manager, the Grid Engine system would allow the bank to define service policies. Typical service policies might be the following:
    • To provide preferential service to commercial customers because those customers generate more profit
    • To make sure a certain customer group is served well, because those customers have received bad service in the past
    • To ensure that customers with an appointment get a timely response
    • To provide preferential treatment to certain customers because those customers were identified by a bank executive as high priority customers
  • These policies would be implemented, monitored, and adjusted automatically by a Grid Engine system manager. Customers who have preferential access would be served sooner and would receive more attention from employees. The Grid Engine manager would recognize when customers are not making progress and would immediately respond by adjusting service levels to comply with the bank's service policies.

Jobs and Queues

In a Grid Engine system, jobs correspond to bank customers. Jobs wait in a computer holding area instead of a lobby. Queues, which provide services for jobs, correspond to bank employees. As in the case of bank customers, the requirements of each job, such as available memory, execution speed, available software licenses, and similar needs, can be very different. Only certain queues might be able to provide the corresponding service.

To continue the analogy, the Grid Engine software arbitrates available resources and job requirements in the following way:

  • A user who submits a job through the Grid Engine software declares a requirement profile for the job. In addition, the software retrieves the identity of the user. The software also retrieves the user's affiliation with projects or user groups. The time that the user submitted the job is also stored.
  • The moment that a queue is available to run a new job, the Grid Engine software determines which jobs are suitable for the queue. The software immediately dispatches the suitable job that has either the highest priority or the longest waiting time.
  • Queues allow concurrent execution of many jobs. The Grid Engine software tries to start new jobs in the least loaded and most suitable queue.
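
The matching steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the scheduler's actual algorithm: the `Job` and `Queue` classes, the `mem_gb` resource name, and the rule that waiting time breaks ties between jobs of equal priority are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    requirements: dict   # hypothetical requirement profile, e.g. {"mem_gb": 4}
    priority: int        # higher value means more important
    submit_time: float   # earlier value means longer waiting time


@dataclass
class Queue:
    name: str
    resources: dict      # what the queue can offer, e.g. {"mem_gb": 8}
    slots: int           # how many jobs the queue can run concurrently
    running: list = field(default_factory=list)


def is_suitable(job, queue):
    """A queue suits a job if it offers at least every requested amount."""
    return all(queue.resources.get(key, 0) >= amount
               for key, amount in job.requirements.items())


def pick_job(queue, pending):
    """Choose the suitable job with the highest priority.

    Assumption: among equal priorities, the longest-waiting job wins.
    """
    candidates = [job for job in pending if is_suitable(job, queue)]
    if not candidates:
        return None
    return min(candidates, key=lambda job: (-job.priority, job.submit_time))


def pick_queue(job, queues):
    """Choose the least-loaded suitable queue that still has a free slot."""
    candidates = [q for q in queues
                  if is_suitable(job, q) and len(q.running) < q.slots]
    if not candidates:
        return None
    return min(candidates, key=lambda q: len(q.running) / q.slots)
```

Here `pick_job` models the moment a queue becomes available, and `pick_queue` models the attempt to start a new job in the least-loaded and most-suitable queue.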

Usage Policies

The administrator of a cluster can define high-level usage policies that are customized according to the site. Four usage policies are available:

  • Urgency – Using this policy, each job's priority is based on an urgency value. The urgency value is derived from the job's resource requirements, the job's deadline specification, and how long the job waits before it is run.
  • Functional – Using this policy, an administrator can provide special treatment because of a user's or a job's affiliation with a certain user group, project, and so forth.
  • Share-based – Under this policy, the level of service depends on an assigned share entitlement, the corresponding shares of other users and user groups, the past usage of resources by all users, and the current presence of users within the system.
  • Override – This policy requires manual intervention by the cluster administrator, who modifies the automated policy implementation.

Policy management automatically controls the use of shared resources in the cluster to best achieve the goals of the administration. High-priority jobs are dispatched preferentially. Such jobs receive higher CPU entitlements if they compete for resources with other jobs. The Grid Engine software monitors the progress of all jobs and adjusts their relative priorities accordingly, with respect to the goals defined in the policies.

Using Tickets to Administer Policies

The functional, share-based, and override policies are defined through a Grid Engine concept that is called tickets. You might compare tickets to shares of a public company's stock. The more shares of stock that you own, the more important you are to the company. If shareholder A owns twice as many shares as shareholder B, A also has twice the votes of B. Therefore shareholder A is twice as important to the company. Similarly, the more tickets that a job has, the more important the job is. If job A has twice the tickets of job B, job A is entitled to twice the resource usage of job B.
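
The proportional relationship between tickets and entitlement can be written out as a short calculation. The function name `entitlement` and the job names are hypothetical; the sketch only restates the "twice the tickets, twice the resource usage" rule from the paragraph above.

```python
def entitlement(tickets):
    """Return each job's share of resources, proportional to its tickets."""
    total = sum(tickets.values())
    if total == 0:
        # No tickets anywhere: no job is entitled to more than another.
        return {job: 0.0 for job in tickets}
    return {job: count / total for job, count in tickets.items()}


# Job A holds twice the tickets of job B, so its share is twice as large.
shares = entitlement({"job_a": 200, "job_b": 100})
```

With these numbers, `job_a` is entitled to two thirds of the contested resources and `job_b` to one third.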

Jobs can retrieve tickets from the functional, share-based, and override policies. The total number of tickets, as well as the number retrieved from each ticket policy, often changes over time.

The administrator controls the number of tickets that are allocated to each ticket policy in total. Just as ticket allocation does for jobs, this allocation determines the relative importance of the ticket policies among each other. Through the ticket pool that is assigned to particular ticket policies, the Grid Engine software can run in different ways. For example, the software can run in a share-based mode only. Or the software can run in a combination of modes, for example, 90% share-based and 10% functional.
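
As a rough sketch of how ticket pools weight the policies against each other, assume a 90% share-based and 10% functional split of 10,000 tickets. The pool sizes, the per-job fractions, and the `job_tickets` helper are hypothetical; how the software actually divides each pool among jobs is more involved and is not modeled here.

```python
def job_tickets(policy_pools, job_fractions):
    """Sum the tickets a job draws from each policy's pool.

    policy_pools:  tickets the administrator allocated to each policy
    job_fractions: assumed fraction of each pool that goes to this job
    """
    total = 0.0
    for policy, pool in policy_pools.items():
        total += pool * job_fractions.get(policy, 0.0)
    return total


# 90% share-based, 10% functional
pools = {"share_based": 9000, "functional": 1000}

# Job X is favored by the share-based policy, job Y by the functional policy.
job_x = job_tickets(pools, {"share_based": 0.5, "functional": 0.1})
job_y = job_tickets(pools, {"share_based": 0.1, "functional": 0.5})
```

Because the share-based pool is nine times larger, job X ends up with about 4600 tickets and job Y with about 1400, even though each job is favored equally strongly by one policy.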

Using the Urgency Policy to Assign Job Priority

The urgency policy can be used in combination with two other job priority specifications:

  • The number of tickets assigned by the functional, share-based, and override policies
  • A priority value specified with the -p option of the qsub command
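
One way to picture such a combination is a weighted sum of the three values after each has been scaled to a common range. The sketch below is an assumption made for illustration, not the formula the software uses: the `normalize` and `combined_priority` helpers and the weight values are hypothetical.

```python
def normalize(values):
    """Scale values to the range 0..1 so the three sources are comparable."""
    if not values:
        return []
    low, high = min(values), max(values)
    if high == low:
        return [0.5 for _ in values]
    return [(v - low) / (high - low) for v in values]


def combined_priority(urgencies, tickets, submit_priorities,
                      w_urgency=0.1, w_ticket=0.01, w_priority=1.0):
    """Return one score per job; the weights are placeholder values."""
    norm_u = normalize(urgencies)
    norm_t = normalize(tickets)
    norm_p = normalize(submit_priorities)
    return [w_urgency * u + w_ticket * t + w_priority * p
            for u, t, p in zip(norm_u, norm_t, norm_p)]


# Two jobs with equal tickets and equal submit priority: urgency decides.
scores = combined_priority([0, 10], [100, 100], [0, 0])
```

In this example the second job scores higher because it is the more urgent of the two and the other inputs are tied.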

A job can be assigned an urgency value, which is derived from three sources:

  • The job's resource requirements
  • The length of time that a job must wait before the job runs
  • The time at which a job must finish running

The administrator can separately weight the importance of each of these sources to arrive at a job's overall urgency value. For more information, see Managing Policies.
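
A minimal sketch of such a weighting follows. It assumes that each source contributes one term and that the deadline term grows as the deadline approaches; the parameter names, the default weights, and the exact shape of each term are hypothetical and should not be read as the software's own formula.

```python
def urgency(resource_value, wait_seconds, seconds_to_deadline=None,
            w_resource=1.0, w_wait=0.0, w_deadline=0.0):
    """Combine the three urgency sources using administrator-style weights."""
    resource_term = w_resource * resource_value
    wait_term = w_wait * wait_seconds
    if seconds_to_deadline is None:
        # The job has no deadline, so the deadline adds no urgency.
        deadline_term = 0.0
    else:
        # Assumed shape: the closer the deadline, the larger the term.
        deadline_term = w_deadline / max(seconds_to_deadline, 1.0)
    return resource_term + wait_term + deadline_term


# A job that has waited 60 seconds, with waiting time weighted at 0.5
value = urgency(1000, 60, w_wait=0.5)
```

Setting a weight to zero removes that source from the result, which is how an administrator in this sketch would ignore, for example, waiting time.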

The following figure shows the correlation among policies in a Grid Engine system.

[Figure: correlation among policies in a Grid Engine system]

Copyright 1994-2009 Sun Microsystems, Inc.