How the System Operates

The Grid Engine system does the following:

  • Accepts jobs. Jobs are users' requests for computer resources. Each job includes a description of what to do and a set of property definitions that describe how the job should be run. Users can submit jobs through the command line interface or through QMON, the Grid Engine graphical user interface. Users can also use the optional Distributed Resource Management Application API (DRMAA) to automate Grid Engine functions by writing scripts that submit and control jobs.
  • Holds jobs. The Sun Grid Engine master daemon holds jobs until the needed compute resources become available.
  • Sends jobs. When the compute resources become available, the master daemon sends the job to the appropriate execution host. The execution daemon on that host then executes the job.
  • Manages running jobs. The master daemon manages running jobs. At a fixed interval, the master daemon receives reports from each execution daemon.
  • Logs the record of job execution when the jobs are finished. The master daemon stores raw data. Users can also use the Accounting and Reporting Console (ARCo) to gather live reporting data from the Grid Engine system and to store the data for historical analysis in the reporting database, which is a standard SQL database.
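
The job lifecycle in the list above (accept, hold, send, manage, log) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyMaster`, `Job`, the slot counts, and the first-fit host choice are hypothetical names and simplifications for illustration, not Grid Engine code or its actual scheduling policy.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    command: str            # description of what to do
    slots: int = 1          # a property describing how the job should be run
    state: str = "pending"


@dataclass
class ToyMaster:
    """Hypothetical stand-in for the master daemon (illustration only)."""
    free_slots: dict                                   # execution host -> free slots
    pending: deque = field(default_factory=deque)      # held jobs
    running: dict = field(default_factory=dict)        # job_id -> (job, host)
    accounting: list = field(default_factory=list)     # records of finished jobs
    next_id: int = 1

    def submit(self, command, slots=1):
        """Accept a job and hold it until resources become available."""
        job = Job(self.next_id, command, slots)
        self.next_id += 1
        self.pending.append(job)
        return job

    def dispatch(self):
        """Send each held job to the first host with enough free slots."""
        still_pending = deque()
        while self.pending:
            job = self.pending.popleft()
            host = next(
                (h for h, free in self.free_slots.items() if free >= job.slots),
                None,
            )
            if host is None:
                still_pending.append(job)   # keep holding the job
                continue
            self.free_slots[host] -= job.slots
            job.state = "running"
            self.running[job.job_id] = (job, host)
        self.pending = still_pending

    def finish(self, job_id, exit_status=0):
        """Release the job's slots and log a record of its execution."""
        job, host = self.running.pop(job_id)
        self.free_slots[host] += job.slots
        job.state = "finished"
        self.accounting.append(
            {"job_id": job.job_id, "host": host, "exit_status": exit_status}
        )


if __name__ == "__main__":
    master = ToyMaster(free_slots={"hostA": 1})
    first = master.submit("sleep 10")
    second = master.submit("sleep 20")
    master.dispatch()
    print(first.state, second.state)
    master.finish(first.job_id)
    master.dispatch()
    print(second.state, master.accounting)
```

With a single one-slot host, the second job stays held until the first one finishes and frees its slot, which mirrors the "holds jobs until the needed compute resources become available" step.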

The following entries list each component, its description, and where to find more information.

Component: Cluster
Description: A collection of machines, called hosts, on which Grid Engine system functions occur.
More Info: See Configuring Clusters.

Component: Master Host
Description: The master host is central to cluster activity. It runs the master daemon and usually also runs the scheduler. The master host requires no configuration beyond what the installation procedure performs. By default, the master host is also an administration host and a submit host.
More Info: For information about how to initially set up the master host, see How to Install the Master Host. For information about how to configure dynamic changes to the master host, see Configuring Hosts.

Component: Master Daemon
Description: The master daemon does the following:
  • Accepts incoming jobs from users.
  • Maintains tables about hosts, queues, jobs, system load, and user permissions.
  • Performs scheduling functions and requests actions from execution daemons on the appropriate execution hosts.
  • Decides which jobs are dispatched to which queues and how to reorder and reprioritize jobs to maintain share, priority, or deadline.
More Info: See Configuring Hosts.

Component: Execution Host
Description: Execution hosts are systems that have permission to run Grid Engine system jobs. Because jobs run on these systems, execution hosts host queue instances and run the execution daemon. An execution host is initially set up by the installation procedure.
More Info: For initial setup, see How to Install Execution Hosts. For installation planning guidance, see Host System Requirements. For more information on managing your cluster, see Configuring Hosts.

Component: Execution Daemon
Description: The execution daemon receives jobs from the master daemon and executes them locally on its host. An execution daemon is responsible for the queue instances on its host and for running jobs in these queue instances. Periodically, the execution daemon forwards information such as job status or the load on its host to the master daemon.
More Info: See Configuring Hosts.

Component: Scheduler
Description: The scheduler is responsible for prioritizing pending jobs and deciding which jobs to schedule to which resources.
More Info: See Managing the Scheduler.

Component: Administration Host
Description: Administration hosts are hosts that have permission to carry out any kind of administrative activity for the Grid Engine system.
More Info: See Configuring Hosts.

Component: Submit Host
Description: Submit hosts enable users to submit and control batch jobs only. In particular, a user who is logged in to a submit host can submit jobs with the qsub command, monitor job status with the qstat command, and use QMON, the Grid Engine system's OSF/1 Motif graphical user interface.
More Info: For QMON, see QMON, the Grid Engine System's Graphical User Interface. See also Configuring Hosts.

Component: Shadow Master Host
Description: Shadow master hosts reduce unplanned cluster downtime. One or more shadow master hosts can run on additional nodes in a cluster. If the master daemon or the host on which it runs fails, one of the shadow master hosts takes over as the new master host by starting a new master daemon locally.
More Info: See How to Configure Shadow Master Hosts.

Component: DRMAA
Description: The optional Distributed Resource Management Application API (DRMAA) provides a programming interface for automating Sun Grid Engine functions, so that programs can submit and control jobs directly instead of running Sun Grid Engine commands and parsing the results.
More Info: See Automating Grid Engine Functions Through DRMAA.

Component: ARCo
Description: The optional Accounting and Reporting Console (ARCo) enables you to gather live reporting data from the Grid Engine system and to store the data for historical analysis in the reporting database, which is a standard SQL database.
More Info: See Accounting and Reporting Console.

Component: SDM
Description: The optional Service Domain Manager (SDM) module distributes resources between different services according to configurable Service Level Agreements (SLAs). The SLAs are based on Service Level Objectives (SLOs). SDM functionality enables you to manage resources for all kinds of scalable services.
More Info: See Service Domain Manager.
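
As an illustration of what a user on a submit host does with the qsub and qstat commands mentioned above, the following sketch builds the command lines in Python. The helper names (`qsub_command`, `qstat_command`, `run_if_available`) and the script name `job.sh` are hypothetical; the only options used are `-N` (job name) for qsub and `-u` (user filter) for qstat. The helper runs a command only if its executable is found on the PATH, so the sketch also works on a machine without Grid Engine installed.

```python
import shutil
import subprocess


def qsub_command(script_path, job_name=None):
    """Build the argument list for submitting a batch script with qsub."""
    argv = ["qsub"]
    if job_name is not None:
        argv += ["-N", job_name]    # -N sets the job name
    argv.append(script_path)
    return argv


def qstat_command(user=None):
    """Build the argument list for monitoring job status with qstat."""
    argv = ["qstat"]
    if user is not None:
        argv += ["-u", user]        # -u restricts the listing to one user
    return argv


def run_if_available(argv):
    """Run the command if its executable is on PATH; otherwise return None."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # Only print the submit command; actually submitting is left to the reader.
    print(" ".join(qsub_command("job.sh", job_name="demo")))
    status = run_if_available(qstat_command())
    print(status if status is not None else "qstat not found on this host")
```

Keeping command construction separate from execution makes the sketch easy to check without a cluster. A DRMAA-based program would avoid the command line entirely, but that requires a DRMAA language binding and is not shown here.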

Copyright 1994-2009 Sun Microsystems, Inc.