- 1 Administering Sun Grid Engine
- 2 Managing User Access
- 3 Setting Up a User
- 4 Configuring User Access
- 4.1 Types of Users
- 4.2 Configuring Manager Accounts
- 4.2.1 How to Configure Manager Accounts With QMON
- 4.2.2 Configuring Manager Accounts From the Command Line
- 4.3 Configuring Operator Accounts
- 4.3.1 How to Configure Operator Accounts With QMON
- 4.3.2 Configuring Operator Accounts From the Command Line
- 4.4 Configuring User Access Lists
- 4.4.1 How to Configure User Access Lists With QMON
- 4.4.2 Configuring User Access Lists From the Command Line
- 4.5 Configuring Users
- 5 Defining Projects
- 6 Configuring Default Requests
- 7 Using Path Aliasing
- 7.1 Format of Path-Aliasing Files
- 7.2 How Path-Aliasing Files Are Interpreted
- 7.2.1.1.1 Example - Path Aliasing File
- 8 Configuring Hosts and Clusters
- 8.1 About Hosts and Daemons
- 8.2 About Configuring Hosts
- 8.2.1 Invalid Host Names
- 9 Basic Cluster Configuration
- 9.1 About Basic Cluster Configuration
- 9.2 Configuring Clusters With QMON
- 9.2.1 How to Display Cluster Configuration With QMON
- 9.2.2 How to Display Global Cluster Configuration With QMON
- 9.2.3 How to Add and Modify Global and Host Configurations With QMON
- 9.2.4 How to Delete a Cluster Configuration With QMON
- 9.3 Working With Basic Cluster Configurations From the Command Line
- 10 Changing the Master Host
- 10.1 How to Migrate qmaster to Another Host by Using a Script
- 10.2 How to Migrate qmaster to Another Host Manually
- 11 Configuring Shadow Master Hosts
- 11.1 About Shadow Master Hosts
- 11.2 Shadow Master Host Requirements
- 11.3 Shadow Master Host File
- 11.4 Starting Shadow Master Hosts
- 11.5 Configuring Shadow Master Hosts Environment Variables
- 12 Configuring Hosts With QMON
- 12.1 Configuring Execution Hosts With QMON
- 12.1.1 About the Execution Host Tab
- 12.1.2 How to Add or Modify an Execution Host
- 12.1.3 How to Delete an Execution Host
- 12.1.4 How to Shut Down an Execution Host Daemon
- 12.2 Configuring Administration Hosts With QMON
- 12.3 Configuring Submit Hosts With QMON
- 12.4 Configuring Host Groups With QMON
- 13 Configuring Hosts From the Command Line
- 13.1 Configuring Execution Hosts From the Command Line
- 13.2 Configuring Administration Hosts From the Command Line
- 13.3 Configuring Submit Hosts From the Command Line
- 13.4 Configuring Host Groups From the Command Line
- 13.5 Monitoring Execution Hosts With qhost
- 13.5.1.1.1 Example - Sample qhost Output
- 13.6 Killing Daemons From the Command Line
- 13.7 Restarting Daemons From the Command Line
- 14 Configuring Queues
- 14.1 About Configuring Queues
- 14.2 How to Configure Queues With QMON
- 14.2.1 How to Configure General Parameters
- 14.2.2 How to Configure Execution Method Parameters
- 14.2.3 How to Configure the Checkpointing Parameters
- 14.2.4 How to Configure Parallel Environments
- 14.2.5 How to Configure Load and Suspend Thresholds
- 14.2.6 How to Configure Limits
- 14.2.7 How to Configure Complex Resource Attributes
- 14.2.7.1.1 Next Steps
- 14.2.8 How to Configure Subordinate Queues
- 14.2.9 How to Configure User Access Parameters
- 14.2.10 How to Configure Project Access Parameters
- 14.2.11 How to Configure Owners Parameters
- 14.3 Configuring Queues From the Command Line
- 15 Configuring Queue Calendars
- 15.1 About Queue Calendars
- 15.2 How to Configure Queue Calendars With QMON
- 15.3 Configuring Queue Calendars From the Command Line
- 16 Managing the Scheduler
- 16.1 Administering the Scheduler
- 16.1.1 About Scheduling
- 16.1.2 Scheduling Strategies
- 16.1.2.1 Dynamic Resource Management
- 16.1.2.2 Tickets
- 16.1.2.3 Queue Sorting
- 16.1.2.4 Job Sorting
- 16.1.2.5 About the Urgency Policy
- 16.1.2.6 Resource Reservation and Backfilling
- 16.1.2.7 What Happens in a Scheduler Interval
- 16.1.2.8 Scheduler Monitoring
- 16.1.3 Configuring the Scheduler
- 16.1.3.1 Default Scheduling
- 16.1.3.2 Scheduling Alternatives
- 16.1.3.3 Changing the Scheduling Algorithm
- 16.1.3.4 Scaling System Load
- 16.1.3.5 Selecting Queue by Sequence Number
- 16.1.3.6 Selecting Queue by Share
- 16.1.3.7 Restricting the Number of Jobs per User or Group
- 16.1.4 Changing the Scheduler Configuration With QMON
- 17 Managing Policies
- 17.1 About Grid Engine Policies
- 17.2 Configuring Policy-Based Resource Management With QMON
- 17.2.1 Specifying Policy Priority
- 17.2.2 Configuring the Urgency Policy
- 17.2.3 Configuring Ticket-Based Policies
- 17.2.3.1 Editing Tickets
- 17.2.3.2 Sharing Override Tickets
- 17.2.3.3 Sharing Functional Ticket Shares
- 17.2.3.3.1 Example - Functional Policy
- 17.2.3.4 Tuning Scheduling Run Time
- 17.2.3.5 Setting the Ticket Policy Hierarchy
- 17.2.4 Configuring the Share-Based Policy
- 17.2.4.1 The Half-Life Factor
- 17.2.4.2 Compensation Factor
- 17.2.4.3 Hierarchical Share Tree
- 17.2.4.4 Configuring the Share-Tree Policy With QMON
- 17.2.4.5 Node Attributes
- 17.2.4.6 Share Tree Policy Parameters
- 17.2.4.7 About the Special User default
- 17.2.5 How to Create Project-Based Share-Tree Scheduling
- 17.2.6 Configuring the Functional Policy
- 17.2.6.1 Functional Shares
- 17.2.6.2 Configuring the Functional Share Policy With QMON
- 17.2.6.3 Function Category List
- 17.2.6.4 Functional Shares Table
- 17.2.6.5 Changing Functional Configurations
- 17.2.6.6 Ratio Between Sorts of Functional Tickets
- 17.2.7 Creating User-Based, Project-Based, and Department-Based Functional Scheduling
- 17.2.8 Configuring the Override Policy
- 17.2.8.1 Configuring the Override Policy With QMON
- 17.2.8.2 Override Category List
- 17.2.8.3 Override Table
- 17.2.8.4 Changing Override Configurations
- 17.3 Configuring Policies From the Command Line
- 18 Managing Resource Quotas
- 18.1 Resource Quota Overview
- 18.1.1 About Resource Quota Sets
- 18.1.1.1.1 Example - Sample Resource Quota Set
- 18.1.2 Static and Dynamic Resource Quotas
- 18.1.2.1.1 Example - Dynamic Limit Example
- 18.2 Managing Resource Quotas With QMON
- 18.3 Monitoring Resource Quota Utilization From the Command Line
- 18.3.1.1.1 Example - Sample qquota Command
- 18.4 Configuring Resource Quotas From the Command Line
- 18.5 Resource Quota Command Line Examples
- 18.5.1.1.1 Example - Rule Set
- 18.5.1.1.2 Example - qstat Output
- 18.5.1.1.3 Example - qquota Output
- 18.6 Performance Considerations
- 18.6.1 Efficient Rule Sets
- 19 Managing Advance Reservations
- 19.1 About Advance Reservations
- 19.1.1 Capabilities
- 19.1.2 Advance Reservation States
- 19.2 Using QMON for Advance Reservations
- 19.2.1 How to Create Advance Reservations Using QMON
- 19.2.2 How to View Advance Reservations Using QMON
- 19.2.3 How to Delete Advance Reservations Using QMON
- 19.3 Configuring Advance Reservations
- 19.3.1 User Access
- 19.4 ARCo Queries for Advance Reservations
- 19.5 Advance Reservation Command Reference
- 20 Managing Parallel Environments
- 20.1 About Parallel Environments
- 20.2 How to Configure Parallel Environments With QMON
- 20.2.1 How to Add or Modify Parallel Environments
- 20.2.2 Example - Displaying Configured Parallel Environment Interfaces With QMON
- 20.3 Configuring Parallel Environments From the Command Line
- 20.4 Parallel Environment Startup Procedure
- 20.5 Termination of the Parallel Environment
- 20.6 Tight Integration of Parallel Environments and Grid Engine Software
- 21 Managing Checkpointing Environments
- 21.1 About Checkpointing
- 21.2 About Checkpointing Environments
- 21.3 How to Configure Checkpointing Environments With QMON
- 21.4 Configuring Checkpointing Environments From the Command Line
- 22 Configuring Complex Resource Attributes
- 22.1 About Complex Resource Attributes
- 22.2 Configuring Complex Resource Attributes With QMON
- 22.2.1 How to Configure Complex Resource Attributes
- 22.2.2 Assigning Resource Attributes to Queues, Hosts, and the Global Cluster
- 22.2.2.1 Queue Resource Attributes
- 22.2.2.2 Host Resource Attributes
- 22.2.2.3 Global Resource Attributes
- 22.2.2.4 Adding Resource Attributes to the Complex
- 22.2.3 Consumable Resources
- 22.2.3.1 Setting Up Consumable Resources
- 22.2.4 Examples of Setting Up Consumable Resources
- 22.3 Configuring Complex Resource Attributes From the Command Line
- 22.3.1.1.1 Example - qconf -sc Sample Output
- 23 Load Parameters
- 23.1 Default Load Parameters
- 23.2 Adding Site-Specific Load Parameters
- 23.3 Writing Your Own Load Sensors
- 23.3.1 Load Sensor Rules Format
- 23.3.2 Example of a Load Sensor Script
- 23.3.2.1.1 Example - Load Sensor Bourne Shell Script
- 24 Managing Grid Engine SMF Services
- 25 Generating Accounting Statistics (qacct)
- 26 Backing Up and Restoring Grid Engine Configuration
- 26.1 Backing Up the Grid Engine System Configuration
- 26.1.1 How to Perform a Manual Backup
- 26.1.2 How to Restore from a Backup
- 27 Improving Grid Engine Performance
- 27.1 Fine-Tuning Your Grid Environment
- 27.1.1 Scheduler Monitoring
- 27.1.2 Finished Jobs
- 27.1.3 Job Validation
- 27.1.4 Load Thresholds and Suspend Thresholds
- 27.1.5 Load Adjustments
- 27.1.6 Immediate Scheduling
- 27.1.7 Urgency Policy and Resource Reservation
- 27.2 Using DTrace for Performance Tuning
- 28 Using Files and Scripts for Administration Tasks
- 28.1 Using Files to Add or Modify Objects
- 28.2 Using Files to Modify Queues, Hosts, and Environments
- 28.2.1.1.1 Example - Changing the Queue Type
- 28.2.1.1.2 Example - Modifying the Queue Type and the Shell Start Behavior
- 28.2.1.1.3 Example - Adding Resource Attributes
- 28.2.1.1.4 Example - Attaching a Resource Attribute to a Host
- 28.2.1.1.5 Example - Changing a Resource Value
- 28.2.1.1.6 Example - Deleting a Resource Attribute
- 28.2.1.1.7 Example - Adding a Queue to the List of Queues for a Checkpointing Environment
- 28.2.1.1.8 Example - Changing the Number of Slots in a Parallel Environment
- 28.2.2 Targeting Queue Instances With the qselect Command
- 28.2.2.1.1 Example - Listing Queues
- 28.2.2.1.2 Example - Using qselect in qconf Commands
- 28.3 Using Files to Modify a Global Configuration or the Scheduler
- 28.3.1.1.1 Example - Modifying the Schedule Interval
Sun Grid Engine Information Center

Administering Sun Grid Engine

As a Sun Grid Engine administrator, you need to perform the following tasks:

Administration Tools

To perform these tasks, you can use either of the following mechanisms:

For general information about these administration tools, see Interacting With Sun Grid Engine.

Administration Tasks

For detailed information about performing Grid Engine administration tasks, see:
Managing User Access

This section contains information about managing user accounts and other related accounts. The following topics are covered:
Setting Up a User

You need to perform the following tasks to set up a user for the Grid Engine system:
Configuring User Access

Types of Users

The Grid Engine system has the following four categories of users:

Queue owners can be managers, operators, or users. Queue owners are restricted to suspending and resuming, or disabling and enabling, the queues that they own. These privileges are necessary for successful use of qidle. Users are commonly declared to be owners of the queue instances that reside on their desktop workstations. See How to Configure Owners Parameters for more information.

Configuring Manager Accounts

How to Configure Manager Accounts With QMON
Configuring Manager Accounts From the Command Line

To configure a manager account from the command line, type the following command with appropriate options:

    # qconf <options>

The following options are available:
Configuring Operator Accounts

How to Configure Operator Accounts With QMON

Configuring Operator Accounts From the Command Line

To configure an operator account from the command line, type the following command with appropriate options:

    # qconf <options>

The following options are available:
Configuring User Access Lists

Any user with a valid login ID on at least one submit host and one execution host can use the Grid Engine system. However, Grid Engine system managers can prohibit access for certain users to certain queues or to all queues. Furthermore, managers can restrict the use of facilities such as specific parallel environments. See Configuring Parallel Environments for more information.

To define access permissions, you must define user access lists, which are made up of named sets of users. In the Grid Engine system, these are referred to as usersets. You use user names and UNIX group names to define user access lists. The user access lists are then used either to deny or to allow access to a specific resource in any of the following configurations:

Usersets are also used to define Grid Engine system projects and departments. For details about projects, see Defining Projects.

How to Configure User Access Lists With QMON
Configuring User Access Lists From the Command Line

To configure user access lists from the command line, type the following command with appropriate options:

    # qconf <options>

The following options are available:
Configuring Users

You must declare user names before you define the share-based, functional, or override policies for users. See Configuring Policy-Based Resource Management With QMON.

If you do not want to explicitly declare user names before you define policies, the Grid Engine system can automatically create users for you, based on predefined default values. The automatic creation of users can significantly reduce the administrative burden for sites with many users.

To have the system create users automatically, set the Enforce User parameter on the Cluster Settings dialog box to Auto. To set default values for automatically created users, specify values for the following Automatic User Defaults on the Cluster Settings dialog box:

For more information about the cluster configuration, see Basic Cluster Configuration.

How to Configure User Objects With QMON

Configuring User Objects From the Command Line

To configure user objects from the command line, type the following command with appropriate options:

    # qconf <options>

The following options are available:
Defining Projects

About Projects

Projects provide a means to organize joint computational tasks from multiple users. A project also defines resource usage policies for all jobs that belong to that project. Projects must be declared before they can be used in any of the three scheduling policies. Projects are used in three policy areas:

Grid Engine system managers define projects by giving them a name and some attributes. Grid Engine users can attach a job to a project when they submit the job. Attachment of a job to a project influences the job's dispatching, depending on the project's share of share-based, functional, or override tickets.

How to Define Projects With QMON

Grid Engine system managers can define and update definitions of projects by using the Project Configuration dialog box.

Defining Projects From the Command Line

To define projects from the command line, type the following command with appropriate options:

    # qconf <options>

The following options are available:
Configuring Default Requests

Batch jobs are normally assigned to queues with respect to a request profile. The user defines a request profile for a particular job. The user assembles a set of requests that must be met to successfully run the job. The scheduler considers only those queues that satisfy the set of requests for this job.

If the user does not specify any requests for a job, the scheduler considers any queue to which the user has access without further restrictions. However, the Grid Engine software enables you to configure default requests that define resource requirements for jobs even when the user does not specify resource requirements explicitly.

You can configure default requests globally for all users of a cluster, as well as privately for any user. The default request configuration is stored in default request files. The global request file is located under $SGE_ROOT/$SGE_CELL/common/sge_request. The user-specific request file can be located either in the user's home directory or in the current working directory. The working directory is where the qsub command is run. The user-specific request file is called .sge_request.

If these files are present, they are evaluated for every job. The order of evaluation is as follows:
You can prevent the Grid Engine system from using the default request files by using the qsub -clear command, which discards any previous requirement specifications.

Format of Default Request Files

The format of both the local and the global default request files is as follows:
Suppose a user's local default request file is configured the same as test.sh, the script in the following example.

Example of a Default Request File

    # Local Default Request File
    # exec job on a sun4 queue offering 5h cpu
    -l arch=solaris64,s_cpu=5:0:0
    # exec job in current working dir
    -cwd

To run the script, the user types the following command:

    % qsub test.sh

The effect of running the test.sh script is the same as if the user specified all qsub options directly in the command line, as follows:

    % qsub -l arch=solaris64,s_cpu=5:0:0 -cwd test.sh
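The equivalence described in the example can be sketched in a few lines of shell. This is only an illustration, not how qsub itself is implemented: request_file_options is a hypothetical helper, and it assumes that lines beginning with # are comments, as the example file suggests.

```shell
#!/bin/sh
# Sketch only: show which options a default request file contributes to
# the qsub command line. request_file_options is a hypothetical helper.

request_file_options() {
    # Drop comment lines (starting with '#') and blank lines, then join
    # the remaining option lines with single spaces.
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | tr '\n' ' ' | sed 's/ *$//'
}

# Recreate the example request file in a scratch directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/.sge_request" <<'EOF'
# Local Default Request File
# exec job on a sun4 queue offering 5h cpu
-l arch=solaris64,s_cpu=5:0:0
# exec job in current working dir
-cwd
EOF

# Print the command line that "qsub test.sh" is equivalent to.
echo "qsub $(request_file_options "$demo_dir/.sge_request") test.sh"
rm -r "$demo_dir"
```

Running the sketch prints the same explicit qsub command line that the example gives, which makes it a quick way to check what a request file will add before submitting a job.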
Using Path Aliasing

In Solaris and in other networked UNIX environments, users often have the same home directory, or part of it, on different machines. For example, consider user home directories that are available across NFS and automounter. A user might have a home directory /home/foo on the NFS server. This home directory is accessible under this path on all properly installed NFS clients that are running automounter. However, /home/foo on a client is just a symbolic link to /tmp_mnt/home/foo. /tmp_mnt/home/foo is the actual location on the NFS server from where automounter physically mounts the directory.

A user on a client host might use the qsub -cwd command to submit a job from somewhere within the home directory tree. The -cwd flag requires the job to be run in the current working directory. However, if the execution host is the NFS server, the Grid Engine system might not be able to locate the current working directory on that host. The reason is that the current working directory on the submit host is /tmp_mnt/home/foo, which is the physical location on the submit host. This path is passed to the execution host. However, if the execution host is the NFS server, the path cannot be resolved, because its physical home directory path is /home/foo, not /tmp_mnt/home/foo.

Other occasions that can cause similar problems are the following:
To prevent such problems, the Grid Engine software enables both the administrator and the user to configure a path aliasing file. The locations of two such files are as follows:
Format of Path-Aliasing Files

Both path-aliasing files share the same format:

How Path-Aliasing Files Are Interpreted

The files are interpreted in the following order:
Example – Path Aliasing File

    # cluster global path aliases file
    # src-path   subm-host   exec-host   dest-path
    /tmp_mnt/    *           *           /
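To make the effect of an alias entry concrete, the following sketch applies entries of the form shown in the example to a path. It is a simplified model, not the Grid Engine implementation: apply_path_aliases is a hypothetical helper, and it assumes that an entry applies when both host fields match (with * matching any host) and the path begins with the source path.

```shell
#!/bin/sh
# Sketch only: simplified model of path alias substitution.
# apply_path_aliases is a hypothetical helper, not a Grid Engine command.

apply_path_aliases() {
    # $1 = alias file, $2 = submit host, $3 = execution host, $4 = path
    pa_path=$4
    while read -r pa_src pa_subm pa_exec pa_dest; do
        case $pa_src in
            '' | '#'*) continue ;;      # skip blank lines and comments
        esac
        # '*' matches any host; otherwise the host name must match exactly.
        [ "$pa_subm" = '*' ] || [ "$pa_subm" = "$2" ] || continue
        [ "$pa_exec" = '*' ] || [ "$pa_exec" = "$3" ] || continue
        # Replace the leading source path with the destination path.
        case $pa_path in
            "$pa_src"*) pa_path=$pa_dest${pa_path#"$pa_src"} ;;
        esac
    done < "$1"
    printf '%s\n' "$pa_path"
}

# Demo with the example file; the host names are placeholders.
alias_file=$(mktemp)
cat > "$alias_file" <<'EOF'
# cluster global path aliases file
# src-path   subm-host   exec-host   dest-path
/tmp_mnt/    *           *           /
EOF
apply_path_aliases "$alias_file" clienthost nfsserver /tmp_mnt/home/foo
rm -f "$alias_file"
```

With the example entry, the automounter path /tmp_mnt/home/foo is rewritten to /home/foo, which is the path that resolves on the NFS server.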
Configuring Hosts and Clusters

This section provides background information about configuring various aspects of the Grid Engine system. For specific configuration tasks, see the following topics:

About Hosts and Daemons

You can classify Grid Engine system hosts into four categories, depending on which daemons are running on the system and on how the hosts are registered at sge_qmaster:

About Configuring Hosts

The Grid Engine software maintains object lists for all types of hosts except for the master host. The lists of administration host objects and submit host objects indicate whether a host has administrative or submit permission. The list of execution host objects includes other parameters. Among these parameters are the load information that is reported by the sge_execd running on the host, and the load parameter scaling factors that are defined by the administrator. You can configure host objects with QMON or from the command line.

Invalid Host Names

The following host names are invalid, reserved, or otherwise not allowed to be used:
Basic Cluster Configuration

About Basic Cluster Configuration

The basic cluster configuration is a set of information that is configured to reflect site dependencies and to influence Grid Engine system behavior. Site dependencies include valid paths for programs such as mail or xterm. A global configuration is provided for the master host as well as for every host in the Grid Engine system pool. In addition, you can configure the system to use a configuration local to each host to override particular entries in the global configuration.

The cluster administrator should adapt the global configuration and local host configurations to the site's needs immediately after the installation. The configurations should be kept up to date afterwards. The sge_conf(5) man page contains a detailed description of the configuration entries.

Configuring Clusters With QMON

You use the QMON Main Control window to configure and view information about clusters.

How to Display Cluster Configuration With QMON
How to Display Global Cluster Configuration With QMON
How to Add and Modify Global and Host Configurations With QMON
See the sge_conf(5) man page for a complete description of all cluster configuration parameters.

How to Delete a Cluster Configuration With QMON

Working With Basic Cluster Configurations From the Command Line

You can display or modify cluster configurations from the command line.

Displaying Cluster Configurations From the Command Line

To display the current cluster configuration, use the qconf -sconf command. See the qconf(1) man page for a detailed description. Type one of the following commands:

    % qconf -sconf
    % qconf -sconf global
    % qconf -sconf <host>
Modifying Cluster Configurations From the Command Line
Type one of the following commands:

    % qconf -mconf global
    % qconf -mconf <host>

The qconf commands that are described here are examples of the many available qconf commands. See the qconf(1) man page for others.
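The way a host-local configuration overrides entries of the global configuration can be illustrated with a small sketch. It assumes that the two configurations have been saved to files of "name value" lines, for example with qconf -sconf global and qconf -sconf <host>; that file layout, the effective_conf helper, and the sample paths are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch only: merge a saved global configuration with a saved host-local
# configuration, letting local entries win. effective_conf is hypothetical.

effective_conf() {
    # $1 = saved global configuration, $2 = saved host-local configuration
    awk '
        FNR == 1 { file++ }
        NF == 0 || $1 ~ /^#/ { next }
        file == 1 { if (!($1 in g)) order[++n] = $1; g[$1] = $0; next }
        { if (!($1 in g) && !($1 in l)) extra[++m] = $1; l[$1] = $0 }
        END {
            # Global entries in their original order, overridden where a
            # local entry with the same name exists.
            for (i = 1; i <= n; i++) { k = order[i]; print ((k in l) ? l[k] : g[k]) }
            # Entries that exist only in the local configuration.
            for (i = 1; i <= m; i++) print l[extra[i]]
        }
    ' "$1" "$2"
}

# Demo with made-up values for two site-dependent entries.
global_file=$(mktemp)
local_file=$(mktemp)
printf '%s\n' 'mailer /bin/mail' 'xterm /usr/bin/X11/xterm' > "$global_file"
printf '%s\n' 'xterm /usr/openwin/bin/xterm' > "$local_file"
effective_conf "$global_file" "$local_file"
rm -f "$global_file" "$local_file"
```

In the demo, the host keeps the global mailer entry but uses its own xterm path, which is the kind of per-host difference that a local configuration is meant to capture.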
Changing the Master Host

This section explains how to change the system that Grid Engine considers to be the master host by moving the sge_qmaster daemon.

How to Migrate qmaster to Another Host by Using a Script

Important Notes About Migration

The migration procedure migrates to the host on which the sgemaster -migrate command is issued. If the file primary_qmaster exists, any subsequent calls of sgemaster on the machine contained in the primary_qmaster file will cause a migration back to that machine. To avoid such a situation, change or delete the $SGE_ROOT/$SGE_CELL/common/primary_qmaster file.

Although jobs may continue to run during the migration procedure, the grid should be inactive. While the migration is taking place, any running Grid Engine commands, such as qsub or qstat, will return an error. If the current qmaster is down, the scheduler will not shut down until it times out waiting for contact with the qmaster.

The shadow_masters file has no direct effect on the migration procedure. This file only exists if one or more shadow masters have been configured. For more information on how to set up shadow masters, see Configuring Shadow Master Hosts.

How to Migrate qmaster to Another Host Manually
Configuring Shadow Master Hosts

About Shadow Master Hosts

Shadow master hosts are machines in the cluster that can detect a failure of the master daemon and take over its role as master host. When the shadow master daemon detects that the master daemon sge_qmaster has failed abnormally, it starts up a new sge_qmaster daemon on the host where the shadow master daemon is running.
The automatic failover start of a sge_qmaster on a shadow master host takes approximately one minute. Meanwhile, you get an error message whenever a Grid Engine system command is run.
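Because commands return errors while the failover is in progress, scripts that call Grid Engine commands may want to retry for a short period instead of failing immediately. The following is a minimal sketch of such a wrapper; retry_during_failover is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary assumptions, not recommended values.

```shell
#!/bin/sh
# Sketch only: retry a command a limited number of times.
# retry_during_failover is a hypothetical helper.

retry_during_failover() {
    # $1 = maximum attempts, $2 = seconds between attempts, rest = command
    rf_max=$1
    rf_delay=$2
    shift 2
    rf_try=1
    while :; do
        "$@" && return 0
        [ "$rf_try" -ge "$rf_max" ] && return 1
        rf_try=$((rf_try + 1))
        sleep "$rf_delay"
    done
}

# Possible use on a submit host (not run here):
#   retry_during_failover 6 15 qstat
retry_during_failover 3 0 true && echo "command succeeded"
```

The wrapper returns the success of the first attempt that works and gives up after the stated number of attempts, so a real outage still surfaces as an error.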
Shadow Master Host Requirements

To prepare a host as a shadow master, the following requirements must be met:

As soon as these requirements are met, the shadow-master-host facility is activated for this host. You do not have to restart the Grid Engine system daemons to activate the feature.

Shadow Master Host File

The shadow master host file, $SGE_ROOT/$SGE_CELL/common/shadow_masters, contains the following:
The format of the shadow master host file is as follows:
The order of the shadow master hosts is significant. The primary master host is the first line in the file. If the primary master host fails to proceed, the shadow master defined in the second line takes over. If this shadow master also fails, the shadow master defined in the third line takes over, and so forth.

Starting Shadow Master Hosts

To start a shadow sge_qmaster, the system must be sure either that the old sge_qmaster has terminated, or that it will terminate without performing actions that interfere with the newly started shadow sge_qmaster.

In very rare circumstances, you might not be able to determine that the old sge_qmaster has terminated or that it will terminate. In such cases, an error message is logged to the messages log file of the sge_shadowd daemons on the shadow master hosts. See Chapter 10, Fine Tuning, Error Messages, and Troubleshooting for further information. Also, any attempt to open a TCP connection to a sge_qmaster daemon permanently fails. If this occurs, make sure that no master daemon is running, and then restart sge_qmaster manually on any of the shadow master machines. See Restarting Daemons From the Command Line for further details.

Configuring Shadow Master Hosts Environment Variables

Three environment variables affect the takeover time for a shadow master:
These variables interact in the following ways:
A reasonable configuration might be to set SGE_CHECK_INTERVAL to 45 seconds and SGE_GET_ACTIVE_INTERVAL to 90 seconds. With these values, the takeover occurs after about 2 minutes.

If you want to check the operation of the shadow host after you have configured these environment variables, you will have to disconnect the master host's network cable to simulate a failure.
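The arithmetic behind the "about 2 minutes" figure can be sketched as follows. The sketch assumes that the takeover delay is roughly the sum of the two intervals, which is one reading of the configuration above rather than a documented formula; estimate_takeover_seconds is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: the values come from the suggested configuration. They would
# be set in the environment from which the shadow daemon is started.
SGE_CHECK_INTERVAL=45
SGE_GET_ACTIVE_INTERVAL=90
export SGE_CHECK_INTERVAL SGE_GET_ACTIVE_INTERVAL

estimate_takeover_seconds() {
    # Assumption: takeover delay is approximately the sum of both intervals.
    echo $(( $1 + $2 ))
}

echo "approximate takeover time: $(estimate_takeover_seconds "$SGE_CHECK_INTERVAL" "$SGE_GET_ACTIVE_INTERVAL") seconds"
```

With the suggested values the estimate is 135 seconds, a little over 2 minutes.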
Configuring Hosts With QMON

The QMON Host Configuration dialog box has four tabs:

The qconf command provides the command-line interface for managing host objects. See Configuring Hosts From the Command Line for more details.

Configuring Execution Hosts With QMON

Before you configure an execution host, you must first install the software on the execution host as described in How to Install Execution Hosts.

About the Execution Host Tab

To configure execution hosts, click the Host Configuration button on the QMON Main Control window, and then click the Execution Host tab. The Execution Host tab looks like the following figure:
Note the following in the Execution Host tab:
How to Add or Modify an Execution Host
How to Delete an Execution Host
How to Shut Down an Execution Host Daemon
Configuring Administration Hosts With QMON

Use the Administration Host tab to configure hosts on which administrative commands are allowed. The Host list displays the hosts that already have administrative permission.

How to Add or Remove an Administration Host With QMON

To configure Administration Hosts with QMON, do the following:

Configuring Submit Hosts With QMON

Use the Submit Host tab to declare the hosts from which jobs can be submitted, monitored, and controlled. The Host list displays the hosts that already have submit permission.
How to Add or Remove a Submit Host With QMON
Configuring Host Groups With QMON

Use the Host Groups tab to configure host groups. The Hostgroup list displays the currently configured host groups. The Members list displays all the hosts that are members of the selected host group.

About the Host Groups Tab

To group similar hosts together, click the Host Configuration button on the QMON Main Control window, and then click the Host Groups tab. Host groups enable you to use a single name to refer to multiple hosts. A host group can include other host groups as well as multiple individual hosts. Host groups that are members of another host group are subgroups of that host group. For example, you might define a host group called @bigMachines that includes the following members:

The initial @ sign indicates that the name is a host group. The host group @bigMachines includes all hosts that are members of the two subgroups @solaris64 and @solaris32. @bigMachines also includes two individual hosts, fangorn and balrog.

How to Add or Modify a Host Group With QMON
Configuring Hosts From the Command Line

Configuring Execution Hosts From the Command Line

To configure execution hosts from the command line, use the following arguments for the qconf command:

Configuring Administration Hosts From the Command Line

To configure administration hosts from the command line, use the following arguments for the qconf command:

Configuring Submit Hosts From the Command Line

To configure submit hosts from the command line, use the following arguments for the qconf command:

Configuring Host Groups From the Command Line

To configure host groups from the command line, use the following arguments for the qconf command:
Monitoring Execution Hosts With qhost

Use the qhost command to retrieve a quick overview of the execution host status:

    % qhost

This command produces output that is similar to the following example:

Example – Sample qhost Output

    HOSTNAME   ARCH          NCPU  LOAD   MEMTOT   MEMUSE   SWAPTO   SWAPUS
    -------------------------------------------------------------------------------
    global     -             -     -      -        -        -        -
    arwen      aix43         1     -      -        -        -        -
    baumbart   irix65        2     0.00   1.1G     91.5M    128.0M   0.0
    boromir    hp11          1     -      128.0M   -        256.0M   -
    carc       lx24-amd64    2     0.00   3.8G     989.8M   1.0G     0.0
    denethor   aix51         1     4.54G  -        -        -        -
    durin      lx24-x86      1     0.37   123.1M   46.5M    213.6M   26.6M
    eomer      sol-sparc64   1     0.13   256.0M   248.0M   513.0M   93.0M
    lolek      tru64         1     0.02   1.0G     790.0M   1.0G     8.0K
    mungo      lx22-alpha    1     1.00   248.9M   78.8M    129.8M   2.5M
    nori       sol-x86       2     0.38   1023.0M  372.0M   512.0M   37.0M
    pippin     darwin        1     0.00   640.0M   264.0M   0.0      0.0
    smeagol    hp11          1     0.35   512.0M   425.0M   1.0G     95.0M

See the qhost(1) man page for a description of the output format and for more options.

Killing Daemons From the Command Line

To kill Grid Engine system daemons from the command line, use one of the following commands:
% qconf -ke[j] {<hostname>[,...] | all}
% qconf -ks
% qconf -km
You must have manager or operator privileges to use these commands. See Managing User Access for more information about manager and operator privileges.
If you want to wait for any active jobs to finish before you run the shutdown procedure, use the qmod -dq command for each cluster queue, queue instance, or queue domain before you run the qconf sequence described above. For information about cluster queues, queue instances, and queue domains, see About Configuring Queues.
% qmod -dq {<cluster-queue> | <queue-instance> | <queue-domain>}
The qmod -dq command prevents new jobs from being scheduled to the disabled queue instances. You should then wait until no jobs are running in the queue instances before you kill the daemons.

Restarting Daemons From the Command Line

Log in as root on the machine on which you want to restart Grid Engine system daemons, and then type the following commands:

    % $SGE_ROOT/$SGE_CELL/common/sgemaster
    % $SGE_ROOT/$SGE_CELL/common/sgeexecd

These scripts look for the daemons that are normally running on this host and then start them.
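The drain-then-kill sequence described under Killing Daemons From the Command Line can be sketched as two shell functions. This is an illustration only: run, disable_queues, and kill_daemons are hypothetical helpers, the queue names are placeholders, and the sketch defaults to a dry run that prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: print (or, with DRY_RUN=0, run) the drain and kill commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"       # dry run: only show the command
    else
        "$@"
    fi
}

disable_queues() {
    # Arguments: cluster queues, queue instances, or queue domains.
    for dq_name in "$@"; do
        run qmod -dq "$dq_name"
    done
}

kill_daemons() {
    # Call this only after no jobs are running in the disabled queue
    # instances; the sketch does not check that for you.
    run qconf -ke all
    run qconf -ks
    run qconf -km
}

disable_queues myqueue 'myqueue@@myhostgroup'
kill_daemons
```

Keeping the two steps in separate functions leaves room for the waiting period between disabling the queues and killing the daemons.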
Configuring Queues

This section provides the following information about configuring queues:

For information about configuring queue calendars, see Configuring Queue Calendars.

About Configuring Queues

Queues are containers for different categories of jobs. Queues provide the corresponding resources for concurrent execution of multiple jobs that belong to the same category.

In Sun Grid Engine, you can associate a queue with one host or with multiple hosts. Because queues can extend across multiple hosts, they are called cluster queues. Cluster queues enable you to manage a cluster of execution hosts by means of a single cluster queue configuration.

Each host that is associated with a cluster queue receives an instance of that cluster queue, which resides on that host. This guide refers to these instances as queue instances. Within any cluster queue, you can configure each queue instance separately. By configuring individual queue instances, you can manage a heterogeneous cluster of execution hosts by means of a single cluster queue configuration.

When you modify a cluster queue, all of its queue instances are modified simultaneously. Within a single cluster queue, you can specify differences in the configuration of queue instances. Consequently, a typical setup might have only a few cluster queues, and the queue instances controlled by those cluster queues remain largely in the background.
When you configure a cluster queue, you can associate any combination of the following host objects with the cluster queue:
Use the queue_conf(5) attribute pe_list to identify the suitable PEs. Then, to link the PE and queues, use either the QMON utility or the following form of the qconf command:
# qconf -mq <queue_name>
A host group is a group of hosts that can be treated collectively as identical. Host groups enable you to manage multiple hosts by means of a single host group configuration. For more information about host groups, see Configuring Host Groups With QMON.
When you associate individual hosts with a cluster queue, the name of the resulting queue instance on each host combines the cluster queue name with the host name. The cluster queue name and the host name are separated by an @ sign. For example, if you associate the host myexechost with the cluster queue myqueue, the name of the queue instance on myexechost is myqueue@myexechost.
When you associate a host group with a cluster queue, you create what is known as a queue domain. Queue domains enable you to manage groups of queue instances that are part of the same cluster queue and whose assigned hosts are part of the same host group. A queue domain name combines a cluster queue name with a host group name, separated by an @ sign. For example, if you associate the host group myhostgroup with the cluster queue myqueue, the name of the queue domain is myqueue@@myhostgroup.
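The naming rules for queue instances and queue domains can be illustrated with a short sketch. The helper names below are hypothetical and are not part of Grid Engine. The sketch also assumes that a host group name carries a leading @ sign, which would explain the doubled @ in a queue domain name.

```python
def queue_instance_name(cluster_queue: str, host: str) -> str:
    """Build the name of the queue instance of a cluster queue on one host."""
    return f"{cluster_queue}@{host}"


def queue_domain_name(cluster_queue: str, host_group: str) -> str:
    """Build the name of the queue domain for a cluster queue and a host group."""
    # Assumption: host group names start with "@"; add it if the caller left it out.
    group = host_group if host_group.startswith("@") else "@" + host_group
    return f"{cluster_queue}@{group}"


def classify(name: str) -> str:
    """Tell which kind of queue object a name refers to, judging by its @ signs."""
    if "@@" in name:
        return "queue domain"
    if "@" in name:
        return "queue instance"
    return "cluster queue"


for name in (
    "myqueue",
    queue_instance_name("myqueue", "myexechost"),
    queue_domain_name("myqueue", "myhostgroup"),
):
    print(f"{name}: {classify(name)}")
```

Any of the three name forms can be passed to commands that accept a cluster queue, a queue instance, or a queue domain, such as qmod -dq.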
Jobs do not wait in queue instances. Jobs start running as soon as they are dispatched. The scheduler's list of pending jobs is the only waiting area for jobs. Configuring queues registers the queue attributes with sge_qmaster. As soon as queues are configured, they are instantly visible to the whole cluster and to all users on all hosts belonging to the Grid Engine system. For further details, see the queue_conf(5) man page.
How to Configure Queues With QMON
How to Configure General Parameters
How to Configure Execution Method Parameters
How to Configure the Checkpointing Parameters
How to Configure Parallel Environments
How to Configure Load and Suspend Thresholds
How to Configure Limits
How to Configure Complex Resource Attributes
Next Steps
Use the Complex Configuration dialog box to check or modify the current complex configuration before you attach user-defined resource attributes to a queue or before you detach them from a queue. To access the Complex Configuration dialog box, click the Complex Configuration button on the QMON Main Control window. For more information, see Configuring Complex Resource Attributes.
How to Configure Subordinate Queues
How to Configure User Access Parameters
How to Configure Project Access Parameters
How to Configure Owners Parameters
Configuring Queues From the Command Line
To configure queues from the command line, type the following command with the appropriate options:
# qconf <options>
The qconf command has the following options:
The qconf command provides the following set of options that you can use to change specific queue attributes:
For a description of how to use these options and for some examples of their use, see Using Files to Modify Queues, Hosts, and Environments. For detailed information about these options, see the qconf(1) man page.
Configuring Queue Calendars
For information about configuring queues, see Configuring Queues.
About Queue Calendars
Queue calendars define the availability of queues according to the day of the year, the day of the week, or the time of day. You can configure queues to change their status at specified times. You can change the queue status to disabled, enabled, suspended, or resumed (unsuspended). The Grid Engine system enables you to define a site-specific set of calendars, each of which specifies status changes and the times at which the changes occur. These calendars can be associated with queues. Each queue can attach a single calendar, thereby adopting the availability profile defined in the attached calendar. The syntax of the calendar format is described in detail in the calendar_conf(5) man page. A few examples are given in the next sections, along with a description of the corresponding administration facilities.
How to Configure Queue Calendars With QMON
Configuring Queue Calendars From the Command Line
To configure queue calendars from the command line, type the following command with appropriate options:
% qconf <options>
The following options are available:
Managing the Scheduler
This section contains information about how Grid Engine system policies are implemented through the scheduler. The following topics are covered:
For information about Grid Engine policies, see Managing Policies.
Administering the Scheduler
This section describes how the Grid Engine system schedules jobs for execution, describes different types of scheduling strategies, and explains how to configure the scheduler.
About Scheduling
The Grid Engine system includes the following job-scheduling activities:
The Grid Engine software schedules jobs across a heterogeneous cluster of computers, based on the following criteria:
Decisions about scheduling are based on the strategy for the site and on the instantaneous load characteristics of each computer in the cluster. A site's scheduling strategy is expressed through the Grid Engine system's configuration parameters. Load characteristics are ascertained by collecting performance data as the system runs.
Scheduling Strategies
The administrator can set up strategies with respect to the following scheduling tasks:
Dynamic Resource Management
The Grid Engine software uses a weighted combination of the following three ticket-based policies to implement automated job scheduling strategies:
You can set up the Grid Engine system to routinely use either a share-based policy, a functional policy, or both. You can combine these policies in any combination. For example, you could give zero weight to one policy and use only the second policy. Or you could give both policies equal weight. Along with routine policies, administrators can also override share-based and functional scheduling temporarily or, for certain purposes such as express queues, permanently. You can apply an override to one job or to all jobs associated with a user, a department, a project, or a job class (that is, a queue). In addition to the three policies for mediating among all jobs, the Grid Engine system sometimes lets users set priorities among the jobs they own. For example, a user might say that jobs one and two are equally important, but that job three is more important than either job one or job two. Users can set their own job priorities if the combination of policies includes the share-based policy, the functional policy, or both. Also, functional tickets must be granted to jobs. TicketsThe share-based, functional, and override scheduling policies are implemented with tickets. Each policy has a pool of tickets. A policy allocates tickets to jobs as the jobs enter the multimachine Grid Engine system. Each routine policy that is in force allocates some tickets to each new job. The policy might also reallocate tickets to running jobs at each scheduling interval. Tickets weight the three policies. For example, if no tickets are allocated to the functional policy, that policy is not used. If the functional ticket pool and the share-based ticket pool have an equal number of tickets, both policies have equal weight in determining a job's importance. Tickets are allocated to the routine policies at system configuration by Grid Engine system managers. Managers and operators can change ticket allocations at any time with immediate effect. 
Additional tickets are injected into the system temporarily to indicate an override. Policies are combined by assignment of tickets. When tickets are allocated to multiple policies, a job gets a portion of each policy's tickets, which indicates the job's importance in each policy in force. The Grid Engine system grants tickets to jobs that are entering the system to indicate their importance under each policy in force. At each scheduling interval, each running job can gain tickets, lose tickets, or keep the same number of tickets. For example, a job might gain tickets from an override. A job might lose tickets because it is getting more than its fair share of resources. The number of tickets that a job holds represents the resource share that the Grid Engine system tries to grant that job during each scheduling interval. You configure a site's dynamic resource management strategy during installation. First, you allocate tickets to the share-based policy and to the functional policy. You then define the share tree and the functional shares. The share-based ticket allocation and the functional ticket allocation can change automatically at any time. The administrator manually assigns or removes tickets. Queue SortingThe following means are provided to determine the order in which the Grid Engine system attempts to fill up queues:
Job SortingBefore the Grid Engine system starts to dispatch jobs, the jobs are brought into priority order, highest priority first. The system then attempts to find suitable resources for the jobs in priority sequence. Without any administrator influence, the order is first-in, first-out (FIFO). The administrator has the following means to control the job order:
For each priority type, a weighting factor can be specified. This weighting factor determines the degree to which each type of priority affects overall job priority. To make it easier to control the range of values for each priority type, normalized values are used instead of the raw ticket values, urgency values, and POSIX priority values. The following formula expresses how a job's priority values are determined:
job_priority = weight_urgency * normalized_urgency_value
             + weight_ticket * normalized_ticket_value
             + weight_priority * normalized_POSIX_priority_value
You can use the qstat command to monitor job priorities:
About the Urgency Policy
The urgency policy defines an urgency value for each job. The urgency value is derived from the sum of three contributions:
The resource requirement contribution is derived from the sum of all hard resource requests, one addend for each request. If the resource request is of the type numeric, the resource request addend is the product of the following three elements:
If the resource request is of the type string, the resource request addend is the resource's urgency value as defined in the complex. The waiting time contribution is the product of the job's waiting time, in seconds, and the waiting-weight value specified in the Policy Configuration dialog box. The deadline contribution is zero for jobs without a deadline. For jobs with a deadline, the deadline contribution is the weight-deadline value, which is defined in the Policy Configuration dialog box, divided by the free time, in seconds, until the deadline initiation time. For information about configuring the urgency policy, see Configuring the Urgency Policy. Resource Reservation and BackfillingResource reservation enables you to reserve system resources for specified pending jobs. When you reserve resources for a job, those resources are blocked from being used by jobs with lower priority. Jobs can reserve resources depending on criteria such as resource requirements, job priority, waiting time, resource sharing entitlements, and so forth. The scheduler enforces reservations in such a way that jobs with the highest priority get the earliest possible resource assignment. This avoids such well-known problems as "job starvation." You can use resource reservation to guarantee that resources are dedicated to jobs in job-priority order. Consider the following example. Job A is a large pending job, possibly parallel, that requires a large amount of a particular resource. A stream of smaller Jobs B(i) require a smaller amount of the same resource. Without resource reservation, a resource assignment for Job A cannot be guaranteed, assuming that the stream of B(i) jobs does not stop. The resource cannot be guaranteed even though Job A has a higher priority than the B(i) jobs. With resource reservation, Job A gets a reservation that blocks the lower priority Jobs B(i). Resources are guaranteed to be available for Job A as soon as possible. 
Backfilling enables a lower-priority job to use resources that are blocked due to a resource reservation. Backfilling works only if there is a runnable job whose prospective run time is small enough to allow the blocked resource to be used without interfering with the original reservation. In the example described earlier, a Job C, of very short duration, could use backfilling to start before Job A.
Because resource reservation causes the scheduler to look ahead, using resource reservation affects system performance. In a small cluster, the effect on performance is negligible when there are only a few pending jobs. In larger clusters, however, and in clusters with many pending jobs, the effect on performance might be significant. To offset this potential performance degradation, you can limit the overall number of resource reservations that can be made during a scheduling interval. You can limit resource reservation in two ways:
You can configure the scheduler to monitor how it is influenced by resource reservation. When you monitor the scheduler, information about each scheduling run is recorded in the file $SGE_ROOT/$SGE_CELL/common/schedule. The following example shows what schedule monitoring does. Assume that the following sequence of jobs is submitted to a cluster where the global license consumable resource is limited to 5 licenses:
qsub -N L4_RR -R y -l h_rt=30,license=4 -p 100 $SGE_ROOT/examples/jobs/sleeper.sh 20
qsub -N L5_RR -R y -l h_rt=30,license=5 $SGE_ROOT/examples/jobs/sleeper.sh 20
qsub -N L1_RR -R y -l h_rt=31,license=1 $SGE_ROOT/examples/jobs/sleeper.sh 20
Assume that the default priority settings in the scheduler configuration are being used:
weight_priority 1.000000
weight_urgency 0.100000
weight_ticket 0.010000
The -p 100 priority of job L4_RR supersedes the license-based urgency, which results in the following prioritization:
job-ID prior   name
---------------------
3127   1.08000 L4_RR
3128   0.10500 L5_RR
3129   0.00500 L1_RR
In this case, traces of these jobs can be found in the schedule file for six schedule intervals:
::::::::
3127:1:STARTING:1077903416:30:G:global:license:4.000000
3127:1:STARTING:1077903416:30:Q:all.q@carc:slots:1.000000
3128:1:RESERVING:1077903446:30:G:global:license:5.000000
3128:1:RESERVING:1077903446:30:Q:all.q@bilbur:slots:1.000000
3129:1:RESERVING:1077903476:31:G:global:license:1.000000
3129:1:RESERVING:1077903476:31:Q:all.q@es-ergb01-01:slots:1.000000
::::::::
3127:1:RUNNING:1077903416:30:G:global:license:4.000000
3127:1:RUNNING:1077903416:30:Q:all.q@carc:slots:1.000000
3128:1:RESERVING:1077903446:30:G:global:license:5.000000
3128:1:RESERVING:1077903446:30:Q:all.q@es-ergb01-01:slots:1.000000
3129:1:RESERVING:1077903476:31:G:global:license:1.000000
3129:1:RESERVING:1077903476:31:Q:all.q@es-ergb01-01:slots:1.000000
::::::::
3128:1:STARTING:1077903448:30:G:global:license:5.000000
3128:1:STARTING:1077903448:30:Q:all.q@carc:slots:1.000000
3129:1:RESERVING:1077903478:31:G:global:license:1.000000
3129:1:RESERVING:1077903478:31:Q:all.q@bilbur:slots:1.000000
::::::::
3128:1:RUNNING:1077903448:30:G:global:license:5.000000
3128:1:RUNNING:1077903448:30:Q:all.q@carc:slots:1.000000
3129:1:RESERVING:1077903478:31:G:global:license:1.000000
3129:1:RESERVING:1077903478:31:Q:all.q@es-ergb01-01:slots:1.000000
::::::::
3129:1:STARTING:1077903480:31:G:global:license:1.000000
3129:1:STARTING:1077903480:31:Q:all.q@carc:slots:1.000000
::::::::
3129:1:RUNNING:1077903480:31:G:global:license:1.000000
3129:1:RUNNING:1077903480:31:Q:all.q@carc:slots:1.000000
Each section shows, for a schedule interval, all resource usage that was taken into account. RUNNING entries show usage of jobs that were already running at the start of the interval. STARTING entries show the immediate uses that were decided within the interval. RESERVING entries show uses that are planned for the future, that is, reservations. The format of the schedule file is as follows:
The line :::::::: marks the beginning of a new schedule interval.
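As an illustration, a schedule file such as the one above could be split into intervals and entries with a few lines of Python. This is only a sketch: the field names (job_id, task_id, state, start_time, duration, level, object_name, resource, usage) are assumptions inferred from the sample trace, and the parser is not part of Grid Engine.

```python
from dataclasses import dataclass
from typing import List

INTERVAL_MARKER = "::::::::"


@dataclass
class ScheduleEntry:
    job_id: int
    task_id: int
    state: str          # RUNNING, STARTING, or RESERVING
    start_time: int     # looks like seconds since the epoch in the sample trace
    duration: int
    level: str          # single letter such as G or Q in the sample trace
    object_name: str    # for example "global" or "all.q@carc"
    resource: str
    usage: float


def parse_schedule(text: str) -> List[List[ScheduleEntry]]:
    """Return one list of entries per schedule interval."""
    intervals: List[List[ScheduleEntry]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == INTERVAL_MARKER:
            intervals.append([])  # the marker opens a new interval
            continue
        fields = line.split(":")
        if len(fields) != 9:
            raise ValueError(f"unexpected schedule line: {line!r}")
        if not intervals:
            intervals.append([])  # tolerate entries that precede the first marker
        job, task, state, start, duration, level, obj, resource, usage = fields
        intervals[-1].append(
            ScheduleEntry(int(job), int(task), state, int(start),
                          int(duration), level, obj, resource, float(usage))
        )
    return intervals


SAMPLE = """\
::::::::
3127:1:STARTING:1077903416:30:G:global:license:4.000000
3127:1:STARTING:1077903416:30:Q:all.q@carc:slots:1.000000
3128:1:RESERVING:1077903446:30:G:global:license:5.000000
::::::::
3127:1:RUNNING:1077903416:30:G:global:license:4.000000
"""

for number, entries in enumerate(parse_schedule(SAMPLE), start=1):
    reserving = sorted({e.job_id for e in entries if e.state == "RESERVING"})
    print(f"interval {number}: {len(entries)} entries, reservations for jobs {reserving}")
```

Grouping entries by interval makes it easy to see when a job moves from RESERVING to STARTING and then to RUNNING.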
What Happens in a Scheduler Interval
The scheduler schedules work in intervals. Between scheduling actions, the Grid Engine system keeps information about significant events such as the following:
When scheduling occurs, the scheduler first does the following:
Then the Grid Engine system does the following tasks, as needed:
If share-based scheduling is used, the calculation takes into account the usage that has already occurred for that user or project. If scheduling is not at least in part share-based, the calculation ranks all the jobs running and waiting to run. The calculation then takes the most important job until the resources in the cluster (CPU, memory, and I/O bandwidth) are used as fully as possible.
Scheduler Monitoring
If the reasons why a job does not get started are unclear to you, run the qalter -w v command for the job. The Grid Engine software assumes an empty cluster and checks whether any queue that is suitable for the job is available. Further information can be obtained by running the qstat -j job-id command. This command prints a summary of the job's request profile. The summary also includes the reasons why the job was not scheduled in the last scheduling interval. Running the qstat -j command without a job ID summarizes the reasons for all jobs not being scheduled in the last scheduling interval.
To retrieve even more detail about the decisions of the scheduler sge_schedd, use the -tsm option of the qconf command. This command forces sge_schedd to write trace output to the file.
Configuring the Scheduler
Refer to Configuring Policy-Based Resource Management With QMON for details on the scheduling administration of resource-sharing policies of the Grid Engine system. The following sections focus on administering the scheduler configuration sched_conf and related issues.
Default Scheduling
The default scheduling is a first-in, first-out policy. In other words, the first job that is submitted is the first job the scheduler examines to dispatch it to a queue. If the first job in the list of pending jobs finds a queue that is suitable and available, that job is started first. A job ranked behind the first job can be started first only if the first job fails to find a suitable free resource. The default strategy is to select queue instances on the least-loaded host, provided that the queues deliver suitable service for the job's resource requirements. If several suitable queues share the same load, the queue to be selected is unpredictable.
Scheduling Alternatives
You can modify the job scheduling and queue selection strategy in various ways:
The following sections explore these alternatives in detail.
Changing the Scheduling Algorithm
The scheduler configuration parameter algorithm provides a selection for the scheduling algorithm in use. See the sched_conf(5) man page for further information. Currently, default is the only allowed setting.
Scaling System Load
To select the queue to run a job, the Grid Engine system uses the system load information on the machines that host queue instances. This queue selection scheme builds up a load-balanced situation, thus guaranteeing better use of the available resources in a cluster. However, the raw system load can be misleading. For example, if a multi-CPU machine is compared to a single-CPU system, the multiprocessor system usually reports higher load figures, because it probably runs more processes. The system load is a measurement strongly influenced by the number of processes trying to get CPU access. But multi-CPU systems are capable of satisfying a much higher load than single-CPU machines. This problem is addressed by processor-number-adjusted sets of load values that are reported by default by sge_execd. Use these load parameters instead of the raw load values to avoid the problem described earlier. See Load Parameters and the $SGE_ROOT/doc/load_parameters.asc file for details.
Another example of potentially improper interpretation of load values is when systems have marked differences in their performance potential or in their price/performance ratio. In both cases, equal load values do not mean that arbitrary hosts can be selected to run a job. In this situation, the administrator should define load scaling factors for the relevant execution hosts and load parameters. See Configuring Execution Hosts With QMON, and related sections.
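The effect of processor-number-adjusted load values and of load scaling factors can be sketched as follows. The host names and numbers are invented for illustration, and the sketch assumes that a load scaling factor is simply multiplied with the reported value. It is not the scheduler's actual code.

```python
from typing import Dict, Optional, Tuple


def per_cpu_load(load_avg: float, num_proc: int) -> float:
    """Divide the raw load average by the number of processors."""
    if num_proc < 1:
        raise ValueError("num_proc must be at least 1")
    return load_avg / num_proc


def least_loaded(hosts: Dict[str, Tuple[float, int]],
                 scaling: Optional[Dict[str, float]] = None) -> str:
    """Pick the host with the lowest adjusted (and optionally scaled) load.

    hosts maps a host name to (raw load average, number of processors).
    scaling maps a host name to a hypothetical load scaling factor.
    """
    factors = scaling or {}

    def effective(name: str) -> float:
        load_avg, num_proc = hosts[name]
        return per_cpu_load(load_avg, num_proc) * factors.get(name, 1.0)

    return min(hosts, key=effective)


# Hypothetical hosts: an 8-CPU machine with raw load 4.0 and a 1-CPU machine with raw load 1.0.
hosts = {"smp8": (4.0, 8), "uni1": (1.0, 1)}

raw_choice = min(hosts, key=lambda name: hosts[name][0])
print("raw load picks:", raw_choice)
print("per-CPU load picks:", least_loaded(hosts))
print("with scaling factor 3.0 on smp8:", least_loaded(hosts, {"smp8": 3.0}))
```

With raw values the single-CPU host looks less loaded, although the 8-CPU host has more spare capacity per processor. A scaling factor lets the administrator shift that balance again, for example to account for slower or more expensive hardware.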
Another problem associated with load parameters is the need for an application-dependent and site-dependent interpretation of the values and their relative importance. The CPU load might be dominant for a certain type of application that is common at a particular site. By contrast, the memory load might be more important for another site and for the application profile to which the site's compute cluster is dedicated. To address this problem, the Grid Engine system enables the administrator to specify a load formula in the scheduler configuration file sched_conf. See the sched_conf(5) man page for more details. Site-specific information on resource usage and capacity planning can be taken into account by using site-defined load parameters and consumable resources in the load formula. See the sections Adding Site-Specific Load Parameters and Consumable Resources. Finally, the time dependency of load parameters must be taken into account. The load that is imposed by the jobs that are running on a system varies in time. Often the load, for example, the CPU load, requires some amount of time to be reported in the appropriate quantity by the operating system. If a job recently started, the reported load might not provide an accurate representation of the load that the job has imposed on that host. The reported load adapts to the real load over time. But the period of time in which the reported load is too low might lead to an over-subscription of that host. The Grid Engine system enables the administrator to specify load adjustment factors that are used in the scheduler to compensate for this problem. See the sched_conf(5) man page for detailed information on how to set these load adjustment factors. Load adjustments are used to virtually increase the measured load after a job is dispatched. In the case of oversubscribed machines, this helps to align with load thresholds. If you do not need load adjustments, you should turn them off. 
Load adjustments impose additional work on the scheduler in connection with sorting hosts and load thresholds verification. To disable load adjustments, on the Load Adjustment tab of the Scheduler Configuration dialog box, set the Decay Time to zero, and delete all load adjustment values in the table. See Changing the Scheduler Configuration With QMON.
Selecting Queue by Sequence Number
Another way to change the default scheme for queue selection is to set the scheduler configuration parameter queue_sort_method to seqno instead of the default, load. In this case, the system load is no longer used as the primary method to select queues. Instead, the sequence numbers that are assigned to the queues by the queue configuration parameter seq_no define a fixed order for queue selection. The queues must be suitable for the considered job, and they must be available. See the queue_conf(5) and sched_conf(5) man pages for more details.
This queue selection policy is useful if the machines that offer batch services at your site are ranked in a monotonic price-per-job order. For example, a job running on Machine A costs 1 unit of money. The same job costs 10 units on Machine B. And on Machine C the job costs 100 units. Thus the preferred scheduling policy is to first fill up Machine A and then to use Machine B. Machine C is used only if no alternative remains.
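The difference between load-based and sequence-number-based queue selection can be sketched with the three machines from the example. The data structure, the slot counts, and the tie-breaking rules are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueueInstance:
    name: str
    seq_no: int       # sequence number from the queue configuration
    load: float       # current load of the host (hypothetical values)
    free_slots: int


def select_queue(queues: List[QueueInstance],
                 by_sequence_number: bool) -> Optional[QueueInstance]:
    """Pick a queue instance with a free slot, or None if all are full."""
    candidates = [q for q in queues if q.free_slots > 0]
    if not candidates:
        return None
    if by_sequence_number:
        # Fixed order: lowest sequence number first (load only breaks ties here).
        return min(candidates, key=lambda q: (q.seq_no, q.load))
    # Default scheme: least-loaded host first.
    return min(candidates, key=lambda q: (q.load, q.seq_no))


queues = [
    QueueInstance("batch@machineA", seq_no=10, load=0.9, free_slots=1),   # 1 unit per job
    QueueInstance("batch@machineB", seq_no=20, load=0.4, free_slots=2),   # 10 units per job
    QueueInstance("batch@machineC", seq_no=30, load=0.1, free_slots=4),   # 100 units per job
]

print("by load:", select_queue(queues, by_sequence_number=False).name)
print("by sequence number:", select_queue(queues, by_sequence_number=True).name)

queues[0].free_slots = 0  # Machine A is now full
print("after A is full:", select_queue(queues, by_sequence_number=True).name)
```

With load-based selection the most expensive machine is chosen because it is idle. With sequence numbers, Machine A is filled first and Machine B is used only when A has no free slot.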
Selecting Queue by Share
The goal of this method is to place jobs so as to attempt to meet the targeted share of global system resources for each job. This method takes into account the resource capability represented by each host in relation to all the system resources. This method tries to balance the percentage of tickets for each host (that is, the sum of tickets for all jobs running on a host) with the percentage of the resource capability that particular host represents for the system. See Configuring Execution Hosts With QMON for instructions on how to define the capacity of a host. The host's load, although of secondary importance, is also taken into account in the sorting. Choose this sorting method for a site that uses the share-tree policy.
Restricting the Number of Jobs per User or Group
The administrator can assign an upper limit to the number of jobs that any user or any UNIX group can run at any time. To enforce this feature, do one of the following:
Changing the Scheduler Configuration With QMON
On the QMON Main Control window, click the Scheduler Configuration button. The Scheduler Configuration dialog box appears. The dialog box has two tabs:
To change general scheduling parameters, click the General Parameters tab. The General Parameters tab looks like the following figure. Use the General Parameters tab to set the following parameters:
Scheduler monitoring can help you find out the reason why certain jobs are not dispatched. However, providing this information for all jobs at all times can consume resources. Such information is usually not needed.
To change load adjustment parameters, click the Load Adjustment tab. The Load Adjustment tab looks like the following figure: The Load Adjustment tab displays the following parameters:
To change load adjustment parameters, do the following:
See Scaling System Load for background information. See the sched_conf(5) man page for more details about the scheduler configuration.
Managing Policies
Grid Engine policies are implemented in conjunction with the Grid Engine scheduler. For information, see Managing the Scheduler.
About Grid Engine Policies
The Grid Engine software orchestrates the delivery of computational power, based on enterprise resource policies that the administrator manages. The software uses these policies to examine available computer resources in the grid. The software gathers these resources, and then it allocates and delivers them automatically, in a way that optimizes usage across the grid. To enable cooperation in the grid, project owners must do the following:
As administrator, you can define high-level usage policies that are customized for your site. Four such policies are available:
Policy management automatically controls the use of shared resources in the cluster to achieve your goals. High-priority jobs are dispatched preferentially. These jobs receive greater CPU entitlements when they are competing with other, lower-priority jobs. The Grid Engine software monitors the progress of all jobs. It adjusts their relative priorities correspondingly, and with respect to the goals that you define in the policies. This policy-based resource allocation grants each user, team, department, and project an allocated share of system resources. This allocation of resources extends over a specified period of time, such as a week, a month, or a quarter.
Configuring Policy-Based Resource Management With QMON
On the QMON Main Control window, click the Policy Configuration button. The Policy Configuration dialog box appears. The Policy Configuration dialog box lets you directly edit the following information:
You can also access detailed configuration dialog boxes for the three ticket-based policies.
To refresh the information displayed in the Policy Configuration dialog box, click Refresh. To save any changes that you make to the Policy Configuration, click Apply. To close the dialog box without saving changes, click Done.
Specifying Policy Priority
Before the Grid Engine system dispatches jobs, the jobs are brought into priority order, highest priority first. Without any administrator influence, the order is first-in, first-out (FIFO). On the Policy Configuration dialog box, under Policy Importance Factor, you can specify the relative importance of the three priority types that control the sorting order of jobs. For example, if you specify Priority as 1, Urgency as 0.1, and Ticket as 0.01, job priority that is specified by the qsub -p command is given the most weight, job priority that is specified by the Urgency Policy is considered next, and job priority that is specified by the Ticket Policy is given the least weight.
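The effect of the importance factors can be sketched as a weighted sum of the three normalized priority types. The function below uses the example factors Priority 1, Urgency 0.1, and Ticket 0.01. The job names and the normalized input values are hypothetical, and the sketch assumes that each normalized value lies between 0 and 1.

```python
def job_priority(normalized_urgency: float,
                 normalized_ticket: float,
                 normalized_posix_priority: float,
                 weight_urgency: float = 0.1,
                 weight_ticket: float = 0.01,
                 weight_priority: float = 1.0) -> float:
    """Combine the three normalized priority types into one job priority."""
    return (weight_urgency * normalized_urgency
            + weight_ticket * normalized_ticket
            + weight_priority * normalized_posix_priority)


# Hypothetical pending jobs: (normalized urgency, normalized ticket, normalized POSIX priority)
jobs = {
    "job_a": (0.75, 0.5, 1.0),  # submitted with a high qsub -p value
    "job_b": (1.00, 0.5, 0.0),  # most urgent resource request
    "job_c": (0.00, 0.5, 0.0),
}

# Highest priority first, which is the order in which jobs are considered for dispatch.
ranked = sorted(jobs, key=lambda name: job_priority(*jobs[name]), reverse=True)
for name in ranked:
    print(f"{name}: {job_priority(*jobs[name]):.5f}")
```

Because the Priority factor is ten times the Urgency factor, job_a outranks job_b even though job_b has the higher urgency.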
For more information about job priorities, see Job Sorting. You can specify a weighting factor for each priority type. This weighting factor determines the degree to which each type of priority affects overall job priority. To make it easier to control the range of values for each priority type, normalized values are used instead of the raw ticket values, urgency values, and POSIX priority values. The following formula expresses how a job's priority values are determined:
Job priority = Urgency * normalized urgency value
             + Ticket * normalized ticket value
             + Priority * normalized priority value
Configuring the Urgency Policy
The Urgency Policy defines an urgency value for each job. This urgency value is determined by the sum of the following three contributing elements:
For details about how the Grid Engine system arrives at the urgency value total, see About the Urgency Policy.
Configuring Ticket-Based Policies
The tickets that are currently assigned to individual policies are listed under Current Active Tickets in the Policy Configuration dialog box. The numbers reflect the relative importance of the policies. The numbers indicate whether a certain policy currently dominates the cluster or whether policies are in balance. Tickets provide a quantitative measure. For example, you might assign twice as many tickets to the share-based policy as you assign to the functional policy. This means that twice as much resource entitlement is allocated to the share-based policy as is allocated to the functional policy. In this sense, tickets behave very much like stock shares. The total number of all tickets has no particular meaning. Only the relations between policies count. Hence, total ticket numbers are usually quite high to allow for fine adjustment of the relative importance of the policies.
Under Edit Tickets, you can modify the number of tickets that are allocated to the share tree policy and the functional policy. For details, see Editing Tickets.
Select the Share Override Tickets check box to control the total ticket amount distributed by the override policy. Deselect the Share Override Tickets check box to control the importance of individual jobs relative to the ticket pools that are available for the other policies and override categories. With this setting, the number of jobs that are under a category member does not matter. The jobs always get the same number of tickets. However, the total number of override tickets in the system increases as the number of jobs with a right to receive override tickets increases. Other policies can lose importance in such cases. For detailed information, see Sharing Override Tickets.
Select the Share Functional Tickets check box to give a category member a constant entitlement level for the sum of all its jobs. Deselect the check box to give each job the same entitlement level, based on its category member's entitlement. For detailed information, see Sharing Functional Ticket Shares.
You can set the maximum number of jobs that can be scheduled in the functional policy. The default value is 200. You can set the maximum number of pending subtasks that are allowed for each array job. The default value is 50. Use this setting to reduce scheduling overhead. You can specify the Ticket Policy Hierarchy to resolve certain cases of conflicting policies. The resolving of policy conflicts applies particularly to pending jobs. For detailed information, see Setting the Ticket Policy Hierarchy.
Editing Tickets
You can edit the total number of share-tree tickets and functional tickets. Override tickets are assigned directly through the override policy configuration. The other ticket pools are distributed automatically among jobs that are associated with the policies and with respect to the actual policy configuration.
Sharing Override Tickets
The administrator assigns tickets to the different members of the override categories, that is, to individual users, projects, departments, or jobs. Consequently, the number of tickets that are assigned to a category member determines how many tickets are assigned to jobs under that category member. For example, the number of tickets that are assigned to user A determines how many tickets are assigned to all jobs of user A.
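The two behaviors of the Share Override Tickets setting can be illustrated with a small sketch. When the setting is selected, the tickets of a category member are divided evenly among its jobs. When it is deselected, each job receives the member's full ticket amount. The function name and the numbers are hypothetical, and the sketch ignores everything else the scheduler takes into account.

```python
from typing import List


def override_tickets_per_job(member_tickets: float,
                             num_jobs: int,
                             share_override_tickets: bool) -> List[float]:
    """Return the override tickets that each job of one category member gets."""
    if num_jobs <= 0:
        return []
    if share_override_tickets:
        # Selected: the member's tickets are split evenly, so the total stays constant.
        return [member_tickets / num_jobs] * num_jobs
    # Deselected: every job inherits the full amount, so the total grows with the job count.
    return [float(member_tickets)] * num_jobs


# Hypothetical case: user A holds 1000 override tickets and has 4 jobs.
shared = override_tickets_per_job(1000, 4, share_override_tickets=True)
replicated = override_tickets_per_job(1000, 4, share_override_tickets=False)

print("selected:  ", shared, "total", sum(shared))
print("deselected:", replicated, "total", sum(replicated))
```

The totals show the trade-off: sharing keeps the override pool bounded but shrinks each job's amount, whereas replication keeps each job's amount fixed but inflates the number of override tickets in the system.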
Use the Share Override Tickets check box to set the share_override_tickets parameter of sched_conf(5). This parameter controls how job ticket values are derived from their category member ticket value. When you select the Share Override Tickets check box, the tickets of the category members are distributed evenly among the jobs under this member. If you deselect the Share Override Tickets check box, each job inherits the ticket amount defined for its category member. In other words, the category member tickets are replicated for all jobs underneath. Select the Share Override Tickets check box to control the total ticket amount distributed by the override policy. With this setting, ticket amounts that are assigned to a job can become negligibly small if many jobs are under one category member. For example, ticket amounts might diminish if many jobs belong to one member of the user category. Deselect the Share Override Tickets check box to control the importance of individual jobs relative to the ticket pools that are available for the other policies and override categories. With this setting, the number of jobs that are under a category member does not matter. The jobs always get the same number of tickets. However, the total number of override tickets in the system increases as the number of jobs with a right to receive override tickets increases. Other policies can lose importance in such cases. Sharing Functional Ticket SharesThe functional policy defines entitlement shares for the functional categories. Then the policy defines shares for all members of each of these categories. The functional policy is thus similar to a two-level share tree. The difference is that a job can be associated with several categories at the same time. The job belongs to a particular user, for instance, but the job can also belong to a project or a department. 
However, as in the share tree, the entitlement shares that a job receives from a functional category are determined by the following:
Use the Share Functional Tickets check box to set the share_functional_shares parameter of sched_conf(5). This parameter defines how the category member shares are used to determine the shares of a job. The shares assigned to the category members, such as a particular user or project, can be replicated for each job. Alternatively, shares can be distributed among the jobs under the category member.
Those shares are comparable to stock shares. Such shares have no effect for the jobs that belong to the same category member. All jobs under the same category member have the same number of shares in both cases. But the share number has an effect when comparing the share amounts within the same category. Jobs with many siblings that belong to the same category member receive relatively small share portions if you select the Share Functional Tickets check box. On the other hand, if you clear the Share Functional Tickets check box, all sibling jobs receive the same share amount as their category member. Select the Share Functional Tickets check box to give a category member a constant entitlement level for the sum of all its jobs. The entitlement of an individual job can get negligibly small, however, if the job has many siblings. Deselect the Share Functional Tickets check box to give each job the same entitlement level, based on its category member's entitlement. The number of job siblings in the system does not matter.
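The two settings can be pictured with a small sketch. It is purely illustrative and is not Grid Engine code: the function name job_shares and the even split among sibling jobs are assumptions chosen to mirror the description above, where a category member's shares are either distributed among its jobs or replicated for each job.

```python
from typing import List


def job_shares(member_shares: int, n_jobs: int,
               share_functional_shares: bool) -> List[float]:
    """Return the shares seen by each job under one category member.

    Hypothetical helper: it models only the select/clear behaviour
    described in the text, not the scheduler's real calculation.
    """
    if n_jobs == 0:
        return []
    if share_functional_shares:
        # Check box selected: the member's shares are split among its jobs,
        # so the sum over all jobs stays constant.
        per_job = member_shares / n_jobs
    else:
        # Check box cleared: every job receives a copy of the member's shares,
        # so the number of sibling jobs does not matter.
        per_job = float(member_shares)
    return [per_job] * n_jobs


print(job_shares(100, 4, True))   # [25.0, 25.0, 25.0, 25.0]
print(job_shares(100, 4, False))  # [100.0, 100.0, 100.0, 100.0]
```

With the check box selected, the four jobs together still hold 100 shares. With it cleared, each job holds 100 shares regardless of how many siblings it has.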
Be aware that the setting of share_functional_shares does not determine the total number of functional tickets that are distributed. The total number is always as defined by the administrator for the functional policy ticket pool. The share_functional_shares parameter influences only how functional tickets are distributed within the functional policy.
Example – Functional Policy
The following example describes a common scenario where a user wishes to translate the Sun Grid Engine 5.3 Scheduler Option -user_sort true to a Sun Grid Engine 6.2 configuration but does not understand the share override functional policy ticket feature. For a plain user-based equal share, you configure your global configuration sge_conf(5) with
Then you use -weight_tickets_functional 10000 in the scheduler configuration sched_conf(5). This action causes the functional policy to be used for user-based equal share scheduling with 100 shares for each user.
Tuning Scheduling Run Time
Pending jobs are sorted according to the number of tickets that each job has, as described in Job Sorting. The scheduler reports the number of tickets each pending job has to the master daemon sge_qmaster. However, on systems with very large numbers of jobs, you might want to turn off ticket reporting. When you turn off ticket reporting, you disable ticket-based job priority. The sort order of jobs is based only on the time each job is submitted. To turn off the reporting of pending job tickets to sge_qmaster, clear the Report Pending Job Tickets check box on the Policy Configuration dialog box. Doing so sets the report_pjob_tickets parameter of sched_conf(5) to false.
Setting the Ticket Policy Hierarchy
Ticket policy hierarchy provides the means to resolve certain cases of conflicting ticket policies. The resolving of ticket policy conflicts applies particularly to pending jobs. Such cases can occur in combination with the share-based policy and the functional policy. With both policies, assigning priorities to jobs that belong to the same leaf-level entities is done on a first-come, first-served basis. Leaf-level entities include the following:
Members of the job category are not included among leaf-level entities. So, for example, the first job of the same user gets the most, the second gets the next most, the third next, and so on. A conflict can occur if another policy mandates an order that is different. So, for example, the override policy might define the third job as the most important, whereas the first job that is submitted should come last. A policy hierarchy might give the override policy higher priority than the share-tree policy or the functional policy. Such a policy hierarchy ensures that high-priority jobs under the override policy get more entitlements than jobs in the other two policies. Such jobs must belong to the same leaf-level entity (user or project) in the share tree. The Ticket Policy Hierarchy can be a combination of up to three letters. These letters are the first letters of the names of the following three ticket policies:
Use these letters to establish a hierarchy of ticket policies. The first letter defines the top policy. The last letter defines the bottom of the hierarchy. Policies that are not listed in the policy hierarchy do not influence the hierarchy. Such policies can still be a source for tickets of jobs, but those tickets do not influence the ticket calculations in other policies. All tickets of all policies are added up for each job to define its overall entitlement. The following examples describe two settings and how they influence the order of the pending jobs:
All combinations of the three letters are theoretically possible, but only a subset of the combinations are meaningful or have practical relevance. The last letter should always be S or F, because only those two policies can be influenced due to their characteristics described in the examples. The following form is recommended for policy_hierarchy settings: [O][S|F] If the override policy is present, O should occur as the first letter only, because the override policy can only influence. The share-based policy and the functional policy can only be influenced. Therefore S or F should occur as the last letter. Configuring the Share-Based PolicyShare-based scheduling grants each user and project its allocated share of system resources during an accumulation period such as a week, a month, or a quarter. Share-based scheduling is also called share tree scheduling. It constantly adjusts each user's and project's potential resource share for the near term, until the next scheduling interval. Share-based scheduling is defined for user or for project, or for both. Share-based scheduling ensures that a defined share is guaranteed to the instances that are configured in the share tree over time. Jobs that are associated with share-tree branches where fewer resources were consumed in the past than anticipated are preferred when the system dispatches jobs. At the same time, full resource usage is guaranteed, because unused share proportions are still available for pending jobs associated with other share-tree branches. By giving each user or project its targeted share as far as possible, groups of users or projects also get their targeted share. Departments or divisions are examples of such groups. Fair share for all entities is attainable only when every entity that is entitled to resources contends for those resources during the accumulation period. 
If a user, a project, or a group does not submit jobs during a given period, the resources are shared among those who do submit jobs. Share-based scheduling is a feedback scheme. The share of the system to which any user or user-group, or project or project-group, is entitled is a configuration parameter. The share of the system to which any job is entitled is based on the following factors:
The Grid Engine software keeps track of how much usage users and projects have already received. At each scheduling interval, the Scheduler adjusts all jobs' share of resources. Doing so ensures that all users, user groups, projects, and project groups get close to their fair share of the system during the accumulation period. In other words, resources are granted or are denied to keep everyone more or less at their targeted share of usage.
The Half-Life Factor
The half-life determines how fast the system "forgets" about a user's resource consumption. The administrator decides whether to penalize a user for high resource consumption, be it six months ago or six days ago. The administrator also decides how to apply the penalty. On each node of the share tree, Grid Engine software maintains a record of users' resource consumption. With this record, the system administrator can decide how far to look back to determine a user's under-usage or over-usage when setting up a share-based policy.
The resource usage in this context is the mathematical sum of all the computer resources that are consumed over a "sliding window of time." The length of this window is determined by a "half-life" factor, which in the Grid Engine system is an internal decay function. This decay function reduces the impact of accrued resource consumption over time. A short half-life quickly lessens the impact of resource overconsumption. A longer half-life gradually lessens the impact of resource overconsumption. The half-life is specified as a unit of time. For example, consider a half-life of seven days that is applied to a resource consumption of 1,000 units. This half-life decay factor results in the following usage "penalty" adjustment over time:
The half-life-based decay diminishes the impact of a user's resource consumption over time, until the effect of the penalty is negligible.
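The seven-day example can be worked through numerically. The sketch below assumes that the decay is a plain exponential governed by the half-life, which is the usual meaning of the term; the text above only describes it as an internal decay function, so treat the formula and the function name decayed_usage as illustrative assumptions rather than the scheduler's exact arithmetic.

```python
def decayed_usage(usage: float, days_elapsed: float,
                  half_life_days: float) -> float:
    """Usage that still counts against a user after days_elapsed days.

    Assumes exponential decay: the remaining usage halves once per half-life.
    """
    return usage * 0.5 ** (days_elapsed / half_life_days)


# 1,000 units of consumption with a half-life of seven days.
for days in (0, 7, 14, 21, 28):
    print(days, decayed_usage(1000, days, 7))
# 0 1000.0
# 7 500.0
# 14 250.0
# 21 125.0
# 28 62.5
```

Under this assumption, a shorter half-life makes the same consumption fade from the record sooner, and a longer half-life keeps it relevant for longer.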
Compensation Factor
Sometimes the comparison shows that actual usage is well below targeted usage. In such a case, the adjusting of a user's share or a project's share of resources can allow a user to dominate the system. Such an adjustment is based on the goal of reaching target share. This domination might not be desirable. The compensation factor enables an administrator to limit how much a user or a project can dominate the resources in the near term. For example, a compensation factor of two limits a user's or project's current share to twice its targeted share. Assume that a user or a project should get 20 percent of the system resources over the accumulation period. If the user or project currently gets much less, the maximum that it can get in the near term is only 40 percent.
The share-based policy defines long-term resource entitlements of users or projects as per the share tree. When combined with the share-based policy, the compensation factor makes automatic adjustments in entitlements. If a user or project is either under or over the defined target entitlement, the Grid Engine system compensates. The system raises or lowers that user's or project's entitlement for a short term over or under the long-term target. This compensation is calculated by a share tree algorithm. The compensation factor provides an additional mechanism to control the amount of compensation that the Grid Engine system assigns. The additional compensation factor (CF) calculation is carried out only if the following conditions are true:
If either condition is not true, or if both conditions are not true, the compensation as defined and implemented by the share-tree algorithm is used. The smaller the value of the CF, the greater is its effect. If the value is greater than 1, the Grid Engine system's compensation is limited. The upper limit for compensation is calculated as long-term-entitlement multiplied by the CF. And as defined earlier, the short-term entitlement must exceed this limit before anything happens based on the compensation factor.
If the CF is 1, the Grid Engine system compensates in the same way as with the raw share-tree algorithm. So a value of one has an effect that is similar to a value of zero. The only difference is an implementation detail. If the CF is one, the CF calculations are carried out without an effect. If the CF is zero, the calculations are suppressed.
If the value is less than 1, the Grid Engine system overcompensates. Jobs receive much more compensation than they are entitled to based on the share-tree algorithm. Jobs also receive this overcompensation earlier, because the criterion for activating the compensation is met at lower short-term entitlement values. The activating criterion is short-term-entitlement > long-term-entitlement * CF.
Hierarchical Share Tree
The share-based policy is implemented through a hierarchical share tree. The share tree specifies, for a moving accumulation period, how system resources are to be shared among all users and projects. The length of the accumulation period is determined by a configurable decay constant. The Grid Engine system bases a job's share entitlement on the degree to which each parent node in the share tree reaches its accumulation limit. A job's share entitlement is based on its leaf node share allocation, which in turn depends on the allocations of its parent nodes. All jobs associated with a leaf node split the associated shares.
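How a leaf node's allocation depends on its parent nodes can be sketched with a small calculation. The tree, the node names, and the function target_fractions below are hypothetical, and the sketch covers only the long-term target implied by the configured shares: each node's portion of its parent is taken as its shares divided by the sum of its siblings' shares. It ignores past usage, decay, and the compensation factor, all of which the real scheduler also applies.

```python
from fractions import Fraction
from typing import Dict, List


def target_fractions(children: List[dict],
                     parent_fraction: Fraction = Fraction(1),
                     prefix: str = "") -> Dict[str, Fraction]:
    """Map each leaf path to its long-term target fraction of the cluster."""
    total = sum(child["shares"] for child in children)
    result: Dict[str, Fraction] = {}
    if total == 0:
        return result  # no shares configured at this level
    for child in children:
        # A node's portion is its shares relative to all of its siblings,
        # scaled by the portion that its parent received.
        fraction = parent_fraction * Fraction(child["shares"], total)
        path = f"{prefix}/{child['name']}"
        if child.get("children"):
            result.update(target_fractions(child["children"], fraction, path))
        else:
            result[path] = fraction
    return result


# Hypothetical two-level tree: projects as interior nodes, users as leaves.
tree = [
    {"name": "ProjectA", "shares": 60, "children": [
        {"name": "alice", "shares": 1},
        {"name": "bob", "shares": 1},
    ]},
    {"name": "ProjectB", "shares": 40, "children": [
        {"name": "carol", "shares": 3},
        {"name": "dave", "shares": 1},
    ]},
]

targets = target_fractions(tree)
for path, fraction in targets.items():
    print(path, fraction)
```

Exact fractions are used so that the leaf targets visibly sum to 1. Changing the shares of ProjectB changes the targets of carol and dave even though their own share numbers stay the same.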
The entitlement derived from the share tree is combined with other entitlements, such as entitlements from a functional policy, to determine a job's net entitlement. The share tree is allotted the total number of tickets for share-based scheduling. This number determines the weight of share-based scheduling among the four scheduling policies. The share tree is defined during installation. The share tree can be altered at any time. When the share tree is edited, the new share allocations take effect at the next scheduling interval.
Configuring the Share-Tree Policy With QMON
On the QMON Policy Configuration dialog box, click Share Tree Policy. The Share Tree Policy dialog box appears.
Node Attributes
Under Node Attributes, the attributes of the selected node are displayed:
When a user node or a project node is removed and then added back, the user's or project's usage is retained. A node can be added back either at the same place or at a different place in the share tree. You can zero out that usage before you add the node back to the share tree. To do so, first remove the node from the users or projects configured in the Grid Engine system. Then add the node back to the users or projects there. Users or projects that were not in the share tree but that ran jobs have nonzero usage when added to the share tree. To zero out usage when you add such users or projects to the tree, first remove them from the users or projects configured in the Grid Engine system. Then add them to the tree. To add an interior node under the selected node, click Add Node. A blank Node Info window appears, where you can enter the node's name and number of shares. You can enter any node name or share number. To add a leaf node under the selected node, click Add Leaf. A blank Node Info window appears, where you can enter the node's name and number of shares. The node's name must be an existing Grid Engine user (Configuring User Objects With QMON) or project (Defining Projects). The following rules apply when you are adding a leaf node:
To edit the selected node, click Modify. A Node Info window appears. The window displays the node's name and its number of shares. To cut or copy the selected node to a buffer, click Cut or Copy. To paste under the selected node the contents of the most recently cut or copied node, click Paste. To delete the selected node and all its descendants, click Delete. To clear the entire share-tree hierarchy, click Clear Usage. Clear the hierarchy when the share-based policy is aligned to a budget and needs to start from scratch at the beginning of each budget term. The Clear Usage facility also is handy when setting up or modifying test Grid Engine software environments.
QMON periodically updates the information displayed in the Share Tree Policy dialog box. Click Refresh to force the display to refresh immediately. To save all the node changes that you make, click Apply. To close the dialog box without saving changes, click Done. To search the share tree for a node name, click Find, and then type a search string. Node names that begin with the case-sensitive search string are indicated. Click Find Next to find the next occurrence of the search string. Click Help to open the online help system.
Share Tree Policy Parameters
To display the Share Tree Policy Parameters, click the arrow at the right of the Node Attributes.
The actual usage of a user or project can be far below its targeted usage. The compensation factor prevents such users or projects from dominating resources when they first get those resources. See Compensation Factor for more information. About the Special User defaultYou can use the special user default to reduce the amount of share-tree maintenance for sites with many users. Under the share-tree policy, a job's priority is determined based on the node that the job maps to in the share tree. Users who are not explicitly named in the share tree are mapped to the default node, if it exists. The specification of a single default node allows for a simple share tree to be created. Such a share tree makes user-based fair sharing possible. You can use the default user also in cases where the same share entitlement is assigned to most users. Same share entitlement is also known as equal share scheduling. The default user configures all user entries under the default node, giving the same share amount to each user. Each user who submits jobs receives the same share entitlement as that configured for the default user. To activate the facility for a particular user, you must add this user to the list of Grid Engine users. The share tree displays "virtual" nodes for all users who are mapped to the default node. The display of virtual nodes enables you to examine the usage and the fair-share scheduling parameters for users who are mapped to the default node. You can also use the default user for "hybrid" share trees, where users are subordinated under projects in the share tree. The default user can be a leaf node under a project node. The short-term entitlements of users vary according to differences in the amount of resources that the users consume. However, long-term entitlements of users remain the same. You might want to assign lower or higher entitlements to some users while maintaining the same long-term entitlement for all other users. 
To do so, configure a share tree with individual user entries next to the default user for those users with special entitlements. In Example A, all users submitting to Project A get equal long-term entitlements. The users submitting to Project B only contribute to the accumulated resource consumption of Project B. Entitlements of Project B users are not managed. Example A
Compare Example A with Example B: Example B
In Example B, treatment for Project A is the same as for Example A. But all default users who submit jobs to Project B, except users A and B, receive equal long-term resource entitlements. Default users have 20 shares. User A, with 10 shares, receives half the entitlement of the default users. User B, with 40 shares, receives twice the entitlement of the default users.
How to Create Project-Based Share-Tree Scheduling
The objective of this setup is to guarantee a certain share assignment of all the cluster resources to different projects over time.
Configuring the Functional Policy
Functional scheduling is a nonfeedback scheme for determining a job's importance. Functional scheduling associates a job with the submitting user, project, or department. Functional scheduling is sometimes called priority scheduling. The functional policy setup ensures that a defined share is guaranteed to each user, project, job, or department at any time. Jobs of users, projects, or departments that have used fewer resources than anticipated are preferred when the system dispatches jobs to idle resources. At the same time, full resource usage is guaranteed, because unused share proportions are distributed among those users, projects, departments, and jobs that need the resources. Past resource consumption is not taken into account.
Functional policy entitlement to system resources is combined with other entitlements in determining a job's net entitlement. For example, functional policy entitlement might be combined with override policy entitlement. The total number of tickets that are allotted to the functional policy determines the weight of functional scheduling among the scheduling policies. During installation, the administrator divides the total number of functional tickets among the functional categories of user, department, project, and job.
Functional Shares
Functional shares are assigned to every member of each functional category: user, department, project, and job. These shares indicate the proportion of the tickets for a category to which each job associated with a member of the category is entitled. For example, user davidson has 200 shares, and user donlee has 100. A job submitted by davidson is entitled to twice as many user-functional-tickets as a job submitted by donlee. The functional tickets that are allotted to each category are shared among all the jobs that are associated with a particular category.
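The davidson and donlee example can be expressed as a short calculation. This is a simplified sketch, not the scheduler's implementation: the pool of 9000 user-functional tickets, the job names, and the function user_functional_tickets are assumptions made for illustration, and only the user category is modelled.

```python
from typing import Dict


def user_functional_tickets(pool: float,
                            job_owner: Dict[str, str],
                            user_shares: Dict[str, int]) -> Dict[str, float]:
    """Split a category's ticket pool among jobs in proportion to the
    functional shares of the user who submitted each job."""
    total = sum(user_shares[owner] for owner in job_owner.values())
    return {job: pool * user_shares[owner] / total
            for job, owner in job_owner.items()}


# Shares taken from the text: davidson has 200, donlee has 100.
shares = {"davidson": 200, "donlee": 100}
# Hypothetical pending jobs, one per user.
jobs = {"job-1": "davidson", "job-2": "donlee"}

tickets = user_functional_tickets(9000, jobs, shares)
print(tickets)  # {'job-1': 6000.0, 'job-2': 3000.0}
```

The job owned by davidson ends up with twice as many user-functional tickets as the job owned by donlee, whatever the size of the pool.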
Configuring the Functional Share Policy With QMON
At the bottom of the QMON Policy Configuration dialog box, click Functional Policy. The Functional Policy dialog box appears.
Function Category List
Select the functional category for which you are defining functional shares: user, project, department, or job.
Functional Shares Table
The table under Functional Shares is scrollable. The table displays the following information:
QMON periodically updates the information displayed in the Functional Policy dialog box. Click Refresh to force the display to refresh immediately. To save all node changes that you make, click Apply. To close the dialog box without saving changes, click Done.
Changing Functional Configurations
Click the jagged arrow above the Functional Shares table to open a configuration dialog box.
Ratio Between Sorts of Functional Tickets
To display the Ratio Between Sorts Of Functional Tickets, click the arrow at the right of the Functional Shares table. User [%], Department [%], Project [%], and Job [%] always add up to 100%. When you change any of the sliders, all other unlocked sliders change to compensate for the change. When a lock is open, the slider that it guards can change freely. The slider can change either because it is moved or because the moving of another slider causes this slider to change. When a lock is closed, the slider that it guards cannot change. If three locks are closed and one lock is open, no sliders can change.
Creating User-Based, Project-Based, and Department-Based Functional Scheduling
Use this setup to create a certain share assignment of all the resources in the cluster to different users, projects, or departments. First-come, first-served scheduling is used among jobs of the same user, project, or department.
Configuring the Override Policy
Override scheduling enables a Grid Engine system manager or operator to dynamically adjust the relative importance of one job or of all jobs that are associated with a user, a department, or a project. This adjustment adds tickets to the specified job, user, department, or project. By adding override tickets, override scheduling increases the total number of tickets that a user, department, project, or job has. As a result, the overall share of resources is increased. The addition of override tickets also increases the total number of tickets in the system. These additional tickets deflate the value of every job's tickets. You can use override tickets for the following two purposes:
Override tickets that are assigned directly to a job go away when the job finishes. All other tickets are inflated back to their original value. Override tickets that are assigned to users, departments, projects, and jobs remain in effect until the administrator explicitly removes the tickets. The Policy Configuration dialog box displays the current number of override tickets that are active in the system.
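The deflation effect of override tickets can be illustrated with a small sketch. It assumes, as a simplification, that a job's share of resources is proportional to its tickets divided by all tickets in the system. The job names, ticket numbers, and the function resource_fractions are hypothetical.

```python
from typing import Dict


def resource_fractions(job_tickets: Dict[str, int]) -> Dict[str, float]:
    """Fraction of all tickets held by each job (simplified entitlement)."""
    total = sum(job_tickets.values())
    if total == 0:
        return {job: 0.0 for job in job_tickets}
    return {job: tickets / total for job, tickets in job_tickets.items()}


# Two jobs start with equal tickets from the other policies.
before = resource_fractions({"job-1": 1000, "job-2": 1000})

# An operator adds 2000 override tickets to job-1 only.
after = resource_fractions({"job-1": 1000 + 2000, "job-2": 1000})

print(before)  # {'job-1': 0.5, 'job-2': 0.5}
print(after)   # {'job-1': 0.75, 'job-2': 0.25}
```

The ticket count of job-2 does not change, yet its fraction drops from one half to one quarter because the system-wide total grew. When the override tickets go away, the fraction returns to its earlier value.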
Configuring the Override Policy With QMON
At the bottom of the Policy Configuration dialog box, click Override Policy. The Override Policy dialog box appears.
Override Category List
Select the category for which you are defining override tickets: user, project, department, or job.
Override Table
The override table is scrollable. It displays the following information:
QMON periodically updates the information that is displayed in the Override Policy dialog box. Click Refresh to force the display to refresh immediately. To save all override changes that you make, click Apply. To close the dialog box without saving changes, click Done.
Changing Override Configurations
Click the jagged arrow above the override table to open a configuration dialog box.
Configuring Policies From the Command Line
Configuring the Share-Based Policy From the Command Line
To configure the share-based policy from the command line, use the qconf command with appropriate options.
Configuring the Functional Share Policy From the Command Line
To configure the functional share policy from the command line, use the qconf command with the appropriate options.
To assign functional shares to jobs, use the -js job_share option with the qsub, qsh, qrsh, qlogin, and qalter commands. The -js job_share option defines or redefines the job share of the job relative to other jobs. job_share is an unsigned integer value. The default job_share value for jobs is 0.
Configuring the Override Policy From the Command Line
To configure the override policy from the command line, use the qconf command with the appropriate options.
To change the number of override tickets for the specified job, use the qalter -ot override_tickets command.
Managing Resource Quotas
This section explains how to use the resource quotas feature of the Grid Engine software to limit resources by user, project, host, cluster queue, or parallel environment. For convenience, you can express these limits using user access lists, departments, or host groups.
Resource Quota Overview
To prevent users from consuming all available resources, the Grid Engine software supports complex attributes that you can configure on a global, queue, or host layer. While this layered resource management approach is powerful, it leaves gaps that become particularly important in large installations that consist of many different custom resources, user groups, and projects. The resource quota feature closes these gaps by enabling you to manage these enterprise environments to the extent that you can control which project or department must yield when single bottleneck resources run out.
The resource quota feature enables you to apply limits to several kinds of resources and resource consumers, to all jobs in the cluster, and to combinations of consumers. In this context, resources are any defined complex attribute known by the Sun Grid Engine configuration. For more information about complex attributes, see the complex(5) man page. Resources can be built-in resources such as slots, arch, mem_total, num_proc, and swap_total, or any custom-defined resource such as compiler_license. Resource consumers are (per) users, (per) queues, (per) hosts, (per) projects, and (per) parallel environments. The resource quota feature provides a way for you to limit the resources that a consumer can use at any time. This limitation provides an indirect method to prioritize users, departments, and projects.
To directly define the priorities by which a user should obtain a resource, use the resource urgency and share-based policies described in Configuring the Urgency Policy and Configuring the Share-Based Policy. To limit resources through the Grid Engine software, use the qquota and qconf commands, or the QMON graphical interface. For more information, see the qquota(1) and qconf(1) man pages.
About Resource Quota Sets
Resource quota sets enable you to specify the maximum resource consumption for any job requests. Once you define the resource quota sets, the scheduler uses them to select the next possible jobs to be run while ensuring that the quotas are not exceeded. The ultimate result of setting resource quotas is that only those jobs that do not exceed their resource quotas will be scheduled and run.
A resource quota set defines a maximum resource quota for a particular job request. All of the configured rule sets apply all of the time. If multiple resource quota sets are defined, the most restrictive set applies. Every resource quota set consists of one or more resource quota rules. These rules are evaluated in order, and the first rule that matches a specific request is used. A resource quota set always results in at most one effective resource quota rule for a specific request. A resource quota set consists of the following information:
Example – Sample Resource Quota Set
The following example resource quota set restricts user1 and user2 to 2 Gbytes of free virtual space on each host in the host group lx_hosts.
{
name max_virtual_free_on_lx_hosts
description "resource quota for virtual_free restriction"
enabled true
limit users {user1,user2} hosts {@lx_hosts} to virtual_free=2g
}
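The evaluation order described for resource quota sets (first matching rule within a set, most restrictive result across sets) can be sketched as follows. This is an illustrative model only: the dictionary layout, the host names lx1 and lx2, the second rule set, and the functions first_match and effective_limit are assumptions, and real rules also distinguish individual resources and other filter scopes that are omitted here.

```python
from typing import Dict, List, Optional


def first_match(rules: List[Dict], user: str, host: str) -> Optional[float]:
    """Return the limit of the first rule in one set that matches, else None.

    A rule without a "users" or "hosts" key matches any user or host.
    """
    for rule in rules:
        users = rule.get("users")
        hosts = rule.get("hosts")
        if (users is None or user in users) and (hosts is None or host in hosts):
            return rule["limit"]
    return None


def effective_limit(rule_sets: List[List[Dict]], user: str,
                    host: str) -> Optional[float]:
    """Each set contributes at most one rule; the smallest limit wins."""
    limits = [first_match(rules, user, host) for rules in rule_sets]
    limits = [limit for limit in limits if limit is not None]
    return min(limits) if limits else None


# Limits are in Gbytes of virtual_free. The first set mirrors the sample
# above; the second set is hypothetical.
rule_sets = [
    [{"users": {"user1", "user2"}, "hosts": {"lx1", "lx2"}, "limit": 2}],
    [{"users": {"user1"}, "limit": 1}, {"limit": 4}],
]

print(effective_limit(rule_sets, "user1", "lx1"))    # 1
print(effective_limit(rule_sets, "user2", "lx1"))    # 2
print(effective_limit(rule_sets, "user3", "other"))  # 4
```

For user1 on lx1, both sets produce a matching rule and the smaller limit applies. For user3, only the catch-all rule of the second set matches.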
Static and Dynamic Resource Quotas
Resource quota rules always define a maximum value of a resource that can be used. In most cases, these values are static and equal for all matching filter scopes. Although you could define several different rules to apply to different scopes, you would then have several rules that are nearly identical. Instead of duplicating rules, you can define a dynamic limit. A dynamic limit uses an algebraic expression to derive the rule limit value. The algebraic formula can reference a complex attribute whose value is used to calculate the resulting limit.
Example – Dynamic Limit Example
The following example illustrates the use of dynamic limits. Users are allowed to use five slots per CPU on all Linux hosts.
limit hosts {@linux_hosts} to slots=$num_proc*5
The value of num_proc is the number of processors on the host. The limit is calculated by the formula $num_proc*5, and can be different on each host. Expanding the example above, you could have the following resulting limits:
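As an illustration of how such per-host limits come out, the sketch below evaluates the formula for a few hosts. The host names and processor counts are hypothetical, and the helper dynamic_limit handles only the simple "$attribute*factor" form used in this example, not the general algebraic expressions that a rule may contain.

```python
from typing import Dict


def dynamic_limit(expression: str, host_values: Dict[str, float]) -> float:
    """Evaluate a limit of the form '$attribute*factor' for one host."""
    attribute, factor = expression.lstrip("$").split("*")
    return host_values[attribute] * float(factor)


# Hypothetical hosts with different processor counts.
hosts = {
    "hostA": {"num_proc": 2},
    "hostB": {"num_proc": 4},
    "hostC": {"num_proc": 8},
}

limits = {name: dynamic_limit("$num_proc*5", values)
          for name, values in hosts.items()}
print(limits)  # {'hostA': 10.0, 'hostB': 20.0, 'hostC': 40.0}
```

A single rule therefore yields a different slot limit on each host, which avoids writing one nearly identical static rule per host.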
Instead of num_proc, you could use any other complex attribute known for a host as either a load value or a consumable resource.
Managing Resource Quotas With QMON
The following task explains how to set resource quotas using the QMON graphical interface.
How to Set Resource Quotas Using QMON
Monitoring Resource Quota Utilization From the Command Line
Use the qquota command to view information about the current Sun Grid Engine resource quotas. The qquota command lists each resource quota that is being used at least once or that defines a static limit. For each applicable resource quota, qquota displays the following information:
The qquota command includes several options that you can use to limit the information to a specific host, cluster queue, project, parallel environment, resource, or user. If you use no options, qquota displays information about resource sets that apply to the user name from which you invoke the command. For more information, see the qquota(1) man page.
Example – Sample qquota Command
The following example shows information about the resource quota sets that apply to user user1:
$ qquota -u user1
resource quota     limit                filter
--------------------------------------------------------------------------------
maxujobs/1         slots=5/20           -
max_linux/1        slots=5/5            hosts @linux
max_per_host/1     slots=1/2            users user1 hosts host2
Configuring Resource Quotas From the Command Line
Use the qconf command to add, modify, or delete resource quota sets and rules.
For more information about qconf, see the qconf(1) man page.
Resource Quota Command Line Examples
The following example shows how you can use the various commands for resource quotas. The rule set shown in Example – Rule Set defines the following limit:
To configure the rule set, use one of the following forms of the qconf command:
After jobs are submitted for different users, the qstat command shows output similar to the example shown in Example – qstat Output.
Example – Rule Set
{
name maxujobs
limit users * to slots=20
}
{
name max_linux
limit users * hosts @linux to slots=5
}
{
name max_per_host
limit users MyUser hosts {@linux} to slots=2
limit users {*} hosts {@linux} to slots=1
limit users * hosts * to slots=0
}
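The evaluation order described earlier (first matching rule within a set, most restrictive result across sets) can be sketched in Python for the three rule sets above. This is a simplified model under stated assumptions, not the scheduler's implementation: the membership of @linux is hypothetical, a missing hosts filter is modeled as "*", and the braces that control whether a limit is counted per element or shared are ignored because they affect accounting rather than matching.

```python
from fnmatch import fnmatchcase

# Hypothetical membership of the @linux host group.
HOST_GROUPS = {"@linux": {"host1", "host2"}}

# Each rule is (users filter, hosts filter, slot limit), in file order.
RULE_SETS = {
    "maxujobs": [("*", "*", 20)],
    "max_linux": [("*", "@linux", 5)],
    "max_per_host": [("MyUser", "@linux", 2), ("*", "@linux", 1), ("*", "*", 0)],
}


def host_matches(pattern, host):
    """Match a host against a host group name or a wildcard pattern."""
    if pattern.startswith("@"):
        return host in HOST_GROUPS.get(pattern, set())
    return fnmatchcase(host, pattern)


def effective_rules(user, host):
    """Return {rule set name: (rule number, limit)} using first-match semantics."""
    matched = {}
    for name, rules in RULE_SETS.items():
        for number, (users, hosts, limit) in enumerate(rules, start=1):
            if fnmatchcase(user, users) and host_matches(hosts, host):
                matched[name] = (number, limit)
                break  # later rules in the same set are shadowed
    return matched


def most_restrictive(user, host):
    """Smallest slot limit among the effective rules of all rule sets."""
    matched = effective_rules(user, host)
    return min(limit for _, limit in matched.values()) if matched else None


print(effective_rules("MyUser", "host1"))
print(most_restrictive("user1", "host2"))
```

In this model, MyUser on a @linux host is bound by the first rule of max_per_host, any other user on a @linux host by the second, and every request outside @linux by the closing slots=0 rule.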
Example – qstat Output
$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
---------------------------------------------------------------------------------------------
27 0.55500 Sleeper MyUser r 02/21/2006 15:53:10 all.q@host1 1
29 0.55500 Sleeper MyUser r 02/21/2006 15:53:10 all.q@host1 1
30 0.55500 Sleeper MyUser r 02/21/2006 15:53:10 all.q@host2 1
26 0.55500 Sleeper MyUser r 02/21/2006 15:53:10 all.q@host2 1
28 0.55500 Sleeper user1 r 02/21/2006 15:53:10 all.q@host2 1
Example – qquota Output
$ qquota # as user MyUser
resource quota rule limit                filter
--------------------------------------------------------------------------------
maxujobs/1         slots=5/20           -
max_linux/1        slots=5/5            hosts @linux
max_per_host/1     slots=2/2            users MyUser hosts host2
max_per_host/1     slots=2/2            users MyUser hosts host1
$ qquota -h host2 # as user MyUser
resource quota     limit                filter
--------------------------------------------------------------------------------
maxujobs/1         slots=5/20           -
max_linux/1        slots=5/5            hosts @linux
max_per_host/1     slots=2/2            users MyUser hosts host2
$ qquota -u user1
resource quota     limit                filter
--------------------------------------------------------------------------------
maxujobs/1         slots=5/20           -
max_linux/1        slots=5/5            hosts @linux
max_per_host/1     slots=1/2            users user1 hosts host2
$ qquota -u *
resource quota     limit                filter
--------------------------------------------------------------------------------
maxujobs/1         slots=5/20           -
max_linux/1        slots=5/5            hosts @linux
max_per_host/1     slots=2/2            users MyUser hosts host1
max_per_host/1     slots=2/2            users MyUser hosts host2
max_per_host/1     slots=1/2            users user1 hosts host2
Performance Considerations
Efficient Rule Sets
To provide the most efficient processing of jobs and resources in queues, put the most restrictive rule at the first position of a rule set. Following this convention helps the Sun Grid Engine scheduler to restrict the number of suitable queue instances in a particularly efficient manner, because the first rule is never shadowed by any subsequent rule in the same rule set and thus always stands for itself. To illustrate this rule, consider an environment similar to the following:
In such an environment, you might define a single rule set as follows:
{
name 30_for_each_project
description "not more than 30 per project"
enabled TRUE
limit projects {*} queues Q001 to F001=30
limit projects {*} queues Q002 to F002=30
limit projects {*} queues Q003 to F003=30
limit projects {*} queues Q004 to F004=30
limit to F001=0,F002=0,F003=0,F004=0
}
The single rule set limits the utilization of each managed resource to 30 for each project and constrains the jobs in eligible queues at the same time. This works, but in a larger cluster with many hosts, the single rule set can slow job dispatching. To help the Sun Grid Engine scheduler rule out as many queue instances as possible during matchmaking, use four separate rule sets.
{
name 30_for_each_project_in_Q001
description "not more than 30 per project of F001 in Q001"
enabled TRUE
limit queues !Q001 to F001=0
limit projects {*} queues Q001 to F001=30
}
{
name 30_for_each_project_in_Q002
description "not more than 30 per project of F002 in Q002"
enabled TRUE
limit queues !Q002 to F002=0
limit projects {*} queues Q002 to F002=30
}
{
name 30_for_each_project_in_Q003
description "not more than 30 per project of F003 in Q003"
enabled TRUE
limit queues !Q003 to F003=0
limit projects {*} queues Q003 to F003=30
}
{
name 30_for_each_project_in_Q004
description "not more than 30 per project of F004 in Q004"
enabled TRUE
limit queues !Q004 to F004=0
limit projects {*} queues Q004 to F004=30
}
These four rule sets enforce the same per-project resource quotas as the single rule set. However, the four rule sets can be processed much more efficiently because unsuitable queue instances are shielded first. In this case, these shields cannot be consolidated into a single resource quota set.
{
name 30_for_each_project_in_Q001
description "not more than 30 per project of F001/F002 in Q001"
enabled TRUE
limit queues !Q001 to F001=0,F002=0
limit projects {*} queues Q001 to F001=30,F002=30
}
{
name 30_for_each_project_in_Q002
description "not more than 30 per project of F003/F004 in Q002"
enabled TRUE
limit queues !Q002 to F003=0,F004=0
limit projects {*} queues Q002 to F003=30,F004=30
}
In this example, the queues are consolidated from Q001-Q004 down to Q001-Q002. However, this actually increases overall cluster utilization and throughput.
Managing Advance Reservations
About Advance Reservations
An advance reservation is a reservation (possibly independent of a particular job) that a user or administrator can request and the scheduler can create. This reservation causes the associated resources to be reserved for the specified user, administrator, or job. An advance reservation might limit a particular resource capability over a defined time interval. The actual resource is likely obtained by the requestor (scheduler) from the resource owner through a negotiation process.
You might better understand the concept of an advance reservation if you think about a travel reservation system. Using the Sun Grid Engine resource reservation capability, all passengers are guaranteed to get on a plane flight in the order in which the passengers arrive at the airport. What you really want is to be able to reserve your flights in advance so that you can arrange your specific flight schedule before you arrive at the airport. Grid Engine enables you to make those arrangements in advance based on an allocation scheme that the scheduler uses.
Capabilities
An advance reservation is defined by the following:
Advance Reservation States
Grid Engine supports the following advance reservation states:
Using QMON for Advance Reservations
How to Create Advance Reservations Using QMON
How to View Advance Reservations Using QMON
How to Delete Advance Reservations Using QMON
Configuring Advance Reservations
User Access
The ability to create an advance reservation is limited to members of the arusers list.
How to Enable a User to Create Advance Reservations
ARCo Queries for Advance Reservations
The Accounting and Reporting Console (ARCo) provides several queries that specifically apply to advance reservations:
For more information about ARCo, see Accounting and Reporting Console (ARCo).
Advance Reservation Command Reference
The following Grid Engine commands enable you to manage advance reservations. In addition, many standard Grid Engine commands supply information about your advance reservations. For example, the qsub command now includes a switch -ar that lets you specify the advance reservation into which to submit a specific job.
qrsub
Use the qrsub command to create an advance reservation and submit it to the Sun Grid Engine queuing system. You may define default request files (analogous to sge_request for qsub) that can contain any of the possible command line options. The file names are $SGE_ROOT/$SGE_CELL/common/sge_ar_request (global defaults file) and $HOME/.sge_ar_request (user private defaults file).
Options
Many of the options for qrsub are the same as those for qsub. For more information, see the submit(1) man page.
qrsub Examples
The following example reserves a slot in the queue all.q on host1, host2, or host3.
qrsub -q "*@host1,*@host2,*@host3" -u $user -a 01121200 -d 1:0:0
The following example reserves 4 slots on a host with arch=sol-sparc64.
qrsub -pe alloc_pe_slots 4 -l a=sol-sparc64 -u $user -a 01121200 -d 1:0:0
qrdel
Use the qrdel command to delete an advance reservation. The qrdel command requires at least one advance reservation identifier, which can be either an AR-ID (number) or an AR name. The qrdel command deletes ARs in the order in which their identifiers are presented. Jobs that refer to an advance reservation that is tagged for deletion are also removed. The reservation itself is removed only after all jobs that refer to it have been removed from the Sun Grid Engine database.
Options
Example
The following example deletes the advance reservation 193.
qrdel 193
qrstat
Use the qrstat command to view the current status of the granted Sun Grid Engine advance reservations. You can get information about specific ARs or users. Without any options, qrstat displays an overview of all reservations.
Options
Examples
The first example shows information about all advance reservations. The second example shows detailed information about the advance reservation whose ID is 193.
% qrstat
AR-ID name owner state start at end at duration
---------------------------------------------------------------------------------------
192 project_xy user1 r 12/14/2006 14:47:23 12/14/2006 14:57:33 0:10:10
193 user2 w 12/18/2006 10:00:00 12/19/2006 10:00:10 24:0:10
% qrstat -ar 193
==============================================================
id: 193
ar_name:
submission_time: Mon Nov 27 17:11:34 2006
owner: user1
acl_list: user1,user2
start_time: Mon Dec 18 10:00:00 2006
end_time: Tue Dec 19 10:00:10 2006
duration: 24:0:10
granted_slots: all.q@host1=2,all.q@host2=1
resource_list: myapp=2,myapp=1
...
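The duration column in the qrstat output is the difference between the start and end times. The following Python sketch recomputes it for the two reservations listed above. The function name ar_duration is hypothetical, and the unpadded h:m:s format is inferred from the sample output rather than from a format specification.

```python
from datetime import datetime


def ar_duration(start: str, end: str) -> str:
    """Return end - start as h:m:s, using the date layout of the qrstat overview."""
    fmt = "%m/%d/%Y %H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    # Fields are not zero-padded, matching values such as 24:0:10 in the sample.
    return f"{hours}:{minutes}:{seconds}"


print(ar_duration("12/14/2006 14:47:23", "12/14/2006 14:57:33"))
print(ar_duration("12/18/2006 10:00:00", "12/19/2006 10:00:10"))
```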
Managing Parallel Environments
About Parallel Environments
A parallel environment (PE) is a software package that enables concurrent computing on parallel platforms in networked environments. A variety of systems have evolved over the past years into viable technology for distributed and parallel processing on various hardware platforms. The following are two examples of the most common message-passing environments:
Public domain as well as hardware vendor-provided implementations exist for both tools. All these systems show different characteristics and have separate requirements. To handle parallel jobs running on top of such systems, the Grid Engine system provides a flexible, powerful interface that satisfies various needs. The Grid Engine system enables you to run parallel jobs through the following programs:
Any number of different parallel environment interfaces can be configured concurrently. Interfaces between parallel environments and the Grid Engine system can be implemented if suitable startup and stop procedures are provided. The startup procedure and the stop procedure are described in Parallel Environment Startup Procedure and in Termination of the Parallel Environment.
How to Configure Parallel Environments With QMON
How to Add or Modify Parallel Environments
Example – Displaying Configured Parallel Environment Interfaces With QMON
The following example defines a parallel job to be submitted. The job requests that the parallel environment interface mpi (message passing interface) be used with from 4 to 16 processes, with 16 being preferable.
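To make the meaning of a slot range such as 4-16 concrete, the following Python sketch expands a range specification and picks the largest count that fits into a given number of free slots. It is a simplified model, not the scheduler's algorithm: the helper names are hypothetical, only the closed forms n, n-m, and comma-separated lists are handled, and the preference for the largest fitting value is an assumption based on the statement that 16 is preferable.

```python
from typing import List, Optional


def parse_slot_range(spec: str) -> List[int]:
    """Expand a spec such as '4-16' or '1,2,4,8' into the allowed slot counts."""
    counts = set()
    for part in spec.split(","):
        low, sep, high = part.partition("-")
        if sep:
            # Closed range n-m; open-ended forms are not handled in this sketch.
            counts.update(range(int(low), int(high) + 1))
        else:
            counts.add(int(low))
    return sorted(counts)


def pick_slots(spec: str, free_slots: int) -> Optional[int]:
    """Return the largest allowed slot count that fits, or None if none fits."""
    fitting = [n for n in parse_slot_range(spec) if n <= free_slots]
    return max(fitting) if fitting else None


print(pick_slots("4-16", 20))
print(pick_slots("1,2,4,8", 6))
```

With fewer than 4 free slots, pick_slots("4-16", ...) returns None, which corresponds to a job that cannot be started yet.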
Configuring Parallel Environments From the Command Line
Type the qconf command with appropriate options:
qconf <options>
The following options are available:
Example – Configuring a Parallel Environment From the Command Line
The qsub command that corresponds to the parallel job specification in Example – Displaying Configured Parallel Environment Interfaces With QMON is as follows:
% qsub -N Flow -p -111 -P devel -a 200012240000.00 -cwd \
    -S /bin/tcsh -o flow.out -j y -pe mpi 4-16 \
    -v SHARED_MEM=TRUE,MODEL_SIZE=LARGE \
    -ac JOB_STEP=preprocessing,PORT=1234 \
    -A FLOW -w w -r y -m s,e -q big_q \
    -M me@myhost.com,me@other.address \
    flow.sh big.data
This example shows how to use the qsub -pe command to formulate an equivalent request. The qsub(1) man page provides more details about the -pe option. Select a suitable parallel environment interface for a parallel job, keeping the following considerations in mind:
Ask your Grid Engine administrator for the available parallel environment interfaces best suited for your types of parallel jobs. You can specify resource requirements along with your parallel environment request. Specifying resource requirements further reduces the set of eligible queues for the parallel environment interface to those queues that fit the requirement. See Managing Resource Quotas. For example, assume that you run the following command:
% qsub -pe mpi 1,2,4,8 -l nastran,arch=osf nastran.par
The queues that are suitable for this job are queues that are associated with the parallel environment interface mpi by the parallel environment configuration. Suitable queues also satisfy the resource requirement specification specified by the qsub -l command.
Parallel Environment Startup Procedure
The Grid Engine software starts the parallel environment by using the exec system call to invoke a startup procedure. The name of the startup executable and the parameters passed to this executable are configurable from within the Grid Engine software. An example of such a startup procedure for the PVM environment is contained in the distribution tree of the Grid Engine software. The startup procedure is made up of a shell script and a C program that is invoked by the shell script. The shell script uses the C program to start up PVM cleanly. All other required operations are handled by the shell script. The shell script is located under $SGE_ROOT/pvm/startpvm.sh. The C program file is located under $SGE_ROOT/pvm/src/start_pvm.c.
The example script startpvm.sh requires the following three arguments:
These parameters can be passed to the startup script as described in Configuring Parallel Environments With QMON. The parameters are among the parameters provided to parallel environment startup and stop scripts by the Grid Engine software during runtime. The required host file, as an example, is generated by the Grid Engine software. The name of the file can be passed to the startup procedure in the parallel environment configuration by the special parameter name $pe_hostfile. A description of all available parameters is provided in the sge_pe(5) man page. The host file has the following format:
This file format is generated by the Grid Engine software. The file format is fixed. Parallel environments that need a different file format must translate it within the startup procedure. See the startpvm.sh file. PVM is an example of a parallel environment that needs a different file format. When the Grid Engine software starts the parallel environment startup procedure, the startup procedure launches the parallel environment. The startup procedure should exit with a zero exit status. If the exit status of the startup procedure is not zero, Grid Engine software reports an error and does not start the parallel job.
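The kind of translation a startup procedure might perform can be sketched in Python. The sketch assumes the field layout documented in sge_pe(5) for $pe_hostfile (host name, number of slots, queue name, processor range, one host per line); verify this against the man page for your release. The function name and the sample content are hypothetical, and the output is a plain list with one entry per granted slot, which is only one of the formats a parallel environment might expect.

```python
from typing import List


def expand_pe_hostfile(text: str) -> List[str]:
    """Repeat each host name once per granted slot.

    Assumes lines of the form 'host slots queue processor_range'.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        hosts.extend([fields[0]] * int(fields[1]))
    return hosts


# Hypothetical host file content, for illustration only.
sample = "host1 2 all.q@host1 UNDEFINED\nhost2 1 all.q@host2 UNDEFINED\n"
print("\n".join(expand_pe_hostfile(sample)))
```

A real startup script would write the translated list to a file and exit with status zero on success, as described above.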
Termination of the Parallel Environment
When a parallel job finishes or is aborted, for example, by qdel, a procedure to halt the parallel environment is called. The definition and semantics of this procedure are similar to the procedures described for the startup program. The stop procedure can also be defined in a parallel environment configuration. See, for example, Configuring Parallel Environments With QMON. The purpose of the stop procedure is to shut down the parallel environment and to reap all associated processes.
The distribution tree of the Grid Engine software also contains an example of a stop procedure for the PVM parallel environment. This example resides under $SGE_ROOT/pvm/stoppvm.sh. It takes the following two arguments:
Similar to the startup procedure, the stop procedure is expected to return a zero exit status on success and a nonzero exit status on failure.
Tight Integration of Parallel Environments and Grid Engine Software
How to Configure Parallel Environments With QMON mentions that using sge_execd and sge_shepherd to create parallel tasks offers benefits over parallel environments that create their own parallel tasks. The UNIX operating system allows reliable resource control only for the creator of a process hierarchy. Features such as correct accounting, resource limits, and process control for parallel applications can be enforced only by the creator of all parallel tasks. Most parallel environments do not implement these features. Therefore parallel environments do not provide a sufficient interface for integration with a resource management system like the Grid Engine system. To overcome this problem, the Grid Engine system provides an advanced parallel environment interface for tight integration with parallel environments. This parallel environment interface transfers the responsibility for creating tasks from the parallel environment to the Grid Engine software.
The distribution of the Grid Engine system contains two examples of such a tight integration, one for the PVM public domain version, and one for the MPICH MPI implementation from Argonne National Laboratories. The examples are contained in the directories $SGE_ROOT/pvm and $SGE_ROOT/mpi, respectively. The directories also contain README files that describe the usage and any current restrictions. Refer to those README files for more details. For the purpose of comparison, the $SGE_ROOT/mpi/sunhpc/loose-integration directory contains a loose integration sample with Sun HPC ClusterTools software, and the $SGE_ROOT/mpi directory contains a loosely integrated variant of the interfaces.
Managing Checkpointing Environments
About Checkpointing
Checkpointing is a facility that does the following tasks:
If you move a checkpoint from one host to another host, checkpointing can migrate jobs or applications in a cluster without significant loss of resources. Hence, dynamic load balancing can be provided with the help of a checkpointing facility. The Grid Engine system supports two levels of checkpointing:
Kernel-level checkpointing can be applied to complete jobs, that is, the process hierarchy created by a job. By contrast, user-level checkpointing is usually restricted to single programs. Therefore the job in which such programs are embedded needs to properly handle cases where the entire job gets restarted. Kernel-level checkpointing, as well as checkpointing based on checkpointing libraries, can consume many resources. The complete virtual address space that is in use by the job or application at the time of the checkpoint must be dumped to disk. By contrast, user-level checkpointing based on restart files can restrict the data that is written to the checkpoint to the important information only.
About Checkpointing Environments
The Grid Engine software provides a configurable attribute description for each checkpointing method used. Different attribute descriptions reflect the different checkpointing methods and the potential variety of derivatives from these methods on different operating system architectures. This attribute description is called a checkpointing environment. Default checkpointing environments are provided with the distribution of the Grid Engine system and can be modified according to the site's needs. New checkpointing methods can be integrated in principle. However, the integration of new methods can be a challenging task. This integration should be performed only by experienced personnel or by your Grid Engine system support team.
How to Configure Checkpointing Environments With QMON
Configuring Checkpointing Environments From the Command Line
To configure the checkpointing environment from the command line, type the qconf command with the appropriate options. The following options are available:
Configuring Complex Resource Attributes
This section describes how to configure resource attribute definitions. Resource attribute definitions are stored in an entity called the Grid Engine system complex. This section includes the following topics:
For information about load parameters and writing your own load sensors, see Load Parameters.
About Complex Resource Attributes
The complex configuration provides all pertinent information about the resource attributes users can request for jobs with the qsub -l or qalter -l commands. The complex configuration also provides information about how the Grid Engine system should interpret these resource attributes. The complex also builds the framework for the system's consumable resources facility. The resource attributes that are defined in the complex can be attached to the global cluster, to a host, or to a queue instance. The attached attribute identifies a resource with the associated capability. During the scheduling process, the availability of resources and the job requirements are taken into account. The Grid Engine system also performs the bookkeeping and the capacity planning that is required to prevent over-subscription of consumable resources. Typical consumable resource attributes include:
Attribute definitions in the Grid Engine complex define how resource attributes should be interpreted. The definition of a resource attribute includes the following:
Although you can define complex resource attributes from the command line, it is easier to use the QMON Complex Configuration dialog box. See:
Configuring Complex Resource Attributes With QMON
How to Configure Complex Resource Attributes
Assigning Resource Attributes to Queues, Hosts, and the Global Cluster
Resource attributes can be used in the following ways:
A set of default resource attributes is already attached to each queue and host. Default resource attributes are built in to the system and cannot be deleted, nor can their type be changed. User-defined resource attributes must first be defined in the complex before you can assign them to a queue instance, a host, or the global cluster. When you assign a resource attribute to one of these targets, you specify a value for the attribute. The following sections describe each attribute type in detail.
Queue Resource Attributes
Default queue resource attributes are a set of parameters that are defined in the queue configuration. These parameters are described in the queue_conf(5) man page. You can add new resource attributes to the default attributes. New attributes are attached only to the queue instances that you modify. When the configuration of a particular queue instance references a resource attribute that is defined in the complex, that queue configuration provides the values for the attribute definition. For details about queue configuration see About Configuring Queues. For example, the queue configuration value h_vmem is used for the virtual memory size limit. This value limits the amount of total memory that each job can consume. An entry in the complex_values list of the queue configuration defines the total available amount of virtual memory on a host or assigned to a queue. For detailed information about consumable resources, see Consumable Resources.
Host Resource Attributes
Host resource attributes are parameters that are intended to be managed on a host basis. The default host-related attributes are load values. You can add new resource attributes to the default attributes, as described in Queue Resource Attributes. Every sge_execd periodically reports load to sge_qmaster. The reported load values are either the standard load values such as the CPU load average, or the load values defined by the administrator, as described in Load Parameters.
The definitions of the standard load values are part of the default host resource attributes, whereas administrator-defined load values require extending the host resource attributes. Host-related attributes are commonly extended to include nonstandard load parameters. Host-related attributes are also extended to manage host-related resources such as the number of software licenses that are assigned to a host, or the available disk space on a host's local file system. If host-related attributes are associated with a host or with a queue instance on that host, a concrete value for a particular host resource attribute is determined by one of the following items:
In some cases, none of these values are available. For example, say the value is supposed to be a load parameter, but sge_execd does not report a load value for the parameter. In such cases, the attribute is not defined, and the qstat -F command shows that the attribute is not applicable. For example, the total free virtual memory attribute h_vmem is defined in the queue configuration as limit and is also reported as a standard load parameter. The total available amount of virtual memory on a host can be defined in the complex_values list of that host. The total available amount of virtual memory attached to a queue instance on that host can be defined in the complex_values list of that queue instance. Together with defining h_vmem as a consumable resource, you can efficiently exploit memory of a machine without risking memory over-subscription, which often results in reduced system performance that is caused by swapping. For more information about consumable resources, see Consumable Resources.
Global Resource Attributes
Global resource attributes are cluster-wide resource attributes, such as available network bandwidth of a file server or the free disk space on a network-wide available file system. Global resource attributes can also be associated with load reports if the corresponding load report contains the GLOBAL identifier, as described in Load Parameters. Global load values can be reported from any host in the cluster. No global load values are reported by default, therefore there are no default global resource attributes. Concrete values for global resource attributes are determined by the following items:
Sometimes none of these cases apply. For example, a load value might not yet be reported. In such cases, the attribute does not exist.
Adding Resource Attributes to the Complex
By adding resource attributes to the complex, the administrator can extend the set of attributes managed by the Grid Engine system. The administrator can also restrict the influence of user-defined attributes to particular queues, hosts, or both. User-defined attributes are a named collection of attributes with the corresponding definitions as to how the Grid Engine software is to handle these attributes. You can attach one or more user-defined attributes to a queue, to a host, or globally to all hosts in the cluster. Use the complex_values parameter for the queue configuration and the host configuration. For more information, see About Configuring Queues and Configuring Hosts With QMON. The attributes defined become available to the queue and to the host, respectively, in addition to the default resource attributes. The complex_values parameter in the queue configuration and the host configuration must set concrete values for user-defined attributes that are associated with queues and hosts.
For example, say the user-defined resource attributes permas and pamcrash, shown in the following figure, are defined. For one or more queues, add the resource attributes to the list of associated user-defined attributes as shown in the Complex tab of the Modify queue-name dialog box. For details on how to configure queues, see About Configuring Queues and its related sections. The displayed queue is configured to manage up to 10 licenses of the software package permas as shown in the following figure. The attribute permas becomes requestable for jobs, as expressed in the Available Resources list in the Requested Resources dialog box shown below.
Consequently, the only eligible queues for these jobs are the queues that are associated with the user-defined resource attributes and that have permas licenses configured and available. For details about how to submit jobs, see Submitting Jobs. Alternatively, the user could submit jobs from the command line and could request attributes as follows:
% qsub -l pm=1 permas.sh
Consumable Resources
Consumable resources provide an efficient way to manage limited resources such as available memory, free space on a file system, network bandwidth, or floating software licenses. Consumable resources are also called consumables. The total available capacity of a consumable is defined by the administrator. The consumption of the corresponding resource is monitored by Grid Engine software internal bookkeeping. The Grid Engine software accounts for the consumption of this resource for all running jobs. Jobs are dispatched only if the internal bookkeeping indicates that sufficient consumable resources are available.
Consumables can be combined with default load parameters or user-defined load parameters. Load values can be reported for consumable attributes. Conversely, the Consumable flag can be set for load attributes. Load measures the availability of the resource. Consumable resource management takes both the load and the internal bookkeeping into account, ensuring that neither exceeds a given limit. For more information about load parameters, see Load Parameters.
To enable consumable resource management, you must define the total capacity of a resource. You can define resource capacity globally for the cluster, for specified hosts, and for specified queues. These categories can supersede each other in the given order. Thus a host can restrict availability of a global resource, and a queue can restrict host resources and global resources. You define resource capacities by using the complex_values attribute in the queue and host configurations. The complex_values definition of the global host specifies global cluster consumable settings. For more information, see the host_conf(5) and queue_conf(5) man pages, as well as About Configuring Queues and Configuring Hosts With QMON. Each consumable attribute in a complex_values list is assigned a value that denotes the maximum available amount for that resource.
The internal bookkeeping subtracts from this total the assumed resource consumption by all running jobs as expressed through the jobs' resource requests. A parallel job consumes as many consumable resources as it consumes job slots. For example, the following command consumes a total of 800 Mbytes of memory:
qsub -l mem=100M -pe make 8
Memory usage is split across the queues and hosts on which the job runs. If four tasks run on host A and four tasks run on host B, the job consumes 400 Mbytes on each host.
Setting Up Consumable Resources
Only numeric attributes can be configured as consumables. Numeric attributes are attributes whose type is INT, DOUBLE, MEMORY, or TIME. In the QMON Main Control window, click the Complex Configuration button. The Complex Configuration dialog box appears. To enable the consumable management for an attribute, set the Consumable flag for the attribute in the complex configuration. For example, the following figure shows that the Consumable flag is set for the virtual_free memory resource.
Figure – Complex Configuration Dialog Box: virtual_free
To set up other consumable resources, follow these examples:
For each queue or host for which you want the Grid Engine software to do the required capacity planning, you must define the capacity in a complex_values list. An example is shown in the following figure, where 1 Gbyte of virtual memory is defined as the capacity value of the current host. Figure – Add/Modify Exec Host: virtual_free
The virtual memory requirements of all jobs running concurrently in any queue on that host are accumulated. The requirements are then subtracted from the capacity of 1 Gbyte to determine available virtual memory. If a job request for virtual_free exceeds the available amount, the job is not dispatched to a queue on that host.
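The subtraction described above can be illustrated with a few lines of shell arithmetic. This is only a sketch of the rule, not the scheduler's implementation: the function names are hypothetical, and all amounts are simplified to whole Mbytes.

```sh
#!/bin/sh
# Sketch of consumable bookkeeping. All amounts are whole Mbytes.
# The function names are hypothetical; they are not Grid Engine commands.

# available CAPACITY REQUEST...
# Prints the capacity minus the sum of the requests of all running jobs.
available() {
    left=$1
    shift
    for req in "$@"; do
        left=$((left - req))
    done
    echo "$left"
}

# fits REQUEST AVAILABLE
# Succeeds if a job requesting REQUEST Mbytes can still be dispatched.
fits() {
    [ "$1" -le "$2" ]
}

# A host with 1 Gbyte (1024 Mbytes) of virtual_free and three running
# jobs that requested 100, 200, and 500 Mbytes.
avail=$(available 1024 100 200 500)
echo "available: ${avail}M"

if fits 300 "$avail"; then
    echo "a 300M request is dispatched"
else
    echo "a 300M request is not dispatched to this host"
fi
```

In this example, 224 Mbytes remain, so a job that requests 300 Mbytes of virtual_free stays pending for this host while a job that requests 200 Mbytes could still start.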
For consumable attributes that are not explicitly requested by the job, the administrator can predefine a default value for resource consumption. Doing so is meaningful only if requesting the attribute is not forced, as explained in the previous note. The default value is set as 200 Mbytes.
Examples of Setting Up Consumable Resources
Use the following examples to guide you in setting up consumable resources for your site.
Example 1 – Floating Software License Management
Suppose you are using the software package pam-crash in your cluster, and you have access to 10 floating licenses. You can use pam-crash on every system as long as no more than 10 invocations of the software are active. The goal is to configure the Grid Engine system to prevent scheduling pam-crash jobs while all 10 licenses are occupied by other running pam-crash jobs.
With consumable resources, you can achieve this goal easily. First you must add the number of available pam-crash licenses as a global consumable resource to the complex configuration, as shown in the following figure. In the figure above:
Consumables receive their value from the global, host, or queue configurations through the complex_values lists. See the host_conf(5) and queue_conf(5) man pages, as well as About Configuring Queues and Configuring Hosts With QMON. To activate resource planning for this attribute and for the cluster, the number of available pam-crash licenses must be defined in the global host configuration, as shown in the following figure.
In this figure, the value for the attribute pam-crash is set to 10, corresponding to 10 floating licenses.
Assume that a user submits the following job:
% qsub -l pc=1 pam-crash.sh
The job starts only if fewer than 10 pam-crash licenses are currently occupied. The job can run anywhere in the cluster, however, and the job occupies 1 pam-crash license throughout its run time.
One of the hosts in your cluster might need to be excluded from the floating license. For example, you might not have pam-crash binaries for that host. In such a case, you can exclude the host from the pam-crash license management. You can exclude the host by setting to zero the capacity that is related to that host for the consumable attribute pam-crash. To exclude the host, use the Execution Host tab of the Host Configuration dialog box as shown in the following figure.
Similarly, you might want to prevent a certain queue from running pam-crash jobs. For example, the queue might be an express queue with memory and CPU-time limits not suitable for pam-crash. In this case, set the corresponding capacity to zero in the queue configuration, as shown in the following figure.
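The three capacity settings of this example can also be written as qconf calls. The following is a minimal sketch, assuming that the pam-crash attribute already exists in the complex and that the qconf -aattr option is the appropriate way to add an entry to a complex_values list at your site (check the qconf(1) man page). The host name hostA and the queue name express.q are placeholders. By default the script only prints the commands.

```sh
#!/bin/sh
# Sketch: command-line counterpart of the pam-crash license setup.
# Assumptions: the complex attribute pam-crash already exists, and the
# host name "hostA" and queue name "express.q" are placeholders.
# Commands are only printed unless DRY_RUN=0 is set in the environment.

DRY_RUN=${DRY_RUN:-1}

# Print the command, or execute it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# set_capacity OBJECT-TYPE NAME AMOUNT
set_capacity() {
    run qconf -aattr "$1" complex_values "pam-crash=$3" "$2"
}

set_capacity exechost global 10      # 10 floating licenses cluster-wide
set_capacity exechost hostA 0        # host without pam-crash binaries
set_capacity queue express.q 0       # queue unsuitable for pam-crash jobs
```

Printing first makes it possible to review the exact commands before they change a live configuration.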
Example 2 – Space Sharing for Virtual Memory
Administrators must often tune a system to avoid performance degradation caused by memory over-subscription, and consequently swapping of a machine. The Grid Engine software can support you in this task through the Consumable Resources facility.
The standard load parameter virtual_free reports the available free virtual memory, that is, the combination of available swap space and the available physical memory. To avoid swapping, the use of swap space must be minimized. In an ideal case, all the memory required by all processes running on a host should fit into physical memory.
The Grid Engine software can guarantee the availability of required memory for all jobs started through the Grid Engine system, given the following assumptions and configurations:
An example of a possible virtual_free resource definition is shown in the Complex Configuration Dialog Box: virtual_free. A corresponding execution host configuration for a host with one Gbyte of main memory is shown in Add-Modify Exec Host: virtual_free. In the virtual_free resource definition example, the Requestable flag is set to YES instead of to FORCED, as in the example of a global configuration. This means that users need not indicate the memory requirements of their jobs. The value in the Default field is used if an explicit memory request is missing. The value of 1 Gbyte as default request in this case means that a job without a request is assumed to occupy all available physical memory. If you run different job classes with different memory requirements on one machine, you might want to partition the memory that these job classes use. This functionality is called space sharing. You can accomplish this functionality by configuring a queue for each job class. Then you assign to each queue a portion of the total memory on that host. In the example, the queue configuration attaches half of the total memory that is available to host carc to the queue fast.q for the host carc. Hence the accumulated memory consumption of all jobs that are running in queue fast.q on host carc cannot exceed 500 Mbytes. Jobs in other queues are not taken into account. Nonetheless, the total memory consumption of all running jobs on host carc cannot exceed 1 Gbyte.
Users might submit jobs to a system configured similarly to the example in either of the following forms:
% qsub -l vf=100M honest.sh
% qsub dont_care.sh
The job submitted by the first command can be started as soon as at least 100 Mbytes of memory are available. This amount is taken into account in the capacity planning for the virtual_free consumable resource. The second job runs only if no other job is on the system, as the second job implicitly requests all the available memory. In addition, the second job cannot run in the queue fast.q because the job exceeds the queue's memory capacity.
Example 3 – Managing Available Disk Space
Some applications need to manipulate huge data sets stored in files. Such applications therefore depend on the availability of sufficient disk space throughout their run time. This requirement is similar to the space-sharing of available memory, as discussed in the preceding example. The main difference is that the Grid Engine system does not provide free disk space as one of its standard load parameters. Free disk space is not a standard load parameter because disks are usually partitioned into file systems in a site-specific way. Site-specific partitioning does not allow identifying the file system of interest automatically.
Nevertheless, available disk space can be managed efficiently by the system through the consumable resources facility. You should use the host resource attribute h_fsize for this purpose. First, the attribute must be configured as a consumable resource, as shown in the following figure. In the case of local host file systems, a reasonable capacity definition for the disk space consumable can be put in the host configuration, as shown in the following figure.
Submission of jobs to a Grid Engine system that is configured as described here works similarly to the previous examples: % qsub -l hf=5G big-sort.sh The h_fsize attribute is recommended because h_fsize also is used as the hard file size limit in the queue configuration. The file size limit restricts the ability of jobs to create files that are larger than what is specified during job submission. The qsub command in this example specifies a file size limit of 5 Gbytes. If the job does not request the attribute, the corresponding value from the queue configuration or host configuration is used. If the Requestable flag for h_fsize is set to FORCED in the example, a request must be included in the qsub command. If the Requestable flag is not set, a request is optional in the qsub command. By using the queue limit as the consumable resource, you control requests that the user specifies instead of the real resource consumption by the job scripts. Any violation of the limit is sanctioned, which eventually aborts the job. The queue limit ensures that the resource requests on which the Grid Engine system internal capacity planning is based are reliable. See the queue_conf(5) and the setrlimit(2) man pages for details. You might want applications that are not submitted to the Grid Engine system to occupy disk space concurrently. If so, the internal bookkeeping might not be sufficient to prevent application failure due to lack of disk space. To avoid this problem, you can periodically receive statistics about disk space usage. These statistics indicate the total disk space consumption, including any space that is consumed outside of the Grid Engine system. The load sensor interface enables you to enhance the set of standard load parameters with site-specific information, such as the available disk space on a file system. See Adding Site-Specific Load Parameters for more information. 
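A load sensor that reports free disk space might look like the following sketch. It follows the begin/end reporting protocol described in Adding Site-Specific Load Parameters. The monitored directory (taken from the hypothetical SENSOR_DIR variable, default /tmp), the use of df -Pk, and the K suffix for 1024-byte units are assumptions of this example and should be checked against your platform and the Grid Engine man pages.

```sh
#!/bin/sh
# Sketch of a load sensor that reports free disk space as a load value
# for h_fsize. The monitored directory (default /tmp) and the use of
# "df -Pk" are assumptions made for this example.

DIR=${SENSOR_DIR:-/tmp}
myhost=$(uname -n)

# Print the free space of the file system holding $1, in Kbytes.
free_kbytes() {
    # With -P, df prints one header line and one data line;
    # the 4th column of the data line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Answer each line read from standard input with one load report.
# Returns 0 on "quit" and 1 if standard input is closed.
sensor_loop() {
    while read -r input; do
        if [ "$input" = quit ]; then
            return 0
        fi
        free=$(free_kbytes "$DIR")
        echo begin
        echo "$myhost:h_fsize:${free}K"
        echo end
    done
    return 1
}

# Demonstration: one reporting cycle, then quit.
# As an installed load sensor, the script would end with:  sensor_loop
printf '\nquit\n' | sensor_loop
```

The demonstration line feeds one empty line and then quit, which is the non-interactive equivalent of pressing Return once and then typing quit.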
By adding an appropriate load sensor and reporting free disk space for h_fsize, you can combine consumable resource management and resource availability statistics. The Grid Engine system compares job requirements for disk space with the available capacity and with the most recent reported load value. Available capacity is derived from the internal resource planning. Jobs get dispatched to a host only if both criteria are met.
Configuring Complex Resource Attributes From the Command Line
To configure the complex from the command line, type the following command with appropriate options:
% qconf <options>
See the qconf(1) man page for a detailed definition of the qconf command format and the valid syntax. The following options enable you to modify the Grid Engine system complex:
The following command prints the current complex configuration to the standard output stream in the file format defined in the complex(5) man page:
% qconf -sc
A sample output is shown in the following example.
Example – qconf -sc Sample Output
#name shortcut type relop requestable consumable default urgency
#---------------------------------------------------------------------------
nastran na INT <= YES NO 0 0
pam-crash pc INT <= YES YES 1 0
permas pm INT <= FORCED YES 1 0
#---- # start a comment but comments are not saved across edits -----------
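Because the output is plain columns, it can be filtered with standard tools. The following sketch lists the name and shortcut of every attribute whose consumable column is YES. The helper name list_consumables is hypothetical, and the column position is taken from the header line of the sample above.

```sh
#!/bin/sh
# Sketch: list the consumable attributes in complex(5)-format input.
# Column 6 is the "consumable" column of the header shown above.
# list_consumables is a hypothetical helper, not a Grid Engine command.

list_consumables() {
    awk '$1 !~ /^#/ && NF >= 8 && $6 == "YES" { print $1, $2 }'
}

# Demonstration with the sample output from this section.
# On a live system, the input would be:  qconf -sc | list_consumables
list_consumables <<'EOF'
#name shortcut type relop requestable consumable default urgency
#---------------------------------------------------------------------------
nastran na INT <= YES NO 0 0
pam-crash pc INT <= YES YES 1 0
permas pm INT <= FORCED YES 1 0
#---- # start a comment but comments are not saved across edits -----------
EOF
```

For the sample, the filter prints pam-crash pc and permas pm. Comment lines are skipped because their first field starts with #.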
Sun Grid Engine Information Center
Load Parameters
Default Load Parameters
By default, sge_execd periodically reports several load parameters and their corresponding values to sge_qmaster. These values are stored in the sge_qmaster internal host object, which is described in About Hosts and Daemons. However, the values are used internally only if a complex resource attribute with a corresponding name is defined. Such complex resource attributes contain the definition as to how load values are to be interpreted. See Assigning Resource Attributes to Queues, Hosts, and the Global Cluster for more information.
After the primary installation, a standard set of load parameters is reported. All attributes required for the standard load parameters are defined as host-related attributes. Subsequent releases of Grid Engine software might provide extended sets of default load parameters. Therefore, the set of load parameters that is reported by default is documented in the file $SGE_ROOT/doc/load_parameters.asc.
How load attributes are defined determines their accessibility. By defining load parameters as global resource attributes, you make them available for the entire cluster and for all hosts. By defining load parameters as host-related attributes, you provide the attributes for all hosts but not for the global cluster.
Adding Site-Specific Load Parameters
The set of default load parameters might not be adequate to completely describe the load situation in a cluster. This possibility is especially likely with respect to site-specific policies, applications, and configurations. The Grid Engine software provides the means to extend the set of load parameters. For this purpose, sge_execd offers an interface to feed load parameters and the current load values into sge_execd. Afterwards, these parameters are treated like the default load parameters. As for the default load parameters, corresponding attributes must be defined in the complex for the site-specific load parameters to become effective. See Default Load Parameters for more information.
Writing Your Own Load Sensors
To feed sge_execd with additional load information, you must supply a load sensor. The load sensor can be a script or a binary executable. In either case, the load sensor's handling of the standard input and standard output streams and its control flow must comply with the following rules:
The load sensor then performs whatever operation is necessary to compute the desired load figures. At the end of the cycle, the load sensor writes the result to STDOUT.
Load Sensor Rules Format
The format for the load sensor rules is as follows:
Example of a Load Sensor Script
The following example shows a load sensor. The load sensor is a Bourne shell script.
Example – Load Sensor Bourne Shell Script
#!/bin/sh
myhost=`uname -n`
while [ 1 ]; do
    # wait for input
    read input
    result=$?
    if [ $result != 0 ]; then
        exit 1
    fi
    # quote $input so that an empty line does not break the test
    if [ "$input" = quit ]; then
        exit 0
    fi
    # send users logged in
    logins=`who | cut -f1 -d" " | sort | uniq | wc -l | sed "s/^ *//"`
    echo begin
    echo "$myhost:logins:$logins"
    echo end
done
# we never get here
exit 0
Save this script to the file load.sh. Assign executable permission to the file with the chmod command. To test the script interactively from the command line, type load.sh and repeatedly press the Return key.
As soon as the procedure works, you can install it for any execution host. To install the procedure, configure the load sensor path as the load_sensor parameter for the cluster configuration, global configuration, or the host-specific configuration. See Basic Cluster Configuration or the sge_conf(5) man page for more information.
The corresponding QMON window might look like the following figure:
The reported load parameter logins is usable as soon as a corresponding attribute is added to the complex. The required definition might look like the last table entry shown in the following figure.
Managing Grid Engine SMF Services
See service names and changed behavior with SMF here.
Observing SMF Services
You can use the svcs command to query the services present on your system.
% svcs
STATE STIME FMRI
legacy_run 16:03:54 lrc:/etc/rcS_d/S29wrsmcfg
legacy_run 16:04:11 lrc:/etc/rc2_d/S47pppd
online 16:03:44 svc:/network/loopback:default
online 16:03:47 svc:/system/filesystem/root:default
online 16:03:47 svc:/system/scheduler:default
online 16:03:47 svc:/system/boot-archive:default
online 16:03:48 svc:/system/filesystem/usr:default
online 16:03:49 svc:/network/physical:default
online 16:03:49 svc:/milestone/network:default
...
To query Grid Engine services, you can use the mask "*sge*":
% svcs "*sge*"
online 16:03:47 svc:/application/sge/qmaster:prod_cluster
online 16:03:47 svc:/application/sge/qmaster:test_cluster
online 16:03:47 svc:/application/sge/execd:prod_cluster
online 16:03:47 svc:/application/sge/execd:test_cluster
To get more information about a single service, use svcs -l <FMRI>:
% svcs -l qmaster:prod_cluster
fmri svc:/application/sge/qmaster:prod_cluster
name Sun Grid Engine - QMaster service
enabled true
state online
next_state none
state_time Sun May 19 21:28:39 2008
logfile /var/svc/log/application-sge-qmaster:prod_cluster.log
restarter svc:/system/svc/restarter:default
contract_id 4912
dependency require_all/none svc:/milestone/network (online)
dependency optional_all/none svc:/system/filesystem/autofs (online)
You can see that each SMF service has an additional log file. This log file contains information related to the SMF framework and can contain much useful information when a service fails.
Controlling SMF Services
You can use the svcadm command to enable (start), disable (stop), or restart any SMF service. You must have appropriate permissions (solaris.smf.manage.*) to do so (typically root).
Starting the qmaster service (will be started on reboot):
% svcadm enable qmaster:prod_cluster
Stopping the qmaster service just for now (will be started on reboot):
% svcadm disable -t qmaster:prod_cluster
Stopping the qmaster service (will NOT be started on reboot):
% svcadm disable qmaster:prod_cluster
Starting the qmaster service until the next reboot:
% svcadm enable -t qmaster:prod_cluster
Generating Accounting Statistics (qacct)
You can use the qacct command to generate alphanumeric accounting statistics. If you specify no options, qacct displays the aggregate usage on all machines of the cluster, as generated by all jobs that have finished and that are contained in the cluster accounting file $SGE_ROOT/$SGE_CELL/common/accounting. In this case, qacct reports three times, in seconds:
Several options are available for reporting accounting information about queues, users, and the like. In particular, you can use the qacct -l command to request information about all jobs that have finished and that match a resource requirement specification. Use the command qacct -j [job-id | job-name] to get direct access to the complete resource usage information stored by the Grid Engine system. This information includes the information that is provided by the getrusage system call. The -j option reports the resource usage entry for the jobs with job-id or with job-name. If no argument is given, all jobs contained in the referenced accounting file are displayed. If a job ID is specified, and if more than one entry is displayed, one of the following is true:
See the qacct(1) man page for more information.
Backing Up and Restoring Grid Engine Configuration
Backing Up the Grid Engine System Configuration
You can back up your Grid Engine system configuration files automatically. The automatic backup process uses a configuration file called backup_template.conf. The backup configuration file is located by default in $SGE_ROOT/util/install_modules/backup_template.conf. The backup configuration file must define the following elements:
The backup template file looks like the following example:
##################################################
# Autobackup Configuration File Template
##################################################
# Please, enter your $SGE_ROOT here (mandatory)
$SGE_ROOT=""
# Please, enter your $SGE_CELL here (mandatory)
$SGE_CELL=""
# Please, enter your Backup Directory here
# After backup you will find your backup files here (mandatory)
# The autobackup will add a time /date combination to this dirname
# to prevent an overwriting!
BACKUP_DIR=""
# Please, enter true to get a tar/gz package
# and false to copy the files only (mandatory)
TAR="true"
# Please, enter the backup file name here. (mandatory)
BACKUP_FILE="backup.tar"
To start the automatic backup process, type the following command on the sge_qmaster host:
inst_sge -bup -auto <backup-conf>
backup-conf is the full path to the backup configuration file.
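Before starting an automatic backup, it can be useful to confirm that the configuration file actually sets every mandatory variable. The following sketch does that with grep. The helper name check_backup_conf is hypothetical, and the pattern tolerates an optional leading $ only because the template is printed that way above. Whether inst_sge performs a similar check itself is not stated here.

```sh
#!/bin/sh
# Sketch: verify that a backup configuration file sets every mandatory
# variable of the template before inst_sge is started.
# check_backup_conf is a hypothetical helper, not part of Grid Engine.

# check_backup_conf FILE
# Returns 0 if all five variables have a non-empty quoted value.
check_backup_conf() {
    conf=$1
    missing=0
    for name in SGE_ROOT SGE_CELL BACKUP_DIR TAR BACKUP_FILE; do
        # Accept an optional leading "$", as printed in the template.
        if ! grep -Eq "^[\$]?$name=\"[^\"]+\"" "$conf"; then
            echo "missing or empty: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Intended use (path is a placeholder):
#   check_backup_conf /path/to/backup.conf &&
#       inst_sge -bup -auto /path/to/backup.conf
```

Run against the unmodified template, the check reports SGE_ROOT, SGE_CELL, and BACKUP_DIR as empty, which are the three values that must be filled in for every site.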
Your backup is created in the directory specified by BACKUP_DIR. A backup log file called install.pid is also created in this directory. pid is the process ID number.
How to Perform a Manual Backup
How to Restore from a Backup
Improving Grid Engine Performance
For information about troubleshooting, see Troubleshooting and Error Messages.
Fine-Tuning Your Grid Environment
The Grid Engine system is a full-function, general-purpose distributed resource management tool. The scheduler component of the system supports a wide range of different compute farm scenarios. To get the maximum performance from your compute environment, you should review the features that are enabled. You should then determine which features you really need to solve your load management problem. Disabling some of these features can improve the throughput of your cluster.
Scheduler Monitoring
Scheduler monitoring can help you to determine why certain jobs are not dispatched. However, providing this information for all jobs at all times can consume resources. You usually do not need this much information. To disable scheduler monitoring, set schedd_job_info to false in the scheduler configuration. See Changing the Scheduler Configuration With QMON, and the sched_conf(5) man page.
Finished Jobs
In the case of array jobs, the finished job list in qmaster can become quite large. By switching the finished job list off, you save memory and speed up the qstat process, because qstat also fetches the finished jobs list. To turn off the finished job list function, set finished_jobs to zero in the cluster configuration. See Changing the Scheduler Configuration With QMON, and the sge_conf(5) man page.
Job Validation
Forced validation at job submission time can be a valuable procedure to prevent non-dispatchable jobs from forever remaining in a pending state. However, job validation can also be a time-consuming task. Job validation can be especially time-consuming in heterogeneous environments with different execution nodes and consumable resources, and in which all users have their own job profiles. In homogeneous environments with only a few different jobs, a general job validation usually can be omitted. To disable job verification, add the qsub option -w n in the cluster-wide default requests. For more information, see How to Submit Advanced Jobs With QMON and the sge_request(5) man page.
Load Thresholds and Suspend Thresholds
Load thresholds are needed if you deliberately oversubscribe your machines and you need to prevent excessive system load. Suspend thresholds are also used to prevent overloading the system. Another case where you want to prevent the overloading of a node is when the execution node is still open for interactive load. Interactive load is not under the control of the Grid Engine system.
A compute farm might be more single-purpose. For example, each CPU at a compute node might be represented by only one queue slot, and no interactive load might be expected at these nodes. In such cases, you can omit load_thresholds. To disable both thresholds, set load_thresholds to none and suspend_thresholds to none. See Configuring Load and Suspend Thresholds, and the queue_conf(5) man page.
Load Adjustments
Load adjustments are used to increase the measured load after a job is dispatched. This mechanism prevents over-subscription of machines that is caused by the delay between job dispatching and the corresponding load impact. You can switch off load adjustments if you do not need them. Load adjustments impose on the scheduler some additional work in connection with sorting hosts and load thresholds verification. To disable load adjustments, set job_load_adjustments to none and load_adjustment_decay_time to zero in the scheduler configuration. See Changing the Scheduler Configuration With QMON, and the sched_conf(5) man page.
Immediate Scheduling
The default for the Grid Engine system is to start scheduling runs in a fixed schedule interval. A good feature of fixed intervals is that they limit the CPU time consumption of the qmaster and the scheduler. A bad feature is that fixed intervals choke the scheduler, artificially resulting in a limited throughput. Many compute farms have machines specifically dedicated to qmaster and the scheduler, and such setups provide no reason to choke the scheduler. See schedule_interval in sched_conf(5).
You can configure immediate scheduling by using the flush_submit_sec and flush_finish_sec parameters of the scheduler configuration. See Changing the Scheduler Configuration With QMON, and the sched_conf(5) man page. If immediate scheduling is activated, the throughput of a compute farm is limited only by the power of the machine that is hosting sge_qmaster and the scheduler.
Urgency Policy and Resource Reservation
The urgency policy enables you to customize job priority schemes that are resource-dependent. Such job priority schemes include the following:
Implementing both objectives is especially valuable if you are using resource reservation.
Using DTrace for Performance Tuning
Troubleshooting in a distributed system that spans potentially thousands of active components can challenge even the most experienced system administrator. In practice, Grid Engine administrators have no explicit mechanism for identifying and reproducing issues that lead to degraded performance in their production environments. In the Solaris 10 environment, you can use the DTrace utility to monitor the on-site performance of the Grid Engine master component. DTrace is a comprehensive framework for tracing dynamic events in Solaris 10 environments. For general information about DTrace, see http://www.sun.com/bigadmin/content/dtrace/ and the dtrace man page. For detailed information about using DTrace with Grid Engine software, view the $SGE_ROOT/dtrace/README_dtrace.txt file.
Tuning Performance From the Command Line Through DTrace
If you can use Solaris 10 DTrace, you can use the $SGE_ROOT/dtrace/monitor.sh script to monitor a Grid Engine master and look for any bottlenecks. The monitor.sh script supports the following options:
Analyzing Bottlenecks on the Grid Engine Master
To provide effective performance tuning, you must understand the bottlenecks of distributed systems. The $SGE_ROOT/dtrace/monitor.sh script measures throughput-relevant data of the running Grid Engine master and compiles this data into a few indices that are printed in a single-line view per interval. This view shows four main categories of information:
For more information, see the example below.
Sample DTrace Output for Bottleneck Analysis
The following monitoring output sample illustrates a case where a Grid Engine master bottleneck can be detected. The example shows the following information:
In this example, performance degraded between 17:40:32 and 17:41:05.
CPU ID FUNCTION:NAME
0 1 :BEGIN Time | #wrt wrt/ms |#rep #gdi #ack| #dsp dsp/ms #sad| #snd #rcv| #in++ #in-- #out++ #out--| #lck0 #ulck0 #lck1 #ulck1
0 36909 :tick-3sec 2006 Nov 24 17:39:23 | 43 3| 0 8 4| 3 691 121| 4 4| 11 11 15 15| 68 68 289 288
0 36909 :tick-3sec 2006 Nov 24 17:39:26 | 83 16| 0 10 3| 3 699 122| 3 3| 14 13 17 17| 90 90 681 681
0 36909 :tick-3sec 2006 Nov 24 17:39:29 | 117 24| 0 9 4| 4 1092 198| 4 4| 13 13 17 17| 71 71 591 591
0 36909 :tick-3sec 2006 Nov 24 17:39:32 | 19 4| 0 9 3| 3 591 147| 3 3| 12 12 15 15| 44 43 249 249
0 36909 :tick-3sec 2006 Nov 24 17:39:35 | 144 28| 0 9 4| 4 1012 173| 4 4| 13 13 17 17| 61 62 1246 1247
0 36909 :tick-3sec 2006 Nov 24 17:39:38 | 46 5| 0 8 3| 3 705 122| 3 3| 11 11 14 14| 67 67 293 293
0 36909 :tick-3sec 2006 Nov 24 17:39:41 | 154 31| 0 9 3| 4 894 198| 3 3| 13 13 16 16| 73 72 968 969
0 36909 :tick-3sec 2006 Nov 24 17:39:44 | 46 5| 0 10 4| 4 971 162| 4 4| 13 13 17 17| 71 72 304 304
0 36909 :tick-3sec 2006 Nov 24 17:39:47 | 154 29| 0 8 3| 3 739 158| 3 3| 11 11 14 14| 67 67 990 990
0 36909 :tick-3sec 2006 Nov 24 17:39:50 | 46 5| 0 10 4| 4 815 162| 4 4| 14 14 18 18| 76 76 692 693
0 36909 :tick-3sec 2006 Nov 24 17:39:53 | 74 15| 0 8 3| 3 746 136| 3 3| 12 12 15 15| 54 53 571 571
0 36909 :tick-3sec 2006 Nov 24 17:39:56 | 116 20| 0 11 4| 4 992 184| 4 4| 14 14 18 18| 80 81 669 669
0 36909 :tick-3sec 2006 Nov 24 17:39:59 | 87 18| 0 11 4| 4 851 176| 5 4| 15 15 21 21| 77 76 670 670
0 36909 :tick-3sec 2006 Nov 24 17:40:02 | 109 20| 0 12 5| 4 930 184| 4 5| 17 17 20 20| 77 78 624 624
0 36909 :tick-3sec 2006 Nov 24 17:40:05 | 88 15| 0 9 3| 4 995 176| 3 3| 12 12 15 15| 71 71 1026 1026
0 36909 :tick-3sec 2006 Nov 24 17:40:08 | 112 20| 0 12 4| 4 927 184| 5 4| 16 16 22 22| 81 81 652 652
0 36909 :tick-3sec 2006 Nov 24 17:40:11 | 32 6| 0 7 4| 3 618 121| 3 4| 11 11 13 13| 54 53 336 336
0 36909 :tick-3sec 2006 Nov 24 17:40:14 | 145 30| 0 11 4| 4 988 199| 4 4| 15 15 19 19| 64 65 827 827
0 36909 :tick-3sec 2006 Nov 24 17:40:17 | 43 3| 0 7 3| 3 618 121| 3 3| 10 10 13 13| 64 64 286 286
0 36909 :tick-3sec 2006 Nov 24 17:40:20 | 157 31| 0 11 4| 4 977 199| 4 4| 15 15 19 19| 80 80 1406 1408
0 36909 :tick-3sec 2006 Nov 24 17:40:23 | 43 4| 0 7 3| 3 701 121| 3 3| 10 10 13 13| 64 64 285 285
0 36909 :tick-3sec 2006 Nov 24 17:40:26 | 73 18| 0 11 4| 4 948 171| 4 4| 15 15 19 19| 77 77 700 700
0 36909 :tick-3sec 2006 Nov 24 17:40:29 | 127 31| 0 10 4| 4 968 189| 4 4| 14 14 18 18| 74 74 584 584
0 36909 :tick-3sec 2006 Nov 24 17:40:32 | 10 3| 0 6 0| 1 203 41| 0 0| 58 8 62 62| 23 22 106 106
0 36909 :tick-3sec 2006 Nov 24 17:40:35 | 19 5| 0 5 0| 0 0 0| 0 0| 8 5 13 13| 30 30 200 200
0 36909 :tick-3sec 2006 Nov 24 17:40:38 | 16 5| 0 5 1| 0 0 0| 0 0| 5 6 10 10| 27 26 558 559
0 36909 :tick-3sec 2006 Nov 24 17:40:41 | 1 0| 0 4 0| 0 0 0| 0 0| 7 4 11 11| 9 9 34 34
0 36909 :tick-3sec 2006 Nov 24 17:40:44 | 0 0| 0 4 0| 0 0 0| 0 0| 7 4 11 11| 8 8 28 28
0 36909 :tick-3sec 2006 Nov 24 17:40:47 | 0 0| 0 6 0| 1 744 81| 1 1| 10 6 15 15| 14 14 33 33
0 36909 :tick-3sec 2006 Nov 24 17:40:50 | 1 0| 0 5 1| 0 0 0| 0 0| 8 6 14 14| 11 11 49 49
0 36909 :tick-3sec 2006 Nov 24 17:40:53 | 0 0| 0 4 0| 0 0 0| 0 0| 9 4 12 12| 6 7 28 28
0 36909 :tick-3sec 2006 Nov 24 17:40:56 | 0 0| 0 5 0| 0 0 0| 0 0| 8 5 13 13| 12 12 420 420
0 36909 :tick-3sec 2006 Nov 24 17:40:59 | 0 0| 0 4 0| 0 0 0| 0 0| 8 4 12 12| 9 8 30 30
0 36909 :tick-3sec 2006 Nov 24 17:41:02 | 0 0| 0 4 1| 0 0 0| 0 0| 12 5 16 16| 7 8 25 25
0 36909 :tick-3sec 2006 Nov 24 17:41:05 | 165 41| 0 48 60| 0 0 0| 1 1| 23 106 71 71| 96 97 1236 1236
0 36909 :tick-3sec 2006 Nov 24 17:41:08 | 178 28| 0 15 53| 4 965 206| 4 4| 68 68 75 75| 130 130 1336 1336
0 36909 :tick-3sec 2006 Nov 24 17:41:11 | 106 23| 0 27 35| 4 855 166| 4 4| 82 82 91 91| 115 114 1040 1040
0 36909 :tick-3sec 2006 Nov 24 17:41:14 | 198 37| 0 41 70| 4 1189 196| 4 4| 185 185 185 185| 134 135 1327 1327
0 36909 :tick-3sec 2006 Nov 24 17:41:17 | 16 5| 0 9 5| 4 940 161| 3 3| 17 17 20 20| 43 42 234 234
0 36909 :tick-3sec 2006 Nov 24 17:41:20 | 162 35| 0 13 8| 4 958 200| 4 4| 23 23 28 28| 80 81 1018 1018
0 36909 :tick-3sec 2006 Nov 24 17:41:23 | 44 6| 0 6 3| 2 544 81| 3 3| 8 8 11 11| 63 63 747 747
0 36909 :tick-3sec 2006 Nov 24 17:41:26 | 150 34| 0 13 6| 4 921 199| 4 4| 21 21 25 25| 73 72 923 923
0 36909 :tick-3sec 2006 Nov 24 17:41:29 | 43 3| 0 5 2| 2 506 81| 2 2| 7 7 9 9| 57 57 260 260
0 36909 :tick-3sec 2006 Nov 24 17:41:32 | 157 37| 0 9 3| 4 978 199| 3 3| 13 13 16 16| 73 72 970 970
0 36909 :tick-3sec 2006 Nov 24 17:41:35 | 43 3| 0 7 3| 2 512 85| 3 3| 9 9 12 12| 61 62 274 274
0 36909 :tick-3sec 2006 Nov 24 17:41:38 | 127 29| 0 8 3| 4 994 185| 3 3| 11 11 14 14| 68 68 1265 1265
0 36909 :tick-3sec 2006 Nov 24 17:41:41 | 66 11| 0 10 4| 4 973 171| 4 4| 14 14 18 18| 67 67 354 354
0 36909 :tick-3sec 2006 Nov 24 17:41:44 | 48 10| 0 8 3| 3 785 128| 3 3| 11 11 14 14| 52 51 399 399
0 36909 :tick-3sec 2006 Nov 24 17:41:47 | 142 31| 0 12 4| 4 913 192| 5 4| 17 17 23 23| 89 90 830 830
0 36909 :tick-3sec 2006 Nov 24 17:41:50 | 64 13| 0 11 5| 4 853 168| 4 5| 15 15 18 18| 75 75 542 542
Using Files and Scripts for Administration Tasks
You can use the QMON graphical user interface to perform all administrative tasks in the Grid Engine system. You can also administer a Grid Engine system through commands that you type at a shell prompt and call from within shell scripts. Many experienced administrators find that using files and scripts is a more flexible, quicker, and more powerful way to change settings. This section describes how to use files and scripts to add or modify Grid Engine system objects such as queues, hosts, and environments.
Using Files to Add or Modify Objects
To add objects according to specifications that you create in a file, use the qconf command with the following options:
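To modify objects according to specifications that you create in a file, use the qconf command with the following options:
qconf -Me <filename>
qconf -Mq <filename>
qconf -Mu <filename>
qconf -Mckpt <filename>
qconf -Mp <filename>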
Use these options in combination with the matching qconf -s... option (for example, qconf -sckpt) to take an existing object and modify it. You can then update the existing object or create a new object.

Example – Modifying the Migration Command of a Checkpoint Environment
#!/bin/sh
# ckptmod.sh: modify the migration command
# of a checkpointing environment
# Usage: ckptmod.sh <checkpoint-env-name> <full-path-to-command>
TMPFILE=/tmp/ckptmod.$$
CKPT=$1
MIGMETHOD=$2
qconf -sckpt "$CKPT" | grep -v '^migr_command' > "$TMPFILE"
echo "migr_command $MIGMETHOD" >> "$TMPFILE"
qconf -Mckpt "$TMPFILE"
rm "$TMPFILE"
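The dump, edit, and load cycle used by ckptmod.sh can be hardened a little. The following is a minimal sketch rather than part of the original script: the function names replace_attr and modify_ckpt are hypothetical, mktemp is assumed to be available on the host, and each attribute is assumed to occupy a single line of the qconf -sckpt output.

```shell
#!/bin/sh
# replace_attr NAME VALUE (hypothetical helper)
# Reads an object dump on stdin, drops any existing line whose first
# field is exactly NAME, and appends "NAME VALUE" at the end.
replace_attr() {
    awk -v name="$1" '$1 != name { print }'
    printf '%s %s\n' "$1" "$2"
}

# modify_ckpt ENV COMMAND (hypothetical wrapper)
# Runs in a subshell so that the trap and the exit calls stay local.
modify_ckpt() (
    tmpfile=$(mktemp "${TMPDIR:-/tmp}/ckptmod.XXXXXX") || exit 1
    # Remove the temporary file however the subshell ends.
    trap 'rm -f "$tmpfile"' EXIT
    # Stop before loading anything if the dump itself fails.
    dump=$(qconf -sckpt "$1") || exit 1
    printf '%s\n' "$dump" | replace_attr migr_command "$2" > "$tmpfile"
    qconf -Mckpt "$tmpfile"
)
```

A call such as modify_ckpt sph /usr/local/bin/migrate.sh (both arguments are placeholders) would then behave like ckptmod.sh, except that a failed dump stops the run before qconf -Mckpt is reached.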
Using Files to Modify Queues, Hosts, and Environments

You can modify individual queues, hosts, parallel environments, and checkpointing environments from the command line. Use the qconf command in combination with other commands.

If you have already prepared a file, type the qconf command with appropriate options:
qconf -Me <filename>
qconf -Mq <filename>
qconf -Mp <filename>
qconf -Mckpt <filename>
If you have not prepared a file, type the qconf command with appropriate options:
qconf -me <host-name>
qconf -mq <queue-name>
qconf -mp <pe-name>
qconf -mckpt <ckpt-name>
Both -M and -m mean modify, but the uppercase -M reads the modifications from an existing file, whereas the lowercase -m opens a temporary file in an editor. When you save your changes to this file and exit the editor, the changes take effect immediately.

To change many objects at once, or to change object configuration non-interactively, use the qconf command with the options that modify object attributes.

The following commands make modifications according to specifications in a file:
qconf -Aattr {queue | exechost | pe | ckpt} <filename>
qconf -Mattr {queue | exechost | pe | ckpt} <filename>
qconf -Rattr {queue | exechost | pe | ckpt} <filename>
qconf -Dattr {queue | exechost | pe | ckpt} <filename>
The following commands make modifications according to specifications on the command line:
qconf -aattr {queue | exechost | pe | ckpt} <attribute> <value> {<queue-list> | <host-list>}
qconf -mattr {queue | exechost | pe | ckpt} <attribute> <value> {<queue-list> | <host-list>}
qconf -rattr {queue | exechost | pe | ckpt} <attribute> <value> {<queue-list> | <host-list>}
qconf -dattr {queue | exechost | pe | ckpt} <attribute> <value> {<queue-list> | <host-list>}
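In the above commands, filename, attribute, and value mean the following:
filename – The name of a file that contains attribute-value pairs
attribute – The attribute of the queue, host, or environment that you want to change
value – The new value for that attribute
The following options modify object attributes:
-Aattr, -aattr – Add values to an attribute
-Mattr, -mattr – Modify values of an attribute
-Rattr, -rattr – Replace the value of an attribute
-Dattr, -dattr – Delete values from an attribute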
The -aattr, -mattr, and -dattr options enable you to operate on individual values in a list of values. The -rattr option replaces the entire list of values with the new one that you specify, either on the command line or in the file.

Example – Changing the Queue Type
The following command changes the queue type of tcf27-e019.q to batch only:
% qconf -rattr queue qtype batch tcf27-e019.q

Example – Modifying the Queue Type and the Shell Start Behavior
The following command uses the file new.cfg to modify the queue type and the shell start behavior of tcf27-e019.q:
% cat new.cfg
qtype batch interactive checkpointing
shell_start_mode unix_behavior
% qconf -Rattr queue new.cfg tcf27-e019.q

Example – Adding Resource Attributes
The following command adds the resource attribute scratch1 with a value of 1000M and the resource attribute long with a value of 2:
% qconf -rattr exechost complex_values scratch1=1000M,long=2 tcf27-e019

Example – Attaching a Resource Attribute to a Host
The following command attaches the resource attribute short to the host with a value of 4:
% qconf -aattr exechost complex_values short=4 tcf27-e019

Example – Changing a Resource Value
The following command changes the value of scratch1 to 500M, leaving other values unchanged:
% qconf -mattr exechost complex_values scratch1=500M tcf27-e019

Example – Deleting a Resource Attribute
The following command deletes the resource attribute long:
% qconf -dattr exechost complex_values long tcf27-e019
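The four complex_values examples above differ only in how they treat the list that is already attached to the host. The following toy model illustrates that difference on a plain comma-separated string. It is a sketch for reasoning only: the shell functions rattr, aattr, mattr, and dattr are hypothetical stand-ins, not Grid Engine code, and they ignore edge cases such as modifying a name that is not in the list.

```shell
#!/bin/sh
# Toy model of list handling; each function takes the current list first.

# rattr LIST NEW: discard LIST and use NEW as the whole list.
rattr() { printf '%s\n' "$2"; }

# aattr LIST ENTRY: append one name=value entry.
aattr() {
    if [ -z "$1" ]; then printf '%s\n' "$2"; else printf '%s,%s\n' "$1" "$2"; fi
}

# mattr LIST ENTRY: change the entry with the same name, keep the rest.
mattr() (
    name=${2%%=*}
    printf '%s\n' "$1" | tr ',' '\n' |
        awk -F= -v name="$name" -v new="$2" \
            '{ if ($1 == name) print new; else print }' |
        paste -s -d, -
)

# dattr LIST NAME: remove the entry called NAME, keep the rest.
dattr() (
    printf '%s\n' "$1" | tr ',' '\n' |
        awk -F= -v name="$2" '$1 != name' | paste -s -d, -
)

# Replay the four examples in order.
list=$(rattr "" "scratch1=1000M,long=2")
list=$(aattr "$list" "short=4")
list=$(mattr "$list" "scratch1=500M")
list=$(dattr "$list" long)
echo "$list"    # prints scratch1=500M,short=4
```

In this model only rattr discards entries that it was not asked about, which is why the first example can set scratch1 and long in a single step while the later ones touch one entry each.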

Example – Adding a Queue to the List of Queues for a Checkpointing Environment
The following command adds tcf27-b011.q to the list of queues for the checkpointing environment sph:
% qconf -aattr ckpt queue_list tcf27-b011.q sph

Example – Changing the Number of Slots in a Parallel Environment
The following command changes the number of slots in the parallel environment make to 50:
% qconf -mattr pe slots 50 make

Targeting Queue Instances With the qselect Command

The qselect command outputs a list of queue instances. If you specify options, qselect lists only the queue instances that match the criteria that you specify. You can use qselect in combination with the qconf command to target specific queue instances that you want to modify.

Example – Listing Queues
The following command lists all queue instances on Linux machines:
% qselect -l arch=glinux
The following command lists all queue instances on machines with two CPUs:
% qselect -l num_proc=2
The following command lists all queue instances on all four-CPU 64-bit Solaris machines:
% qselect -l arch=solaris64,num_proc=4
The following command lists queue instances that provide an application license. The queue instances were previously configured.
% qselect -l app_lic=TRUE

You can combine qselect with qconf to make wide-reaching changes with a single command line. To do this, put the entire qselect command inside backward quotation marks (` `) and use it in place of the queue-list variable on the qconf command line.
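When this substitution is used in a script, it can help to check what qselect returned before passing the result on. The following is a minimal sketch under stated assumptions: mattr_selected is a hypothetical helper name, and the guard exists only so that the script does not depend on how qconf reacts to an empty queue list.

```shell
#!/bin/sh
# mattr_selected ATTR VALUE RESOURCE-REQUEST (hypothetical helper)
# Sets a queue attribute on every queue instance that qselect reports
# for the given -l request, and refuses to run qconf if none match.
mattr_selected() (
    attr=$1
    value=$2
    request=$3
    instances=$(qselect -l "$request")
    if [ -z "$instances" ]; then
        echo "no queue instances match: $request" >&2
        exit 1
    fi
    # $instances is left unquoted on purpose so that each instance name
    # becomes its own argument, as it does with backward quotation marks.
    qconf -mattr queue "$attr" "$value" $instances
)
```

For instance, mattr_selected prolog /usr/local/scripts/sol_prolog.sh arch=solaris would issue the same qconf -mattr call as the backquoted form, but only when at least one queue instance matches.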

Example – Using qselect in qconf Commands
The following command sets the prolog script to sol_prolog.sh on all queue instances on Solaris machines:
% qconf -mattr queue prolog /usr/local/scripts/sol_prolog.sh `qselect -l arch=solaris`
The following command sets the attribute fluent_license to two on all queue instances on two-processor systems:
% qconf -mattr queue complex_values fluent_license=2 `qselect -l num_proc=2`

The most flexible way to automate the configuration of queue instances is to use the qconf command with the qselect command. With the combination of these commands, you can build up your own custom administration scripts.

Using Files to Modify a Global Configuration or the Scheduler

To change a global configuration, use the qconf -mconf command. To change the scheduler, use the qconf -msconf command. Both of these commands open a temporary file in an editor. When you exit the editor, any changes that you save to this temporary file are processed by the system and take effect immediately. The editor used to open the temporary file is the editor specified by the EDITOR environment variable. If this variable is undefined, the vi editor is used by default.

You can use the EDITOR environment variable to automate the behavior of the qconf command. Change the value of this variable to point to an editor program that modifies a file whose name is given by the first argument. After the editor modifies the temporary file and exits, the system reads in the modifications, which take effect immediately.
You can use this technique with any qconf -m... command. However, the technique is especially useful for administration of the scheduler and the global configuration, as you cannot automate the procedure in any other way.

Example – Modifying the Schedule Interval
The following example modifies the schedule interval of the scheduler:
#!/bin/ksh
# sched_int.sh: modify the schedule interval
# usage: sched_int.sh <n>, where <n> is
# the new interval, in seconds. n < 60
TMPFILE=/tmp/sched_int.$$
if [ -n "$MOD_SGE_SCHED_INT" ]; then
    # Nested call: qconf runs this script as the editor on file $1
    grep -v schedule_interval "$1" > "$TMPFILE"
    echo "schedule_interval 0:0:$MOD_SGE_SCHED_INT" >> "$TMPFILE"
    # sleep to ensure modification time changes
    sleep 1
    mv "$TMPFILE" "$1"
else
    # First call: register this script as the editor, then start qconf
    export EDITOR=$0
    export MOD_SGE_SCHED_INT=$1
    qconf -msconf
fi
This script sets the EDITOR environment variable to point to itself. The script then calls the qconf -msconf command, which invokes the script a second time as the editor. This nested invocation modifies the temporary file specified by the first argument and then exits. The Grid Engine system automatically reads in the changes, and the first invocation of the script terminates.
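The same self-as-editor pattern can be generalized to any "name value" line in the temporary file that a qconf -m... command opens. The following is a hedged sketch, not a tested administration tool: the function names rewrite_param and main and the environment variables SGE_EDIT_NAME and SGE_EDIT_VALUE are hypothetical, mktemp is assumed to be available, and each parameter is assumed to occupy a single line.

```shell
#!/bin/sh
# rewrite_param FILE NAME VALUE (hypothetical helper)
# Replaces the line whose first field is NAME with "NAME VALUE".
rewrite_param() (
    tmp=$(mktemp "${TMPDIR:-/tmp}/set_param.XXXXXX") || exit 1
    awk -v name="$2" '$1 != name' "$1" > "$tmp" || exit 1
    printf '%s %s\n' "$2" "$3" >> "$tmp"
    # As in sched_int.sh: make sure the modification time changes.
    sleep 1
    mv "$tmp" "$1"
)

# main QCONF-OPTION NAME VALUE   (first call, started by the administrator)
# main FILE                      (nested call, started by qconf as the editor)
main() {
    if [ -n "$SGE_EDIT_NAME" ]; then
        rewrite_param "$1" "$SGE_EDIT_NAME" "$SGE_EDIT_VALUE"
    else
        EDITOR=$0 SGE_EDIT_NAME=$2 SGE_EDIT_VALUE=$3 qconf "$1"
    fi
}

# Do nothing when the file is merely sourced without arguments.
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

Saved under a placeholder name such as set_param.sh, the sketch would be called as set_param.sh -msconf schedule_interval 0:0:30 to reproduce the effect of sched_int.sh 30.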