This page is designed to enable you to easily export all of the Grid Engine documentation to a PDF or Word document. It includes all of the Grid Engine documentation in the same order it is presented on the wiki.
To print a version of this document, log in to wikis.sun.com, click Tools, then select Export to PDF or Export to Word.
Grid Engine Documentation
|
Introducing the Sun Grid Engine System
A grid is a collection of computing resources that perform tasks. In its simplest form, a grid appears to users as a large system that provides a single point of access to powerful distributed resources. In its more complex form, a grid can provide many access points to users.
The Sun Grid Engine software enables you to apply resource management strategies to distribute jobs across a grid. Users can submit millions of jobs at a time without being concerned about where the jobs run. The system supports clusters with up to 63,000 cores.
Sites configure the system to maximize usage and throughput, while the system supports varying levels of timeliness and importance. Job priority and user share are instances of importance.
The Sun Grid Engine software provides advanced resource management and policy administration for UNIX and Windows environments that are composed of multiple shared resources. For more on Sun Grid Engine's features, see the product site.
To familiarize yourself more with the product and the wiki, explore the resources below:
| Topic | Description |
|---|---|
| How the System Operates | Familiarize yourself with the Sun Grid Engine System components. |
| How Resources Are Matched to Requests | Learn how the Grid Engine system matches resources to requests, what jobs and queues are, and how usage policies assist in managing workload. |
| Choosing a User Interface | Learn about Sun Grid Engine's graphical user interface (QMON), the command-line interface, and the Distributed Resource Management Application API. |
| Users and User Categories | Learn about Sun Grid Engine's user categories: managers, operators, users, and owners. |
Using the Wiki
| Topic | Description |
|---|---|
| Using the Wiki | Learn how to monitor documentation changes, how to print the wiki, and and the Wiki FAQ. |
To print this section, see the Getting Started Guide (Printable).
How the System Operates
The Grid Engine system does the following:
- Accepts jobs. Jobs are users' requests for computer resources. Each job includes a description of what to do and a set of property definitions that that describe how the job should be run. Users can submit jobs via the command line interface or Grid Engine's graphical user interface, QMON. Users can also use the optional Distributed Resource Management Application API (DRMAA) to automate grid engine functions by writing scripts to submit and control jobs.
- Holds jobs. The Sun Grid Engine master daemon holds jobs until the needed compute resources become available.
- Sends jobs. When the compute resources become available, the master daemon sends the job to the appropriate execution host. The execution daemon on that host then executes the job.
- Manages running jobs. The master daemon manages running jobs. At a fixed interval, the master daemon receives reports from each execution daemon.
- Logs the record of job execution when the jobs are finished. The master daemon stores raw data. Users can also use the Accounting and Reporting Console (ARCo) to gather live reporting data from the Grid Engine system and to store the data for historical analysis in the reporting database, which is a standard SQL database.

| Component | Description | More Info |
|---|---|---|
| Cluster | A collection of machines, called hosts, on which Grid Engine system functions occur. | See Configuring Clusters. |
| Master Host | The master host is central to cluster activity. The master host runs the master daemon and usually also runs the scheduler. The master host requires no further configuration other than that performed by the installation procedure. By default, the master host is also an administration host and a submit host. | For information about how to initially set up the master host, see How to Install the Master Host. For information about how to configure dynamic changes to the master host, see Configuring Hosts. |
| Master Daemon | The master daemon does the following:
|
See Configuring Hosts. |
| Execution Host | Systems that have permission to run Grid Engine system jobs. These systems host queue instances, and run the execution daemon. Execution hosts are systems that have permission to execute jobs. Therefore, queue instances are attached to the execution hosts. | An execution host is initially set up by the installation procedure, as described in How to Install Execution Hosts. For installation planning guidance, see Host System Requirements. See Configuring Hosts for more information on managing your cluster. |
| Execution Daemon | The execution daemon receives jobs from the master daemon and executes them locally on its host. An execution daemon is responsible for the queue instances on its host and for the running of jobs in these queue instances. Periodically, the execution daemon forwards information such as job status or load on its host to the master daemon. | See Configuring Hosts. |
| Scheduler | The scheduler is responsible for prioritizing pending jobs and deciding which jobs to schedule to which resources. | For more information on the scheduler, see Managing the Scheduler. |
| Administration Host | Administration hosts are hosts that have permission to carry out any kind of administrative activity for the Grid Engine system. | See Configuring Hosts. |
| Submit Host | Submit hosts enable users to submit and control batch jobs only. In particular, a user who is logged in to a submit host can submit jobs with the qsub command, can monitor the job status with the qstat command, and can use the Grid Engine system OSF/1 Motif graphical user interface QMON, which is described in QMON, the Grid Engine System's Graphical User Interface. | See Configuring Hosts. |
| Shadow Master Host | Shadow master hosts reduce unplanned cluster downtown. One or more shadow master hosts may be running on additional nodes in a cluster. In the case that the master daemon or the host on which it is running fails, one of the shadow masters will promote the host on which it is running to the new master daemon system by locally starting a new master daemon. | See How to Configure Shadow Master Hosts. |
| DRMAA | The optional Distributed Resource Management Application API (DRMAA) automates Sun Grid Engine functions by writing scripts that run Sun Grid Engine commands and parse the results. | See Automating Grid Engine Functions Through DRMAA. |
| ARCo | The optional Accounting and Reporting Console (ARCo) enables you to gather live reporting data from the Grid Engine system and to store the data for historical analysis in the reporting database, which is a standard SQL database. | For more information, see Accounting and Reporting Console. |
| SDM | The optional Service Domain Manager (SDM) module distributes resources between different services according to configurable Service Level Agreements (SLAs). The SLAs are based on Service Level Objectives (SLOs). SDM functionality enables you to manage resources for all kind of scalable services. | See Service Domain Manager for more information. |
How Resources Are Matched to Requests
A Banking Analogy
As an analogy, imagine a large "money-center" bank in one of the world's capital cities. In the bank's lobby are dozens of customers waiting to be served. Each customer has different requirements. One customer wants to withdraw a small amount of money from his account. Arriving just after him is another customer, who has an appointment with one of the bank's investment specialists. She wants advice before she undertakes a complicated venture. Another customer in front of the first two customers wants to apply for a large loan, as do the eight customers in front of her.
Different customers with different needs require different types of service and different levels of service from the bank. Perhaps the bank on this particular day has many employees who can handle the one customer's simple withdrawal of money from his account. But at the same time the bank has only one or two loan officers available to help the many loan applicants. On another day, the situation might be reversed.
The effect is that customers must wait for service unnecessarily. Many of the customers could receive faster service if only their needs were immediately recognized and then matched to available resources.
If the Grid Engine system were the bank manager, the service would be organized differently:
- On entering the bank lobby, customers would be asked to declare their name, their affiliations, and their service needs.
- Each customer's time of arrival would be recorded.
- Based on the information that the customers provided in the lobby, the bank might serve the following customers in the following order:
- Customers whose needs match suitable and immediately available resources
- Customers whose requirements have the highest priority
- Customers who were waiting in the lobby for the longest time
- In a "Grid Engine system bank," one bank employee might be able to help several customers at the same time. The Grid Engine software would try to assign new customers to the least-loaded and most-suitable bank employee.
- As bank manager, the Grid Engine system would allow the bank to define service policies. Typical service policies might be the following:
- To provide preferential service to commercial customers because those customers generate more profit
- To make sure a certain customer group is served well, because those customers have received bad service in the past
- To ensure that customers with an appointment get a timely response
- To provide preferential treatment to certain customers because those customers were identified by a bank executive as high priority customers
- These policies would be implemented, monitored, and adjusted automatically by a Grid Engine system manager. Customers that have preferential access would be served sooner. Such customers would receive more attention from employees. The Grid Engine manager would recognize if the customers do not make progress. The manager would immediately respond by adjusting service levels to comply with the bank's service policies.
Jobs and Queues
In a Grid Engine system, jobs correspond to bank customers. Jobs wait in a computer holding area instead of a lobby. Queues, which provide services for jobs, correspond to bank employees. As in the case of bank customers, the requirements of each job, such as available memory, execution speed, available software licenses, and similar needs, can be very different. Only certain queues might be able to provide the corresponding service.
To continue the analogy, the Grid Engine software arbitrates available resources and job requirements in the following way:
- A user who submits a job through the Grid Engine software declares a requirement profile for the job. In addition, the software retrieves the identity of the user. The software also retrieves the user's affiliation with projects or user groups. The time that the user submitted the job is also stored.
- The moment that a queue is available to run a new job, the Grid Engine software determines what are the suitable jobs for the queue. The software immediately dispatches the job that has either the highest priority or the longest waiting time.
- Queues allow concurrent execution of many jobs. The Grid Engine software tries to start new jobs in the least loaded and most suitable queue.
Usage Policies
The administrator of a cluster can define high-level usage policies that are customized according to the site. Four usage policies are available:
- Urgency – Using this policy, each job's priority is based on an urgency value. The urgency value is derived from the job's resource requirements, the job's deadline specification, and how long the job waits before it is run.
- Functional – Using this policy, an administrator can provide special treatment because of a user's or a job's affiliation with a certain user group, project, and so forth.
- Share-based – Under this policy, the level of service depends on an assigned share entitlement, the corresponding shares of other users and user groups, the past usage of resources by all users, and the current presence of users within the system.
- Override – This policy requires manual intervention by the cluster administrator, who modifies the automated policy implementation.
Policy management automatically controls the use of shared resources in the cluster to best achieve the goals of the administration. High priority jobs are dispatched preferentially. Such jobs receive higher CPU entitlements if the jobs compete for resources with other jobs. The Grid Engine software monitors the progress of all jobs and adjusts their relative priorities correspondingly and with respect to the goals defined in the policies.
Using Tickets to Administer Policies
The functional, share-based, and override policies are defined through a Grid Engine concept that is called tickets. You might compare tickets to shares of a public company's stock. The more shares of stock that you own, the more important you are to the company. If shareholder A owns twice as many shares as shareholder B, A also has twice the votes of B. Therefore shareholder A is twice as important to the company. Similarly, the more tickets that a job has, the more important the job is. If job A has twice the tickets of job B, job A is entitled to twice the resource usage of job B.
Jobs can retrieve tickets from the functional, share-based, and override policies. The total number of tickets, as well as the number retrieved from each ticket policy, often changes over time.
The administrator controls the number of tickets that are allocated to each ticket policy in total. Just as ticket allocation does for jobs, this allocation determines the relative importance of the ticket policies among each other. Through the ticket pool that is assigned to particular ticket policies, the Grid Engine software can run in different ways. For example, the software can run in a share-based mode only. Or the software can run in a combination of modes, for example, 90% share-based and 10% functional.
Using the Urgency Policy to Assign Job Priority
The urgency policy can be used in combination with two other job priority specifications:
- The number of tickets assigned by the functional, share-based, and override policies
- A priority value specified by the qsub -p command
A job can be assigned an urgency value, which is derived from three sources:
- The job's resource requirements
- The length of time that a job must wait before the job runs
- The time at which a job must finish running
The administrator can separately weight the importance of each of these sources to arrive at a job's overall urgency value. For more information, see Managing Policies.
The following figure shows the correlation among policies in a Grid Engine system.

Choosing a User Interface
To meet the needs of your environment, the following interface tools are available:
QMON - The Graphical User Interface
If you prefer using a graphical user interface, you can use QMON to accomplish most Grid Engine system tasks. The QMON Main Control window, which is show below, is often the starting point for user and administrator functions.

For more information on QMON if you are an administrator, see Interacting With Sun Grid Engine as an Administrator.
For more information on QMON if you are an user, see Interacting With Sun Grid Engine as a User.
The Command Line Interface
If you prefer using the command line, the command line user interface includes a flexible a set of ancillary programs (commands) that enable you to interact with the Sun Grid Engine system.
For more information on the command line if you are an administrator, see Interacting With Sun Grid Engine as an Administrator.
For more information on the command line if you are an user, see Interacting With Sun Grid Engine as a User.
For information on the ancillary programs that Sun Grid Engine provides and which users have access to these commands, see Command Line Interface Ancillary Programs.
The Distributed Resource Management Application API (DRMAA)
You can automate Sun Grid Engine functions by writing scripts that run Sun Grid Engine commands and parse the results. However, for more consistent and efficient results, you can use the Distributed Resource Management Application API (DRMAA). For more information about the DRMAA concept and how to use it with the C and Java TM languages, see Automating Grid Engine Functions Through the Distributed Resource Management Application API.
Command Line Interface Ancillary Programs
For more information on available ancillary programs, see the Grid Engine Man Pages.
List of Ancillary Programs
The Grid Engine system provides the following set of ancillary programs:
| Program | Description |
|---|---|
| qacct | Extracts arbitrary accounting information from the cluster log file. For more information, see Generating Accounting Statistics. |
| qalter | Changes the attributes of submitted but pending jobs. |
| qconf | Provides the user interface for cluster configuration and queue configuration. For more information, see Using qconf. |
| qdel | Enables a user to delete one or more jobs. A manager or operator can delete jobs belonging to any user, while regular users can only delete their own jobs. For more information, see How to Control Jobs. |
| qhold | Holds back submitted jobs from execution. |
| qhost | Displays status information about execution hosts. |
| qlogin | Initiates a login session with automatic selection of a low-loaded, suitable host. |
| qmake | A replacement for the standard UNIX make facility. qmake extends make by its ability to distribute independent make steps across a cluster of suitable machines. For more information, see Parallel Makefile Processing With qmake. |
| qmod | Enables the owner to suspend or enable a queue. All currently active processes that are associated with this queue are also signaled. For more information, see How to Monitor and Control Queues and How to Control Jobs. |
| qmon | Provides an X Windows Motif command interface and monitoring facility. |
| qping | Checks application status of Sun Grid Engine daemons. |
| qquota | Shows current usage of Sun Grid Engine resource quotas. For more information, see How to Monitor Resource Quota Utilization From the Command Line. |
| qrdel | Deletes Sun Grid Engine advance reservations. For more information, see How to Configure Advance Reservations From the Command Line. |
| qresub | Creates new jobs by copying jobs that are running or pending. |
| qrls | Releases jobs from holds that were previously assigned to them, for example, through qhold. |
| qrsh | Can be used for various purposes, such as the following:
|
| qrstat | Shows the status of Sun Grid Engine advance reservations. For more information, see How to Configure Advance Reservations From the Command Line. |
| qrsub | Submits an advance reservation to Sun Grid Engine. For more information, see How to Configure Advance Reservations From the Command Line. |
| qselect | Prints a list of queue names corresponding to specified selection criteria. The output of qselect is usually sent to other Grid Engine system commands to apply actions on a selected set of queues. |
| qsh | Opens an interactive shell in an xterm on a lightly loaded host. Any kind of interactive jobs can be run in this shell. For more information, see How to Submit Interactive Jobs From the Command Line. |
| qstat | Provides a status listing of all jobs and queues associated with the cluster. For more information, see How to Monitor Jobs From the Command Line. |
| qsub | The user interface for submitting batch jobs to the Grid Engine system. |
| qtcsh | A fully compatible replacement for the widely known and used UNIX C shell (csh) derivative, tcsh. qtcsh provides a command shell with the extension of transparently distributing execution of designated applications to suitable and lightly loaded hosts through Grid Engine software. For more information see, Transparent Job Distribution With qtcsh. |
User Access to the Ancillary Programs
The following table shows the command capabilities that are available to the different user categories:
| Command | Manager | Operator | Owner | User |
|---|---|---|---|---|
| qacct | Full | Full | Own jobs only | Own jobs only |
| qalter | Full | Full | Own jobs only | Own jobs only |
| qconf | Full | No system setup modifications | Show only configurations and access permissions | Show only configurations and access permissions |
| qdel | Full | Full | Own jobs only | Own jobs only |
| qhold | Full | Full | Own jobs only | Own jobs only |
| qhost | Full | Full | Full | Full |
| qlogin | Full | Full | Full | Full |
| qmod | Full | Full | Own jobs and owned queues only | Own jobs only |
| qmon | Full | No system setup modifications | No configuration changes | No configuration changes |
| qrexec | Full | Full | Full | Full |
| qselect | Full | Full | Full | Full |
| qsh | Full | Full | Full | Full |
| qstat | Full | Full | Full | Full |
| qsub | Full | Full | Full | Full |
Users and User Categories
There are four categories of users that each have access to their own set of Grid Engine system commands:
- Managers – Managers have full capabilities to manipulate the Grid Engine system. By default, the superusers of all administration hosts have manager privileges.
- Operators – Users who can perform the same commands as managers except that they cannot change the configuration. Operators are supposed to maintain operation.
- Users – People who can submit jobs to the grid and run them if they have a valid login ID on at least one submit host and one execution host. Users have no cluster management or queue management capabilities.
- Owners – Users who can suspend or resume and disable or enable the queues they own. Typically, users are owners of the queue instances that reside on their workstations. Queue owners can be managers, operators, or users. Users are commonly declared to be owners of the queue instances that reside on their desktop workstations. See How to Configure Owners Parameters With QMON for more information.
For information on which command capabilities are available to the different user categories, see Command Line Interface Ancillary Programs.
Using the Wiki
Why a Wiki?
Since the release of Sun Grid Engine 6.2, all documentation for the product can be found on wikis.sun.com. This was done for the following reasons:
- To keep the documentation as up-to-date as possible. The product team and select community members can edit the information in real time, which ensures that the information will stay as current as possible.
- To encourage community input. Anyone can comment on the documentation. Let us know what you think.
- To build a library of community contributions. The simple wiki markup language makes it easy for anyone in the Grid Engine community to add their own input to the wiki. See the Expert Advice section to add your expertise or to survey current contributions.
Wiki Tasks
| Topic | Description |
|---|---|
| How to Monitor Documentation Changes | Learn how to monitor documentation changes using email notifications or RSS feeds. |
| How to Add a Comment | Learn how to add a comment to any page in the Sun Grid Engine wiki space. |
| How to Add a Page | Learn how to add pages to the Expert Advice section. |
| How to Print | Learn how to print what you need from the Sun Grid Engine wiki. |
Questions?
Do you have any questions about how to use the wiki? Please comment on this page with your question or send us an email.
How to Add a Comment
- Log in to wikis.sun.com.
- On the bottom of the page on which you would like to comment, Click Add Comment.
The comment box is displayed.
- Enter your comment.
- Click Post to save and publish your comment.
Click Cancel to close the comment box without saving your comment.
The page How to Add a Page does not exist.
How to Monitor Documentation Changes
| If you would like to watch a parent page and all of its child pages, you must set watches individually on the parent page and each of its child pages. |
| Function | Procedure |
|---|---|
| Setting up email notifications for the grid engine space |
|
| Removing email notifications for the grid engine space | Follow the above directions but instead of clicking "Start watching this space," click "Stop watching this space." |
| Setting up email notifications for a page |
|
| Removing email notifications for a page | Click Tools and then click Watch. The envelope icon will change back to white to indicate the watch has been removed. |
| Review your email notifications |
|
| Setting up RSS notifications of documentation changes |
|
How to Print
You can print a specific page, a section, or a PDF copy of the entire wiki. See below for guidance on how to print what you need.
How to Print a Page
If you would like to print a specific page, do the following:
- Click on Tools in the upper left hand corner of the page that you would like to print.
- Select either 'Export to PDF' or 'Export to Microsoft Word.'
- Print from the application that you selected.
How to Print a Section of the Documentation
If you would like to print one of the major sections, select one of the following printables and then print directly from your browser:
- Getting Started Guide (Printable)
- Planning the Installation Guide (Printable)
- Installing Guide (Printable)
- Upgrading Guide (Printable)
- Administering Guide (Printable)
- Using Guide (Printable)
- Service Domain Manager Guide (Printable)
- Accounting and Reporting Console Guide (Printable)
How to Print the Entire Documentation Set
For PDF copy of the entire documentation set, click here.
| This PDF snapshot was taken on March 3, 2009, the day that Grid Engine 6.2u2 was released. |
|
Before you install the Grid Engine software, you must plan how to achieve the results that fit your environment. This section helps you make the decisions that affect the rest of the procedure.
System Requirements
| Topic | Description |
|---|---|
| System Requirements | Review Grid Engine's disk space requirements and supported operating platforms. |
Installation Considerations
| Topic | Description |
|---|---|
| Planning Checklist | Document your installation plan. |
| Cluster Design | Determine whether to set up your Grid Engine system as a single cluster or a collection of loosely coupled clusters. |
| Queue Structure | Create a default cluster queue structure. |
| Host System Requirements | Learn about different hosts system requirements. |
| User Account Considerations | Determine whether you have the required permissions to submit jobs on the desired execution hosts. |
| Spooling Options | Determine whether you want to use classic spooling or Berkeley DB spooling during installation |
| Scheduler Profiles | Learn about the three scheduler profiles during installation for Grid Engine tuning. |
| Network Services | Determine whether to define network services as an NIS file or as local to each workstation in /etc/services. |
| Installation Methods | Learn about the different methods available for installing Grid Engine. |
| Directory Organization | Determining the directory organization for Grid Engine. |
| Other Installation Issues | If you are installing Grid Engine on a Linux system or on a system with IPMP, see Other Grid Engine Installation Issues for important information. |
Getting the Software
| Topic | Description |
|---|---|
| Getting the Software | Learn how to get your copy of the Grid Engine software. |
To print this section, see the Planning the Installation Guide (Printable).
System Requirements
To verify that the systems on which you intend to install Sun Grid Engine conform to required hardware and software specifications, review the system requirements listed below.
Disk Space Requirements
The Grid Engine software directory tree has the following fixed disk space requirements:
- 50 Mbytes for the installation files without any binaries
- Between 60 and 100 Mbytes for each set of binaries
The ideal disk space for Grid Engine system spool directories is as follows:
- 50-200 Mbytes for the master host spool directories
- 50-200 Mbytes for the Berkeley DB spool directories
The spool directories of the master host and of the execution hosts are configurable and need not reside under the default location, sge-root.
| Note You must satisfy several Windows platform-specific prerequisites before you can install Grid Engine on hosts that are running the Windows operating system. You might need to install additional software on your computer which might require additional disk space. See Microsoft Services For UNIX and Microsoft Subsystem for UNIX-based Applications. |
Supported Operating Platforms
The Sun Grid Engine 6.2 software supports the following operating systems and platforms:
- Solaris 10, 9, and 8 Operating Systems (SPARC Platform Edition)
- Solaris 10 and 9 Operating Systems (x86 Platform Edition)
- Solaris 10 Operating System (x64 Platform Edition)
- Apple Mac OS X 10.4 (Tiger), PPC platform
- Apple Mac OS X 10.4 (Tiger), x86 platform
- Apple Mac OS X 10.5 (Leopard), x86 platform
- Hewlett Packard HP-UX 11.00 or higher, 32 bit
- Hewlett Packard HP-UX 11.00 or higher, 64 bit (including HP-UX on IA64)
- IBM AIX 5.1, 5.3
- Linux x86, kernel 2.4, 2.6, glibc >= 2.3.2
- Linux x64, kernel 2.4, 2.6, glibc >= 2.3.2
- Linux IA64, kernel 2.4, 2.6, glibc >= 2.3.2
- Microsoft Windows Server 2003, Windows XP Professional with at least Service Pack 1, Windows 2000 Server with at least Service Pack 3, or Windows 2000 Professional with at least Service Pack 3
- Microsoft Windows Server 2003 Release 2, Windows Server 2008, Windows Vista Enterprise, Windows Vista Ultimate
Planning Checklist
Before you install the Grid Engine software, you must plan how to achieve the results that fit your environment. This section helps you make the decisions that affect the rest of the procedure. Write down your installation plan in a table similar to the following example.
| Parameter | Value |
|---|---|
| $SGE_ROOT directory | |
| Cell name | |
| Administrative user | |
| sge_qmaster port number (6444 is recommended) | |
| sge_execd port number (6445 is recommended) | |
| Master host | |
| Shadow master hosts | |
| Execution hosts | |
| Spooling for each execution host (global or local) | |
| Windows execution hosts (yes or no) | |
| Administration hosts | |
| Submit hosts | |
| Group ID range for jobs | |
| Spooling mechanism (Berkeley DB or Classic spooling) | |
| Berkeley DB server host (the master or another host) | |
| Berkeley DB spooling directory on the database server | |
| Scheduler tuning profile (Normal, High, Max) | |
| Installation method (interactive, secure, automated, or upgrade) |
If you are going to install Grid Engine 6.2 on Microsoft Windows Server 2003, Windows XP Professional with at least Service Pack 1, Windows 2000 Server with at least Service Pack 3, or Windows 2000 Professional with at least Service Pack 3, acquire and install Microsoft Services For UNIX. See Microsoft Services for UNIX for more information.
If you are going to install Grid Engine 6.2 on Microsoft Windows Server 2003 Release 2, Windows Server 2008, Windows Vista Enterprise or Windows Vista Ultimate, acquire and install Microsoft Subsystem for UNIX-based Applications. See Microsoft Subsystem for UNIX-based Applications for more information.
If you are going to install Grid Engine 6.2 on a Windows system, create the required Certificate Security Protocol (CSP) certificates before installing Grid Engine. See How to Install a CSP-Secured System for information about CSP certificates.
Check Other Grid Engine Installation Issues for applicability.
Cluster Design
Cells
You can set up the Grid Engine system as a single cluster or as a collection of loosely coupled clusters called cells. The $SGE_CELL environment variable indicates the cluster being referenced. When the Grid Engine system is installed as a single cluster, $SGE_CELL is not set, and the value default is assumed for the cell value.
Cluster Name
The $SGE_CLUSTER_NAME environment variable supports unique naming of the cluster. Unlike the $SGE_CELL variable, there are restrictions on $SGE_CLUSTER_NAME. If you decide to use Grid Engine SMF services on Solaris 10 or later hosts, you must select a new $SGE_CLUSTER_NAME. This name becomes part of the name of the Sun Grid Engine SMF services. The $SGE_CLUSTER_NAME is also used to distinguish multiple rc files for different clusters.
| note If your $SGE_CELL name already reflects the desired cluster name and also satisfies $SGE_CLUSTER_NAME restrictions, set the cluster name to the $SGE_CELL value. Otherwise, the proposed default value is pSGE_QMASTER_PORT, which uniquely identifies the running cluster by the port on which its qmaster daemon is running. See SMF administration for more information. |
Queue Structure
The installation procedure creates a default cluster queue structure, which is suitable for getting acquainted with the system. The default queue can be removed after installation.
| Note No matter what directory is used for the installation of the software, the administrator can change most settings that were created by the installation procedure. This change can be made while the system is running. |
Consider the following when determining a queue structure:
- Whether you need cluster queues for sequential, interactive, parallel, and other job types
- Which queue instances to put on which execution hosts
- How many job slots are needed in each queue
For more detailed information on administering cluster queues, see Configuring Queues.
Host System Requirements
Master Host
The master host controls the Grid Engine system. This host runs the master daemon sge_qmaster.
The master host must comply with the following requirements:
- The host must be a stable platform.
- The host must not be excessively busy with other processing.
- At least 60 to 120 Mbytes of unused main memory must be available to run the Grid Engine system daemons. For very large clusters that include many hundreds or thousands of hosts and tens of thousands of jobs in the system at any time, 1 GByte or more of unused main memory might be required and 2 CPUs might be beneficial.
- The master host must be installed before shadow master execution, administration, or submit hosts.
- (Optional) The Grid Engine software directory, $SGE_ROOT, should be installed locally to cut down on network traffic.

Note
Windows hosts cannot act as master hosts.
For more information, see How to Install the Master Host.
Shadow Master Hosts
These hosts back up the functionality of sge_qmaster in case the master host or the master daemon fails. To be a shadow master host, a machine must have the following characteristics:
- It must run sge_shadowd.
- It must share sge_qmaster status, job information, and queue configuration information that is logged to disk. In particular, the shadow master hosts need read/write root or administration user access to the sge_qmaster spool directory and to the $SGE_ROOT/$SGE_CELL/common directory.
- The $SGE_ROOT/$SGE_CELL/common/shadow_masters file must contain a line defining the host as a shadow master host.

Note
If no cell name is specified during installation, the value of $SGE_CELL is default.
The shadow master host facility is activated for a host as soon as these conditions are met. You do not need to restart the Grid Engine system daemons to make a host into a shadow master host.
| Note Windows hosts cannot act as shadow master hosts. |
For more information, see Configuring a Shadow Master Host From the Command Line.
Execution Hosts
Execution hosts run the jobs that users submit to the Grid Engine system. An execution host must first be set up as an administration host. You run an installation script on each execution host. For more information, see How to Install Execution Hosts.
Administration Hosts
Operators and managers of the Grid Engine system use administration hosts to perform administrative tasks such as reconfiguring queues or adding Grid Engine users.
The master host installation script automatically makes the master host an administration host. During the master host installation process, you can add other administration hosts. You can also manually add administration hosts on the master host at any time after installation.
Submit Hosts
Jobs can be submitted and controlled from submit hosts. The master host installation script automatically makes the master host a submit host.
Spooling Options
During the installation, you are given the option to choose between classic spooling and Berkeley DB spooling. If you choose Berkeley DB spooling, you are then given the option to spool to a local directory or to a separate host, known as a Berkeley DB spooling server.
Using a Berkeley DB spooling server might provide better performance than classic spooling. Part of this performance increase is because the master host can make non-blocking writes to the database, but has to make blocking writes to the text file used by classic spooling. Also consider file format and data integrity. Writing to the Berkeley DB provides a greater level of data integrity than writing to a text file. However, a text file stores data in a format that you can read and edit. Normally, you do not need to read these files, but the spooling directory contains the messages from the system daemons, which can be useful for debugging.
Database Server and Spooling Host
The master host can store its configuration and state to a Berkeley DB spooling database. The spooling database can be installed on the master server or on a separate host. When the Berkeley DB spools into a local directory on the master host, the performance is better. If you want to set up a shadow master host, you need to use a separate Berkeley DB spooling server (host). In this case, you have to choose a host with a configured RPC service. The master host connects through RPC to the Berkeley DB.
| Note This configuration does not provide a High-Availability (HA) solution. For example, scripts of pending jobs are not spooled through BDB spool server and thus are not available for a shadow master. |
With the introduction of NFS4 software available with the Solaris TM 10 operating system, you can use Berkeley DB spooling on a network file system. You could not use Berkeley DB spooling on previous NFS versions. This circumstance allows a shadow host installation spooled on Berkeley DB without setting up an additional Berkeley DB Spooling Server.
| Caution Although using a shadow master host is more reliable, using a separate Berkeley DB spooling host results in a potential security hole. RPC communication as used by the Berkeley DB can be easily compromised. Only use this alternative if your site is secure and if users can be trusted to access the Berkeley DB spooling host by means of TCP/IP communication. |
If you choose to use Berkeley DB spooling without a shadow master, you do not need to set up a separate spooling server. Likewise, if you choose not to use Berkeley DB spooling, you can set up a shadow master host without setting up a separate spooling server.
Once you determine whether you need a separate spooling server, you will also need to determine the location for the spooling directory. The spooling directory must be local to the spooling server. A default value for the location of the spooling directory is recommended during installation, but this default value is not suitable when the file server is different from the master host.
The requirements for the Berkeley DB spooling host are similar to the requirements for the master host:
- The host must be a stable platform.
- The host must not be excessively busy with other processing.
- At least 60 to 120 Mbytes of unused main memory must be available to run the Grid Engine system daemons. For very large clusters that include many hundreds or thousands of hosts and tens of thousands of jobs in the system at any time, one GByte or more of unused main memory might be required and two CPUs might be beneficial.
- (Optional) A separate spooling host must be installed before the master host.
- (Optional) The $SGE_ROOT directory should be installed locally, to cut down on network traffic.
Scheduler Profiles
You can choose from three scheduler profiles during the installation process: normal, high, and max. You can use these predefined profiles as a starting point for Grid Engine tuning.
Using these profiles, you can optimize the scheduler for one or more of the following:
- The amount of information that is tracked about a scheduling run
- The load adjustment during a scheduling run
- Interval scheduling (the default) or immediate scheduling
You can choose from three scheduler profiles:
- normal – This profile uses load adaptation and interval scheduling, and reports all the information that the scheduler gathers during the dispatch cycle. This profile is the starting point for most grids. Use this profile if your highest priority is gathering and reporting information about a scheduling run.
- high – This profile is more appropriate for a large cluster, where throughput is more important than gathering and reporting all the information from the scheduler. This profile also uses interval scheduling. Use this profile if you want to get better performance at the cost of getting less information about your scheduling runs.
- max – This profile disables all information gathering and reporting, enables immediate scheduling, and disables load adaptation. Immediate scheduling is very useful for sites with high throughput and very short running jobs. The advantage of immediate scheduling decreases as runtime of the jobs increases. This profile can be used in clusters of any size where only throughput is important and everything else is a lower priority.
For more information on how to configure scheduling, see Administering the Scheduler.
Network Services
Determine whether your site's network services are defined in an NIS database or in an /etc/services file that is local to each workstation. If your site uses NIS, determine the host name of your NIS server so that you can add entries to the NIS services map.
The Grid Engine system services are sge_execd and sge_qmaster. To add the services to your NIS map, choose reserved, unused port numbers. The following examples show sge_qmaster and sge_execd entries.
sge_qmaster 6444/tcp sge_execd 6445/tcp
Installation Methods
Several methods are available for installing the Grid Engine software:
- Interactive
- Interactive, with increased security
- Automated, using the inst_sge script and a configuration file
- Upgrade
To decide which installation method you should use, consider the following factors.
- Do you already have the Grid Engine software installed and running?
- If so, you will probably want to upgrade. The upgrade process is described in Upgrading From a Previous Version of the Software.
- If not, the master host installation is only done once. The master host is typically installed interactively, as described in Installing the Software From the Command Line.
- Do you need to install just a few execution hosts? If so, then you will probably want to install them interactively, as described in Installing the Software From the Command Line.
- Do you need to install a large number of execution hosts? If so, then you might want to perform automated installation, using the inst_sge script and a configuration file. See Using the inst_sge Utility and a Configuration Template.
- Do you require your grid to use encryption? If so, you have to perform an interactive installation with increased security. See Installing the Increased Security Features.
User Account Considerations
User Names
For the Grid Engine system to verify that users submitting jobs have permission to submit them on the desired execution hosts, users' names must be identical on the submit and execution hosts. You might therefore have to change user names on some machines, because Grid Engine user names map directly to system user accounts.
| Note User names on the master host are not relevant for permission checking. These user names do not have to match or even exist. |
Installation Accounts
You can install the Grid Engine software either as the root user or as an unprivileged user, for example, your own user account. However, if you install the software when you are logged in as an unprivileged user, the installation allows only that user to run Grid Engine jobs. Access is denied to all other accounts. Installing the software when you are logged in as root resolves this restriction. However, root permission is required for the complete installation procedure. Also, if you install as an unprivileged user, you are not allowed to use the qrsh, qtcsh, or qmake commands, nor can you run tightly integrated parallel jobs.
| Note To use SMF on Solaris 10 or later hosts and run the Grid Engine software as an unprivileged user, perform the following additional steps as root user (or user with appropriate permissions): For a local user:
For a distributed name service, such as NIS, NIS+, or LDAP:
|
File Access Permissions
If you install the software logged in as root, you might have a problem configuring root read/write access for all hosts on a shared file system. Therefore, you might have problems putting the $SGE_ROOT files onto a network-wide file system.
You can force Grid Engine software to run all Grid Engine system components through a non-root administrative user account, for example sgeadmin. With this setup, this particular user needs only read/write access to the shared $SGE_ROOT file system.
The installation procedure asks whether files should be created and owned by an administrative user account. If you answer "Yes" and provide a valid user name, files are created by this user. Otherwise, the user name under which you run the installation procedure is used. Create an administrative user, and answer "Yes" to this question.
Make sure in all cases that the account used for file handling on all hosts has read/write access to the $SGE_ROOT directory. Also, the installation procedure assumes that the host from which you access the Grid Engine software distribution media can write to the $SGE_ROOT directory.
Note
|
Directory Organization
When determining the directory organization, you must decide the following:
- The directory organization, for example, whether you will install a complete software tree on each workstation, cross-mounted directories, or a partial directory tree on some workstations.
- Where to locate each $SGE_ROOT root directory.
| Note Because changing the installation directory or the spool directories requires a new installation of the system, use extra care to select a suitable installation directory. Note that all important information from a previous installation can be preserved. |
By default, the installation procedure installs the Grid Engine software, man pages, spool areas, and the configuration files in a directory hierarchy under the installation directory as shown in the following figure. If you accept this default behavior, you should install or select a directory with the access permissions that are described in File Access Permissions.
Figure – Sample Directory Hierarchy

You can choose to put the spool areas in other locations during the primary installation. See Configuring Queues for more detailed instructions.
Spool Directories under the Root Directory
During the installation of the master host, you must specify the location of a spooling directory. This directory is used to spool jobs from execution hosts that do not have a local spooling directory.
| Note If you are using a Windows execution host, you must use the local spooling directory. |
- On the master host, spool directories are maintained under qmaster-spool-dir. The location of qmaster-spool-dir is defined during the master host installation process. The default value of qmaster-spool-dir is $SGE_ROOT/$SGE_CELL/spool/qmaster.
- On each execution host, a spool directory called execd-spool-dir is defined during the execution host installation processes. The default value of execd-spool-dir is $SGE_ROOT/$SGE_CELL/spool/exec-host. You will get better performance from execution hosts with local spooling directories than from execution hosts that have NFS mounted the master host's spooling directory.

Note
If no cell name is specified during installation, the value of $SGE_CELL is default.
You do not need to export these directories to other machines. However, exporting the entire $SGE_ROOT tree and making it write-accessible for the master host and all executable hosts makes administration easier.
| Note If you use a Lustre fileshare as the spool directory, you should disable file striping for these directories. Refer to the Lustre Operations Manual for information on how to disable file striping. |
Choosing Between Classic Spooling and Database Spooling
During the installation, you are given the option to choose between classic spooling and Berkeley DB spooling. If you choose Berkeley DB spooling, you are then given the option to spool to a local directory or to a separate host, known as a Berkeley DB spooling server.
Using a Berkeley DB spooling server might provide better performance than classic spooling. Part of this performance increase is because the master host can make non-blocking writes to the database, but has to make blocking writes to the text file used by classic spooling. Also consider file format and data integrity. Writing to the Berkeley DB provides a greater level of data integrity than writing to a text file. However, a text file stores data in a format that you can read and edit. Normally, you do not need to read these files, but the spooling directory contains the messages from the system daemons, which can be useful for debugging.
$SGE_ROOT Directory
You must create a directory into which to load the contents of the distribution media. This directory is called the root directory, or $SGE_ROOT. When the Grid Engine system is running, this directory stores the current cluster configuration and all other data that must be spooled to disk.
| Note For efficient spooling, place the spooling directories somewhere other than within $SGE_ROOT. |
Use a valid path name for the directory that is network-accessible on all hosts. For example, if the file system is mounted using automounter, set $SGE_ROOT to /usr/SGE6, not to /tmp_mnt/usr/SGE6.
| Note Throughout this information space, the $SGE_ROOT environment variable is used to refer to the directory into which the Sun Grid Engine software is installed. |
The $SGE_ROOT directory is the top level of the Grid Engine software directory tree. On startup, each Grid Engine software component in a cell needs read access to the $SGE_ROOT/$SGE_CELL/common directory. When Grid Engine software is installed as a single cluster, the value of $SGE_CELL is default.
For ease of installation and administration, this directory should be readable on all hosts on which you intend to run the Grid Engine software installation procedure. For example, you can select a directory that is available across a network file system, such as NFS. If you choose to select file systems that are local to the hosts, you must copy the installation directory to each host before you start the installation procedure for the particular machine. See File Access Permissions for a description of required permissions.
Getting the Software
The software is distributed through electronic download and on CD-ROM.
Electronic Download
Sun Grid Engine
To electronically download a copy of the Sun Grid Engine software, visit sun.com. The product distribution is in pkadd format for the Solaris Operating System (Solaris OS).
Grid Engine Open Source
If you would like to download a copy of the open source Grid Engine software, visit the download center.
CD-ROM Distribution
For information on how to access CD-ROMs, ask your system administrator or refer to your local system documentation. For instructions, see Loading the Distribution Files on a Workstation.
|
To effectively install Sun Grid Engine, perform the following tasks in the order that they are listed:
| Topic | Description |
|---|---|
| Release Notes | Learn about product enhancements that have been added since the release. |
| Planning the Installation | Strategically plan your installation to achieve results that fit your environment. |
| Loading the Distribution Files on a Workstation | Unpack and load the distribution files onto a workstation. |
| Installing the Software With the GUI Installer | Learn how to run the new GUI installer and install whole cluster. |
| Installing the Software From the Command Line | Learn how to run an installation script on the master host and on every execution host in the Grid Engine system and to register information about administration hosts and submit hosts. |
| Installing Security Features | Set up your system more securely. |
| Installing the Accounting and Reporting Console | Install the Accounting and Reporting Console, an optional feature that enables you to gather live reporting data from the Grid Engine system. |
| SDM Installation Overview | Install the Service Domain Manager module, an optional feature that distributes resources between different services according to configurable Service Level Agreements (SLAs). |
| Verifying the Installation | Verify that the daemon is running on the master host and on the Execution Hosts and how to run simple commands and submit test jobs. |
In addition, you might need to perform one or more related tasks:
| Topic | Description |
|---|---|
| Automating the Installation Process | Learn how to automate the Grid Engine installation process. |
| Installing SMF Services | Learn how to install the Service Management Facility (SMF) services. |
| Installing a JMX-Enabled System | Learn how to install a JMX-enabled system. |
| Removing the Software | Learn how to remove the Sun Grid Engine software. |
| Additional Software for the Microsoft Operating System | Learn how to install Sun Grid Engine on Microsoft Windows operating system. |
| User Management on Windows Hosts | Learn how to manage user accounts on Windows hosts. |
| Other Installation Issues | Learn how to identify additional considerations for installing Sun Grid Engine software. |
To print this section, see the Installing Guide (Printable).
Loading the Distribution Files on a Workstation
The Sun Grid Engine 6.2 software is distributed on CD-ROM and through electronic download. The CD-ROM distribution contains a directory named Sun_Grid_Engine_6_2. The product distribution is in this directory, in both tar.gz format and the pkgadd format. The pkgadd format is provided for the Solaris Operating System (Solaris OS). For all supported operating systems, the software is distributed in tar.gz format. For more on how to obtain the distribution files, see Getting the Software.
How to Load the Distribution Files on a Workstation
Ensure that the file systems and directories that are to contain the Grid Engine software distribution and the spool and configuration files are set up properly by setting the access permissions as defined in File Access Permissions.
Steps
- Provide access to the distribution media.
If you downloaded the software, rather than getting it on CD-ROM, just unzip the files into a directory. This directory must be located on a file system that has at least 350 MBytes free space.
- Log in to a system.
Log in preferably on a system that has a direct connection to a file server.
- Create the installation directory.
Create an installation directory as described in $SGE_ROOT Installation Directory.# mkdir /opt/sge6-2
In these instructions, the installation directory is abbreviated as sge-root.
- Install the binaries for all binary architectures that are to be used by any of your master, execution, and submit hosts in your Grid Engine system cluster.
You can use either the pkgadd method or the tar method.
pkgadd Method
The pkgadd format is provided for the Solaris Operating System. To facilitate remote installation, the pkgadd directories are also provided in zip files.
You can install the following packages:
| Package | Description |
|---|---|
| SUNWsgeec | Architecture independent files |
| SUNWsgeex | Solaris (SPARC platform) 64-bit binaries for Solaris 8, Solaris 9, and Solaris 10 Operating Systems |
| SUNWsgeei | Solaris (x86 platform) binaries for Solaris 8, Solaris 9, and Solaris 10 Operating Systems |
| SUNWsgeeax | Solaris (x64 platform) binaries for Solaris 10 Operating System |
| SUNWsgeea | Accounting and Reporting Console (ARCo) packages for the Solaris and Linux Operating systems. |
As you type the following commands, you must be prepared to respond to script questions about your base directory, sge-root, and the administrative user. The script requests the choices that you made during the planning steps of this installation. See Decisions That You Must Make for further details.
At the command prompt, type the following commands, responding to the script questions.
# cd cdrom_mount_point/Sun_Grid_Engine_6_2 # pkgadd -d ./Common/Packages SUNWsgeec
Depending on the Solaris binary that you need, type one of the following commands:
# pkgadd -d ./Solaris_sparc/Packages SUNWsgee # pkgadd -d ./Solaris_sparc/Packages SUNWsgeex # pkgadd -d ./Solaris_x86/Packages SUNWsgeei # pkgadd -d ./Solaris_x64/Packages SUNWsgeeax
tar Method
For all supported operating systems, the software is distributed in tar.gz format.
The following table contains files that you need to install, regardless of platform.
| File | Description |
|---|---|
| Common/tar/sge-6_2-common.tar.gz | Architecture independent files |
The tar files that contain platform-specific binaries use the naming convention of sge-6_2-bin-architecture.tar.gz.
The following table lists the platform-specific binaries. Install the file for each platform that you need to support. Note that each platform has its own directory under Sun_Grid_Engine_6_2.
| Platform-Specific File | Platform |
|---|---|
| Solaris_sparc/tar/sge-6_2-bin-solaris-sparcv9.tar.gz | Solaris (SPARC platform) 64-bit binaries for Solaris 8, Solaris 9, and Solaris 10 Operating Systems |
| Solaris_x86/tar/sge-6_2-bin-solaris-i586.tar.gz | Solaris (x86 platform) binaries for Solaris 8, Solaris 9, and Solaris 10 Operating Systems |
| Solaris_x64/tar/sge-6_2-bin-solaris-x64.tar.gz | Solaris (x64 platform) 64-bit binaries for Solaris 10 |
| Windows/tar/sge-6_2-bin-windows-x86.tar.gz | Microsoft Windows (x86 platform) 32-bit binaries for Windows 2000, XP and Windows Server 2003 |
| Linux24_i586/tar/sge-6_2-bin-linux24-i586.tar.gz | Linux (x86 platform) binaries for the 2.4 and 2.6 kernel |
| Linux24_amd64/tar/sge-6_2-bin-linux24-ia64.tar.gz | Linux (Itanium platform) binaries for the 2.4 and 2.6 kernel |
| Linux24_amd64/tar/sge-6_2-bin-linux24-x64.tar.gz | Linux binaries for the 2.4 and 2.6 kernel |
| MacOSX/tar/sge-6_2-bin-darwin-ppc.tar.gz | Apple Mac OS/X (PowerPC platform) |
| MacOSX/tar/sge-6_2-bin-darwin-x64.tar.gz | Apple Mac OS/X (Intel-based platform) |
| HPUX11/tar/sge-6_2-bin-hp11.tar.gz | Hewlett-Packard HP-UX 11 or higher |
| HPUX11/tar/sge-6_2-bin-hp11-64.tar.gz | 64-bit binaries for Hewlett-Packard HP-UX 11 or higher |
| Aix43/tar/n1ge-6_1-bin-aix51.tar.gz | IBM AIX 5.1 and 5.3 |
Type the following commands at the command prompt. In the example, <basedir> is the abbreviation for the full directory, cdrom-mount-point/Sun_Grid_Engine_6_2.
% su # cd <sge-root> # gzip -dc <basedir>/Common/tar/sge-6_2-common.tar.gz | tar xvpf - # gzip -dc <basedir>/Solaris_sparc/tar/sge-6_2-bin-solsparc32.tar.gz | tar xvpf - # gzip -dc <basedir>/Solaris_sparc/tar/sge-6_2-bin-solsparc64.tar.gz | tar xvpf - # SGE_ROOT=<sge-root>; export SGE_ROOT # util/setfileperm.sh $SGE_ROOT
Installing the Software With the GUI Installer
Sun Grid Engine 6.2u2 comes with a new GUI installer to simplify the installation process. The GUI installer enables you to easily install a whole cluster interactively. To install a cluster, you need to set up the environment in a similar way to an automatic installation.
Requirements
- The GUI installer requires at least Version 5 of the Java™ platform.
- Screen resolution of 1024x768 or larger.
- (Optional) Password-less ssh or rsh access as root user to all remote hosts that you want to install. If this requirement is not met you can only install Grid Engine components on a local host. For more information, see How to Configure Password-less Access for the root User. You can still use the GUI installer by starting it locally from each remote host.
Recommendations
|
For information on installation modes supported by the Sun Grid Engine 6.2u2 GUI installer, see these topics:
| Topic | Description |
|---|---|
| Express Installation | Enables first-time users to install the software easily. Provides a significantly reduced set of parameters that need to be configured. Requires password-less ssh access as root user to all remote hosts that you want to install. |
| Custom Installation | Enables you to configure almost all existing options that are available during the command-line installation. Offers more advanced features for the cluster host selection. Requires password-less ssh or rsh access as root user to all remote hosts that you want to install. |
For additional reference information, see these topics:
| Topic | Description |
|---|---|
| How to Configure Password-less Access for the root User | Procedure for configuring a password-less ssh or rsh access for the root user to install a whole Sun Grid Engine (SGE) cluster by using the GUI Installer. |
| Understanding Host and Installation States | Describes the different installation states that you might encounter while using the GUI installer. |
| Tweaking start_gui_installer | Describes the command-line options of the start_gui_installer command and how to use them to fine tune the performance of the installer. |
| Troubleshooting the GUI Installer | Contains known issues and their workarounds. |
Express Installation
The express installation mode is targeted at first-time users and provides a significantly reduced set of parameters to configure. This mode also provides reasonable default values for most of the parameters. You must have a password-less ssh or scp access if you are planning to install Sun Grid Engine on remote hosts. The following steps describe a complete cluster installation and assume that the password-less access is configured. (Click any of the screen captures in the following steps to view more details.)
Using the Express Installation Mode
The express installation steps are as follows.
Steps
- Start the GUI installer. On the welcome screen, click Next.

Note Ensure that you start the GUI installer on the qmaster host.
As root, run the start_gui_installer command in your sge-root directory. For example:
qmaster:/sge# ./start_gui_installer Starting Installer ...
- Agree to the terms of the license. Click Next.
- Choose components to install. Click Next.
See the following table for a brief explanation of options displayed on this screen.Host type Description Qmaster host Main component in Sun Grid Engine software. You must install exactly one qmaster component per Grid Engine installation. Execution host(s) Hosts that execute the tasks (jobs). Shadow host(s) Hosts that provide a high availability feature to the cluster. In case the qmaster fails (for example, due to a crash or network issue), one of the shadow hosts takes over the qmaster responsibility. Berkeley db host Host that implies a Berkeley db host spooling option. The Grid Engine then spools data to a remote server. Not recommended as the default option. If you are not sure what you want to install, keep the components selected by default.
- Modify the main configuration details. Click Next.
Option Description Admin user Grid Engine processes will be executed under this user name, and certain directories will be owned by this user. Qmaster host Host that will run qmaster daemon (main component). It can be changed later in the host selection. Grid Engine root directory Directory where you unpacked the Grid Engine tar.gz archive or installed a package (for example, rpm, pkg). It must not contain an automounter prefix. Cell name Name of this Grid Engine cell, a value that identifies an instance of a Grid Engine when several instances run simultaneously. Cluster name Name of this Grid Engine instance used by SMF on Solaris machines. In express installation mode, this instance is hidden and has a value of p6444. The following naming restrictions apply to this field: The cluster name must start with a letter ([A-Za-z]), followed by letters, digits ([0-9]), dashes ("-"), or underscores ("_"). Qmaster port Port that will be used by the qmaster daemon. Default value is 6444. Execd port Port that will be used by the execution daemon. Default value is 6445. Administrator mail Email address used by Grid Engine to report issues to the grid administrator. Default value is none (no emails will be sent). Automatically start service(s) at machine boot Component (service) will be automatically started at machine boot. By default, this is selected. Typically, one would provide a valid administrator email and click next.
- Select hosts to be installed and fix reported problems. Click Install to start the installation on the reachable hosts.

This screen allows you to select the hosts and components that you would like to install. Express installation mode has a slightly simplified selection model. Custom installation mode enables you to change the components that will be selected once new hosts are added. The qmaster host is added based on the qmaster host value from the main configuration screen by default. You can select the hosts in one of two different ways:
1. By a host name, host name pattern, or by an IP address or IP address pattern
2. From a file that you create using the installer's save action
The patterns do not support regular expressions. The supported expressions are lists and numeric ranges. For more information, see the following table:Description Input Resolved Value Host name grid00 grid00 IP address 192.168.0.1 192.168.0.1 List of hosts grid00 grid01 grid05 grid00 grid01 grid05 List of IP addresses 192.168.0.1 192.168.0.2 192.168.0.5 192.168.0.1 192.168.0.2 192.168.0.5 Host ranges grid[00-10] grid00, grid01, ..., grid10 Range of IP addresses 192.[168-169].0.[50-60] 192.168.0.50 ... 192.168.0.60, 192.169.0.50 ... 192.169.0.60
In the following screen sequence, hosts grid01 to grid07 are added as execution and submit hosts. However, three hosts have errors. See Understanding Host and Installation States for a complete list of errors and possible solutions. Note that each state has a tooltip that displays a better error message. Once the errors are resolved on the problematic hosts, select hosts that you want to verify and right-click. A pop-up menu enables you to refresh selected hosts. Optionally, invalid hosts can be removed. Once the states have been refreshed, a different error state or reachable state will be displayed.
- (Optional) Fix problems reported during pre-install validation, then click Install.
When you click the Install button as described in Step 5, the installation does not start immediately. First, the installer executes a series of advanced checks for each host to verify that there is no misconfiguration. If the validation fails, host states are updated and you are presented with an option to return to the host selection or to continue with the installation.
Note Continuing the installation after the installer reports errors will likely result in a failed installation. Before restarting the installation, you should return to the host selection and either resolve the reported problems or remove the hosts that have configuration errors.
In the following screen sequence, three hosts have configuration errors. See Understanding Host and Installation States for a complete list of errors and possible solutions. Notice that each state includes a tooltip that displays an error message.
- Monitor the progress of the installation, then click Next.
If there were any failures during the installation, the Failed tab is selected. See Understanding Host and Installation States for a complete list of installation states. Click the Log button for each failed installation for more information.
This error was produced because the cluster name p6444 already exists on this host (installation was not attempted).
- Review the overview information, then click Done.
Optionally, print or save the information about the Sun Grid Engine configuration for future reference. The page is also automatically saved to the $SGE_ROOT/$SGE_CELL/Readme_TIMESTAMP.html file. If the page could not be saved there, due to root being mapped to nobody on NFS shared file system, it is saved to /tmp/Readme_TIMESTAMP.html.
To verify the installation, go to Verifying the Installation.
Custom Installation
The custom installation mode is targeted at the experienced users. It offers more advanced customization of Sun Grid Engine installation than the express installation. It provides default values for most of the parameters. You must have a password-less ssh or rsh access if planning to install Sun Grid Engine on remote hosts. The following steps assume that the password-less access is configured and describe a cluster installation consisting of:
- Qmaster host with JMX feature enabled
- Eight execution hosts on various architectures
- Three shadow hosts
- Three administrative hosts
- Nine submit hosts
Using the Custom Installation Mode
The custom installation steps are as follows.
Steps
- Start the GUI Installer. On the welcome screen, click Next.

Note Ensure that you start the GUI installer on the qmaster host.
As root, run the start_gui_installer command in your sge-root directory. For example:
qmaster:/sge# ./start_gui_installer Starting Installer ...
- Agree to the terms of the license. Click Next.
- Choose components to install, including a shadow host and the custom installation option, and click Next.
See the following table for a brief explanation of options displayed on this screen.Host type Description Qmaster host Main component in Sun Grid Engine software. Exactly one qmaster component must be installed per Grid Engine installation. Execution host(s) Hosts that execute the tasks (jobs). Shadow host(s) Shadow hosts provide a high availability feature to the cluster. In case that the qmaster fails (crash, network issue), one of the shadow hosts will take over the qmaster responsibility. Berkeley db host Selecting it implies a Berkeley db host spooling option. The Grid Engine than spools data to a remote server. Not recommended as default option.
- Modify the main configuration details. Click Next.
Option Description Admin user Grid Engine processes will be executed under this user name, and certain directories will be owned by this user. Qmaster host Host that will run qmaster daemon (main component). It can be changed later in the host selection. Grid Engine root directory Directory where you unpacked the Grid Engine tar.gz archive or installed a package (for example, rpm, pkg). It must not contain an automounter prefix. Cell name Name of this Grid Engine cell, a value that identifies an instance of a Grid Engine when several instances run simultaneously. Cluster name Name of this Grid Engine instance used by SMF on Solaris machines. The following naming restrictions apply to this field: The cluster name must start with a letter ([A-Za-z]), followed by letters, digits ([0-9]), dashes ("-"), or underscores ("_"). Qmaster port Port that will be used by the qmaster daemon. Default value is 6444. Execd port Port that will be used by the execution daemon. Default value is 6445. Group id range Range of additional group IDs. The group IDs in this range must not be used anywhere else. The size of the range determines how many concurrent jobs can run in Sun Grid Engine. Choose a large value. Shell name Shell to be used while connecting to remote hosts (with ssh or rsh syntax). Expected values for this field are ssh or rsh. Copy command Command to be used while copying files to remote hosts (with scp or rcp syntax). Expected values for this field are scp or rcp. Administrator mail Email address used by the Sun Grid Engine to report issues to the grid administrator. Default value is none (no emails will be sent). Automatically start service(s) at machine boot Component (service) will be automatically started at machine boot. By default, this is selected. Use JMX Triggers installation of a JVM thread in qmaster. Currently only needed when you plan to install Service Domain Manager. By default, this is selected. Ignore domain names Sun Grid Engine will ignore domain names when comparing host names. By default, this is selected. Use CSP product mode Sun Grid Engine will be installed with certificate security protocol (CSP). Communication between Sun Grid Engine daemons will be protected by an SSL certificate. Has impact on cluster throughput. By default, this is not selected. Typically, one would customize the default values and click Next.
- Modify the JMX configuration details. Click Next.
Option Description JMX port Port number to be used by JVM thread in qmaster process. Enable SSL server authentication Once enabled, SSL certificate configuration will be presented later. The server certificate will be used for authentication and encryption. Enable SSL client authentication Client authentication will be used. Path to the keystore Path to Java keystore file that will be created during the qmaster installation. Keystore password Keystore password. Default value is changeit. Retype password Password to retype. Default value is changeit. JVM library path Path to the JVM library on the qmaster host. Be sure to enter a correct path. If you are on a 64-bit system, you might have executed the GUI installer in default Java 32-bit mode. In that case, the detected path to the library will be for 32-bit Java, and the thread will fail to start later in 64-bit mode. Additional JVM args Additional arguments to be used when starting the JVM in qmaster. Default is -Xmx256m.
- Modify the spooling configuration. Click Next.
Option Description Qmaster spool directory Directory for qmaster spooling data. Global execd spool directory Directory for execution daemon spooling directory used by default for all execution hosts. Unless overridden in the host selection screen, each execution host creates a subdirectory in the global execd spool directory. Classic spooling method Spooling is done in human readable format. Berkeley db spooling method Spooling is done to local Berkley db. Berkeley db spooling server spooling method Spooling is done to Berkley db server. Berkeley db host Host for Berkeley db server, enabled only when Berkeley db spooling server method is selected. Db directory Berkeley db spooling directory, either on local host or Berkeley db host, if Berkeley db spooling server method is selected.
- (Optional) Provide SSL certificate information. Click Next.

This screen is displayed only when you have previously selected the JMX or CSP features. An SSL certificate will be generated as part of qmaster installation. This certificate will then be used throughout the Grid Engine.Option Description Country code Two-character country code. Default value is DE. State State. Default value is GERMANY. Location Location. Default value is Building. Organization Organization. Default value is Organisation. Organization unit Organization unit. Default value is Organisation_unit. Email address Email address. Default value is name@yourdomain.com.
- Select hosts to be installed and fix reported problems. Click Install to start the installation on the reachable hosts.

This screen allows you to select the hosts and components that you would like to install. The qmaster host is added based on the qmaster host value from the main configuration screen by default. You can select the hosts in one of two different ways:
1. By a host name, host name pattern, or by an IP address or IP address pattern
2. From a file that you create using the installer's save action
The patterns do not support regular expressions. The supported expressions are lists and numeric ranges. For more information, see the following table:Description Input Resolved Value Host name grid00 grid00 IP address 192.168.0.1 192.168.0.1 List of hosts grid00 grid01 grid05 grid00 grid01 grid05 List of IP addresses 192.168.0.1 192.168.0.2 192.168.0.5 192.168.0.1 192.168.0.2 192.168.0.5 Host ranges grid[00-10] grid00, grid01, ..., grid10 Range of IP addresses 192.[168-169].0.[50-60] 192.168.0.50 ... 192.168.0.60, 192.169.0.50 ... 192.169.0.60
In the following screen sequence, eight execution and submit hosts are added from a file. One host has an error; it is unreachable. See Understanding Host and Installation States for a complete list of errors and possible solutions. Note that each state has a tooltip that displays a better error message. Hosts can be refreshed or removed using a context menu. In addition, two hosts are added as shadow and administrative hosts. Before actually adding the hosts by clicking the Add button, the default component selection must be changed from execution and submit host to shadow and admin host.
- (Optional) Fix problems reported during pre-install validation. Click Install.

When you click the Install button as described in Step 8, the installation does not started immediately. First, the installer executes a series of advanced checks for each host to verify that there is no misconfiguration. If the validation fails, host states are updated and you may return to the host selection or continue with the installation.
Note Continuing the installation after the installer reports errors will likely result in a failed installation. Before restarting the installation, you should return to the host selection and either resolve the reported problems or remove the hosts that have configuration errors.
See Understanding Host and Installation States for a complete list of errors and possible solutions. An example of a pre-install validation with errors can be found in Express Installation.
- Monitor the progress of the installation, then click Next.
If there were any failures during the installation, see Understanding Host and Installation States for a complete list of installation states. Click the Log for each failed installation for more information as shown in this example.
- Review the overview information, then click Done.
Optionally, print or save the information about the Sun Grid Engine configuration for future reference. The page is also automatically saved to the $SGE_ROOT/$SGE_CELL/Readme_TIMESTAMP.html file. If the page could not be saved there, due to root being mapped to nobody on NFS shared file system, it is saved to /tmp/Readme_TIMESTAMP.html.
To verify the installation, go to Verifying the Installation.
How to Configure Password-less Access for the root User
This section describes how to set up a password-less ssh or rsh access for the root user to install a whole Sun Grid Engine cluster at once by using the GUI Installer. The Sun Grid Engine installation must be started on the qmaster host, so you need to first decide which host is going to be the qmaster host. The following instructions use qmaster as the qmaster host name. You must replace qmaster with your qmaster host name.
| Recommendation You can skip this procedure if you plan to install Sun Grid Engine only on a local host. |
| Warning Enabling root login without a password can be a security risk! The commands and configuration files used in the following procedure are applicable only to the Solaris 10 operating system. You can substitute these with commands and configuration files that are appropriate for your operating system. |
| Note Installing Grid Engine cluster with CSP option may additionally require password-less access to the localhost (qmaster host to the qmaster host). |
Configuring Password-less ssh Access for the root User
- Enable root login.
For security reasons, using ssh as root is disabled on many platforms by default. Perform the following for each host on which you will log in using password-less ssh as the root user:- As root, open the /etc/ssh/sshd_config file.
- Modify PermitRootLogin no to PermitRootLogin without-password.
- Restart ssh service on all remote hosts.
As root type the following command.svcadm disable -st ssh ; svcadm enable ssh
- Generate a certificate on the qmaster host.
As root, type the following command to generate the RSA key on the qmaster host. You should leave the passphrase empty.# ssh-keygen -t rsa Generating public/private rsa key pair. Enter file in which to save the key (/root/.ssh/id_rsa): Created directory '/root/.ssh'. Enter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /root/.ssh/id_rsa. Your public key has been saved in /root/.ssh/id_rsa.pub. The key fingerprint is: ec:fa:48:55:c4:3d:59:40:a6:27:10:a2:90:11:de:dc root@qmaster
- Copy the certificate to all remote hosts.
Copy the generated public key contained in a id_rsa.pub file to every remote host that should accept root login without a password from this host.
The following example enables root access to host grid05 from host qmaster.qmaster# cat /root/.ssh/id_rsa.pub ssh-rsa ACCCB3NzaC1yc2EBBBBBIwAAAIEA1xfRiZMV7xt8EMDollLQH5RTAVz3lIXkr/FTfcbwjuMa0t/PdO9gBnJY03e1mIIpjDPiqT2IWfdrzHZB4xvl0MBNhMTWf8Gd3WDO4T7/zw7VhlqT6wUl0ncrhzE5BTIMB0i0X/amgidEzFbL+hE3RvPuowapNZUv+JC1IjDVmmE= root@qmaster qmaster# ssh grid05 he authenticity of host 'grid05 (192.168.1.5)' can't be established. RSA key fingerprint is ec:fa:48:55:c4:3d:59:40:a6:27:10:a2:90:11:de:dc. Are you sure you want to continue connecting (yes/no)? yes Password: grid05# mkdir -p ~/.ssh grid05# echo "ssh-rsa ACCCB3NzaC1yc2EBBBBBIwAAAIEA1xfRiZMV7xt8EMDollLQH5RTAVz3lIXkr/FTfcbwjuMa0t/PdO9gBnJY03e1mIIpjDPiqT2IWfdrzHZB4xvl0MBNhMTWf8Gd3WDO4T7/zw7VhlqT6wUl0ncrhzE5BTIMB0i0X/amgidEzFbL+hE3RvPuowapNZUv+JC1IjDVmmE= root@qmaster" >> ~/.ssh/authorized_keys
- Verify if you are able to connect to the hosts as root without a password.
As root, type the following command.ssh <remote_password-less_host>
If you are able to connect to the hosts without being prompted, password-less access to the hosts has been set up. Now, you can invoke the GUI installer using the start_gui_installer command from your sge-root directory.
Configuring Password-less rsh Access for the root User
- Enable root login.
Normally, the root user can only log in to the console /dev/console. You can remove this restriction by performing the following.- Open the /etc/default/login file.
- Comment out the CONSOLE=/dev/console line by inserting a # character at the beginning of the line.
You need to perform this for each remote host you would like to log in to.
- Set up access without a password.
- Create a .rhosts file.
- Add a single line that contains the qmaster's host name preceded by a + sign.
For example, if foo is the qmaster's host name, add the line +foo to the .rhosts file. - Copy this file to the root user's home directory on each of the remote hosts where you wish to install Sun Grid Engine.
This will allow root to log in from the qmaster host without a password to any machine that will be part of the cluster.
- Restart rlogin service on all remote hosts.
As root, type the following command.svcadm disable -st rlogin ; svcadm enable rlogin
- Verify if you are able to connect to the hosts as root without a password.
As root, type the following command.rlogin <remote_password-less_host>
If you are able to connect to the hosts without being prompted, password-less access to the hosts has been set up. Now, you can invoke the GUI installer using the start_gui_installer command from your sge-root directory. Choose the custom installation mode and replace ssh with rsh and scp with rcp in the Main configuration panel.
Understanding Host and Installation States
This section lists the different installation states that you might encounter while using the GUI installer. The installation states can be divided into the following three categories.
Host Resolving
When a new host is added in the Select hosts screen, the host name State field is immediately set to New unknown host and host name resolving process is initiated. The host name is marked as Reachable only if the architecture of the host can be retrieved. All the other states specify an error. The GUI installer cannot perform any installation on such a host. The following table lists all possible states.
| State | Description |
|---|---|
| New unknown host | Initial state. When the host name is added, the GUI installer immediately starts resolving the host name or IP address of the host, if there are available threads in the resolve pool. |
| Resolving | Temporary state. The host is being resolved based on the host name or IP address by using the default name service. |
| Unknown host | Final state. The host cannot be resolved by the name service. |
| Resolvable | Temporary state. After the host is resolved, the GUI installer immediately tries to retrieve the host's architecture, if there are available threads in the resolve pool. |
| Contacting | Temporary state. The host has been resolved and the host's architecture is being retrieved. |
| Missing remote file | Final state. Missing file '$SGE_ROOT/util/arch' on remote host. Is the sge-root path the same for the remote host and the local host? If not, fix the path or refer Using Path Aliasing. |
| Reachable | Final state. The host architecture cannot be retrieved. Password-less ssh or rsh access to remote hosts is working properly. |
| Unreachable | Final state. The host architecture cannot be retrieved. Password-less ssh or rsh access to remote hosts is not working properly. See How to Configure Password-less Access for more information. |
| Canceled | Final state. The user has canceled the host resolving process. |
Host Validating
After the hosts have been resolved and their architecture has been retrieved, they are moved to the Reachable tab in the Select hosts screen. You can install Sun Grid Engine on a host that is in the Reachable state. While clicking the Install button, the GUI installer first invokes additional remote host validation. If the installer discovers any configuration errors (see RED and ORANGE states in the list below), the installation is not initiated and the appropriate error message is displayed. You can return to the Select hosts screen and proceed with the installation if you wish.
| State | Description | Problem Resolution |
|---|---|---|
| Copy timeout | Timeout occurred when copying check_host or install_component files. See tooltip for the exact file name. |
Try again (press Install button one more time). If timeout reoccurs, save your host list to a file, stop the installer and restart it with increased timeout values. See tweaking start_gui_installer. |
| Copy failed | Copying files check_host or install_component to the remote host failed. See tooltip for the exact file name. |
Try again (press Install button one more time). If problems reoccurs try to copy a any file with scp or rcp to verify these commands work properly. If not make sure they do before new installation attempt. |
| Permission denied | Either of Berkeley DB, qmaster, execution daemon spool directory or JMX keystore file is not writable. See tooltip for the exact message. Installation will most likely fail, if you proceed anyway. |
Did you start the installation as root? What permissions are for the first existing directory? Are you on a NFS file system with root mapped to nobody? Is the UID for the admin user the same on the local and remote machine? |
| Admin user missing | The admin user entered in the main configuration screen does not exist on the remote machine. | Setup the host properly so that name service provides the name properly to the remote machine (or create the user locally). |
| Directory exists | Berkeley DB spool directory already exists! | Check the remote host for existing Berkeley DB installations. Remove the existing directory. |
| Wrong FS type | Specified Berkeley DB spool directory is on a local file system. | Go back to the spooling configuration screen and choose a proper local directory. |
| Unknown error | Unknown error has occurred. | Try again (press Install button one more time). If reoccurring, ignore and try to install anyway. |
| Reachable | Validation did not discover any issues for this remote host. | |
| Canceled | User canceled further host validation. |
Installation States
When the installation is started the host list with the chosen components is transformed to a task list. The task list is better suited to handle dependencies. These are the states one may encounter during the installation.
| State | Description |
|---|---|
| Waiting | Task is waiting to be executed. |
| Processing | Temporary state. Task is being processed. |
| Timeout | Task did not finish before timeout value has been reached. |
| Success | Task finished successfully. |
| Failed | Task finished unsuccessfully. Click the Log button to get more information. |
| Failed due to dependency | Task was not started, because it depended on a task that failed. Click the Log button to get more information. |
| Component already exists | Task was not started. The installation detected a previous conflicting component installation. Click the Log button to get more information. Remove any remains of the old installation, before trying again. |
| Canceled | User canceled the installation process. |
Tweaking start_gui_installer
The start_gui_installer command will start the Java™ GUI installer. This section describes the command-line options of start_gui_installer, that you might use to affect the performance of the installer in your environment or possibly use as a workaround for yet unknown issues.
The Help text can be invoked by calling the -help option.
master:/sge62u2 # ./start_gui_installer -help
Usage: start_gui_installer [-help] [-resolve_pool=<num>] [-resolve_timeout=<sec>]
[-install_pool=<num>] [-install_timeout=<sec>] [-connect_user=<usr>]
[-connect_mode=windows]
<num> ... decimal number greater than zero
<sec> ... number of seconds, must be greater then zero
<usr> ... user id
If no parameter is specified, the start_gui_installer command is started as if the following command was called:
master:/sge62u2 # ./start_gui_installer -resolve_pool=12 -resolve_timeout=20 -install_pool=8 -install_timeout=120
Every installation generates installation logs in the sge_root/sge_cell/install_logs directory. In addition, a GUI log file is created in a $TEMP directory (usually /var/tmp or /tmp) named SGE_Gui-Installer_Log_<date>.txt.
Description of start_gui_installer Options
| Option | Description |
|---|---|
| -help | displays help for start_gui_installer |
| -resolve_pool=<num> | Defaults to 12. Defines how many hosts can be resolved in parallel when adding new hosts, refreshing their states or when validating hosts. The higher the value the higher load will be generated when resolving hosts, refreshing host states, copying an installation script to remote hosts or validating hosts. |
| -resolve_timeout=<sec> | Defaults to 20 seconds. A timeout value for any operation in a resolve_pool (resolving hosts, refreshing host states, copying an installation script to remote host). Host validation has a timeout which is always equal to 2*resolve_timeout value. Increase the default value if you see hosts with Unreachable state and you are sure that password-less access is working correctly for the connect_user. |
| -install_pool=<num> | Defaults to 8. Defines how many execution daemons can be installed in parallel. The higher the value the higher load will be generated when performing installation tasks. |
| -install_timeout=<sec> | Defaults to 120 seconds. A timeout value for any installation task. Increase the default value if you see that the installation tasks are failing with a Timeout state. |
| -connect_user=<user> | Defaults to current user. User name that will be used when connecting to remote hosts. |
| -connect_mode=windows | When set, each connect_user is prefixed by a host domain (see examples below). This is useful when installing multiple windowd execution hosts that require a different connect_user. |
| -debug | Starts the installer in a debug mode. Prints a lot of output to the terminal. Intended for developer purposes, but may provide additional information when unexpected circumstances occur. |
Using start_gui_installer Options
Installing as a different connect_user
Suppose that you cannot log in as the root user, but can log in as another privileged user with uid=0, called admin. In this case an attempt for a remote connection would be done as current user, but due to uid=0 we would connect as root if root is the primary user with uid=0 on the remote host. Users admin and root would have different home directories and we assume that the password-less access was setup only for the user admin, so the connection without a password as currect user would fail. Invoking the following command will enforce that every remote connection is established as the admin user.
master:/sge62u2 # ./start_gui_installer -connect_user=admin
Installing single Windows execution host
Suppose you want to use the installer to add a single windows execution hosts to the existing cluster. The host is called win-01 and belongs to the WIN-01 domain. Also, the privileged user in this case is Admin (part of Administrators group).
The windows hosts can only be installed remotely from a UNIX/LINUX system and you cannot become an Admin user there. So you might use -connect_user=WIN01+Admin to connect as the correct user directly.
master:/sge62u2 # ./start_gui_installer -connect_user=WIN01+Admin
Installing multiple Windows execution hosts
Suppose you have additional hosts win-02 belonging to the WIN-02 domain and win_vista-01 belonging to the WIN_VISTA-01 domain. All hosts have Administrator user privileges. In this case, you can use the following command to start the GUI installer that will allow you to install all the three Windows execution hosts simultaneously.
master:/sge62u2 # ./start_gui_installer -connect_user=Administrator -connect_mode=windows
Every remote connection to host win-01 would be done as WIN-01+Administrator user.
Every remote connection to host win-02 would be done as WIN-02+Administrator user.
Every remote connection to host win_vista-01 would be done as WIN_VISTA-01+Administrator user.
Troubleshooting the GUI Installer
You will find the known issues and their workarounds in this section as well as additional answers to some frequently asked questions.
FAQs
Known issues and workarounds
Installing the Software From the Command Line
Note
|
Installation Overview
| Note The instructions in this section are for a new Grid Engine system only. For instructions on how to install a new system with additional security protection, see Installing the Increased Security Features. For instructions on how to upgrade an existing installation of an earlier version of the Grid Engine software, see Upgrading From a Previous Release of the Software. |
Full installation includes the following tasks:
- Running an installation script on the master host and on every execution host in the Grid Engine system
- Registering information about administration hosts and submit hosts
Performing an Installation
The following sections describe how to install all the components of the Grid Engine system, including the master, execution, administration, and submit hosts. If you need to install the system with enhanced security, see Installing the Increased Security Features before you continue installation. For more information about installing Grid Engine SMF services see Installing the SMF Services before you start the installation.
| Topic | Description |
|---|---|
| How to Install the Master Host (Example Master Host Installation) | Procedure for installing the master host. |
| How to Install Execution Hosts (Example Execution Host Installation) | Procedure for installing the execution host. |
| How to Register Administration Hosts | Procedure for registering an administration host. |
| How to Register Submit Hosts | Procedure for registering a submit host. |
| How to Install the Berkeley DB Spooling Server (Example Berkeley DB Spooling Server Installation) | Procedure for installing the necessary software for Berkeley DB spooling. |
How to Install the Master Host
The master host installation procedure creates the appropriate directory hierarchy that the master daemon requires and starts the Grid Engine master daemon sge_qmaster on the master host. The master host is also registered as a host with administrative and submit permission. The installation procedure creates a default configuration for the system on which it is run. The installation script queries the system for the type of operating system. The script then makes meaningful settings based on this information.
If, at any time during the installation, you think something went wrong, you can quit the installation procedure and restart it.
Before You Begin
- Extract the Grid Engine software, as described in Loading the Distribution Files on a Workstation.
- If you have decided to use an administrative user, as described in User Account Considerations, you should create that user before installing the master host.

Note
Windows hosts cannot act as master hosts.
Steps
- Log in to the master host as root.
- If the $SGE_ROOT environment variable is not set, set it by typing:
# SGE_ROOT=<path_to_installation_directory (the directory MUST contain all SGE files such as SGE binaries)>; export SGE_ROOT
To confirm that you have set the $SGE_ROOT environment variable, type:
# echo $SGE_ROOT
- Go to the installation directory.
- If the directory where the installation files reside is visible from the master host, change directory (cd) to the installation directory sge-root, and then proceed to the next step.
- If the directory is not visible and cannot be made visible, do the following:
- Create a local installation directory, sge-root, on the master host.
- Copy the installation files to the local installation directory sge-root across the network, for example, by using ftp or rcp.
- Change directory (cd) to the local sge-root directory.
- Type the install_qmaster command, adding the -csp flag if you are installing using the Certificate Security Protocol method described in Installing the Increased Security Features.
This command starts the master host installation procedure. You are asked several questions, and you might be required to run some administrative actions.
For a complete installation example, see Example Master Host Installation.# ./install_qmaster Welcome to the Grid Engine installation --------------------------------------- Grid Engine qmaster host installation ------------------------------------- . . . The qmaster installation procedure will take approximately 5-10 minutes. Hit <RETURN> to continue >> - Choose an administrative account owner.
See Step 5 in the Example Master Host Installation.
- Verify the $SGE_ROOT directory setting.
In the example shown Step 6 of the Example Master Host Installation, the value of $SGE_ROOT is /opt/sge62.
- Set up the TCP/IP services for the Grid Engine software.
See Step 7 in the Example Master Host Installation.
If TCP/IP services have not been configured, you will be notified. To configure TCP/IP services:
- Start a new terminal session or window to add the information /etc/services file or your NIS maps.
- Add the correct ports to the /etc/services file or your NIS services map, as described in Network Services.
The following example adds entries for both sge_qmaster and sge_execd to your /etc/services file.... sge_qmaster 6444/tcp sge_execd 6445/tcp
- Save your changes and return to the window where the installation script is running.
- Start a new terminal session or window to add the information /etc/services file or your NIS maps.
- Type the name of your cell or accept the default cell name.
See Step 8 in the Example Master Host Installation.
The use of Grid Engine system cells is described in Cells.- If you have decided to use cells, type the cell name now.
- If you have decided not to use cells, press the Return key.
- Set up a unique cluster name.
See Step 9 in the Example Master Host Installation.
For more information, see Cluster Name.- To accept the default cluster name, press the Return key.
- To enter a new cluster name, type the cluster name and press the Return key.
- Specify a spool directory.
See Step 10 in the Example Master Host Installation.
For guidelines on disk space requirements for the spool directory, see Disk Space Requirements. For information on where spool directory is installed, see Spool Directories Under the Root Directory.- To accept the default spool directory, press the Return key.
- If you want to use a different spool directory, then answer y to the prompt and provide a complete path name to the directory.
- Specify whether you plan to use Windows-based execution hosts.
See Step 11 in the Example Master Host Installation.- If you do not plan to use Windows support, answer No.
- If you want Windows support, answer Yes. You will be asked some Windows-specific questions later in the installation process. These questions will be marked as WINDOWS-ONLY.
- Verify or set the correct file permissions.
See Step 12 in the Example Master Host Installation.- If you used pkgadd or you know that the file permissions are correct, answer y to accept the current permissions.
- Answer n if you need to verify or change the file permissions.
- WINDOWS ONLY – If you specified that you wanted Windows Execution Host support in the previous question, you should let the script set the file permissions for you.
- Specify whether all Grid Engine hosts for this cluster are located in a single DNS domain.
See Step 13 in the Example Master Host Installation.- If all of your Grid Engine system hosts are located in a single DNS domain, then answer y. Grid Engine will not care if domain information is supplied. hostA and hostA.foo.com are equivalent.
- If all of your Grid Engine system hosts are not located in a single DNS domain, then answer n. You will be asked to configure a default domain to use in case a host is specified without domain information.
- Watch while Grid Engine creates directories according to the information that you provided so far.
See Step 14 in the Example Master Host Installation.
- Specify whether you want to use classic spooling or Berkeley DB.
See Step 15 in the Example Master Host Installation. By default, Grid Engine uses Berkeley Database spooling.
For more information on how to determine the type of spooling mechanism you want, please see Choosing Between Classic Spooling and Database Spooling.
- If you choose Berkeley DB spooling, you are asked to choose whether to use a local directory or a Berkeley DB Spooling Server.

Tip
To use a shadow master host for increased availability of the database, use the Berkeley DB Spooling Server.- To use a Berkeley DB spooling server, enter y. To install the Berkeley DB Spooling Server:
- Start a new terminal session or window and install the software, as described in How to Register Submit Hosts.
- After you have installed the software on the spooling server, return to the master installation window, and press the Return key.
- Type the name of the spooling server.
In Step 15 of the Example Master Host Installation, vector is the host name of the spooling server. - Type the name of the spooling directory.
In Step 15 of the Example Master Host Installation, /opt/sge62/default/spool/spooldb is the spooling directory.
- Start a new terminal session or window and install the software, as described in How to Register Submit Hosts.
- If you do not want to use a Berkeley DB spooling server, type n. You are asked to provide the complete path to the database directory. If the directory does not exist, it is created.
- To use a Berkeley DB spooling server, enter y. To install the Berkeley DB Spooling Server:
- To specify classic spooling, type classic.
- If you choose Berkeley DB spooling, you are asked to choose whether to use a local directory or a Berkeley DB Spooling Server.
- Type a range of IDs that will be assigned dynamically for jobs.
See Step 16 in the Example Master Host Installation.
For more information, see Group IDs.
- Verify the spooling directory for the execution daemon.
See Step 17 in the Example Master Host Installation.
The Grid Engine administrator must have access to create and write into this directory. For information on spooling, see Spool Directories Under the Root Directory.
- Type the email address of the user who should receive problem reports.
See Step 18 in the Example Master Host Installation. In the example, the user who will receive problem reports is me@my.domain.
- Verify the configuration parameters.
See Step 19 in the Example Master Host Installation.- If configuration parameters are correct, Grid Engine proceeds to create the local configuration.
- If configuration parameters are not correct, type y to change them.
- Specify whether you want the daemons to start when the system is booted.
See Step 20 in the Example Master Host Installation.
- WINDOWS-ONLY – If you specified that you want Windows support, you are asked to create Certificate Security Protocol (CSP) certificates.
Even if the system is not running in CSP mode, it is necessary to create certain CSP certificates for Windows support. These certificates are automatically generated during the master host installation. For instructions on how to transfer these certificates to the Windows execution hosts, see Step 6 of How to Install a CSP-Secured System.
- WINDOWS-ONLY – Add the Windows Administrator name to the Grid Engine manager list.
- Identify the hosts that you will later install as execution hosts.
See Step 23 in the Example Master Host Installation.
Tip
You can list hosts individually, separated by a blank space, or you can supply a file that contains host names.
Note
You can use the master host for executing jobs. To do so, you must carry out the execution host installation for the master machine. However, if you use a very slow machine as master host, or if your cluster is significantly large, do not use the master host as an execution host. - Select a scheduler profile.
See Step 24 in the Example Master Host Installation.
For information on how to determine which profile you should use, see Scheduler Profiles.
Once you answer this question, the installation process is complete. Several screens of information will be displayed before the script exits.
- WINDOWS-ONLY – Copy the certificate files to the Windows execution hosts.
You can use a script to perform this function.
Tip
To use this functionality without being asked for a password, the root user should use rsh or ssh to access the execution hosts. - Create the environment variables ($SGE_ROOT and $SGE_CELL) for use with the Grid Engine software.
See Step 26 in the Example Master Host Installation.
Note
If no cell name was specified during installation, the value of cell is default.- If you are using a C shell, type the following command:
% source $SGE_ROOT/$SGE_CELL/common/settings.csh
- If you are using a Bourne shell or Korn shell, type the following command:
$ . $SGE_ROOT/$SGE_CELL/common/settings.sh
- If you are using a C shell, type the following command:
| See Also For details about how you can verify that the execution host has been set up correctly, see [How to Verify That the Daemons Are Running on the Master Host]. |
Example Master Host Installation
The following example shows a complete Sun Grid Engine master host installation. Remember that this is only one step in the entire Sun Grid Engine installation process. The steps in this example coordinate with the master host installation description at How to Install the Master Host.
001 % su - 002 # cd sge-install-dir 003 # ./install_qmaster 004 Grid Engine License is displayed. 005 006 Do you agree with that license? (y/n) [n] >> 007 008 Welcome to the Grid Engine installation 009 --------------------------------------- 010 011 Grid Engine qmaster host installation 012 ------------------------------------- 013 014 Before you continue with the installation please read these hints: 015 016 - Your terminal window should have a size of at least 017 80x24 characters 018 019 - The INTR character is often bound to the key Ctrl-C. 020 The term >Ctrl-C< is used during the installation if you 021 have the possibility to abort the installation 022 023 The qmaster installation procedure will take approximately 5-10 minutes. 024 025 Hit <RETURN> to continue >> 026
Step 5
027 Grid Engine admin user account 028 ------------------------------ 029 030 The current directory 031 032 /opt/sge62 033 034 is owned by user 035 036 myusername 037 038 If user >root< does not have write permissions in this directory on *all* 039 of the machines where Grid Engine will be installed (NFS partitions not 040 exported for user >root< with read/write permissions) it is recommended to 041 install Grid Engine that all spool files will be created under the user id 042 of user >myusername<. 043 044 IMPORTANT NOTE: The daemons still have to be started by user >root<. 045 046 Do you want to install Grid Engine as admin user >myusername< (y/n) [y] >> 047 048 Installing Grid Engine as admin user >myusername< 049 050 Hit <RETURN> to continue >> 051 Choosing Grid Engine admin user account 052 --------------------------------------- 053 054 You may install Grid Engine that all files are created with the user id of an 055 unprivileged user. 056 057 This will make it possible to install and run Grid Engine in directories 058 where user >root< has no permissions to create and write files and directories. 059 060 - Grid Engine still has to be started by user >root< 061 062 - This directory should be owned by the Grid Engine administrator 063 064 Do you want to install Grid Engine 065 under an user id other than >root< (y/n) [y] >> y 066 067 Choosing a Grid Engine admin user name 068 -------------------------------------- 069 070 Please enter a valid user name >> sgeadmin 071 072 Installing Grid Engine as admin user >sgeadmin< 073 074 Hit <RETURN> to continue >> 075
Step 6
076 Checking $SGE_ROOT directory 077 ---------------------------- 078 079 The Grid Engine root directory is: 080 081 $SGE_ROOT = /opt/sge62 082 083 If this directory is not correct (e.g. it may contain an automounter 084 prefix) enter the correct path to this directory or hit <RETURN> 085 to use default [/opt/sge62] >> 086 087 Your $SGE_ROOT directory: /opt/sge62 088 089 Hit <RETURN> to continue >> 090
Step 7
Two actions – one for qmaster, one for execd
091 Grid Engine TCP/IP communication service 092 ---------------------------------------- 093 094 The port for sge_qmaster is currently set by the shell environment. 095 096 SGE_QMASTER_PORT = 10500 097 098 Now you have the possibility to set/change the communication ports by using the 099 >shell environment< or you may configure it via a network service, configured 100 in local >/etc/services<, >NIS< or >NIS+<, adding an entry in the form 101 102 sge_qmaster <port_number>/tcp 103 104 to your services database and make sure to use an unused port number. 105 106 How do you want to configure the Grid Engine communication ports? 107 108 Using the >shell environment<: [1] 109 110 Using a network service like >/etc/services<, >NIS/NIS+<: [2] 111 112 (default: 1) >> 1 113 114 Using the environment variable 115 116 $SGE_QMASTER_PORT=10500 117 118 as port for communication. 119 120 Hit <RETURN> to continue >> 121 122 Grid Engine TCP/IP communication service 123 ---------------------------------------- 124 125 The port for sge_execd is currently set by the shell environment. 126 127 SGE_EXECD_PORT = 10501 128 129 Now you have the possibility to set/change the communication ports by using the 130 >shell environment< or you may configure it via a network service, configured 131 in local >/etc/services<, >NIS< or >NIS+<, adding an entry in the form 132 133 sge_execd <port_number>/tcp 134 135 to your services database and make sure to use an unused port number. 136 137 How do you want to configure the Grid Engine communication ports? 138 139 Using the >shell environment<: [1] 140 141 Using a network service like >/etc/services<, >NIS/NIS+<: [2] 142 143 (default: 1) >> 1 144 145 Using the environment variable 146 147 $SGE_EXECD_PORT=10501 148 149 as port for communication. 150 151 Hit <RETURN> to continue >>
Step 8
152 Grid Engine cells 153 ----------------- 154 155 Grid Engine supports multiple cells. 156 157 If you are not planning to run multiple Grid Engine clusters or if you don't 158 know yet what is a Grid Engine cell it is safe to keep the default cell name 159 160 default 161 162 If you want to install multiple cells you can enter a cell name now. 163 164 The environment variable 165 166 $SGE_CELL=<your_cell_name> 167 168 will be set for all further Grid Engine commands. 169 170 Enter cell name [default] >> 171 172 Using cell >default<. 173 Hit <RETURN> to continue >> 174
Step 9
175 Unique cluster name 176 ------------------- 177 178 The cluster name uniquely identifies a specific Sun Grid Engine cluster. 179 The cluster name must be unique throughout your organization. The name 180 is not related to the SGE cell. 181 182 The cluster name must start with a letter ([A-Za-z]), followed by letters, 183 digits ([0-9]), dashes (-) or underscores (_). 184 185 Enter new cluster name or hit <RETURN> 186 to use default [p10500] >> 187 188 Your $SGE_CLUSTER_NAME: p10500 189 190 Hit <RETURN> to continue >>
Step 10
191 Grid Engine qmaster spool directory 192 ----------------------------------- 193 194 The qmaster spool directory is the place where the qmaster daemon stores 195 the configuration and the state of the queuing system. 196 197 The admin user >myusername< must have read/write access 198 to the qmaster spool directory. 199 200 If you will install shadow master hosts or if you want to be able to start 201 the qmaster daemon on other hosts (see the corresponding section in the 202 Grid Engine Installation and Administration Manual for details) the account 203 on the shadow master hosts also needs read/write access to this directory. 204 205 The following directory 206 207 [/opt/sge62/default/spool/qmaster] 208 209 will be used as qmaster spool directory by default! 210 211 Do you want to select another qmaster spool directory (y/n) [n] >> 212
Step 11
213 Windows Execution Host Support 214 ------------------------------ 215 216 Are you going to install Windows Execution Hosts? (y/n) [n] 217
Step 12
218 Verifying and setting file permissions 219 -------------------------------------- 220 221 Did you install this version with >pkgadd< or did you already 222 verify and set the file permissions of your distribution (y/n) [y] >> 223 224 Verifying and setting file permissions 225 -------------------------------------- 226 227 We may now verify and set the file permissions of your Grid Engine 228 distribution. 229 230 This may be useful since due to unpacking and copying of your distribution 231 your files may be unaccessible to other users. 232 233 We will set the permissions of directories and binaries to 234 235 755 - that means executable are accessible for the world 236 237 and for ordinary files to 238 239 644 - that means readable for the world 240 241 Do you want to verify and set your file permissions (y/n) [y] >> 242 243 Verifying and setting file permissions and owner in >3rd_party< 244 Verifying and setting file permissions and owner in >bin< 245 Verifying and setting file permissions and owner in >ckpt< 246 Verifying and setting file permissions and owner in >examples< 247 Verifying and setting file permissions and owner in >inst_sge< 248 Verifying and setting file permissions and owner in >install_execd< 249 Verifying and setting file permissions and owner in >install_qmaster< 250 Verifying and setting file permissions and owner in >lib< 251 Verifying and setting file permissions and owner in >mpi< 252 Verifying and setting file permissions and owner in >pvm< 253 Verifying and setting file permissions and owner in >qmon< 254 Verifying and setting file permissions and owner in >util< 255 Verifying and setting file permissions and owner in >utilbin< 256 Verifying and setting file permissions and owner in >catman< 257 Verifying and setting file permissions and owner in >doc< 258 Verifying and setting file permissions and owner in >include< 259 Verifying and setting file permissions and owner in >man< 260 261 Your file permissions were set 262 263 Hit <RETURN> to continue >> 264
Step 13
265 Select default Grid Engine hostname resolving method 266 ---------------------------------------------------- 267 268 Are all hosts of your cluster in one DNS domain? If this is 269 the case the hostnames 270 271 >hostA< and >hostA.foo.com< 272 273 would be treated as equal, because the DNS domain name >foo.com< 274 is ignored when comparing hostnames. 275 276 Are all hosts of your cluster in a single DNS domain (y/n) [y] >> 277 278 Ignoring domainname when comparing hostnames. 279 280 Hit <RETURN> to continue >> 281
Step 14
282 Making directories 283 ------------------ 284 285 creating directory: /opt/sge62/default/spool/qmaster 286 creating directory: /opt/sge62/default/spool/qmaster/job_scripts 287 Hit <RETURN> to continue >> 288
Step 15
289 Setup spooling 290 -------------- 291 Your SGE binaries are compiled to link the spooling libraries 292 during runtime (dynamically). So you can choose between Berkeley DB 293 spooling and Classic spooling method. 294 Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> 295 296 The Berkeley DB spooling method provides two configurations! 297 298 1) Local spooling: 299 The Berkeley DB spools into a local directory on this host (qmaster host) 300 This setup is faster, but you can't setup a shadow master host 301 302 2) Berkeley DB Spooling Server: 303 If you want to setup a shadow master host, you need to use 304 Berkeley DB Spooling Server! 305 In this case you have to choose a host with a configured RPC service. 306 The qmaster host connects via RPC to the Berkeley DB. This setup is more 307 failsafe, but results in a clear potential security hole. RPC communication 308 (as used by Berkeley DB) can be easily compromised. Please only use this 309 alternative if your site is secure or if you are not concerned about 310 security. Check the installation guide for further advice on how to achieve 311 failsafety without compromising security. 312 313 Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >> y 314 315 Berkeley DB Setup 316 317 ----------------- 318 Please, log in to your Berkeley DB spooling host and execute "inst_sge -db" 319 Please do not continue, before the Berkeley DB installation with 320 "inst_sge -db" is completed, continue with <RETURN> 321 322 Berkeley Database spooling parameters 323 ------------------------------------- 324 325 Please enter the name of your Berkeley DB Spooling Server! >> vector 326 327 328 Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >> 329 330 Hit <RETURN> to continue >> 331 332 Berkeley Database spooling parameters 333 ------------------------------------- 334 335 Please enter the Database Directory now, even if you want to spool locally, 336 it is necessary to enter this Database Directory. 337 338 Default: [/opt/sge62/default/spool/spooldb] >> /tmp/dom/spooldb 339 340 Dumping bootstrapping information 341 Initializing spooling database 342 343 Hit <RETURN> to continue >>
Step 16
344 Grid Engine group id range 345 -------------------------- 346 347 When jobs are started under the control of Grid Engine an additional group id 348 is set on platforms which do not support jobs. This is done to provide maximum 349 control for Grid Engine jobs. 350 351 This additional UNIX group id range must be unused group id's in your system. 352 Each job will be assigned a unique id during the time it is running. 353 Therefore you need to provide a range of id's which will be assigned 354 dynamically for jobs. 355 356 The range must be big enough to provide enough numbers for the maximum number 357 of Grid Engine jobs running at a single moment on a single host. E.g. a range 358 like >20000-20100< means, that Grid Engine will use the group ids from 359 20000-20100 and provides a range for 100 Grid Engine jobs at the same time 360 on a single host. 361 362 You can change at any time the group id range in your cluster configuration. 363 364 Please enter a range >> 20000-20100 365 366 Using >20000-20100< as gid range. Hit <RETURN> to continue >> 367
Step 17
368 Grid Engine cluster configuration 369 --------------------------------- 370 371 Please give the basic configuration parameters of your Grid Engine 372 installation: 373 374 <execd_spool_dir> 375 376 The pathname of the spool directory of the execution hosts. User >myusername< 377 must have the right to create this directory and to write into it. 378 379 Default: [/opt/sge62/default/spool] >> 380
Step 18
381 Grid Engine cluster configuration (continued) 382 --------------------------------------------- 383 <administator_mail> 384 385 The email address of the administrator to whom problem reports are sent. 386 387 It is recommended to configure this parameter. You may use >none< 388 if you do not wish to receive administrator mail. 389 390 Please enter an email address in the form >user@foo.com<. 391 392 Default: [none] >> me@my.domain 393
Step 19
394 The following parameters for the cluster configuration were configured: 395 396 execd_spool_dir /opt/sge62/default/spool 397 administrator_mail me@my.domain 398 399 Do you want to change the configuration parameters (y/n) [n] >> n 400 401 Creating local configuration 402 ---------------------------- 403 Creating >act_qmaster< file 404 Adding default complex attributes 405 Adding SGE default usersets 406 Adding >sge_aliases< path aliases file 407 Adding >qtask< qtcsh sample default request file 408 Adding >sge_request< default submit options file 409 Creating >sgemaster< script 410 Creating >sgeexecd< script 411 Creating settings files for >.profile/.cshrc< 412 413 Hit <RETURN> to continue >> 414
Step 20
415 qmaster startup script 416 ---------------------- 417 418 Do you want to start qmaster automatically at machine boot? 419 NOTE: If you select "n" SMF will be not used at all! (y/n) [y] >> 420 421 422 Hit <RETURN> to continue >> 423 424 Grid Engine qmaster startup 425 --------------------------- 426 427 Starting qmaster daemon. Please wait ... 428 Hit <RETURN> to continue >>
Step 23
429 Adding Grid Engine hosts 430 ------------------------ 431 432 Please now add the list of hosts, where you will later install your execution 433 daemons. These hosts will be also added as valid submit hosts. 434 435 Please enter a blank separated list of your execution hosts. You may 436 press <RETURN> if the line is getting too long. Once you are finished 437 simply press <RETURN> without entering a name. 438 439 You also may prepare a file with the hostnames of the machines where you plan 440 to install Grid Engine. This may be convenient if you are installing Grid 441 Engine on many hosts. 442 443 Do you want to use a file which contains the list of hosts (y/n) [n] >> n 444 445 Adding admin and submit hosts 446 ----------------------------- 447 448 Please enter a blank seperated list of hosts. 449 450 Stop by entering <RETURN>. You may repeat this step until you are 451 entering an empty list. You will see messages from Grid Engine 452 when the hosts are added. 453 454 Host(s): host1 host2 host3 host4 455 456 host1 added to administrative host list 457 host1 added to submit host list 458 host2 added to administrative host list 459 host2 added to submit host list 460 host3 added to administrative host list 461 host3 added to submit host list 462 host4 added to administrative host list 463 host4 added to submit host list 464 Hit <RETURN> to continue >> 465 466 Adding admin and submit hosts 467 ----------------------------- 468 469 Please enter a blank seperated list of hosts. 470 471 Stop by entering <RETURN>. You may repeat this step until you are 472 entering an empty list. You will see messages from Grid Engine 473 when the hosts are added. 474 475 Host(s): 476 Finished adding hosts. Hit <RETURN> to continue >> 477 478 If you want to use a shadow host, it is recommended to add this host 479 to the list of administrative hosts. 480 481 If you are not sure, it is also possible to add or remove hosts after the 482 installation with <qconf -ah hostname> for adding and <qconf -dh hostname> 483 for removing this host 484 485 Attention: This is not the shadow host installation 486 procedure. 487 You still have to install the shadow host separately 488 489 Do you want to add your shadow host(s) now? (y/n) [y] >> 490 491 Adding Grid Engine shadow hosts 492 ------------------------------- 493 494 Please now add the list of hosts, where you will later install your shadow 495 daemon. 496 497 Please enter a blank separated list of your execution hosts. You may 498 press <RETURN> if the line is getting too long. Once you are finished 499 simply press <RETURN> without entering a name. 500 501 You also may prepare a file with the hostnames of the machines where you plan 502 to install Grid Engine. This may be convenient if you are installing Grid 503 Engine on many hosts. 504 505 Do you want to use a file which contains the list of hosts (y/n) [n] >> 506 507 Adding admin hosts 508 ------------------ 509 510 Please enter a blank seperated list of hosts. 511 512 Stop by entering <RETURN>. You may repeat this step until you are 513 entering an empty list. You will see messages from Grid Engine 514 when the hosts are added. 515 516 Host(s): es-ergb01-01 517 adminhost "es-ergb01-01" already exists 518 Hit <RETURN> to continue >> 519 520 Please enter a blank seperated list of hosts. 521 522 Stop by entering <RETURN>. You may repeat this step until you are 523 entering an empty list. You will see messages from Grid Engine 524 when the hosts are added. 525 526 Host(s): 527 Finished adding hosts. Hit <RETURN> to continue >> 528 529 Creating the default <all.q> queue and <allhosts> hostgroup 530 ----------------------------------------------------------- 531 532 root@myhost added "@allhosts" to host group list 533 root@myhost added "all.q" to cluster queue list 534 535 Hit <RETURN> to continue >> 536 537 No CSP system installed! 538 No CSP system installed!
Step 24
539 Scheduler Tuning 540 ---------------- 541 The details on the different options are described in the manual. 542 543 Configurations 544 -------------- 545 1) Normal 546 Fixed interval scheduling, report scheduling information, 547 actual + assumed load 548 549 2) High 550 Fixed interval scheduling, report limited scheduling information, 551 actual load 552 553 3) Max 554 Immediate Scheduling, report no scheduling information, 555 actual load 556 557 Enter the number of your preferred configuration and hit <RETURN>! 558 Default configuration is [1] >> 559 560 561 We're configuring the scheduler with >Normal< settings! 562 Do you agree? (y/n) [y] >> 563 564 changed scheduler configuration
Step 26
565 Using Grid Engine 566 ----------------- 567 568 You should now enter the command: 569 570 source /scratch2/myusername/sge62/default/common/settings.csh 571 572 if you are a csh/tcsh user or 573 574 # . /scratch2/myusername/sge62/default/common/settings.sh 575 576 if you are a sh/ksh user. 577 578 This will set or expand the following environment variables: 579 580 - $SGE_ROOT (always necessary) 581 - $SGE_CELL (if you are using a cell other than >default<) 582 - $SGE_CLUSTER_NAME (always necessary) 583 - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<) 584 - $SGE_EXECD_PORT (if you haven't added the service >sge_execd<) 585 - $PATH/$path (to find the Grid Engine binaries) 586 - $MANPATH (to access the manual pages) 587 588 Hit <RETURN> to see where Grid Engine logs messages >> 589 590 Grid Engine messages 591 -------------------- 592 593 Grid Engine messages can be found at: 594 595 Startup messages can be found in SMF service log files. 596 You can get the name of the log file by calling svcs -l <SERVICE_NAME> 597 E.g.: svcs -l svc:/application/sge/qmaster:p10500 598 599 After startup the daemons log their messages in their spool directories. 600 601 Qmaster: /scratch2/myusername/sge62/default/spool/qmaster/messages 602 Exec daemon: <execd_spool_dir>/<hostname>/messages 603 604 605 Grid Engine startup scripts 606 --------------------------- 607 608 Grid Engine startup scripts can be found at: 609 610 /scratch2/myusername/sge62/default/common/sgemaster (qmaster) 611 /scratch2/myusername/sge62/default/common/sgeexecd (execd) 612 613 Do you want to see previous screen about using Grid Engine again (y/n) [n] >> 614 615 Your Grid Engine qmaster installation is now completed 616 ------------------------------------------------------ 617 618 Please now login to all hosts where you want to run an execution daemon 619 and start the execution host installation procedure. 620 621 If you want to run an execution daemon on this host, please do not forget 622 to make the execution host installation in this host as well. 623 624 All execution hosts must be administrative hosts during the installation. 625 All hosts which you added to the list of administrative hosts during this 626 installation procedure can now be installed. 627 628 You may verify your administrative hosts with the command 629 630 # qconf -sh 631 632 and you may add new administrative hosts with the command 633 634 # qconf -ah <hostname> 635 636 Please hit <RETURN> >> 637 638 sge_qmaster successfully installed!
How to Install Execution Hosts
The execution host installation procedure creates the appropriate directory hierarchy required by sge_execd, and starts the sge_execd daemon on the execution host. This section describes how to install execution hosts interactively from the command line. You can automate the installation of execution of multiple hosts by using the procedure described in Automating the Installation Process.
Before You Begin
Before installing an execution host, you first need to install the master server as described in How to Install the Master Host and share the common directory.
| Windows-Only You must satisfy several prerequisites before you can install Grid Engine execution hosts with Windows operating systems.
|
Steps
- Log in to the execution host as root.
- As you did for the master installation, either copy the installation files to a local installation directory sge-root or use a network installation directory.
- If the $SGE_ROOT environment variable is not set, set it by typing:
# SGE_ROOT=<path_to_install/unpacked_directory>; export SGE_ROOT
To confirm that you have set the $SGE_ROOT environment variable, type:
# echo $SGE_ROOT
- Change directory (cd) to the installation directory, sge-root.
- Verify that the execution host has been declared on the administration host.
- If you do not see the name of this execution host in the output of the qconf -sh command, you will need to declare it as an administration host.
- Start a new terminal session or window.
- In that window, log into the master host.
- Declare the execution host as an administration host, using the qconf command.
# qconf -ah quark quark added to administrative host list
- Log back out of the master host, and continue with the installation of the execution host.
- If you do not see the name of this execution host in the output of the qconf -sh command, you will need to declare it as an administration host.
- Type the install_execd command, adding the -csp flag if you are installing using the Certificate Security Protocol method described in Installing the Increased Security Features.
This command starts the execution host installation procedure.
For a complete installation example, see Example Execution Host Installation.# ./install_execd Welcome to the Grid Engine execution host installation ------------------------------------------------------ . . . The execution host installation will take approximately 5 minutes. Hit <RETURN> to continue >> - Verify the $SGE_ROOT directory setting.
In the example shown in lines 27 through 41 of the Example Execution Host Installation, the value of $SGE_ROOT is /scratch2/myusername/sge62.
- Type the name of your cell or accept the default cell name.
See lines 042 through 076 of the Example Execution Host Installation.
The use of Grid Engine system cells is described in Cells.- If you have decided to use cells, then type the cell names now.
- If you have decided not to use cells, then press the Return key.
- The install script checks to see what ports have been defined for the execution daemon.
See lines 077 through 085 of the Example Execution Host Installation.
If no ports have been defined, you will be asked to define them.
- The install script checks to see whether the admin user already exists.
If the admin user already exists, the script continues uninterrupted. If the admin user does not exist, the script shows the following screen where you must supply a password for the admin user. After the admin user is created, press the Return key.Local Admin User ---------------- The local admin user sgeadmin, does not exist! The script tries to create the admin user. Please enter a password for your admin user >>Creating admin user sgeadmin, now ... Admin user created, hit <ENTER> to continue! - Verify the execution host has been declared as an administration host.
See lines 086 through 092 of the Example Execution Host Installation.
- Specify whether you want to use a local spool directory.
See lines 093 through 122 of the Example Execution Host Installation.
For information on spooling, see Spool Directories Under the Root Directory.- If you do not want a local spool directory, answer n.
- If you do want a local spool directory, answer y.
In the example, /tmp/dom/execs is used as the local spool directory on domain.com. Choose any directory that meets the disk space requirements described in Disk Space Requirements.
- Specify whether you want execd to start automatically at boot time.
See lines 123 through 131 of the Example Execution Host Installation.
You might not want to install the startup script if you are installing a test cluster or you would rather start the daemon manually on reboot.
- WINDOWS ONLY – Choose whether to display the GUI for Windows jobs.
See lines 132 through 163 of the Example Execution Host Installation.
A Grid Engine Helper Service is included with the Sun Grid Engine distribution. This service enables Windows jobs to display a GUI on the visible desktop of the execution host. The visible desktop is either the desktop of the user currently logged in on the execution host or the desktop of the next user who will log in. It is not the log in screen.
The Helper Service is a independent component loosely coupled with the execution daemon. The startup of the Helper Service is plugged in the Services dialog box in the Windows control panel. You can install only one Helper Service per host. There can be only one execution daemon installed per Helper Server.
The installation script asks during the installation of a execution host whether you want to see the GUI of Windows jobs.
- Specify a queue for this host.
See lines 164 through 183 of the Example Execution Host Installation.
Once you answer this question, the installation process is complete. Several screens of information will be displayed before the script exits.
- Create the environment variables ($SGE_ROOT and $SGE_CELL) for use with the Grid Engine Software.
See lines 184 through 234 of the Example Execution Host Installation.
Note
If no cell name was specified during installation, the value of cell is default.- If you are using a C shell, type the following command:
% source $SGE_ROOT/$SGE_CELL/common/settings.csh
- If you are using a Bourne shell or Korn shell, type the following
command:$ . $SGE_ROOT/$SGE_CELL/common/settings.sh
- If you are using a C shell, type the following command:
| See Also For details about how you can verify that the execution host has been set up correctly, see How to Verify That the Daemons Are Running on the Execution Hosts. |
Example Execution Host Installation
The following example shows a complete Sun Grid Engine execution host installation. Before you install the execution host, you need to first install the master server as described in How to Install the Master Host. The line numbers in this example are referred to from the execution host installation description at How to Install Execution Hosts.
Steps 1-6
001 % su - 002 # qstat -f 003 # ./install_execd 004 005 Welcome to the Grid Engine execution host installation 006 ------------------------------------------------------ 007 008 If you haven't installed the Grid Engine qmaster host yet, you must execute 009 this step (with >install_qmaster<) prior the execution host installation. 010 011 For a sucessful installation you need a running Grid Engine qmaster. It is 012 also necessary that this host is an administrative host. 013 014 You can verify your current list of administrative hosts with 015 the command: 016 017 # qconf -sh 018 019 You can add an administrative host with the command: 020 021 # qconf -ah <hostname> 022 023 The execution host installation will take approximately 5 minutes. 024 025 Hit <RETURN> to continue >> 026
Step 7
027 Checking $SGE_ROOT directory 028 ---------------------------- 029 030 The Grid Engine root directory is: 031 032 $SGE_ROOT = /scratch2/myusername/sge62 033 034 If this directory is not correct (e.g. it may contain an automounter 035 prefix) enter the correct path to this directory or hit <RETURN> 036 to use default [/scratch2/myusername/sge62] >> 037 038 Your $SGE_ROOT directory: /scratch2/myusername/sge62 039 040 Hit <RETURN> to continue >> 041
Step 8
042 Grid Engine cells 043 ----------------- 044 045 Please enter cell name which you used for the qmaster 046 installation or press <RETURN> to use [default] >> 047 048 Using cell: >default< 049 050 Hit <RETURN> to continue >> 051 052 ... set owner of /var/sgeCA/port10500 to bofur+myusername 053 054 ... copy /var/sgeCA/port10500/default/userkeys/root to 055 /var/sgeCA/port10500/default/userkeys/bofur+Administrator 056 cp: /var/sgeCA/port10500/default/userkeys/root: No such file or directory 057 058 ... copy /var/sgeCA/port10500/default/userkeys/root to 059 /var/sgeCA/port10500/default/userkeys/Administrator 060 cp: /var/sgeCA/port10500/default/userkeys/root: No such file or directory 061 062 ... copy /var/sgeCA/port10500/default/userkeys/myusername to 063 /var/sgeCA/port10500/default/userkeys/bofur+myusername 064 065 ... set owner of /var/sgeCA/port10500/default/userkeys/Administrator to Administrator 066 067 ... set owner of /var/sgeCA/port10500/default/userkeys/bofur+Administrator to bofur+Administrator 068 069 ... set owner of /var/sgeCA/port10500/default/userkeys/myusername to myusername 070 071 ... set owner of /var/sgeCA/port10500/default/userkeys/bofur+myusername to bofur+myusername 072 073 ... remove old /var/sgeCA/port10500/default/userkeys/root certificates 074 075 WINDOWS certificates are copied and permissions are set! 076
Step 9
077 Grid Engine TCP/IP communication service
078 ----------------------------------------
079
080 The port for sge_execd is currently set BOTH as service and by the
081 shell environment
082
083 SGE_EXECD_PORT = 10501
084 sge_execd service set to port 725
085
Step 10
If the admin user already exists, the script automatically skips this step. See How to Install Execution Hosts for more information.
Step 11
086 Checking hostname resolving
087 ---------------------------
088
089 This hostname is known at qmaster as an administrative host.
090
091 Hit <RETURN> to continue >>
092
Step 12
093 Local execd spool directory configuration 094 ----------------------------------------- 095 096 During the qmaster installation you've already entered a global 097 execd spool directory. This is used, if no local spool directory is configured. 098 099 Now you can configure a local spool directory for this host. 100 ATTENTION: The local spool directory doesn't have to be located on a local 101 drive. It is specific to the <local> host and can be located on network drives, 102 too. But for performance reasons, spooling to a local drive is recommended. 103 104 FOR WINDOWS USER: On Windows systems the local spool directory MUST be set 105 to a local harddisk directory. 106 Installing an execd without local spool directory makes the host unuseable. 107 Local spooling on local harddisk is mandatory for Windows systems. 108 109 Do you want to configure a local spool directory 110 for this host (y/n) [n] >> y 111 112 Please enter the local spool directory now! >> /tmp/dom/execs 113 Using local execd spool directory [/tmp/dom/execs] 114 Hit <RETURN> to continue >> 115 116 Creating local configuration 117 ---------------------------- 118 myusername@domain.com modified "domain.com" in configuration list 119 Local configuration for host >domain.com< created. 120 121 Hit <RETURN> to continue >> 122
Step 13
123 execd startup script
124 --------------------
125
126 We can install the startup script that will
127 start execd at machine boot (y/n) [y] >> n
128
129
130 Hit <RETURN> to continue >>
131
Step 14
132 Windows Helper Service Installation 133 --------------------------------------- 134 135 If you're going to run Windows job's using GUI support, you have 136 to install the Windows Helper Service 137 Do you want to install the Windows Helper Service? (y/n) [n] >> y 138 139 Testing, if a service is already installed! 140 141 ... a service is already installed! 142 ... stopping service! 143 ... uninstalling old service! 144 Service successfully uninstalled. 145 146 147 ... moving new service binary! 148 ... installing new service! 149 Service successfully installed. 150 151 152 ... starting new service! 153 154 Hit <RETURN> to continue >> 155 156 Grid Engine execution daemon startup 157 ------------------------------------ 158 159 Starting execution daemon. Please wait ... 160 starting sge_execd 161 162 Hit <RETURN> to continue >> 163
Step 15
164 Adding a queue for this host 165 ---------------------------- 166 167 We can now add a queue instance for this host: 168 169 - it is added to the >allhosts< hostgroup 170 - the queue provides 1 slot(s) for jobs in all queues 171 referencing the >allhosts< hostgroup 172 173 You do not need to add this host now, but before running jobs on this host 174 it must be added to at least one queue. 175 176 Do you want to add a default queue instance for this host (y/n) [y] >> 177 178 No modification because "bofur" already exists in "hostlist" of "hostgroup" 179 root@domain.com modified "@allhosts" in host group list 180 root@domain.com modified "all.q" in cluster queue list 181 182 Hit <RETURN> to continue >> 183
Step 16
184 Using Grid Engine 185 ----------------- 186 187 You should now enter the command: 188 189 source /scratch2/myusername/sge62/default/common/settings.csh 190 191 if you are a csh/tcsh user or 192 193 # . /scratch2/myusername/sge62/default/common/settings.sh 194 195 if you are a sh/ksh user. 196 197 This will set or expand the following environment variables: 198 199 - $SGE_ROOT (always necessary) 200 - $SGE_CELL (if you are using a cell other than >default<) 201 - $SGE_CLUSTER_NAME (always necessary) 202 - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<) 203 - $SGE_EXECD_PORT (if you haven't added the service >sge_execd<) 204 - $PATH/$path (to find the Grid Engine binaries) 205 - $MANPATH (to access the manual pages) 206 207 Hit <RETURN> to see where Grid Engine logs messages >> 208 209 Grid Engine messages 210 -------------------- 211 212 Grid Engine messages can be found at: 213 214 /tmp/qmaster_messages (during qmaster startup) 215 /tmp/execd_messages (during execution daemon startup) 216 217 After startup the daemons log their messages in their spool directories. 218 219 Qmaster: /scratch2/myusername/sge62/default/spool/qmaster/messages 220 Exec daemon: <execd_spool_dir>/<hostname>/messages 221 222 223 Grid Engine startup scripts 224 --------------------------- 225 226 Grid Engine startup scripts can be found at: 227 228 /scratch2/myusername/sge62/default/common/sgemaster (qmaster) 229 /scratch2/my/sge62/default/common/sgeexecd (execd) 230 231 Do you want to see previous screen about using Grid Engine again (y/n) [n] >> 232 233 Your execution daemon installation is now completed. 234
How to Register Administration Hosts
The master host is implicitly allowed to run administrative tasks and to submit, monitor, and delete jobs. The master host does not require any additional installation or configuration to perform administration functions. By contrast, pure administration hosts do require registration.
| Note You can also install administration hosts by using the QMON graphical user interface. See How to Configure Administration Hosts With QMON. |
To register an administration host from the command line:
- On the master host, log in to the Grid Engine system administrative account, for example, the sgeadmin account.
- Type the following command:
% qconf -ah <admin-host-name>[,...]
How to Register Submit Hosts
| Note You can also install submit hosts by using the QMON graphical user interface. See How to Configure Submit Hosts With QMON. |
To register a submit host from the command line:
- On the master host, log in to the Grid Engine system administrative account, for example, the sgeadmin account.
- Type the following command:
% qconf -as <submit-host-name>[,...]
Refer to About Hosts and Daemons for more details and other means to configure the different host types.
How to Install the Berkeley DB Spooling Server
The installation procedure installs the Grid Engine software necessary for Berkeley DB spooling.
- Load the Grid Engine software onto a local file system.
For details on how to extract the files, see How to Load the Distribution Files On a Workstation.
- Log in to the spooling server host as root.
- If the $SGE_ROOT environment variable is not set, set it by typing:
# SGE_ROOT=sge-root; export SGE_ROOT
To confirm that you have set the $SGE_ROOT environment variable, type:
# echo $SGE_ROOT
- Change to the installation directory.
# cd $SGE_ROOT
- Type the inst_sge command with the -db option.
# sge-root/inst_sge -db
This command starts the spooling server installation procedure. You are asked several questions. If you think something went wrong, you can quit the installation procedure and restart it at any time.
- Choose an administrative account owner.
Choosing Grid Engine admin user account --------------------------------------- You may install Grid Engine that all files are created with the user id of an unprivileged user. This will make it possible to install and run Grid Engine in directories where user >root< has no permissions to create and write files and directories. - Grid Engine still has to be started by user >root< - this directory should be owned by the Grid Engine administrator Do you want to install Grid Engine under an user id other than >root< (y/n) [y] >> y Choosing a Grid Engine admin user name -------------------------------------- Please enter a valid user name >> sgeadmin Installing Grid Engine as admin user >sgeadmin< Hit <RETURN> to continue >>
- Verify the $SGE_ROOT directory setting.
In the following example, the value of $SGE_ROOT is /opt/sge62.Checking $SGE_ROOT directory ---------------------------- The Grid Engine root directory is: $SGE_ROOT = /opt/sge62 If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit <RETURN> to use default [/opt/n1ge6] >> Your $SGE_ROOT directory: /opt/sge62 Hit <RETURN> to continue >>
- Type the name of your cell.
The use of Grid Engine system cells is described in Cells.Grid Engine cells ----------------- Grid Engine supports multiple cells. If you are not planning to run multiple Grid Engine clusters or if you don't know yet what is a Grid Engine cell it is safe to keep the default cell name default If you want to install multiple cells you can enter a cell name now. The environment variable $SGE_CELL=<your_cell_name> will be set for all further Grid Engine commands. Enter cell name [default] >>
- Select Berkeley DB spooling.
Setup spooling -------------- Your SGE binaries are compiled to link the spooling libraries during runtime (dynamically). So you can choose between Berkeley DB spooling and Classic spooling method. Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>
- Verify your host name.
In this example, the installation script is being run on host2.Berkeley Database spooling parameters ------------------------------------- You are going to install an RPC Client/Server mechanism! In this case, qmaster will contact an RPC server running on a separate server machine. If you want to use the SGE shadowd, you have to use the RPC Client/Server mechanism. Enter database server name or hit <RETURN> to use default [host2] >>
- Type the directory path of your spooling directory.
You might need to change this path if this directory is NFS mounted, or if you do not have write permissions to this directory.Enter the database directory or hit <RETURN> to use default [/opt/sge62/default//spooldb] >> creating directory: /opt/sge62/default//spooldb
- Start the RPC server.
Now we have to startup the rc script >/opt/sge62/default/common/sgebdb< on the RPC server machine If you already have a configured Berkeley DB Spooling Server, you have to restart the Database with the rc script now and continue with >NO< Shall the installation script try to start the RPC server? (y/n) [y] >> y Starting rpc server on host host2! The Berkeley DB has been started with these parameters: Spooling Server Name: host2 DB Spooling Directory: /opt/sge62/default//spooldb Please remember these values, during Qmaster installation you will be asked for them! Hit <RETURN> to continue!
- Specify whether you want Berkeley DB service to start automatically at boot time.
Berkeley DB startup script -------------------------- We can install the startup script that Grid Engine is started at machine boot (y/n) [y] >> y
Once you answer this question, the installation process is complete.
- Create the environment variables for use with the Grid Engine software.

Note
If no cell name was specified during installation, the value of $SGE_CELL is default.- If you are using a C shell, type the following command:
% source $SGE_ROOT/$SGE_CELL/common/settings.csh
- If you are using a Bourne shell or Korn shell, type the following command:
$ . $SGE_ROOT/$SGE_CELL/common/settings.sh
- If you are using a C shell, type the following command:
How to Install the Berkeley DB Spooling Server
The installation procedure installs the Grid Engine software necessary for Berkeley DB spooling.
- Load the Grid Engine software onto a local file system.
For details on how to extract the files, see How to Load the Distribution Files On a Workstation.
- Log in to the spooling server host as root.
- If the $SGE_ROOT environment variable is not set, set it by typing:
# SGE_ROOT=sge-root; export SGE_ROOT
To confirm that you have set the $SGE_ROOT environment variable, type:
# echo $SGE_ROOT
- Change to the installation directory.
# cd $SGE_ROOT
- Type the inst_sge command with the -db option.
# sge-root/inst_sge -db
This command starts the spooling server installation procedure. You are asked several questions. If you think something went wrong, you can quit the installation procedure and restart it at any time.
- Choose an administrative account owner.
Choosing Grid Engine admin user account --------------------------------------- You may install Grid Engine that all files are created with the user id of an unprivileged user. This will make it possible to install and run Grid Engine in directories where user >root< has no permissions to create and write files and directories. - Grid Engine still has to be started by user >root< - this directory should be owned by the Grid Engine administrator Do you want to install Grid Engine under an user id other than >root< (y/n) [y] >> y Choosing a Grid Engine admin user name -------------------------------------- Please enter a valid user name >> sgeadmin Installing Grid Engine as admin user >sgeadmin< Hit <RETURN> to continue >>
- Verify the $SGE_ROOT directory setting.
In the following example, the value of $SGE_ROOT is /opt/sge62.Checking $SGE_ROOT directory ---------------------------- The Grid Engine root directory is: $SGE_ROOT = /opt/sge62 If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit <RETURN> to use default [/opt/n1ge6] >> Your $SGE_ROOT directory: /opt/sge62 Hit <RETURN> to continue >>
- Type the name of your cell.
The use of Grid Engine system cells is described in Cells.Grid Engine cells ----------------- Grid Engine supports multiple cells. If you are not planning to run multiple Grid Engine clusters or if you don't know yet what is a Grid Engine cell it is safe to keep the default cell name default If you want to install multiple cells you can enter a cell name now. The environment variable $SGE_CELL=<your_cell_name> will be set for all further Grid Engine commands. Enter cell name [default] >>
- Select Berkeley DB spooling.
Setup spooling -------------- Your SGE binaries are compiled to link the spooling libraries during runtime (dynamically). So you can choose between Berkeley DB spooling and Classic spooling method. Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>
- Verify your host name.
In this example, the installation script is being run on host2.Berkeley Database spooling parameters ------------------------------------- You are going to install an RPC Client/Server mechanism! In this case, qmaster will contact an RPC server running on a separate server machine. If you want to use the SGE shadowd, you have to use the RPC Client/Server mechanism. Enter database server name or hit <RETURN> to use default [host2] >>
- Type the directory path of your spooling directory.
You might need to change this path if this directory is NFS mounted, or if you do not have write permissions to this directory.Enter the database directory or hit <RETURN> to use default [/opt/sge62/default//spooldb] >> creating directory: /opt/sge62/default//spooldb
- Start the RPC server.
Now we have to startup the rc script >/opt/sge62/default/common/sgebdb< on the RPC server machine If you already have a configured Berkeley DB Spooling Server, you have to restart the Database with the rc script now and continue with >NO< Shall the installation script try to start the RPC server? (y/n) [y] >> y Starting rpc server on host host2! The Berkeley DB has been started with these parameters: Spooling Server Name: host2 DB Spooling Directory: /opt/sge62/default//spooldb Please remember these values, during Qmaster installation you will be asked for them! Hit <RETURN> to continue!
- Specify whether you want Berkeley DB service to start automatically at boot time.
Berkeley DB startup script -------------------------- We can install the startup script that Grid Engine is started at machine boot (y/n) [y] >> y
Once you answer this question, the installation process is complete.
- Create the environment variables for use with the Grid Engine software.

Note
If no cell name was specified during installation, the value of $SGE_CELL is default.- If you are using a C shell, type the following command:
% source $SGE_ROOT/$SGE_CELL/common/settings.csh
- If you are using a Bourne shell or Korn shell, type the following command:
$ . $SGE_ROOT/$SGE_CELL/common/settings.sh
- If you are using a C shell, type the following command:
Installing the Increased Security Features
Use the instructions in this section to set up your system more securely. These instructions will help you set up your system with Certificate Security Protocol (CSP)-based encryption.
Why Install the Increased Security Features?
Instead of transferring messages in clear text, the messages in this secure system are encrypted with a secret key. The secret key is exchanged using a public/private key protocol. Users present their certificates through the Grid Engine system to prove identity. Users receive the certificate to ensure that they are communicating with the correct systems. After this initial announcement phase, communication continues transparently in encrypted form. The session is valid only for a certain period, after which the session must be re-announced.
Additional Setup Required
The steps required to set up the Certificate Security Protocol enhanced version of the Grid Engine system are similar to the standard setup. You generally follow the instructions in Planning the Installation, Loading the Distribution Files on a Workstation, How to Install the Master Host, How to Install Execution Hosts and How to Register Administration Hosts.
However, the following additional tasks are required:
- Generating the Certificate Authority (CA) system keys and certificates on the master host by calling the installation script with the -csp flag
- Distributing the system keys and certificates to the execution and submit hosts using a secure method such as ssh
- Generating user keys and certificates automatically, after master installation
- Adding new users
| Topic | Description |
|---|---|
| How to Install a CSP-Secured System | Procedure for installing a CSP-secured system. |
| How to Generate Certificates and Private Keys for Users | Procedure for generating user-specific certificates and private keys. |
| How to Renew Certificates | Procedure for renewing user-specific certificates. |
| How to Check Certificates | Procedure for checking user-specific certificates. |
How to Install a CSP-Secured System
Install the Grid Engine software as outlined in Performing an Installation, with the following exception: use the additional flag -csp when invoking the various installation scripts. To install a CSP-secured system do the following:
- Change the master host installation procedure.
Type the following command and respond to the prompts from the installation script.# ./install_qmaster -csp
- Supply the following information to generate the CSP certificates and keys:
- Two-letter country code, for example, US for the United States
- State
- Location, such as a city
- Organization
- Organizational unit
- CA email address
As the installation proceeds, the Certificate Authority is created. A CA specific to the Grid Engine system is created on the master host. The directories that contain information relevant to security are as follows: - The publicly accessible CA and daemon certificate are stored in
$SGE_ROOT/$SGE_CELL/common/sgeCA
- The corresponding private keys are stored in
/var/sgeCA/{sge_service| portSGE_QMASTER_PORT}/cell/private
- User keys and certificates are stored in
/var/sgeCA/{sge_service| portSGE_QMASTER_PORT}/cell/userkeys/$USER
- The script prompts you for site information.
- Confirm whether the information you supplied is correct.
- Continue the installation.
After the security-related setup of the master host sge_qmaster is finished, the script prompts you to continue with the rest of the installation procedure, as in the following example:SGE startup script -------------------- Your system wide SGE startup script is installed as: "/scratch2/eddy/sge_sec/default/common/sgemaster" Hit Return to continue >> - Transfer the directory that contains the private key and the random file to each execution host.
- As root on the master host, type the following commands to prepare to copy the private keys to the machines you set up as execution hosts:
# umask 077 # cd / # tar cvpf /var/sgeCA/port536.tar /var/sgeCA/port536/default
- As root on each execution host, use the following commands to securely copy the files:
# umask 077 # cd / # scp masterhost:/var/sgeCA/port536.tar . # umask 022 # tar xvpf /port536.tar # rm /port536.tar
Note
On a Windows execution host, the tar utility cannot restore the ownerships and permissions. In this case, the Administrator must set the ownerships and permissions manually. - Type the following command to verify the file permissions:
# ls -lR /var/sgeCA/port536/
The output should look like the following example:/var/sgeCA/port536/: total 2 drwxr-xr-x 4 eddy other 512 Mar 6 10:52 default /var/sgeCA/port536/default: total 4 drwx------ 2 eddy staff 512 Mar 6 10:53 private drwxr-xr-x 4 eddy staff 512 Mar 6 10:54 userkeys /var/sgeCA/port536/default/private: total 8 -rw------- 1 eddy staff 887 Mar 6 10:53 cakey.pem -rw------- 1 eddy staff 887 Mar 6 10:53 key.pem -rw------- 1 eddy staff 1024 Mar 6 10:54 rand.seed -rw------- 1 eddy staff 761 Mar 6 10:53 req.pem /var/sgeCA/port536/default/userkeys: total 4 dr-x------ 2 eddy staff 512 Mar 6 10:54 eddy dr-x------ 2 root staff 512 Mar 6 10:54 root /var/sgeCA/port536/default/userkeys/eddy: total 16 -r-------- 1 eddy staff 3811 Mar 6 10:54 cert.pem -r-------- 1 eddy staff 887 Mar 6 10:54 key.pem -r-------- 1 eddy staff 2048 Mar 6 10:54 rand.seed -r-------- 1 eddy staff 769 Mar 6 10:54 req.pem /var/sgeCA/port536/default/userkeys/root: total 16 -r-------- 1 root staff 3805 Mar 6 10:54 cert.pem -r-------- 1 root staff 887 Mar 6 10:54 key.pem -r-------- 1 root staff 2048 Mar 6 10:53 rand.seed -r-------- 1 root staff 769 Mar 6 10:54 req.pem
- As root on the master host, type the following commands to prepare to copy the private keys to the machines you set up as execution hosts:
- Install the Grid Engine software on each execution host.
# cd $SGE_ROOT # ./install_execd -csp
- Respond to the prompts from the installation script.
The execution host installation procedure creates the appropriate directory hierarchy required by sge_execd, and starts the sge_execd daemon on the execution host.
If the root user does not have write permissions in the $SGE_ROOT directory on all of the machines where Grid Engine software will be installed, you are asked whether to install the software as the user to whom the directory belongs. If you answer yes, you must install the security-related files into that user's $HOME/.sge directory, as shown in the following example.% su - sgeadmin % source $SGE_ROOT/default/common/settings.csh % $SGE_ROOT/util/sgeCA/sge_ca -copy % logoutIn the above example, sgeadmin is the name of the user who owns the installation directory.
- After completing all remaining installation steps, refer to the instructions below in How to Generate Certificates and Private Keys for Users.
How to Generate Certificates and Private Keys for Users
To use the CSP-secured system, the user must have access to a user-specific certificate and private key. The most convenient method of gaining access is to create a text file identifying the users.
- On the master host, create and save a text file that identifies users.
Use the format of the file myusers.txt shown in the following example. The fields of the file are UNIX_username:Gecos_field:email_address.eddy:Eddy Smith:eddy@my.org sarah:Sarah Miller:sarah@my.org leo:Leo Lion:leo@my.org
- As root on the master host, type the following command:
# $SGE_ROOT/util/sgeCA/sge_ca -usercert myusers.txt
- Confirm by typing the following command:
# ls -l /var/sgeCA/port536/default/userkeys
This directory listing produces output similar to the following example.
dr-x------ 2 eddy staff 512 Mar 5 16:13 eddy dr-x------ 2 sarah staff 512 Mar 5 16:13 sarah dr-x------ 2 leo staff 512 Mar 5 16:13 leo
- Tell each user to install security related files in their directories.
Tell each user listed in the file (myusers.txt in the example) to install the security-related files in their $HOME/.sge directories by typing the following commands.% source $SGE_ROOT/default/common/settings.csh % $SGE_ROOT/util/sgeCA/sge_ca -copyUsers should see the following confirmation (user eddy in the example).
Certificate and private key for user eddy have been installed
For every Grid Engine software installation, a subdirectory for the corresponding SGE_QMASTER_PORT number is installed. The following example, based on the myusers.txt file, is a result of issuing the command preceding the output.
% ls -lR $HOME/.sge /home/eddy/.sge: total 2 drwxr-xr-x 3 eddy staff 512 Mar 5 16:20 port536 /home/eddy/.sge/port536: total 2 drwxr-xr-x 4 eddy staff 512 Mar 5 16:20 default /home/eddy/.sge/port536/default: total 4 drwxr-xr-x 2 eddy staff 512 Mar 5 16:20 certs drwx------ 2 eddy staff 512 Mar 5 16:20 private /home/eddy/.sge/port536/default/certs: total 8 -r--r--r-- 1 eddy staff 3859 Mar 5 16:20 cert.pem /home/eddy/.sge/port536/default/private: total 6 -r-------- 1 eddy staff 887 Mar 5 16:20 key.pem -r-------- 1 eddy staff 2048 Mar 5 16:20 rand.seed
How to Renew Certificates
- Change to $SGE_ROOT and become root on the master host.
# tcsh # source $SGE_ROOT/default/settings.csh
Note
This assumes that $SGE_CELL is the default. - Edit $SGE_ROOT/util/sgeCA/renew_all_certs.csh, and change the number of days that the certificates are valid:
# extend the validity of the CA certificate by set CADAYS = 365 # extend the validity of the daemon certificate by set DAEMONDAYS = 365 # extend the validity of the user certificate by set USERDAYS = 365
- Run the changed script.
# util/sgeCA/renew_all_certs.csh

Note
The default for all extension times is 365 days from the day the script is run. - Replace the old certificates against the new ones on all hosts that installed them locally.
That is, under /var/sgeCA/..., see the execution daemon installation.
- If users have copied certificates and keys to $HOME/.sge, they have to repeat $SGE_ROOT/util/sgeCA/sge_ca -copy to have access to the renewed certificates.
Verifying the Installation
The verification phase includes the following tasks:
- Ensuring that the master daemon is running on the master host
- Ensuring that the daemons are running on all execution hosts
- Ensuring that you can run simple commands
- Submitting test jobs
To ensure that the Grid Engine system daemons are running, look for the sge_qmaster daemon on the master host and the sge_execd daemon on the execution hosts. Once you have verified that the daemons are running, you can try to use commands and prepare to submit jobs.
| Note If no cell name was specified during installation, the value of $SGE_CELL is default. |
| Topic | Description |
|---|---|
| How to Verify That the Daemon Is Running on the Master Host | Procedure for verifying that the Daemon is running on the master host. |
| How to Verify That the Daemons Are Running on the Execution Hosts | Procedure for verifying that the Daemons are running on the execution hosts. |
| How to Run Simple Commands | Procedure for verifying that the Sun Grid Engine software is operational by running some trial commands. |
| How to Submit Test Jobs | Procedure for submitting test jobs. |
How to Verify That the Daemon is Running on the Master Host
- Log in to the master host.
Look in the file $SGE_ROOT/$SGE_CELL/common/act_qmaster to see if you really are on the master host.
- Verify that the master daemon is running.
- On BSD-based UNIX systems, type the following command:
% ps -ax | grep sge
You should see output similar to the following example.
14676 p1 S < 4:47 /gridware/sge/bin/solaris/sge_qmaster
- On systems running a UNIX System 5-based operating system (such as the Solaris Operating System), type the following command:
% ps -ef | grep sge
You should see output similar to the following example.
root 439 1 0 Jun 2 ? 3:37 /gridware/sge/bin/solaris/sge_qmaster
- On BSD-based UNIX systems, type the following command:
- If you do not see the appropriate string, restart the daemon.
To start the master host daemon, sge_qmaster:# $SGE_ROOT/$SGE_CELL/common/sgemaster start
- Continue the verification process.
After you have verified that the master host and the execution host daemons are running, continue the verification process. See How to Run Simple Commands.
How to Verify That the Daemons Are Running on the Execution Hosts
- Log in to the execution hosts on which you ran the execution host installation procedure.
- Verify that the daemons are running.
- On BSD-based UNIX systems, type the following command:
% ps -ax | grep sge
You should see output similar to the following example.
14688 p1 S < 4:27 /gridware/sge/bin/solaris/sge_execd
- On systems running a UNIX System 5-based operating system (such as the Solaris Operating System), type the following command:
% ps -ef | grep sge
You should see output similar to the following example.
root 171 1 0 Jun 22 ? 7:11 /gridware/sge/bin/solaris/sge_execd
- On BSD-based UNIX systems, type the following command:
- If you do not see similar output, restart the daemon.
# $SGE_ROOT/$SGE_CELL/common/sgeexecd start
- Continue the verification process.
After you have verified that the master host and the execution host daemons are running, continue the verification process. See How to Run Simple Commands below for details.
How to Run Simple Commands
If both the necessary daemons are running on the master and execution hosts, the Grid Engine software should be operational. Check by issuing a trial command.
- Log in to either the master host or another administrative host.
In your standard search path, make sure to include $SGE_ROOT/bin.
- From the command line, type the following command:
% qconf -sconf
This qconf command displays the current global cluster configuration Configuring Clusters.
If this command fails, your $SGE_ROOT environment variable is not set correctly.- Check whether the environment variables SGE_EXECD_PORT and SGE_QMASTER_PORT are set in the script files, $SGE_ROOT/$SGE_CELL/common/settings.csh or $SGE_ROOT/$SGE_CELL/common/settings.sh.

Note
If no cell name was specified during installation, the value of $SGE_CELL is default.- If so, make sure that the environment variables SGE_EXECD_PORT and {{SGE_QMASTER_PORT} are set to the correct value before you try the command again.
- If not, verify whether your NIS services map contains entries for sge_qmaster and sge_execd.
If the SGE_EXECD_PORT and SGE_QMASTER_PORT variables are not used in these files, then the services database (/etc/services or the NIS services map for example) on the machine from which you run the command must provide entries for both sge_qmaster and sge_execd. If these entries do not exist, add an entry to the machine's services database, giving it the same value as is configured on the master host.
- Retry the qconf command.
- Check whether the environment variables SGE_EXECD_PORT and SGE_QMASTER_PORT are set in the script files, $SGE_ROOT/$SGE_CELL/common/settings.csh or $SGE_ROOT/$SGE_CELL/common/settings.sh.
- Try to submit test jobs.
How to Submit Test Jobs
Before you start submitting batch scripts to the Grid Engine system, check to see whether your site's standard shell resource files (.cshrc, .profile, or .kshrc) as well as your personal resource files contain commands such as stty. Batch jobs do not have a terminal connection by default, and therefore calls to stty result in an error.
- Log in to the master host.
- Type the following command.
% rsh <exec-host-name> date
The exec-host-name refers to one of the already installed execution hosts. You should try this test on all execution hosts if your login or home directories differ from host to host. The rsh command should give you output similar to the date command run locally on the master host. If any additional lines contain error messages, you must fix the cause of the errors before you can run a batch job successfully.
For all command interpreters, you can check on an actual terminal connection before you run a command such as stty.
The following is an example of a Bourne shell script to test the terminal connection.tty -s if [ $? = 0 ]; then stty erase ^H fi
The following example shows C shell syntax.tty -s if ( $status = 0 ) then stty erase ^H endif - Submit one of the sample scripts contained in the $SGE_ROOT/examples/jobs directory.
% qsub $SGE_ROOT/examples/jobs/simple.sh
- Use the qstat command to monitor the job's behavior.
For more information about submitting and monitoring batch jobs, see Submitting Batch Jobs.
- After the job finishes executing, check your home directory for the redirected stdout/stderr files script-name.ejob-id and script-name.ojob-id.
The job-id is a consecutive unique integer number assigned to each job.
In case of problems, see Fine-Tuning Your Environment and Using DTrace for Performance Tuning.
Automating the Installation Process
This section describes how you can automate the software installation process for the following reasons:
- To install the Grid Engine software on many hosts
- To install the Grid Engine software without user interaction
You can use the $SGE_ROOT/inst_sge utility to install and uninstall Sun Grid Engine master hosts, execution hosts, shadow host and Berkeley DB spooling server hosts. You can also use this utility to backup automatically the Sun Grid Engine configuration and accounting data.
| Note Using the Berkeley DB Spooling Server host does not provide high availability, and it has no authentication mechanism. It should only be used on a closed network with fully trusted users. |
You can use the inst_sge utility in interactive mode to supplant any of the commands that were described in Installing the Software From the Command Line.
To simplify automatic installation and backup processes, use the configuration templates that are located in the $SGE_ROOT/util/install_modules directory.
The automatic installation requires no user interaction. No messages are displayed on the terminal during the installation.
When the installation finishes, a message indicates where the installation log file resides. The name of the installation log file format is install_hostname_timestamp.log. Normally, you can find information about errors during installation in this file. In case of serious errors though, the installation script might not be able to move the log file into the spool directory. In this situation, the log file is placed in the /tmp directory.
| Topic | Description |
|---|---|
| Automatic Installation | Perform an automatic installation by setting up a configuration file. |
| Automatic Uninstallation | Learn how to uninstall hosts automatically. |
| How to Start the Automatic Backup | Procedure for backing up configuration and accounting data by using the interactive backup procedure. |
| Troubleshooting Automatic Installation and Uninstallation | Troubleshoot errors that might be encountered during automatic installation. |
Automatic Installation
Special Considerations
The first step in performing an automatic installation is to set up a configuration file. You can find configuration file templates in the $SGE_ROOT/util/install_modules directory. Consider the following as you plan your automatic installation:
- To use automatic installation on remote hosts, the root user must be able to access those hosts through rsh or ssh without supplying a password.
- For local spooling, that is, spooling on the master host, no special configuration is needed. However, the directory where the spooling occurs must not be on an NFS version 3 volume. You may use an NFS version 4 volume for local spooling.
- To run the Berkeley DB spooling server on a host other than the master host, you must install and configure RPC services on this separate host.
To perform this step manually before you start the automatic installation, use the following command:
./inst_sge -db
You can also use the following command to install automatically the Berkeley DB Spooling Server:
% ./inst_sge -db -m -x -auto <full-path-to-configuration-file>
This command checks the SPOOLING_SERVER entry within the configuration file and starts the Berkeley DB installation on the server host.
| Note If you start the automatic installation on the master host, the entire cluster can be installed with one command. The automatic installation script accesses the remote hosts through rsh or ssh and starts the installation remotely. This process requires a well-configured configuration file, which each host must be able to read. That file should be installed on each host or shared through NFS. |
Using the inst_sge Utility and a Configuration Template
To automate system installation, use the inst_sge utility in combination with a configuration file. See How to Automate Other Installations Through a Configuration File.
| Note You cannot use the auto installation procedure to install remotely a Windows execution host. You must run the auto installation procedure directly on the Windows execution host. |
| Topic | Description |
|---|---|
| How to Automate the Master Host Installation | Procedure for automating the master host installation. |
| How to Automate Other Installations Through a Configuration File | Procedure for performing a variety of other automatic installations using the configuration file. |
| How to Automate Installation With Increased Security (CSP) | Procedure for automating installation with Certificate Security Protocol (CSP) mode. |
How to Automate Installation With Increased Security (CSP)
The automatic installation also supports the Certificate Security Protocol (CSP) mode described in Installing the Increased Security Features. To use the CSP security mode, you must fill out the CSP parameters of the template files. The parameters are as follows:
# This section is used for csp installation mode. # CSP_RECREATE recreates the certs on each installation, if true. # In case of false, the certs will be created, if not existing. # Existing certs won't be overwritten. (mandatory for csp install) CSP_RECREATE="true" # The created certs won't be copied, if this option is set to false # If true, the script tries to copy the generated certs. This # requires passwordless ssh/rsh access for user root to the # execution hosts CSP_COPY_CERTS="false" # csp information, your country code (only 2 characters) # (mandatory for csp install) CSP_COUNTRY_CODE="DE" # your state (mandatory for csp install) CSP_STATE="Germany" # your location, eg. the building (mandatory for csp install) CSP_LOCATION="Building" # your organisation (mandatory for csp install) CSP_ORGA="Organisation" # your organisation unit (mandatory for csp install) CSP_ORGA_UNIT="Organisation_unit" # your email (mandatory for csp install) CSP_MAIL_ADDRESS="name@yourdomain.com"
To start the installation, type the following command:
inst_sge -m -csp -auto template-file-name
| Note Certificates are created during the installation process. These certificates have to be copied to each host of the installed cluster. The installation process can do this for you; however, you need to perform the following steps to allow the installation process appropriate permissions to copy the certificates:
|
How to Automate Other Installations Through a Configuration File
In addition to installing the master host, you can perform a variety of other automatic installations using a similar process. The actual form of the inst_sge command differs slightly, and different sections of the configuration file apply. This section provides some examples.
- To install a shadow host, use the following form of the command:
inst_sge -sm -auto <full-path-to-configuration-file>

Tip
To install more than one shadow host, enter the host names in the <SHADOW_HOST> parameter section within the configuration file. - You can install a separate execution host installation if the master host was installed without identified compute hosts or if you need to add additional compute hosts. For the execution host installation, you also need to have a configuration file.
To install all configured execution hosts, use the following form of the command:inst_sge -x -auto <full-path-to-configuration-file>
- To install the Berkeley database server, use the following form of the command:
inst_sge -db -auto <full-path-to-configuration-file>
See Configuration File Templates.
How to Automate the Master Host Installation
Before You Begin
You need to complete the planning process as outlined in Planning the Installation.
In addition, you need to be able to connect to each of the remote hosts using the rsh or ssh commands, without supplying a password. If this type of access is not allowed on your network, you cannot use this method of installation.
Steps
- Create a copy of the configuration template, $SGE_ROOT/util/install_modules/inst_template.conf.
# cd $SGE_ROOT/util/install_modules # cp inst_template.conf my_configuration.conf
- Edit your configuration template, using the values from the worksheet you completed in Planning the Installation.
The configuration file template includes liberal comments to help you decide where appropriate information belongs. See Configuration File Templates.
- Log in as root on the system that you want to be the Sun Grid Engine master host.
- Create the $SGE_ROOT directory.
The $SGE_ROOT directory is the root directory of the Sun Grid Engine software hierarchy, for example /opt/sge62.
- Go to the $SGE_ROOT directory and start the installation.
# cd $SGE_ROOT # ./inst_sge -m -auto <full-path-to-configuration-file>
The -m option starts the master host installation and installs the master daemon on the local machine. In addition, the -auto option sets up any remote hosts, as specified in the configuration file.
| Note You cannot install remotely a master host. You must always install a master host locally. |
To prevent data loss or destroying an already installed cluster, the automatic installation terminates if the configured $SGE_CELL directory or the configured Berkeley DB spooling directory already exists. If the installation terminates, the script displays the reason for the termination on the screen.
A log file of the master installation is created in the $SGE_ROOT/default/spool/qmaster directory. The file name is created using the format install_hostname_date_time.log.
| Tip You can also combine options if you want to perform multiple installations with one command. For example, the following command installs the master daemon on the local machine and installs all execution hosts that are configured in the configuration file: |
./inst_sge -m -x -auto <full-path-to-configuration-file>
a. Wait for notification that the installation has completed.
b. When the automatic installation exits successfully, it displays a message similar to the following:
The Install log can be found in the
{{/opt/sge62/spool/install_myhost_30mar2007_090152.log}} file.
The installation log file includes any script or error messages that were generated during installation. If the qmaster_spooling_dir directory exists, the log files will be in that directory. If the directory does not exist, the log files will be in the /tmp directory.
| Troubleshooting If you do not want your execution hosts to spool locally, be sure to set EXECD_SPOOL_DIR_LOCAL="", with no space between the double quotes (""). |
Automating Other Installations Through a Configuration File
In addition to installing the master host, you can perform a variety of other automatic installations using a similar process. The actual form of the inst_sge command differs slightly, and different sections of the configuration file apply. This section provides some examples.
- To install a shadow host, use the following form of the command:
inst_sge -sm -auto <full-path-to-configuration-file>

Tip
To install more than one shadow host, enter the host names in the <SHADOW_HOST> parameter section within the configuration file. - You can install a separate execution host installation if the master host was installed without identified compute hosts or if you need to add additional compute hosts. For the execution host installation, you also need to have a configuration file.
To install all configured execution hosts, use the following form of the command:inst_sge -x -auto <full-path-to-configuration-file>
- To install the Berkeley database server, use the following form of the command:
inst_sge -db -auto <full-path-to-configuration-file>
See Configuration File Templates.
Automatic Uninstallation
You can also uninstall hosts automatically.
| Note Uninstall all compute hosts before you uninstall the master host. If you uninstall the master host first, you have to uninstall all execution hosts manually. |
To ensure that you have a clean environment, always source the $SGE_ROOT/$SGE_CELL/common/settings.csh file before proceeding.
| Topic | Description |
|---|---|
| How to Uninstall the Master Host Automatically | Procedure for uninstalling the master host automatically. |
| How to Uninstall Execution Hosts Automatically | Procedure for uninstalling the execution hosts automatically. |
| How to Uninstall the Shadow Master Host | Procedure for uninstalling the shadow host. |
How to Uninstall the Master Host Automatically
The master host uninstallation removes all of the Sun Grid Engine configuration files. After the uninstallation procedure completes, only the binary files remain. If you think that you will need the configuration information after the uninstallation, perform a backup of the master host. The master host uninstallation supports both interactive and automatic mode.
To start the automatic uninstallation of the master host, type the following command:
% ./inst_sge -um -auto <full-path-to-configuration-file>
This command performs the same procedure as in interactive mode, except the user is not prompted for confirmation of any steps and all terminal output is suppressed. Once the uninstall process is started, it cannot be stopped.
How to Uninstall Execution Hosts Automatically
During the execution host uninstallation, all configuration information for the targeted hosts is deleted. The uninstallation attempts to stop the exec hosts in a graceful manner.
First, the queue instances associated with the target host of the uninstallation will be disabled, so that new jobs will not be started. Then, in sequence, the following actions are done on each of the running jobs: checkpoint the job; reschedule the job; do forced rescheduling of the job.
At this point, the queue instance will be empty, and the execution daemon will be shut down, then the configuration, global spool directory or local spool directory will be removed.
The configuration file template has a section for identifying hosts that can be uninstalled automatically. Look for this section:
# Remove this execution hosts in automatic mode EXEC_HOST_LIST_RM="host1 host2 host3 host4"
Every host in the EXEC_HOST_LIST_RM list will be automatically removed from the cluster.
To start the automatic uninstallation of execution hosts, type the following command:
% ./inst_sge -ux -auto <full-path-to-configuration-file>
How to Uninstall the Shadow Master Host
To start the automatic uninstallation of the shadow host, type the following command:
% ./inst_sge -usm -auto <full-path-to-configuration-file>
Troubleshooting Automatic Installation and Uninstallation
The following errors might be encountered during auto-installation:
| Problem | Solution |
|---|---|
| If the $SGE_CELL directory exists, the installation terminates to avoid overwriting a previous installation. | Remove or rename the directory. |
| If the Berkeley database spooling directory exists, the installation terminates to avoid overwriting a previous installation. | This directory must be removed or renamed in order to proceed. Make sure that the ADMINUSER has permissions to write into the location where the Berkeley database spooling directory is located. The ADMINUSER will be the owner of the Berkeley database spooling directory. |
| The execution host installation appears to succeed, but the execution daemon is not started, or no load values are shown. | Verify that user root is allowed to rsh or ssh to the other host, without entering a password. |
If your network does not allow user root to have permissions to connect to other hosts through rsh or ssh without asking for a password, the automatic installation will not work remotely. In this case, log in to the host and use the following command to start the automatic installation locally on each host:
% ./inst_sge -x -noremote -auto /tmp/install_config_file.conf
Installing SMF Services
The Service Management Facility (SMF) is a new feature in Solaris 10. It provides a unified model for controlling services, replaces RC scripts, handles service dependencies, provides better service availability, and speeds up boot process. If you do not use at least Version 10 of the Solaris OS in your cluster, or you do not plan to use SMF, continue with Installing the Software From the Command Line.
| Note SMF is now the default for the hosts that run at least Version 10 of the Solaris OS. If you want to use the old behavior (RC files) for the Solaris hosts, you need to start the installation with the -nosmf option. Use the following command: ./inst_sge -x -nosmf |
Installing SMF services includes the following topics:
Why Install SMF Services?
SMF provides a unified administrative model of the persistent services. It solves many challenges of the previous approaches. All services have a common place for log files. Persistent services are automatically restarted on failure. For more information, see SMF documentation.
Additional Setup Required
If you want unprivileged users to use SMF services, you should create a role sge_admin. Assign this role to the users who should be able to manipulate the Grid Engine SMF services as described here.
Then, you can simply answer y when prompted to use SMF during the installation.
How Do SMF Services Compare to the Normal Services?
The biggest difference between SMF and normal services is that SMF does not consider kill -9 to be a correct service shutdown. SMF interprets kill -9 to restart the service.
Within the SMF framework, a service is uniquely identified by its fault resource management identifier (FMRI).
qmaster Daemon
Service name (FMRI) is svc:/application/sge/qmaster:$SGE_CLUSTER_NAME.
| SGE version | sgemaster stop | qconf -km | kill -15 | kill -9 | reboot |
|---|---|---|---|---|---|
| 6.1 | stop | stop | stop | stop | restart 1 |
| 6.2 | stop | stop | stop | restart | restart |
1 - Restart the daemon if RC scripts were installed
shadowd Daemon
Service name (FMRI) is svc:/application/sge/shadowd:$SGE_CLUSTER_NAME.
| SGE version | sgemaster -shadow stop | kill -15 | kill -9 | reboot |
|---|---|---|---|---|
| 6.1 | stop | stop | stop | restart 1 |
| 6.2 | stop | stop | restart | restart |
1 - Restart the daemon if RC scripts were installed
execd Daemon
Service name (FMRI) is svc:/application/sge/execd:$SGE_CLUSTER_NAME.
| SGE version | sgeexecd stop | qconf -ke | kill -15 | kill -9 | reboot |
|---|---|---|---|---|---|
| 6.1 | stop | stop | stop | stop | restart 1 |
| 6.2 | stop | stop | stop | restart | restart |
1 - Restart the daemon if RC scripts were installed
Berkeley RPC Server
Service name (FMRI) is svc:/application/sge/bdb:$SGE_CLUSTER_NAME.
| SGE version | berkeley_svc stop | kill -15 | kill -9 | reboot |
|---|---|---|---|---|
| 6.1 | stop | stop | stop | restart 1 |
| 6.2 | stop | restart | restart | restart |
1 - Restart the server if RC scripts were installed
dbwriter Software
Service name (FMRI) is svc:/application/sge/dbwriter:$SGE_CLUSTER_NAME.
| SGE version | sgedbwriter stop | kill -15 | kill -9 | reboot |
|---|---|---|---|---|
| 6.1 | stop | stop | stop | restart 1 |
| 6.2 | stop | restart | restart | restart |
1 - Restart the dbwriter if RC scripts were installed
| Additionally you may use new SMF interfaces to interact with services. For more information, see the svcadm(1M) man page. New actions:
|
Installing a JMX-Enabled System
The JMX agent functionality enables access to a subset of sge_qmaster functionality via the JMX protocol. For Sun Grid Engine 6.2, the main purpose of the JMX agent is to provide an interface between the SDM Grid Engine adapter and the Sun Grid Engine system.
Additional Setup Required
The steps required to set up the JMX agent feature of Grid Engine are similar to the standard setup. You generally follow the instructions in Planning the Installation, Loading the Distribution Files on a Workstation, How to Install the Master Host, How to Install Execution Hosts and How to Register Administration Hosts.
However, you have to perform a few additional tasks:
- Generating necessary configuration files on the master host by calling the installation script with the -jmx flag and depending on the JMX specific installation settings the optional generation of certificates, keys and keystore files.
- Optional distribution of security relevant files to the execution and submit hosts using a secure method such as ssh
- Generating user keys, certificates and keys automatically, after master installation
- Adding new users
- Tweaking of JMX-specific files
| Topic | Description |
|---|---|
| How to Install a JMX Agent-Enabled System | Procedure for installing Sun Grid Engine using the jmx flag when invoking the qmaster installation scripts. |
| How to Generate Certificates, Private Keys and Keystores for Users | Procedure for generating user-specific certificates, private keys, and keystores. |
| How to Check Certificates, Private Keys and Keystores for Users | Procedure for checking certificates, private keys, and keystores. |
| JMX Configuration Files | Describes the JMX configuration files in detail. |
| Testing and Troubleshooting | Testing and troubleshooting a JMX-enabled system. |
How to Install a JMX Agent-Enabled System
Install the Grid Engine software as outlined in Installing the Software From the Command Line, with the following exception: use the additional flag -jmx when invoking the qmaster installation scripts.
To install a JMX agent enabled system do the following:
- Change the master host installation procedure.
Type the following command and respond to the prompts from the installation script.# ./install_qmaster -jmx [-csp]
- Supply the following information to generate necessary configuration files and optionally the certificates, keys and keystores:
- JMX agent options
- JAVA_HOME (mandatory)
- Additional JVM arguments (optional)
- JMX MBean server port (mandatory)
- JMX ssl authentication (default: true)
- JMX ssl client authentication (default: true)
- JMX ssl server keystore path
(/var/sgeCA/{sge_qmaster| port$SGE_QMASTER_PORT}/$SGE_CELL/private/keystore) - JMX ssl server keystore password
- Optional certificate specific options, if there is no CA available
- Two-letter country code, for example, US for the United States
- State
- Location, such as a city
- Organization
- Organizational unit
- CA email address
As the installation proceeds, several JMX specific configuration files are installed:
**jvm_threads is set to 1 instead of 0 if JMX is enabled in $SGE_ROOT/$SGE_CELL/common/bootstrap:... jvm_threads 1 ...
- Several JMX agent specific configuration files are generated as:
$SGE_ROOT/$SGE_CELL/common/jmx/jaas.config $SGE_ROOT/$SGE_CELL/common/jmx/java.policy $SGE_ROOT/$SGE_CELL/common/jmx/jmxremote.access $SGE_ROOT/$SGE_CELL/common/jmx/jmxremote.password $SGE_ROOT/$SGE_CELL/common/jmx/logging.properties $SGE_ROOT/$SGE_CELL/common/jmx/management.properties
For a detailed description, see the comments in the files and the description below.
Optionally the Certificate Authority is created. The directories that contain information relevant to security are as follows:- The publicly accessible CA and daemon certificate are stored in $SGE_ROOT/$SGE_CELL/common/sgeCA
- The publicly accessible user certificates are stored in $SGE_ROOT/$SGE_CELL/common/sgeCA/usercerts
- The corresponding private keys and keystore are stored in /var/sgeCA/{sge_qmaster| port$SGE_QMASTER_PORT}/$SGE_CELL/private
- User keys, certificates and keystore are stored in /var/sgeCA/{sge_qmaster| port$SGE_QMASTER_PORT}/$SGE_CELL/userkeys/$USER
- JMX agent options
- The script prompts you for site information.
- Confirm whether the information you supplied is correct.
- Continue the installation.
After the security-related setup of the master host is finished, the script prompts you to continue with the rest of the installation procedure, as in the following example:SGE startup script -------------------- Your system wide SGE startup script is installed as: "/scratch2/eddy/sge_sec/$SGE_CELL/common/sgemaster" Hit Return to continue >> - Proceed to the next task.
For more information, see How to Generate Certificates and Private Keys for Users.
How to Generate Certificates, Private Keys and Keystores for Users
To use the CSP-secured system, the user must have access to a user-specific certificate, private key and keystore. Usually the steps outlined in How to Generate Certificates and Private Keys for Users are performed. After that the following procedure can be done to generate the corresponding keystore files for the users.
- As root on the master host run the following command:
# $SGE_ROOT/util/sgeCA/sge_ca -userks [-kspwf <kspwf-file>]
- Confirm that the creation has been successful.
# ls -lR /var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/userkeys /var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/userkeys/: total 8 drwx------ 2 eddy staff 512 Mar 13 11:33 eddy drwx------ 2 sarah staff 512 Mar 13 11:33 sarah drwx------ 2 leo staff 512 Mar 13 11:33 leo /var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/userkeys/eddy: total 16 -rw------- 1 eddy staff 1586 Mar 13 11:32 cert.pem -rw------- 1 eddy staff 891 Mar 13 11:32 key.pem -rw------- 1 eddy staff 3031 Mar 13 11:33 keystore -rw------- 1 eddy staff 1024 Mar 13 11:32 rand.seed -rw------- 1 eddy staff 818 Mar 13 11:32 req.pem ...
The page XHow to Check Certificates, Private Keys and Keystores for Users does not exist.
JMX Configuration Files
The following configuration files are installed into $SGE_ROOT/$SGE_CELL/common/jmx and are explained in detail here. Manual modification is usually not necessary and the preinstalled configurations should be sufficient.
jaas.config
Before using the JMX interface, you must run a special authentication against sge_qmaster. This process adds the correct principle that gives you the necessary role to access the JMX interfaces in read-only or read-write mode. The responsible section in the jaas.config file is named GridwareConfig or TestConfig (for testing only).
In general, the jaas.config file defines which login modules are used for which application case. The choice of the login module is defined either in a configuration file like management.properties or programmatically.
The jaas.config file contains different sections and allows the replacement of the authentication mechanism, e.g. authentication via unix pam or via LDAP (see the GridwareConfig section and TestConfig section below). The different modules can be stacked. For a general overview of Jaas, see http://java.sun.com/developer/technicalArticles/Security/jaasv2/. Here the procedure consists of two steps in the GridwareConfig section:
- Authenticate the user (for example with keystore or Unix login or with LDAP).
- In the JGDILogin module, add the JMXPrincipal that gives the defined role to the user. This role is used later in the jmx.access file to check if the user has read-only or read-write access.
/* * Default login configuration for qmaster's jmx server */ GridwareConfig { /** * Accepts all clients which have a certificate which is signed with * the CA certificate. */ com.sun.grid.security.login.GECATrustManagerLoginModule requisite caTop="${com.sun.grid.jgdi.caTop}"; /* * Accepts all clients which has a valid username/password. * * The username/password validation is done with the authuser binary (included * in the Grid Engine distribution, $SGE_ROOT/utilbin/$ARCH/authuser). * * ATTENTION: The authuser binary needs the suid bit. It does not work if grid * engine is installed on a nosuid file system. * */ com.sun.grid.security.login.UnixLoginModule requisite sge_root="${com.sun.grid.jgdi.sgeRoot}" auth_method="system"; /* * Username password authentication against LDAP. * * Alternative username/password authentication if * com.sun.grid.security.login.UnixLoginModule is not working. * * The LDAP specific parameters have to be adjusted to the local requirements * For details please have a look at the LdapLoginModule javadocs. * * ATTENTION: The LdapLoginModule is only available in java 6. The * parameter libjvm_path must point to a java 6 jvm * (qconf -sconf | grep libjvm_path) */ /* com.sun.security.auth.module.LdapLoginModule requisite userProvider="ldap://sun-ds/ou=people,dc=sun,dc=com" userFilter="(&(uid={USERNAME})(objectClass=inetOrgPerson))" useSSL=false; */ /* * The JGDILoginModule adds a JGDIPrincipal to the subject. The username of * the JGDIPrincipal is the name of the first trusted principal. This name * treated as username for gdi communication. * For each login a new jgdi session id is created. * * In the jmxremote.access file users who can access the system are defined * Any principal matching these entries is given the corresponding role. * Usually a jmxPrincipal is defined to give a user access to the system. * (e.g. com.sun.grid.security.login.UserPrincipal = xyz & * jmxPrincipal="controlRole" gives user xyz access under controlRole * ) */ com.sun.grid.jgdi.security.JGDILoginModule optional trustedPrincipal="com.sun.grid.security.login.UserPrincipal" trustedPrincipal1="com.sun.security.auth.UserPrincipal" jmxPrincipal="controlRole"; }; /* * TestConfig accepts any user. Only for testing */ TestConfig { com.sun.grid.security.login.TestLoginModule requisite; com.sun.grid.jgdi.security.JGDILoginModule optional trustedPrincipal="com.sun.grid.security.login.UserPrincipal" jmxPrincipal="controlRole"; }; /* * Mandatory if native jgdi is used with a csp system * (e.g. jgdish in csp mode) */ jgdi { com.sun.security.auth.module.KeyStoreLoginModule required keyStoreURL="file:./keystore" debug=false; };
java.policy
The java.policy file that is used by the JGDIAgent restricts the possibilities of code that can be run in sge_qmaster's JVM.
Usually changes here are only necessary to change the access to a subset of the overall functionality. To tweak the policy settings to your needs it is useful to run the JMX server with security debugging enabled and to consult the generated logging files. (qconf -mconf, additional_jvm_args = -Djavax.net.debug=ssl -Djava.security.debug=access,failure)
/* ** ** with LdapLoginModule ** grant principal com.sun.security.auth.UserPrincipal "controlRole" ** ** with jmxremote.password ** grant principal javax.management.remote.JMXPrincipal "controlRole" ** */ grant codeBase "file:${com.sun.grid.jgdi.sgeRoot}/lib/jgdi.jar" { permission java.net.SocketPermission "*:1024-", "accept,connect"; permission java.net.SocketPermission "localhost:1024-", "listen,resolve"; permission java.lang.RuntimePermission "loadLibrary.jgdi"; permission java.lang.RuntimePermission "shutdownHooks"; permission java.lang.RuntimePermission "setContextClassLoader"; permission javax.security.auth.AuthPermission "createLoginContext.jgdi"; permission javax.security.auth.AuthPermission "doAs"; permission javax.security.auth.AuthPermission "getSubject"; permission java.util.PropertyPermission "*", "read"; permission java.util.logging.LoggingPermission "control"; permission java.lang.FilePermission "${com.sun.grid.jgdi.sgeRoot}/${com.sun.grid.jgdi.sgeCell}/common/jmx/-", "read"; permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/util/-", "execute"; permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/utilbin/-", "execute"; permission javax.management.MBeanServerPermission "createMBeanServer"; permission javax.management.MBeanPermission "*", "*"; permission javax.management.MBeanTrustPermission "register"; permission java.lang.management.ManagementPermission "monitor"; permission java.lang.management.ManagementPermission "control"; permission java.lang.RuntimePermission "setIO"; permission java.io.FilePermission "jgdi.stdout", "write"; permission java.io.FilePermission "jgdi.stderr", "write"; permission java.io.FilePermission "jgdi0.log.lck", "delete"; permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/${com.sun.grid.jgdi.sgeCell}/common/jmx/*", "read"; permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/lib/-", "read"; permission java.lang.RuntimePermission "accessClassInPackage.sun.management.jmxremote"; permission java.lang.RuntimePermission "accessClassInPackage.sun.management.resources"; permission java.lang.RuntimePermission "accessClassInPackage.sun.management"; permission java.lang.RuntimePermission "accessClassInPackage.sun.rmi.server"; permission java.lang.RuntimePermission "accessClassInPackage.sun.management.snmp.util"; permission java.lang.RuntimePermission "accessClassInPackage.sun.rmi.registry"; permission java.util.PropertyPermission "java.rmi.server.randomIDs", "write"; permission javax.security.auth.AuthPermission "modifyPrincipals"; permission javax.security.auth.AuthPermission "createLoginContext.*"; permission javax.security.auth.AuthPermission "createLoginContext.JMXPluggableAuthenticator"; permission java.security.SecurityPermission "createAccessControlContext"; permission javax.management.remote.SubjectDelegationPermission "javax.management.remote.JMXPrincipal.controlRole"; }; grant principal javax.management.remote.JMXPrincipal "controlRole" { permission javax.management.MBeanPermission "com.sun.grid.jgdi.management.mbeans.JGDIJMX#*", "*"; permission javax.management.MBeanPermission "sun.management.*#*", "*"; permission javax.security.auth.AuthPermission "createLoginContext.jgdi"; permission javax.security.auth.AuthPermission "doAs"; permission javax.security.auth.AuthPermission "getSubject"; permission java.util.PropertyPermission "*", "read"; permission java.util.PropertyPermission "user.timezone", "read,write"; permission java.util.logging.LoggingPermission "control"; permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/lib/-", "read"; permission java.lang.management.ManagementPermission "monitor"; permission java.net.SocketPermission "*", "resolve"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#-[java.lang:type=OperatingSystem]", "isInstanceOf"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#-[java.lang:type=OperatingSystem]", "getAttribute"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#ProcessCpuTime[java.lang:type=OperatingSystem]", "getAttribute"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#Name[java.lang:type=OperatingSystem]", "getAttribute"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#Version[java.lang:type=OperatingSystem]", "getAttribute"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#Arch[java.lang:type=OperatingSystem]", "getAttribute"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#AvailableProcessors[java.lang:type=OperatingSystem]", "getAttribute"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#CommittedVirtualMemorySize[java.lang:type=OperatingSystem]", "getAttribute"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#TotalPhysicalMemorySize[java.lang:type=OperatingSystem]", "getAttribute"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#FreePhysicalMemorySize[java.lang:type=OperatingSystem]", "getAttribute"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#TotalSwapSpaceSize[java.lang:type=OperatingSystem]", "getAttribute"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#FreeSwapSpaceSize[java.lang:type=OperatingSystem]", "getAttribute"; permission javax.management.MBeanPermission "javax.management.MBeanServerDelegate#-[JMImplementation:type=MBeanServerDelegate]", "addNotificationListener"; permission javax.management.MBeanPermission "javax.management.MBeanServerDelegate#-[JMImplementation:type=MBeanServerDelegate]", "isInstanceOf"; permission javax.management.MBeanPermission "javax.management.MBeanServerDelegate#-[JMImplementation:type=MBeanServerDelegate]", "getMBeanInfo"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#-[java.lang:type=OperatingSystem]", "queryNames"; permission javax.management.MBeanPermission "java.util.logging.Logging#-[java.util.logging:type=Logging]", "queryNames"; permission javax.management.MBeanPermission "javax.management.MBeanServerDelegate#-[JMImplementation:type=MBeanServerDelegate]", "queryNames"; permission javax.management.MBeanPermission "java.util.logging.Logging#-[java.util.logging:type=Logging]", "isInstanceOf"; permission javax.management.MBeanPermission "java.util.logging.Logging#-[java.util.logging:type=Logging]", "getMBeanInfo"; permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#-[java.lang:type=OperatingSystem]", "getMBeanInfo"; }; grant { permission java.util.logging.LoggingPermission "control"; permission java.util.PropertyPermission "*", "read"; permission java.util.PropertyPermission "user.timezone", "write"; permission java.lang.RuntimePermission "setIO"; permission java.lang.RuntimePermission "loadLibrary.jgdi"; permission java.io.FilePermission "jgdi.stdout", "write"; permission java.io.FilePermission "jgdi.stderr", "write"; permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/lib/-", "read"; permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/util/arch", "execute"; permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/utilbin/-", "execute"; permission javax.security.auth.AuthPermission "modifyPrincipals"; permission java.io.FilePermission "${com.sun.grid.jgdi.caTop}", "read"; permission java.io.FilePermission "${com.sun.grid.jgdi.caTop}/cacert.pem", "read"; permission java.io.FilePermission "${com.sun.grid.jgdi.caTop}/ca-crl.pem", "read"; permission java.io.FilePermission "${com.sun.grid.jgdi.caTop}/usercerts/-", "read"; permission java.io.FilePermission "${com.sun.grid.jgdi.serverKeystore}", "read"; }; /* grant { permission java.security.AllPermission; }; */
management.properties
This file describes the general JMX server configuration and the default template looks similar to this example and is usually adapted automatically during the installation process replacing the @@SGE_*@@ variables by concrete values.
The meaning of the @@SGE_*@@ variables is:
- @@SGE_JMX_PORT@@ is the configured JMX port
- @@SGE_JMX_SSL@@ is true or false if SSL shall be enabled for JMX or not
- @@SGE_JMX_SSL_CLIENT@@ is true or false if client authentication is required
- @@SGE_JMX_SSL_KEYSTORE@@ the keystore used for enabled SSL
- @@SGE_JMX_SSL_KEYSTORE_PW@@ the corresponding keystore password
- @@SGE_ROOT@@ the $SGE_ROOT root directory
- @@SGE_CELL@@ the $SGE_CELL name usually 'default'
##################################################################### # Default Configuration File for JGDI JMX ##################################################################### # # The Management Configuration file (in java.util.Properties format) # will be read if one of the following system properties is set: # -Dcom.sun.grid.jgdi.management.jmxremote.port=<port-number> # or -Dcom.sun.grid.jgdi.management.config.file=<this-file> # # The default Management Configuration file is: # # $SGE_ROOT/{$SGE_CELL|default}/common/jmx/management.properties # # ################ Management Agent Port ######################### # # For setting the JMX RMI agent port use the following line # com.sun.grid.jgdi.management.jmxremote.port=<port-number> com.sun.grid.jgdi.management.jmxremote.port=@@SGE_JMX_PORT@@ ##################################################################### # RMI Management Properties ##################################################################### # # If system property -Dcom.sun.grid.jgdi.management.jmxremote.port=<port-number> # is set then # - A MBean server is started # - JRE Platform MBeans are registered in the MBean server # - RMI connector is published in a private readonly registry at # specified port using a well known name, "jmxrmi" # - the following properties are read for JMX remote management. # # The configuration can be specified only at startup time. # Later changes to above system property (e.g. via setProperty method), # this config file, the password file, or the access file have no effect to the # running MBean server, the connector, or the registry. # # # ###################### RMI SSL ############################# # # com.sun.grid.jgdi.management.jmxremote.ssl=true|false # Default for this property is true. (Case for true/false ignored) # If this property is specified as false then SSL is not used. # #For RMI monitoring without SSL use the following line # com.sun.grid.jgdi.management.jmxremote.ssl=false com.sun.grid.jgdi.management.jmxremote.ssl=@@SGE_JMX_SSL@@ # com.sun.grid.jgdi.management.jmxremote.ssl.enabled.cipher.suites=<cipher-suites> # The value of this property is a string that is a comma-separated list # of SSL/TLS cipher suites to enable. This property can be specified in # conjunction with the previous property "com.sun.management.jmxremote.ssl" # in order to control which particular SSL/TLS cipher suites are enabled # for use by accepted connections. If this property is not specified then # the SSL RMI Server Socket Factory uses the SSL/TLS cipher suites that # are enabled by default. # # com.sun.grid.jgdi.management.jmxremote.ssl.enabled.protocols=<protocol-versions> # The value of this property is a string that is a comma-separated list # of SSL/TLS protocol versions to enable. This property can be specified in # conjunction with the previous property "com.sun.management.jmxremote.ssl" # in order to control which particular SSL/TLS protocol versions are # enabled for use by accepted connections. If this property is not # specified then the SSL RMI Server Socket Factory uses the SSL/TLS # protocol versions that are enabled by default. # # com.sun.grid.jgdi.management.jmxremote.ssl.need.client.auth=true|false # Default for this property is false. (Case for true/false ignored) # If this property is specified as true in conjunction with the previous # property "com.sun.management.jmxremote.ssl" then the SSL RMI Server # Socket Factory will require client authentication. # #For RMI monitoring with SSL client authentication use the following line #com.sun.grid.jgdi.management.jmxremote.ssl.need.client.auth=true com.sun.grid.jgdi.management.jmxremote.ssl.need.client.auth=@@SGE_JMX_SSL_CLIENT@@ # # ################ RMI User authentication ################ # # com.sun.grid.jgdi.management.jmxremote.authenticate=true|false # Default for this property is true. (Case for true/false ignored) # If this property is specified as false then no authentication is # performed and all users are allowed all access. # # For RMI monitoring without any checking use the following line # com.sun.grid.jgdi.management.jmxremote.authenticate=false com.sun.grid.jgdi.management.jmxremote.authenticate=true # # ################ RMI Login configuration ################### # # com.sun.grid.jgdi.management.jmxremote.login.config=<config-name> # Specifies the name of a JAAS login configuration entry to use when # authenticating users of RMI monitoring. # # Setting this property is optional - the default login configuration # specifies a file-based authentication that uses the password file. # # When using this property to override the default login configuration # then the named configuration entry must be in a file that gets loaded # by JAAS. In addition, the login module(s) specified in the configuration # should use the name and/or password callbacks to acquire the user's # credentials. See the NameCallback and PasswordCallback classes in the # javax.security.auth.callback package for more details. # # If the property "com.sun.management.jmxremote.authenticate" is set to # false, then this property and the password & access files are ignored. # # For a non-default login configuration use the following line # com.sun.grid.jgdi.management.jmxremote.login.config=<config-name> com.sun.grid.jgdi.management.jmxremote.login.config=GridwareConfig # # ################ RMI Password file location ################## # # com.sun.grid.jgdi.management.jmxremote.password.file=filepath # Specifies location for password file # This is optional - default location is # $JRE/lib/management/jmxremote.password # # If the property "com.sun.grid.jgdi.management.jmxremote.authenticate" is set to # false, then this property and the password & access files are ignored. # For a non-default password file location use the following line # com.sun.grid.jgdi.management.jmxremote.password.file=filepath com.sun.grid.jgdi.management.jmxremote.password.file=@@SGE_ROOT@@/@@SGE_CELL@@/common/jmx/jmxremote.password # # ################ RMI Access file location ##################### # # com.sun.grid.jgdi.management.jmxremote.access.file=filepath # Specifies location for access file # This is optional - default location is # $JRE/lib/management/jmxremote.access # # If the property "com.sun.management.jmxremote.authenticate" is set to # false, then this property and the password & access files are ignored. # Otherwise, the access file must exist and be in the valid format. # If the access file is empty or non-existent then no access is allowed. # # For a non-default access file location use the following line # com.sun.grid.jgdi.management.jmxremote.access.file=filepath com.sun.grid.jgdi.management.jmxremote.access.file=@@SGE_ROOT@@/@@SGE_CELL@@/common/jmx/jmxremote.access # For the JGDI keystore module use this settings for the server keystore and keystore password com.sun.grid.jgdi.management.jmxremote.ssl.serverKeystore=@@SGE_JMX_SSL_KEYSTORE@@ com.sun.grid.jgdi.management.jmxremote.ssl.serverKeystorePassword=@@SGE_JMX_SSL_KEYSTORE_PW@@
jmx.access
The jmx access file defines which principals are mapped to a special role.
###################################################################### # Default Access Control File for Remote JMX(TM) Monitoring ###################################################################### # # Access control file for Remote JMX API access to monitoring. # This file defines the allowed access for different roles. The # password file (jmxremote.password by default) defines the roles and their # passwords. To be functional, a role must have an entry in # both the password and the access files. # # Default location of this file is $JRE/lib/management/jmxremote.access # You can specify an alternate location by specifying a property in # the management config file $JRE/lib/management/management.properties # (See that file for details) # # The file format for password and access files is syntactically the same # as the Properties file format. The syntax is described in the Javadoc # for java.util.Properties.load. # Typical access file has multiple lines, where each line is blank, # a comment (like this one), or an access control entry. # # An access control entry consists of a role name, and an # associated access level. The role name is any string that does not # itself contain spaces or tabs. It corresponds to an entry in the # password file (jmxremote.password). The access level is one of the # following: # "readonly" grants access to read attributes of MBeans. # For monitoring, this means that a remote client in this # role can read measurements but cannot perform any action # that changes the environment of the running program. # "readwrite" grants access to read and write attributes of MBeans, # to invoke operations on them, and to create or remove them. # This access should be granted to only trusted clients, # since they can potentially interfere with the smooth # operation of a running program # # A given role should have at most one entry in this file. If a role # has no entry, it has no access. # If multiple entries are found for the same role name, then the last # access entry is used. # # # Default access control entries: # o The "monitorRole" role has readonly access. # o The "controlRole" role has readwrite access. monitorRole readonly controlRole readwrite
jmx.password
This is also a possible simple authentication mechanism though not recommended. Usually the jaas login module is preferred since it is much more flexible. You can specify a password for the different roles there. If a simple login mechanism is required it is recommended to change management.properties to use TestConfig instead of GridwareConfig, which allows any valid Unix user to connect to JGDI JMX server without a password.
logging.properties
To enable JGDI and JMX logging the delivered logging file has to be adjusted and sge_qmaster or at least the JMX server has to be restarted. The generated logging files default to jgdi0.log, jgdi.stderr and jgdi.stdout in the master spooling directory. The logging can also be influenced by changing the additional_jvm_args configuration to enable additional debugging messages for example.
# # Java Logging Configuration for JMX MBean server # # Specify the handlers to create in the root logger # (all loggers are children of the root logger) # The following creates two handlers # Per default we log to the console #handlers = java.util.logging.ConsoleHandler # Use FileHandler handlers = java.util.logging.FileHandler # ------------------------------------------------------------------------------ # Definition of log levels # ------------------------------------------------------------------------------ # Set the default logging level for the root logger .level = INFO #com.sun.grid.jgdi.JGDI.level = FINE #com.sun.grid.jgdi.rmi.level = FINE #com.sun.grid.jgdi.configuration.xml.XMLUtil.level = FINE #com.sun.grid.jgdi.configuration.ClusterQueueTestCase.level = FINE #com.sun.grid.jgdi.management.level = FINER #com.sun.grid.jgdi.event.level = FINER # For authuser login module debugging #com.sun.grid.security.login.level = FINER #com.sun.grid.util.expect.level = FINER # ------------------------------------------------------------------------------ # Settings for ConsoleHandler # ------------------------------------------------------------------------------ # Set the default logging level for new ConsoleHandler instances java.util.logging.ConsoleHandler.level = INFO # Set the default formatter for new ConsoleHandler instances java.util.logging.ConsoleHandler.formatter = com.sun.grid.jgdi.util.SGEFormatter # ------------------------------------------------------------------------------ # Settings for FileHandler # ------------------------------------------------------------------------------ # Set the default logging level for new FileHandler instances java.util.logging.FileHandler.level = ALL # qmaster runs in qmaster spool dir, so the file is created there java.util.logging.FileHandler.pattern=jgdi%u.log java.util.logging.FileHandler.formatter=com.sun.grid.jgdi.util.SGEFormatter # # Possible columns: # # time timestamp of the log message # host hostname of the log message # name name of the logger # thread id of the thread # level log level (short form) # source class and method name # level_long log_level long form # com.sun.grid.jgdi.util.SGEFormatter.columns = time thread source level message # # Print the stacktrace of the log record # com.sun.grid.jgdi.util.SGEFormatter.withStacktrace=true # # Delimiter between columns # com.sun.grid.jgdi.util.SGEFormatter.delimiter = |
Testing and Troubleshooting
To connect to the JMX server jconsole can be used for testing. It is the responsibility of the administrator to allow/disallow access to the system via JMX. To force also client authentication of jconsole the management.properties file must be configured with:
- com.sun.grid.jgdi.management.jmxremote.ssl=true
- com.sun.grid.jgdi.management.jmxremote.ssl.need.client.auth=true
% jconsole -J-Djava.security.manager=java.rmi.RMISecurityManager \ -J-Djava.security.policy=$SGE_ROOT/util/rmiconsole.policy \ -J-Djavax.net.ssl.trustStore=<server truststore> \ [-J-Djavax.net.ssl.keyStore=/<safe>/mykeystore \ -J-Djavax.net.ssl.keyStorePassword=<mykeystore_pw> \ -J-Djavax.net.ssl.keyPassword=<mykeystore_pw> ] \ [-J-Djavax.net.debug=ssl]
where <server truststore> usually is either:
/var/sgeCA/port5322/$SGE_CELL/private/keystore
(only the server certificate is accessible without password)
or a special truststore is made available by the administrator:
keytool -export -alias "root" \ -keystore /var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/private/keystore -rfc -file /tmp/jmxserver.cer keytool -import -file /tmp/jmxserver.cer -keystore /tmp/truststore Enter keystore password: <pwd> ... Trust this certificate? [no]: yes Certificate was added to keystore
The optional arguments are required if client authentication is set to true or for debugging.
The following simple example can be used to connect via JMX and monitor events
% java [-Dcom.sun.grid.jgdi.keyStore=\ /var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/private/keystore \ -Dcom.sun.grid.jgdi.caTop="$SGE_ROOT/$SGE_CELL/common/sgeCA" \ -Djava.util.logging.config.file=util/shell_logging.properties ] \ -cp $SGE_ROOT/lib/juti.jar:$SGE_ROOT/lib/jgdi.jar \ com.sun.grid.jgdi.examples.jmxeventmonitor.Main
The optional arguments can be skipped and serve only to preset the login dialog with useful values. If a connection has been established once a preferences file is written, that is reused afterwards.
To have the correct environment variables set the SGE settings.(c)sh file has to be sourced. To get access to the keystore the command must be run by the admin user in the example above.
For troubleshooting the following settings and files might give some additional insights:
- Messages file in the master spool directory if the JMX server can't be started
- $SGE_ROOT/$SGE_CELL/common/bootstrap to check if jvm_threads is enabled at all
- jgdi* log files in the master spool directory are the main source for finding out the reason for failure analysis
- $SGE_ROOT/$SGE_CELL/common/jmx/logging.properties to enable more detailed logging
- qconf -mconf with an additional_jvm_args parameter
For example, add these two arguments -Djava.security.debug=all -Djavax.net.debug=ssl to trace any permission and authentication problems.
Removing the Software
This section consists of the following topics.
| Topic | Description |
|---|---|
| How to Remove the Software Interactively | Procedure for removing the Sun Grid Engine software interactively. |
| How to Remove the Software Using the inst_sge Utility and a Configuration Template | Procedure for removing the Sun Grid Engine software using the inst_sge utility and a configuration template. |
How to Remove the Software Interactively
To remove the software interactively, follow the steps below.
| Note Remove the software from the execution hosts before removing it from the master host. If you remove the software from the master host first, you cannot automate the removal of the software from the execution hosts. |
- Ensure that your environment variables are set up properly.

Note
If no cell name was specified during installation, the value of $SGE_CELL is default.- If you are using a C shell, type the following command:
# source $SGE_ROOT/$SGE_CELL/common/settings.csh
- If you are using a Bourne or Korn shell, type the following command:
# . $SGE_ROOT/$SGE_CELL/common/settings.sh
- If you are using a C shell, type the following command:
- On the master host, issue the $SGE_ROOT/inst_sge -ux command.
This example uninstalls the execution hosts: host1, host2 and host3.# $SGE_ROOT/inst_sge -ux -host "host1 host2 host3"
Note
You are not prompted for any information during this process. However, the output from this process will be displayed to the terminal window where you run the command. - (Optional) If you have any shadow master hosts, uninstall them:
# $SGE_ROOT/inst_sge -usm -host "host4" - Uninstall the master host.
# $SGE_ROOT/inst_sge -um
How to Remove the Software Using the inst_sge Utility and a Configuration Template
Unlike the interactive uninstallation method, the automated uninstallation method suppresses output during the process. Also, the automated method requires a properly formatted configuration file.
To remove the software using the inst_sge utility and a configuration template, follow these steps:
| Note Remove the software from the execution hosts before removing it from the master host. If you remove the software from the master host first, you cannot automate the removal of the software from the execution hosts. |
- Ensure that your environment variables are set up properly.

Note
If no cell name was specified during installation, the value of $SGE_CELL is default.- If you are using a C shell, type the following command:
# source $SGE_ROOT/$SGE_CELL/common/settings.csh
- If you are using a Bourne or Korn shell, type the following command:
# . $SGE_ROOT/$SGE_CELL/common/settings.sh
- If you are using a C shell, type the following command:
- Create a copy of the configuration template, $SGE_ROOT/util/inst_sge_modules/inst_sge_template.conf.
# cd $SGE_ROOT/util/inst_sge_modules/ # cp inst_sge_template.conf my_configuration.conf
- Edit your configuration template.
Every host that is in the EXEC_HOST_LIST_RM list will be removed.# Remove these execution hosts in automatic mode EXEC_HOST_LIST_RM="host1 host2 host3 host4" - On the master host type the $SGE_ROOT/inst_sge -ux -auto command.
This example uninstalls the execution hosts: host1, host2 and host3.
Type the following command as one string, with a space between the -auto and the $SGE_ROOT/util/inst_sge_modules/my_configuration.conf components.# $SGE_ROOT/inst_sge -ux -auto $SGE_ROOT/util/inst_sge_modules/my_configuration.conf

Note
You are not prompted for any information during this process. However, the output from this process will be displayed to the terminal window where you run the command. - (Optional) If you have any shadow master hosts, uninstall them.
Type the following command as one string, with a space between the -auto and the $SGE_ROOT/util/inst_sge_modules/my_configuration.conf components.# $SGE_ROOT/inst_sge -usm -auto $SGE_ROOT/util/inst_sge_modules/my_configuration.conf
- Uninstall the master host.
Type the following command as one string, with a space between the -auto and the $SGE_ROOT/util/inst_sge_modules/my_configuration.conf components.# $SGE_ROOT/inst_sge -um -auto $SGE_ROOT/util/inst_sge_modules/my_configuration.conf
Additional Software for the Microsoft Operating System
Microsoft Windows Services for UNIX (SFU) and Microsoft Subsystem for UNIX-based Applications (SUA) make it possible to integrate some Windows operating systems into existing UNIX environments. SFU and SUA provide components that simplify network administration and user management across the UNIX and Windows platforms.
Additional Software
The following sections describe the Microsoft Windows Services for UNIX (SFU) and Microsoft Subsystem for UNIX-based Applications (SUA) in detail.
| Topic | Description |
|---|---|
| Microsoft Services for UNIX | Learn how Microsoft Windows Services for UNIX (SFU) makes it possible to integrate some Windows operating systems into existing UNIX environments. |
| Microsoft Subsystem for UNIX-based Applications | Learn how Microsoft Subsystem for UNIX-based Applications (SUA) makes it possible to integrate some Windows operating systems into existing UNIX environments. |
| Changing Default Behavior to Case Sensitivity | Choose between default behavior and case sensitivity for object names. |
| Disabling DEP | Learn how to disable DEP for different Windows platforms. |
| Enabling suid Behavior for Interix Programs | Learn how to enable suid Behavior for Interix Programs. |
Microsoft Services for UNIX
Microsoft Windows Services for UNIX (SFU) makes it possible to integrate some Windows operating systems into existing UNIX environments. SFU provides components that simplify network administration and user management across the UNIX and Windows platforms. You can use SFU to do the following:
- Integrate Windows hosts into Grid Engine clusters. This means that the execution and client environment of Grid Engine can be used on Microsoft Windows hosts. You must use Grid Engine in combination with SFU for this to occur.
- Access the network file system (NFS). This makes it possible for you to share files between the UNIX and Windows environments.
- Possibly access account and password services on UNIX and Windows systems (PCNFS, NIS) using the user mapping service.
- Synchronize passwords and map authentication credentials between the UNIX and Windows operating systems. You can use the "single sign-on" capability for Windows and UNIX environments.
- Execute UNIX shell scripts and applications to run on Windows platform-based computers in full-featured UNIX environments.
Interix, SFU's UNIX environment subsystem, offers the following features:
- A complete, high-performance UNIX environment. You can use the csh shell or the ksh shell.
- Several hundred tools and utilities.
- A complete set of development tools and libraries that make it possible to port your UNIX-based applications to the Interix sub-system.
SFU is an essential prerequisite to install Grid Engine on Microsoft Windows Server 2003, Windows XP Professional with at least Service Pack 1, Windows 2000 Server with at least Service Pack 3, or Windows 2000 Professional with at least Service Pack 3.
For Microsoft Windows Server 2003 Release 2, Windows Server 2008, Windows Vista Enterprise, Windows Vista Ultimate, please see Microsoft Subsystem for UNIX-based Applications.
Unsupported Grid Engine Functionality
The following Grid Engine components are not supported in a Microsoft Windows environment and cannot be used on Windows Hosts even though they are standard to a Grid Engine installation:
- Master and Scheduler (sge_qmaster and sge_shadowd)
- Graphical User Interface (qmon)
- DRMAA
- qsh client command
| Topic | Description |
|---|---|
| How to Install Services for Unix | Learn how to install Microsoft Services for Unix. |
| Troubleshooting SFU | Learn how to troubleshoot Microsoft Services for Unix. |
| Configuring User Name Mapping | Learn how to configure user name mapping. |
How to Install Microsoft Services for Unix
System Requirements
The following system requirements apply to the SFU installation:
- You must install at least Version 5.0 of Internet Explorer, before running the SFU setup.
- You cannot install SFU on a system running Microsoft Services for Network File System. For example, Microsoft Services for NFS is a component of Windows Storage Server 2003.
- You must install the latest Windows service pack before installing SFU and Grid Engine. Then, you can install additional Windows service packs as they become available.
- The hard disk requirements for an SFU installation depend on which components you need to install. The following installation parameters apply:
- The minimum disk space required is 20 MB.
- The maximum disk space requirement is 360 MB.
- SFU must be installed on a partition that is formatted with the NTFS file system.
- You must disable Data Execution Prevention (DEP). DEP is not compatible with some parts of SFU and might cause segmentation faults. See http://support.microsoft.com/kb/875352 for more information about DEP. To disable DEP, see Disabling DEP.
You can find more details concerning SFU requirements at http://www.microsoft.com/windows/sfu/.
Services for UNIX Installation
Microsoft's SFU is required to install Grid Engine successfully. You can download SFU from Microsoft. Search the site for "Windows Services for Unix" to find the current download information.
- Get the SFU distribution media.
- Execute the application to unzip the files into a directory.
This directory must be located on a file system that has at least 480 MBytes free space.
- Log in to the Windows system with the Administrator account.
- Start the setup.exe application that you unpacked previously.
- Enter your User name and Organization.
- Accept the license agreement for SFU.
- Choose the standard installation (recommended) or the custom installation.
If disk space is limited, you might want to choose the custom installation. Make sure that you install at least the following components:
- Utilities -> Base Utilities
- Interix GNU components -> Interix GNU utilities
- Remote connectivity components -> Telnet Server and Windows Remote Shell
- If you intend to use NFS shared file systems, you also need Authentication tools for NFS -> User Mapping and Server for NFS Authentication.
- Depending on the Windows operating system, you might be presented with the following two options concerning SFU security settings, shown in the dialog box below:
If you need further information, consult Microsoft's SFU documentation.
- Configure User Name Mapping.

Note
User Name Mapping is part of SFU and not part of Sun Grid Engine. Consult Microsoft documentation and support to set up user mapping correctly.Your selection in the dialog box, shown below, depends on the hosts and services that are currently provided in your Windows and UNIX environments. If there is no Remote User Mapping server in your environment, then you should select Local User Name Mapping Server.


Note
You should install SFU and enable the User Name Mapping service on your host that acts as a Domain Controller for your windows environment. All other hosts should contact that Remote User Name Mapping Server. If you choose Local User Name Mapping Server, then you might either select Network Information Services (NIS) to access your passwd and group NIS-maps. Otherwise, select l if you can provide the files yourself.See Configuring User Name Mapping for further details.
- Depending on your previous selections, you can either enter the NIS Domain name and NIS Server name or the path of the passwd and group files.

Below is an example of the files that have the standard UNIX format. This means that you can also use your /etc/passwd and /etc/group files from your UNIX environment.C:\Unix\etc\passwd root:x:0:0:UNIX root user:/home/root:/bin/tcsh user1:x:1002:100:Full name of user1:/home/user1:/bin/tcsh C:\Unix\etc\group root::0:

Note
Some NIS maps do not contain an entry for the root user. If this is the case, follow these steps to map Administrator to root:- First create a password file containing the root entry.
- If the SFU installation is finished, start the Services for UNIX Administration application and create the mapping: Administrator <-> root.
- Switch to NIS mapping.
- Use simple mapping or add manual mappings.
At this point the installation starts installing components. Wait until all components are installed.
- When the installation process finishes, you might need to reboot the machine, depending on the version of Windows that you are using.
- Make sure that the Interix Subsystem Startup starts during boot time.
If you intend to use NFS shares and user mapping, then also start Client for NFS and User Name Mapping.
Depending on the installation options and your version of the Windows operating system, one or more of these services are disabled by default.
Post SFU Installation Tasks
There are several steps you should follow after you install the SFU software.
- Before you start using SFU and install Grid Engine, check that the user mapping is working correctly by following these steps:
- Open an Interix shell locally on the Interix host.
- Use the login command to switch to a known user that is not the Administrator.
- Verify the access permissions for NFS shares that should be accessible to that user.
- Try to access these network resources. If a user cannot access a Network drive, most likely the User Name Mapping is not working correctly.
- Check users' home directories.
To enable the automounting of the users' home directories, use the following series of menus:Control Panel -> Administrative Tools -> Computer Management -> Users -> Properties -> Profile
Click connect to, select a drive letter, and enter the path of the user's home directory in UNC notation: \\<server>\<share>\<user home>.
Within the Interix subsystem, you might access all NFS shares through the special directory: /net/server/share.
You might also create links to these directories to access the shares directly, for example, ln -s /net/myserver/export/share00/home/home.
- Enable Administrator names on your machines.
Make sure that the administrator accounts on all machines that are enabled as execution hosts for Grid Engine use the same account name, such as Administrator.
Also make sure that this user has manager privileges in your Sun Grid Engine cluster. If this is not the case, add the privileges using qconf -am administrator before the installation of the execution daemon.
- Set the CLI commands.
This starts an editor. Make sure to set the EDITOR environment variable to vi, or your preferred UNIX editor, within the Interix subsystem before you start using UNIX commands.
- Mount NFS shares.
There are two ways to mount NFS shares to the Interix host:- The recommended way is to use the auto mount functionality of Interix. All network shares that the Computer Browser service of Windows can find are automatically mounted to the /net directory of Interix. Although only some of these shares might be listed with ls /net, all shares are accessible. The syntax of the auto mount is /net/server/share, such as /net/myserver/home. A link to an auto-mounted share can be created to make it accessible under exactly the same name as on a UNIX host. For example, ln -s /net/myserver/home /home makes the users' UNIX home directories accessible through /home/username. Automounted shares are available starting at boot time. They are available for all users who have the permissions to access these shares. The shares cannot get lost by misconfiguration.
- Network shares can also be mapped to drive letters by using the command nfsmount. The syntax is /usr/sbin/nfsmount -u: \\<computername>\<sharename> <devicename>. For example:
/usr/sbin/nfsmount -u: \\\\myserver\\home Z:
This drive is now accessible through /def/fs/Z. A link can be created to this drive to have the same path as on a UNIX host.

Note
As shown in the example above, all backslashes must be written twice because the shell interprets a single backslash as an escape character.
- Configure the users' home directories.
If the users' home directories are located on an NFS server, follow these steps to configure the users' home directories in Windows:- In the Profile tab of the users properties dialog box, select Connect.
- Select a free drive letter.
- Enter the path to the user's home directory in the UNC notation \\<server>\<share>\<directory>, for example, \\myserver\home\Peter.
Troubleshooting SFU
The following section describes some common problems that users may encounter when installing and using Grid Engine in a Services for UNIX environment on a Windows system.
- Impossible to connect to the Interix subsystem through telnet or rsh.
Make sure that the correct services are started. The corresponding Windows services must be disabled. The Interix versions of telnetd and rshd must be started. You can do this task by removing the pound sign (#) from the following lines in /etc/inetd.conf:#telnet stream tcp nowait NULL /usr/sbin/in.telnetd in.telnetd -i #shell stream tcp nowait NULL /usr/sbin/in.rshd in.rshd -a
If you still cannot connect to the machine, check your firewall configuration. Do not block connections to corresponding ports:
Service | Ports ---------+----------- ftp | 20, 21 ssh | 22 telnet | 23 rsh | 514
- The wrong default login shell is started. Why?
Both the .rhost and host.equiv authentications fail if new user accounts are created and if the passwords of existing users are changed. In this case, the command regpwd needs to be called. After that, follow the steps to register passwords correctly. - Why is the access to NFS mounted home directories slow?
User Name Mapping might be the cause. For a large number of user maps, installing User Name Mapping on a Domain Controller improves performance by reducing network traffic. You can create a User Name Mapping server pool. This method means that you use DNS round-robin to create a pool of computers running User Name Mapping. This provides improved performance on wide area networks and provides failover when one of the servers is no longer available. - How can I map user root if it does not exist in the NIS maps?
First create a passwd file which contains an entry for the user root. Then, explicitly map the root account (no basic mapping) using the created passwd file. Finally, change the mapping to use the NIS maps. Note that the previous root mapping will persist. - NIS Server cannot be







