- 1 Installing Sun Grid Engine Software
- 2 Planning the Installation
- 2.1 Decisions That You Must Make
- 2.2 Gather the Necessary Information
- 2.2.1 Disk Space Requirements
- 2.2.2 $SGE_ROOT Directory
- 2.2.3 Directory Organization
- 2.2.3.1.1 Figure - Sample Directory Hierarchy
- 2.2.4 Cells
- 2.2.5 Cluster Name
- 2.2.6 User Names
- 2.2.7 Installation Accounts
- 2.2.8 File Access Permissions
- 2.2.9 Network Services
- 2.2.10 Master Host
- 2.2.11 Shadow Master Hosts
- 2.2.12 Spool Directories under the Root Directory
- 2.2.13 Choosing Between Classic Spooling and Database Spooling
- 2.2.14 Database Server and Spooling Host
- 2.2.15 Execution Hosts
- 2.2.16 Group IDs
- 2.2.17 Administration Hosts
- 2.2.18 Submit Hosts
- 2.2.19 Cluster Queues
- 2.2.20 Scheduler Profiles
- 2.2.21 Installation Method
- 2.2.22 Check the Other Installation Issues Appendix
- 3 Installation Planning Checklist
- 4 Supported Operating Platforms for Sun Grid Engine
- 5 Loading the Distribution Files on a Workstation
- 5.1 Software Distribution
- 5.2 How to Load the Distribution Files on a Workstation
- 5.2.1.1.1 Before You Begin
- 5.2.1.1.2 Steps
- 5.2.2 pkgadd Method
- 5.2.3 tar Method
- 6 Installing the Sun Grid Engine Software Interactively
- 7 How to Install the Master Host
- 7.1.1.1.1 Before You Begin
- 7.1.1.1.2 Steps
- 8 Example Master Host Installation
- 9 How to Install Execution Hosts
- 9.1.1.1.1 Before You Begin
- 9.1.1.1.2 Steps
- 10 Example Execution Host Installation
- 11 How to Install the Berkeley DB Spooling Server
- 12 Example Berkeley DB Spooling Server Installation
- 13 Registering Administration Hosts
- 14 Registering Submit Hosts
- 15 Installing the Increased Security Features
- 15.1 Why Install the Increased Security Features?
- 15.2 Additional Setup Required
- 15.3 How to Install a CSP-Secured System
- 15.4 How to Generate Certificates and Private Keys for Users
- 15.5 How to Renew Certificates
- 15.6 Checking Certificates
- 15.6.1 Displaying a Certificate
- 15.6.2 Check Issuer
- 15.6.3 Check Subject
- 15.6.4 Show Email of Certificate
- 15.6.5 Show Validity
- 15.6.6 Show Fingerprint
- 16 Upgrading From a Previous Version of Sun Grid Engine Software
- 16.1 About Upgrading the Software
- 16.2 Before You Upgrade
- 16.2.1 Constraints
- 16.2.2 Additional Constraints for the New 6.2 Installation with Cloned Configuration
- 16.3 Back Up the Configuration of the Old Cluster
- 16.3.1 What the Backup Contains
- 16.3.2 How to Back Up the Cluster
- 16.4 How to Install the 6.2 Software Using the Cloned Cluster Configuration Method
- 16.5 How to Upgrade the Original Cluster to 6.2 Software (Real Upgrade)
- 17 Example Upgrade for Cloned Cluster Configuration
- 18 How to Upgrade the Software From 5.3 to 6.0 Update 2
- 18.1.1.1.1 Before You Begin
- 18.1.1.1.2 Steps
- 19 Verifying Sun Grid Engine Installation
- 20 Automating the Installation Process
- 20.1 About Automatic Installation
- 20.1.1 Special Considerations
- 20.2 Using the inst_sge Utility and a Configuration Template
- 20.2.1 How to Automate the Master Host Installation
- 20.2.1.1.1 Before You Begin
- 20.2.1.1.2 Steps
- 20.2.2 Automating Other Installations Through a Configuration File
- 20.3 Automatic Installation With Increased Security (CSP)
- 20.4 Automatic Uninstallation
- 20.4.1 Uninstalling Execution Hosts
- 20.4.2 Uninstalling the Master Host
- 20.4.3 Uninstalling the Shadow Host
- 20.5 Automatic Backup
- 20.5.1 Starting an Automatic Backup
- 20.5.1.1.1 Example - Backup Configuration File
- 20.6 Troubleshooting Automatic Installation and Uninstallation
- 20.6.1.1.1 Supplemental Information
- 20.6.1.1.2 Index
- 21 Configuration File Templates
- 21.1.1.1.1 Example - Configuration File
- 22 Installing SMF Services
- 22.1 Why Install SMF Services?
- 22.2 Additional Setup Required
- 22.3 How Do SMF Services Compare to the Normal Services?
- 22.3.1 qmaster Daemon
- 22.3.2 shadowd Daemon
- 22.3.3 execd Daemon
- 22.3.4 Berkeley RPC Server
- 22.3.5 dbwriter Software
- 23 Installing a JMX-Enabled System
- 23.1 Additional Setup Required
- 23.2 How to Install a JMX Agent-Enabled System
- 23.3 How to Generate Certificates, Private Keys and Keystores for Users
- 23.4 JMX Configuration Files
- 23.4.1 jaas.config
- 23.4.2 java.policy
- 23.4.3 management.properties
- 23.4.4 jmx.access
- 23.4.5 jmx.password
- 23.4.6 logging.properties
- 23.5 Testing and Troubleshooting
- 24 Removing the Grid Engine Software
- 24.1 How to Remove the Software Interactively
- 24.2 How to Remove the Software Using the inst_sge Utility and a Configuration Template
- 25 Microsoft Services for UNIX
- 25.1 Overview
- 25.2 Unsupported Grid Engine Functionality
- 25.3 System Requirements
- 25.4 Services for UNIX Installation
- 25.5 Post SFU Installation Tasks
- 25.6 Troubleshooting SFU
- 26 Changing Default Behavior to Case Sensitivity
- 27 Configuring User Name Mapping
- 28 Disabling Data Execution Prevention (DEP)
- 29 Enable suid Behavior for Interix Programs
- 30 User Management for Sun Grid Engine on Windows Hosts
- 30.1 Overview
- 30.2 Managing Users on Windows Hosts
- 30.2.1 Windows User Example
- 30.2.1.1.1 Table - Using Domain Accounts
- 30.2.2 UNIX User Management
- 30.3 Using Grid Engine in a Microsoft Windows Environment
- 30.4 Adding Windows Hosts to Existing Grid Engine Systems
- 31 Other Sun Grid Engine Installation Issues
- 31.1 Verifying and Installing Linux Motif Libraries
- 31.2 Installing the Grid Engine on a System With IPMP
- 31.2.1 What Is IP Multipathing?
- 31.2.2 Issues Between IPMP and Grid Engine
- 31.2.3 Installing the Grid Engine Master Node With IPMP
- 31.2.3.1 Ignoring the Error Messages
- 31.2.3.2 Temporarily Disabling IPMP
- 31.2.4 Installing a Grid Engine on an Execution Host With IPMP
- 31.2.5 Enabling Administrative and Submit Hosts With IPMP
Sun Grid Engine Information Center
Planning the Installation
Whether you have installed previous versions of the Sun Grid Engine software or this is your first time, you must do some planning before you extract and install the software. This section describes the decisions that you must make and, wherever possible, gives you criteria on which you can base your decisions. This section consists of the following topics:
Decisions That You Must Make
You must make several decisions before you can plan the installation:
Gather the Necessary Information
Before you install the Grid Engine software, you must plan how to achieve the results that fit your environment. This section helps you make the decisions that affect the rest of the procedure. Write down your installation plan in a table similar to the following example.
- If you are going to install Grid Engine 6.2 on a Windows system, acquire and install Microsoft Services For UNIX. See Microsoft Services For UNIX for more information.
- If you are going to install Grid Engine 6.2 on a Windows system, create the required Certificate Security Protocol (CSP) certificates before installing Grid Engine. See How to Install a CSP-Secured System for information about CSP certificates.
- Check Other Grid Engine Installation Issues to see whether any of those issues apply to your installation.
Disk Space Requirements
The Grid Engine software directory tree has the following fixed disk space requirements:
The recommended disk space for Grid Engine system spool directories is as follows:
The spool directories of the master host and of the execution hosts are configurable and need not reside under the default location, sge-root.
$SGE_ROOT Directory
You must create a directory into which you load the contents of the distribution media. This directory is called the root directory, or $SGE_ROOT. When the Grid Engine system is running, this directory stores the current cluster configuration and all other data that must be spooled to disk.
Use a path name for this directory that is valid and network-accessible on all hosts. For example, if the file system is mounted using the automounter, set $SGE_ROOT to /usr/SGE6, not to /tmp_mnt/usr/SGE6.
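If you take the value of $SGE_ROOT from the current working directory on an automounted file system, the path can still carry the automounter prefix. The following POSIX shell sketch strips such a prefix. It assumes the prefix is /tmp_mnt, as in the example above, and the function name strip_automount_prefix is hypothetical; set AUTOMOUNT_PREFIX if your automounter uses a different prefix.

```shell
#!/bin/sh
# Sketch: remove an automounter mount prefix from a path before using it
# as $SGE_ROOT. AUTOMOUNT_PREFIX is an assumption; /tmp_mnt matches the
# example in the text, but sites can use other prefixes.
AUTOMOUNT_PREFIX=${AUTOMOUNT_PREFIX:-/tmp_mnt}

strip_automount_prefix() {
    case "$1" in
        # Only strip when the path starts with "<prefix>/".
        "$AUTOMOUNT_PREFIX"/*) printf '%s\n' "${1#"$AUTOMOUNT_PREFIX"}" ;;
        # Otherwise print the path unchanged.
        *) printf '%s\n' "$1" ;;
    esac
}
```

For example, running SGE_ROOT=$(strip_automount_prefix "$(pwd)"); export SGE_ROOT in /tmp_mnt/usr/SGE6 would set $SGE_ROOT to /usr/SGE6.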
The $SGE_ROOT directory is the top level of the Grid Engine software directory tree. On startup, each Grid Engine software component in a cell needs read access to the $SGE_ROOT/$SGE_CELL/common directory. When Grid Engine software is installed as a single cluster, the value of $SGE_CELL is default. For ease of installation and administration, this directory should be readable on all hosts on which you intend to run the Grid Engine software installation procedure. For example, you can select a directory that is available across a network file system, such as NFS. If you use file systems that are local to the hosts, you must copy the installation directory to each host before you start the installation procedure for that host. See File Access Permissions for a description of the required permissions.
Directory Organization
When determining the directory organization, you must decide the following:
By default, the installation procedure installs the Grid Engine software, man pages, spool areas, and configuration files in a directory hierarchy under the installation directory, as shown in the following figure. If you accept this default behavior, install or select a directory with the access permissions that are described in File Access Permissions.
Figure – Sample Directory Hierarchy
You can choose to put the spool areas in other locations during the primary installation. See Configuring Queues for more detailed instructions.
Cells
You can set up the Grid Engine system as a single cluster or as a collection of loosely coupled clusters called cells. The $SGE_CELL environment variable indicates the cluster being referenced. When the Grid Engine system is installed as a single cluster, $SGE_CELL is not set, and the value default is assumed for the cell name.
Cluster Name
The $SGE_CLUSTER_NAME environment variable supports unique naming of the cluster. Unlike $SGE_CELL, $SGE_CLUSTER_NAME is restricted: the name must be unique throughout your organization, and it must start with a letter followed only by letters, digits, dashes (-), or underscores (_). If you decide to use Grid Engine SMF services on Solaris 10 or later hosts, you must select a new $SGE_CLUSTER_NAME, because this name becomes part of the names of the Sun Grid Engine SMF services. $SGE_CLUSTER_NAME is also used to distinguish the rc files of different clusters.
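Before you run the installation, you can check a proposed cluster name against the naming rule that the master installation script prints (a letter followed by letters, digits, dashes, or underscores) and locate the common directory of a cell. The following shell sketch does both. The function names valid_cluster_name and sge_common_dir are hypothetical, and the sketch cannot check that the name is unique throughout your organization.

```shell
#!/bin/sh
# Sketch: validate a proposed $SGE_CLUSTER_NAME against the rule printed by
# install_qmaster: start with a letter, then letters, digits, '-' or '_'.
valid_cluster_name() {
    case "$1" in
        # Empty, not starting with a letter, or containing any other character.
        "" | [!A-Za-z]* | *[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print $SGE_ROOT/$SGE_CELL/common, using the cell name "default" when
# SGE_CELL is unset or empty (the single-cluster case).
sge_common_dir() {
    printf '%s/%s/common\n' "$SGE_ROOT" "${SGE_CELL:-default}"
}
```

For example, valid_cluster_name p10500 succeeds, valid_cluster_name 10500 fails, and [ -r "$(sge_common_dir)" ] tests whether the current host can read the common directory of the current cell.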
User Names
For the Grid Engine system to verify that users submitting jobs have permission to submit them on the desired execution hosts, users' names must be identical on the submit and execution hosts. You might therefore have to change user names on some machines, because Grid Engine user names map directly to system user accounts.
Installation Accounts
You can install the Grid Engine software either as the root user or as an unprivileged user, for example, your own user account. However, if you install the software while logged in as an unprivileged user, only that user can run Grid Engine jobs; access is denied to all other accounts. In addition, you cannot use the qrsh, qtcsh, or qmake commands, and you cannot run tightly integrated parallel jobs. Installing the software while logged in as root removes these restrictions, and root permission is required for the complete installation procedure.
File Access Permissions
If you install the software while logged in as root, you might have trouble configuring root read/write access for all hosts on a shared file system, for example, when NFS partitions are not exported with read/write permissions for root. As a result, you might have problems putting the $SGE_ROOT files onto a network-wide file system. You can force Grid Engine software to run all Grid Engine system components through a non-root administrative user account, for example, sgeadmin. With this setup, only this user needs read/write access to the shared $SGE_ROOT file system. The installation procedure asks whether files should be created and owned by an administrative user account. If you answer "Yes" and provide a valid user name, files are created by this user. Otherwise, files are created under the user name with which you run the installation procedure. Create an administrative user, and answer "Yes" to this question. In all cases, make sure that the account used for file handling on all hosts has read/write access to the $SGE_ROOT directory. The installation procedure also assumes that the host from which you access the Grid Engine software distribution media can write to the $SGE_ROOT directory.
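One way to confirm the read/write requirement on each host is to log in as the administrative account and try to create a file in $SGE_ROOT. The following shell sketch does this. check_dir_access is a hypothetical helper, and it assumes that the mktemp command is available, as it is on Linux and most current UNIX systems. It creates a real file rather than relying only on permission tests, because on a shared file system a permission test might not reflect what the file server actually allows (for example, when root is mapped to an unprivileged user).

```shell
#!/bin/sh
# Sketch: check that the current account can read, traverse, and write a
# directory such as $SGE_ROOT. Assumes mktemp is installed.
check_dir_access() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 2
    fi
    if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
        echo "cannot read or traverse $dir" >&2
        return 1
    fi
    # Create and remove a probe file to test real write access.
    probe=$(mktemp "$dir/.sge_access_probe.XXXXXX" 2>/dev/null) || {
        echo "cannot write to $dir" >&2
        return 1
    }
    rm -f "$probe"
    echo "read/write access to $dir is available"
}
```

For example, an administrator might run the function as sgeadmin on every host with check_dir_access "$SGE_ROOT" and investigate any host where it fails.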
Network Services
Determine whether your site's network services are defined in an NIS database or in an /etc/services file that is local to each workstation. If your site uses NIS, determine the host name of your NIS server so that you can add entries to the NIS services map. The Grid Engine system services are sge_execd and sge_qmaster. To add the services to your NIS map, choose port numbers that are not used by any other service and reserve them for Grid Engine. The following examples show sge_qmaster and sge_execd entries.
sge_qmaster 6444/tcp
sge_execd 6445/tcp
Master Host
The master host controls the Grid Engine system. This host runs the master daemon sge_qmaster. The master host must comply with the following requirements:
Shadow Master Hosts
These hosts back up the functionality of sge_qmaster in case the master host or the master daemon fails. To be a shadow master host, a machine must have the following characteristics:
The shadow master host facility is activated for a host as soon as these conditions are met. You do not need to restart the Grid Engine system daemons to make a host into a shadow master host.
Spool Directories under the Root Directory
During the installation of the master host, you must specify the location of a spooling directory. This directory is used to spool jobs from execution hosts that do not have a local spooling directory.
You do not need to export these directories to other machines. However, exporting the entire $SGE_ROOT tree and making it write-accessible for the master host and all execution hosts makes administration easier.
Choosing Between Classic Spooling and Database Spooling
During the installation, you are given the option to choose between classic spooling and Berkeley DB spooling. If you choose Berkeley DB spooling, you are then given the option to spool to a local directory or to a separate host, known as a Berkeley DB spooling server. Berkeley DB spooling might provide better performance than classic spooling. Part of this performance increase comes from the fact that the master host can make non-blocking writes to the database, but has to make blocking writes to the text files used by classic spooling. Also consider file format and data integrity. Writing to the Berkeley DB provides a greater level of data integrity than writing to a text file. However, a text file stores data in a format that you can read and edit. Normally, you do not need to read these files, but the spooling directory contains the messages from the system daemons, which can be useful for debugging.
Database Server and Spooling Host
The master host can store its configuration and state in a Berkeley DB spooling database. The spooling database can be installed on the master host or on a separate host. Spooling into a local directory on the master host performs better than using a separate spooling server. However, if you want to set up a shadow master host, you need to use a separate Berkeley DB spooling server (host). In this case, you have to choose a host with a configured RPC service, and the master host connects to the Berkeley DB through RPC.
With the NFS4 software available in the Solaris 10 operating system, you can use Berkeley DB spooling on a network file system, which was not possible with previous NFS versions. As a result, you can install a shadow master host that uses Berkeley DB spooling without setting up an additional Berkeley DB spooling server.
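The spooling choices described in this section reduce to a few rules: classic spooling never needs a separate spooling server; Berkeley DB spooling without a shadow master host spools locally on the master host; and Berkeley DB spooling with a shadow master host needs a Berkeley DB spooling server unless the spool directory is on an NFS4 file system. The following shell sketch encodes those rules as a planning aid only. The function spooling_plan and its yes/no arguments are hypothetical; the installation script asks these questions interactively instead.

```shell
#!/bin/sh
# Sketch: decide whether a separate Berkeley DB spooling server is needed.
# Arguments: spooling method (classic|berkeleydb),
#            shadow master host planned (yes|no),
#            spool directory on an NFS4 file system (yes|no).
spooling_plan() {
    method=$1 shadow=$2 nfs4=$3
    case "$method" in
        classic)
            echo "classic spooling; no spooling server needed" ;;
        berkeleydb)
            if [ "$shadow" != yes ]; then
                echo "berkeleydb local spooling on the master host"
            elif [ "$nfs4" = yes ]; then
                echo "berkeleydb spooling on NFSv4; no spooling server needed"
            else
                echo "berkeleydb spooling server required"
            fi ;;
        *)
            echo "unknown spooling method: $method" >&2
            return 1 ;;
    esac
}
```

For example, spooling_plan berkeleydb yes no prints "berkeleydb spooling server required", which corresponds to answering "y" to the spooling server question during the master host installation.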
If you choose to use Berkeley DB spooling without a shadow master host, you do not need to set up a separate spooling server. Likewise, if you choose classic spooling, you can set up a shadow master host without setting up a separate spooling server. Once you determine whether you need a separate spooling server, determine the location of the spooling directory. The spooling directory must be local to the spooling server. The installation suggests a default location for the spooling directory, but this default is not suitable when the file server is a different host from the master host. The requirements for the Berkeley DB spooling host are similar to the requirements for the master host:
Execution Hosts
Execution hosts run the jobs that users submit to the Grid Engine system. An execution host must first be set up as an administration host. You run an installation script on each execution host. For more information, see How to Install Execution Hosts.
Group IDs
You need to provide a range of IDs that are assigned dynamically to jobs. A group ID is assigned to each Grid Engine job to monitor the resource utilization of the job, and each job is assigned a unique ID while it is running. The range must therefore contain enough IDs for the maximum number of Grid Engine jobs running at a single moment on a single host. For example, the range 20000-20100 contains 101 IDs, which is enough for about 100 jobs to run concurrently on a single host. You can change the group ID range for your cluster configuration at any time, but the IDs in the range must not already be used by UNIX groups on your system.
Administration Hosts
Operators and managers of the Grid Engine system use administration hosts to perform administrative tasks such as reconfiguring queues or adding Grid Engine users. The master host installation script automatically makes the master host an administration host. During the master host installation process, you can add other administration hosts. You can also manually add administration hosts on the master host at any time after installation.
Submit Hosts
Jobs can be submitted and controlled from submit hosts. The master host installation script automatically makes the master host a submit host.
Cluster Queues
The installation procedure creates a default cluster queue structure, which is suitable for getting acquainted with the system. The default queue can be removed after installation.
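Before you settle on the group ID range described under Group IDs, you can check how many IDs it provides and whether any of them are already assigned to UNIX groups. The following shell sketch does this for a single range written as start-end, such as 20000-20100; in the cluster configuration this range is held in the gid_range parameter. The function name check_gid_range is hypothetical, and it reads only a local group file, so groups defined in NIS or another name service must be checked separately.

```shell
#!/bin/sh
# Sketch: report the size of a GID range and list groups already using IDs
# in it. Arguments: range (start-end) and optional group file (/etc/group).
check_gid_range() {
    range=$1
    group_file=${2:-/etc/group}

    # Accept only "<digits>-<digits>".
    case "$range" in
        *[!0-9-]* | -* | *- | *-*-*) ok=no ;;
        *-*) ok=yes ;;
        *) ok=no ;;
    esac
    if [ "$ok" != yes ]; then
        echo "expected a range such as 20000-20100, got '$range'" >&2
        return 2
    fi
    start=${range%-*}
    end=${range#*-}
    if [ "$start" -gt "$end" ]; then
        echo "range start $start is greater than end $end" >&2
        return 2
    fi

    # The range is inclusive, so 20000-20100 provides 101 IDs.
    echo "$((end - start + 1)) IDs in $range"

    # Field 3 of each group entry is the numeric group ID.
    used=$(awk -F: -v lo="$start" -v hi="$end" \
        '$3 + 0 >= lo + 0 && $3 + 0 <= hi + 0 { print $1 ":" $3 }' "$group_file")
    if [ -n "$used" ]; then
        echo "group IDs already in use:" >&2
        echo "$used" >&2
        return 1
    fi
}
```

For example, check_gid_range 20000-20100 prints "101 IDs in 20000-20100" and returns success when no local group uses an ID in that range.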
Consider the following when determining a queue structure:
For more detailed information on administering cluster queues, see Configuring Queues.
Scheduler Profiles
You can choose from three scheduler profiles during the installation process: normal, high, and max. You can use these predefined profiles as a starting point for Grid Engine tuning. Using these profiles, you can optimize the scheduler for one or more of the following:
You can choose from three scheduler profiles:
For more information on how to configure scheduling, see Administering the Scheduler.
Installation Method
Several methods are available for installing the Grid Engine software:
To decide which installation method you should use, consider the following factors.
Check the Other Installation Issues Appendix
If you are installing Grid Engine on a Linux system or on a system with IPMP, see Other Grid Engine Installation Issues for important information.
Supported Operating Platforms for Sun Grid Engine
The Sun Grid Engine 6.2 software supports the following operating systems and platforms:
Loading the Distribution Files on a Workstation
Software Distribution
The Sun Grid Engine 6.2 software is distributed on CD-ROM and through electronic download. For information on how to access CD-ROMs, ask your system administrator or refer to your local system documentation. The CD-ROM distribution contains a directory named Sun_Grid_Engine_6_2. The product distribution is in this directory, in both tar.gz format and pkgadd format. The pkgadd format is provided for the Solaris Operating System (Solaris OS). For all supported operating systems, the software is distributed in tar.gz format.
How to Load the Distribution Files on a Workstation
Before You Begin
Ensure that the file systems and directories that will contain the Grid Engine software distribution and the spool and configuration files are set up properly, with the access permissions defined in File Access Permissions.
Steps
pkgadd Method
The pkgadd format is provided for the Solaris Operating System. To facilitate remote installation, the pkgadd directories are also provided in zip files. You can install the following packages:
When you type the following commands, be prepared to answer the script's questions about your base directory, sge-root, and the administrative user. The script requests the choices that you made during the planning steps of this installation. See Decisions That You Must Make for further details.
At the command prompt, type the following commands, responding to the script questions.
# cd cdrom_mount_point/Sun_Grid_Engine_6_2
# pkgadd -d ./Common/Packages SUNWsgeec
Depending on the Solaris binary that you need, type one of the following commands:
# pkgadd -d ./Solaris_sparc/Packages SUNWsgee
# pkgadd -d ./Solaris_sparc/Packages SUNWsgeex
# pkgadd -d ./Solaris_x86/Packages SUNWsgeei
# pkgadd -d ./Solaris_x64/Packages SUNWsgeeax
tar Method
For all supported operating systems, the software is distributed in tar.gz format. The following table contains files that you need to install, regardless of platform.
The tar files that contain platform-specific binaries use the naming convention of sge-6_2-bin-architecture.tar.gz. The following table lists the platform-specific binaries. Install the file for each platform that you need to support. Note that each platform has its own directory under Sun_Grid_Engine_6_2.
Type the following commands at the command prompt. In the example, <basedir> is the abbreviation for the full directory, cdrom-mount-point/Sun_Grid_Engine_6_2.
% su
# cd <sge-root>
# gzip -dc <basedir>/Common/tar/sge-6_2-common.tar.gz | tar xvpf -
# gzip -dc <basedir>/Solaris_sparc/tar/sge-6_2-bin-solsparc32.tar.gz | tar xvpf -
# gzip -dc <basedir>/Solaris_sparc/tar/sge-6_2-bin-solsparc64.tar.gz | tar xvpf -
# SGE_ROOT=<sge-root>; export SGE_ROOT
# util/setfileperm.sh $SGE_ROOT
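If you install several architectures, the extraction commands above can be wrapped in a small script. The following POSIX shell sketch builds each binary tarball path from the naming convention sge-6_2-bin-architecture.tar.gz under <basedir>/<platform-directory>/tar, extracts the common tarball and the requested binaries into $SGE_ROOT, and then runs util/setfileperm.sh. The function names and the platform-directory:architecture argument format are hypothetical conventions for this sketch; run it as root with SGE_ROOT set.

```shell
#!/bin/sh
# Sketch: unpack the common and binary tarballs into $SGE_ROOT, then set
# file permissions, following the manual commands in the text.

# Print <basedir>/<platform-dir>/tar/sge-6_2-bin-<architecture>.tar.gz
sge_bin_tarball() {
    printf '%s/%s/tar/sge-6_2-bin-%s.tar.gz\n' "$1" "$2" "$3"
}

# Extract one gzip-compressed tar file into the current directory.
extract_tarball() {
    if [ ! -f "$1" ]; then
        echo "missing tarball: $1" >&2
        return 1
    fi
    gzip -dc "$1" | tar xvpf -
}

# Usage: install_sge_tarballs <basedir> <platform-dir>:<arch> ...
install_sge_tarballs() {
    basedir=$1
    shift
    cd "$SGE_ROOT" || return 1
    extract_tarball "$basedir/Common/tar/sge-6_2-common.tar.gz" || return 1
    for pair in "$@"; do
        # Split "Solaris_sparc:solsparc64" into directory and architecture.
        extract_tarball "$(sge_bin_tarball "$basedir" "${pair%%:*}" "${pair#*:}")" ||
            return 1
    done
    util/setfileperm.sh "$SGE_ROOT"
}
```

For example, after SGE_ROOT=/opt/sge62; export SGE_ROOT, the call install_sge_tarballs /cdrom/Sun_Grid_Engine_6_2 Solaris_sparc:solsparc32 Solaris_sparc:solsparc64 repeats the example commands above for a hypothetical mount point /cdrom.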
Installing the Sun Grid Engine Software Interactively
This section covers the interactive installation of the Sun Grid Engine software and consists of the following topics:
Interactive Installation Overview
Full installation includes the following tasks:
Performing an Installation
The following sections describe how to install all the components of the Grid Engine system, including the master, execution, administration, and submit hosts. If you need to install the system with increased security, see Installing the Increased Security Features before you continue the installation. If you want to install Grid Engine SMF services, see Installing SMF Services before you start the installation.
How to Install the Master Host
The master host installation procedure creates the directory hierarchy that the master daemon requires and starts the Grid Engine master daemon sge_qmaster on the master host. The master host is also registered as a host with administrative and submit permission. The installation procedure creates a default configuration for the system on which it is run. The installation script queries the system for its operating system type and then chooses suitable settings based on this information. If at any time during the installation you think something went wrong, you can quit the installation procedure and restart it.
Before You Begin
Steps
Example Master Host Installation
The following example shows a complete Sun Grid Engine master host installation. Remember that this is only one step in the entire Sun Grid Engine installation process. The step numbers in this example correspond to the steps in How to Install the Master Host.
% su -
# cd sge-install-dir
# ./install_qmaster
Grid Engine License is displayed.

Do you agree with that license? (y/n) [n] >>

Welcome to the Grid Engine installation
---------------------------------------

Grid Engine qmaster host installation
-------------------------------------

Before you continue with the installation please read these hints:

- Your terminal window should have a size of at least
80x24 characters

- The INTR character is often bound to the key Ctrl-C.
The term >Ctrl-C< is used during the installation if you
have the possibility to abort the installation

The qmaster installation procedure will take approximately 5-10 minutes.

Hit <RETURN> to continue >>
Step 5
Grid Engine admin user account
------------------------------

The current directory

/opt/sge62

is owned by user

myusername

If user >root< does not have write permissions in this directory on *all*
of the machines where Grid Engine will be installed (NFS partitions not
exported for user >root< with read/write permissions) it is recommended to
install Grid Engine that all spool files will be created under the user id
of user >myusername<.

IMPORTANT NOTE: The daemons still have to be started by user >root<.
Do you want to install Grid Engine as admin user >myusername< (y/n) [y] >>

Installing Grid Engine as admin user >myusername<

Hit <RETURN> to continue >>
Choosing Grid Engine admin user account
---------------------------------------

You may install Grid Engine that all files are created with the user id of an
unprivileged user.

This will make it possible to install and run Grid Engine in directories
where user >root< has no permissions to create and write files and directories.

- Grid Engine still has to be started by user >root<

- This directory should be owned by the Grid Engine administrator

Do you want to install Grid Engine
under an user id other than >root< (y/n) [y] >> y

Choosing a Grid Engine admin user name
--------------------------------------

Please enter a valid user name >> sgeadmin

Installing Grid Engine as admin user >sgeadmin<

Hit <RETURN> to continue >>
Step 6
Checking $SGE_ROOT directory
----------------------------

The Grid Engine root directory is:

$SGE_ROOT = /opt/sge62

If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/opt/sge62] >>

Your $SGE_ROOT directory: /opt/sge62

Hit <RETURN> to continue >>
Step 7
Grid Engine TCP/IP communication service
----------------------------------------

The port for sge_qmaster is currently set by the shell environment.

SGE_QMASTER_PORT = 10500

Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/services<, >NIS< or >NIS+<, adding an entry in the form

sge_qmaster <port_number>/tcp

to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?

Using the >shell environment<: [1]

Using a network service like >/etc/services<, >NIS/NIS+<: [2]

(default: 1) >> 1

Using the environment variable

$SGE_QMASTER_PORT=10500

as port for communication.

Hit <RETURN> to continue >>

Grid Engine TCP/IP communication service
----------------------------------------

The port for sge_execd is currently set by the shell environment.

SGE_EXECD_PORT = 10501

Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/services<, >NIS< or >NIS+<, adding an entry in the form

sge_execd <port_number>/tcp

to your services database and make sure to use an unused port number.

How do you want to configure the Grid Engine communication ports?

Using the >shell environment<: [1]

Using a network service like >/etc/services<, >NIS/NIS+<: [2]

(default: 1) >> 1

Using the environment variable

$SGE_EXECD_PORT=10501

as port for communication.

Hit <RETURN> to continue >>
Step 8
Grid Engine cells
-----------------

Grid Engine supports multiple cells.

If you are not planning to run multiple Grid Engine clusters or if you don't
know yet what is a Grid Engine cell it is safe to keep the default cell name

default

If you want to install multiple cells you can enter a cell name now.

The environment variable

$SGE_CELL=<your_cell_name>

will be set for all further Grid Engine commands.

Enter cell name [default] >>

Using cell >default<.
Hit <RETURN> to continue >>
Step 9
Unique cluster name
-------------------

The cluster name uniquely identifies a specific Sun Grid Engine cluster.
The cluster name must be unique throughout your organization. The name
is not related to the SGE cell.

The cluster name must start with a letter ([A-Za-z]), followed by letters,
digits ([0-9]), dashes (-) or underscores (_).

Enter new cluster name or hit <RETURN>
to use default [p10500] >>

Your $SGE_CLUSTER_NAME: p10500

Hit <RETURN> to continue >>
Step 10
Grid Engine qmaster spool directory
-----------------------------------

The qmaster spool directory is the place where the qmaster daemon stores
the configuration and the state of the queuing system.

The admin user >myusername< must have read/write access
to the qmaster spool directory.

If you will install shadow master hosts or if you want to be able to start
the qmaster daemon on other hosts (see the corresponding section in the
Grid Engine Installation and Administration Manual for details) the account
on the shadow master hosts also needs read/write access to this directory.

The following directory

[/opt/sge62/default/spool/qmaster]

will be used as qmaster spool directory by default!

Do you want to select another qmaster spool directory (y/n) [n] >>
Step 11
Windows Execution Host Support
------------------------------

Are you going to install Windows Execution Hosts? (y/n) [n]
Step 12
Verifying and setting file permissions
--------------------------------------

Did you install this version with >pkgadd< or did you already
verify and set the file permissions of your distribution (y/n) [y] >>

Verifying and setting file permissions
--------------------------------------

We may now verify and set the file permissions of your Grid Engine
distribution.

This may be useful since due to unpacking and copying of your distribution
your files may be unaccessible to other users.
We will set the permissions of directories and binaries to

755 - that means executable are accessible for the world

and for ordinary files to

644 - that means readable for the world

Do you want to verify and set your file permissions (y/n) [y] >>

Verifying and setting file permissions and owner in >3rd_party<
Verifying and setting file permissions and owner in >bin<
Verifying and setting file permissions and owner in >ckpt<
Verifying and setting file permissions and owner in >examples<
Verifying and setting file permissions and owner in >inst_sge<
Verifying and setting file permissions and owner in >install_execd<
Verifying and setting file permissions and owner in >install_qmaster<
Verifying and setting file permissions and owner in >lib<
Verifying and setting file permissions and owner in >mpi<
Verifying and setting file permissions and owner in >pvm<
Verifying and setting file permissions and owner in >qmon<
Verifying and setting file permissions and owner in >util<
Verifying and setting file permissions and owner in >utilbin<
Verifying and setting file permissions and owner in >catman<
Verifying and setting file permissions and owner in >doc<
Verifying and setting file permissions and owner in >include<
Verifying and setting file permissions and owner in >man<

Your file permissions were set

Hit <RETURN> to continue >>
Step 13
Select default Grid Engine hostname resolving method
----------------------------------------------------

Are all hosts of your cluster in one DNS domain? If this is
the case the hostnames

>hostA< and >hostA.foo.com<

would be treated as equal, because the DNS domain name >foo.com<
is ignored when comparing hostnames.

Are all hosts of your cluster in a single DNS domain (y/n) [y] >>

Ignoring domainname when comparing hostnames.
279 280 Hit <RETURN> to continue >> 281 Step 14 282 Making directories 283 ------------------ 284 285 creating directory: /opt/sge62/default/spool/qmaster 286 creating directory: /opt/sge62/default/spool/qmaster/job_scripts 287 Hit <RETURN> to continue >> 288 Step 15 289 Setup spooling 290 -------------- 291 Your SGE binaries are compiled to link the spooling libraries 292 during runtime (dynamically). So you can choose between Berkeley DB 293 spooling and Classic spooling method. 294 Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> 295 296 The Berkeley DB spooling method provides two configurations! 297 298 1) Local spooling: 299 The Berkeley DB spools into a local directory on this host (qmaster host) 300 This setup is faster, but you can't setup a shadow master host 301 302 2) Berkeley DB Spooling Server: 303 If you want to setup a shadow master host, you need to use 304 Berkeley DB Spooling Server! 305 In this case you have to choose a host with a configured RPC service. 306 The qmaster host connects via RPC to the Berkeley DB. This setup is more 307 failsafe, but results in a clear potential security hole. RPC communication 308 (as used by Berkeley DB) can be easily compromised. Please only use this 309 alternative if your site is secure or if you are not concerned about 310 security. Check the installation guide for further advice on how to achieve 311 failsafety without compromising security. 312 313 Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >> y 314 315 Berkeley DB Setup 316 317 ----------------- 318 Please, log in to your Berkeley DB spooling host and execute "inst_sge -db" 319 Please do not continue, before the Berkeley DB installation with 320 "inst_sge -db" is completed, continue with <RETURN> 321 322 Berkeley Database spooling parameters 323 ------------------------------------- 324 325 Please enter the name of your Berkeley DB Spooling Server! >> vector 326 327 328 Do you want to use a Berkeley DB Spooling Server? 
(y/n) [n] >> 329 330 Hit <RETURN> to continue >> 331 332 Berkeley Database spooling parameters 333 ------------------------------------- 334 335 Please enter the Database Directory now, even if you want to spool locally, 336 it is necessary to enter this Database Directory. 337 338 Default: [/opt/sge62/default/spool/spooldb] >> /tmp/dom/spooldb 339 340 Dumping bootstrapping information 341 Initializing spooling database 342 343 Hit <RETURN> to continue >> Step 16 344 Grid Engine group id range 345 -------------------------- 346 347 When jobs are started under the control of Grid Engine an additional group id 348 is set on platforms which do not support jobs. This is done to provide maximum 349 control for Grid Engine jobs. 350 351 This additional UNIX group id range must be unused group id's in your system. 352 Each job will be assigned a unique id during the time it is running. 353 Therefore you need to provide a range of id's which will be assigned 354 dynamically for jobs. 355 356 The range must be big enough to provide enough numbers for the maximum number 357 of Grid Engine jobs running at a single moment on a single host. E.g. a range 358 like >20000-20100< means, that Grid Engine will use the group ids from 359 20000-20100 and provides a range for 100 Grid Engine jobs at the same time 360 on a single host. 361 362 You can change at any time the group id range in your cluster configuration. 363 364 Please enter a range >> 20000-20100 365 366 Using >20000-20100< as gid range. Hit <RETURN> to continue >> 367 Step 17 368 Grid Engine cluster configuration 369 --------------------------------- 370 371 Please give the basic configuration parameters of your Grid Engine 372 installation: 373 374 <execd_spool_dir> 375 376 The pathname of the spool directory of the execution hosts. User >myusername< 377 must have the right to create this directory and to write into it. 
378 379 Default: [/opt/sge62/default/spool] >> 380 Step 18 381 Grid Engine cluster configuration (continued) 382 --------------------------------------------- 383 <administator_mail> 384 385 The email address of the administrator to whom problem reports are sent. 386 387 It is recommended to configure this parameter. You may use >none< 388 if you do not wish to receive administrator mail. 389 390 Please enter an email address in the form >user@foo.com<. 391 392 Default: [none] >> me@my.domain 393 Step 19 394 The following parameters for the cluster configuration were configured: 395 396 execd_spool_dir /opt/sge62/default/spool 397 administrator_mail me@my.domain 398 399 Do you want to change the configuration parameters (y/n) [n] >> n 400 401 Creating local configuration 402 ---------------------------- 403 Creating >act_qmaster< file 404 Adding default complex attributes 405 Adding SGE default usersets 406 Adding >sge_aliases< path aliases file 407 Adding >qtask< qtcsh sample default request file 408 Adding >sge_request< default submit options file 409 Creating >sgemaster< script 410 Creating >sgeexecd< script 411 Creating settings files for >.profile/.cshrc< 412 413 Hit <RETURN> to continue >> 414 Step 20 415 qmaster startup script 416 ---------------------- 417 418 Do you want to start qmaster automatically at machine boot? 419 NOTE: If you select "n" SMF will be not used at all! (y/n) [y] >> 420 421 422 Hit <RETURN> to continue >> 423 424 Grid Engine qmaster startup 425 --------------------------- 426 427 Starting qmaster daemon. Please wait ... 428 Hit <RETURN> to continue >> Step 23 429 Adding Grid Engine hosts 430 ------------------------ 431 432 Please now add the list of hosts, where you will later install your execution 433 daemons. These hosts will be also added as valid submit hosts. 434 435 Please enter a blank separated list of your execution hosts. You may 436 press <RETURN> if the line is getting too long. 
Once you are finished 437 simply press <RETURN> without entering a name. 438 439 You also may prepare a file with the hostnames of the machines where you plan 440 to install Grid Engine. This may be convenient if you are installing Grid 441 Engine on many hosts. 442 443 Do you want to use a file which contains the list of hosts (y/n) [n] >> n 444 445 Adding admin and submit hosts 446 ----------------------------- 447 448 Please enter a blank seperated list of hosts. 449 450 Stop by entering <RETURN>. You may repeat this step until you are 451 entering an empty list. You will see messages from Grid Engine 452 when the hosts are added. 453 454 Host(s): host1 host2 host3 host4 455 456 host1 added to administrative host list 457 host1 added to submit host list 458 host2 added to administrative host list 459 host2 added to submit host list 460 host3 added to administrative host list 461 host3 added to submit host list 462 host4 added to administrative host list 463 host4 added to submit host list 464 Hit <RETURN> to continue >> 465 466 Adding admin and submit hosts 467 ----------------------------- 468 469 Please enter a blank seperated list of hosts. 470 471 Stop by entering <RETURN>. You may repeat this step until you are 472 entering an empty list. You will see messages from Grid Engine 473 when the hosts are added. 474 475 Host(s): 476 Finished adding hosts. Hit <RETURN> to continue >> 477 478 If you want to use a shadow host, it is recommended to add this host 479 to the list of administrative hosts. 480 481 If you are not sure, it is also possible to add or remove hosts after the 482 installation with <qconf -ah hostname> for adding and <qconf -dh hostname> 483 for removing this host 484 485 Attention: This is not the shadow host installation 486 procedure. 487 You still have to install the shadow host separately 488 489 Do you want to add your shadow host(s) now? 
(y/n) [y] >> 490 491 Adding Grid Engine shadow hosts 492 ------------------------------- 493 494 Please now add the list of hosts, where you will later install your shadow 495 daemon. 496 497 Please enter a blank separated list of your execution hosts. You may 498 press <RETURN> if the line is getting too long. Once you are finished 499 simply press <RETURN> without entering a name. 500 501 You also may prepare a file with the hostnames of the machines where you plan 502 to install Grid Engine. This may be convenient if you are installing Grid 503 Engine on many hosts. 504 505 Do you want to use a file which contains the list of hosts (y/n) [n] >> 506 507 Adding admin hosts 508 ------------------ 509 510 Please enter a blank seperated list of hosts. 511 512 Stop by entering <RETURN>. You may repeat this step until you are 513 entering an empty list. You will see messages from Grid Engine 514 when the hosts are added. 515 516 Host(s): es-ergb01-01 517 adminhost "es-ergb01-01" already exists 518 Hit <RETURN> to continue >> 519 520 Please enter a blank seperated list of hosts. 521 522 Stop by entering <RETURN>. You may repeat this step until you are 523 entering an empty list. You will see messages from Grid Engine 524 when the hosts are added. 525 526 Host(s): 527 Finished adding hosts. Hit <RETURN> to continue >> 528 529 Creating the default <all.q> queue and <allhosts> hostgroup 530 ----------------------------------------------------------- 531 532 root@myhost added "@allhosts" to host group list 533 root@myhost added "all.q" to cluster queue list 534 535 Hit <RETURN> to continue >> 536 537 No CSP system installed! 538 No CSP system installed! Step 24 539 Scheduler Tuning 540 ---------------- 541 The details on the different options are described in the manual. 
542 543 Configurations 544 -------------- 545 1) Normal 546 Fixed interval scheduling, report scheduling information, 547 actual + assumed load 548 549 2) High 550 Fixed interval scheduling, report limited scheduling information, 551 actual load 552 553 3) Max 554 Immediate Scheduling, report no scheduling information, 555 actual load 556 557 Enter the number of your preferred configuration and hit <RETURN>! 558 Default configuration is [1] >> 559 560 561 We're configuring the scheduler with >Normal< settings! 562 Do you agree? (y/n) [y] >> 563 564 changed scheduler configuration Step 26 565 Using Grid Engine 566 ----------------- 567 568 You should now enter the command: 569 570 source /scratch2/myusername/sge62/default/common/settings.csh 571 572 if you are a csh/tcsh user or 573 574 # . /scratch2/myusername/sge62/default/common/settings.sh 575 576 if you are a sh/ksh user. 577 578 This will set or expand the following environment variables: 579 580 - $SGE_ROOT (always necessary) 581 - $SGE_CELL (if you are using a cell other than >default<) 582 - $SGE_CLUSTER_NAME (always necessary) 583 - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<) 584 - $SGE_EXECD_PORT (if you haven't added the service >sge_execd<) 585 - $PATH/$path (to find the Grid Engine binaries) 586 - $MANPATH (to access the manual pages) 587 588 Hit <RETURN> to see where Grid Engine logs messages >> 589 590 Grid Engine messages 591 -------------------- 592 593 Grid Engine messages can be found at: 594 595 Startup messages can be found in SMF service log files. 596 You can get the name of the log file by calling svcs -l <SERVICE_NAME> 597 E.g.: svcs -l svc:/application/sge/qmaster:p10500 598 599 After startup the daemons log their messages in their spool directories. 
600 601 Qmaster: /scratch2/myusername/sge62/default/spool/qmaster/messages 602 Exec daemon: <execd_spool_dir>/<hostname>/messages 603 604 605 Grid Engine startup scripts 606 --------------------------- 607 608 Grid Engine startup scripts can be found at: 609 610 /scratch2/myusername/sge62/default/common/sgemaster (qmaster) 611 /scratch2/myusername/sge62/default/common/sgeexecd (execd) 612 613 Do you want to see previous screen about using Grid Engine again (y/n) [n] >> 614 615 Your Grid Engine qmaster installation is now completed 616 ------------------------------------------------------ 617 618 Please now login to all hosts where you want to run an execution daemon 619 and start the execution host installation procedure. 620 621 If you want to run an execution daemon on this host, please do not forget 622 to make the execution host installation in this host as well. 623 624 All execution hosts must be administrative hosts during the installation. 625 All hosts which you added to the list of administrative hosts during this 626 installation procedure can now be installed. 627 628 You may verify your administrative hosts with the command 629 630 # qconf -sh 631 632 and you may add new administrative hosts with the command 633 634 # qconf -ah <hostname> 635 636 Please hit <RETURN> >> 637 638 sge_qmaster successfully installed! |
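The installer accepts a cluster name only if it starts with a letter and continues with letters, digits, dashes, or underscores (transcript lines 182-183). If you script your installations, you can check a candidate name before you run the installer. The following is a minimal POSIX shell sketch of that rule. The function name valid_cluster_name is hypothetical and is not part of the Grid Engine distribution, and the real installer may apply checks that this sketch omits.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Grid Engine): succeed (exit 0) when $1
# follows the cluster-name rule printed by the installer: one letter
# [A-Za-z], then any mix of letters, digits, dashes, or underscores.
valid_cluster_name() {
    case $1 in
        "")               return 1 ;;  # an empty name is rejected
        [!A-Za-z]*)       return 1 ;;  # the first character must be a letter
        *[!A-Za-z0-9_-]*) return 1 ;;  # any other character is rejected
    esac
    return 0
}

# Usage: classify a few candidate names.
for name in p10500 my_cluster-2 10500p "bad name"; do
    if valid_cluster_name "$name"; then
        echo "accepted: $name"
    else
        echo "rejected: $name"
    fi
done
```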
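The gid range limits how many Grid Engine jobs can run at the same time on a single host, because each running job holds one id from the range (transcript lines 351-360). The installer's wording ("the group ids from 20000-20100") suggests that both ends are inclusive. Under that assumption, the following POSIX shell sketch counts the ids in a range written as start-end. The function name gid_range_size is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): print how many group ids a
# range such as "20000-20100" contains. ASSUMPTION: both bounds are inclusive.
gid_range_size() {
    gid_start=${1%%-*}   # text before the first dash
    gid_end=${1#*-}      # text after the first dash
    for n in "$gid_start" "$gid_end"; do
        case $n in
            ""|*[!0-9]*) echo "invalid range: $1" >&2; return 1 ;;
        esac
    done
    if [ "$gid_end" -lt "$gid_start" ]; then
        echo "invalid range: $1" >&2
        return 1
    fi
    echo $((gid_end - gid_start + 1))
}

gid_range_size 20000-20100   # prints 101
```

Under the inclusive reading, the example range yields 101 ids. That covers the 100 concurrent jobs per host mentioned by the installer, with one id to spare.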
Sun Grid Engine Information Center
How to Install Execution Hosts
The execution host installation procedure creates the directory hierarchy that sge_execd requires and starts the sge_execd daemon on the execution host. This section describes how to install execution hosts interactively from the command line. To automate the installation of multiple execution hosts, use the procedure described in Automating the Installation Process.
Before You Begin
Before installing an execution host, install the master host as described in How to Install the Master Host, and share the common directory with the execution host.
Steps
Example Execution Host Installation
The following example shows a complete Sun Grid Engine execution host installation. Before you install the execution host, you must first install the master host as described in How to Install the Master Host. The line numbers in this example are referred to from the execution host installation description at How to Install Execution Hosts.
Steps 1-6
001 % su -
002 # qstat -f
003 # ./install_execd
004
005 Welcome to the Grid Engine execution host installation
006 ------------------------------------------------------
007
008 If you haven't installed the Grid Engine qmaster host yet, you must execute
009 this step (with >install_qmaster<) prior the execution host installation.
010
011 For a sucessful installation you need a running Grid Engine qmaster. It is
012 also necessary that this host is an administrative host.
013
014 You can verify your current list of administrative hosts with
015 the command:
016
017 # qconf -sh
018
019 You can add an administrative host with the command:
020
021 # qconf -ah <hostname>
022
023 The execution host installation will take approximately 5 minutes.
024
025 Hit <RETURN> to continue >>
026
Step 7
027 Checking $SGE_ROOT directory
028 ----------------------------
029
030 The Grid Engine root directory is:
031
032 $SGE_ROOT = /scratch2/myusername/sge62
033
034 If this directory is not correct (e.g. it may contain an automounter
035 prefix) enter the correct path to this directory or hit <RETURN>
036 to use default [/scratch2/myusername/sge62] >>
037
038 Your $SGE_ROOT directory: /scratch2/myusername/sge62
039
040 Hit <RETURN> to continue >>
041
Step 8
042 Grid Engine cells
043 -----------------
044
045 Please enter cell name which you used for the qmaster
046 installation or press <RETURN> to use [default] >>
047
048 Using cell: >default<
049
050 Hit <RETURN> to continue >>
051
052 ... set owner of /var/sgeCA/port10500 to bofur+myusername
053
054 ... copy /var/sgeCA/port10500/default/userkeys/root to
055 /var/sgeCA/port10500/default/userkeys/bofur+Administrator
056 cp: /var/sgeCA/port10500/default/userkeys/root: No such file or directory
057
058 ... copy /var/sgeCA/port10500/default/userkeys/root to
059 /var/sgeCA/port10500/default/userkeys/Administrator
060 cp: /var/sgeCA/port10500/default/userkeys/root: No such file or directory
061
062 ... copy /var/sgeCA/port10500/default/userkeys/myusername to
063 /var/sgeCA/port10500/default/userkeys/bofur+myusername
064
065 ... set owner of /var/sgeCA/port10500/default/userkeys/Administrator to Administrator
066
067 ... set owner of /var/sgeCA/port10500/default/userkeys/bofur+Administrator to bofur+Administrator
068
069 ... set owner of /var/sgeCA/port10500/default/userkeys/myusername to myusername
070
071 ... set owner of /var/sgeCA/port10500/default/userkeys/bofur+myusername to bofur+myusername
072
073 ... remove old /var/sgeCA/port10500/default/userkeys/root certificates
074
075 WINDOWS certificates are copied and permissions are set!
076
Step 9
077 Grid Engine TCP/IP communication service
078 ----------------------------------------
079
080 The port for sge_execd is currently set BOTH as service and by the
081 shell environment
082
083 SGE_EXECD_PORT = 10501
084 sge_execd service set to port 725
085
Step 10
Step 11
086 Checking hostname resolving
087 ---------------------------
088
089 This hostname is known at qmaster as an administrative host.
090
091 Hit <RETURN> to continue >>
092
Step 12
093 Local execd spool directory configuration
094 -----------------------------------------
095
096 During the qmaster installation you've already entered a global
097 execd spool directory. This is used, if no local spool directory is configured.
098
099 Now you can configure a local spool directory for this host.
100 ATTENTION: The local spool directory doesn't have to be located on a local
101 drive. It is specific to the <local> host and can be located on network drives,
102 too. But for performance reasons, spooling to a local drive is recommended.
103
104 FOR WINDOWS USER: On Windows systems the local spool directory MUST be set
105 to a local harddisk directory.
106 Installing an execd without local spool directory makes the host unuseable.
107 Local spooling on local harddisk is mandatory for Windows systems.
108
109 Do you want to configure a local spool directory
110 for this host (y/n) [n] >> y
111
112 Please enter the local spool directory now! >> /tmp/dom/execs
113 Using local execd spool directory [/tmp/dom/execs]
114 Hit <RETURN> to continue >>
115
116 Creating local configuration
117 ----------------------------
118 myusername@domain.com modified "domain.com" in configuration list
119 Local configuration for host >domain.com< created.
120
121 Hit <RETURN> to continue >>
122
Step 13
123 execd startup script
124 --------------------
125
126 We can install the startup script that will
127 start execd at machine boot (y/n) [y] >> n
128
129
130 Hit <RETURN> to continue >>
131
Step 14
132 Windows Helper Service Installation
133 ---------------------------------------
134
135 If you're going to run Windows job's using GUI support, you have
136 to install the Windows Helper Service
137 Do you want to install the Windows Helper Service? (y/n) [n] >> y
138
139 Testing, if a service is already installed!
140
141 ... a service is already installed!
142 ... stopping service!
143 ... uninstalling old service!
144 Service successfully uninstalled.
145
146
147 ... moving new service binary!
148 ... installing new service!
149 Service successfully installed.
150
151
152 ... starting new service!
153
154 Hit <RETURN> to continue >>
155
156 Grid Engine execution daemon startup
157 ------------------------------------
158
159 Starting execution daemon. Please wait ...
160 starting sge_execd
161
162 Hit <RETURN> to continue >>
163
Step 15
164 Adding a queue for this host
165 ----------------------------
166
167 We can now add a queue instance for this host:
168
169 - it is added to the >allhosts< hostgroup
170 - the queue provides 1 slot(s) for jobs in all queues
171 referencing the >allhosts< hostgroup
172
173 You do not need to add this host now, but before running jobs on this host
174 it must be added to at least one queue.
175
176 Do you want to add a default queue instance for this host (y/n) [y] >>
177
178 No modification because "bofur" already exists in "hostlist" of "hostgroup"
179 root@domain.com modified "@allhosts" in host group list
180 root@domain.com modified "all.q" in cluster queue list
181
182 Hit <RETURN> to continue >>
183
Step 16
184 Using Grid Engine
185 -----------------
186
187 You should now enter the command:
188
189 source /scratch2/myusername/sge62/default/common/settings.csh
190
191 if you are a csh/tcsh user or
192
193 # . /scratch2/myusername/sge62/default/common/settings.sh
194
195 if you are a sh/ksh user.
196
197 This will set or expand the following environment variables:
198
199 - $SGE_ROOT (always necessary)
200 - $SGE_CELL (if you are using a cell other than >default<)
201 - $SGE_CLUSTER_NAME (always necessary)
202 - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
203 - $SGE_EXECD_PORT (if you haven't added the service >sge_execd<)
204 - $PATH/$path (to find the Grid Engine binaries)
205 - $MANPATH (to access the manual pages)
206
207 Hit <RETURN> to see where Grid Engine logs messages >>
208
209 Grid Engine messages
210 --------------------
211
212 Grid Engine messages can be found at:
213
214 /tmp/qmaster_messages (during qmaster startup)
215 /tmp/execd_messages (during execution daemon startup)
216
217 After startup the daemons log their messages in their spool directories.
218
219 Qmaster: /scratch2/myusername/sge62/default/spool/qmaster/messages
220 Exec daemon: <execd_spool_dir>/<hostname>/messages
221
222
223 Grid Engine startup scripts
224 ---------------------------
225
226 Grid Engine startup scripts can be found at:
227
228 /scratch2/myusername/sge62/default/common/sgemaster (qmaster)
229 /scratch2/myusername/sge62/default/common/sgeexecd (execd)
230
231 Do you want to see previous screen about using Grid Engine again (y/n) [n] >>
232
233 Your execution daemon installation is now completed.
234
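When all hosts are in a single DNS domain, the qmaster installation tells Grid Engine to ignore the domain name when comparing hostnames. As a result, hostA and hostA.foo.com are treated as the same host (qmaster example, lines 268-278). This is also why the execution host above is recognized at the qmaster as an administrative host. The following minimal POSIX shell sketch shows that comparison. The name same_host is hypothetical, and Grid Engine's own resolver, including any host aliasing, is not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper: succeed when two hostnames are equal after the DNS
# domain part (everything from the first dot on) is dropped, mirroring the
# "ignore domain name" choice made during the qmaster installation.
# Note that this also treats hostA.foo.com and hostA.bar.com as equal.
same_host() {
    [ -n "${1%%.*}" ] && [ "${1%%.*}" = "${2%%.*}" ]
}

same_host hostA hostA.foo.com && echo "hostA and hostA.foo.com: same host"
same_host hostA hostB.foo.com || echo "hostA and hostB.foo.com: different hosts"
```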
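In the execution host example, the sge_execd port is set both by the SGE_EXECD_PORT environment variable and by an sge_execd service entry, and the two values differ (10501 and 725, lines 080-084). Before installing, you can compare both sources. The following POSIX shell sketch reads a file in /etc/services format. The function name service_port is hypothetical. The optional file argument lets you test the function against a copy instead of the live services database. On hosts that use NIS or NIS+, you would query the map instead.

```shell
#!/bin/sh
# Hypothetical helper: print the TCP port registered for service $1 in a
# services-format file $2 (default /etc/services). Entries look like
#   sge_execd   725/tcp   # optional comment
service_port() {
    awk -v svc="$1" '
        /^[ \t]*#/ { next }                 # skip comment lines
        $1 == svc && $2 ~ /\/tcp$/ {        # match service name and tcp
            split($2, f, "/"); print f[1]; exit
        }' "${2:-/etc/services}"
}

# Warn when the services entry and the environment variable disagree.
svc_port=$(service_port sge_execd 2>/dev/null)
if [ -n "$svc_port" ] && [ -n "${SGE_EXECD_PORT:-}" ] &&
   [ "$svc_port" != "$SGE_EXECD_PORT" ]; then
    echo "warning: sge_execd service port $svc_port differs from SGE_EXECD_PORT=$SGE_EXECD_PORT"
fi
```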
How to Install the Berkeley DB Spooling Server
This procedure installs the Grid Engine software required for Berkeley DB spooling on the spooling server host. Complete it before the master host installation asks for the spooling server name and database directory.
Example Berkeley DB Spooling Server Installation
The following example shows a complete Berkeley DB Spooling Server installation. Remember that this is only one step in the entire Sun Grid Engine installation process. The steps in this example are referred to from the spooling server installation description at How to Install the Berkeley DB Spooling Server.
Steps 1-4
# cd $SGE_ROOT
# sge-root/inst_sge -db

Choosing Grid Engine admin user account
---------------------------------------
You may install Grid Engine that all files are created with the user id of an unprivileged user.
This will make it possible to install and run Grid Engine in directories where user >root< has no permissions to create and write files and directories.
   - Grid Engine still has to be started by user >root<
   - this directory should be owned by the Grid Engine administrator
Do you want to install Grid Engine under an user id other than >root< (y/n) [y] >> y

Choosing a Grid Engine admin user name
--------------------------------------
Please enter a valid user name >> sgeadmin
Installing Grid Engine as admin user >sgeadmin<
Hit <RETURN> to continue >>

Checking $SGE_ROOT directory
----------------------------
The Grid Engine root directory is:
   $SGE_ROOT = /opt/n1ge6
If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit <RETURN> to use default [/opt/n1ge6] >>
Your $SGE_ROOT directory: /opt/n1ge6
Hit <RETURN> to continue >>

Grid Engine cells
-----------------
Grid Engine supports multiple cells.
If you are not planning to run multiple Grid Engine clusters or if you don't know yet what is a Grid Engine cell it is safe to keep the default cell name
   default
If you want to install multiple cells you can enter a cell name now.
The environment variable
   $SGE_CELL=<your_cell_name>
will be set for all further Grid Engine commands.
Enter cell name [default] >>

Setup spooling
--------------
Your SGE binaries are compiled to link the spooling libraries during runtime (dynamically). So you can choose between Berkeley DB spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>

Berkeley Database spooling parameters
-------------------------------------
You are going to install an RPC Client/Server mechanism!
In this case, qmaster will contact an RPC server running on a separate server machine.
If you want to use the SGE shadowd, you have to use the RPC Client/Server mechanism.
Enter database server name or hit <RETURN> to use default [host2] >>
Enter the database directory or hit <RETURN> to use default [/opt/n1ge6/default//spooldb] >>
creating directory: /opt/n1ge6/default//spooldb
Now we have to startup the rc script >/opt/n1ge6/default/common/sgebdb< on the RPC server machine
If you already have a configured Berkeley DB Spooling Server, you have to restart the Database with the rc script now and continue with >NO<
Shall the installation script try to start the RPC server? (y/n) [y] >> y
Starting rpc server on host host2!
The Berkeley DB has been started with these parameters:
   Spooling Server Name: host2
   DB Spooling Directory: /opt/n1ge6/default//spooldb
Please remember these values, during Qmaster installation you will be asked for them!
Hit <RETURN> to continue!

Berkeley DB startup script
--------------------------
We can install the startup script that Grid Engine is started at machine boot (y/n) [y] >> y
Registering Administration Hosts
The master host is implicitly allowed to run administrative tasks and to submit, monitor, and delete jobs, so it needs no additional installation or configuration to perform administration functions. By contrast, hosts that serve only as administration hosts must be registered.
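To register an administration host from the command line, type the following on the master host, where hostname is the name of the host to register:
# qconf -ah hostname
To verify the list of administration hosts, type:
# qconf -sh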
Registering Submit Hosts
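To register a submit host from the command line, type the following on the master host, where hostname is the name of the host to register:
# qconf -as hostname
To verify the list of submit hosts, type:
# qconf -ss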
Refer to About Hosts and Daemons for more details and for other ways to configure the different host types.
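The installer can read host names from a file. After installation, you can register a whole list of hosts with a small loop that emits one qconf -ah (administration host) and one qconf -as (submit host) command per host. The following POSIX shell sketch only prints the commands, as a dry run, so that you can review them before you run them. The function name print_host_registrations and the one-host-per-line file format with '#' comments are assumptions.

```shell
#!/bin/sh
# Hypothetical dry-run helper: read host names (one per line; '#' comments and
# blank lines are ignored) from file $1 and print the qconf commands that
# would register each host as an administration host and a submit host.
print_host_registrations() {
    while IFS= read -r line || [ -n "$line" ]; do
        host=${line%%#*}                            # drop a trailing comment
        host=$(printf '%s' "$host" | tr -d ' \t')   # drop spaces and tabs
        [ -z "$host" ] && continue
        echo "qconf -ah $host"
        echo "qconf -as $host"
    done < "$1"
}

# Review the output first, then pipe it to sh on the master host:
#   print_host_registrations hosts.txt
#   print_host_registrations hosts.txt | sh
```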
Installing the Increased Security Features
Use the instructions in this section to set up your system more securely with Certificate Security Protocol (CSP)-based encryption. This section covers why to install the increased security features, the additional setup they require, how to install a CSP-secured system, how to generate certificates and private keys for users, how to renew certificates, and how to check certificates.
Why Install the Increased Security Features?
Instead of transferring messages in clear text, the secure system encrypts messages with a secret key. The secret key is exchanged using a public/private key protocol. Users present their certificates through the Grid Engine system to prove their identity, and they receive the system's certificate to ensure that they are communicating with the correct system. After this initial announcement phase, communication continues transparently in encrypted form. A session is valid only for a limited period, after which it must be re-announced.
Additional Setup Required
The steps required to set up the CSP-enhanced version of the Grid Engine system are similar to the standard setup. In general, follow the instructions in Planning the Installation, Loading the Distribution Files on a Workstation, How to Install the Master Host, How to Install Execution Hosts, and Registering Administration Hosts. However, the additional tasks described in the following sections are also required.
How to Install a CSP-Secured System
Install the Grid Engine software as outlined in Performing an Installation, with one exception: add the -csp flag when you run the installation scripts (for example, ./install_qmaster -csp on the master host and ./install_execd -csp on each execution host). To install a CSP-secured system, do the following:
How to Generate Certificates and Private Keys for Users
To use the CSP-secured system, each user must have access to a user-specific certificate and private key. The most convenient way to provide them is to create a text file that identifies the users.
How to Renew Certificates
Checking Certificates
The following sections provide example commands for inspecting certificates. In each command, arch is your system architecture, for example sol-sparc64, and ~/.sge/port536/default is the directory for qmaster port 536 and the default cell. Depending on what you want to do, type one or more of the following commands.
Displaying a Certificate
% $SGE_ROOT/utilbin/arch/openssl x509 -in ~/.sge/port536/default/certs/cert.pem -text
Check Issuer
% $SGE_ROOT/utilbin/arch/openssl x509 -issuer -in ~/.sge/port536/default/certs/cert.pem -noout
Check Subject
% $SGE_ROOT/utilbin/arch/openssl x509 -subject -in ~/.sge/port536/default/certs/cert.pem -noout
Show Email of Certificate
% $SGE_ROOT/utilbin/arch/openssl x509 -email -in ~/.sge/port536/default/certs/cert.pem -noout
Show Validity
% $SGE_ROOT/utilbin/arch/openssl x509 -dates -in ~/.sge/port536/default/certs/cert.pem -noout
Show Fingerprint
% $SGE_ROOT/utilbin/arch/openssl x509 -fingerprint -in ~/.sge/port536/default/certs/cert.pem -noout
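The commands above read the user certificate from a path under ~/.sge that combines the qmaster port and the cell name, for example ~/.sge/port536/default/certs/cert.pem. The following sketch builds that path from a port and a cell so that you can run the same checks against other clusters. The function name sge_user_cert_path is hypothetical. The directory layout is inferred from the examples in this section, so verify it on your system.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the current user's CSP certificate
# for qmaster port $1 and cell $2 (default: "default"). ASSUMPTION: the
# ~/.sge/port<port>/<cell>/certs/cert.pem layout shown in the examples above.
sge_user_cert_path() {
    echo "$HOME/.sge/port$1/${2:-default}/certs/cert.pem"
}

# Example (not run here): show the validity dates for port 536.
# "arch" is a placeholder for your architecture directory, e.g. sol-sparc64.
#   $SGE_ROOT/utilbin/arch/openssl x509 -dates -noout -in "$(sge_user_cert_path 536)"
```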
Upgrading From a Previous Version of Sun Grid Engine Software
About Upgrading the Software
The upgrade procedure uses the cluster configuration information from the older version of the software to install the Grid Engine 6.2 software on the master host. Beginning with the Sun Grid Engine 6.2 release, you can install 6.2 into a different $SGE_ROOT or $SGE_CELL and transfer the old configuration to the new cluster. This method is called cloned cluster configuration. You might want to use this method to accomplish the following:
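Before You Upgrade
Choose one of the following methods to upgrade to 6.2: install the 6.2 software using the cloned cluster configuration method, or upgrade the original cluster to 6.2 software (real upgrade).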
Constraints
The following constraints apply to both upgrade methods:
Additional Constraints for the New 6.2 Installation with Cloned Configuration
For the cloned cluster configuration, you must also define several new variables and directories that must be different from the original settings:
Back Up the Configuration of the Old Cluster
You can create this backup at any time before you start the upgrade procedure. The backup is the same for both upgrade methods. To create the backup, at least the qmaster daemon must be running.
What the Backup Contains
The backup saves the following files:
The backup process creates the following files:
The standard qconf client is used to save the complete cluster configuration.
How to Back Up the Cluster
How to Install the 6.2 Software Using the Cloned Cluster Configuration Method
Upgrade is complete.
How to Upgrade the Original Cluster to 6.2 Software (Real Upgrade)
Upgrade is complete.
Example Upgrade for Cloned Cluster Configuration
The following upgrade example uses a copy of the existing cluster configuration with a different $SGE_CELL. This example does not use JMX, and there are no Service Tags. The steps in this example are referred to from the software upgrade description at How to Install the 6.2 Software Using the Cloned Cluster Configuration Method.
Steps 4 and 5
# ./inst_sge -upd

Welcome to the Grid Engine Upgrade Procedure
--------------------------------------------
Before you continue with the upgrade, read these hints:
   - Your terminal window should have a size of at least 80x24 characters
   - At any time during the upgrade process, use your standard interrupt key to abort the upgrade. Typically, the interrupt key combination is Ctrl-C.
The upgrade procedure will take approximately 1-2 minutes.
Hit <RETURN> to continue >>
Step 6
Type the complete path to the Grid Engine configuration backup directory.
-------------------------------------------------------------------------
Backup directory >> /tmp/bck
Found backup from GE 6.1u4 version created on 2008-06-10_10:56:29
Continue with this backup directory (y/n) [y] >>
Step 7
The Grid Engine root directory is:
$SGE_ROOT = /sge
If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit <RETURN> to use default [/sge] >>
Your $SGE_ROOT directory: /sge
Hit <RETURN> to continue >>
Step 8
Grid Engine cells
-----------------
Grid Engine supports multiple cells.
If you are not planning to run multiple Grid Engine clusters or if you don't know yet what is a Grid Engine cell it is safe to keep the default cell name
default
If you want to install multiple cells you can enter a cell name now.
The environment variable
$SGE_CELL=<your_cell_name>
will be set for all further Grid Engine commands.
Enter cell name [default] >> new_cell
Using cell >new_cell<.
Hit <RETURN> to continue >>
Step 9
Grid Engine TCP/IP communication service
----------------------------------------
The port for sge_qmaster is currently set by the shell environment.
SGE_QMASTER_PORT = 21640
Now you have the possibility to set/change the communication ports by using the >shell environment< or you may configure it via a network service, configured in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form
sge_qmaster <port_number>/tcp
to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?
Using the >shell environment<: [1]
Using a network service like >/etc/service<, >NIS/NIS+<: [2]
(default: 1) >>
Grid Engine TCP/IP communication service
----------------------------------------
Using the environment variable
$SGE_QMASTER_PORT=21640
as port for communication.
Do you want to change the port number? (y/n) [n] >>
Step 10
Grid Engine TCP/IP communication service
----------------------------------------
The port for sge_execd is currently set by the shell environment.
SGE_EXECD_PORT = 21641
Now you have the possibility to set/change the communication ports by using the >shell environment< or you may configure it via a network service, configured in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form
sge_execd <port_number>/tcp
to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?
Using the >shell environment<: [1]
Using a network service like >/etc/service<, >NIS/NIS+<: [2]
(default: 1) >>
Grid Engine TCP/IP communication service
----------------------------------------
Using the environment variable
$SGE_EXECD_PORT=21641
as port for communication.
Do you want to change the port number? (y/n) [n] >>
Step 11
Grid Engine qmaster spool directory
-----------------------------------
The qmaster spool directory is the place where the qmaster daemon stores the configuration and the state of the queuing system.
The admin user >sgeadmin< must have read/write access to the qmaster spool directory.
If you will install shadow master hosts or if you want to be able to start the qmaster daemon on other hosts (see the corresponding section in the Grid Engine Installation and Administration Manual for details) the account on the shadow master hosts also needs read/write access to this directory.
The following directory
[/sge/new_cell/spool/qmaster]
will be used as qmaster spool directory by default!
Do you want to select another qmaster spool directory (y/n) [n] >>
Step 12
Unique cluster name
-------------------
The cluster name uniquely identifies a specific Sun Grid Engine cluster.
The cluster name must be unique throughout your organization. The name is not related to the SGE cell.
The cluster name must start with a letter ([A-Za-z]), followed by letters, digits ([0-9]), dashes (-) or underscores (_).
Enter new cluster name or hit <RETURN> to use default [p21640] >>
Your $SGE_CLUSTER_NAME: p21640
Hit <RETURN> to continue >>
Step 14
creating directory: /sge/new_cell/spool/qmaster/job_scripts
Setup spooling
--------------
Your SGE binaries are compiled to link the spooling libraries
during runtime (dynamically). So you can choose between Berkeley DB
spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> classic
Initializing spooling database
Hit <RETURN> to continue >>
Step 15
Interactive Job Support (IJS) Selection
---------------------------------------
The backup configuration includes information for running interactive jobs. Do you want to use the IJS information from the backup ('y') or use new default values ('n') (y/n) [y] >> n
Using new interactive job support default setting for a new installation.
Hit <RETURN> to continue >>
Creating >act_qmaster< file
Step 16
Grid Engine group id range
--------------------------
When jobs are started under the control of Grid Engine an additional group id is set on platforms which do not support jobs. This is done to provide maximum control for Grid Engine jobs.
This additional UNIX group id range must be unused group id's in your system. Each job will be assigned a unique id during the time it is running. Therefore you need to provide a range of id's which will be assigned dynamically for jobs.
The range must be big enough to provide enough numbers for the maximum number of Grid Engine jobs running at a single moment on a single host. E.g. a range like >20000-20100< means, that Grid Engine will use the group ids from 20000-20100 and provides a range for 100 Grid Engine jobs at the same time on a single host.
You can change at any time the group id range in your cluster configuration.
Please enter a range [34299-34498] >>
Using >34299-34498< as gid range.
Hit <RETURN> to continue >>
Grid Engine cluster configuration
---------------------------------
Please give the basic configuration parameters of your Grid Engine installation:
<execd_spool_dir>
The pathname of the spool directory of the execution hosts. User >sgeadmin< must have the right to create this directory and to write into it.
Default: [/sge/new_cell/spool] >>
Grid Engine cluster configuration (continued)
---------------------------------------------
<administrator_mail>
The email address of the administrator to whom problem reports are sent.
It is recommended to configure this parameter.
You may use >none< if you do not wish to receive administrator mail.
Please enter an email address in the form >user@foo.com<.
Default: [sgeadmin@qmaster.com] >>
The following parameters for the cluster configuration were configured:
execd_spool_dir /sge/new_cell/spool
administrator_mail sgeadmin@qmaster.com
Do you want to change the configuration parameters (y/n) [n] >>
Step 17
Provide a value to use for the next job ID.
-------------------------------------------
Backup contains last job ID 1. As a suggested value, we added 1000 to that number and rounded it up to the nearest 1000. Increase the value, if appropriate.
Choose the new next job ID [2000] >>
Hit <RETURN> to continue >>
Step 18
Provide a value to use for the next AR ID.
------------------------------------------
Backup contains last AR ID 1. As a suggested value, we added 1000 to that number and rounded it to the nearest 1000. Increase the value, if appropriate.
Choose the new next AR ID [2000] >>
Hit <RETURN> to continue >>
Step 19
Creating >sgemaster< script
Creating >sgeexecd< script
Creating settings files for >.profile/.cshrc<
Hit <RETURN> to continue >>
qmaster startup script
----------------------
Do you want to start qmaster automatically at machine boot?
NOTE: If you select "n" SMF will be not used at all!
(y/n) [y] >> n
Grid Engine qmaster startup
---------------------------
Starting qmaster daemon. Please wait ...
starting sge_qmaster
Hit <RETURN> to continue >>
Step 20
Last step - load configuration from the backup
----------------------------------------------
load command: /sge/util/upgrade_modules/load_sge_config.sh /tmp/bck -mode "copy" -log C -newijs "false" -gid_range "34299-34498" -admin_mail "sgeadmin@qmaster.com" -execd_spool_dir "/sge/new_cell/spool"
Hit <RETURN> to continue >>
Loading saved cluster configuration from /tmp/bck (log in /tmp/sge_backup_load_2008-06-13_17:42:28.log)...
Loading saved cluster configuration from /tmp/bck (log in /tmp/sge_backup_load_2008-06-13_17:42:28.log)... Done
If loading the configuration succeeded run these additional commands:
REQUIRED:
inst_sge -upd-execd
This command initializes all execd spool directories.
inst_sge -upd-win
This command connects to all Windows execution hosts and installs the new Windows helper service on each host.
WARNING: If a helper service from a previous release is running on this host, the new helper service overwrites it. The host will run only in a 6.2 cluster.
TIP: This action requires to enter a windows administrator user for each host interactively. If all your systems share the same administrator you can set the environment variable SGE_WIN_ADMIN to that user name.
E.g.: (sh, bash) export SGE_WIN_ADMIN=Administrator
(csh,tcsh) setenv SGE_WIN_ADMIN Administrator
OPTIONAL:
inst_sge -upd-rc
This command creates new autostart scripts for the new cluster and removes any conflicting files.
TIP: To disable SMF on Solaris systems, use the command inst_sge -upd-rc -nosmf
TIP: Use inst_sge -post-upd to do all above actions
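The defaults offered in Steps 12, 17 and 18 follow rules that the transcript spells out. The POSIX sh sketch below reproduces them so that you can check a value before typing it at the prompt. The function names suggest_next_id and is_valid_cluster_name are hypothetical helpers, not part of the Grid Engine distribution.

```sh
# suggest_next_id LAST_ID
# Mirrors the rule quoted in Steps 17 and 18 (hypothetical helper):
# add 1000 to the last ID found in the backup, then round the result
# up to the next multiple of 1000.
suggest_next_id() {
    last=$1
    echo $(( (last + 1000 + 999) / 1000 * 1000 ))
}

# is_valid_cluster_name NAME
# Mirrors the rule quoted in Step 12 (hypothetical helper): a letter
# first, then only letters, digits, dashes or underscores.
is_valid_cluster_name() {
    case $1 in
        [A-Za-z]*) ;;                  # must start with a letter
        *) return 1 ;;
    esac
    case $1 in
        *[!A-Za-z0-9_-]*) return 1 ;;  # any other character is rejected
    esac
    return 0
}
```

For example, suggest_next_id 1 prints 2000, matching the default shown in Step 17, and a backup whose last job ID is 1500 would lead to a suggestion of 3000.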
How to Upgrade the Software From 5.3 to 6.0 Update 2
Before You Begin
Be sure to review Planning the Installation for the information that you will need during the upgrade process. If you have decided to use an administrative user, as described in User Names, you should create that user now. This procedure assumes that you have already extracted the Grid Engine software, as described in Loading the Distribution Files on a Workstation.
Steps
Verifying Sun Grid Engine Installation
Verifying the Installation
The verification phase includes the following tasks:
To ensure that the Grid Engine system daemons are running, look for the sge_qmaster daemon on the master host and the sge_execd daemon on the execution hosts. Once you have verified that the daemons are running, you can try to use commands and prepare to submit jobs.
How to Verify That the Daemon Is Running on the Master Host
How to Verify That the Daemons Are Running on the Execution Hosts
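One way to look for sge_qmaster or sge_execd is to scan the process list. The POSIX sh sketch below assumes a ps that accepts -e -o comm=, and daemon_in_list is a hypothetical helper name. It strips leading directories from each entry because some ps implementations print the full path of the executable.

```sh
# daemon_in_list NAME
# Reads one command name per line on standard input (for example the
# output of "ps -e -o comm=") and succeeds if NAME is among them.
# Leading directories are stripped before the comparison.
daemon_in_list() {
    awk -v want="$1" '
        { name = $1; sub(/.*\//, "", name); if (name == want) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage:
#   on the master host:    ps -e -o comm= | daemon_in_list sge_qmaster
#   on an execution host:  ps -e -o comm= | daemon_in_list sge_execd
```

Reading the process list from standard input keeps the matching logic separate from the ps invocation, so you can adjust the ps options for your platform without touching the check.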
How to Run Simple Commands
If the necessary daemons are running on both the master and execution hosts, the Grid Engine software should be operational. Check by issuing a trial command.
How to Submit Test Jobs
Before you start submitting batch scripts to the Grid Engine system, check to see whether your site's standard shell resource files (.cshrc, .profile, or .kshrc) as well as your personal resource files contain commands such as stty. Batch jobs do not have a terminal connection by default, and therefore calls to stty result in an error.
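A common way to keep stty calls in a Bourne-shell resource file such as .profile from breaking batch jobs is to run them only when standard input is a terminal. This is a minimal sketch. safe_stty is a hypothetical name, and a .cshrc would need an equivalent written in csh syntax.

```sh
# safe_stty ARGS...
# Runs stty only when standard input is a terminal. Batch jobs have no
# terminal connection, so the call is skipped instead of failing.
safe_stty() {
    if [ -t 0 ]; then
        stty "$@"
    fi
}

# In .profile, replace a bare "stty erase '^H'" with:
#   safe_stty erase '^H'
```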
In case of problems, see Improving Grid Engine Performance.
Automating the Installation Process
This section describes how you can automate the software installation process for the following reasons:
This section consists of the following topics:
About Automatic Installation
You can use the $SGE_ROOT/inst_sge utility to install and uninstall Sun Grid Engine master hosts, execution hosts, shadow hosts and Berkeley DB spooling server hosts. You can also use this utility to back up the Sun Grid Engine configuration and accounting data automatically.
You can use the inst_sge utility in interactive mode to supplant any of the commands that were described in Installing the Grid Engine Software Interactively. To simplify automatic installation and backup processes, use the configuration templates that are located in the $SGE_ROOT/util/install_modules directory. The automatic installation requires no user interaction. No messages are displayed on the terminal during the installation. When the installation finishes, a message indicates where the installation log file resides. The installation log file name has the format install_hostname_timestamp.log. Normally, you can find information about errors during installation in this file. In case of serious errors, though, the installation script might not be able to move the log file into the spool directory. In this situation, the log file is placed in the /tmp directory.
Special Considerations
The first step in performing an automatic installation is to set up a configuration file. You can find configuration file templates in the $SGE_ROOT/util/install_modules directory. Consider the following as you plan your automatic installation:
To perform this step manually before you start the automatic installation, use the following command:
./inst_sge -db
You can also use the following command to install the Berkeley DB Spooling Server automatically:
% ./inst_sge -db -m -x -auto <full-path-to-configuration-file>
This command checks the SPOOLING_SERVER entry within the configuration file and starts the Berkeley DB installation on the server host.
Using the inst_sge Utility and a Configuration Template
To automate system installation, use the inst_sge utility in combination with a configuration file. See Configuration File Templates.
How to Automate the Master Host Installation
Before You Begin
You need to complete the planning process as outlined in Planning the Installation. In addition, you need to be able to connect to each of the remote hosts using the rsh or ssh commands, without supplying a password. If this type of access is not allowed on your network, you cannot use this method of installation.
Steps
The -m option starts the master host installation and installs the master daemon on the local machine. In addition, the -auto option sets up any remote hosts, as specified in the configuration file.
To prevent data loss or damage to an already installed cluster, the automatic installation terminates if the configured $SGE_CELL directory or the configured Berkeley DB spooling directory already exists. If the installation terminates, the script displays the reason for the termination on the screen. A log file of the master installation is created in the $SGE_ROOT/default/spool/qmaster directory. The file name is created using the format install_hostname_date_time.log.
./inst_sge -m -x -auto <full-path-to-configuration-file>
a. Wait for notification that the installation has completed.
b. When the automatic installation exits successfully, it displays a message similar to the following:
The Install log can be found in the
/opt/sge62/spool/install_myhost_30mar2007_090152.log file.
The installation log file includes any script or error messages that were generated during installation. If the qmaster_spooling_dir directory exists, the log files will be in that directory. If the directory does not exist, the log files will be in the /tmp directory.
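The safety check and the log location rule described above can be mirrored in a small pre-flight script, so that a conflicting cell or spooling directory is reported before inst_sge starts. This POSIX sh sketch rests on assumptions: precheck_auto_install and latest_install_log are hypothetical helpers, and their arguments would come from the CELL_NAME, DB_SPOOLING_DIR and QMASTER_SPOOL_DIR entries of your configuration file.

```sh
# precheck_auto_install SGE_ROOT CELL_NAME [DB_SPOOLING_DIR]
# Fails (status 1) if the cell directory or the Berkeley DB spooling
# directory already exists, the two conditions that stop inst_sge -auto.
precheck_auto_install() {
    root=$1 cell=$2 dbdir=$3
    if [ -d "$root/$cell" ]; then
        echo "cell directory $root/$cell already exists" >&2
        return 1
    fi
    if [ -n "$dbdir" ] && [ -d "$dbdir" ]; then
        echo "spooling directory $dbdir already exists" >&2
        return 1
    fi
    return 0
}

# latest_install_log SPOOL_DIR [FALLBACK_DIR]
# Prints the newest install_*.log file. Looks in SPOOL_DIR if it exists,
# otherwise in FALLBACK_DIR (default: /tmp). Fails if no log is found.
latest_install_log() {
    dir=$1
    [ -d "$dir" ] || dir=${2:-/tmp}
    newest=
    for f in "$dir"/install_*.log; do
        [ -e "$f" ] || continue   # the pattern matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] && echo "$newest"
}
```

After sourcing your configuration file, you might run precheck_auto_install "$SGE_ROOT" "$CELL_NAME" "$DB_SPOOLING_DIR" before ./inst_sge -m -x -auto, and latest_install_log "$QMASTER_SPOOL_DIR" afterward. The fallback covers only a missing spool directory, not the case where a serious error leaves the log in /tmp even though the spool directory exists.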
Automating Other Installations Through a Configuration File
In addition to installing the master host, you can perform a variety of other automatic installations using a similar process. The actual form of the inst_sge command differs slightly, and different sections of the configuration file apply. This section provides some examples.
See Configuration File Templates.
Automatic Installation With Increased Security (CSP)
The automatic installation also supports the Certificate Security Protocol (CSP) mode described in Installing the Increased Security Features. To use the CSP security mode, you must fill out the CSP parameters of the template files. The parameters are as follows:
# This section is used for csp installation mode.
# CSP_RECREATE recreates the certs on each installation, if true.
# In case of false, the certs will be created, if not existing.
# Existing certs won't be overwritten. (mandatory for csp install)
CSP_RECREATE="true"
# The created certs won't be copied, if this option is set to false
# If true, the script tries to copy the generated certs. This
# requires passwordless ssh/rsh access for user root to the
# execution hosts
CSP_COPY_CERTS="false"
# csp information, your country code (only 2 characters)
# (mandatory for csp install)
CSP_COUNTRY_CODE="DE"
# your state (mandatory for csp install)
CSP_STATE="Germany"
# your location, eg. the building (mandatory for csp install)
CSP_LOCATION="Building"
# your organisation (mandatory for csp install)
CSP_ORGA="Organisation"
# your organisation unit (mandatory for csp install)
CSP_ORGA_UNIT="Organisation_unit"
# your email (mandatory for csp install)
CSP_MAIL_ADDRESS="name@yourdomain.com"
To start the installation, type the following command:
inst_sge -m -csp -auto template-file-name
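Before starting a CSP installation, it can help to confirm that every field marked "mandatory for csp install" is filled in. The POSIX sh sketch below sources the template in a subshell, so its variables do not leak into the calling shell, and checks those fields plus the two-character country code. check_csp_conf is a hypothetical helper. It only checks that values are present, not that they are meaningful.

```sh
# check_csp_conf FILE
# Sources FILE in a subshell and verifies the CSP entries that the
# template marks as mandatory. Prints the first problem to stderr.
check_csp_conf() (
    case $1 in
        */*) f=$1 ;;
        *) f=./$1 ;;                 # keep "." from searching PATH
    esac
    . "$f" || exit 1
    for var in CSP_RECREATE CSP_COUNTRY_CODE CSP_STATE CSP_LOCATION \
               CSP_ORGA CSP_ORGA_UNIT CSP_MAIL_ADDRESS; do
        eval "val=\${$var}"
        if [ -z "$val" ]; then
            echo "missing $var" >&2
            exit 1
        fi
    done
    # The template asks for a country code of exactly two characters.
    if [ "${#CSP_COUNTRY_CODE}" -ne 2 ]; then
        echo "CSP_COUNTRY_CODE must have 2 characters" >&2
        exit 1
    fi
)
```

Running check_csp_conf on your filled-in template before inst_sge -m -csp -auto catches an empty field while you can still correct it.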
Automatic UninstallationYou can also uninstall hosts automatically.
To ensure that you have a clean environment, always source the $SGE_ROOT/$SGE_CELL/common/settings.csh file before proceeding.
Uninstalling Execution Hosts
During the execution host uninstallation, all configuration information for the targeted hosts is deleted. The uninstallation attempts to stop the execution hosts gracefully. First, the queue instances associated with the target host are disabled, so that no new jobs are started. Then the following actions are performed, in sequence, on each running job: checkpoint the job, reschedule the job, and force rescheduling of the job. At this point, the queue instance is empty, and the execution daemon is shut down. Then the configuration and the global or local spool directory are removed.
The configuration file template has a section for identifying hosts that can be uninstalled automatically. Look for this section:
# Remove these execution hosts in automatic mode
EXEC_HOST_LIST_RM="host1 host2 host3 host4"
Every host in the EXEC_HOST_LIST_RM list will be automatically removed from the cluster. To start the automatic uninstallation of execution hosts, type the following command:
% ./inst_sge -ux -auto <full-path-to-configuration-file>
Uninstalling the Master Host
The master host uninstallation removes all of the Sun Grid Engine configuration files. After the uninstallation procedure completes, only the binary files remain. If you think that you will need the configuration information after the uninstallation, perform a backup of the master host. The master host uninstallation supports both interactive and automatic mode. To start the automatic uninstallation of the master host, type the following command:
% ./inst_sge -um -auto <full-path-to-configuration-file>
This command performs the same procedure as in interactive mode, except that the user is not prompted to confirm any steps and all terminal output is suppressed.
Once the uninstall process is started, it cannot be stopped.
Uninstalling the Shadow Host
To start the automatic uninstallation of the shadow host, type the following command:
% ./inst_sge -usm -auto <full-path-to-configuration-file>
Automatic Backup
The automatic backup procedure backs up configuration and accounting data in much the same way as the interactive backup procedure. You can run the automatic backup procedure as a cron job if you want to schedule unattended or periodic backups. The automatic backup requires a configuration file, for which a template is located in the $SGE_ROOT/util/install_modules/backup_template.conf file. Comments within the configuration file template indicate what values to use for your environment.
Starting an Automatic Backup
After you set up the configuration file, type the following command to start the automatic backup:
% ./inst_sge -bup -auto <full-path-to-configuration-file>
To prevent overwriting existing backup files, a date/time combination is added to the end of the backup directory name that is specified in the configuration file.
Example - Backup Configuration File
#---------------------------------------------------
# Autobackup Configuration File Template
#---------------------------------------------------
# Please, enter your SGE_ROOT here (mandatory)
SGE_ROOT="/opt/gridengine"
# Please, enter your SGE_CELL here (mandatory)
SGE_CELL="default"
# Please, enter your Backup Directory here
# After backup you will find your backup files here (mandatory)
# The autobackup will add a time/date combination to this dirname
# to prevent overwriting!
BACKUP_DIR="/opt/backups/ge_backup"
# Please, enter true to get a tar/gz package
# and false to copy the files only (mandatory)
TAR="true"
# Please, enter the backup file name here. (mandatory)
BACKUP_FILE="backup.tar"
Troubleshooting Automatic Installation and Uninstallation
The following errors might be encountered during auto-installation:
If your network does not allow user root to connect to other hosts through rsh or ssh without a password, the automatic installation will not work remotely. In this case, log in to each host and use the following command to start the automatic installation locally:
% ./inst_sge -x -noremote -auto /tmp/install_config_file.conf
Supplemental Information
Configuration File Templates
Configuration file templates are located in the $SGE_ROOT/util/install_modules directory.
Example - Configuration File
#-------------------------------------------------
# SGE default configuration file
#-------------------------------------------------
# Always use fully qualified pathnames, please
# SGE_ROOT Path, this is basic information
#(mandatory for qmaster and execd installation)
SGE_ROOT="/opt/n1ge61"
# SGE_QMASTER_PORT is used by qmaster for communication
# Please enter the port in this way: 1300
# Please do not enter this: 1300/tcp
#(mandatory for qmaster installation)
SGE_QMASTER_PORT="6444"
# SGE_EXECD_PORT is used by execd for communication
# Please enter the port in this way: 1300
# Please do not enter this: 1300/tcp
#(mandatory for qmaster installation)
SGE_EXECD_PORT="6445"
# CELL_NAME, will be a dir in SGE_ROOT, contains the common dir
# Please enter only the name of the cell. No path, please
#(mandatory for qmaster and execd installation)
CELL_NAME="default"
# ADMIN_USER, if you want to use a different admin user than the owner
# of SGE_ROOT, you have to enter the user name here
# Leaving this blank, the owner of the SGE_ROOT dir will be used as admin user
ADMIN_USER=""
# The dir where qmaster spools the parts that are not spooled by DB
#(mandatory for qmaster installation)
QMASTER_SPOOL_DIR="/opt/n1ge61/default/spool/qmaster"
# The dir where the execd spools (active jobs)
# This entry is needed, even if you are going to use
# berkeley db spooling. Only cluster configuration and jobs will
# be spooled in the database. The execution daemon still needs a spool
# directory
#(mandatory for qmaster installation)
EXECD_SPOOL_DIR="/opt/n1ge61/default/spool"
# For monitoring and accounting of jobs, every job will get
# unique GID. So you have to enter a free GID Range, which
# is assigned to each job running on a machine.
# If you want to run 100 Jobs at the same time on one host you
# have to enter a GID-Range like that: 16000-16100
#(mandatory for qmaster installation)
GID_RANGE="20000-20100"
# If SGE is compiled with -spool-dynamic, you have to enter here which
# spooling method should be used. (classic or berkeleydb)
#(mandatory for qmaster installation)
SPOOLING_METHOD="berkeleydb"
# Name of the Server, where the Spooling DB is running on
# if spooling method is berkeleydb, it must be "none", when
# using no spooling server and it must contain the servername
# if a server should be used. In case of "classic" spooling,
# can be left out
DB_SPOOLING_SERVER="none"
# The dir, where the DB spools
# If berkeley db spooling is used, it must contain the path to
# the spooling db. Please enter the full path. (eg. /tmp/data/spooldb)
# Remember, this directory must be local on the qmaster host or on the
# Berkeley DB Server host. No NFS mount, please
DB_SPOOLING_DIR="/opt/n1ge61/default/spooldb"
# A List of Hosts which should become admin hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
ADMIN_HOST_LIST="host1"
# A List of Hosts which should become submit hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
SUBMIT_HOST_LIST="host1"
# A List of Hosts which should become exec hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
# (mandatory for execution host installation)
EXEC_HOST_LIST="host1"
# The dir, where the execd spools (local configuration)
# If you want to configure your execution daemons to spool in
# a local directory, you have to enter this directory here.
# If you do not want to configure a local execution host spool directory
# please leave this empty
EXECD_SPOOL_DIR_LOCAL=""
# If true, the domainnames will be ignored, during the hostname resolving
# if false, the fully qualified domain name will be used for name resolving
HOSTNAME_RESOLVING="true"
# Shell, which should be used for remote installation (rsh/ssh)
# This is only supported, if your hosts and rshd/sshd is configured,
# not to ask for a password, or prompting any message.
SHELL_NAME="rsh"
# Enter your default domain, if you are using /etc/hosts or NIS configuration
DEFAULT_DOMAIN="none"
# If a job stops, fails, or finishes, you can send a mail to this address
ADMIN_MAIL="my.name@sun.com"
# If true, the rc scripts (sgemaster, sgeexecd, sgebdb) will be added,
# to start automatically during boottime
ADD_TO_RC="true"
#If this is "true" the file permissions of executables will be set to 755
#and of ordinary files to 644.
SET_FILE_PERMS="true"
# This option is not implemented, yet.
# When an exechost should be uninstalled, the running jobs will be rescheduled
RESCHEDULE_JOBS="wait"
# Enter one of the three distributed scheduler tuning configuration sets
# (1=normal, 2=high, 3=max)
SCHEDD_CONF="1"
# The name of the shadow host. This host must have read/write permission
# to the qmaster spool directory
# If you want to setup a shadow host, you must enter the servername
# (mandatory for shadowhost installation)
SHADOW_HOST="hostname"
# Remove these execution hosts in automatic mode
# (mandatory for uninstallation of execution hosts)
EXEC_HOST_LIST_RM="host2 host3 host4"
# This is a Windows specific part of the auto installation template
# If you are going to install Windows execution hosts, you have to enable the
# windows support. To do this, please set the WINDOWS_SUPPORT variable
# to "true".
# ("false" is disabled)
# (mandatory for qmaster installation, by default WINDOWS_SUPPORT is
# disabled)
WINDOWS_SUPPORT="false"
# Enabling the WINDOWS_SUPPORT, recommends the following parameter.
# The WIN_ADMIN_NAME will be added to the list of SGE managers.
# Without adding the WIN_ADMIN_NAME the execution host installation
# won't install correctly.
# WIN_ADMIN_NAME is set to "Administrator" which is default on most
# Windows systems. In some cases the WIN_ADMIN_NAME can be prefixed with
# the windows domain name (eg. DOMAIN+Administrator)
# (mandatory for qmaster installation)
WIN_ADMIN_NAME="Administrator"
# This parameter sets the number of parallel installation processes.
# To prevent a system overload, or exceeding the number of open file
# descriptors the user can limit the number of parallel install processes.
# eg. set PAR_EXECD_INST_COUNT="20", maximum 20 parallel execd are installed.
PAR_EXECD_INST_COUNT="20"
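Two comments in the template describe value formats that are easy to get wrong. Ports are given as bare numbers, and GID_RANGE is a start-end pair with enough IDs for the jobs that run at the same time on one host. The POSIX sh sketch below checks both formats. The function names are hypothetical. The range count includes both ends, so 20000-20100 yields 101 IDs.

```sh
# is_plain_port VALUE
# Succeeds for a bare port number such as 6444; rejects "6444/tcp".
is_plain_port() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;     # empty or contains a non-digit
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# gid_range_size RANGE
# Prints the number of group IDs in a START-END range, counting both
# ends, or fails if RANGE is malformed or START is greater than END.
gid_range_size() {
    case $1 in
        [0-9]*-[0-9]*) ;;
        *) return 1 ;;
    esac
    start=${1%%-*}
    end=${1#*-}
    case $start$end in
        *[!0-9]*) return 1 ;;        # e.g. "1-2-3" leaves a dash in END
    esac
    [ "$start" -le "$end" ] || return 1
    echo $(( end - start + 1 ))
}
```

After sourcing a filled-in copy of the template, you could run is_plain_port "$SGE_QMASTER_PORT" and gid_range_size "$GID_RANGE" before starting the automatic installation.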
Installing SMF Services
The Service Management Facility (SMF) is a new feature in Solaris 10. It provides a unified model for controlling services, replaces RC scripts, handles service dependencies, provides better service availability, and speeds up the boot process. If you do not use at least Version 10 of the Solaris OS in your cluster, or you do not plan to use SMF, continue with Installing Sun Grid Engine Software Interactively.
Installing SMF services includes the following topics:
Why Install SMF Services?
SMF provides a unified administrative model of the persistent services. It solves many challenges of the previous approaches. All services have a common place for log files. Persistent services are automatically restarted on failure. For more information, see the SMF documentation.
Additional Setup Required
If you want unprivileged users to use SMF services, you should create a role sge_admin. Assign this role to the users who should be able to manipulate the Grid Engine SMF services as described here. Then, you can simply answer y when prompted to use SMF during the installation.
How Do SMF Services Compare to the Normal Services?
The biggest difference between SMF and normal services is that SMF does not consider kill -9 to be a correct service shutdown. SMF responds to kill -9 by restarting the service. Within the SMF framework, a service is uniquely identified by its Fault Management Resource Identifier (FMRI).
qmaster Daemon
Service name (FMRI) is svc:/application/sge/qmaster:$SGE_CLUSTER_NAME.
1 - Restart the daemon if RC scripts were installed
shadowd Daemon
Service name (FMRI) is svc:/application/sge/shadowd:$SGE_CLUSTER_NAME.
1 - Restart the daemon if RC scripts were installed
execd Daemon
Service name (FMRI) is svc:/application/sge/execd:$SGE_CLUSTER_NAME.
1 - Restart the daemon if RC scripts were installed
Berkeley RPC Server
Service name (FMRI) is svc:/application/sge/bdb:$SGE_CLUSTER_NAME.
1 - Restart the server if RC scripts were installed
dbwriter Software
Service name (FMRI) is svc:/application/sge/dbwriter:$SGE_CLUSTER_NAME.
1 - Restart the dbwriter if RC scripts were installed
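All five service names above follow the pattern svc:/application/sge/<service>:$SGE_CLUSTER_NAME. The POSIX sh sketch below builds an FMRI from that pattern so that it can be passed to the Solaris svcs and svcadm commands. sge_fmri is a hypothetical helper. Because SMF restarts a service that is killed with kill -9, stopping a daemon through svcadm disable is the safer route.

```sh
# sge_fmri SERVICE CLUSTER_NAME
# Builds the SMF service name (FMRI) of a Grid Engine service, for
# example svc:/application/sge/qmaster:p21640.
sge_fmri() {
    case $1 in
        qmaster|shadowd|execd|bdb|dbwriter) ;;
        *) echo "unknown Grid Engine service: $1" >&2; return 1 ;;
    esac
    [ -n "$2" ] || { echo "cluster name is empty" >&2; return 1; }
    echo "svc:/application/sge/$1:$2"
}

# Usage on a Solaris 10 host:
#   svcs -l "$(sge_fmri qmaster "$SGE_CLUSTER_NAME")"
#   svcadm disable "$(sge_fmri qmaster "$SGE_CLUSTER_NAME")"
```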
Installing a JMX-Enabled System
The JMX agent functionality enables access to a subset of sge_qmaster functionality via the JMX protocol. For Sun Grid Engine 6.2, the main purpose of the JMX agent is to provide an interface between the SDM Grid Engine adapter and the Sun Grid Engine system.
Additional Setup Required
The steps required to set up the JMX agent feature of Grid Engine are similar to the standard setup. You generally follow the instructions in Planning the Installation, Loading the Distribution Files on a Workstation, How to Install the Master Host, How to Install Execution Hosts and Registering Administration Hosts.
How to Install a JMX Agent-Enabled System
Install the Grid Engine software as outlined in Performing an Interactive Installation, with one exception: use the additional flag -jmx when invoking the qmaster installation script. To install a JMX agent-enabled system, do the following:
How to Generate Certificates, Private Keys and Keystores for Users
To use the CSP-secured system, the user must have access to a user-specific certificate, private key and keystore. First, perform the steps outlined in How to Generate Certificates and Private Keys for Users. Then use the following procedure to generate the corresponding keystore files for the users.
Checking Certificates, Keys and Keystores
To confirm that these files contain the intended information, use the following commands:
To display a keystore or truststore:
$JAVA_HOME/bin/keytool -list -v -keystore <wherever>/keystore
You must enter the keystore password to see all entries; otherwise, only the certificates are visible. For more information, see the Java keytool documentation.
JMX Configuration Files
The following configuration files are installed into $SGE_ROOT/$SGE_CELL/common/jmx and are explained in detail here. Manual modification is usually not necessary; the preinstalled configurations should be sufficient.
jaas.config
Before using the JMX interface, you must run a special authentication against sge_qmaster. This process adds the correct principal that gives you the necessary role to access the JMX interfaces in read-only or read-write mode. The responsible section in the jaas.config file is named GridwareConfig or TestConfig (for testing only).
/*
 * Default login configuration for qmaster's jmx server
 */
GridwareConfig {
    /**
     * Accepts all clients which have a certificate which is signed with
     * the CA certificate.
     */
    com.sun.grid.security.login.GECATrustManagerLoginModule requisite
        caTop="${com.sun.grid.jgdi.caTop}";

    /*
     * Accepts all clients which have a valid username/password.
     *
     * The username/password validation is done with the authuser binary (included
     * in the Grid Engine distribution, $SGE_ROOT/utilbin/$ARCH/authuser).
     *
     * ATTENTION: The authuser binary needs the suid bit. It does not work if grid
     * engine is installed on a nosuid file system.
     */
    com.sun.grid.security.login.UnixLoginModule requisite
        sge_root="${com.sun.grid.jgdi.sgeRoot}"
        auth_method="system";

    /*
     * Username password authentication against LDAP.
     *
     * Alternative username/password authentication if
     * com.sun.grid.security.login.UnixLoginModule is not working.
     *
     * The LDAP specific parameters have to be adjusted to the local requirements.
     * For details please have a look at the LdapLoginModule javadocs.
     *
     * ATTENTION: The LdapLoginModule is only available in java 6. The
     * parameter libjvm_path must point to a java 6 jvm
     * (qconf -sconf | grep libjvm_path)
     */
    /*
    com.sun.security.auth.module.LdapLoginModule requisite
        userProvider="ldap://sun-ds/ou=people,dc=sun,dc=com"
        userFilter="(&(uid={USERNAME})(objectClass=inetOrgPerson))"
        useSSL=false;
    */

    /*
     * The JGDILoginModule adds a JGDIPrincipal to the subject. The username of
     * the JGDIPrincipal is the name of the first trusted principal. This name
     * is treated as username for gdi communication.
     * For each login a new jgdi session id is created.
     *
     * In the jmxremote.access file users who can access the system are defined.
     * Any principal matching these entries is given the corresponding role.
     * Usually a jmxPrincipal is defined to give a user access to the system.
     * (e.g. com.sun.grid.security.login.UserPrincipal = xyz &
     * jmxPrincipal="controlRole" gives user xyz access under controlRole)
     */
    com.sun.grid.jgdi.security.JGDILoginModule optional
        trustedPrincipal="com.sun.grid.security.login.UserPrincipal"
        trustedPrincipal1="com.sun.security.auth.UserPrincipal"
        jmxPrincipal="controlRole";
};

/*
 * TestConfig accepts any user. Only for testing
 */
TestConfig {
    com.sun.grid.security.login.TestLoginModule requisite;
    com.sun.grid.jgdi.security.JGDILoginModule optional
        trustedPrincipal="com.sun.grid.security.login.UserPrincipal"
        jmxPrincipal="controlRole";
};

/*
 * Mandatory if native jgdi is used with a csp system
 * (e.g. jgdish in csp mode)
 */
jgdi {
    com.sun.security.auth.module.KeyStoreLoginModule required
        keyStoreURL="file:./keystore"
        debug=false;
};

java.policy
The java.policy file used by the JGDIAgent restricts what code running in sge_qmaster's JVM is allowed to do. Changes here are usually only necessary to limit access to a subset of the overall functionality. To tune the policy settings to your needs, it is useful to run the JMX server with security debugging enabled and to consult the generated log files.
(To enable security debugging, use qconf -mconf and set additional_jvm_args = -Djavax.net.debug=ssl -Djava.security.debug=access,failure)

/*
**
** with LdapLoginModule
** grant principal com.sun.security.auth.UserPrincipal "controlRole"
**
** with jmxremote.password
** grant principal javax.management.remote.JMXPrincipal "controlRole"
**
*/
grant codeBase "file:${com.sun.grid.jgdi.sgeRoot}/lib/jgdi.jar" {
    permission java.net.SocketPermission "*:1024-", "accept,connect";
    permission java.net.SocketPermission "localhost:1024-", "listen,resolve";
    permission java.lang.RuntimePermission "loadLibrary.jgdi";
    permission java.lang.RuntimePermission "shutdownHooks";
    permission java.lang.RuntimePermission "setContextClassLoader";
    permission javax.security.auth.AuthPermission "createLoginContext.jgdi";
    permission javax.security.auth.AuthPermission "doAs";
    permission javax.security.auth.AuthPermission "getSubject";
    permission java.util.PropertyPermission "*", "read";
    permission java.util.logging.LoggingPermission "control";
    permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/${com.sun.grid.jgdi.sgeCell}/common/jmx/-", "read";
    permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/util/-", "execute";
    permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/utilbin/-", "execute";
    permission javax.management.MBeanServerPermission "createMBeanServer";
    permission javax.management.MBeanPermission "*", "*";
    permission javax.management.MBeanTrustPermission "register";
    permission java.lang.management.ManagementPermission "monitor";
    permission java.lang.management.ManagementPermission "control";
    permission java.lang.RuntimePermission "setIO";
    permission java.io.FilePermission "jgdi.stdout", "write";
    permission java.io.FilePermission "jgdi.stderr", "write";
    permission java.io.FilePermission "jgdi0.log.lck", "delete";
    permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/${com.sun.grid.jgdi.sgeCell}/common/jmx/*", "read";
    permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/lib/-", "read";
    permission java.lang.RuntimePermission "accessClassInPackage.sun.management.jmxremote";
    permission java.lang.RuntimePermission "accessClassInPackage.sun.management.resources";
    permission java.lang.RuntimePermission "accessClassInPackage.sun.management";
    permission java.lang.RuntimePermission "accessClassInPackage.sun.rmi.server";
    permission java.lang.RuntimePermission "accessClassInPackage.sun.management.snmp.util";
    permission java.lang.RuntimePermission "accessClassInPackage.sun.rmi.registry";
    permission java.util.PropertyPermission "java.rmi.server.randomIDs", "write";
    permission javax.security.auth.AuthPermission "modifyPrincipals";
    permission javax.security.auth.AuthPermission "createLoginContext.*";
    permission javax.security.auth.AuthPermission "createLoginContext.JMXPluggableAuthenticator";
    permission java.security.SecurityPermission "createAccessControlContext";
    permission javax.management.remote.SubjectDelegationPermission "javax.management.remote.JMXPrincipal.controlRole";
};

grant principal javax.management.remote.JMXPrincipal "controlRole" {
    permission javax.management.MBeanPermission "com.sun.grid.jgdi.management.mbeans.JGDIJMX#*", "*";
    permission javax.management.MBeanPermission "sun.management.*#*", "*";
    permission javax.security.auth.AuthPermission "createLoginContext.jgdi";
    permission javax.security.auth.AuthPermission "doAs";
    permission javax.security.auth.AuthPermission "getSubject";
    permission java.util.PropertyPermission "*", "read";
    permission java.util.PropertyPermission "user.timezone", "read,write";
    permission java.util.logging.LoggingPermission "control";
    permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/lib/-", "read";
    permission java.lang.management.ManagementPermission "monitor";
    permission java.net.SocketPermission "*", "resolve";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#-[java.lang:type=OperatingSystem]", "isInstanceOf";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#-[java.lang:type=OperatingSystem]", "getAttribute";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#ProcessCpuTime[java.lang:type=OperatingSystem]", "getAttribute";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#Name[java.lang:type=OperatingSystem]", "getAttribute";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#Version[java.lang:type=OperatingSystem]", "getAttribute";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#Arch[java.lang:type=OperatingSystem]", "getAttribute";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#AvailableProcessors[java.lang:type=OperatingSystem]", "getAttribute";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#CommittedVirtualMemorySize[java.lang:type=OperatingSystem]", "getAttribute";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#TotalPhysicalMemorySize[java.lang:type=OperatingSystem]", "getAttribute";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#FreePhysicalMemorySize[java.lang:type=OperatingSystem]", "getAttribute";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#TotalSwapSpaceSize[java.lang:type=OperatingSystem]", "getAttribute";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#FreeSwapSpaceSize[java.lang:type=OperatingSystem]", "getAttribute";
    permission javax.management.MBeanPermission "javax.management.MBeanServerDelegate#-[JMImplementation:type=MBeanServerDelegate]", "addNotificationListener";
    permission javax.management.MBeanPermission "javax.management.MBeanServerDelegate#-[JMImplementation:type=MBeanServerDelegate]", "isInstanceOf";
    permission javax.management.MBeanPermission "javax.management.MBeanServerDelegate#-[JMImplementation:type=MBeanServerDelegate]", "getMBeanInfo";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#-[java.lang:type=OperatingSystem]", "queryNames";
    permission javax.management.MBeanPermission "java.util.logging.Logging#-[java.util.logging:type=Logging]", "queryNames";
    permission javax.management.MBeanPermission "javax.management.MBeanServerDelegate#-[JMImplementation:type=MBeanServerDelegate]", "queryNames";
    permission javax.management.MBeanPermission "java.util.logging.Logging#-[java.util.logging:type=Logging]", "isInstanceOf";
    permission javax.management.MBeanPermission "java.util.logging.Logging#-[java.util.logging:type=Logging]", "getMBeanInfo";
    permission javax.management.MBeanPermission "com.sun.management.UnixOperatingSystem#-[java.lang:type=OperatingSystem]", "getMBeanInfo";
};

grant {
    permission java.util.logging.LoggingPermission "control";
    permission java.util.PropertyPermission "*", "read";
    permission java.util.PropertyPermission "user.timezone", "write";
    permission java.lang.RuntimePermission "setIO";
    permission java.lang.RuntimePermission "loadLibrary.jgdi";
    permission java.io.FilePermission "jgdi.stdout", "write";
    permission java.io.FilePermission "jgdi.stderr", "write";
    permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/lib/-", "read";
    permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/util/arch", "execute";
    permission java.io.FilePermission "${com.sun.grid.jgdi.sgeRoot}/utilbin/-", "execute";
    permission javax.security.auth.AuthPermission "modifyPrincipals";
    permission java.io.FilePermission "${com.sun.grid.jgdi.caTop}", "read";
    permission java.io.FilePermission "${com.sun.grid.jgdi.caTop}/cacert.pem", "read";
    permission java.io.FilePermission "${com.sun.grid.jgdi.caTop}/ca-crl.pem", "read";
    permission java.io.FilePermission "${com.sun.grid.jgdi.caTop}/usercerts/-", "read";
    permission java.io.FilePermission "${com.sun.grid.jgdi.serverKeystore}", "read";
};

/*
grant {
    permission java.security.AllPermission;
};
*/

management.properties
This file describes the general JMX server configuration. The default template looks similar to the following example. It is usually adapted automatically during installation, which replaces the @@SGE_*@@ variables with concrete values.
#####################################################################
# Default Configuration File for JGDI JMX
#####################################################################
#
# The Management Configuration file (in java.util.Properties format)
# will be read if one of the following system properties is set:
#    -Dcom.sun.grid.jgdi.management.jmxremote.port=<port-number>
# or -Dcom.sun.grid.jgdi.management.config.file=<this-file>
#
# The default Management Configuration file is:
#
#       $SGE_ROOT/{$SGE_CELL|default}/common/jmx/management.properties
#
#

################ Management Agent Port #########################
#
# For setting the JMX RMI agent port use the following line
# com.sun.grid.jgdi.management.jmxremote.port=<port-number>
com.sun.grid.jgdi.management.jmxremote.port=@@SGE_JMX_PORT@@

#####################################################################
#                     RMI Management Properties
#####################################################################
#
# If system property -Dcom.sun.grid.jgdi.management.jmxremote.port=<port-number>
# is set then
#     - A MBean server is started
#     - JRE Platform MBeans are registered in the MBean server
#     - RMI connector is published in a private readonly registry at
#       specified port using a well known name, "jmxrmi"
#     - the following properties are read for JMX remote management.
#
# The configuration can be specified only at startup time.
# Later changes to above system property (e.g. via setProperty method),
# this config file, the password file, or the access file have no effect to the
# running MBean server, the connector, or the registry.
#
#

###################### RMI SSL #############################
#
# com.sun.grid.jgdi.management.jmxremote.ssl=true|false
#      Default for this property is true. (Case for true/false ignored)
#      If this property is specified as false then SSL is not used.
#
# For RMI monitoring without SSL use the following line
# com.sun.grid.jgdi.management.jmxremote.ssl=false
com.sun.grid.jgdi.management.jmxremote.ssl=@@SGE_JMX_SSL@@

# com.sun.grid.jgdi.management.jmxremote.ssl.enabled.cipher.suites=<cipher-suites>
#      The value of this property is a string that is a comma-separated list
#      of SSL/TLS cipher suites to enable. This property can be specified in
#      conjunction with the previous property "com.sun.grid.jgdi.management.jmxremote.ssl"
#      in order to control which particular SSL/TLS cipher suites are enabled
#      for use by accepted connections. If this property is not specified then
#      the SSL RMI Server Socket Factory uses the SSL/TLS cipher suites that
#      are enabled by default.
#
# com.sun.grid.jgdi.management.jmxremote.ssl.enabled.protocols=<protocol-versions>
#      The value of this property is a string that is a comma-separated list
#      of SSL/TLS protocol versions to enable. This property can be specified in
#      conjunction with the previous property "com.sun.grid.jgdi.management.jmxremote.ssl"
#      in order to control which particular SSL/TLS protocol versions are
#      enabled for use by accepted connections. If this property is not
#      specified then the SSL RMI Server Socket Factory uses the SSL/TLS
#      protocol versions that are enabled by default.
#
# com.sun.grid.jgdi.management.jmxremote.ssl.need.client.auth=true|false
#      Default for this property is false. (Case for true/false ignored)
#      If this property is specified as true in conjunction with the previous
#      property "com.sun.grid.jgdi.management.jmxremote.ssl" then the SSL RMI Server
#      Socket Factory will require client authentication.
#
# For RMI monitoring with SSL client authentication use the following line
# com.sun.grid.jgdi.management.jmxremote.ssl.need.client.auth=true
com.sun.grid.jgdi.management.jmxremote.ssl.need.client.auth=@@SGE_JMX_SSL_CLIENT@@
#
#

################ RMI User authentication ################
#
# com.sun.grid.jgdi.management.jmxremote.authenticate=true|false
#      Default for this property is true. (Case for true/false ignored)
#      If this property is specified as false then no authentication is
#      performed and all users are allowed all access.
#
# For RMI monitoring without any checking use the following line
# com.sun.grid.jgdi.management.jmxremote.authenticate=false
com.sun.grid.jgdi.management.jmxremote.authenticate=true
#
#

################ RMI Login configuration ###################
#
# com.sun.grid.jgdi.management.jmxremote.login.config=<config-name>
#      Specifies the name of a JAAS login configuration entry to use when
#      authenticating users of RMI monitoring.
#
#      Setting this property is optional - the default login configuration
#      specifies a file-based authentication that uses the password file.
#
#      When using this property to override the default login configuration
#      then the named configuration entry must be in a file that gets loaded
#      by JAAS. In addition, the login module(s) specified in the configuration
#      should use the name and/or password callbacks to acquire the user's
#      credentials. See the NameCallback and PasswordCallback classes in the
#      javax.security.auth.callback package for more details.
#
#      If the property "com.sun.grid.jgdi.management.jmxremote.authenticate" is set to
#      false, then this property and the password & access files are ignored.
#
# For a non-default login configuration use the following line
# com.sun.grid.jgdi.management.jmxremote.login.config=<config-name>
com.sun.grid.jgdi.management.jmxremote.login.config=GridwareConfig
#
#

################ RMI Password file location ##################
#
# com.sun.grid.jgdi.management.jmxremote.password.file=filepath
#      Specifies location for password file
#      This is optional - default location is
#      $JRE/lib/management/jmxremote.password
#
#      If the property "com.sun.grid.jgdi.management.jmxremote.authenticate" is set to
#      false, then this property and the password & access files are ignored.
# For a non-default password file location use the following line
# com.sun.grid.jgdi.management.jmxremote.password.file=filepath
com.sun.grid.jgdi.management.jmxremote.password.file=@@SGE_ROOT@@/@@SGE_CELL@@/common/jmx/jmxremote.password
#
#

################ RMI Access file location #####################
#
# com.sun.grid.jgdi.management.jmxremote.access.file=filepath
#      Specifies location for access file
#      This is optional - default location is
#      $JRE/lib/management/jmxremote.access
#
#      If the property "com.sun.grid.jgdi.management.jmxremote.authenticate" is set to
#      false, then this property and the password & access files are ignored.
#      Otherwise, the access file must exist and be in the valid format.
#      If the access file is empty or non-existent then no access is allowed.
#
# For a non-default access file location use the following line
# com.sun.grid.jgdi.management.jmxremote.access.file=filepath
com.sun.grid.jgdi.management.jmxremote.access.file=@@SGE_ROOT@@/@@SGE_CELL@@/common/jmx/jmxremote.access

# For the JGDI keystore module use these settings for the server keystore and keystore password
com.sun.grid.jgdi.management.jmxremote.ssl.serverKeystore=@@SGE_JMX_SSL_KEYSTORE@@
com.sun.grid.jgdi.management.jmxremote.ssl.serverKeystorePassword=@@SGE_JMX_SSL_KEYSTORE_PW@@

jmxremote.access
The jmxremote.access file defines the roles to which principals are mapped and the access level of each role.
######################################################################
#     Default Access Control File for Remote JMX(TM) Monitoring
######################################################################
#
# Access control file for Remote JMX API access to monitoring.
# This file defines the allowed access for different roles. The
# password file (jmxremote.password by default) defines the roles and their
# passwords. To be functional, a role must have an entry in
# both the password and the access files.
#
# Default location of this file is $JRE/lib/management/jmxremote.access
# You can specify an alternate location by specifying a property in
# the management config file $JRE/lib/management/management.properties
# (See that file for details)
#
# The file format for password and access files is syntactically the same
# as the Properties file format. The syntax is described in the Javadoc
# for java.util.Properties.load.
# Typical access file has multiple lines, where each line is blank,
# a comment (like this one), or an access control entry.
#
# An access control entry consists of a role name, and an
# associated access level. The role name is any string that does not
# itself contain spaces or tabs. It corresponds to an entry in the
# password file (jmxremote.password). The access level is one of the
# following:
#       "readonly" grants access to read attributes of MBeans.
#                  For monitoring, this means that a remote client in this
#                  role can read measurements but cannot perform any action
#                  that changes the environment of the running program.
#       "readwrite" grants access to read and write attributes of MBeans,
#                   to invoke operations on them, and to create or remove them.
#                   This access should be granted to only trusted clients,
#                   since they can potentially interfere with the smooth
#                   operation of a running program
#
# A given role should have at most one entry in this file. If a role
# has no entry, it has no access.
# If multiple entries are found for the same role name, then the last
# access entry is used.
#
#
# Default access control entries:
# o The "monitorRole" role has readonly access.
# o The "controlRole" role has readwrite access.

monitorRole   readonly
controlRole   readwrite

jmxremote.password
The jmxremote.password file provides another simple authentication mechanism, in which you specify a password for each role. This mechanism is not recommended; the JAAS login modules are usually preferred because they are much more flexible. If a simple login mechanism is required, it is recommended to change management.properties to use TestConfig instead of GridwareConfig. TestConfig allows any valid UNIX user to connect to the JGDI JMX server without a password; according to jaas.config, it is intended for testing only.

logging.properties
To enable JGDI and JMX logging, adjust the delivered logging file and restart sge_qmaster, or at least the JMX server. The generated log files default to jgdi0.log, jgdi.stderr, and jgdi.stdout in the master spooling directory. You can also influence logging through the additional_jvm_args configuration, for example to enable additional debugging messages.
#
# Java Logging Configuration for JMX MBean server
#
# Specify the handlers to create in the root logger
# (all loggers are children of the root logger)
# The following creates two handlers
# Per default we log to the console
#handlers = java.util.logging.ConsoleHandler
# Use FileHandler
handlers = java.util.logging.FileHandler

# ------------------------------------------------------------------------------
# Definition of log levels
# ------------------------------------------------------------------------------
# Set the default logging level for the root logger
.level = INFO
#com.sun.grid.jgdi.JGDI.level = FINE
#com.sun.grid.jgdi.rmi.level = FINE
#com.sun.grid.jgdi.configuration.xml.XMLUtil.level = FINE
#com.sun.grid.jgdi.configuration.ClusterQueueTestCase.level = FINE
#com.sun.grid.jgdi.management.level = FINER
#com.sun.grid.jgdi.event.level = FINER

# For authuser login module debugging
#com.sun.grid.security.login.level = FINER
#com.sun.grid.util.expect.level = FINER

# ------------------------------------------------------------------------------
# Settings for ConsoleHandler
# ------------------------------------------------------------------------------
# Set the default logging level for new ConsoleHandler instances
java.util.logging.ConsoleHandler.level = INFO
# Set the default formatter for new ConsoleHandler instances
java.util.logging.ConsoleHandler.formatter = com.sun.grid.jgdi.util.SGEFormatter

# ------------------------------------------------------------------------------
# Settings for FileHandler
# ------------------------------------------------------------------------------
# Set the default logging level for new FileHandler instances
java.util.logging.FileHandler.level = ALL
# qmaster runs in qmaster spool dir, so the file is created there
java.util.logging.FileHandler.pattern=jgdi%u.log
java.util.logging.FileHandler.formatter=com.sun.grid.jgdi.util.SGEFormatter

#
# Possible columns:
#
#   time        timestamp of the log message
#   host        hostname of the log message
#   name        name of the logger
#   thread      id of the thread
#   level       log level (short form)
#   source      class and method name
#   level_long  log_level long form
#
com.sun.grid.jgdi.util.SGEFormatter.columns = time thread source level message

#
# Print the stacktrace of the log record
#
com.sun.grid.jgdi.util.SGEFormatter.withStacktrace=true

#
# Delimiter between columns
#
com.sun.grid.jgdi.util.SGEFormatter.delimiter = |

Testing and Troubleshooting
You can use jconsole to test connections to the JMX server. The administrator is responsible for allowing or disallowing access to the system via JMX. To also enforce client authentication for jconsole, the management.properties file must be configured with:
com.sun.grid.jgdi.management.jmxremote.ssl.need.client.auth=true

Start jconsole as follows:

% jconsole -J-Djava.security.manager=java.rmi.RMISecurityManager \
    -J-Djava.security.policy=$SGE_ROOT/util/rmiconsole.policy \
    -J-Djavax.net.ssl.trustStore=<server truststore> \
    [-J-Djavax.net.ssl.keyStore=/<safe>/mykeystore \
     -J-Djavax.net.ssl.keyStorePassword=<mykeystore_pw> \
     -J-Djavax.net.ssl.keyPassword=<mykeystore_pw> ] \
    [-J-Djavax.net.debug=ssl]

where <server truststore> usually is either:

keytool -export -alias "root" \
    -keystore /var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/private/keystore \
    -rfc -file /tmp/jmxserver.cer
keytool -import -file /tmp/jmxserver.cer -keystore /tmp/truststore
Enter keystore password: <pwd>
...
Trust this certificate? [no]: yes
Certificate was added to keystore

The optional arguments are required if client authentication is set to true, or for debugging.

The following simple example can be used to connect via JMX and monitor events:

% java [-Dcom.sun.grid.jgdi.keyStore=/var/sgeCA/port$SGE_QMASTER_PORT/$SGE_CELL/private/keystore \
    -Dcom.sun.grid.jgdi.caTop="$SGE_ROOT/$SGE_CELL/common/sgeCA" \
    -Djava.util.logging.config.file=util/shell_logging.properties ] \
    -cp $SGE_ROOT/lib/juti.jar:$SGE_ROOT/lib/jgdi.jar \
    com.sun.grid.jgdi.examples.jmxeventmonitor.Main

The optional arguments can be skipped; they only preset the login dialog with useful values. Once a connection has been established, a preferences file is written and reused afterwards.

For troubleshooting, the following settings and files might give some additional insights:
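management.properties and jmxremote.access both use the java.util.Properties format, and management.properties is produced from a template by replacing @@SGE_*@@ placeholders. The following Java sketch shows how such files can be interpreted. It fills placeholders, applies the documented default of true when a boolean key such as ...jmxremote.ssl or ...jmxremote.authenticate is absent, and resolves the access level of a role. When a role appears more than once, the last entry wins, which matches how Properties.load handles duplicate keys. A role with no entry gets no access. The class and method names (JmxConfigSketch, substitute, flag, accessLevel) and the port value 6446 are hypothetical. This is not how the installer or the JGDI JMX server is actually implemented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the Properties-based JMX configuration files. */
public class JmxConfigSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("@@([A-Z0-9_]+)@@");

    /** Replaces @@NAME@@ tokens with values from the map; unknown tokens are left unchanged. */
    public static String substitute(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value != null ? value : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Parses java.util.Properties syntax; for duplicate keys the last entry wins. */
    public static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Boolean setting with a default; "true" is matched ignoring case, any other value is false. */
    public static boolean flag(Properties props, String key, boolean defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    /** Access level of a role in jmxremote.access content; a role without an entry has no access. */
    public static String accessLevel(Properties access, String role) {
        return access.getProperty(role, "none");
    }

    public static void main(String[] args) {
        String prefix = "com.sun.grid.jgdi.management.jmxremote.";
        String template = prefix + "port=@@SGE_JMX_PORT@@\n" + prefix + "ssl=@@SGE_JMX_SSL@@\n";

        Map<String, String> values = new HashMap<>(); // hypothetical installation answers
        values.put("SGE_JMX_PORT", "6446");
        values.put("SGE_JMX_SSL", "false");
        Properties mgmt = load(substitute(template, values));

        System.out.println("port=" + mgmt.getProperty(prefix + "port"));                // 6446
        System.out.println("ssl=" + flag(mgmt, prefix + "ssl", true));                   // false
        System.out.println("authenticate=" + flag(mgmt, prefix + "authenticate", true)); // true (default)

        Properties access = load("monitorRole readonly\ncontrolRole readwrite\n");
        System.out.println("controlRole=" + accessLevel(access, "controlRole"));         // readwrite
        System.out.println("guestRole=" + accessLevel(access, "guestRole"));             // none
    }
}
```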
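For scripted testing, a plain JMX client can connect to the server much as jconsole does. The following minimal Java sketch uses only the standard javax.management.remote API. It builds the service URL for the RMI connector that management.properties publishes under the well-known name "jmxrmi", passes a username and password as JMXConnector.CREDENTIALS, and prints the number of registered MBeans. It assumes that the GridwareConfig login modules obtain these credentials through the name and password callbacks; whether a username and password alone are accepted depends on the configured modules. The class name, the command-line arguments, and the optional SSL registry socket factory are assumptions. As with jconsole, the truststore and keystore are expected to come from the javax.net.ssl.* system properties.

```java
import java.net.MalformedURLException;
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import javax.rmi.ssl.SslRMIClientSocketFactory;

/** Minimal JMX client sketch; not part of the Grid Engine distribution. */
public class JmxClientSketch {

    /** Service URL of an RMI connector registered as "jmxrmi" in the registry at host:port. */
    public static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("invalid host or port", e);
        }
    }

    /** Connection environment carrying username/password credentials. */
    public static Map<String, Object> environment(String user, String password, boolean sslRegistry) {
        Map<String, Object> env = new HashMap<>();
        env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        if (sslRegistry) {
            // Assumption: only needed if the RMI registry itself is protected by SSL.
            env.put("com.sun.jndi.rmi.factory.socket", new SslRMIClientSocketFactory());
        }
        return env;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.err.println("usage: JmxClientSketch <host> <port> <user> <password>");
            return;
        }
        JMXServiceURL url = serviceUrl(args[0], Integer.parseInt(args[1]));
        // Truststore and keystore are taken from the javax.net.ssl.* system properties.
        try (JMXConnector connector = JMXConnectorFactory.connect(url, environment(args[2], args[3], false))) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            System.out.println("Connected, registered MBeans: " + connection.getMBeanCount());
        }
    }
}
```

A possible invocation, with the truststore created by the keytool commands above, is java -Djavax.net.ssl.trustStore=/tmp/truststore JmxClientSketch <qmaster host> <JMX port> <user> <password>.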
Removing the Grid Engine Software

How to Remove the Software Interactively
To remove the software interactively, follow the steps below.
How to Remove the Software Using the inst_sge Utility and a Configuration Template
Unlike the interactive uninstallation method, the automated uninstallation method suppresses output during the process. The automated method also requires a properly formatted configuration file. To remove the software using the inst_sge utility and a configuration template, follow these steps:
Microsoft Services for UNIX
Overview
Microsoft Windows Services for UNIX (SFU) makes it possible to integrate some Windows operating systems into existing UNIX environments. SFU provides components that simplify network administration and user management across the UNIX and Windows platforms. You can use SFU to do the following:
Unsupported Grid Engine Functionality
The following Grid Engine components are not supported in a Microsoft Windows environment and cannot be used on Windows hosts, even though they are a standard part of a Grid Engine installation:
System Requirements
The following system requirements apply to the SFU installation:
You can find more details about SFU requirements at http://www.microsoft.com/windows/sfu/.

Services for UNIX Installation
Microsoft's SFU is required to install Grid Engine successfully. You can download SFU from Microsoft. Search the site for "Windows Services for UNIX" to find the current download information.
Post SFU Installation Tasks
There are several steps you should follow after you install the SFU software.
Troubleshooting SFU
The following section describes some common problems that users may encounter when installing and using Grid Engine in a Services for UNIX environment on a Windows system.
Changing Default Behavior to Case Sensitivity
You might have to choose between the default behavior and case sensitivity for object names, such as file names. Your choice affects system security as well as how Windows Services for UNIX (SFU) functions.

With Microsoft Windows, the names of most objects are case preserving but case insensitive. So you cannot have two files in the same directory named sample.txt and Sample.txt, because Windows regards the names as identical. The UNIX operating system, however, is fully case sensitive. UNIX systems distinguish between object names even when the only difference between those names is the case of their characters. Therefore, sample.txt and Sample.txt can appear in the same directory, and the UNIX system distinguishes between them when performing operations on the files. For example, the command rm S*.txt would delete Sample.txt but not sample.txt.

To implement typical UNIX behavior, Server for NFS and the Interix subsystem are normally case sensitive when working with file names. This behavior can present security issues, particularly for users who are accustomed to the case-insensitive conventions of Windows. For example, a Trojan horse version of edit.exe, named EDIT.EXE, could be stored in the same directory as the original. If a user were to type edit at a Windows command prompt, the Trojan horse version (EDIT.EXE) could be executed instead of the standard version.
For Windows XP (Professional) and the Windows Server 2003 family, the default behavior of subsystems (other than the Win32 subsystem) is to preserve case but be case insensitive. In previous versions of Windows, such subsystems were fully case sensitive by default. To support standard UNIX behavior, the SFU Setup allows you to change the default Windows XP and Windows Server 2003 family behavior for non-Win32 subsystems when installing the base utilities (the Interix subsystem) or Server for NFS. If you enable case sensitivity and then subsequently uninstall the base utilities and Server for NFS, the SFU Setup will restore the default, case-insensitive behavior of non-Win32 subsystems.
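The following Java sketch makes the difference concrete by modeling file-name matching under both conventions. It is purely illustrative and uses no SFU or Interix API. A glob such as S*.txt is translated into a regular expression and matched either case-sensitively, as UNIX and Interix do, or case-insensitively, as Windows does by default. A set ordered by String.CASE_INSENSITIVE_ORDER shows why a case-insensitive directory cannot hold both edit.exe and EDIT.EXE. The class and method names are hypothetical, and the glob translation supports only * and ?.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical illustration of case-sensitive versus case-insensitive file names. */
public class CaseSensitivitySketch {

    /** Translates a glob that uses only '*' and '?' into a regex; other characters are literals. */
    public static Pattern compileGlob(String glob, boolean caseSensitive) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile(regex.toString(), flags);
    }

    /** Returns the names that a command such as "rm <glob>" would select. */
    public static List<String> select(List<String> names, String glob, boolean caseSensitive) {
        Pattern pattern = compileGlob(glob, caseSensitive);
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> dir = Arrays.asList("Sample.txt", "sample.txt", "notes.txt");
        System.out.println("case sensitive:   " + select(dir, "S*.txt", true));  // [Sample.txt]
        System.out.println("case insensitive: " + select(dir, "S*.txt", false)); // [Sample.txt, sample.txt]

        // A case-sensitive directory can hold both names; a case-insensitive one cannot.
        Set<String> unixDir = new HashSet<>(Arrays.asList("edit.exe", "EDIT.EXE"));
        Set<String> windowsDir = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        windowsDir.addAll(Arrays.asList("edit.exe", "EDIT.EXE"));
        System.out.println(unixDir.size() + " vs " + windowsDir.size());          // 2 vs 1
    }
}
```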
Configuring User Name Mapping
User Name Mapping acts as a single clearinghouse that provides centralized user mapping services for the NFS client of Interix. User Name Mapping provides a map between the Windows users and groups on the NFS client and the corresponding UNIX users and groups on the NFS server. In principle, these user and group names need not be identical. However, for users who intend to use Sun Grid Engine, these names must be identical.

User Name Mapping lets you maintain a single mapping database for the entire enterprise. This feature makes it easy to configure authentication for multiple computers running Windows Services for UNIX. User Name Mapping also permits one-to-many mapping, which lets you associate multiple Windows accounts with a single UNIX account. You can use simple maps, which map Windows and UNIX accounts that have identical names. You can also create advanced maps, which associate Windows and UNIX accounts that have different names; advanced maps can be used together with simple maps. Advanced maps can be useful, for example, when you do not need to maintain separate UNIX accounts for individuals and would rather use a few accounts to provide different classes of access permission.
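The following Java sketch models how simple maps, advanced maps, and the Grid Engine requirement for identical names relate to each other. It is a hypothetical model, not the User Name Mapping service or its API. For illustration, it assumes that an advanced map entry is consulted before the simple identical-name mapping. It lets several Windows accounts share one UNIX account (one-to-many mapping) and reports any mapping whose names differ as unsuitable for Grid Engine users.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of User Name Mapping; not the SFU User Name Mapping service API. */
public class UserNameMappingSketch {

    private final Map<String, String> advancedMap = new HashMap<>(); // Windows account -> UNIX account
    private final boolean simpleMapsEnabled;                         // identical names map to each other

    public UserNameMappingSketch(boolean simpleMapsEnabled) {
        this.simpleMapsEnabled = simpleMapsEnabled;
    }

    /** Adds an advanced map entry; several Windows accounts may share one UNIX account. */
    public void addAdvancedMap(String windowsAccount, String unixAccount) {
        advancedMap.put(windowsAccount, unixAccount);
    }

    /** Returns the mapped UNIX account, or null if the Windows account is not mapped. */
    public String toUnix(String windowsAccount) {
        String mapped = advancedMap.get(windowsAccount); // assumption: advanced entries are checked first
        if (mapped != null) {
            return mapped;
        }
        return simpleMapsEnabled ? windowsAccount : null;
    }

    /** Grid Engine users need identical Windows and UNIX names. */
    public boolean suitableForGridEngine(String windowsAccount) {
        return windowsAccount.equals(toUnix(windowsAccount));
    }

    public static void main(String[] args) {
        UserNameMappingSketch mapping = new UserNameMappingSketch(true);
        mapping.addAdvancedMap("alice", "lab"); // one-to-many: two Windows accounts
        mapping.addAdvancedMap("bob", "lab");   // share one UNIX account
        for (String user : new String[] {"alice", "bob", "peter"}) {
            System.out.println(user + " -> " + mapping.toUnix(user)
                    + (mapping.suitableForGridEngine(user) ? " (ok for Grid Engine)" : " (names differ)"));
        }
    }
}
```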
Disabling Data Execution Prevention (DEP)
Enable suid Behavior for Interix Programs
According to the POSIX standard, a file's permissions include a set-user-ID (setuid) bit and a set-group-ID (setgid) bit. If either or both bits are set on a file and a process executes that file, the process gains the UID or GID of the file's owner or group. When used carefully, this mechanism allows a non-privileged user to execute programs that run with the higher privileges of the file's owner or group. When used incorrectly, however, it can present security risks by allowing non-privileged users to perform actions that should only be performed by an administrator. For this reason, Windows Services for UNIX Setup does not enable support for this mechanism by default. You should enable support for setuid behavior because Grid Engine runs programs that require it. If you do not enable setuid support when installing Windows Services for UNIX, you can enable it later.
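On a UNIX-like host, the setuid and setgid bits correspond to the octal values 04000 and 02000 in a file's mode. The following Java sketch checks them, for example on the authuser binary ($SGE_ROOT/utilbin/$ARCH/authuser), which the jaas.config comments say needs the suid bit. It assumes an OpenJDK runtime on a UNIX-like operating system, where the non-standard "unix:mode" file attribute is available; on other platforms Files.getAttribute may throw UnsupportedOperationException. The sketch is not an Interix or SFU tool, and its class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical check of the setuid/setgid mode bits of a file. */
public class SetuidCheckSketch {

    static final int S_ISUID = 04000; // set-user-ID on execution
    static final int S_ISGID = 02000; // set-group-ID on execution

    public static boolean hasSetuid(int mode) {
        return (mode & S_ISUID) != 0;
    }

    public static boolean hasSetgid(int mode) {
        return (mode & S_ISGID) != 0;
    }

    /** Reads st_mode through the OpenJDK "unix" attribute view (not available on every platform). */
    public static int mode(Path path) throws IOException {
        return (Integer) Files.getAttribute(path, "unix:mode");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SetuidCheckSketch <file>, e.g. $SGE_ROOT/utilbin/$ARCH/authuser");
            return;
        }
        Path path = Paths.get(args[0]);
        int mode = mode(path);
        System.out.printf("%s: mode %04o, setuid=%b, setgid=%b%n",
                path, mode & 07777, hasSetuid(mode), hasSetgid(mode));
    }
}
```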
|
User Management for Sun Grid Engine on Windows Hosts

Overview

Every user of the Grid Engine execution environment on a Windows machine must have a user account with the same name as on the UNIX hosts. User accounts contain information about the user, including the name, the password, and various optional entries that determine when and how users log on and how their desktop settings are stored. The following sections describe how to use Windows user management to support Grid Engine.
Managing Users on Windows Hosts

User accounts can be administered individually on each Windows host. Each Windows host has an authentication center that validates user names and the corresponding user rights. User accounts that are defined on a Windows workstation are referred to here as local user accounts or local users. Each Windows host has its own local domain, and each Windows server can make its domain available to other hosts. Account names in a local domain and account names in a server domain can collide. To avoid such collisions, you must specify the correct user account by prefixing the user account name with the domain name followed by a + (plus sign) character, for example, ENGINEERING+Peter.

Windows User Example

The following example illustrates the potential complexity of Windows host accounts interacting with Windows domain accounts. Suppose a Windows workstation named CRUNCH has a local user account named Peter. This workstation is part of the domain named ENGINEERING, which is provided by a Windows server that also has a user account named Peter. In this example, ENGINEERING is the default domain of the host CRUNCH. The following table shows the possible results when a person tries to log in to CRUNCH.

Table – Using Domain Accounts
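The DOMAIN+user naming rule described under Managing Users on Windows Hosts can be sketched in Python. The helpers below are hypothetical and are not part of Grid Engine or Interix. The sketch assumes that a bare account name belongs to the host's default domain, which matches the CRUNCH example, where ENGINEERING is the default domain.

```python
from typing import Tuple

SEPARATOR = "+"  # domain name and account name are joined with a plus sign


def split_account(account: str, default_domain: str) -> Tuple[str, str]:
    """Split 'DOMAIN+user' into (domain, user).

    Assumption: a bare user name belongs to the host's default domain.
    """
    domain, sep, user = account.partition(SEPARATOR)
    if not sep:
        return default_domain, account
    return domain, user


def qualify(account: str, default_domain: str) -> str:
    """Return the unambiguous 'DOMAIN+user' form of an account name."""
    domain, user = split_account(account, default_domain)
    return domain + SEPARATOR + user


# On CRUNCH (default domain ENGINEERING), the local account must be
# named explicitly to distinguish it from the domain account.
print(qualify("Peter", "ENGINEERING"))         # ENGINEERING+Peter
print(qualify("CRUNCH+Peter", "ENGINEERING"))  # CRUNCH+Peter
```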
Each domain has a special user account that provides superuser access. The default name for that account is Administrator. In native Windows, the members of the Administrators group and of the Domain Admins group in the server domain also have superuser access. For Interix, however, only the user Administrator of the local domain is the superuser of the local host. The local Administrator can start applications in an account without knowing the password of the user for that account. However, such an application cannot access network resources, because even the local Administrator is not fully trusted by the network, unlike the UNIX superuser root. Therefore, the Sun Grid Engine administrator uses the sgepasswd tool to register the users' passwords, as explained in Using Grid Engine in a Microsoft Windows Environment.

UNIX User Management

UNIX has no equivalent to the Windows domain concept. On UNIX, each user has a local account and is authenticated as a local account, even if the underlying account information resides on an LDAP or NIS server. The UNIX superuser root is similar to the local Windows superuser Administrator: root can start applications and processes on behalf of UNIX accounts without knowing the corresponding passwords.

Using Grid Engine in a Microsoft Windows Environment

The Grid Engine execution environment starts jobs on behalf of the submitting user. On UNIX hosts, the execution daemon (sge_execd) runs as root so that it can start jobs on behalf of all users. On Windows hosts, the execution daemon runs as the local Administrator so that it can start jobs on behalf of users without knowing their passwords, but these jobs would not have permission to access network resources. Only fully authenticated users can access network resources, and full authentication requires the user's password. Therefore, all users who want to submit jobs to a Windows execution host must register their passwords with Grid Engine.
The execution daemon still needs to run as the local Administrator so that it has the permissions to perform several administrative tasks.

Registering Windows User Passwords

Users who want to start Grid Engine jobs on Windows execution hosts use the sgepasswd client application to register their Windows passwords. The following example shows how Peter, who has a user account in the domain ENGINEERING, registers his password. Because ENGINEERING is the principal domain of the Windows execution host CRUNCH, Peter does not need to register his password for a specific domain. This should be the default in any properly set up single-domain environment. In multiple-domain environments, it might be necessary to register the password explicitly for a specific domain.
> sgepasswd
Changing password for Peter
New password:
Re-enter new password:
Password changed

Using the sgepasswd Command

The sgepasswd command changes the Grid Engine password file sgepasswd(5). This file contains a list of user names and their Windows passwords in encrypted form. You can use sgepasswd to perform the following tasks:
Additionally, the root user can change or delete the password entries for other user accounts. The sgepasswd command is available only on non-Windows hosts. It uses one of the following syntaxes:

sgepasswd [[ -D <domain> ] -d <user> ]
sgepasswd [ -D <domain> ] [ <user> ]

This command supports the following options:
Additionally, the following environment variables affect the operation of this command.
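A small Python helper can capture the two syntax forms by building an argument list, for example for use with subprocess.run on a non-Windows host. The helper name is hypothetical. The sketch assumes that -d deletes the entry for the named user and that -D selects the Windows domain, so check the sgepasswd(1) man page on your system before relying on it. Because sgepasswd prompts for the password interactively, the sketch only builds the command and does not run it.

```python
from typing import List, Optional


def sgepasswd_argv(user: Optional[str] = None,
                   domain: Optional[str] = None,
                   delete: bool = False) -> List[str]:
    """Build an sgepasswd command line that matches one of the two syntaxes.

        sgepasswd [[ -D <domain> ] -d <user> ]
        sgepasswd [ -D <domain> ] [ <user> ]

    Assumption: -d deletes the entry for <user>; -D names the domain.
    """
    if delete and user is None:
        raise ValueError("-d requires a user name")
    argv = ["sgepasswd"]
    if domain is not None:
        argv += ["-D", domain]
    if delete:
        argv += ["-d", user]
    elif user is not None:
        argv.append(user)
    return argv


print(sgepasswd_argv())                        # change your own password
print(sgepasswd_argv("Peter", "ENGINEERING"))  # entry for a specific domain
print(sgepasswd_argv("Peter", delete=True))    # delete an entry
```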
Adding Windows Hosts to Existing Grid Engine Systems

If you have a running Grid Engine system on which Windows support is not enabled, you can enable the support manually. The following steps result in a Windows-enabled Grid Engine system that accepts additional Windows execution hosts.

How to Add Windows Hosts Later
Other Sun Grid Engine Installation Issues

This section identifies additional considerations for installing Sun Grid Engine software. These include the following topics:

Verifying and Installing Linux Motif Libraries

On newer Linux systems, the libXm.so.2 Motif libraries are not always installed, so the precompiled Linux qmon binary cannot run. To correct this problem, follow these steps:
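One way to confirm that a host is affected is to run ldd on the qmon binary and look for libXm.so.2 in its output. The following Python sketch performs a comparable check by asking the dynamic loader to open the library through the standard ctypes module. The function name is illustrative. A failure can also mean that one of libXm.so.2's own dependencies is missing, which would likewise prevent qmon from starting.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the shared library *soname*."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Library not found, or one of its own dependencies is missing.
        return False
    return True


if __name__ == "__main__":
    if can_load("libXm.so.2"):
        print("libXm.so.2 found")
    else:
        print("libXm.so.2 missing: the precompiled qmon binary cannot run")
```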
Installing Grid Engine on a System With IPMP

This section describes how to install the Grid Engine software on hosts that use the Solaris Operating Environment IP Multipathing (IPMP) technology.

What Is IP Multipathing?

IP Multipathing is a technology that allows TCP/IP interfaces to be grouped for failover and load-balancing purposes. If an interface within an IP Multipathing group fails, the interface is disabled and its IP address is relocated to another interface in the group. Outbound IP traffic is distributed across the interfaces of a group. For further details on IP Multipathing, refer to the Solaris Operating Environment documentation at http://docs.sun.com/app/docs/doc/816-4554/ipmptm-1.

Issues Between IPMP and Grid Engine

Error messages appear when you start the Grid Engine daemons on a machine whose main interface is part of an IPMP group. When IPMP load balancing distributes the connections across the interfaces in the group, the IP packets arrive at the receiving end appearing to come from a host other than the one associated with the main interface.

For example, consider a machine with three interfaces named qfe0, qfe1, and qfe2, with the IP addresses 10.1.1.1, 10.1.1.2, and 10.1.1.3, respectively. IPMP also needs an extra test address for each interface, but that requirement is ignored in this example. Each of these addresses has a host name associated with it. The hosts table looks like the following example:

10.1.1.1 sge
10.1.1.2 sge-qfe1
10.1.1.3 sge-qfe2

The machine's host name is sge. When a connection is established from sge to another machine, it might appear to come from sge, sge-qfe1, or sge-qfe2. Upon installation, Grid Engine recognizes only sge. When Grid Engine receives a connection request from sge-qfe2, it closes the connection because the request does not come from one of the authorized (or known) nodes.
To solve this problem, use the host_aliases file to "tell" Grid Engine that sge, sge-qfe1, and sge-qfe2 are all the same machine. See the sge_h_aliases man page for details. In this case, the host_aliases file would look like this:

sge sge-qfe1 sge-qfe2
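The following Python sketch models the effect of the host_aliases entry. It assumes the convention described in sge_h_aliases(5): each line lists white-space-separated names for one host, and the first name on the line is the name Grid Engine uses. The function names and the "known hosts" check are illustrative only and do not reproduce Grid Engine's internal code.

```python
from typing import Dict, Set


def parse_host_aliases(text: str) -> Dict[str, str]:
    """Map every name in a host_aliases file to the first name on its line."""
    aliases: Dict[str, str] = {}
    for line in text.splitlines():
        names = line.split()
        if not names:
            continue  # skip blank lines
        primary = names[0]
        for name in names:
            aliases[name] = primary
    return aliases


def accept_connection(peer: str, known: Set[str], aliases: Dict[str, str]) -> bool:
    """Accept a peer only if its resolved name is a known host."""
    return aliases.get(peer, peer) in known


known_hosts = {"sge"}
aliases = parse_host_aliases("sge sge-qfe1 sge-qfe2\n")

print(accept_connection("sge-qfe2", known_hosts, {}))       # False: no aliases
print(accept_connection("sge-qfe2", known_hosts, aliases))  # True: resolves to sge
```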
Installing the Grid Engine Master Node With IPMP

There are two ways to fix this problem: ignore the error messages, or temporarily disable IPMP.
Ignoring the Error Messages

To ignore the error messages, follow these steps:
Temporarily Disabling IPMP

To temporarily disable IPMP, follow these steps:
Installing Grid Engine on an Execution Host With IPMP

Once the host_aliases file is installed and the Grid Engine daemons are restarted, you can start the execution host installation without further problems.

Enabling Administrative and Submit Hosts With IPMP

You have two choices when enabling these hosts with IPMP:








