|
Verifying Sun Grid Engine Installation
Verifying the Installation
The verification phase includes the following tasks:
- Ensuring that the master daemon is running on the master host
- Ensuring that the daemons are running on all execution hosts
- Ensuring that you can run simple commands
- Submitting test jobs
To ensure that the Grid Engine system daemons are running, look for the sge_qmaster daemon on the master host and the sge_execd daemon on the execution hosts. Once you have verified that the daemons are running, you can try to use commands and prepare to submit jobs.
Note: If no cell name was specified during installation, the value of $SGE_CELL is default.
How to Verify That the Daemon Is Running on the Master Host
- Log in to the master host.
Look in the file $SGE_ROOT/$SGE_CELL/common/act_qmaster to confirm that you are on the master host.
- Verify that the master daemon is running by looking for sge_qmaster in the output of the ps command.
- If sge_qmaster does not appear in the process list, restart the daemon.
To start the master host daemon, sge_qmaster:
# $SGE_ROOT/$SGE_CELL/common/sgemaster start
- Continue the verification process.
After you have verified that the master host and the execution host daemons are running, continue the verification process. See How to Run Simple Commands.
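The host check in the first step can be scripted. The following is a minimal sketch, assuming that act_qmaster holds the master host name on its first line and that the name is written in the same form (short or fully qualified) as the name you compare it with. The function name is_master_host is hypothetical and is not part of the Grid Engine software.

```shell
#!/bin/sh
# Hypothetical helper (not a Grid Engine command): succeed when the host
# name recorded in an act_qmaster file matches the given host name.
# The comparison ignores case. Returns 2 if the file cannot be read.
# Usage: is_master_host <act_qmaster-file> <host-name>
is_master_host() {
    [ -r "$1" ] || return 2
    # Assumption: the master host name is on the first line of the file.
    recorded=$(head -n 1 "$1" | tr 'A-Z' 'a-z' | tr -d ' \t\r')
    wanted=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
    [ -n "$recorded" ] && [ "$recorded" = "$wanted" ]
}

# Example call (commented out so that sourcing this file has no side effects):
# is_master_host "$SGE_ROOT/${SGE_CELL:-default}/common/act_qmaster" "$(hostname)" \
#     && echo "this is the master host"
```

If the file stores a fully qualified name while hostname prints a short name, the comparison fails even on the master host, so adjust the second argument to match your site's naming.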
How to Verify That the Daemons Are Running on the Execution Hosts
- Log in to the execution hosts on which you ran the execution host installation procedure.
- Verify that the daemons are running by looking for sge_execd in the output of the ps command.
- If sge_execd does not appear in the process list, restart the daemon.
# $SGE_ROOT/$SGE_CELL/common/sgeexecd start
- Continue the verification process.
After you have verified that the master host and the execution host daemons are running, continue the verification process. See How to Run Simple Commands below for details.
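Looking through a process list for a daemon name can also be automated. The following is a minimal sketch, assuming that your ps command accepts the POSIX option -o comm= to print one command name per line. The function name has_daemon is hypothetical. It reads the process list from standard input, so the same function works for sge_execd on an execution host and for sge_qmaster on the master host.

```shell
#!/bin/sh
# Hypothetical helper (not a Grid Engine command): read a process list,
# one command name per line, on standard input and succeed if the named
# daemon appears in it. A leading directory part is ignored, so
# /some/path/sge_execd matches sge_execd. The match is exact, not a substring.
# Usage: ps -e -o comm= | has_daemon sge_execd
has_daemon() {
    awk -v name="$1" '
        { cmd = $1; sub(/.*\//, "", cmd); if (cmd == name) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

On an execution host, a check such as `ps -e -o comm= | has_daemon sge_execd || echo "sge_execd is not running"` tells you whether the restart command is needed.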
How to Run Simple Commands
If the necessary daemons are running on both the master host and the execution hosts, the Grid Engine software should be operational. Check by issuing a trial command.
- Log in to either the master host or another administrative host.
In your standard search path, make sure to include $SGE_ROOT/bin.
- From the command line, type the following command:
% qconf -sconf
This qconf command displays the current global cluster configuration. See Basic Cluster Configuration.
If this command fails, your $SGE_ROOT environment variable might not be set correctly.
- Check whether the environment variables SGE_EXECD_PORT and SGE_QMASTER_PORT are set in the script files, $SGE_ROOT/$SGE_CELL/common/settings.csh or $SGE_ROOT/$SGE_CELL/common/settings.sh.
Note: If no cell name was specified during installation, the value of $SGE_CELL is default.
- If so, make sure that the environment variables SGE_EXECD_PORT and SGE_QMASTER_PORT are set to the correct values before you try the command again.
- If not, verify whether your NIS services map contains entries for sge_qmaster and sge_execd.
If the SGE_EXECD_PORT and SGE_QMASTER_PORT variables are not used in these files, then the services database (for example, /etc/services or the NIS services map) on the machine from which you run the command must provide entries for both sge_qmaster and sge_execd. If these entries do not exist, add them to the machine's services database, using the same port values that are configured on the master host.
- Retry the qconf command.
- Try to submit test jobs.
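The port checks in the steps above can be sketched as a small script. This is a minimal sketch under stated assumptions: the services database is a local file in the usual /etc/services format (name, then port/protocol), and the function names service_port and resolve_port are hypothetical. It does not query a NIS map.

```shell
#!/bin/sh
# Hypothetical helper: print the port registered for a service name in a
# services-format file (default /etc/services). Fails if there is no entry.
# Usage: service_port <service-name> [services-file]
service_port() {
    awk -v svc="$1" '
        { sub(/#.*/, "") }    # drop comments before matching
        $1 == svc { split($2, a, "/"); print a[1]; found = 1; exit }
        END { exit(found ? 0 : 1) }
    ' "${2:-/etc/services}"
}

# Hypothetical helper: use the environment value when it is non-empty,
# otherwise fall back to the services file.
# Usage: resolve_port "$SGE_QMASTER_PORT" sge_qmaster [services-file]
resolve_port() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        service_port "$2" "$3"
    fi
}
```

For example, `resolve_port "$SGE_EXECD_PORT" sge_execd` prints the port that commands on this machine would use, or fails if neither source defines one. On systems that provide getent, `getent services sge_qmaster` is an alternative lookup that also consults NIS when the name service switch is configured for it.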
How to Submit Test Jobs
Before you start submitting batch scripts to the Grid Engine system, check to see whether your site's standard shell resource files (.cshrc, .profile, or .kshrc) as well as your personal resource files contain commands such as stty. Batch jobs do not have a terminal connection by default, and therefore calls to stty result in an error.
- Log in to the master host.
- Type the following command.
% rsh <exec-host-name> date
The exec-host-name refers to one of the already installed execution hosts. You should try this test on all execution hosts if your login or home directories differ from host to host. The rsh command should give you output similar to the date command run locally on the master host. If any additional lines contain error messages, you must fix the cause of the errors before you can run a batch job successfully.
In any command interpreter, you can check for an actual terminal connection before you run a command such as stty.
The following is an example of a Bourne shell script to test the terminal connection.
# Run stty only if standard input is a terminal.
if tty -s; then
    stty erase ^H
fi
The following example shows C shell syntax.
tty -s
if ( $status == 0 ) then
    stty erase ^H
endif
- Submit one of the sample scripts contained in the $SGE_ROOT/examples/jobs directory.
% qsub $SGE_ROOT/examples/jobs/simple.sh
- Use the qstat command to monitor the job's behavior.
For more information about submitting and monitoring batch jobs, see Submitting Batch Jobs.
- After the job finishes executing, check your home directory for the redirected stdout and stderr files, <script-name>.o<job-id> and <script-name>.e<job-id>.
The job-id is a consecutive unique integer number assigned to each job.
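The output file names in the last step follow a fixed pattern, so they can be derived from the script name and the job ID. The following is a minimal sketch, assuming the default naming described above (script name, then .o or .e, then the job ID) and the home directory as the default location. The function name job_output_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not a Grid Engine command): print the expected
# stdout (.o) and stderr (.e) file paths for a finished job.
# Usage: job_output_files <script-path> <job-id> [directory]
job_output_files() {
    name=$(basename "$1")
    dir=${3:-$HOME}          # assumption: files land in the home directory
    printf '%s/%s.o%s\n' "$dir" "$name" "$2"
    printf '%s/%s.e%s\n' "$dir" "$name" "$2"
}
```

For example, `job_output_files $SGE_ROOT/examples/jobs/simple.sh 42` prints the two paths to look for once qstat no longer lists job 42. The job ID 42 is only an illustration; use the ID that qsub reported for your job.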
In case of problems, see Improving Grid Engine Performance.
|