Verifying Sun Grid Engine Installation

Verifying the Installation

The verification phase includes the following tasks:

  • Ensuring that the master daemon is running on the master host
  • Ensuring that the daemons are running on all execution hosts
  • Ensuring that you can run simple commands
  • Submitting test jobs

To ensure that the Grid Engine system daemons are running, look for the sge_qmaster daemon on the master host and the sge_execd daemon on the execution hosts. Once you have verified that the daemons are running, you can try to use commands and prepare to submit jobs.

Note
If no cell name was specified during installation, the value of $SGE_CELL is default.
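
The fallback described in the note can be expressed with shell parameter expansion. The following is a minimal sketch, not part of the Grid Engine distribution: `sge_common_dir` is a hypothetical helper name, and it assumes that SGE_ROOT is already set in your environment.

```shell
# Print the cell's common directory, using the cell name "default"
# when SGE_CELL is unset or empty.
# sge_common_dir is a hypothetical helper name used for illustration.
sge_common_dir() {
    if [ -z "${SGE_ROOT:-}" ]; then
        echo "SGE_ROOT is not set" >&2
        return 1
    fi
    printf '%s\n' "$SGE_ROOT/${SGE_CELL:-default}/common"
}
```

Calling `sge_common_dir` prints the directory that holds files such as act_qmaster and the settings scripts for the current cell.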

How to Verify That the Daemon Is Running on the Master Host

  1. Log in to the master host.
    Check the file $SGE_ROOT/$SGE_CELL/common/act_qmaster to confirm that you are logged in to the current master host.

  2. Verify that the master daemon is running.
    • On BSD-based UNIX systems, type the following command:
      % ps -ax | grep sge
      

      You should see output similar to the following example.

      14676 p1 S <  4:47 /gridware/sge/bin/solaris/sge_qmaster
      
    • On systems running a UNIX System V-based operating system (such as the Solaris Operating System), type the following command:
      % ps -ef | grep sge
      

      You should see output similar to the following example.

      root 439 1 0 Jun 2 ? 3:37 /gridware/sge/bin/solaris/sge_qmaster
      


  3. If you do not see the appropriate string, restart the daemon.
    To start the master host daemon, sge_qmaster, type the following command as root:
    # $SGE_ROOT/$SGE_CELL/common/sgemaster start
    


  4. Continue the verification process.
    After you have verified that the master host and the execution host daemons are running, continue the verification process. See How to Run Simple Commands.
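
The checks in steps 1 and 2 can be sketched as two small shell functions. This is an illustration under stated assumptions, not a supported tool: the names `is_master_host` and `daemon_listed` are hypothetical, and the sketch assumes that the first line of act_qmaster holds the master host name in the same form that you pass in.

```shell
# is_master_host ACT_QMASTER_FILE LOCAL_NAME
# Succeeds when the first line of the file equals LOCAL_NAME.
# Names are compared literally, so a short host name does not match
# a fully qualified one. Returns 2 if the file cannot be read.
is_master_host() {
    [ -r "$1" ] || return 2
    master=$(sed -n 1p "$1")
    [ "$master" = "$2" ]
}

# daemon_listed NAME
# Reads ps output on standard input and succeeds if a process whose
# command contains NAME is listed, ignoring any grep process.
daemon_listed() {
    grep -v grep | grep "$1" >/dev/null
}

# Example use on the master host (not executed here):
#   is_master_host "$SGE_ROOT/${SGE_CELL:-default}/common/act_qmaster" "$(hostname)"
#   ps -ef | daemon_listed sge_qmaster && echo "sge_qmaster is running"
```

Reading the ps output from standard input lets the same function work with either `ps -ax` or `ps -ef`.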

How to Verify That the Daemons Are Running on the Execution Hosts

  1. Log in to the execution hosts on which you ran the execution host installation procedure.

  2. Verify that the daemons are running.
    • On BSD-based UNIX systems, type the following command:
      % ps -ax | grep sge
      

      You should see output similar to the following example.

      14688 p1 S <    4:27  /gridware/sge/bin/solaris/sge_execd
      
    • On systems running a UNIX System V-based operating system (such as the Solaris Operating System), type the following command:
      % ps -ef | grep sge
      

      You should see output similar to the following example.

      root 171 1 0 Jun 22 ? 7:11 /gridware/sge/bin/solaris/sge_execd
      


  3. If you do not see similar output, restart the daemon.
    # $SGE_ROOT/$SGE_CELL/common/sgeexecd start
    


  4. Continue the verification process.
    After you have verified that the master host and the execution host daemons are running, continue the verification process. See How to Run Simple Commands below for details.
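
Steps 2 and 3 can be combined into one check that starts the daemon only when it is missing. The following is a rough sketch: `start_if_missing` is a hypothetical helper name, and it assumes that the startup script accepts the `start` argument as shown in step 3.

```shell
# start_if_missing NAME START_SCRIPT
# Reads ps output on standard input. If no process whose command
# contains NAME is listed, runs "START_SCRIPT start".
# start_if_missing is a hypothetical helper name.
start_if_missing() {
    if grep -v grep | grep "$1" >/dev/null; then
        echo "$1 is running"
    else
        echo "$1 not found; running $2 start"
        "$2" start
    fi
}

# Example use as root on an execution host (not executed here):
#   ps -ef | start_if_missing sge_execd "$SGE_ROOT/${SGE_CELL:-default}/common/sgeexecd"
```

You might run such a check on each execution host in turn, after logging in as described in step 1.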

How to Run Simple Commands

If the necessary daemons are running on both the master host and the execution hosts, the Grid Engine software should be operational. Check by issuing a trial command.

  1. Log in to either the master host or another administrative host.
    Make sure that your standard search path includes $SGE_ROOT/bin.

  2. From the command line, type the following command:
    % qconf -sconf
    

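    This qconf command displays the current global cluster configuration. For details, see Basic Cluster Configuration.
    If this command fails, the most likely cause is that your $SGE_ROOT environment variable or the communication port settings are not set correctly. Check the settings as follows.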

    1. Check whether the environment variables SGE_EXECD_PORT and SGE_QMASTER_PORT are set in the script files, $SGE_ROOT/$SGE_CELL/common/settings.csh or $SGE_ROOT/$SGE_CELL/common/settings.sh.
      • If so, make sure that the environment variables SGE_EXECD_PORT and SGE_QMASTER_PORT are set to the correct values before you try the command again.
      • If not, verify whether your NIS services map contains entries for sge_qmaster and sge_execd.
        If the SGE_EXECD_PORT and SGE_QMASTER_PORT variables are not used in these files, then the services database (for example, /etc/services or the NIS services map) on the machine from which you run the command must provide entries for both sge_qmaster and sge_execd. If these entries do not exist, add them to the machine's services database, giving each entry the same value as is configured on the master host.
    2. Retry the qconf command.

  3. Try to submit test jobs.
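
The port check in step 2 can be sketched as a shell function that reports where a daemon's port setting comes from. This is an illustration only: `port_source` is a hypothetical helper name, it looks at the value that you pass in (for example, after sourcing settings.sh) and at a flat services file, and it does not query NIS.

```shell
# port_source SERVICE ENV_VALUE [SERVICES_FILE]
# Reports the port for SERVICE: ENV_VALUE if it is non-empty, otherwise
# the matching entry in SERVICES_FILE (default /etc/services).
# NIS maps are not consulted. port_source is a hypothetical helper name.
port_source() {
    service=$1
    env_value=$2
    file=${3:-/etc/services}
    if [ -n "$env_value" ]; then
        echo "$service: port $env_value from environment"
        return 0
    fi
    # Match the service name in the first column; comment lines never match.
    entry=$(awk '$1 == s { print $2; exit }' s="$service" "$file" 2>/dev/null)
    if [ -n "$entry" ]; then
        echo "$service: $entry from $file"
        return 0
    fi
    echo "$service: no port configured" >&2
    return 1
}

# Example use (not executed here):
#   port_source sge_qmaster "${SGE_QMASTER_PORT:-}"
#   port_source sge_execd "${SGE_EXECD_PORT:-}"
```

If both calls fail, neither the environment nor the local services file provides a port, which matches the situation in which you need to add the entries.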

How to Submit Test Jobs

Before you start submitting batch scripts to the Grid Engine system, check to see whether your site's standard shell resource files (.cshrc, .profile, or .kshrc) as well as your personal resource files contain commands such as stty. Batch jobs do not have a terminal connection by default, and therefore calls to stty result in an error.
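
One way to find such commands is to search the resource files for stty. The following sketch assumes a hypothetical helper named `find_stty_calls`; it lists every non-comment line that mentions stty so that you can review it, including lines that are already guarded by a terminal test.

```shell
# find_stty_calls FILE...
# Prints "file:line:text" for each non-comment line that mentions stty.
# Unreadable or missing files are skipped. Succeeds only if at least
# one such line is found. find_stty_calls is a hypothetical helper name.
find_stty_calls() {
    found=1
    for f in "$@"; do
        [ -r "$f" ] || continue
        # /dev/null forces grep to prefix each match with the file name.
        if grep -n 'stty' "$f" /dev/null | grep -v ':[0-9]*:[[:space:]]*#'; then
            found=0
        fi
    done
    return $found
}

# Example use (not executed here):
#   find_stty_calls "$HOME/.cshrc" "$HOME/.profile" "$HOME/.kshrc"
```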

  1. Log in to the master host.

  2. Type the following command.
    % rsh <exec-host-name> date
    

    The exec-host-name refers to one of the already installed execution hosts. You should try this test on all execution hosts if your login or home directories differ from host to host. The rsh command should give you output similar to the date command run locally on the master host. If any additional lines contain error messages, you must fix the cause of the errors before you can run a batch job successfully.

    For all command interpreters, you can check for an actual terminal connection before you run a command such as stty.
    The following is an example of a Bourne shell script to test the terminal connection.

    # Run stty only when standard input is a terminal.
    if tty -s; then
       stty erase ^H
    fi
    


    The following example shows C shell syntax.

    # Run stty only when standard input is a terminal.
    tty -s
    if ( $status == 0 ) then
       stty erase ^H
    endif
    


  3. Submit one of the sample scripts contained in the $SGE_ROOT/examples/jobs directory.
    % qsub $SGE_ROOT/examples/jobs/simple.sh
    


  4. Use the qstat command to monitor the job's behavior.
    For more information about submitting and monitoring batch jobs, see Submitting Batch Jobs.

  5. After the job finishes executing, check your home directory for the redirected stdout and stderr files, <script-name>.o<job-id> and <script-name>.e<job-id>.
    The job-id is a consecutive unique integer number assigned to each job.
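
The file names from step 5 can be derived from the script name and the job ID. The following is a small sketch with a hypothetical helper named `job_output_files`; it assumes the default naming and the home directory location described above, and you supply the job ID yourself.

```shell
# job_output_files SCRIPT JOB_ID [DIR]
# Prints the expected stdout and stderr file paths for a job:
# the script name followed by .o or .e and the job ID,
# located in DIR (default: $HOME).
# job_output_files is a hypothetical helper name.
job_output_files() {
    name=$(basename "$1")
    dir=${3:-$HOME}
    printf '%s\n' "$dir/$name.o$2" "$dir/$name.e$2"
}

# Example use, with 42 standing in for the job ID (not executed here):
#   job_output_files "$SGE_ROOT/examples/jobs/simple.sh" 42
```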

In case of problems, see Improving Grid Engine Performance.



Copyright 1994-2009 Sun Microsystems, Inc.