Changing the Master Host

Sun Grid Engine 6.2: Administering Sun Grid Engine

This section explains how to change the system that Grid Engine considers to be the master host by moving the sge_qmaster daemon.

How to Migrate qmaster to Another Host by Using a Script

Note
Because the Berkeley DB spooling database cannot be located on an NFS-mounted file system, the following procedure requires that you use the Berkeley DB RPC server for spooling. If you instead configure spooling to a local file system, you must transfer the spooling database to a local file system on the new sge_qmaster host.
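Before you migrate, you can check two things: which spooling method the cluster uses, and whether the new master host can create files in the directories that step 1 names. The following is a minimal sketch, not part of Grid Engine. The function names are hypothetical. It assumes that the cell's bootstrap file ($SGE_ROOT/$SGE_CELL/common/bootstrap) contains whitespace-separated spooling_method and qmaster_spool_dir entries. Run it on the new master host as the administrative user, which is root if admin_user is root.

```shell
#!/bin/sh
# Hypothetical pre-migration check, run on the NEW master host.
# Assumption: the bootstrap file lists "spooling_method" and
# "qmaster_spool_dir" as "key value" lines.

# Print the value for a key from a "key value" style file.
bootstrap_value() {
    # $1 = bootstrap file, $2 = key
    awk -v k="$2" '$1 == k { print $2; exit }' "$1"
}

# Return 0 only if a file can be created and removed in every directory given.
check_writable() {
    status=0
    for dir in "$@"; do
        probe="$dir/.migrate_probe.$$"
        if [ -d "$dir" ] && touch "$probe" 2>/dev/null && rm -f "$probe"; then
            echo "OK: $dir is writable"
        else
            echo "FAIL: cannot create files in $dir" >&2
            status=1
        fi
    done
    return $status
}

# Example use:
# bootstrap="$SGE_ROOT/$SGE_CELL/common/bootstrap"
# echo "spooling_method: $(bootstrap_value "$bootstrap" spooling_method)"
# check_writable "$(bootstrap_value "$bootstrap" qmaster_spool_dir)" \
#                "$SGE_ROOT/$SGE_CELL/common"
```

The probe actually creates a file instead of only testing permission bits. When the check runs as root, this means an NFS export that denies root write access is reported as a failure.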
  1. Check that the new master host has read/write access.
    The new master host must have the same read/write access to the qmaster spool directory and the common directory as the current master host. If the administrative user is root (check the admin_user setting in the global cluster configuration), verify that the root user can create files in these directories under the root user name.

  2. Run the migration script on the new master host.
    On the new master host, type the following command as the root user:
    # $SGE_ROOT/$SGE_CELL/common/sgemaster -migrate
    

    This command stops sge_qmaster on the old master host and starts it on the new master host. The master host name in the file $SGE_ROOT/$SGE_CELL/common/act_qmaster is automatically changed to the new master host. If sge_qmaster is not running on the old master host, warning messages appear, and sge_qmaster starts on the new host after a delay of about one minute.

  3. Modify the shadow_masters file if necessary.
    1. Check whether the $SGE_ROOT/$SGE_CELL/common/shadow_masters file exists.
      If the file exists, you can add the new qmaster host to this file and remove the old master host, depending on your requirements.
    2. Then stop and restart the sge_shadowd daemons by issuing the following commands on each shadow master host:
      $SGE_ROOT/$SGE_CELL/common/sgemaster -shadowd stop
      $SGE_ROOT/$SGE_CELL/common/sgemaster -shadowd start
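After steps 2 and 3, you can script two follow-up tasks: confirming that act_qmaster names the new host, and swapping the old master for the new one in shadow_masters. The following is a minimal sketch with hypothetical function names. It assumes that act_qmaster holds the master host name on its first line and that shadow_masters lists one host name per line. The helper replaces the old master's entry in place, so it does not decide host ordering for you.

```shell
#!/bin/sh
# Hypothetical helpers for use after "sgemaster -migrate".
# Assumptions: act_qmaster has the host name on line 1;
# shadow_masters has one host name per line.

# Return 0 if the first line of the act_qmaster file equals the expected host.
act_qmaster_is() {
    # $1 = act_qmaster file, $2 = expected host name
    [ "$(head -n 1 "$1")" = "$2" ]
}

# Replace the old master with the new master in a shadow_masters file.
# A backup is written to FILE.bak; nothing changes if OLD is not listed.
update_shadow_masters() {
    # $1 = shadow_masters file, $2 = old master, $3 = new master
    file=$1 old=$2 new=$3
    if ! awk -v o="$old" '$1 == o { found = 1 } END { exit !found }' "$file"; then
        echo "$old is not listed in $file; nothing changed" >&2
        return 1
    fi
    cp -p "$file" "$file.bak" || return 1
    tmp="$file.tmp.$$"
    awk -v o="$old" -v n="$new" '
        $1 == o { print n; next }   # line of the old master becomes the new master
        $1 == n { next }            # drop any other entry for the new master
        { print }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example use (newhost.example.com is a placeholder):
# act_qmaster_is "$SGE_ROOT/$SGE_CELL/common/act_qmaster" newhost.example.com \
#     || echo "act_qmaster does not name the new host" >&2
# update_shadow_masters "$SGE_ROOT/$SGE_CELL/common/shadow_masters" \
#     oldhost.example.com newhost.example.com
```

After you edit shadow_masters, you still need to stop and restart sge_shadowd on each shadow master host, as step 3 describes.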
      

Important Notes About Migration

The migration procedure moves sge_qmaster to the host on which the sgemaster -migrate command is issued. If the file $SGE_ROOT/$SGE_CELL/common/primary_qmaster exists, any later invocation of sgemaster on the host named in that file causes a migration back to that host. To avoid this, change or delete the $SGE_ROOT/$SGE_CELL/common/primary_qmaster file.
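Before running sgemaster on a host, you can check whether that host is the one named in primary_qmaster. The following is a minimal sketch with a hypothetical function name. It assumes that the file holds that host name on its first line.

```shell
#!/bin/sh
# Hypothetical guard against an unintended migration back.
# Assumption: primary_qmaster holds a single host name on its first line.

# Return 0 if the primary_qmaster file exists and names the given host,
# meaning that running sgemaster on that host would migrate qmaster there.
is_primary_qmaster_host() {
    # $1 = primary_qmaster file, $2 = this host's name
    [ -f "$1" ] && [ "$(head -n 1 "$1")" = "$2" ]
}

# Example use. Note that "hostname" may not match the name Grid Engine
# resolves; compare with the gethostname utility output if unsure.
# if is_primary_qmaster_host "$SGE_ROOT/$SGE_CELL/common/primary_qmaster" "$(hostname)"; then
#     echo "warning: sgemaster here would migrate qmaster back to this host" >&2
# fi
```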

Note
Existence of the primary_qmaster file does not imply that the qmaster is actually running.

Although jobs can continue to run during the migration procedure, the cluster should otherwise be inactive. While the migration is in progress, Grid Engine commands such as qsub or qstat return an error.

If the current qmaster is down, the scheduler will not shut down until it times out waiting for contact with the qmaster.

The shadow_masters file has no direct effect on the migration procedure. This file only exists if one or more shadow masters have been configured. For more information on how to set up shadow masters, see Configuring Shadow Master Hosts.

How to Migrate qmaster to Another Host Manually

  1. On the current master host, stop the master daemon.
    Type the following command:
    qconf -km
    


  2. Edit the $SGE_ROOT/$SGE_CELL/common/act_qmaster file according to the following guidelines:
    • Confirm the new master host's name. To get the new master host name, type the following command on the new master host:
      $SGE_ROOT/utilbin/$SGE_ARCH/gethostname
      
    • In the act_qmaster file, replace the current host name with the new master host's name returned by the gethostname utility.

  3. On the new master host, start sge_qmaster:
    $SGE_ROOT/$SGE_CELL/common/sgemaster
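You can perform step 2 of the manual procedure with a small helper that keeps a backup of the old host name so that you can roll back. The following is a minimal sketch with a hypothetical function name. It assumes that act_qmaster contains only the master host name.

```shell
#!/bin/sh
# Hypothetical helper for step 2 of the manual procedure.
# Assumption: act_qmaster contains only the master host name.
set_act_qmaster() {
    # $1 = act_qmaster file, $2 = new master host name
    file=$1 new=$2
    [ -n "$new" ] || { echo "empty host name" >&2; return 1; }
    cp -p "$file" "$file.bak" || return 1   # keep the old name for rollback
    tmp="$file.tmp.$$"
    cp -p "$file" "$tmp" || return 1        # temp copy inherits mode and owner
    printf '%s\n' "$new" > "$tmp" && mv -f "$tmp" "$file"
}

# Example sequence:
#   on the current master host:  qconf -km
#   on the new master host:
#     set_act_qmaster "$SGE_ROOT/$SGE_CELL/common/act_qmaster" "<name from gethostname>"
#     $SGE_ROOT/$SGE_CELL/common/sgemaster
```

The helper writes to a temporary file and renames it into place. This avoids leaving a half-written act_qmaster if the write is interrupted.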
    

Copyright 1994-2009 Sun Microsystems, Inc.