Changing the Master Host
This section explains how to change the system that Grid Engine considers to be the master host by moving the sge_qmaster daemon.
How to Migrate qmaster to Another Host by Using a Script
Note: Because the spooling database cannot be located on an NFS-mounted file system, the following procedure requires that you use the Berkeley DB RPC server for spooling. If you configure spooling to a local file system, you must transfer the spooling database to a local file system on the new sge_qmaster host.
- Check that the new master host has read/write access.
The new master host must have the same read/write access to the qmaster spool directory and the common directory as the current master host. If the administrative user is the root user (check the admin_user setting in the global cluster configuration), verify that the root user can create files in these directories under the root user name.
- Run the migration script on the new master host.
On the new master host, type the following command as the root user:
# $SGE_ROOT/$SGE_CELL/common/sgemaster -migrate
This command stops sge_qmaster on the old master host and starts it on the new master host. The master host name listed in the file $SGE_ROOT/$SGE_CELL/common/act_qmaster is automatically changed to the new master host. If qmaster is not running when you issue the command, warning messages appear and qmaster starts on the new host after a delay of about one minute.
- Modify the shadow_masters file if necessary.
- Check if the $SGE_ROOT/$SGE_CELL/common/shadow_masters file exists.
If the file exists, you can add the new qmaster host to this file and remove the old master host, depending on your requirements.
- Then stop and restart the sge_shadowd daemons by issuing the following commands on the respective machines:
$SGE_ROOT/$SGE_CELL/common/sgemaster -shadowd stop
$SGE_ROOT/$SGE_CELL/common/sgemaster -shadowd start
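The access check in the first step can be sketched as a small shell helper that tries to create and remove a probe file in each directory. This is only an illustration: the function names are hypothetical, and the spool path in the example call is an assumption (the qmaster spool directory is site-specific). Run it as the administrative user on the new master host before you migrate.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Grid Engine.

# can_create_file DIR
# Returns 0 if the current user can create (and remove) a file in DIR.
can_create_file() {
    dir=$1
    # mktemp fails if DIR is missing or not writable for this user.
    probe=$(mktemp "$dir/.sge_write_probe.XXXXXX" 2>/dev/null) || return 1
    rm -f "$probe"
    return 0
}

# check_master_dirs DIR...
# Prints one line per directory; returns non-zero if any check fails.
check_master_dirs() {
    rc=0
    for d in "$@"; do
        if can_create_file "$d"; then
            echo "ok: $d"
        else
            echo "NOT writable: $d"
            rc=1
        fi
    done
    return $rc
}

# Example call (the spool path is an assumption; use your site's setting):
# check_master_dirs "$SGE_ROOT/$SGE_CELL/common" "$SGE_ROOT/$SGE_CELL/spool/qmaster"
```

If any directory is reported as not writable, resolve the file system permissions (for example, root access over NFS) before running sgemaster -migrate.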
Important Notes About Migration
The migration procedure moves qmaster to the host on which the sgemaster -migrate command is issued. If the file primary_qmaster exists, any subsequent call of sgemaster on the machine named in the primary_qmaster file causes a migration back to that machine. To avoid this situation, change or delete the $SGE_ROOT/$SGE_CELL/common/primary_qmaster file.
Note: Existence of the primary_qmaster file does not imply that the qmaster is actually running.
Although jobs may continue to run during the migration procedure, the grid should be inactive. While the migration is taking place, any running Grid Engine commands, such as qsub or qstat, will return an error.
If the current qmaster is down, the scheduler will not shut down until it times out waiting for contact with the qmaster.
The shadow_masters file has no direct effect on the migration procedure. This file only exists if one or more shadow masters have been configured. For more information on how to set up shadow masters, see Configuring Shadow Master Hosts.
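Before or after a migration it can help to see which host the common directory currently names as master, and whether a primary_qmaster file could trigger a migration back. The following is a minimal sketch; the function name is hypothetical, and it assumes that the first line of each file holds the host name.

```shell
#!/bin/sh
# Sketch only: report_master_files is a hypothetical helper.
# Assumes the first line of each file holds a host name.

# report_master_files COMMON_DIR
report_master_files() {
    common=$1
    if [ -r "$common/act_qmaster" ]; then
        echo "act_qmaster: $(head -n 1 "$common/act_qmaster")"
    else
        echo "act_qmaster: missing"
    fi
    # A primary_qmaster file means that running sgemaster on the host it
    # names migrates qmaster back to that host.
    if [ -f "$common/primary_qmaster" ]; then
        echo "primary_qmaster: $(head -n 1 "$common/primary_qmaster")"
    else
        echo "primary_qmaster: absent"
    fi
}

# Example call:
# report_master_files "$SGE_ROOT/$SGE_CELL/common"
```

The output only reflects file contents. As noted above, it says nothing about whether qmaster is actually running on either host.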
How to Migrate qmaster to Another Host Manually
- On the current master host, stop the master daemon.
Type the following command:
- Edit the $SGE_ROOT/$SGE_CELL/common/act_qmaster file according to the following guidelines:
- Confirm the new master host's name. To get the new master host name, type the following command on the new master host:
$SGE_ROOT/utilbin/$SGE_ARCH/gethostname
- In the act_qmaster file, replace the current host name with the new master host's name returned by the gethostname utility.
- On the new master host, start sge_qmaster:
$SGE_ROOT/$SGE_CELL/common/sgemaster
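The act_qmaster edit in the manual procedure can be sketched as a shell helper that keeps a backup and replaces the file in one step. The function name and the example host name are hypothetical, and the sketch assumes that act_qmaster holds a single line with the master host name. It does not stop or start any daemon. The stop command is not shown in the first step above; Grid Engine's qconf -km shuts down the master, but whether that is the command intended here is an assumption.

```shell
#!/bin/sh
# Sketch only: set_act_qmaster is a hypothetical helper.
# Assumes act_qmaster contains one line: the master host name.

# set_act_qmaster COMMON_DIR NEW_HOST
set_act_qmaster() {
    common=$1
    new_host=$2
    file="$common/act_qmaster"
    if [ -z "$new_host" ]; then
        echo "set_act_qmaster: empty host name" >&2
        return 1
    fi
    # Keep the previous value so the change can be reverted.
    if [ -f "$file" ]; then
        cp -p "$file" "$file.bak" || return 1
    fi
    # Write to a temporary file first, then rename it over the original.
    printf '%s\n' "$new_host" > "$file.tmp" || return 1
    mv "$file.tmp" "$file"
}

# Example call, run after the old master has been stopped
# (newmaster.example.com stands in for the name reported by gethostname):
# set_act_qmaster "$SGE_ROOT/$SGE_CELL/common" newmaster.example.com
```

Run the helper as the administrative user so that the file keeps the expected ownership, then start sge_qmaster on the new host as described in the last step.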