Tivoli Storage Manager and Sun Cluster or How to Create a Resource That Is a Script in Sun Cluster

To get a script into the cluster framework, in our case one that starts and stops the IBM Tivoli Storage Manager (TSM) Distributed Storage Manager (DSM) scheduler, several steps were needed.

The most critical step for me was to stop following the TSM manual where it says that all configuration files and all scripts for starting and stopping the TSM scheduler must be on shared storage. This simply doesn't work.

The dsm.opt file for each TSM node (note that a TSM node is not the same thing as a cluster node!) can and generally should be on shared storage, mainly for consistency. The scripts for starting, stopping and probing the TSM services, however, need to be local and present on every cluster node at all times. The cluster framework needs the scripts to be available in order to add the resource to the cluster. If the script wasn't available on all nodes when I tried to create the resource, the cluster spat the dummy.
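
Since the resource can only be created once the script is present on every node, a quick check on each node beforehand may save a failed attempt. The following is a minimal sketch, not something from the original setup: check_local_script is a hypothetical helper that reports whether a script exists and is executable, and prints a checksum that can be compared by eye between nodes.

```shell
# Hypothetical pre-flight check; run it on each cluster node before
# creating the resource. Not part of Sun Cluster or TSM.
check_local_script() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "MISSING: $script"
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "NOT EXECUTABLE: $script"
        return 1
    fi
    # Print the checksum and size so the output can be compared between nodes.
    sum=$(cksum < "$script")
    echo "OK: $sum"
    return 0
}
```

On each node this would be called as check_local_script /etc/init.d/dsm.scheduler.cluster.sh, and the checksum lines should match everywhere if the copies are identical.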

After setting up the scripts and manually testing the TSM client to make sure the configuration was correct on all nodes, it was possible to add a new resource of type SUNW.gds, the Generic Data Service (GDS), to the cluster. The following command adds the scripts as a GDS resource:

clresource create -g www-rg -t SUNW.gds \
-p Start_command="/etc/init.d/dsm.scheduler.cluster.sh /zones/webdata/tsm/dsm.opt start" \
-p Probe_command="/etc/init.d/dsm.scheduler.cluster.sh webdata probe" \
-p Stop_command="/etc/init.d/dsm.scheduler.cluster.sh webdata stop" \
-p Network_aware=false webdata-backup-rs

So in this example, the script /etc/init.d/dsm.scheduler.cluster.sh is located on local storage on all nodes and is identical across all nodes. The file /zones/webdata/tsm/dsm.opt is located on shared storage and switches between nodes in the event of a failover. When the resource group starts on a different node, the script is run and the resource comes online. Curiously, the dsmcad daemon process doesn't need to be killed by the script in the event of a failover. The cluster framework seems to take care of this, killing the process and allowing a clean failover. Also, making the resource not network aware removed the need for a logical host name in the resource group.
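
One way to confirm the failover behaviour described above is to check the resource state, switch the resource group to another node, and check again. This is only a sketch: test_failover is a hypothetical wrapper, and the node name in the example call is a placeholder. It assumes the standard Sun Cluster commands clresource status and clresourcegroup switch are on the PATH of a cluster node.

```shell
# Illustrative helper, not part of Sun Cluster: show the resource state,
# move the resource group to another node, then show the state again.
# $1 = resource, $2 = resource group, $3 = target node (supplied by the caller).
test_failover() {
    clresource status "$1" || return 1
    clresourcegroup switch -n "$3" "$2" || return 1
    clresource status "$1"
}

# Example call on a cluster node ("node2" is a placeholder name):
# test_failover webdata-backup-rs www-rg node2
```

If the second status report shows the resource online on the target node, the start script ran there as expected.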

The script to start, stop, and probe the DSM client is shown below. It could definitely be done better, but it works. It may also be possible to start and stop the scheduler process, dsmc, directly from the script. I haven't tried this, but I'm sure it would work. Note that I include this script for informational purposes only; I don't promise that it will work for you.

#!/bin/ksh

# Generally, we should start up with something like this:
# /opt/tivoli/tsm/client/ba/bin/dsmcad -optfile=/zstorage/build-test/tsm/dsm.opt

# set the necessary environment variables so that TSM doesn't vomit
LC_CTYPE="en_US"
export LC_CTYPE
LANG="en_US"
export LANG
LC_LANG="en_US"
export LC_LANG
LC_ALL="en_US"
export LC_ALL

# work out which argument is the command and which the config file
case "$1" in
    'start'|'stop'|'probe')
        COMMAND=$1
        DSM_CONFIG=$2
    ;;
    *)
        COMMAND=$2
        DSM_CONFIG=$1
    ;;
esac

# now check what we want to do.
case "$COMMAND" in
   'start')
        # echo "starting" 
        # Refuse to start if the config file is missing. The variable is quoted
        # so that an empty value fails the test instead of slipping through.
        if test ! -f "$DSM_CONFIG" ; then
                echo "Config file $DSM_CONFIG does not exist, exiting." 
                exit 1
        fi
        export DSM_CONFIG
        # Check if there is already a dsmcad process running, if so, ignore the start command
        PS=`ps -ef | grep -v grep | grep -v vi | grep -v probe | grep -v zoneadmd | grep -v "dsm.scheduler.cluster.sh" | grep -c "$DSM_CONFIG"` 
        if test "$PS" -eq "1" ; then
                echo "dsmcad is already started for $DSM_CONFIG, will not start another." 
                ps -ef | grep -v grep | grep -v vi | grep -v probe | grep -v zoneadmd | grep -v "dsm.scheduler.cluster.sh" | grep "$DSM_CONFIG" 
                exit 0
        elif test "$PS" -gt "1" ; then
                echo "Seems to be too many processes running for dsmcad for $DSM_CONFIG, please check it."
                exit 1
        fi

        /opt/tivoli/tsm/client/ba/bin/dsmcad -optfile=$DSM_CONFIG
        if test "$?" -ne "0" ; then
                echo "Failed to start the dsm scheduler, exiting" 
                exit 1
        fi
    ;; 

    'stop')
        # echo "stopping" 
        # For the most part, we ignore a stop command, as dsmcad should work out for itself
        # that it has to stop its child process when the directory with its password
        # isn't available.
        exit 0
    ;;

    'probe')
        # echo "probing" 
        # WARNING: The following would produce a bug if "vi" is in the arguments...
        # So make sure you avoid it, OK?
        PS=`ps -ef | grep -v grep | grep -v vi | grep -v probe | grep -v zoneadmd | grep -c "$DSM_CONFIG"` 
        if test "$PS" -gt "0" ; then
                # echo "Found $PS processes" 
                exit 0
        else
                echo "Found no processes" 
                exit 1
        fi
    ;;

    *)
        # otherwise an invalid command was received, vomit.
        echo "options { start | stop | probe }" 
        exit 1

esac
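
The ps and grep chain used for the start and probe checks has to exclude every unrelated process by name, and the script itself warns about what happens when "vi" appears in the arguments. One possible alternative, sketched here and not tested on a real cluster, is to count only the lines that contain both "dsmcad" and the config string. count_dsmcad is a hypothetical helper, and the sketch assumes an awk that accepts -v; on Solaris that may mean nawk or /usr/xpg4/bin/awk in place of plain awk.

```shell
# Hypothetical replacement for the "grep -v ..." chain.
# Reads "ps -ef" style output on stdin and prints how many lines mention
# both dsmcad and the config string given as $1.
count_dsmcad() {
    awk -v cfg="$1" '
        index($0, "dsmcad") && index($0, cfg) { n++ }
        END { print n + 0 }
    '
}

# Possible use inside the script:
# PS=$(ps -ef | count_dsmcad "$DSM_CONFIG")
```

Because the wrapper script's name does not contain "dsmcad", it no longer needs to be filtered out explicitly, and neither do editors or grep itself.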

  1. May 08, 2009

    barthw says:

    Hello.

    This works well. However, one thing I have noticed is that, for some reason, dsmcad wants to write its logfile to / ... This causes an error and prevents dsmcad from starting, causing the RG to fail and eventually go offline. To get around this, I snuck a "cd /var/tmp" onto the line before the "/opt/tivoli/tsm/client/ba/bin/dsmcad -optfile=$DSM_CONFIG" in the start directive.

    Have you seen this behaviour at all?

    I had attempted to adjust the logfile locations in the dsm.opt/dsm.sys file, but that didn't seem to help either ...

    Thanks,

    -b.

The individuals who post here are part of the extended Sun Microsystems community and they might not be employed or in any way formally affiliated with Sun Microsystems. The opinions expressed here are their own, are not necessarily reviewed in advance by anyone but the individual authors, and neither Sun nor any other party necessarily agrees with them.

Copyright 1994-2009 Sun Microsystems, Inc.