HOWTO Recover an SGD Deployment with an Inconsistent Array Topology

Introduction

Occasionally, a Sun™ Secure Global Desktop (SGD) array may fail to synchronize resources across its nodes. If left unchecked, this condition can result in an inconsistent array topology.

You can identify this state when individual nodes within an array report different results for the 'tarantella status' command, run at the system console of each node.
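One way to spot the condition is to save the output of 'tarantella status' from each node and compare the member lists. The following is a minimal sketch rather than an SGD-supplied tool: the function names are hypothetical, and it assumes that member lines begin with "- " and end the host and role with a colon, as in the sample output shown later in this document.

```shell
# Hypothetical helper: print the sorted "host (role)" entries found in a
# saved copy of 'tarantella status' output.
# Assumption: member lines look like "- host (role): state".
list_members() {
    sed -n 's/^- \([^:]*\):.*/\1/p' "$1" | sort
}

# Succeeds (exit 0) only if both saved outputs list the same members
# with the same roles.
same_topology() {
    [ "$(list_members "$1")" = "$(list_members "$2")" ]
}
```

For example, after saving each node's output to alpha.status and omega.status, 'same_topology alpha.status omega.status' returns a non-zero exit status when the two nodes disagree.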

Understanding this behavior

This condition is most commonly attributed to partial or failed execution of array maintenance commands, but it can also be triggered in an existing array by extended periods of network downtime.

Unfortunately, attempts to recover from an inconsistent array topology using the 'tarantella array' command frequently fail. This is typically because a secondary node is not authorized to communicate with the rest of the array, or because the nodes fundamentally disagree on which node is the true primary.

In order to recover from this state, it is necessary to manually separate the troubled nodes from the array, thereby forcing them into a stand-alone condition. Once this process is completed, it may be possible to rejoin the array using the 'tarantella array' command.

Note: It is important that the administrator work to understand the cause of the initial failure. If two nodes are still unable to communicate with one another in both directions, the reader is likely to find themselves working through these steps a second time. This matter is discussed in some detail below.

The example below will outline the steps to recover a demonstration array, consisting of the following two nodes:

  • alpha.demo.sun.com - array primary
  • omega.demo.sun.com - array secondary

Manually detaching a node from an array

Step 1: Force the secondary node into a stand-alone state

To implement this change, we must modify the node's underlying properties file and configure it to be self-referencing. In this configuration, the SGD installation will consider itself a primary node. The corresponding lack of additional secondary properties indicates that it is in an array of one, that is, running as a stand-alone server.

Note: It is always considered good practice to create a back-up of any file you intend to modify before committing your changes to disk. Doing so will allow you to quickly retrace your steps if necessary.

1. Stop the Secure Global Desktop application on all nodes of the array before making any changes.

 
# tarantella stop

2. The properties files live within the <install_dir>/var/serverconfig/global/ directory. The file we will modify is named '[array_node].properties', where [array_node] is a placeholder for the fully qualified domain name (FQDN) of the secondary node you are trying to recover.

To illustrate this concept within our example environment, we would look for a file corresponding to the name of our secondary:

/opt/tarantella/var/serverconfig/global/omega.demo.sun.com.properties

After making a copy of the original, we would open this file using a text editor and modify the "master" property, as demonstrated below.

Look for the entry:

tarantella.config.host.master=alpha.demo.sun.com

...and update with the FQDN of the secondary itself:

tarantella.config.host.master=omega.demo.sun.com
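The backup and the edit can also be scripted. The sketch below is one possible approach, not part of SGD: the function name and the separate backup directory are assumptions, and it reads from the backup copy instead of relying on in-place editing, which not every sed supports.

```shell
# Hypothetical helper: back up a properties file, then point its
# tarantella.config.host.master entry at the given FQDN.
# Usage: set_master <properties_file> <fqdn> <backup_dir>
set_master() (
    file=$1
    fqdn=$2
    backup=$3/$(basename "$file").orig
    # Keep the original (with its timestamps and permissions) outside the
    # directory being edited.
    cp -p "$file" "$backup" || exit 1
    # Read from the backup and overwrite the original in place.
    sed "s/^tarantella\.config\.host\.master=.*/tarantella.config.host.master=$fqdn/" \
        "$backup" > "$file"
)
```

In the example environment this would be called as 'set_master /opt/tarantella/var/serverconfig/global/omega.demo.sun.com.properties omega.demo.sun.com /var/tmp', where /var/tmp stands in for any backup location of your choosing.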

3. To confirm that the above changes had the desired effect, we must verify that the secondary believes it is acting independently of the original array and is running in stand-alone mode. The output of the 'tarantella status' command will be the measure of our success.

# ./tarantella status
Array members (1):
- omega.demo.sun.com (primary): Accepting standard connections.
Webtop sessions: 0
Emulator sessions: 0
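If you prefer to check this result in a script, the test can be reduced to two pattern matches against a saved copy of the status output. This is a sketch only: the function name is hypothetical, and the patterns assume the exact wording of the sample output above.

```shell
# Hypothetical check: succeed only if the saved status output reports a
# one-member array whose sole member is the given host, marked as primary.
# Usage: is_standalone <status_file> <fqdn>
is_standalone() {
    grep -q '^Array members (1):' "$1" &&
        grep -Fq -- "- $2 (primary):" "$1"
}
```

For example, 'is_standalone omega.status omega.demo.sun.com' succeeds for the output shown above and fails if the node still lists other array members.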

Step 2: Removing all record of the detached node from the array

The above steps were specific to stabilizing and segregating the troubled secondary itself. It is now necessary to remove all record of the manually detached node(s) from the remaining members of the previous array.

4. On the primary node of the original array, we must remove all the [array_node].properties files that correspond to secondaries detached using the instructions above. These files live in the following directory.

<install_dir>/var/serverconfig/global

Note: Take care not to remove the properties file corresponding to the primary itself. In the example scenario above, we would remove only the file corresponding to the secondary:

# rm omega.demo.sun.com.properties
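A slightly more cautious variant of this step is sketched below. It is an assumption-laden illustration, not the documented procedure: the function name is hypothetical, and it moves the file to a backup directory rather than deleting it with rm, so the change can be undone.

```shell
# Hypothetical helper: move a detached secondary's properties file out of
# the primary's global directory, refusing to touch the primary's own file.
# Usage: retire_node <global_dir> <primary_fqdn> <secondary_fqdn> <backup_dir>
retire_node() (
    global_dir=$1 primary=$2 secondary=$3 backup_dir=$4
    if [ "$secondary" = "$primary" ]; then
        echo "refusing to remove the primary's own properties file" >&2
        exit 1
    fi
    mv "$global_dir/$secondary.properties" "$backup_dir/"
)
```

In the example scenario, 'retire_node /opt/tarantella/var/serverconfig/global alpha.demo.sun.com omega.demo.sun.com /var/tmp' would set aside only the file for omega.demo.sun.com.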

5. In order to confirm that this process has completed successfully, we need to verify the correct array topology is reported when executing the 'tarantella status' command on the primary.

# tarantella status

6. Start Tarantella services on all machines. If additional nodes remain in the original array, it is important to allow a few minutes for the primary to resynchronize with its secondaries. When all nodes in the array report consistent responses to the 'tarantella status' command, you will have completed the manual separation of the secondary node.
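Because resynchronization can take a few minutes, it may be convenient to poll rather than check by hand. The helper below is a generic sketch with a hypothetical name; it simply reruns whatever check command you supply until that command succeeds or the attempts run out.

```shell
# Hypothetical helper: run a check command repeatedly until it succeeds.
# Usage: retry_until <attempts> <delay_seconds> <command> [args...]
retry_until() (
    attempts=$1 delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the check passes.
        "$@" && exit 0
        # Pause between attempts, but not after the final one.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    exit 1
)
```

For example, 'retry_until 10 30 my_check' would run a check of your own (my_check is a placeholder) up to ten times, thirty seconds apart.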

If you intend to rejoin the secondary to the array, it should now be safe to do so using the 'tarantella array join' command, executed on the primary.
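As an illustration, the sketch below prints the join command for review instead of running it. The function name is hypothetical, and the --primary and --secondary options are given on the assumption that they match the 'tarantella array' reference for your SGD version; check them against that reference before use.

```shell
# Hypothetical wrapper: print the join command so it can be reviewed
# before being run on the primary.
# Assumption: 'tarantella array join' accepts --primary and --secondary.
# Usage: print_join_command <primary_fqdn> <secondary_fqdn>
print_join_command() {
    printf 'tarantella array join --primary %s --secondary %s\n' "$1" "$2"
}
```

In the example environment, 'print_join_command alpha.demo.sun.com omega.demo.sun.com' prints the command that would rejoin omega.demo.sun.com to the array led by alpha.demo.sun.com.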

Understanding the cause of the problem

It is very important to verify that each node can resolve the hostnames of its peers in both directions: a forward lookup from name to address, and a reverse lookup from address back to name. Additionally, it may be necessary to verify that nothing obstructs traffic on the ports used by SGD for peer communication. In a default installation, this would be port 5427, but others may be important, depending on your configuration.

For a list of ports used by SGD, see the online documentation, available here:
http://docs.sun.com/source/819-4309-10/en-us/base/indepth/ports_used.html
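Both checks can be sketched as small shell functions to run on each node against each of its peers. These are illustrative only: the function names and the RESOLVER variable are hypothetical, the lookup assumes 'getent hosts' is available and prints the address followed by the canonical name, and the port test assumes a netcat (nc) that supports the -z option.

```shell
# Hypothetical check: confirm that a peer's name resolves to an address and
# that the address resolves back to the same name.
# Assumption: the resolver prints "<address> <name> [aliases...]".
# Usage: check_dns <fqdn>
check_dns() (
    fqdn=$1
    resolver=${RESOLVER:-getent hosts}
    addr=$($resolver "$fqdn" | awk 'NR == 1 { print $1 }')
    [ -n "$addr" ] || { echo "forward lookup failed for $fqdn" >&2; exit 1; }
    name=$($resolver "$addr" | awk 'NR == 1 { print $2 }')
    [ "$name" = "$fqdn" ] || { echo "reverse lookup of $addr gave '$name'" >&2; exit 1; }
)

# Hypothetical check: test whether a TCP connection to the peer can be opened.
# Assumption: nc supports -z (connect without sending data) and -w (timeout).
# Usage: check_port <fqdn> [port]   (port defaults to 5427)
check_port() {
    nc -z -w 5 "$1" "${2:-5427}"
}
```

On alpha.demo.sun.com, for instance, you might run 'check_dns omega.demo.sun.com && check_port omega.demo.sun.com', and then repeat the test in the opposite direction from omega.demo.sun.com.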

Additional information

It is strongly recommended that the Secure Global Desktop administrator is familiar with the concept of arrays, and understands the usage of the 'tarantella array' command. Additional information on these subjects can be found within the online documentation, available here:

What is an array?
http://docs.sun.com/source/819-4309-10/en-us/base/gettingstarted/array_whatis.html

Setting up and dismantling a Secure Global Desktop array
http://docs.sun.com/source/819-4309-10/en-us/base/standard/array_management.html

The tarantella array command
http://docs.sun.com/source/819-4309-10/en-us/base/standard/tta_array.html

Finally, it's worth noting that the SGD Support team has found the command-line interface (CLI) to be much more robust than the Array Manager (AM) for this type of administration. We strongly suggest using the CLI in lieu of the graphical interface for changes in array topology.




Copyright 1994-2009 Sun Microsystems, Inc.