The primary way Celeste achieves data availability is through simple data replication. It stores data in more than one location, in the expectation that at least one copy will remain available even when one or more replicas are inaccessible.
Devising techniques for maintaining data availability in Celeste is a rich area for experimentation and research, covering everything from the relatively simple question of how many object copies the system should maintain to policies and adaptive behaviours for governing object caching, placement, and replication under stress. Our current implementation (described below) falls at the simple end of the spectrum. We hope to extend the implementation (and update this description accordingly) with a more sophisticated set of controls.
Background

A Celeste file consists of three kinds of objects: a single Anchor Object (AObject) per file, a single Version Object (VObject) per file version, and zero or more Block Objects (BObjects), each containing file data. These objects appear as boxes in the diagram on the right. In addition to these file-related objects, Celeste creates a set of objects that maintain the file's mutable state. These objects are controlled by the quorum protocol governing file updates and keep track of the mapping between the file itself, as represented by its AObject, and its current version, as represented by a VObject. These quorum-related objects are shown figuratively as the dashed arrow in the diagram.
Each of these objects maintains parameters for its own replication, and may maintain replication parameters for the objects that it controls. For example, each AObject also records the replication parameters for each of its corresponding VObjects and BObjects. Similarly each VObject records the replication parameters for each of its BObjects. New objects created as a result of a file update inherit their replication parameters from their corresponding controlling object.
Replication parameters are specified as simple string attribute-value pairs. These strings are given as arguments to the command line interfaces and to the constructor of the ReplicationParams class (ReplicationParams.java) in the Celeste Java interface. The individual parameters are described below.
File Creation
Replication parameters for each of the AObject, VObject, and BObjects are supplied as part of the CreateFileOperation and are used for the lifetime of the file.
Replication.Store=n
The number of data replicas created when the data is stored in the Celeste system. These replicas are created during the write process and the write is not complete until all replicas are safely in place.
This number has a direct impact on the amount of time it takes to store data in Celeste. A low value reduces the time a write takes to complete, at the risk that data will be unavailable when needed because too few copies exist. A high value increases the time a write takes to complete, but ensures that more replicas are available to hedge against failure. The right value depends on the required availability of the data and on the number of objects, out of the pool of all objects, that are missing at any given moment. Each of these quantities is in turn a function of other variables.
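To make the trade-off above concrete, here is a minimal sketch of how one might reason about the value of n. It rests on an assumption that is not taken from Celeste: each replica is unavailable independently with the same probability p, so all n replicas are unavailable at once with probability p^n. The class and method names are hypothetical and exist only for illustration.

```java
// Illustrative only. Assumes replicas fail independently with the same
// probability p; this model is an assumption, not part of Celeste.
final class ReplicaAvailability {
    private ReplicaAvailability() {}

    // Probability that all n replicas are unavailable at the same moment.
    static double probabilityAllUnavailable(double p, int n) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        return Math.pow(p, n);
    }

    // Smallest n such that at least one replica is available with
    // probability >= target, under the independence assumption.
    static int replicasNeeded(double p, double target) {
        if (p < 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1)");
        }
        if (target <= 0.0 || target >= 1.0) {
            throw new IllegalArgumentException("target must be in (0, 1)");
        }
        int n = 1;
        while (1.0 - probabilityAllUnavailable(p, n) < target) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Print the estimated availability for a few candidate values of n.
        for (int n = 1; n <= 5; n++) {
            double availability = 1.0 - probabilityAllUnavailable(0.5, n);
            System.out.printf("Replication.Store=%d -> availability %.4f%n", n, availability);
        }
    }
}
```

In practice failures are rarely independent (nodes may share a network or power supply), so an estimate of this kind is at best a rough starting point for choosing n.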
AObjectVersionMap.Params=f,b
The mechanism that maintains the value of an Anchor Object to Version Object mapping is implemented as a map from the AObject object-id, to the VObject object-id. The map is implemented as a fault-scalable, byzantine fault-tolerant variable. The variable requires the maintenance of a set of stored objects, each of which is stored on a different node in the system, some of which may be missing, out-of-date, or behaving maliciously (byzantine) when needed.
The current protocol requires storing 3f+2b+1 of these map objects to maintain the value of the mapping from the Anchor Object to its current Version Object. The value f is the anticipated number of unavailable objects in the set of objects maintaining the map. The value b is the anticipated number of byzantine objects in the set of objects maintaining the map.
Note that the number of objects necessary to maintain the value of the AObject to VObject map dictates the smallest possible Celeste system. For example, specifying a value of 1 for both f and b necessitates a system with at least 6 unique nodes.
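The arithmetic behind this minimum can be sketched as follows. The formula 3f+2b+1 comes from the description above; the class and method names are hypothetical and are not part of the Celeste code base.

```java
// Hypothetical helper illustrating the 3f+2b+1 sizing rule described above.
final class VersionMapSizing {
    private VersionMapSizing() {}

    // Number of map objects (and therefore distinct nodes) needed to
    // tolerate f unavailable and b byzantine objects.
    static int requiredMapObjects(int f, int b) {
        if (f < 0 || b < 0) {
            throw new IllegalArgumentException("f and b must be non-negative");
        }
        return 3 * f + 2 * b + 1;
    }

    // True if a system with nodeCount unique nodes can host the map.
    static boolean fits(int nodeCount, int f, int b) {
        return nodeCount >= requiredMapObjects(f, b);
    }

    public static void main(String[] args) {
        // f = 1, b = 1 is the example given in the text.
        System.out.println(requiredMapObjects(1, 1));
    }
}
```

For f = 1 and b = 1 this yields 3 + 2 + 1 = 6, matching the six-node minimum stated above.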
There are many directions for extensions and improvements to the schemes related to replication, information dispersion and so forth. If you have ideas or wish to further the discussion please post your thoughts on the forums.
Examples
The examples below are illustrative.
Command Line Interfaces
By convention, all of the command line interfaces take a single string as the specification for the replication parameters.
Here are a few examples:
$ celestesh create-file celestesh-id myPass celestesh-ns nspw celestesh-file \
    celestesh-Owner celestesh-Group deleteMe \
    'AObject.Replication.Store=3;VObject.Replication.Store=3;BObject.Replication.Store=3' \
    8388608 86400
$ celestefs --id meMyselfAndI --password plugh \
    --replication 'AObject.Replication.Store=3;VObject.Replication.Store=3;BObject.Replication.Store=3' \
    create /celestefs-example/fubar
Java Interfaces
The string containing the replication parameters is supplied to the CreateFileOperation object constructor. For example, client-side code analogous to the celestesh example above is:
DOLRObjectId nameSpaceId = new DOLRObjectId_("celestesh-ns".getBytes());
DOLRObjectId fileId = new DOLRObjectId_("celestesh-file".getBytes());
CreateFileOperation operation = new CreateFileOperation(
    new DOLRObjectId_("celestesh-id".getBytes()),
    nameSpaceId,
    fileId,
    new DOLRObjectId_((nameSpaceId.toString() + fileId.toString() + "deleteMe").getBytes()).getObjectId(),
    86400,
    8388608,
    "AObject.Replication.Store=3;VObject.Replication.Store=3;BObject.Replication.Store=3",
    new ClientMetaData(),
    new DOLRObjectId_("celestesh-Owner".getBytes()),
    new DOLRObjectId_("celestesh-Group".getBytes()),
    (CelesteACL) null,
    true);
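Rather than writing the replication string by hand, client code could assemble it with a small helper. The sketch below is hypothetical and is not part of the Celeste API. It assumes only what the examples above show: attribute-value pairs written as name=value and separated by semicolons.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the Celeste API. The "name=value;..."
// syntax is inferred from the examples in this document.
final class ReplicationSpec {
    private ReplicationSpec() {}

    // Join attribute-value pairs into a single specification string.
    static String format(Map<String, String> attributes) {
        StringJoiner joiner = new StringJoiner(";");
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    // Build a specification that stores n replicas of every object kind.
    static String uniformStore(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<String, String> attributes = new LinkedHashMap<>();
        for (String kind : new String[] {"AObject", "VObject", "BObject"}) {
            attributes.put(kind + ".Replication.Store", Integer.toString(n));
        }
        return format(attributes);
    }

    // Split a specification string back into its attribute-value pairs,
    // preserving the order in which they appear.
    static Map<String, String> parse(String spec) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (String rawPair : spec.split(";")) {
            String pair = rawPair.trim();
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected name=value but got: " + pair);
            }
            attributes.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return attributes;
    }
}
```

With this helper, ReplicationSpec.uniformStore(3) produces the same string that is passed to the CreateFileOperation constructor above.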