pNFS Control Protocol Specification

Solaris pNFS Control Protocol 1.1

The control protocol facilitates the management of NFSv4.1 state and data-sets at the DSs. It handles:

  • MDS and DS restart / network partition indication.
  • Filehandle, File StateID, and layout validation.
  • Client access validation (has access to MDS-FS and is using appropriate security mechanism).
  • Reporting of DS resources.
  • Management of Inter-DS data movement.
  • MDS Proxy I/O.
  • DS state invalidation.

The control protocol is split into three RPC programs, which have been assigned the program numbers 104000, 104001 and 104002.

Messages that flow from the DS to the MDS use program number 104001, named PNFS_CTL_DS. Messages that flow from the MDS to a DS use program number 104000, named PNFS_CTL_MDS. Messages that flow between DSs to copy and move file data use program number 104002, named PNFS_CTL_MV.
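
As an illustration only, the direction-to-program mapping described above can be written as a small lookup. The names PnfsCtlProgram and program_for are hypothetical and are not part of the protocol; only the three program numbers and their names come from the text.

```python
from enum import IntEnum


class PnfsCtlProgram(IntEnum):
    """RPC program numbers quoted in the text above."""
    PNFS_CTL_MDS = 104000  # MDS -> DS
    PNFS_CTL_DS = 104001   # DS -> MDS
    PNFS_CTL_MV = 104002   # DS <-> DS data movement


def program_for(sender: str, receiver: str) -> PnfsCtlProgram:
    """Pick the RPC program from who sends and who receives (hypothetical helper)."""
    table = {
        ("DS", "MDS"): PnfsCtlProgram.PNFS_CTL_DS,
        ("MDS", "DS"): PnfsCtlProgram.PNFS_CTL_MDS,
        ("DS", "DS"): PnfsCtlProgram.PNFS_CTL_MV,
    }
    return table[(sender, receiver)]
```

Note that each program is named after the receiver's counterpart in the text, not the sender: a DS that sends DS_RENEW uses PNFS_CTL_DS.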

The DS receives filehandles that are in the following format:

  • Type: a type to indicate filehandle usage.
  • Version: a version number to allow changes.
  • Flags: flags.
  • Generation: a generation to expire invalid filehandles.
  • MDS id: the owning MDS identifier.
  • MDS SID: the MDS storage identifier, which identifies a piece of DS storage (DS_GUID).
  • MDS dataset id: the containing dataset id.
  • Object id: the object identifier.

The control protocol provides the means to identify the MDS id via the DS_EXIBI message, and the mapping for the MDS_SID to DS_GUID via messages DS_REPORTAVAIL and DS_MAP_MDSSID.

The MDS embeds the MDS dataset id so that it may subsequently invalidate all state held by the data server for a specific file system. This is needed in situations such as the un-sharing of a file system, or when the security mechanism in use by the client and metadata server has changed (re-share).

Control Protocol Messages:

DS to MDS: (PNFS_CTL_DS / 104001)

The general model is for the DS to lazily request validation of presented state associated with an I/O operation from an NFS Client. It is expected that the DS will cache the results and that, when needed, the MDS will invalidate the state via the DS_INVALIDATE message.
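
A minimal sketch of this lazy model, assuming a DS-side cache keyed by (stateid, filehandle, client owner). The class DsStateCache and the callable passed to it are hypothetical stand-ins for the real DS_CHECKSTATE RPC; nothing here is taken from an actual implementation.

```python
from typing import Callable, Dict, Tuple

# Key: (stateid, filehandle, client owner); value: whatever the MDS returned.
StateKey = Tuple[bytes, bytes, bytes]


class DsStateCache:
    """Hypothetical DS-side cache of DS_CHECKSTATE results."""

    def __init__(self, checkstate: Callable[[StateKey], dict]):
        self._checkstate = checkstate  # stands in for the DS_CHECKSTATE RPC
        self._cache: Dict[StateKey, dict] = {}
        self.rpc_calls = 0

    def validate(self, key: StateKey) -> dict:
        """Return cached state, asking the MDS only on first use."""
        if key not in self._cache:
            self.rpc_calls += 1
            self._cache[key] = self._checkstate(key)
        return self._cache[key]

    def invalidate_stateid(self, stateid: bytes) -> int:
        """Drop every entry for one stateid, as a DS_INVALIDATE would require."""
        doomed = [k for k in self._cache if k[0] == stateid]
        for k in doomed:
            del self._cache[k]
        return len(doomed)


# Example: two I/Os with the same state cost a single round trip to the MDS.
cache = DsStateCache(lambda key: {"open_mode": "read"})
key = (b"stateid-1", b"fh-1", b"client-A")
cache.validate(key)
cache.validate(key)  # served from the cache
```

After an invalidation, the next I/O that presents the same state goes back to the MDS.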

DS_CHECKSTATE - Validate state.

Issued by the DS asking MDS to validate the provided FH, Client and stateid.

Description:

The data server provides the stateid, mode, and filehandle from the NFS I/O, along with the NFS client owner (client_owner4).

Using these, the MDS first validates the stateid, making sure it is known and is associated with the provided filehandle.

On successful validation, the MDS returns the layout segments that are associated with the client/stateid, the MDS dataset id, the MDS clientid corresponding to the stateid, and the effective open mode.

Arguments:

struct DS_CHECKSTATEargs
{
    stateid4           stateid;
    nfs_fh4            fh;
    client_owner4      co_owner;
    int                mode;
};

Results:

struct layout_info
{
    stateid4        layoutstateid;
    offset4         offset;
    length4         length;
    layoutiomode4   iomode;
    uint32_t        device_count;
    uint32_t        positions_in_stripe<>;
};

struct ds_filestate
{
    clientid4        mds_clid;
    layout_info      layout;
    int              open_mode;
};

union DS_CHECKSTATEres switch (ds_status status)
{
    case DS_OK:
        ds_filestate    file_state;
    default:
        void;
};

Possible Error Codes:

DSERR_NOT_AUTH, DSERR_INVAL, DSERR_ACCESS, DSERR_STALE,
DSERR_BAD_FH, DSERR_EXPIRED, DSERR_GRACE, DSERR_WRONGSEC,
DSERR_PNFS_NO_LAYOUT

Notes:

The file's layout(s) for the associated client, as returned to the DS, can be used by the DS to validate that the client is issuing I/O to an appropriate region of the file held by the DS.

The MDS returns the short-hand clientid that is associated with the stateid in order to invalidate the state held at the DS should the NFS Client lose the lease at the MDS.

If the client holds no active layout segment the MDS will return DSERR_PNFS_NO_LAYOUT.
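
The region check mentioned in these notes could look roughly like the sketch below. It assumes the offset and length of layout_info are plain byte ranges in the file's offset space and ignores striping; LayoutSegment and io_within_layout are hypothetical names. The all-ones length meaning "to end of file" follows the NFSv4.1 layout convention.

```python
from dataclasses import dataclass

# In NFSv4.1 layouts, a length of all ones means "through end of file".
LENGTH_TO_EOF = 0xFFFFFFFFFFFFFFFF


@dataclass(frozen=True)
class LayoutSegment:
    """The offset/length pair of a layout_info (other members omitted)."""
    offset: int
    length: int


def io_within_layout(seg: LayoutSegment, io_offset: int, io_count: int) -> bool:
    """True if [io_offset, io_offset + io_count) lies inside the segment."""
    if io_offset < seg.offset:
        return False
    if seg.length == LENGTH_TO_EOF:
        return True
    return io_offset + io_count <= seg.offset + seg.length
```

A DS that finds an I/O outside every cached segment could re-issue DS_CHECKSTATE before rejecting the request, since the client may have obtained a new layout in the meantime; that policy is an assumption, not something the text requires.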

DS_EXIBI – Exchange Identity and Boot Instance.

Exchange identities and boot instances (DS_EXIBI) follows the general intent for the NFSv4.0 OP_SETCLIENTID, and the NFSv4.1 OP_EXCHANGE_ID.

Description:

DS_EXIBI should be issued by the DS to an MDS to establish an identity with the MDS. The data server provides a unique identity string and boot verifier. The MDS replies with a short-hand DS identifier, the MDS's boot verifier, a short-hand MDS identifier and the current lease period.

The DS will present the short version identifier on subsequent DS_REPORTAVAIL and DS_RENEW procedure calls.

Data Types:

typedef uint64_t  cp_id;

struct identity
{
    ds_verifier     boot_verifier;
    opaque          instance<>;
};

Argument:

struct DS_EXIBIargs
{
    identity ds_ident;
};

Result:

struct DS_EXIBIresok
{
    cp_id       ds_id;
    cp_id       mds_id;
    ds_verifier mds_boot_verifier;
    uint32_t    mds_lease_period;
};

union DS_EXIBIres switch (ds_status status)
{
    case DS_OK:
        DS_EXIBIresok    res_ok;
    default:
        void;
};

Possible Error Codes:

DSERR_NOT_AUTH, DSERR_INVAL

Notes:

The MDS implementation can use the DS identity to logically collect the information it holds for that particular identity.

It could, for example, have quick access to the collection of universal addresses for the control protocol, so that when the MDS is acting in proxy_io mode (on behalf of an NFS Client) it may trunk the I/O over multiple interfaces.

The DS may hold state on behalf of an NFS Client for multiple MDSs. The mds_lease_period values from all the MDSs should be taken into account when calculating the lease period of the state, and of idle sessions/clientids, at the DS.
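
The text does not say how the lease periods are to be combined. One conservative policy, shown here purely as an assumption, is to keep shared state for as long as the longest lease any MDS has announced. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict


def effective_lease_period(mds_lease_periods: Dict[int, int]) -> int:
    """Lease period the DS applies to state shared across MDSs.

    mds_lease_periods maps mds_id -> mds_lease_period as returned by DS_EXIBI.
    Taking the maximum is an assumed policy, not one the text mandates.
    """
    if not mds_lease_periods:
        raise ValueError("no MDS has announced a lease period")
    return max(mds_lease_periods.values())
```

Using the maximum means the DS never expires state earlier than any MDS would, at the cost of holding idle state somewhat longer.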

DS_MAP_MDSSID - Return a mapping for an mds sid.

This is a DS to MDS message to retrieve a mapping for a DS_GUID.

Description:

Although the DS and MDS establish the DS_GUID to MDS_SID mapping at DS_REPORTAVAIL time, it is likely that an MDS administrator will later create new pNFS ZFS file system(s). When this occurs, the DS will not have the DS_GUID to MDS_SID mapping when a new MDS_SID is presented in a layout filehandle.

This message provides a method for the DS to request the mapping if it should encounter an unknown MDS_SID embedded in a filehandle. The DS will know to which MDS it should direct the message since the filehandle also carries the mds_id.

The mds_id is obtained by the DS in the reply to the DS_EXIBI message.
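
The lookup-then-ask behaviour described above might be sketched as follows. SidMapper and its methods are hypothetical; the callable given to the constructor stands in for sending DS_MAP_MDSSID to the MDS identified by mds_id.

```python
from typing import Callable, Dict, Tuple


class SidMapper:
    """Hypothetical DS-side table from (mds_id, MDS_SID) to DS_GUID."""

    def __init__(self, map_mdssid: Callable[[int, bytes], bytes]):
        # Stand-in for the DS_MAP_MDSSID RPC to the MDS named by mds_id.
        self._map_mdssid = map_mdssid
        self._table: Dict[Tuple[int, bytes], bytes] = {}

    def learn(self, mds_id: int, mds_sid: bytes, ds_guid: bytes) -> None:
        """Record a mapping, for example from a DS_REPORTAVAIL reply."""
        self._table[(mds_id, mds_sid)] = ds_guid

    def resolve(self, mds_id: int, mds_sid: bytes) -> bytes:
        """Return the DS_GUID, asking the owning MDS only for unknown SIDs."""
        key = (mds_id, mds_sid)
        if key not in self._table:
            self._table[key] = self._map_mdssid(mds_id, mds_sid)
        return self._table[key]
```

Both values used in the key come straight from the filehandle, so the DS needs no other context to decide which MDS to ask.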

Data Types:

struct ds_guid_map
{
    ds_guid   ds_guid;
    mds_sid   mds_sid_array<>;
};

Arguments:

struct DS_MAP_MDSSIDargs
{
    mds_sid  mma_sid;
};

Results:

struct DS_MAP_MDSSIDresok
{
    ds_guid_map     guid_map;
};

union DS_MAP_MDSSIDres switch (ds_status status)
{
    case DS_OK:
        DS_MAP_MDSSIDresok resok;
    default:
        void;
};

Possible Error Codes:

DSERR_NOT_AUTH, DSERR_INVAL, DSERR_GRACE, DSERR_BAD_MDSSID

DS_MAP_MDS_DATASET_ID - Query root path.

This is a data server to MDS message to query the root path for a MDS dataset id.

Description:

The MDS dataset id is one that the data server has obtained from the filehandle. The MDS will return the root path of the file system that corresponds to the MDS dataset id.

Arguments:

struct DS_MAP_MDS_DATASET_IDargs
{
    mds_dataset_id mds_dataset_id;
};

Results:

struct DS_MAP_MDS_DATASET_IDresok
{
    ds_status   status;
    utf8string  pathname;
};

union DS_MAP_MDS_DATASET_IDres switch (ds_status status)
{
    case DS_OK:
        DS_MAP_MDS_DATASET_IDresok res_ok;
    default:
        void;
};

DS_RENEW - Exchange boot instance verifier.

A message from the DS to MDS to test the connection to MDS and to detect restarts.

Description:

A message from the DS to MDS used to exchange a ds_verifier. This can be used to indicate to the MDS when a data server has restarted, and also to the DS when the MDS has restarted.

In issuing this message on a regular basis the DS can also detect when there is a network partition.

Arguments:

struct DS_RENEWargs
{
    cp_id        ds_id;
    ds_verifier ds_boottime;
};

Results:

union DS_RENEWres switch (ds_status status)
{
    case DS_OK:
        ds_verifier mds_boottime;
    default:
        void;
};

Possible Error Codes:

DSERR_NOT_AUTH, DSERR_INVAL, DSERR_STALE_DSID

Notes:

When the DS or the MDS detects that a restart has occurred, the DS should drop all state that it has cached for that particular MDS, and the MDS should drop any state that it believes the DS is holding.

If the MDS has restarted, it will not know about the ds_id; in this case the MDS will return DSERR_STALE_DSID.
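
A sketch of how a DS might act on a DS_RENEW reply, under the assumption that a changed boot verifier and DSERR_STALE_DSID both indicate an MDS restart. MdsSession, handle_renew_reply and the needs_exibi flag are hypothetical; re-running DS_EXIBI after a stale ds_id is an inference from the text, not a stated rule.

```python
from dataclasses import dataclass, field
from typing import Optional

DS_OK = "DS_OK"
DSERR_STALE_DSID = "DSERR_STALE_DSID"


@dataclass
class MdsSession:
    """What a DS remembers about one MDS (hypothetical layout)."""
    ds_id: int
    mds_boot_verifier: bytes
    cached_state: dict = field(default_factory=dict)
    needs_exibi: bool = False  # assumption: re-register after a stale ds_id


def handle_renew_reply(session: MdsSession, status: str,
                       mds_boottime: Optional[bytes] = None) -> bool:
    """Apply a DS_RENEW reply; return True if the MDS appears to have restarted."""
    if status == DSERR_STALE_DSID:
        restarted = True
        session.needs_exibi = True
    elif status == DS_OK:
        restarted = mds_boottime != session.mds_boot_verifier
        if restarted:
            session.mds_boot_verifier = mds_boottime
    else:
        return False  # other errors say nothing about a restart
    if restarted:
        session.cached_state.clear()  # drop all state cached for this MDS
    return restarted
```

A missing reply (timeout) is not modelled here; per the Description, repeated timeouts are how the DS would notice a network partition.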

DS_REPORTAVAIL - Data Server resource availability and status.

Issued by the data server informing the MDS of available storage (e.g. pNFS datasets), attributes of available storage (e.g. size available, size used) and IP Addresses.

Description:

The data server supplies the short-hand version ds_id obtained from DS_EXIBI so the MDS can locate the correct DS Identity. The DS and MDS communicate which set of storage attributes each understands by setting the ds_attrvers member of DS_REPORTAVAILargs and DS_REPORTAVAILres, respectively.

Data Types:

struct ds_zfsattr
{
    utf8string  attrname;
    opaque      attrvalue<>;
};

typedef uint32_t   ds_addruse;

/* Intended use for the addrs */
const NFS      = 0x00000001;
const DSERV    = 0x00000002;

struct ds_addr
{
    netaddr4            addr;
    ds_addruse          validuse;
};

struct ds_zfsguid
{
    uint64_t        zpool_guid;
    uint64_t        dataset_guid;
};

enum storage_type
{
    ZFS = 1
};

union ds_guid switch (storage_type stor_type)
{
    case ZFS:
        opaque  zfsguid<>;
    default:
        void;
};

struct ds_guid_map
{
    ds_guid         ds_guid;
    mds_sid         mds_sid_array<>;
};

struct ds_zfsinfo
{
    ds_guid_map     guid_map;
    ds_zfsattr      attrs<>;
};

union ds_storinfo switch (storage_type type)
{
    case ZFS:
        ds_zfsinfo zfs_info;
    default:
        void;
};

enum ds_attr_version
{
    DS_ATTR_v1 = 1
};

Arguments:

struct DS_REPORTAVAILargs
{
    cp_id             ds_id;
    ds_verifier       ds_verifier;
    struct ds_addr    ds_addrs<>;
    ds_attr_version   ds_attrvers;
    ds_storinfo       ds_storinfo<>;
};

Results:

struct DS_REPORTAVAILresok
{
    ds_attr_version ds_attrvers;
    ds_guid_map     guid_map<>;
};

union DS_REPORTAVAILres switch (ds_status status)
{
    case DS_OK:
        DS_REPORTAVAILresok res_ok;
    default:
        void;
};

Possible Error Codes:

DSERR_NOT_AUTH, DSERR_INVAL, DSERR_BAD_DSID

DS_SECINFO - Return security information

A message from the DS to the MDS to query object security flavors.

Description:

The data server needs to be able to ask the MDS which security flavors are used for an object. The particular case is when the NFS Client uses an invalid security flavor: the DS replies with the NFSv4 error NFS4ERR_WRONGSEC, and the NFS Client then issues a SECINFO_NO_NAME.

Arguments:

struct DS_SECINFOargs
{
    nfs_fh4     object;
    netaddr4   *cl_addr;
};

Results:

union ds_secinfo switch (uint32_t flavor)
{
    case RPCSEC_GSS:
        rpcsec_gss_info  flavor_info;
    default:
        void;
};

typedef ds_secinfo DS_SECINFOresok<>;

union DS_SECINFOres switch (ds_status status)
{
    case DS_OK:
        DS_SECINFOresok res_ok;
    default:
        void;
};

Possible Error Codes:

DSERR_NOT_AUTH, DSERR_INVAL, DSERR_STALE_FH, DSERR_BAD_FH

DS_SHUTDOWN - Orderly DS shutdown

This is a message from the DS to the MDS informing the MDS that the DS is in the process of an orderly shutdown.

Arguments:

struct DS_SHUTDOWNargs
{
    cp_id        ds_id;
};

Results:

struct DS_SHUTDOWNres
{
    ds_status status;
};

Possible Error Codes:

DSERR_INVAL, DSERR_NOT_AUTH, DSERR_STALE_DSID

DS_FMATPT - Transport FMA event(s)

This message is a DS to MDS message that will facilitate the transportation of FMA events to the MDS.

Arguments:

struct DS_FMATPTargs {
    opaque      fma_msg<>;
};

Results:

struct DS_FMATPTres {
    ds_status status;
};

Possible Error Codes:

DSERR_INVAL, DSERR_NOTSUPP

MDS to DS: (PNFS_CTL_MDS / 104000)

DS_COMMIT - Commit write.

Issued by the MDS requesting that the DS write any non-stable data for the given filehandle, for the given file segments.

Data Types:

struct ds_fileseg
{
    offset4 offset;
    count4  count;
};

Arguments:

struct DS_COMMITargs {
    nfs_fh4         fh;
    ds_fileseg      cmv<>;
};

Results:

struct DS_COMMITresok {
    ds_verifier    writeverf;
    count4          count<>;
};

union DS_COMMITres switch (ds_status status) {
    case DS_OK:
        DS_COMMITresok resok;
    default:
        void;
};

Possible Error Codes:

DSERR_NOT_AUTH, DSERR_INVAL, DSERR_NOSPC, DSERR_BAD_FH

DS_LIST - List objects.

Get a list of objects from the DS matching the provided criteria.

Description:

Issued by the MDS asking the DS for a list of objects matching the provided criteria (either mds_dataset_id or MDS_SID).

This message is modeled after READDIR, so the MDS provides a starting offset cookie and the buffer-size parameter (maxcount) for the returned data.

The DS returns as many entries as will fit in the given buffer size, starting from the provided cookie.

The information returned will be a counted array of filehandles and also the cookie (see DS_LISTresok).
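
The READDIR-style iteration can be sketched as a loop that feeds the returned cookie back into the next request. Two assumptions are made that the text does not state: a cookie of 0 starts the listing, and an empty fh_list marks the end (DS_LISTresok carries no eof flag). The names list_all and fake_ds_list are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple

# (cookie, maxcount) -> (fh_list, next cookie); stands in for the DS_LIST RPC.
DsListCall = Callable[[int, int], Tuple[List[bytes], int]]


def list_all(ds_list: DsListCall, maxcount: int) -> Iterator[bytes]:
    """Yield every filehandle by repeating DS_LIST with the returned cookie."""
    cookie = 0  # assumption: 0 means "start from the beginning"
    while True:
        fh_list, cookie = ds_list(cookie, maxcount)
        if not fh_list:  # assumption: an empty reply ends the listing
            return
        yield from fh_list


def fake_ds_list(cookie: int, maxcount: int) -> Tuple[List[bytes], int]:
    """Toy DS holding five objects; treats maxcount as an entry count."""
    objects = [b"fh-%d" % i for i in range(5)]
    page = objects[cookie:cookie + maxcount]
    return page, cookie + len(page)
```

In the real message maxcount is a buffer size rather than an entry count; the toy server simplifies this so the paging is easy to follow.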

Data Types:

enum ds_list_type {
    DS_LIST_MDS_SID,
    DS_LIST_MDS_DATASET_ID
};

Arguments:

struct ds_list_sid_arg {
    mds_sid mds_sid;
    uint64_t cookie;
    count4   maxcount;
};

struct ds_list_dataset_arg {
    mds_dataset_id   dataset_id;
    uint64_t         cookie;
    count4           maxcount;
};

union DS_LISTargs switch (ds_list_type dla_type) {
    case DS_LIST_MDS_SID:
        ds_list_sid_arg       sid;

    case DS_LIST_MDS_DATASET_ID:
        ds_list_dataset_arg   dataset_id;

    default:
        void;
};

Results:

struct DS_LISTresok {
    mds_ds_fh       fh_list<>;
    uint64_t        cookie;
};

union DS_LISTres switch (ds_status status) {
    case DS_OK:
        DS_LISTresok	res_ok;
    default:
        void;
};

Possible Error Codes:

DSERR_NOT_AUTH, DSERR_INVAL, DSERR_TOOSMALL, DSERR_NOENT

DS_READ - Read an object.

Issued by the MDS to read from the DS for a given DS filehandle, for the provided data segments.

Data Types:

struct ds_fileseg {
    offset4 offset;
    count4  count;
};

Arguments:

struct DS_READargs
{
    nfs_fh4         fh;
    count4          count;
    ds_fileseg      rdv<>;
};

Results:

struct ds_filesegbuf
{
    offset4 offset;
    opaque  data<>;
};

struct DS_READresok {
    bool    eof;
    count4  count;
    ds_filesegbuf rdv<>;
};

union DS_READres switch (ds_status status) {
    case DS_OK:
        DS_READresok res_ok;
    default:
        void;
};

Possible Error Codes:

DSERR_NOT_AUTH, DSERR_INVAL, DSERR_BAD_FH, DSERR_NOENT

CTL_MDS_REMOVE - Remove.

From MDS to a DS, for the given filehandle(s) or MDS dataset id(s), remove the object(s).

Description:

Allows the MDS to remove one or more objects by filehandle or one or more MDS datasets from the DS.

Arguments:

enum ctl_mds_rm_type {
    CTL_MDS_RM_OBJ,
    CTL_MDS_RM_MDS_DATASET_ID
};

union CTL_MDS_REMOVEargs switch (ctl_mds_rm_type type) {
    case CTL_MDS_RM_OBJ:
        nfs_fh4         obj<>;

    case CTL_MDS_RM_MDS_DATASET_ID:
        mds_sid           mds_sid;
        mds_dataset_id    dataset_id<>;

    default:
        void;
};

Results:

struct CTL_MDS_REMOVEres {
    ds_status       status;
};

Possible Error Codes:

DSERR_NOT_AUTH, DSERR_INVAL, DSERR_BAD_FH, DSERR_NOENT

DS_WRITE - Write an Object.

From the MDS to a DS, for the given filehandle, write the data segments.

Description:

This procedure provides a way for the MDS to write file data segments to the DS. This most likely happens when the NFS Client performs writes through the MDS, either because of a network partition between the NFS Client and the DS, or because the MDS has recalled/revoked the layout and the client was unable to write/commit data to the DS.

The filehandle used in the arguments MUST be obtained from the layout allocation.

Arguments:

struct ds_filesegbuf {
    offset4 offset;
    opaque  data<>;
};

struct DS_WRITEargs
{
    nfs_fh4         fh;
    stable_how4     stable;
    count4          count;
    ds_filesegbuf   wrv<>;
};

Results:

struct DS_WRITEresok
{
    stable_how4     committed;
    ds_verifier    writeverf;
    count4          wrv<>;
};

union DS_WRITEres switch (ds_status status)
{
    case DS_OK:
        DS_WRITEresok res_ok;
    default:
        void;
};

Possible Error Codes:

DSERR_NOT_AUTH, DSERR_INVAL, DSERR_BAD_FH

DS_STAT - Inquire on status of a data server object

From the MDS to a data server, to return information on the requested object.

Arguments:

struct DS_STATargs
{
    nfs_fh4 object;
};

Results:

union DS_STATres switch (ds_status status)
{
    case DS_OK:
        ds_attr  dattr;
    default:
        void;
};

DS_GETATTR - Get attributes for a data server object

Issued by an MDS to a DS in order to query the attributes of a DS object.

Description:

This may be used when an MDS wants to verify the attributes (for example, size) provided by a client upon LAYOUTCOMMIT.

Arguments:

struct DS_GETATTRargs
{
    nfs_fh4 fh;
    ds_attr dattrs;
};

Results:

union DS_GETATTRres switch (ds_status status)
{
    case DS_OK:
        ds_attr dattrs;
    default:
        void;
};

Possible Error Codes:

DSERR_INVAL, DSERR_ATTR_NOTSUPP, DSERR_NOENT

DS_SETATTR - Set attributes for a data server object

Issued by an MDS to a DS to set attributes of a DS object.

Description:

This procedure allows an MDS to set attributes of the specified DS object. When an NFS Client truncates the file size down at the MDS, the MDS MUST issue a DS_SETATTR of the size to all the affected DSs. The DS_SETATTR messages can be issued in parallel; however, the MDS must not reply to the NFS Client until all the DS_SETATTRs have completed.
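
The "issue in parallel, reply only when all have completed" rule can be sketched with a thread pool. This is an illustration under assumptions: ds_setattr is a hypothetical callable standing in for the DS_SETATTR RPC, and what the MDS tells the client when a DS fails is not specified in the text, so the sketch only reports which DSs failed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# (data server, filehandle, new size) -> ds_status name; stands in for the RPC.
SetattrCall = Callable[[str, bytes, int], str]


def truncate_on_all(ds_setattr: SetattrCall, data_servers: List[str],
                    fh: bytes, new_size: int) -> Dict[str, str]:
    """Send DS_SETATTR(size) to every affected DS and wait for every reply."""
    with ThreadPoolExecutor(max_workers=max(1, len(data_servers))) as pool:
        futures = {ds: pool.submit(ds_setattr, ds, fh, new_size)
                   for ds in data_servers}
        # result() blocks, so this returns only after every DS has answered.
        return {ds: fut.result() for ds, fut in futures.items()}


def handle_client_truncate(ds_setattr: SetattrCall, data_servers: List[str],
                           fh: bytes, new_size: int) -> Tuple[bool, List[str]]:
    """Return (all_ok, failed DSs); the client reply is built only after this."""
    statuses = truncate_on_all(ds_setattr, data_servers, fh, new_size)
    failed = sorted(ds for ds, st in statuses.items() if st != "DS_OK")
    return (not failed, failed)
```

Because the reply to the NFS Client is constructed from the return value, it cannot be sent before the slowest DS has completed.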

Arguments:

struct DS_SETATTRargs
{
    nfs_fh4 fh;
    ds_attr dattrs;
};

Results:

struct DS_SETATTRres
{
    ds_status status;
};

Possible Error Codes:

DSERR_INVAL, DSERR_ATTR_NOTSUPP, DSERR_NOENT

DS_SNAP - Snapshot a dataset on the DS that is related to the given MDS Dataset ID.

Arguments:

struct DS_SNAPargs
{
    mds_dataset_id dataset_id;
};

Results:

struct DS_SNAPresok
{
    utf8string  name;
};

union DS_SNAPres switch (ds_status status)
{
    case DS_OK:
        DS_SNAPresok resok;
    default:
        void;
};

Possible Error Codes:

DSERR_NOTSUPP, DSERR_INVAL

DS_INVALIDATE - Invalidate state.

Issued by the MDS to a DS to invalidate state that the DS may hold.

Description:

The MDS may wish to invalidate state at the DS due to many factors. The DS_INVALIDATE message carries a type to indicate the scope of the invalidation, and the DS uses the type to interpret the associated parameter.

Examples of when the different invalidate types will be used are as follows:

  • DS_INVALIDATE_ALL - Used when the nfs/server service is taken offline on the MDS
  • DS_INVALIDATE_CLIENTID - Used on client lease (with the MDS) expiration
  • DS_INVALIDATE_LAYOUT_BY_CLIENT - Used on LAYOUTRETURN when invalidating all layouts held by a client
  • DS_INVALIDATE_LAYOUT_BY_FH - Used on REMOVE issued from client to MDS
  • DS_INVALIDATE_LAYOUT_BY_STATEID - Used on LAYOUTRETURN when invalidating a specific layout held by a client
  • DS_INVALIDATE_MDS_DATASET_ID - Used on unshare of MDS FS
  • DS_INVALIDATE_STATEID - Used on CLOSE, UNLOCK, DELEGRETURN issued from client to MDS

Arguments:

enum ds_invalidate_type
{
    DS_INVALIDATE_ALL,
    DS_INVALIDATE_CLIENTID,
    DS_INVALIDATE_LAYOUT_BY_CLIENT,
    DS_INVALIDATE_LAYOUT_BY_FH,
    DS_INVALIDATE_LAYOUT_BY_STATEID,
    DS_INVALIDATE_MDS_DATASET_ID,
    DS_INVALIDATE_STATEID
};

struct ds_inval_layout_by_clid
{
    clientid4 mds_clid;
    nfs_fh4 fh;
};

struct ds_inval_layout_by_lo_stateid
{
    stateid4 stateid; /* MUST be layout stateid */
    nfs_fh4 fh;
};

struct ds_inval_stateid
{
    stateid4 stateid; /* MUST be open, lock or delegation stateid */
    nfs_fh4 fh;
};

union DS_INVALIDATEargs switch (ds_invalidate_type obj)
{
    case DS_INVALIDATE_ALL:
        void;
    case DS_INVALIDATE_CLIENTID:
        clientid4  clid;
    case DS_INVALIDATE_LAYOUT_BY_CLIENT:
        ds_inval_layout_by_clid layout;
    case DS_INVALIDATE_LAYOUT_BY_FH:
        nfs_fh4 fh;
    case DS_INVALIDATE_LAYOUT_BY_STATEID:
        ds_inval_layout_by_lo_stateid stateid;
    case DS_INVALIDATE_MDS_DATASET_ID:
        mds_dataset_id  dataset_id;
    case DS_INVALIDATE_STATEID:
        ds_inval_stateid stateid;
};

Results:

struct DS_INVALIDATEres
{
    ds_status status;
};

Possible Error Codes:

DSERR_NOT_AUTH, DSERR_INVAL, DSERR_NOENT

Notes:

DS_INVALIDATE_ALL

Invalidates all state at the data server. This is perhaps sent when the MDS knows it is in the process of an orderly shutdown.

DS_INVALIDATE_LAYOUT_BY_CLIENT

To invalidate all layout segments held at the DS for this particular filehandle/clientid combination. The MDS could use this message when a client returns the layout via OP_LAYOUTRETURN, or when the MDS has revoked the layout from the client.

DS_INVALIDATE_MDS_DATASET_ID

To invalidate all state for a MDS dataset id. The MDS would use this message for the un-share (or re-share) of a filesystem.

DS_INVALIDATE_STATEID

To invalidate state for the specified stateid. The MDS would use this when the client has issued either OP_CLOSE or OP_DELEGRETURN. The MDS could also use this message when it has revoked a delegation.

DS_INVALIDATE_CLIENTID

To invalidate all state for a particular clientid. The MDS would use this message once the client has lost its lease with the MDS, or the client has issued an explicit OP_DESTROY_CLIENTID.
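
As a rough illustration of how the type selects the scope, the sketch below filters a DS-side list of cached entries according to the invalidate type and its parameter. CachedState and apply_invalidate are hypothetical, and the exact matching rules for each type are an interpretation of the descriptions above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CachedState:
    """One piece of state a DS might hold (hypothetical layout)."""
    clientid: int
    stateid: bytes
    fh: bytes
    dataset_id: bytes
    is_layout: bool


def apply_invalidate(cache: List[CachedState], inval_type: str,
                     **arg) -> List[CachedState]:
    """Return the entries that survive one DS_INVALIDATE message."""
    matchers = {
        "DS_INVALIDATE_ALL": lambda s: True,
        "DS_INVALIDATE_CLIENTID": lambda s: s.clientid == arg["clid"],
        "DS_INVALIDATE_LAYOUT_BY_CLIENT": lambda s: (
            s.is_layout and s.clientid == arg["mds_clid"] and s.fh == arg["fh"]),
        "DS_INVALIDATE_LAYOUT_BY_FH": lambda s: s.is_layout and s.fh == arg["fh"],
        "DS_INVALIDATE_LAYOUT_BY_STATEID": lambda s: (
            s.is_layout and s.stateid == arg["stateid"] and s.fh == arg["fh"]),
        "DS_INVALIDATE_MDS_DATASET_ID": lambda s: s.dataset_id == arg["dataset_id"],
        "DS_INVALIDATE_STATEID": lambda s: (
            not s.is_layout and s.stateid == arg["stateid"] and s.fh == arg["fh"]),
    }
    doomed = matchers[inval_type]  # only the selected matcher reads arg
    return [s for s in cache if not doomed(s)]
```

The keyword names (clid, mds_clid, fh, stateid, dataset_id) mirror the member names in DS_INVALIDATEargs and its helper structs.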

DS_PNFSSTAT - Collect kstats

This is a message from the MDS to a DS asking the DS to report all pnfs/rpc kstats.

The indices will be reported as nvpairs. The name will possibly be constructed from the corresponding kstat <class><name> <statistic>_< instance>
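
The DS_NFSSTAT_* constants listed under Arguments are all distinct powers of two, which suggests (as an assumption; the text does not say so explicitly) that stat_wanted is a bitwise OR of the groups being requested. A short sketch of building and decoding such a mask; the helper stats_requested is hypothetical.

```python
from typing import List

# Values copied from the constants listed under Arguments.
DS_NFSSTAT_RPC = 0x01
DS_NFSSTAT_NFS = 0x02
DS_NFSSTAT_DMOV = 0x04
DS_NFSSTAT_CP = 0x08
DS_NFSSTAT_CPU = 0x10

_NAMES = [
    (DS_NFSSTAT_RPC, "RPC"),
    (DS_NFSSTAT_NFS, "NFS"),
    (DS_NFSSTAT_DMOV, "DMOV"),
    (DS_NFSSTAT_CP, "CP"),
    (DS_NFSSTAT_CPU, "CPU"),
]


def stats_requested(stat_wanted: int) -> List[str]:
    """Decode a stat_wanted mask into the kstat groups it selects."""
    return [name for bit, name in _NAMES if stat_wanted & bit]


# Example: ask for the RPC and CPU kstats in one request.
wanted = DS_NFSSTAT_RPC | DS_NFSSTAT_CPU
```
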

Arguments:

/* RPC kstats */
const DS_NFSSTAT_RPC     = 0x000000001;

/* NFS kstats */
const DS_NFSSTAT_NFS     = 0x000000002;

/* the DMOV protocol kstats */
const DS_NFSSTAT_DMOV    = 0x000000004;

/* the control protocol kstats */
const DS_NFSSTAT_CP      = 0x000000008;

/* CPU kstat (all stats for module cpu)*/
const DS_NFSSTAT_CPU     = 0x000000010;

struct DS_PNFSSTATargs
{
    uint64_t stat_wanted;
};

Results:

struct DS_PNFSSTATresok
{
    opaque    nvlist<>;
};

union DS_PNFSSTATres switch (ds_status status)
{
    case DS_OK:
        DS_PNFSSTATresok res_ok;
    default:
        void;
};

Possible Error Codes:

DSERR_INVAL, DSERR_NOTSUPP

DS_CHANGE_DATASET_ID - Notify the data server that a change to the dataset ID on the MDS has been made (e.g. as a result of a zfs send/recv of a pNFS file system on the MDS).

Arguments:

struct DS_CHANGE_DATASET_IDargs
{
    mds_sid           mds_sid_owner;
    mds_dataset_id    old_dataset_id;
    mds_dataset_id    new_dataset_id;
};

Results:

struct DS_CHANGE_DATASET_IDres
{
    ds_status status;
};

DS_OBJ_MOVE_STATUS - To inquire on the status of a move

Since MDS supplies a task id for the data server move, it can inquire as to the status of the move.

Arguments:

struct DS_OBJ_MOVE_STATUSargs
{
    uint64_t taskid;
};

Results:

struct DS_OBJ_MOVE_STATUSresok
{
    uint64_t maxoffset;
    bool complete;
};

union DS_OBJ_MOVE_STATUSres switch (ds_status status) {
    case DS_OK:
         DS_OBJ_MOVE_STATUSresok res_ok;
    default:
        void;
};

Possible Error Codes:

DSERR_INVAL, DSERR_NOTSUPP

DS_OBJ_MOVE_ABORT - To stop an in-progress move.

Abort the move identified by the given taskid.

Arguments:

struct DS_OBJ_MOVE_ABORTargs
{
    uint64_t taskid;
};

Results:

struct DS_OBJ_MOVE_ABORTres
{
    ds_status   status;
};

Possible Error Codes:

DSERR_INVAL, DSERR_NOTSUPP, DSERR_NOMATCHING_TASKID

DS_OBJ_MOVE - To initiate a data server to data server move

From the MDS to a data server, a message to instruct the data server to move the file's data blocks for a given FH.

The MDS supplies source and destination FH and target data server and taskid.

Arguments:

struct DS_OBJ_MOVEargs
{
    uint64_t taskid;
    nfs_fh4 source;
    nfs_fh4 target;
    netaddr4 targetserver;
};

Results:

struct DS_OBJ_MOVEres
{
    ds_status    status;
};

Possible Error Codes:

DSERR_INVAL, DSERR_NOTSUPP

Object move/relocation: (PNFS_CTL_MV / 104002)

The PNFS_CTL_MV protocol is exclusively for moving data from one data server to another. This would occur as the result of the MDS making a request (DS_OBJ_MOVE) via the PNFS_CTL_MDS protocol.

MOVE - Move data from one data server to another.

Arguments:

struct MOVEargs {
    uint64_t taskid;
    uint64_t minoffset;
    uint64_t maxoffset;
    uint32_t maxbytes;
};

Results:

struct MOVEsegment {
    uint64_t fileoffset;
    uint32_t len;
};

struct MOVEres {
    ds_status status;
    opaque data<>;
    struct MOVEsegment segments<>;
};

Possible Error Codes:

DSERR_NOENT

Notes:

The MOVE message is roughly equivalent to an NFS read, but with other properties. It is specially designed to deal with objects with holes.

MOVE is pull-based, rather than push-based. That is, the target data server contacts the source data server to ask it for data. NFS/RDMA has shown better performance with READ than WRITE. Also, pull-based avoids the problem of the stability of the data on the target server – there will be no need for write verifiers and the like.

The disadvantage with pull-based data movement is that the source data server is the one that is aware of any holes in the source object. The dmov protocol reflects this. The result of a MOVE operation is a block of data, and an array of offsets and lengths. The arguments give a minimum and maximum file offset, which enables multiple MOVE operations to be in flight – the target server can arrange for non-overlapping MOVEs.
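
A sketch of how a target data server might consume MOVE replies. It assumes that the data block is the concatenation of the listed segments in order, and that maxoffset is exclusive; neither detail is stated in the text. The function names are hypothetical.

```python
from typing import List, Tuple


def apply_move_reply(target: bytearray, data: bytes,
                     segments: List[Tuple[int, int]]) -> None:
    """Copy each (fileoffset, len) segment out of the packed data block.

    Ranges not covered by any segment are holes and stay zero-filled.
    """
    cursor = 0
    for fileoffset, length in segments:
        end = fileoffset + length
        if end > len(target):
            target.extend(b"\0" * (end - len(target)))
        target[fileoffset:end] = data[cursor:cursor + length]
        cursor += length
    if cursor != len(data):
        raise ValueError("segment lengths do not add up to the data block size")


def split_ranges(size: int, chunk: int) -> List[Tuple[int, int]]:
    """Non-overlapping (minoffset, maxoffset) pairs covering [0, size)."""
    return [(lo, min(lo + chunk, size)) for lo in range(0, size, chunk)]
```

Because the ranges produced by split_ranges do not overlap, the replies can be applied in any order, which is what allows several MOVE operations to be in flight at once. A real target would write to a sparse file rather than zero-fill a buffer.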

Error codes (ds_status)

The control protocol carries a status word in the reply to convey successful execution of the request or to report a problem. The status word uses an enumeration 'ds_status' and currently has the following assigned values.

  • DS_OK
    The request completed successfully.
  • DSERR_NOENT
    The item does not exist.
  • DSERR_NOT_AUTH
    The DS has been blocked from accessing the MDS through system administration.
  • DSERR_INVAL
    One or more of the arguments provided in the RPC Call are invalid.
  • DSERR_IO
    The server encountered an I/O error with the underlying data store.
  • DSERR_ACCESS
    During the DS_CHECKSTATE validation it was determined that the client does not have the appropriate access rights.
  • DSERR_NOSPC
    The MDS issued a DS_WRITE to a DS and the DS has no space left.
  • DSERR_STALE_FH
    The provided filehandle is stale.
  • DSERR_BADHANDLE
    The format of the filehandle was not recognized.
  • DSERR_BAD_COOKIE
    The MDS presented an invalid cookie to the DS in the DS_LIST message.
  • DSERR_NOTSUPP
    The control protocol message you issued has yet to be implemented.
  • DSERR_TOOSMALL
    The DS_LIST results will not fit into the provided size.
  • DSERR_EXPIRED
  • DSERR_GRACE
    The MDS is in the process of allowing clients to reclaim state after the MDS restarted.
  • DSERR_FHEXPIRED
    The generation count within the filehandle has been incremented.
  • DSERR_WRONGSEC
    The security flavors for DS_CHECKSTATE do not match the required security flavor for the target object.
  • DSERR_RESOURCE
    The MDS has no available resource to complete the requested operation.
  • DSERR_STALE_DSID
    The ds_id used is not known to the MDS.
  • DSERR_STALE_STATEID
  • DSERR_SERVERFAULT
    The data server encountered an unrecoverable error.
  • DSERR_OLD_STATEID
  • DSERR_BAD_MDSSID
    The MDS was unable to provide a mapping for the requested MDS_SID.
  • DSERR_PNFS_NO_LAYOUT
    The client, identified by the co_owner member of DS_CHECKSTATEargs, that contacted the DS for I/O does not hold a layout.
  • DSERR_NOMATCHING_TASKID
    Returned from DS_OBJ_MOVE_STATUS and DS_OBJ_MOVE_ABORT if the taskid given is not recognized by the DS.
  • DSERR_ATTR_NOTSUPP
    Attribute is not supported.

© 2010, Oracle Corporation and/or its affiliates