Solaris pNFS Control Protocol 1.1
The control protocol facilitates the management of NFSv4.1 state and data-sets at the DSs. It handles:
- MDS and DS restart / network partition indication.
- Filehandle, File StateID, and layout validation.
- Client access validation (has access to MDS-FS and is using appropriate security mechanism).
- Reporting of DS resources.
- Management of Inter-DS data movement.
- MDS Proxy I/O.
- DS state invalidation.
The control protocol is broken up into three RPC programs, which have been assigned the program numbers 104000, 104001 and 104002.
Messages that flow from the DS to the MDS use program number 104001, named PNFS_CTL_DS. Messages that flow from the MDS to a DS use program number 104000, named PNFS_CTL_MDS. Messages that flow between one DS and another to copy and move file data use program number 104002, named PNFS_CTL_MV.
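The mapping between message direction and RPC program can be summarized in a short sketch. The program names and numbers come from the text above; the `program_for` helper and the role strings are hypothetical and exist only for illustration.

```python
from enum import Enum


class ControlProgram(Enum):
    """RPC program numbers named in the text above."""
    PNFS_CTL_MDS = 104000  # messages from the MDS to a DS
    PNFS_CTL_DS = 104001   # messages from a DS to the MDS
    PNFS_CTL_MV = 104002   # DS to DS data movement


def program_for(sender: str, receiver: str) -> ControlProgram:
    """Pick the program for a message given the roles of its endpoints.

    The role strings "DS" and "MDS" are an illustrative convention,
    not part of the protocol.
    """
    routes = {
        ("DS", "MDS"): ControlProgram.PNFS_CTL_DS,
        ("MDS", "DS"): ControlProgram.PNFS_CTL_MDS,
        ("DS", "DS"): ControlProgram.PNFS_CTL_MV,
    }
    return routes[(sender, receiver)]
```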
The DS receives filehandles that are in the following format:
- Type: A type to indicate filehandle usage.
- Version: A version number to allow changes.
- Flags: Flags.
- Generation: A generation to expire invalid filehandles.
- MDS id: The owning MDS identifier.
- MDS SID: The MDS storage identifier, which identifies a piece of DS storage (DS_GUID).
- MDS dataset id: The containing dataset id.
- Object id: The object identifier.
The control protocol provides the means to identify the MDS id via the DS_EXIBI message, and the mapping for the MDS_SID to DS_GUID via messages DS_REPORTAVAIL and DS_MAP_MDSSID.
The MDS embeds the MDS dataset id so that it may subsequently invalidate all state held by the data server for a specific file system. This is needed in situations such as the un-sharing of a file system or when the security mechanism in use by the client and metadata server have changed (re-share).
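As a rough illustration of why the dataset id is carried in the filehandle, the sketch below models the fields listed above and shows a DS discarding everything that belongs to one MDS dataset id. The field types and widths are assumptions (the text does not give an on-the-wire layout), and `drop_dataset` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DsFilehandle:
    """Fields from the filehandle list above; types are assumed."""
    fh_type: int
    version: int
    flags: int
    generation: int
    mds_id: int
    mds_sid: bytes
    mds_dataset_id: bytes
    object_id: bytes


def drop_dataset(handles: List[DsFilehandle], dataset_id: bytes) -> List[DsFilehandle]:
    """Return the handles that survive invalidation of one MDS dataset id."""
    return [fh for fh in handles if fh.mds_dataset_id != dataset_id]
```

Because every filehandle names its dataset, the DS can select the affected state without consulting the MDS again.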
Control Protocol Messages:
DS to MDS: (PNFS_CTL_DS / 104001)
The general model is for the DS to lazily request validation of presented state associated with an I/O operation from a NFS Client. It is expected that the DS will cache the results, and when needed the MDS will invalidate the state via the DS_INVALIDATE message.
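The lazy-validation model described above might look like the following on the DS side. This is a minimal sketch: `checkstate` stands in for whatever issues DS_CHECKSTATE to the MDS, the cache key is an assumption, and only stateid-scoped invalidation is shown.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[bytes, bytes]  # (stateid, filehandle); an assumed cache key


class StateCache:
    """Caches DS_CHECKSTATE results until the MDS invalidates them."""

    def __init__(self, checkstate: Callable[[bytes, bytes], dict]) -> None:
        # checkstate is a hypothetical hook that sends DS_CHECKSTATE.
        self._checkstate = checkstate
        self._cache: Dict[Key, dict] = {}

    def validate(self, stateid: bytes, fh: bytes) -> dict:
        """Ask the MDS only the first time this state is presented."""
        key = (stateid, fh)
        if key not in self._cache:
            self._cache[key] = self._checkstate(stateid, fh)
        return self._cache[key]

    def invalidate_stateid(self, stateid: bytes) -> None:
        """Drop cached results for one stateid (as DS_INVALIDATE might request)."""
        self._cache = {k: v for k, v in self._cache.items() if k[0] != stateid}
```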
DS_CHECKSTATE - Validate state.
Issued by the DS asking MDS to validate the provided FH, Client and stateid.
Description:
The data server provides the stateid, mode, and filehandle from the NFS I/O, along with the NFS client owner (client_owner4).
Using these, the MDS first validates the stateid, making sure it is known and associated with the provided filehandle.
On successful validation, the MDS returns the layout segments associated with the client/stateid, the MDS dataset id, the MDS clientid corresponding to the stateid, and the effective open mode.
Arguments:
struct DS_CHECKSTATEargs
{
stateid4 stateid;
nfs_fh4 fh;
client_owner4 co_owner;
int mode;
};
Results:
struct layout_info
{
stateid4 layoutstateid;
offset4 offset;
length4 length;
layoutiomode4 iomode;
uint32_t device_count;
uint32_t positions_in_stripe<>;
};
struct ds_filestate
{
clientid4 mds_clid;
layout_info layout;
int open_mode;
};
union DS_CHECKSTATEres switch (ds_status status)
{
case DS_OK:
ds_filestate file_state;
default:
void;
};
Possible Error Codes:
DSERR_NOT_AUTH, DSERR_INVAL, DSERR_ACCESS, DSERR_STALE, DSERR_BAD_FH, DSERR_EXPIRED, DSERR_GRACE, DSERR_WRONGSEC, DSERR_PNFS_NO_LAYOUT
Notes:
The file's layout(s) for the associated client returned to the DS can be used by the DS to validate that the client is issuing I/O to an appropriate region of the file held by the DS.
The MDS returns the short-hand clientid that is associated with the stateid in order to invalidate the state held at the DS should the NFS Client lose the lease at the MDS.
If the client holds no active layout segment the MDS will return DSERR_PNFS_NO_LAYOUT.
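The range check mentioned in the notes above could be sketched as follows. It deliberately ignores striping (device_count and positions_in_stripe) and compares plain file offsets. Treating an all-ones length as "to end of file" follows the NFSv4.1 layout convention, and applying it here is an assumption.

```python
from dataclasses import dataclass

LENGTH_TO_EOF = 0xFFFFFFFFFFFFFFFF  # assumed: all-ones length4 means "to end of file"


@dataclass(frozen=True)
class LayoutSegment:
    """The offset and length members of layout_info; other members omitted."""
    offset: int
    length: int


def io_within_layout(seg: LayoutSegment, io_offset: int, io_count: int) -> bool:
    """True if [io_offset, io_offset + io_count) lies inside the segment."""
    if io_offset < seg.offset:
        return False
    if seg.length == LENGTH_TO_EOF:
        return True
    return io_offset + io_count <= seg.offset + seg.length
```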
DS_EXIBI - Exchange Identity and Boot Instance.
Exchange identities and boot instances (DS_EXIBI) follows the general intent of the NFSv4.0 OP_SETCLIENTID and the NFSv4.1 OP_EXCHANGE_ID operations.
Description:
DS_EXIBI should be issued by the DS to an MDS to establish an identity with the MDS. The data server provides a unique identity string and boot verifier. The MDS replies with a short-hand DS identifier, the MDS's boot verifier, a short-hand MDS identifier and the current lease period.
The DS will present the short version identifier on subsequent DS_REPORTAVAIL and DS_RENEW procedure calls.
Data Types:
typedef uint64_t cp_id;
struct identity
{
ds_verifier boot_verifier;
opaque instance<>;
};
Argument:
struct DS_EXIBIargs
{
identity ds_ident;
};
Result:
struct DS_EXIBIresok
{
cp_id ds_id;
cp_id mds_id;
ds_verifier mds_boot_verifier;
uint32_t mds_lease_period;
};
union DS_EXIBIres switch (ds_status status)
{
case DS_OK:
DS_EXIBIresok res_ok;
default:
void;
};
Possible Error Codes:
DSERR_NOT_AUTH, DSERR_INVAL
Notes:
The MDS implementation can use the DS identity to logically collect the information it holds for that particular identity.
It could for example have quick access to the collection of universal addresses for the control-protocol, so when the MDS is acting in proxy_io mode (on behalf of a NFS Client) it may trunk the I/O over multiple interfaces.
The DS may hold state on behalf of a NFS Client for multiple MDSs. The mds_lease_periods from all the MDSs should be taken into account when calculating the lease period of the state, and idle sessions/clientids at the DS.
DS_MAP_MDSSID - Return a mapping for an mds sid.
This is a DS to MDS message to retrieve a mapping for a DS_GUID.
Description:
Although the DS and MDS establish the DS_GUID to MDS_SID mapping at DS_REPORTAVAIL time, it is likely that an MDS administrator will later create new pNFS ZFS file systems. When this occurs, the DS will not have the DS_GUID to MDS_SID mapping when a new MDS_SID is presented in a layout filehandle.
This message provides a method for the DS to request the mapping if it should encounter an unknown MDS_SID embedded in a filehandle. The DS will know to which MDS it should direct the message since the filehandle also carries the mds_id.
The mds_id is obtained by the DS in the reply to the DS_EXIBI message.
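The fallback described above can be sketched as a lookup table that issues DS_MAP_MDSSID only on a miss. The `map_request` hook is hypothetical: it stands in for the RPC to the MDS named by mds_id and is assumed to return the ds_guid and mds_sid_array of the ds_guid_map reply.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transport hook: (mds_id, mds_sid) -> (ds_guid, mds_sid_array)
MapRequest = Callable[[int, bytes], Tuple[bytes, List[bytes]]]


class SidResolver:
    """Resolves an MDS_SID from a filehandle to the local DS_GUID."""

    def __init__(self, map_request: MapRequest) -> None:
        self._map_request = map_request
        self._guid_by_sid: Dict[Tuple[int, bytes], bytes] = {}

    def learn(self, mds_id: int, ds_guid: bytes, mds_sids: List[bytes]) -> None:
        """Record mappings, for example those returned by DS_REPORTAVAIL."""
        for sid in mds_sids:
            self._guid_by_sid[(mds_id, sid)] = ds_guid

    def resolve(self, mds_id: int, mds_sid: bytes) -> bytes:
        """Return the DS_GUID, asking the owning MDS if the SID is unknown."""
        key = (mds_id, mds_sid)
        if key not in self._guid_by_sid:
            ds_guid, sids = self._map_request(mds_id, mds_sid)
            # Assumption: a successful reply maps the SID that was asked about.
            self.learn(mds_id, ds_guid, list(sids) + [mds_sid])
        return self._guid_by_sid[key]
```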
Data Types:
struct ds_guid_map
{
ds_guid ds_guid;
mds_sid mds_sid_array<>;
};
Arguments:
struct DS_MAP_MDSSIDargs
{
mds_sid mma_sid;
};
Results:
struct DS_MAP_MDSSIDresok
{
ds_guid_map guid_map;
};
union DS_MAP_MDSSIDres switch (ds_status status)
{
case DS_OK:
DS_MAP_MDSSIDresok resok;
default:
void;
};
Possible Error Codes:
DSERR_NOT_AUTH, DSERR_INVAL, DSERR_GRACE, DSERR_BAD_MDSSID
DS_MAP_MDS_DATASET_ID - Query root path.
This is a data server to MDS message to query the root path for a MDS dataset id.
Description:
The MDS dataset id is one that the data server has obtained from the filehandle. The MDS will return the root path of the file system that corresponds to the MDS dataset id.
Arguments:
struct DS_MAP_MDS_DATASET_IDargs
{
mds_dataset_id mds_dataset_id;
};
Results:
struct DS_MAP_MDS_DATASET_IDresok
{
ds_status status;
utf8string pathname;
};
union DS_MAP_MDS_DATASET_IDres switch (ds_status status)
{
case DS_OK:
DS_MAP_MDS_DATASET_IDresok res_ok;
default:
void;
};
DS_RENEW - Exchange boot instance verifier.
A message from the DS to MDS to test the connection to MDS and to detect restarts.
Description:
A message from the DS to MDS used to exchange a ds_verifier. This can be used to indicate to the MDS when a data server has restarted, and also to the DS when the MDS has restarted.
In issuing this message on a regular basis the DS can also detect when there is a network partition.
Arguments:
struct DS_RENEWargs
{
cp_id ds_id;
ds_verifier ds_boottime;
};
Results:
union DS_RENEWres switch (ds_status status)
{
case DS_OK:
ds_verifier mds_boottime;
default:
void;
};
Possible Error Codes:
DSERR_NOT_AUTH, DSERR_INVAL, DSERR_STALE_DSID
Notes:
When the DS or the MDS detects that a restart has occurred, the DS should drop all state that it has cached for that particular MDS, and the MDS should drop any state that it believes the DS is holding.
If the MDS has restarted, it will not know about the ds_id. In this case the MDS will return DSERR_STALE_DSID.
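A DS might act on a DS_RENEW reply along the following lines. The action names are hypothetical, and re-running DS_EXIBI after DSERR_STALE_DSID is an assumption rather than something the text mandates.

```python
from typing import Optional


def handle_renew_reply(status: str,
                       mds_boottime: Optional[int],
                       last_seen_boottime: Optional[int]) -> str:
    """Decide what the DS should do after a DS_RENEW reply.

    Returns one of the hypothetical actions "ok", "drop_state",
    "reregister" or "error".
    """
    if status == "DSERR_STALE_DSID":
        # The MDS no longer knows our ds_id, so it has restarted.
        # Assumed recovery: drop cached state and issue DS_EXIBI again.
        return "reregister"
    if status != "DS_OK":
        return "error"
    if last_seen_boottime is not None and mds_boottime != last_seen_boottime:
        # The verifier changed: the MDS restarted between renewals.
        return "drop_state"
    return "ok"
```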
DS_REPORTAVAIL - Data Server resource availability and status.
Issued by the data server informing the MDS of available storage (e.g. pNFS datasets), attributes of available storage (e.g. size available, size used) and IP Addresses.
Description:
The data server supplies the short-hand version ds_id obtained from DS_EXIBI so the MDS can locate the correct DS Identity. The DS and MDS communicate which set of storage attributes each understands by setting the ds_attrvers member of DS_REPORTAVAILargs and DS_REPORTAVAILres, respectively.
Data Types:
struct ds_zfsattr
{
utf8string attrname;
opaque attrvalue<>;
};
typedef uint32_t ds_addruse;
/* Intended use for the addrs */
const NFS = 0x00000001;
const DSERV = 0x00000002;
struct ds_addr
{
netaddr4 addr;
ds_addruse validuse;
};
struct ds_zfsguid
{
uint64_t zpool_guid;
uint64_t dataset_guid;
};
enum storage_type
{
ZFS = 1
};
union ds_guid switch (storage_type stor_type)
{
case ZFS:
opaque zfsguid<>;
default:
void;
};
struct ds_guid_map
{
ds_guid ds_guid;
mds_sid mds_sid_array<>;
};
struct ds_zfsinfo
{
ds_guid_map guid_map;
ds_zfsattr attrs<>;
};
union ds_storinfo switch (storage_type type)
{
case ZFS:
ds_zfsinfo zfs_info;
default:
void;
};
enum ds_attr_version
{
DS_ATTR_v1 = 1
};
Arguments:
struct DS_REPORTAVAILargs
{
cp_id ds_id;
ds_verifier ds_verifier;
struct ds_addr ds_addrs<>;
ds_attr_version ds_attrvers;
ds_storinfo ds_storinfo<>;
};
Results:
struct DS_REPORTAVAILresok
{
ds_attr_version ds_attrvers;
ds_guid_map guid_map<>;
};
union DS_REPORTAVAILres switch (ds_status status)
{
case DS_OK:
DS_REPORTAVAILresok res_ok;
default:
void;
};
Possible Error Codes:
DSERR_NOT_AUTH, DSERR_INVAL, DSERR_BAD_DSID
DS_SECINFO - Return security information
A message from the DS to the MDS to query object security flavors.
Description:
The data server needs to query the MDS for the security flavors in use for an object. In particular, when the NFS Client uses an invalid security flavor, the DS replies with the NFSv4 error NFS4ERR_WRONGSEC and the NFS Client then issues a SECINFO_NO_NAME.
Arguments:
struct DS_SECINFOargs
{
nfs_fh4 object;
netaddr4 *cl_addr;
};
union ds_secinfo switch (uint32_t flavor)
{
case RPCSEC_GSS:
rpcsec_gss_info flavor_info;
default:
void;
};
Results:
typedef ds_secinfo DS_SECINFOresok<>;
union DS_SECINFOres switch (ds_status status)
{
case DS_OK:
DS_SECINFOresok res_ok;
default:
void;
};
Possible Error Codes:
DSERR_NOT_AUTH, DSERR_INVAL, DSERR_STALE_FH, DSERR_BAD_FH
DS_SHUTDOWN - Orderly DS shutdown
This is a message from the DS to the MDS informing the MDS that the DS is in the process of an orderly shutdown.
Arguments:
struct DS_SHUTDOWNargs
{
cp_id ds_id;
};
Results:
struct DS_SHUTDOWNres
{
ds_status status;
};
Possible Error Codes:
DSERR_INVAL, DSERR_NOT_AUTH, DSERR_STALE_DSID
DS_FMATPT - transport FMA event(s)
This message is a DS to MDS message that will facilitate the transportation of FMA events to the MDS.
Arguments:
struct DS_FMATPTargs {
opaque fma_msg<>;
};
Results:
struct DS_FMATPTres {
ds_status status;
};
Possible Error Codes:
DSERR_INVAL, DSERR_NOTSUPP
MDS to DS: (PNFS_CTL_MDS / 104000)
DS_COMMIT - Commit write.
Issued by the MDS requesting that the DS write any non-stable data for the given filehandle, for the given file segments.
Data Types:
struct ds_fileseg
{
offset4 offset;
count4 count;
};
Arguments:
struct DS_COMMITargs {
nfs_fh4 fh;
ds_fileseg cmv<>;
};
Results:
struct DS_COMMITresok {
ds_verifier writeverf;
count4 count<>;
};
union DS_COMMITres switch (ds_status status) {
case DS_OK:
DS_COMMITresok resok;
default:
void;
};
Possible Error Codes:
DSERR_NOT_AUTH, DSERR_INVAL, DSERR_NOSPC, DSERR_BAD_FH
DS_LIST - List objects.
Get a list of objects from the DS matching the provided criteria.
Description:
Issued by the MDS asking the DS for a list of objects matching the provided criteria (either mds_dataset_id or MDS_SID).
This message is modeled after READDIR, so the MDS provides a starting offset cookie and a return buffer-size parameter.
The DS returns as many entries as will fit in the buffer-size parameter, starting from the provided cookie.
The information returned will be a counted array of filehandles and also the cookie, as defined in DS_LISTresok.
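The READDIR-style iteration described above might be driven from the MDS as in the sketch below. The text does not say how the end of the list is signalled or what the first cookie is, so this sketch assumes a starting cookie of 0 and treats an empty fh_list as the end. `ds_list` is a hypothetical hook that sends one DS_LIST call.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical hook: (cookie, maxcount) -> (fh_list, next_cookie)
ListCall = Callable[[int, int], Tuple[List[bytes], int]]


def list_all(ds_list: ListCall, maxcount: int = 8192) -> Iterator[bytes]:
    """Yield every filehandle by repeating DS_LIST with the returned cookie."""
    cookie = 0  # assumed starting cookie, as with READDIR
    while True:
        fh_list, cookie = ds_list(cookie, maxcount)
        if not fh_list:  # assumed end-of-list signal
            return
        yield from fh_list
```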
Data Types:
enum ds_list_type {
DS_LIST_MDS_SID,
DS_LIST_MDS_DATASET_ID
};
Arguments:
struct ds_list_sid_arg {
mds_sid mds_sid;
uint64_t cookie;
count4 maxcount;
};
struct ds_list_dataset_arg {
mds_dataset_id dataset_id;
uint64_t cookie;
count4 maxcount;
};
union DS_LISTargs switch (ds_list_type dla_type) {
case DS_LIST_MDS_SID:
ds_list_sid_arg sid;
case DS_LIST_MDS_DATASET_ID:
ds_list_dataset_arg dataset_id;
default:
void;
};
Results:
struct DS_LISTresok {
mds_ds_fh fh_list<>;
uint64_t cookie;
};
union DS_LISTres switch (ds_status status) {
case DS_OK:
DS_LISTresok res_ok;
default:
void;
};
Possible Error Codes:
DSERR_NOT_AUTH, DSERR_INVAL, DSERR_TOOSMALL, DSERR_NOENT
DS_READ - Read an object.
Issued by the MDS to read from the DS for a given DS filehandle, for the provided data segments.
Data Types:
struct ds_fileseg {
offset4 offset;
count4 count;
};
Arguments:
struct DS_READargs
{
nfs_fh4 fh;
count4 count;
ds_fileseg rdv<>;
};
Results:
struct ds_filesegbuf
{
offset4 offset;
opaque data<>;
};
struct DS_READresok {
bool eof;
count4 count;
ds_filesegbuf rdv<>;
};
union DS_READres switch (ds_status status) {
case DS_OK:
DS_READresok res_ok;
default:
void;
};
Possible Error Codes:
DSERR_NOT_AUTH, DSERR_INVAL, DSERR_BAD_FH, DSERR_NOENT
CTL_MDS_REMOVE - Remove.
From MDS to a DS, for the given filehandle(s) or MDS dataset id(s), remove the object(s).
Description:
Allows the MDS to remove one or more objects by filehandle or one or more MDS datasets from the DS.
Arguments:
enum ctl_mds_rm_type
{
CTL_MDS_RM_OBJ,
CTL_MDS_RM_MDS_DATASET_ID
};
union CTL_MDS_REMOVEargs switch (ctl_mds_rm_type type)
{
case CTL_MDS_RM_OBJ:
nfs_fh4 obj<>;
case CTL_MDS_RM_MDS_DATASET_ID:
mds_sid mds_sid;
mds_dataset_id dataset_id<>;
default:
void;
};
Results:
struct CTL_MDS_REMOVEres {
ds_status status;
};
Possible Error Codes:
DSERR_NOT_AUTH, DSERR_INVAL, DSERR_BAD_FH, DSERR_NOENT
DS_WRITE - Write an Object.
From the MDS to a DS, for the given filehandle, write the data segments.
Description:
This procedure provides a way for the MDS to write file data segments to the DS. This is most likely needed when the NFS Client performs writes through the MDS, either because of a network partition between the NFS Client and DS, or because the MDS has recalled/revoked the layout and the client was unable to write/commit data to the DS.
The filehandle used in the arguments MUST be obtained from the layout allocation.
Arguments:
struct ds_filesegbuf {
offset4 offset;
opaque data<>;
};
struct DS_WRITEargs
{
nfs_fh4 fh;
stable_how4 stable;
count4 count;
ds_filesegbuf wrv<>;
};
Results:
struct DS_WRITEresok
{
stable_how4 committed;
ds_verifier writeverf;
count4 wrv<>;
};
union DS_WRITEres switch (ds_status status)
{
case DS_OK:
DS_WRITEresok res_ok;
default:
void;
};
Possible Error Codes:
DSERR_NOT_AUTH, DSERR_INVAL, DSERR_BAD_FH
DS_STAT - Inquire on status of a data server object
From the MDS to a data server, to return information on the requested object.
Arguments:
struct DS_STATargs
{
nfs_fh4 object;
};
Results:
union DS_STATres switch (ds_status status)
{
case DS_OK:
ds_attr dattr;
default:
void;
};
DS_GETATTR - Get attributes for a data server object
Issued by a MDS to a DS in order to query the attributes of a DS object.
Description:
This may be used when a MDS wants to verify the attributes (for example, size) provided by a client upon LAYOUTCOMMIT.
Arguments:
struct DS_GETATTRargs
{
nfs_fh4 fh;
ds_attr dattrs;
};
Results:
union DS_GETATTRres switch (ds_status status)
{
case DS_OK:
ds_attr dattrs;
default:
void;
};
Possible Error Codes:
DSERR_INVAL, DSERR_ATTR_NOTSUPP, DSERR_NOENT
DS_SETATTR - Set attributes for a data server object
Issued by a MDS to a DS to set attributes of a DS object.
Description:
This procedure allows a MDS to set attributes of the specified DS object. When an NFS Client truncates the file size down at the MDS, the MDS MUST issue a DS_SETATTR of the size to all the affected DSs. The DS_SETATTR messages can be issued in parallel; however, the MDS must not reply to the NFS Client until all the DS_SETATTRs have completed.
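The ordering rule above (issue in parallel, reply only after every DS has answered) can be sketched with a thread pool. `send_setattr` is a hypothetical hook that issues one DS_SETATTR and returns the ds_status name. The per-DS filehandle pairing is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple

# Hypothetical hook: (data_server, filehandle, new_size) -> status name
SendSetattr = Callable[[str, bytes, int], str]


def truncate_everywhere(targets: Sequence[Tuple[str, bytes]],
                        send_setattr: SendSetattr,
                        new_size: int) -> bool:
    """Send DS_SETATTR(size) to every affected DS and wait for all replies.

    Returns True only if every DS answered DS_OK. The caller replies to
    the NFS Client after this function returns, never before.
    """
    if not targets:
        return True
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = [pool.submit(send_setattr, ds, fh, new_size)
                   for ds, fh in targets]
        statuses = [f.result() for f in futures]  # blocks until all complete
    return all(status == "DS_OK" for status in statuses)
```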
Arguments:
struct DS_SETATTRargs
{
nfs_fh4 fh;
ds_attr dattrs;
};
Results:
struct DS_SETATTRres
{
ds_status status;
};
Possible Error Codes:
DSERR_INVAL, DSERR_ATTR_NOTSUPP, DSERR_NOENT
DS_SNAP - Snapshot a dataset on the DS that is related to the given MDS Dataset ID.
Arguments:
struct DS_SNAPargs
{
mds_dataset_id dataset_id;
};
Results:
struct DS_SNAPresok
{
utf8string name;
};
union DS_SNAPres switch (ds_status status)
{
case DS_OK:
DS_SNAPresok resok;
default:
void;
};
Possible Error Codes:
DSERR_NOTSUPP, DSERR_INVAL
DS_INVALIDATE - Invalidate state.
Issued by the MDS to a DS to invalidate state that the DS may hold.
Description:
The MDS may wish to invalidate state at the DS due to many factors. The DS_INVALIDATE message carries a type to indicate the scope of the invalidation, and the DS uses the type to interpret the associated parameter.
Examples of when the different invalidate types will be used are as follows:
- DS_INVALIDATE_ALL - Used on nfs/server service offlined on the MDS
- DS_INVALIDATE_CLIENTID - Used on client lease (with the MDS) expiration
- DS_INVALIDATE_LAYOUT_BY_CLIENT - Used on LAYOUTRETURN when invalidating all layouts held by a client
- DS_INVALIDATE_LAYOUT_BY_FH - Used on REMOVE issued from client to MDS
- DS_INVALIDATE_LAYOUT_BY_STATEID - Used on LAYOUTRETURN when invalidating a specific layout held by a client
- DS_INVALIDATE_MDS_DATASET_ID - Used on unshare of MDS FS
- DS_INVALIDATE_STATEID - Used on CLOSE, UNLOCK, DELEGRETURN issued from client to MDS
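One way a DS could interpret the invalidation types listed above is to treat each type as a predicate over its cached state. The `CachedState` record and its fields are hypothetical, and the argument shapes (a bare value or a pair) are an assumed encoding of the associated parameter for each type.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class CachedState:
    """Hypothetical record of one piece of state cached at the DS."""
    clientid: int
    stateid: bytes
    fh: bytes
    dataset_id: bytes
    is_layout: bool


def matches(kind: str, arg: Any, st: CachedState) -> bool:
    """True if state `st` falls within the scope of the invalidation."""
    if kind == "DS_INVALIDATE_ALL":
        return True
    if kind == "DS_INVALIDATE_CLIENTID":
        return st.clientid == arg
    if kind == "DS_INVALIDATE_LAYOUT_BY_CLIENT":
        clid, fh = arg
        return st.is_layout and st.clientid == clid and st.fh == fh
    if kind == "DS_INVALIDATE_LAYOUT_BY_FH":
        return st.is_layout and st.fh == arg
    if kind == "DS_INVALIDATE_LAYOUT_BY_STATEID":
        stateid, fh = arg
        return st.is_layout and st.stateid == stateid and st.fh == fh
    if kind == "DS_INVALIDATE_MDS_DATASET_ID":
        return st.dataset_id == arg
    if kind == "DS_INVALIDATE_STATEID":
        stateid, fh = arg
        return (not st.is_layout) and st.stateid == stateid and st.fh == fh
    raise ValueError(f"unknown invalidate type: {kind}")


def invalidate(table: List[CachedState], kind: str, arg: Any) -> List[CachedState]:
    """Return the state that remains after applying one DS_INVALIDATE."""
    return [st for st in table if not matches(kind, arg, st)]
```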
Arguments:
enum ds_invalidate_type
{
DS_INVALIDATE_ALL,
DS_INVALIDATE_CLIENTID,
DS_INVALIDATE_LAYOUT_BY_CLIENT,
DS_INVALIDATE_LAYOUT_BY_FH,
DS_INVALIDATE_LAYOUT_BY_STATEID,
DS_INVALIDATE_MDS_DATASET_ID,
DS_INVALIDATE_STATEID
};
struct ds_inval_layout_by_clid
{
clientid4 mds_clid;
nfs_fh4 fh;
};
struct ds_inval_layout_by_lo_stateid
{
stateid4 stateid; /* MUST be layout stateid */
nfs_fh4 fh;
};
struct ds_inval_stateid
{
stateid4 stateid; /* MUST be open, lock or delegation stateid */
nfs_fh4 fh;
};
union DS_INVALIDATEargs switch (ds_invalidate_type obj)
{
case DS_INVALIDATE_ALL:
void;
case DS_INVALIDATE_CLIENTID:
clientid4 clid;
case DS_INVALIDATE_LAYOUT_BY_CLIENT:
ds_inval_layout_by_clid layout;
case DS_INVALIDATE_LAYOUT_BY_FH:
nfs_fh4 fh;
case DS_INVALIDATE_LAYOUT_BY_STATEID:
ds_inval_layout_by_lo_stateid stateid;
case DS_INVALIDATE_MDS_DATASET_ID:
mds_dataset_id dataset_id;
case DS_INVALIDATE_STATEID:
ds_inval_stateid stateid;
};
Results:
struct DS_INVALIDATEres
{
ds_status status;
};
Possible Error Codes:
DSERR_NOT_AUTH, DSERR_INVAL, DSERR_NOENT
Notes:
DS_INVALIDATE_ALL
Invalidates all state at the data server. This is perhaps sent when the MDS knows it is in the process of an orderly shutdown.
DS_INVALIDATE_LAYOUT_BY_CLIENT
To invalidate all layout segments held at the DS for this particular filehandle clientid combination. The MDS could use this message when a client returns the layout via OP_LAYOUTRETURN, or when the MDS has revoked the layout from the client.
DS_INVALIDATE_MDS_DATASET_ID
To invalidate all state for a MDS dataset id. The MDS would use this message for the un-share (or re-share) of a filesystem.
DS_INVALIDATE_STATEID
To invalidate state for the specified stateid. The MDS would use this when the client has issued either OP_CLOSE or OP_DELEGRETURN. The MDS could also use this message when it has revoked a delegation.
DS_INVALIDATE_CLIENTID
To invalidate all state for a particular clientid. The MDS would use this message once the client has lost its lease with the MDS, or the client has issued an explicit OP_DESTROY_CLIENTID.
DS_PNFSSTAT - Collect kstats
This is a message from the MDS to a DS asking the DS to report all pnfs/rpc kstats.
The indices will be reported as nvpairs. The name will possibly be constructed from the corresponding kstat <class><name> <statistic>_<instance>
Arguments:
/* RPC kstats */
const DS_NFSSTAT_RPC = 0x000000001;
/* NFS kstats */
const DS_NFSSTAT_NFS = 0x000000002;
/* the DMOV protocol kstats */
const DS_NFSSTAT_DMOV = 0x000000004;
/* the control protocol kstats */
const DS_NFSSTAT_CP = 0x000000008;
/* CPU kstat (all stats for module cpu)*/
const DS_NFSSTAT_CPU = 0x000000010;
struct DS_PNFSSTATargs
{
uint64_t stat_wanted;
};
Results:
struct DS_PNFSSTATresok
{
opaque nvlist<>;
};
union DS_PNFSSTATres switch (ds_status status)
{
case DS_OK:
DS_PNFSSTATresok res_ok;
default:
void;
};
Possible Error Codes:
DSERR_INVAL, DSERR_NOTSUPP
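The DS_NFSSTAT_* constants in the DS_PNFSSTAT arguments are distinct powers of two, which suggests that stat_wanted is a bit mask allowing several kstat groups to be requested in one call. Under that assumption, a request could be built and decoded as sketched here. The `wanted_names` helper is hypothetical.

```python
from typing import List

# Values copied from the DS_PNFSSTAT argument constants.
DS_NFSSTAT_RPC = 0x1
DS_NFSSTAT_NFS = 0x2
DS_NFSSTAT_DMOV = 0x4
DS_NFSSTAT_CP = 0x8
DS_NFSSTAT_CPU = 0x10

_GROUPS = {
    DS_NFSSTAT_RPC: "RPC",
    DS_NFSSTAT_NFS: "NFS",
    DS_NFSSTAT_DMOV: "DMOV",
    DS_NFSSTAT_CP: "CP",
    DS_NFSSTAT_CPU: "CPU",
}


def wanted_names(stat_wanted: int) -> List[str]:
    """List the kstat groups selected by a stat_wanted mask, lowest bit first."""
    return [name for bit, name in sorted(_GROUPS.items()) if stat_wanted & bit]
```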
DS_CHANGE_DATASET_ID - Notify the data server that a change to the dataset ID on the MDS has been made (e.g. as a result of a zfs send/recv of a pNFS file system on the MDS).
Arguments:
struct DS_CHANGE_DATASET_IDargs
{
mds_sid mds_sid_owner;
mds_dataset_id old_dataset_id;
mds_dataset_id new_dataset_id;
};
Results:
struct DS_CHANGE_DATASET_IDres
{
ds_status status;
};
DS_OBJ_MOVE_STATUS - To inquire on the status of a move
Since MDS supplies a task id for the data server move, it can inquire as to the status of the move.
Arguments:
struct DS_OBJ_MOVE_STATUSargs
{
uint64_t taskid;
};
Results:
struct DS_OBJ_MOVE_STATUSresok
{
uint64_t maxoffset;
bool complete;
};
union DS_OBJ_MOVE_STATUSres switch (ds_status status) {
case DS_OK:
DS_OBJ_MOVE_STATUSresok res_ok;
default:
void;
};
Possible Error Codes:
DSERR_INVAL, DSERR_NOTSUPP
DS_OBJ_MOVE_ABORT - To stop an in-progress move.
Aborts the in-progress move identified by the given taskid.
Arguments:
struct DS_OBJ_MOVE_ABORTargs
{
uint64_t taskid;
};
Results:
struct DS_OBJ_MOVE_ABORTres
{
ds_status status;
};
Possible Error Codes:
DSERR_INVAL, DSERR_NOTSUPP, DSERR_NOMATCHING_TASKID
DS_OBJ_MOVE - To initiate a data server to data server move
From the MDS to a data server, a message to instruct the data server to move the file's data blocks for a given FH.
The MDS supplies source and destination FH and target data server and taskid.
Arguments:
struct DS_OBJ_MOVEargs
{
uint64_t taskid;
nfs_fh4 source;
nfs_fh4 target;
netaddr4 targetserver;
};
Results:
struct DS_OBJ_MOVEres
{
ds_status status;
};
Possible Error Codes:
DSERR_INVAL, DSERR_NOTSUPP
Object move/relocation: (PNFS_CTL_MV / 104002)
The PNFS_CTL_MV protocol is exclusively for moving data from one data server to another. This would occur as the result of the MDS making a request (DS_OBJ_MOVE) via the PNFS_CTL_MDS protocol.
MOVE - Move data from one data server to another.
Arguments:
struct MOVEargs {
uint64_t taskid;
uint64_t minoffset;
uint64_t maxoffset;
uint32_t maxbytes;
};
Results:
struct MOVEsegment {
uint64_t fileoffset;
uint32_t len;
};
struct MOVEres {
ds_status status;
opaque data<>;
struct MOVEsegment segments<>;
};
Possible Error Codes:
DSERR_NOENT
Notes:
The MOVE message is roughly equivalent to an NFS read, but with other properties. It is specially designed to deal with objects with holes.
MOVE is pull-based, rather than push-based. That is, the target data server contacts the source data server to ask it for data. NFS/RDMA has shown better performance with READ than WRITE. Pull-based movement also avoids the problem of the stability of the data on the target server: there is no need for write verifiers and the like.
The disadvantage of pull-based data movement is that only the source data server is aware of any holes in the source object. The dmov protocol reflects this: the result of a MOVE operation is a block of data and an array of offsets and lengths. The arguments give a minimum and maximum file offset, which enables multiple MOVE operations to be in flight, since the target server can arrange for non-overlapping MOVEs.
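The sketch below shows how a target data server might apply one MOVE result and carve an object into non-overlapping requests. It assumes that the data block is the concatenation of the listed segments in order, and that offset ranges are half-open. Neither detail is stated in the text, and both helpers are hypothetical.

```python
from typing import List, Sequence, Tuple


def apply_move_result(target: bytearray,
                      data: bytes,
                      segments: Sequence[Tuple[int, int]]) -> None:
    """Write each (fileoffset, len) segment of `data` into `target`.

    Gaps between segments are holes in the source object. They are
    left as zero bytes here.
    """
    pos = 0
    for fileoffset, length in segments:
        end = fileoffset + length
        if end > len(target):
            target.extend(b"\0" * (end - len(target)))
        target[fileoffset:end] = data[pos:pos + length]
        pos += length
    if pos != len(data):
        raise ValueError("segments do not account for all returned data")


def split_ranges(size: int, chunk: int) -> List[Tuple[int, int]]:
    """Non-overlapping (minoffset, maxoffset) pairs covering [0, size)."""
    return [(lo, min(lo + chunk, size)) for lo in range(0, size, chunk)]
```

Each pair from `split_ranges` could back one in-flight MOVE, since no two requests cover the same bytes.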
Error codes (ds_status)
The control protocol carries a status word in the reply to convey successful execution of the request or to report a problem. The status word uses an enumeration 'ds_status' and currently has the following assigned values.
- DS_OK
The request completed successfully.
- DSERR_NOENT
The item does not exist.
- DSERR_NOT_AUTH
The DS has been blocked from accessing the MDS through system administration.
- DSERR_INVAL
One or more of the arguments provided in the RPC Call are invalid.
- DSERR_IO
The server encountered an I/O error with the underlying data store.
- DSERR_ACCESS
During the DS_CHECKSTATE validation it was determined that the client does not have the appropriate access rights.
- DSERR_NOSPC
The MDS issued a DS_WRITE to a DS and the DS has no space left.
- DSERR_STALE_FH
The provided filehandle is stale.
- DSERR_BADHANDLE
The format of the filehandle was not recognized.
- DSERR_BAD_COOKIE
The MDS presented an invalid cookie to the DS in the DS_LIST message.
- DSERR_NOTSUPP
The control protocol message you issued has yet to be implemented.
- DSERR_TOOSMALL
The DS_LIST results will not fit into the provided size.
- DSERR_EXPIRED
- DSERR_GRACE
The MDS is in the process of allowing clients to reclaim state after the MDS restarted.
- DSERR_FHEXPIRED
The generation count within the filehandle has been incremented.
- DSERR_WRONGSEC
The security flavors for DS_CHECKSTATE do not match the required security flavor for the target object.
- DSERR_RESOURCE
The MDS has no available resource to complete the requested operation.
- DSERR_STALE_DSID
The ds_id used is not known to the MDS.
- DSERR_STALE_STATEID
- DSERR_SERVERFAULT
The data server encountered an unrecoverable error.
- DSERR_OLD_STATEID
- DSERR_BAD_MDSSID
The MDS was unable to provide a mapping for the requested MDS_SID.
- DSERR_PNFS_NO_LAYOUT
The client, identified by the co_owner member of DS_CHECKSTATEargs, that contacted the DS for I/O does not hold a layout.
- DSERR_NOMATCHING_TASKID
Returned from DS_OBJ_MOVE_STATUS and DS_OBJ_MOVE_ABORT if the taskid given is not recognized by the DS.
- DSERR_ATTR_NOTSUPP
Attribute is not supported.