Glusterfs NFS Module Source Code Analysis (Final Part): RPC Implementation of the NFS Protocol and NFS Protocol Contents

My Sina Weibo: http://weibo.com/freshairbrucewoo

Feel free to get in touch and exchange ideas so we can all improve together.

6. RPC Implementation of the NFS Protocol

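Because the port the NFS server listens on is not fixed when it starts, the NFS server registers its port with the RPC service (the portmapper), and clients learn the listening port by sending an RPC query. This section analyses the whole RPC handling path. Suppose a client's RPC request has just reached the server. From the earlier analysis of NFS protocol initialization, all data read/write events are handled in nfs_rpcsvc_conn_data_handler; since this is request data sent by the client, the EPOLLIN branch runs, and those events are handled in nfs_rpcsvc_conn_data_poll_in, which is implemented as follows: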

      
```c
int
nfs_rpcsvc_conn_data_poll_in (rpcsvc_conn_t *conn)
{
        ssize_t         dataread = -1;
        size_t          readsize = 0;
        char            *readaddr = NULL;
        int             ret = -1;

        /* Address in the RPC record buffer where reading should start */
        readaddr = nfs_rpcsvc_record_read_addr (&conn->rstate);
        /* Number of bytes the RPC record still needs */
        readsize = nfs_rpcsvc_record_read_size (&conn->rstate);
        /* Read the record data from the socket */
        dataread = nfs_rpcsvc_socket_read (conn->sockfd, readaddr, readsize);

        if (dataread > 0)
                /* Advance the record state according to the data just read */
                ret = nfs_rpcsvc_record_update_state (conn, dataread);

        return ret;
}
```
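nfs_rpcsvc_conn_data_poll_in reads exactly as many bytes as the record state asks for, and on a TCP connection the first thing read for each fragment is the 4-byte record-marking header that ONC RPC uses on stream transports (RFC 5531, section 11): the top bit marks the last fragment of a record and the low 31 bits give the fragment length. The standalone sketch below decodes and encodes such a header; it is not GlusterFS code, and rm_header, rm_decode and rm_encode are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Record-marking header for ONC RPC over stream transports (RFC 5531, sec. 11). */
#define RM_LASTFRAG_BIT 0x80000000u  /* set on the final fragment of a record */
#define RM_LENGTH_MASK  0x7fffffffu  /* low 31 bits: fragment length in bytes */

/* Hypothetical decoded form of the 4-byte header. */
struct rm_header {
        bool     last;    /* true if this is the last fragment of the record */
        uint32_t length;  /* number of payload bytes that follow the header */
};

/* Decode the big-endian header held in the first 4 bytes of buf. */
static struct rm_header rm_decode(const unsigned char buf[4])
{
        uint32_t word = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                        ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
        struct rm_header h = {
                .last   = (word & RM_LASTFRAG_BIT) != 0,
                .length = word & RM_LENGTH_MASK,
        };
        return h;
}

/* Encode a header into buf in network (big-endian) byte order. */
static void rm_encode(unsigned char buf[4], bool last, uint32_t length)
{
        assert(length <= RM_LENGTH_MASK);  /* length must fit in 31 bits */
        uint32_t word = length | (last ? RM_LASTFRAG_BIT : 0u);
        buf[0] = (unsigned char)(word >> 24);
        buf[1] = (unsigned char)(word >> 16);
        buf[2] = (unsigned char)(word >> 8);
        buf[3] = (unsigned char)word;
}
```

A single-fragment record of 128 bytes therefore starts with the bytes 0x80 0x00 0x00 0x80.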

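nfs_rpcsvc_conn_data_poll_in first uses the receive state kept in the RPC service record (conn->rstate) to decide what to read next. Incoming data comes in two kinds: the fragment header and the RPC message proper. The length of the RPC message is given in the header, so the header is received first and the RPC call message after it; data arriving in any other order is treated as an error. The function then reads the required number of bytes from the socket into the record state structure and hands them to nfs_rpcsvc_record_update_state. That function drives the rest of the RPC processing based on the data read: XDR (External Data Representation) decoding, working out from the message which procedure is being called, and executing that procedure. It is implemented as follows: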

      
```c
int
nfs_rpcsvc_record_update_state (rpcsvc_conn_t *conn, ssize_t dataread)
{
        rpcsvc_record_state_t   *rs = NULL;
        rpcsvc_t                *svc = NULL;

        rs = &conn->rstate;
        /* Is the record state expecting a fragment header? */
        if (nfs_rpcsvc_record_readfraghdr (rs))
                /* Consume the fragment header */
                dataread = nfs_rpcsvc_record_update_fraghdr (rs, dataread);

        /* Is the record state expecting the fragment data that follows? */
        if (nfs_rpcsvc_record_readfrag (rs)) {
                /* Should this fragment be read in vectored mode? */
                if ((dataread > 0) && (nfs_rpcsvc_record_vectored (rs))) {
                        /* Handle vectored fragment data */
                        dataread = nfs_rpcsvc_handle_vectored_frag (conn, dataread);
                } else if (dataread > 0) {
                        /* Update the record's fragment data */
                        dataread = nfs_rpcsvc_record_update_frag (rs, dataread);
                }
        }

        /* Expecting a header again and the last fragment has been received:
         * a complete RPC record is available */
        if ((nfs_rpcsvc_record_readfraghdr (rs)) && (rs->islastfrag)) {
                /* Process the RPC call */
                nfs_rpcsvc_handle_rpc_call (conn);
                /* Get the RPC service object (rpcsvc_t) this connection belongs to */
                svc = nfs_rpcsvc_conn_rpcsvc (conn);
                /* Reset the record state for the next record */
                nfs_rpcsvc_record_init (rs, svc->ctx->iobuf_pool);
        }

        return 0;
}
```
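nfs_rpcsvc_record_update_state is essentially a two-phase state machine: read a fragment header, read that many payload bytes, and dispatch once the last fragment is complete. The standalone sketch below mimics that cycle outside GlusterFS. rec_state, rec_wanted and rec_feed are hypothetical names that only loosely mirror rpcsvc_record_state_t, nfs_rpcsvc_record_read_size and nfs_rpcsvc_record_update_state; vectored reads and iobuf handling are left out, and the REC_MAX cap is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_MAX 4096  /* hypothetical upper bound on a reassembled record */

enum rec_phase  { READ_FRAGHDR, READ_FRAG };
enum rec_result { REC_MORE, REC_DONE, REC_TOOBIG };

/* Hypothetical, simplified stand-in for rpcsvc_record_state_t. */
struct rec_state {
        enum rec_phase phase;
        unsigned char  hdr[4];          /* fragment header bytes received so far */
        size_t         hdrgot;
        uint32_t       fragleft;        /* payload bytes still missing in this fragment */
        int            islastfrag;      /* last-fragment bit from the current header */
        unsigned char  record[REC_MAX]; /* payload of all fragments, concatenated */
        size_t         recordsize;
};

static void rec_init(struct rec_state *rs)
{
        memset(rs, 0, sizeof(*rs));
        rs->phase = READ_FRAGHDR;
}

/* Bytes to read next; reading no more than this means one read never
 * straddles a header and its payload. */
static size_t rec_wanted(const struct rec_state *rs)
{
        return rs->phase == READ_FRAGHDR ? 4 - rs->hdrgot : rs->fragleft;
}

/* Account for n freshly read bytes (n <= rec_wanted(rs)). */
static enum rec_result rec_feed(struct rec_state *rs,
                                const unsigned char *data, size_t n)
{
        assert(n <= rec_wanted(rs));
        if (rs->phase == READ_FRAGHDR) {
                memcpy(rs->hdr + rs->hdrgot, data, n);
                rs->hdrgot += n;
                if (rs->hdrgot < 4)
                        return REC_MORE;        /* header still incomplete */
                uint32_t w = ((uint32_t)rs->hdr[0] << 24) | ((uint32_t)rs->hdr[1] << 16) |
                             ((uint32_t)rs->hdr[2] << 8)  |  (uint32_t)rs->hdr[3];
                rs->hdrgot     = 0;
                rs->islastfrag = (w & 0x80000000u) != 0;
                rs->fragleft   = w & 0x7fffffffu;
                if (rs->fragleft > REC_MAX - rs->recordsize)
                        return REC_TOOBIG;      /* refuse oversized records */
                rs->phase = READ_FRAG;
        } else {
                memcpy(rs->record + rs->recordsize, data, n);
                rs->recordsize += n;
                rs->fragleft   -= (uint32_t)n;
        }
        if (rs->phase == READ_FRAG && rs->fragleft == 0)
                rs->phase = READ_FRAGHDR;       /* fragment done: a header comes next */
        /* Same condition as in nfs_rpcsvc_record_update_state: expecting a
         * header again and the fragment just finished was the last one. */
        if (rs->phase == READ_FRAGHDR && rs->islastfrag)
                return REC_DONE;
        return REC_MORE;
}
```

A caller would loop: read up to rec_wanted() bytes from the socket, pass them to rec_feed(), and on REC_DONE decode and dispatch rs->record before calling rec_init() again, the same order as nfs_rpcsvc_handle_rpc_call followed by nfs_rpcsvc_record_init.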

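nfs_rpcsvc_record_update_state first reads the fragment header of the incoming data and updates the record state, then, according to the updated state, goes on to read the data that follows the header. There are two cases. Usually the first thing to arrive is the header, which records the length of the message to read next, that is, the length of the actual RPC call message; so the next data to arrive is the message itself, and each kind is handled differently. Processing the header mainly prepares for receiving the actual message (for example, recording its length and type). When only a header has been read, the RPC call handler is not run, because the call can only be processed once the RPC call message has been received. Next we analyse nfs_rpcsvc_handle_rpc_call, which processes the RPC call and is the core of the whole flow. It is implemented as follows: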

      
```c
int
nfs_rpcsvc_handle_rpc_call (rpcsvc_conn_t *conn)
{
        rpcsvc_actor_t          *actor = NULL;
        rpcsvc_request_t        *req = NULL;
        int                     ret = -1;

        /* Create an RPC request object from the received record */
        req = nfs_rpcsvc_request_create (conn);
        /* Was the RPC request accepted? */
        if (!nfs_rpcsvc_request_accepted (req))
                ;       /* handling elided in this excerpt */

        /* Get the descriptor (actor) of the procedure being called */
        actor = nfs_rpcsvc_program_actor (req);
        if ((actor) && (actor->actor)) {
                /* Set THIS to the translator (xlator) associated with the request */
                THIS = nfs_rpcsvc_request_actorxl (req);
                /* Increment the connection's reference count */
                nfs_rpcsvc_conn_ref (conn);
                /* Invoke the procedure handler */
                ret = actor->actor (req);
        }

        return ret;
}
```
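nfs_rpcsvc_handle_rpc_call calls nfs_rpcsvc_conn_ref before invoking the actor. The excerpt does not show where the matching release happens; a common reason for this pattern (an assumption here) is that an actor may send its reply later, after the handler has returned, so the connection must stay alive until then. The sketch below shows that reference-counting pattern with C11 atomics. conn_t, conn_new, conn_ref and conn_unref are hypothetical, and GlusterFS may implement its counter differently (for example under a mutex).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical connection object with a reference count. */
typedef struct {
        atomic_int refcount;  /* owners: the poller plus any in-flight requests */
        int        sockfd;
} conn_t;

static conn_t *conn_new(int sockfd)
{
        conn_t *c = malloc(sizeof(*c));
        if (c == NULL)
                return NULL;
        atomic_init(&c->refcount, 1);  /* initial reference held by the creator */
        c->sockfd = sockfd;
        return c;
}

/* Take a reference before handing the connection to an actor. */
static conn_t *conn_ref(conn_t *c)
{
        atomic_fetch_add(&c->refcount, 1);
        return c;
}

/* Drop a reference and free the object when the last one goes.
 * Returns true if the object was freed. */
static bool conn_unref(conn_t *c)
{
        int old = atomic_fetch_sub(&c->refcount, 1);
        assert(old > 0);  /* an unref without a matching ref is a bug */
        if (old == 1) {
                free(c);
                return true;
        }
        return false;
}
```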

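nfs_rpcsvc_handle_rpc_call first creates an RPC request object from the connection state, then uses that request to obtain the descriptor of the RPC procedure being called (the actor), and finally uses the descriptor to execute the specific remote call. Next, let's see how the request object is created from the connection state. nfs_rpcsvc_request_create is implemented as follows: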

      
```c
rpcsvc_request_t *
nfs_rpcsvc_request_create (rpcsvc_conn_t *conn)
{
        char                    *msgbuf = NULL;
        struct rpc_msg          rpcmsg;
        struct iovec            progmsg;        /* RPC Program payload */
        rpcsvc_request_t        *req = NULL;
        int                     ret = -1;
        rpcsvc_program_t        *program = NULL;

        /* Take a request object from the memory pool and zero it
         * (a macro: it assigns req directly) */
        nfs_rpcsvc_alloc_request (conn, req);
        /* Buffer holding the received message (the active I/O buffer) */
        msgbuf = iobuf_ptr (conn->rstate.activeiob);

        /* Decode the XDR-encoded message into an RPC call structure */
        ret = nfs_xdr_to_rpc_call (msgbuf, conn->rstate.recordsize, &rpcmsg,
                                   &progmsg, req->cred.authdata, req->verf.authdata);

        /* Initialise the request object from the decoded message */
        nfs_rpcsvc_request_init (conn, &rpcmsg, progmsg, req);
        /* Is this RPC protocol version supported? Only version 2 is */
        if (nfs_rpc_call_rpcvers (&rpcmsg) != 2) {
                ;       /* error handling elided in this excerpt */
        }

        /* Find the registered program matching the request's program
         * number and version */
        ret = __nfs_rpcsvc_program_actor (req, &program);
        req->program = program;
        /* Run the authentication check on the request */
        ret = nfs_rpcsvc_authenticate (req);
        /* Was the request rejected by authentication? */
        if (ret == RPCSVC_AUTH_REJECT) {
                ;       /* error handling elided in this excerpt */
        }

        return req;
}
```
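nfs_rpcsvc_request_create relies on nfs_xdr_to_rpc_call to decode the call header and then checks that the RPC version is 2. As an illustration of what that decoding involves, the sketch below pulls the fixed part of an ONC RPC call (RFC 5531: xid, message type, RPC version, program, program version, procedure) out of a big-endian buffer and applies the same version check. It stops before the credential and verifier fields, and struct rpc_call_hdr and decode_call_hdr are hypothetical names, not the GlusterFS or Sun RPC API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RPC_MSG_CALL 0u  /* msg_type CALL in RFC 5531 */
#define RPC_VERSION  2u  /* the only ONC RPC version in use */

/* Hypothetical fixed part of an RPC call header (before cred/verf). */
struct rpc_call_hdr {
        uint32_t xid, mtype, rpcvers, prog, vers, proc;
};

enum call_status { CALL_OK, CALL_SHORT, CALL_NOT_CALL, CALL_RPC_MISMATCH };

/* XDR integers are 4 bytes, most significant byte first. */
static uint32_t get_u32(const unsigned char *p)
{
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static enum call_status decode_call_hdr(const unsigned char *buf, size_t len,
                                        struct rpc_call_hdr *out)
{
        assert(buf != NULL && out != NULL);
        if (len < 24)
                return CALL_SHORT;              /* need six 4-byte words */
        out->xid     = get_u32(buf);
        out->mtype   = get_u32(buf + 4);
        out->rpcvers = get_u32(buf + 8);
        out->prog    = get_u32(buf + 12);
        out->vers    = get_u32(buf + 16);
        out->proc    = get_u32(buf + 20);
        if (out->mtype != RPC_MSG_CALL)
                return CALL_NOT_CALL;
        if (out->rpcvers != RPC_VERSION)
                return CALL_RPC_MISMATCH;       /* a server replies RPC_MISMATCH */
        return CALL_OK;
}
```

For example, an NFSv3 call carries program number 100003 (0x000186A3) and version 3.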

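After nfs_rpcsvc_request_create returns, req->program holds the descriptor of the RPC program with the matching version. The descriptor of the specific remote procedure is then obtained from that program, which is done by the following function: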

      
```c
rpcsvc_actor_t *
nfs_rpcsvc_program_actor (rpcsvc_request_t *req)
{
        int                     err = SYSTEM_ERR;
        rpcsvc_actor_t          *actor = NULL;

        /* Index the program's actor table by procedure number */
        actor = &req->program->actors[req->procnum];
        return actor;
}
```
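nfs_rpcsvc_program_actor indexes the program's actor array directly by procedure number, and the excerpt shows no range check on req->procnum. A standalone sketch of the same table-driven dispatch with that check added follows. The types and names are hypothetical, modelled loosely on rpcsvc_actor_t; entry 0 is the NULL procedure that, by ONC RPC convention, every program defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request and actor types, loosely modelled on rpcsvc_actor_t. */
struct request { unsigned procnum; };
typedef int (*actor_fn)(struct request *req);

struct actor {
        const char *procname;
        actor_fn    actor;      /* NULL if the procedure is not implemented */
};

struct program {
        const char   *progname;
        struct actor *actors;   /* indexed by procedure number */
        size_t        numactors;
};

static int proc_null(struct request *req)    { (void)req; return 0; }
static int proc_getattr(struct request *req) { (void)req; return 1; }

static struct actor demo_actors[] = {
        { "NULL",    proc_null },
        { "GETATTR", proc_getattr },
        { "SETATTR", NULL },    /* declared but unimplemented */
};

static struct program demo_prog = {
        "DEMO", demo_actors, sizeof demo_actors / sizeof demo_actors[0]
};

/* Look up the actor for req->procnum; NULL if out of range or unimplemented. */
static struct actor *program_actor(struct program *prog, struct request *req)
{
        assert(prog != NULL && req != NULL);
        if (req->procnum >= prog->numactors)
                return NULL;            /* a server would answer PROC_UNAVAIL */
        struct actor *a = &prog->actors[req->procnum];
        return a->actor ? a : NULL;
}
```

Returning NULL for a missing handler lets the caller keep the single `if (actor)` test; the original code instead checks both `actor` and `actor->actor`.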

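The actor obtained here is returned to the caller, which then executes the remote procedure. At that point one complete RPC call, and with it one NFS request, has been served, and the NFS server waits for the next request. The path has plenty of twists and takes quite a detour. The figure below describes the whole process: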

[Figure: the complete RPC request-processing flow in the Glusterfs NFS module]

Appendix 1 NFS Protocol Family

NFS Protocol Family

The NFS protocol suite includes the following protocols:
MNTV1	Mount protocol version 1, for NFS version 2
Mntv3	Mount protocol version 3, for NFS version 3
NFS2	Sun Network File system version 2
NFS3	Sun Network File system version 3
NFSv4	Sun Network File system version 4
NLMv4	Network Lock Manager version 4
NSMv1	Network Status Monitor protocol

MNTV1 ： ftp://ftp.rfc-editor.org/in-notes/rfc1094.txt .
    The Mount protocol version 1 for NFS version 2 (MNTv1) is separate from, but related to, the NFS protocol. It provides operating-system-specific services to get NFS off the ground: it looks up server path names, validates user identity, and checks access permissions. Clients use the Mount protocol to get the first file handle, which allows them entry into a remote filesystem.
The Mount protocol is kept separate from the NFS protocol to make it easy to plug in new access checking and validation methods without changing the NFS server protocol.
    Notice that the protocol definition implies stateful servers because the server maintains a list of clients' mount requests. The Mount list information is not critical for the correct functioning of either the client or the server. It is intended for advisory use only, for example, to warn possible clients when a server is going down.
    Version one of the Mount protocol is used with version two of the NFS protocol. The only information communicated between these two protocols is the "fhandle" structure. The header structure is as follows:

8	7	6	5	4	3	2	1	Octets
Directory Path Length								1-4
Directory Path Name								5-N

Directory Path Length ： The directory path length.
Directory Path Name ： The directory path name.
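The MNTv1 layout above is an XDR string (RFC 4506): a 4-byte big-endian length followed by the path bytes. The diagram does not show it, but XDR also pads the bytes with zeros to a multiple of four. A minimal encoder, assuming the MOUNT protocol's 1024-byte MNTPATHLEN limit from RFC 1094; xdr_encode_dirpath is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MNTPATHLEN 1024  /* maximum dirpath length in the MOUNT protocol */

/* Encode s as an XDR string: 4-byte big-endian length, the bytes, then zero
 * padding up to a multiple of 4. Returns the number of bytes written, or 0
 * if the path is too long or the output does not fit in cap bytes. */
static size_t xdr_encode_dirpath(unsigned char *out, size_t cap, const char *s)
{
        assert(out != NULL && s != NULL);
        size_t len = strlen(s);
        size_t pad = (4 - len % 4) % 4;
        size_t total = 4 + len + pad;
        if (len > MNTPATHLEN || total > cap)
                return 0;
        out[0] = (unsigned char)(len >> 24);
        out[1] = (unsigned char)(len >> 16);
        out[2] = (unsigned char)(len >> 8);
        out[3] = (unsigned char)len;
        memcpy(out + 4, s, len);
        memset(out + 4 + len, 0, pad);
        return total;
}
```

The 7-byte path "/export" thus occupies 12 bytes on the wire: 4 for the length, 7 for the name and 1 byte of padding.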

Mntv3 ： ftp://ftp.rfc-editor.org/in-notes/rfc1813.txt .
    The supporting Mount protocol version 3 for NFS version 3 protocol performs the operating system-specific functions that allow clients to attach remote directory trees to a point within the local file system. The Mount process also allows the server to grant remote access privileges to a restricted set of clients via export control.
    The Lock Manager provides support for file locking when used in the NFS environment. The Network Lock Manager (NLM) protocol isolates the inherently stateful aspects of file locking into a separate protocol. A complete description of the above protocols and their implementation is to be found in [X/OpenNFS].
    The normative text is the description of the RPC procedures and arguments and results, which defines the over-the-wire protocol, and the semantics of those procedures. The material describing implementation practice aids the understanding of the protocol specification and describes some possible implementation issues and solutions. It is not possible to describe all implementations, and the UNIX operating system implementation of the NFS version 3 protocol is most often used to provide examples. The structure of the protocol is as follows.

8	7	6	5	4	3	2	1	Octets
Directory Path Length								1-4
Directory Path Name								5-N

Directory path length ： The directory path length.
Directory Path Name ： The directory path name

NFS2 ： ftp://ftp.rfc-editor.org/in-notes/rfc1094.txt .
The Sun Network File system (NFS version 2) protocol provides transparent remote access to shared files across networks. The NFS protocol is designed to be portable across different machines, operating systems, network architectures, and transport protocols. This portability is achieved through the use of Remote Procedure Call (RPC) primitives built on top of an eXternal Data Representation (XDR). Implementations already exist for a variety of machines, from personal computers to supercomputers.
The supporting Mount protocol allows the server to hand out remote access privileges to a restricted set of clients. It performs the operating system-specific functions that allow, for example, remote directory trees to be attached to a local file system. The protocol header is as follows:

8	7	6	5	4	3	2	1	Octets
File info/Directory info								1-N

File info/Directory info ： The File info or directory info.

NFS3 ： ftp://ftp.rfc-editor.org/in-notes/rfc1813.txt .
Version 3 of the NFS protocol addresses new requirements; for instance, the need to support larger files and file systems has prompted extensions to allow 64-bit file sizes and offsets. The revision enhances security by adding support for an access check to be done on the server. Performance modifications are of three types:

1 The number of over-the-wire packets for a given set of file operations is reduced by returning file attributes on every operation, thus decreasing the number of calls to get modified attributes.

2 The write throughput bottleneck caused by the synchronous definition of write in the NFS version 2 protocol has been addressed by adding support so that the NFS server can do unsafe writes. Unsafe writes are writes which have not been committed to stable storage before the operation returns.

3 Limitations on transfer sizes have been relaxed.

The ability to support multiple versions of a protocol in RPC will allow implementors of the NFS version 3 protocol to define clients and servers that provide backward compatibility with the existing installed base of NFS version 2 protocol implementations.
The extensions described here represent an evolution of the existing NFS protocol, and most of the design features of the previous NFS protocol persist. The protocol header structure is as follows:

8	7	6	5	4	3	2	1	Octets
Object info/ File info/ Directory info Length								1-4
Object info/ File info/ Directory info Name								5-N

Object info/ File info/ Directory info Length ： The information length in octets
Object info/ File info/ Directory info Name ： The information value (string).

NFSv4 ： ftp://ftp.rfc-editor.org/in-notes/rfc3010.txt
NFS (Network File System) version 4 is a distributed file system protocol based on NFS protocol versions 2 [RFC1094] and 3 [RFC1813]. Unlike earlier versions, the NFS version 4 protocol supports traditional file access while integrating support for file locking and the mount protocol. In addition, support for strong security (and its negotiation), compound operations, client caching, and internationalization have been added. Attention has also been applied to making NFS version 4 operate well in an Internet environment.
The goals of the NFS version 4 revision are as follows:

· Improved access and good performance on the Internet.

· Strong security with negotiation built into the protocol.

· Good cross-platform interoperability.

· Designed for protocol extensions.

    The general file system model used for the NFS version 4 protocol is the same as previous versions. The server file system is hierarchical with the regular files contained within being treated as opaque byte streams. In a slight departure, file and directory names are encoded with UTF-8 to deal with the basics of internationalization.
    A separate protocol to provide for the initial mapping between path name and filehandle is no longer required. Instead of using the older MOUNT protocol for this mapping, the server provides a ROOT filehandle that represents the logical root or top of the file system tree provided by the server.
    The protocol header is as follows:

8	7	6	5	4	3	2	1	Octets
Tag Length								1-4
Tag (depends on Tag length)								5-N
Minor Version								N+1-N+4
Operation Argument								N+5-N+8

Tag Length ： The length in bytes of the tag
Tag ： Defined by the implementor
Minor Version ： Each minor version number will correspond to an RFC. Minor version zero corresponds to NFSv4
Operation Argument ： Operation to be executed by the protocol
Operation Argument Values ： The operation argument value, which can be one of the following:

Value	Name
3	ACCESS
4	CLOSE
5	COMMIT
6	CREATE
7	DELEGPURGE
8	DELEGRETURN
9	GETATTR
10	GETFH
11	LINK
12	LOCK
13	LOCKT
14	LOCKU
15	LOOKUP
16	LOOKUPP
17	NVERIFY
18	OPEN
19	OPENATTR
20	OPEN_CONFIRM
21	OPEN_DOWNGRADE
22	PUTFH
23	PUTPUBFH
24	PUTROOTFH
25	READ
26	READDIR
27	READLINK
28	REMOVE
29	RENAME
30	RENEW
31	RESTOREFH
32	SAVEFH
33	SECINFO
34	SETATTR
35	SETCLIENTID
36	SETCLIENTID_CONFIRM
37	VERIFY
38	WRITE
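Because the operation numbers in the table are consecutive from 3 to 38, a server can map an opcode read from a request to its name with a plain array offset. A minimal sketch built only from the table above; nfs4_op_name and nfs4_op_number are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* NFSv4 (RFC 3010) operation names for opcodes 3..38, in order. */
static const char *const nfs4_op_names[] = {
        "ACCESS", "CLOSE", "COMMIT", "CREATE", "DELEGPURGE", "DELEGRETURN",
        "GETATTR", "GETFH", "LINK", "LOCK", "LOCKT", "LOCKU", "LOOKUP",
        "LOOKUPP", "NVERIFY", "OPEN", "OPENATTR", "OPEN_CONFIRM",
        "OPEN_DOWNGRADE", "PUTFH", "PUTPUBFH", "PUTROOTFH", "READ", "READDIR",
        "READLINK", "REMOVE", "RENAME", "RENEW", "RESTOREFH", "SAVEFH",
        "SECINFO", "SETATTR", "SETCLIENTID", "SETCLIENTID_CONFIRM", "VERIFY",
        "WRITE",
};

#define NFS4_OP_FIRST 3u
#define NFS4_OP_COUNT (sizeof nfs4_op_names / sizeof nfs4_op_names[0])

/* Opcode to name; NULL if the opcode is outside the table. */
static const char *nfs4_op_name(unsigned op)
{
        if (op < NFS4_OP_FIRST || op - NFS4_OP_FIRST >= NFS4_OP_COUNT)
                return NULL;
        return nfs4_op_names[op - NFS4_OP_FIRST];
}

/* Name to opcode; 0 if unknown (0 is not used by any listed operation). */
static unsigned nfs4_op_number(const char *name)
{
        assert(name != NULL);
        for (size_t i = 0; i < NFS4_OP_COUNT; i++)
                if (strcmp(nfs4_op_names[i], name) == 0)
                        return (unsigned)(NFS4_OP_FIRST + i);
        return 0;
}
```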

NLMv4 ： ftp://ftp.rfc-editor.org/in-notes/rfc1813.txt .
Since NFS versions 2 and 3 are stateless, an additional Network Lock Manager (NLM) protocol is required to support locking of NFS-mounted files. As a result of the changes in version 3 of the NFS protocol, version 4 of the NLM protocol is required.
In this version 4, almost all the names in the NLM version 4 protocol have been changed to include a version number. The procedures in the NLM version 4 protocol are semantically the same as those in the NLM version 3 protocol. The only semantic difference is the addition of a NULL procedure that can be used to test for server responsiveness.
The structure of the NLMv4 heading is as follows:

8	7	6	5	4	3	2	1	Octet
Cookie Length								1-4
Cookie								5-N

Cookie Length ： The cookie length.
Cookie ： The cookie string itself.

NSMv1 ： http://www.opengroup.org/onlinepubs/009629799/chap11.htm .
    The Network Status Monitor (NSM) protocol is related to, but separate from, the Network Lock Manager (NLM) protocol. The NLM uses the NSM (Network Status Monitor Protocol V1) to enable it to recover from crashes of either the client or server host. To do this, the NSM and NLM protocols on both the client and server hosts must cooperate.
    The NSM is a service that provides applications with information on the status of network hosts. Each NSM keeps track of its own "state" and notifies any interested party of a change in this state to any other NSM upon request. The state is merely a number which increases monotonically each time the state of the host changes; an even number indicates the host is down, while an odd number indicates the host is up.
    Applications register the network hosts they are interested in with the local NSM. If one of these hosts crashes, the NSM on the crashed host, after a reboot, will notify the NSM on the local host that the state changed. The local NSM can then, in turn, notify the interested application of this state change.
    The NSM is used heavily by the Network Lock Manager (NLM). The local NLM registers with the local NSM all server hosts on which the NLM has currently active locks. In parallel, the NLM on the remote (server) host registers all of its client hosts with its local NSM. If the server host crashes and reboots, the server NSM will inform the NSM on the client hosts of this event. The local NLM can then take steps to re-establish the locks when the server is rebooted. Low-end systems that do not run an NSM, due to memory or speed constraints, are restricted to using non-monitored locks.
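The state-number rule described above (monotonically increasing, odd meaning up, even meaning down) is simple enough to express directly. The sketch assumes the number is incremented by one per transition, so its parity flips on every change; nsm_host_is_up, nsm_next_state and nsm_peer_restarted are hypothetical names, and the restart check is only an illustration of how a monitoring host might interpret a notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NSM state number: odd means the host is up, even means it is down. */
static bool nsm_host_is_up(uint32_t state)
{
        return (state & 1u) != 0;
}

/* Advance the local state on a transition (boot or shutdown); assumes an
 * increment of one, so the parity flips with each change. */
static uint32_t nsm_next_state(uint32_t state)
{
        assert(state != UINT32_MAX);  /* this sketch ignores wrap-around */
        return state + 1;
}

/* Hypothetical check on the monitoring side: a newer, odd state means the
 * peer has restarted and is up again, so locks held with it should be
 * re-established. */
static bool nsm_peer_restarted(uint32_t recorded, uint32_t notified)
{
        return notified > recorded && nsm_host_is_up(notified);
}
```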
The structure of the protocol is as follows:

8	7	6	5	4	3	2	1	Octet
Name Length								1-4
Mon Name /Host Name								5-N

Name Length ： The mon name or host name length.
Mon Name ： The name of the host to be monitored by the NSM.
Host Name ： The host name.
