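RDMA is essentially the default choice for network interconnect in HPC and the datacenter.
Compared with the traditional TCP/IP stack plus sockets, RDMA transport comes down to three properties: zero-copy, kernel bypass, and CPU offload.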
For more detail on RDMA/RoCE, see:
http://www.roceinitiative.org/blog
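RDMA did not originate on Ethernet. A full deployment depends on special PHY- and link-layer features in the NIC, the switch, and layer 2 QoS, as well as on protocol stacks such as RoCE and iWARP.
However, with softRoCE, which is released through OFED, RoCE communication can be used on an ordinary Ethernet network.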
1. Install softRoCE
It comes in two parts, a kernel and a userspace library:
git clone https://github.com/SoftRoCE/rxe-dev.git
git clone https://github.com/SoftRoCE/librxe-dev.git
Follow the wiki at https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home to install both parts on the host. Because an RDMA transfer needs a server and a client, two machines are required.
2. Create the RoCE interface
Boot the host into the newly installed kernel, then run:
sudo rxe_cfg start
sudo rxe_cfg add enp1s0f2
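On success, rxe_cfg should list the new rxe device (this sample output comes from a host where the interface is named enp5s0f2):
[pole2@localhost ~]$ rxe_cfg
Name      Link  Driver  Speed  NMTU  IPv4_addr  RDEV  RMTU
eno1      no    e1000e
enp5s0f0  yes   igb
enp5s0f1  no    igb
enp5s0f2  yes   igb                             rxe0  1024 (3)
enp5s0f3  no    igb
virbr0    no    bridge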
To remove the RoCE device:
sudo rxe_cfg remove enp1s0f2
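To confirm from a script that the add (or remove) step took effect, the rxe_cfg listing can be filtered for interfaces that have an rxe device bound. The following is only a sketch: the function name rxe_bound_ifaces is made up for illustration, and it assumes the listing looks like the sample output above, with the RDEV value appearing as a whitespace-separated field of the form rxeN.

```shell
# Print "<interface> <rxe device>" for every interface that has an rxe
# device bound to it. Reads an rxe_cfg listing on stdin.
# Assumption: the RDEV value shows up as a field named rxe0, rxe1, ...
rxe_bound_ifaces() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^rxe[0-9]+$/) { print $1, $i; break }
        }
    }'
}

# Sample listing taken from the output shown in this article.
sample_listing='Name Link Driver Speed NMTU IPv4_addr RDEV RMTU
eno1 no e1000e
enp5s0f0 yes igb
enp5s0f2 yes igb rxe0 1024 (3)
virbr0 no bridge'

printf '%s\n' "$sample_listing" | rxe_bound_ifaces
```

On a live host the same filter would be fed from the tool itself, for example: rxe_cfg | rxe_bound_ifaces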
3. Preparation before testing
List the RDMA devices:
ibv_devices
ibv_devinfo
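If the check needs to be automated, the device list can be tested for the expected rxe device before any traffic is sent. This is a rough sketch under assumptions: the helper names are hypothetical, the node GUID in the sample is invented, and the parser assumes ibv_devices prints two header lines (column titles and a row of dashes) before the device rows.

```shell
# Print the device names from ibv_devices-style text on stdin.
# Assumption: the first two lines are headers, then one device per line.
list_rdma_devices() {
    awk 'NR > 2 && NF > 0 { print $1 }'
}

# Succeed (exit status 0) if the named device appears in the listing on stdin.
has_rdma_device() {
    list_rdma_devices | grep -qx -- "$1"
}

# Hypothetical sample listing; the GUID is a placeholder, not real data.
sample_devices='    device                 node GUID
    ------              ----------------
    rxe0                0000000000000000'

if printf '%s\n' "$sample_devices" | has_rdma_device rxe0; then
    echo "rxe0 is present"
else
    echo "rxe0 is missing"
fi
```

On a live host: ibv_devices | has_rdma_device rxe0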
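CentOS's default firewall rules reject unknown connections, so a test run straight away will be rejected by the peer's firewall.
The RDMA packets therefore have to be allowed through the firewall.
One option is to flush the rules in the filter and mangle tables: iptables -F; iptables -t mangle -F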
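Another option is to add the IP addresses of the two RDMA interfaces to the trusted zone. Assuming the server is 1.1.1.1 and the client is 1.1.1.2 (rules added with --permanent only take effect after a reload):
firewall-cmd --zone=trusted --add-source=1.1.1.1 --permanent
firewall-cmd --zone=trusted --add-source=1.1.1.2 --permanent
firewall-cmd --reload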
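When more than two hosts take part in the test, typing one firewall-cmd line per address gets tedious. As a small sketch (the function name trust_sources is hypothetical), the commands can be generated for any list of addresses and reviewed before they are run. The function only prints the commands; it does not change the firewall itself.

```shell
# Print the firewall-cmd lines that add each given address to the trusted
# zone, followed by a reload so the permanent rules become active.
# This is a dry run: nothing is executed.
trust_sources() {
    for ip in "$@"; do
        printf 'firewall-cmd --zone=trusted --add-source=%s --permanent\n' "$ip"
    done
    printf 'firewall-cmd --reload\n'
}

# Addresses from the example in this article.
trust_sources 1.1.1.1 1.1.1.2
```

After checking the printed lines, they could be applied with: trust_sources 1.1.1.1 1.1.1.2 | sudo sh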
4. softRoCE connectivity
server:
[pole2@localhost ~]$ rping -s -a 1.1.1.1 -v -C 10
result:
server ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
server ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
server ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
server ping data: rdma-ping-3: DEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu
server ping data: rdma-ping-4: EFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuv
server ping data: rdma-ping-5: FGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvw
server ping data: rdma-ping-6: GHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwx
server ping data: rdma-ping-7: HIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxy
server ping data: rdma-ping-8: IJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
server ping data: rdma-ping-9: JKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyzA
server DISCONNECT EVENT...
wait for RDMA_READ_ADV state 10
client:
[pole2@localhost ~]$ rping -c -a 1.1.1.1 -v -C 10
result:
ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
ping data: rdma-ping-3: DEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu
ping data: rdma-ping-4: EFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuv
ping data: rdma-ping-5: FGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvw
ping data: rdma-ping-6: GHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwx
ping data: rdma-ping-7: HIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxy
ping data: rdma-ping-8: IJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
ping data: rdma-ping-9: JKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyzA
client DISCONNECT EVENT...
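Rather than counting the lines by eye, the rping output can be checked against the -C value. The sketch below uses a hypothetical helper, check_rping, which counts the "ping data: rdma-ping-N:" lines on stdin and succeeds only if the count equals the expected number. It assumes the verbose output format shown above, and it matches the server-side lines as well because they contain the same text.

```shell
# Succeed if stdin contains exactly $1 lines of the form
# "ping data: rdma-ping-<N>: ..." as printed by rping -v.
check_rping() {
    expected=$1
    # grep -c prints the number of matching lines (0 if there are none).
    got=$(grep -c 'ping data: rdma-ping-[0-9][0-9]*:')
    [ "$got" -eq "$expected" ]
}

# Shortened sample output, just to exercise the helper.
sample_rping='ping data: rdma-ping-0: ABC
ping data: rdma-ping-1: BCD
ping data: rdma-ping-2: CDE
client DISCONNECT EVENT...'

if printf '%s\n' "$sample_rping" | check_rping 3; then
    echo "all pings seen"
else
    echo "ping count mismatch"
fi
```

On the client this might be used as: rping -c -a 1.1.1.1 -v -C 10 | check_rping 10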
Download a Wireshark build that can dissect RoCE packets;
a packet capture then shows the RDMA transfer process very clearly.
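The capture itself can be taken on the command line and opened in Wireshark afterwards. This sketch rests on an assumption the article does not state: that softRoCE traffic is RoCEv2, which is carried over UDP with destination port 4791. The helper name roce_capture_cmd and the file name roce.pcap are hypothetical. The function only prints the tcpdump command so that it can be checked before being run as root.

```shell
# Print a tcpdump command that writes RoCEv2 traffic on the given
# interface to a pcap file. Assumption: RoCEv2 uses UDP port 4791.
roce_capture_cmd() {
    iface=$1
    outfile=$2
    printf 'tcpdump -i %s -w %s udp port 4791\n' "$iface" "$outfile"
}

# Interface name from this article; the output file name is arbitrary.
roce_capture_cmd enp1s0f2 roce.pcap
```

Start the printed command on one host, run the rping test from the previous step, stop the capture with Ctrl-C, and open the file in Wireshark.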