If you are not in control of the NIC hardware, then your task is immensely harder and is probably not worth the effort.
Also, for this project, RDMA is an irrelevant feature. RDMA is a technology that accelerates the transfer of data from the NIC into RAM, but you are primarily concerned with lossless transfer over the fabric, or emulated lossless transfer over TCP over the fabric. It is likely that, whatever mechanism the NIC driver uses, your throughput will be much lower; and if it relies on TCP and the hardware does not actually implement lossless Ethernet, it is likely to be much lower still.
Your question is whether you can or should use the in-box TCP stack. If you have specific hardware that can implement the stack on your device, that would be the standard approach. If not, then whatever you implement will suck and there isn't much you can do about it.
Completely agree with all your points. But the problem at hand is having something working (acknowledging the lower throughput) vs. not having anything at all, especially when Linux has a standard tool that does this seamlessly for any NIC.
I have concluded that it is possible to implement this using WSK, and I'll summarize my study. I have noted that it is not easy and, as you said, immensely harder.
Thank you very much again for your insight.
In my opinion, this should ideally come from MSFT if it is worth having, like the MS iSCSI Initiator; but I am speaking without complete knowledge here.
@Yatindra_Vaishnav Great to hear about your product. I was about to start on this before being pulled off to another priority project; however, I will start soon. I had planned to use WSK for the underlying TCP communication, and it looks like that is the right choice. I understand if you cannot share your work, but any pointers you can share would be of great help.
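For anyone else evaluating WSK for this, the client registration boilerplate that any kernel TCP transport starts from looks roughly like the sketch below. This is a minimal, non-runnable kernel-mode sketch (it needs the WDK to build): the WSK calls themselves (WskRegister, WskCaptureProviderNPI, WskDeregister) are the documented API, but the function and variable names around them are my own illustration.

```c
#include <ntddk.h>
#include <wsk.h>

// Module-scope WSK state; one registration per driver is typical.
static WSK_REGISTRATION gWskRegistration;
static WSK_PROVIDER_NPI gWskProviderNpi;

// Client dispatch table: version 1.0, no WskClientEvent callback needed.
static const WSK_CLIENT_DISPATCH gWskClientDispatch = {
    MAKE_WSK_VERSION(1, 0),   // requested WSK version
    0,                        // reserved
    NULL                      // no client-level event callback
};

// Register as a WSK client and capture the provider NPI, whose
// Dispatch table (WskSocket, WskConnect, WskSend, ...) is then used
// for all subsequent socket operations.
NTSTATUS TcpFabricInit(VOID)
{
    WSK_CLIENT_NPI clientNpi;
    NTSTATUS status;

    clientNpi.ClientContext = NULL;
    clientNpi.Dispatch = &gWskClientDispatch;

    status = WskRegister(&clientNpi, &gWskRegistration);
    if (!NT_SUCCESS(status)) {
        return status;
    }

    // Wait (here: indefinitely) for the WSK subsystem to become ready.
    status = WskCaptureProviderNPI(&gWskRegistration,
                                   WSK_INFINITE_WAIT,
                                   &gWskProviderNpi);
    if (!NT_SUCCESS(status)) {
        WskDeregister(&gWskRegistration);
    }
    return status;
}

// Teardown mirrors the setup: release the NPI, then deregister.
VOID TcpFabricCleanup(VOID)
{
    WskReleaseProviderNPI(&gWskRegistration);
    WskDeregister(&gWskRegistration);
}
```

After this, connection-oriented sockets are created with gWskProviderNpi.Dispatch->WskSocket(...) using WSK_FLAG_CONNECTION_SOCKET; every operation is IRP-based and asynchronous, which is a large part of why a WSK transport is so much more work than its user-mode Winsock equivalent.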
As I promised, please look at my GitHub page for the RDMA and TCP kernel socket library and NVMe-oF driver, which I'm building as open source to help the community (per the comment by @"Peter_Viscarola_(OSR)"). Though it is not yet complete, I want to share it so that people can start observing the overall development.
@Yatindra_Vaishnav said:
As I promised, please look at my GitHub page for the RDMA and TCP kernel socket library and NVMe-oF driver, which I'm building as open source to help the community (per the comment by @"Peter_Viscarola_(OSR)"). Though it is not yet complete, I want to share it so that people can start observing the overall development.
Thank you so much for publishing NVMe-oF. May I ask a question about compiling the driver? I use VS2022, but I got some compile errors. Could you tell me if there is an additional library I'm missing? It seems some definitions could not be found. Thanks again.
Build started...
1>------ Build started: Project: NVMeoF, Configuration: Debug x64 ------
1>Building 'NVMeoF' with toolset 'WindowsKernelModeDriver10.0' and the 'Universal' target platform.
1>Stamping x64\Debug\NVMeoF.inf
1>Stamping [Version] section with DriverVer=08/04/2023,10.41.31.774
1>fab_rdma.c
. . . . . .
. . . . . .
1>Z:\OneDrive\CZ_Dev\NVMe-oF-WinDrv\workplace\fab_rdma.c(1074,1): error C2084: function 'NTSTATUS NdkCreateRdmaSocket(IF_INDEX,PNDK_COMPLETION_CBS,PNDK_SOCKET *)' already has a body
1>Z:\OneDrive\CZ_Dev\NVMe-oF-WinDrv\workplace\fab_rdma.c(1006,1): message : see previous definition of 'NdkCreateRdmaSocket'
1>Z:\OneDrive\CZ_Dev\NVMe-oF-WinDrv\workplace\fab_rdma.c(1590,2): error C2065: 'NDK_WREQ_SQ_GET_SEND_RESULTS': undeclared identifier
1>Z:\OneDrive\CZ_Dev\NVMe-oF-WinDrv\workplace\fab_rdma.c(1695,2): error C2051: case expression not constant
1>fab_tcp.c
1>nvmefrdma.c
1>Z:\OneDrive\CZ_Dev\NVMe-oF-WinDrv\workplace\nvmef.h(106,2): error C2061: syntax error: identifier 'PNVMEOF_TCP_WORK_REQUEST'
1>Z:\OneDrive\CZ_Dev\NVMe-oF-WinDrv\workplace\nvmef.h(107,1): error C2059: syntax error: '}'
. . . . . .
. . . . . .
1>Z:\OneDrive\CZ_Dev\NVMe-oF-WinDrv\workplace\nvmefrdma.c(49,56): error C2146: syntax error: missing ')' before identifier 'psDispatcher'
1>Z:\OneDrive\CZ_Dev\NVMe-oF-WinDrv\workplace\nvmefrdma.c(49,56): error C2081: 'PNDK_CALLBACKS': name in formal parameter list illegal
. . . . . .
1>Z:\OneDrive\CZ_Dev\NVMe-oF-WinDrv\workplace\nvmefrdma.c(209,63): error C2065: 'NVMEOF_RESPONSE_STATE_SUBMITTED': undeclared identifier
1>Z:\OneDrive\CZ_Dev\NVMe-oF-WinDrv\workplace\nvmefrdma.c(210,58): error C2065: 'NVMEOF_RESPONSE_STATE_FREE': undeclared identifier
1>Z:\OneDrive\CZ_Dev\NVMe-oF-WinDrv\workplace\nvmefrdma.c(210,89): error C2065: 'NVMEOF_RESPONSE_STATE_FREE': undeclared identifier
. . . . . .
. . . . . .
1>Z:\OneDrive\CZ_Dev\NVMe-oF-WinDrv\workplace\nvmefrdma.c(653,126): fatal error C1003: error count exceeds 100; stopping compilation
1>nvmeftcp.c
. . . . . .
. . . . . .
1>Generating Code...
1>Done building project "NVMeoF.vcxproj" -- FAILED.
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
========== Build started at 10:41 AM and took 04.276 seconds ==========
Tim, I stopped working on this as I didn't see any interest in it on NTDEV. I can restart if there is real interest, as I already implemented this at my earlier company.
I for one would be excited if someone did something about NVMe-oF for Windows. I tested the IBM/Emulex drivers about 6 months ago, and they were tangibly worse than the SCSI versions of the same, despite the obvious potential for performance improvement offered by NVMe-oF.
But unless I can support it with my own dev team, or it comes with support from Microsoft or another major vendor, it is unlikely that I could get something like this approved for production. It would have to be a very compelling performance improvement, with a lower-risk license, like a source license.
I've already done both NVMe over TCP and NVMe over RDMA in separate drivers; that is the reason I need to bring both fabrics together in this source. Both drivers were delivering almost line rate, with TCP and with RDMA over RoCE and IB.