How WinSock works

Maxim_S._Shatskih
> > So does AFD. So, AFD is still an analog of sockfs in Windows.
>
> Since when can you use ReadFile() and WriteFile() on sockets,

???

Well, OK, looks like you need some education on what NT's WinSock is.

WinSock is a flexible, polymorphic implementation of the Berkeley sockets API,
which is implemented in both user and kernel mode, with the addition of
MS-specific calls that implement NT's overlapped IO on sockets.

The WinSock APIs live in wsock32.dll/ws2_32.dll, which in turn looks in the
registry and loads the proper provider DLL. This allows you to implement your
own address families _fully in user mode_, or as a mix of your own proprietary
user+kernel modules with your own interface between the user and kernel parts.

Actually, WinSock API calls do the following - "get the provider's function
table by socket handle value" and then "call the provider's function".

This holds for all Berkeley and WSAxxx calls.
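
A minimal model of that two-step dispatch, written in Python for brevity rather than against the real ws2_32.dll. The `Provider` class, the `providers_by_handle` table and the handle value 0x1F4 are all hypothetical; only the idea (look up the provider's function table by socket handle value, then call through it) comes from the description above.

```python
# Toy model of WinSock dispatch: handle value -> provider function table.
# All names here are illustrative, not real WinSock internals.

class Provider:
    """Stands in for one provider DLL's function table (WSPSend, WSPRecv, ...)."""

    def __init__(self, name):
        self.name = name
        self.sent = []

    def wsp_send(self, handle, data):
        # A real provider would implement its own semantics here.
        self.sent.append((handle, data))
        return len(data)


# Filled in when a provider hands a socket handle back to the WinSock DLL.
providers_by_handle = {}


def send(handle, data):
    """Model of the send() API: find the function table, then call through it."""
    provider = providers_by_handle[handle]      # step 1: table lookup by handle
    return provider.wsp_send(handle, data)      # step 2: call the provider


tcp = Provider("tcpip")
providers_by_handle[0x1F4] = tcp
print(send(0x1F4, b"hello"))   # number of bytes the provider accepted
```

Nothing in `send()` knows what the provider does with the data, which is the point: the semantics belong to whichever provider owns the handle.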

But note that ReadFile and WriteFile are always supported on a socket handle -
in NT, a SOCKET is just a kernel file handle. If we are speaking about send() -
then send() is in WinSock, and WinSock is free to implement any semantics for
it. But if we are speaking about ReadFile, then sorry, this API is not in
WinSock; it is a standard NT API which knows nothing about sockets, so the only
way of customizing ReadFile is to customize its kernel part.

To support ReadFile on user-mode WinSock provider DLLs (user-mode address
families), there is an auxiliary driver called ws2ifsl.sys. To employ this
driver, the provider DLL must call WinSock's provider interface function
"create IFS handle" or such. This call creates a pair of file handles on
ws2ifsl, creates the necessary thread pool for inverted calls, associates
the slave handle with the provider's function table and returns the master
handle to the provider DLL. The provider DLL then returns this handle from
its WSPSocket.

When the app calls Read/WriteFile on this handle, the calls go directly (no
WinSock!) to ws2ifsl.sys in the kernel. This module transfers the call to the
slave end of its conceptual "pipe", and the thread pool in ws2_32.dll consumes
the call (yes, an inverted call) and executes it by calling some WSPXxx in the
provider DLL.
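
The inverted-call pattern described above can be sketched with a queue and a worker thread. This is a Python toy model, not ws2ifsl.sys: the `Request` class, the `pending` queue, `provider_wsp_send` and `write_file` are hypothetical names chosen for illustration, and a single thread stands in for the thread pool.

```python
import queue
import threading

# Toy model of the inverted call: the "driver" only queues requests,
# and a user-mode worker thread picks them up and runs the provider code.
# Every name below is illustrative.

class Request:
    def __init__(self, op, data):
        self.op = op
        self.data = data
        self.result = None
        self.done = threading.Event()


pending = queue.Queue()          # stands in for the slave end inside ws2ifsl.sys


def provider_wsp_send(data):
    """Hypothetical user-mode provider routine that does the real work."""
    return len(data)


def worker():
    """Stands in for the thread pool in ws2_32.dll that consumes the calls."""
    while True:
        req = pending.get()
        if req is None:          # shutdown marker
            break
        req.result = provider_wsp_send(req.data)
        req.done.set()           # "complete" the request back to the caller


def write_file(data):
    """Model of WriteFile on the master handle: goes straight to the driver."""
    req = Request("write", data)
    pending.put(req)
    req.done.wait(timeout=5)
    return req.result


t = threading.Thread(target=worker)
t.start()
written = write_file(b"abc")
pending.put(None)
t.join()
print(written)
```

The caller never calls the provider directly: it only queues a request and waits for its completion, while the provider code runs on a different thread.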

But this is not the typical scenario for implementing a socket address family.
The typical scenario is that the address family package has a kernel part,
which automatically guarantees that the socket handle will be a file handle
on this kernel part. Such packages use "register IFS handle" instead of "create
IFS handle": their WSPSocket path first does CreateFile on their kernel part,
and then does "register IFS handle". The second step is needed for functions
like send() to be dispatched to this provider's WSPSend. Read/WriteFile are
automatically delivered to the kernel part.

Now note that many address families have a lot in common - buffering,
listen backlog, lingering closes, to name a few. So a common layer which
implements all of this was created, and this same layer also serves as the
default kernel-mode WinSock provider. This module is called AFD.SYS.

So, if the address family implementor needs a kernel part, then the existing
AFD.SYS can be reused as a framework. To reuse it, one must program to the
lower-edge interface of AFD, which is called TDI.

TDI is much more low-level than socket calls. For instance, TDI transports
usually (surely this is true of TCPIP) have no buffering at all. So a TDI_SEND
operation is kept pending _until all ACKs arrive_. The reason is that,
while the ACKs have not arrived yet, there is a possibility that a retransmit
will be needed. Now note that the transport does no buffering and no data
copies, so if it completed the original send request, it would no longer have
the data for the retransmit. So TDI_SEND on an unbuffered transport (the usual
way, TCPIP's too) pends until all ACKs arrive, and retransmits are handled off
the same send request's buffer.
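
To make the "pend until ACKed" rule concrete, here is a small Python model of an unbuffered transport. It is a sketch only: `SendRequest`, `UnbufferedTransport` and their methods are hypothetical names, not TDI structures, and ACKs are fed in by hand instead of coming from a network.

```python
# Toy model of an unbuffered transport: the send request is completed only
# after every byte has been ACKed, and retransmits read from the caller's
# own buffer. Class and method names are illustrative, not TDI structures.

class SendRequest:
    def __init__(self, buffer):
        self.buffer = buffer      # caller-owned data; the transport keeps no copy
        self.acked = 0            # bytes acknowledged so far
        self.completed = False


class UnbufferedTransport:
    def __init__(self):
        self.wire = []            # segments put on the wire, in order

    def send(self, req):
        self.wire.append(bytes(req.buffer))
        # Note: the request is NOT completed here; it stays pending.

    def retransmit(self, req):
        # Possible only because the request (and its buffer) is still pending.
        self.wire.append(bytes(req.buffer[req.acked:]))

    def on_ack(self, req, upto):
        req.acked = max(req.acked, upto)
        if req.acked == len(req.buffer):
            req.completed = True  # now the caller may reuse or free the buffer


transport = UnbufferedTransport()
req = SendRequest(bytearray(b"0123456789"))
transport.send(req)
transport.on_ack(req, 4)          # partial ACK: request still pending
transport.retransmit(req)         # resend the unacknowledged tail
transport.on_ack(req, 10)         # final ACK completes the request
print(req.completed, transport.wire[-1])
```

A buffering layer above such a transport (the role the post assigns to AFD.SYS) can copy the data and complete the caller's send early, at the price of that copy.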

On receive and accept, TDI uses a two-phase interaction: first is the
ClientEventReceive/Connect event handler about an incoming connection offer or
incoming data portion, second is the TDI_ACCEPT or TDI_RECEIVE completion
routine. On accept, this allows (and requires) the client to create the new
TDI endpoint (the accept's target) itself, and then associate it with this
particular incoming connection offer. This makes it possible to implement the
listen backlog above the transport, and also to extend this "accept to a
specified pre-created socket" feature to user mode as the overlapped AcceptEx
API.
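
The accept half of this two-phase scheme can be modeled in a few lines of Python. Everything here is an assumption made for illustration: the `Listener` and `Endpoint` classes, their method names, and the policy of refusing offers once the backlog is full. It shows only where the backlog lives and that the client supplies the endpoint.

```python
from collections import deque

# Toy model of the two-phase accept. Phase 1: the transport indicates a
# connection offer to its client. Phase 2: the client supplies an endpoint
# it created itself and accepts the offer onto it. Names are illustrative.

class Endpoint:
    def __init__(self):
        self.peer = None


class Listener:
    """Plays the role of the TDI client (for example AFD) above the transport."""

    def __init__(self, backlog):
        self.backlog = backlog
        self.offers = deque()     # the listen backlog lives here, not in the transport

    def on_connect_event(self, peer):
        # Phase 1: called by the transport for each incoming connection offer.
        if len(self.offers) >= self.backlog:
            return False          # backlog full: refuse the offer
        self.offers.append(peer)
        return True

    def accept_onto(self, endpoint):
        # Phase 2: bind a pre-created endpoint to the oldest pending offer,
        # similar in spirit to AcceptEx taking a pre-created socket.
        if not self.offers:
            return False
        endpoint.peer = self.offers.popleft()
        return True


listener = Listener(backlog=2)
results = [listener.on_connect_event(p) for p in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
ep = Endpoint()
listener.accept_onto(ep)
print(results, ep.peer)
```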

On receive, the two-phase scheme allows the client to own the memory buffers
for the received data, so there is no need to allocate them in the transport.
It also allows the client to first receive the header, examine it, determine
the size of the data portion which will follow (like an SMB WRITE transaction),
and then get this data in a nonblocking way with only 1 copy.
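
A sketch of that header-first receive, again as a Python model rather than TDI code. The 4-byte big-endian length header and the function names `on_receive_event` and `complete_receive` are assumptions made for this example; the source only says that the header determines the size of the data that follows.

```python
import struct

# Toy model of the two-phase receive. The transport first indicates the
# start of the data; the client reads a header from it, sizes its own
# buffer, and the payload is then copied once into that buffer.
# The 4-byte big-endian length header is an assumption made for this example.

HEADER = struct.Struct(">I")


def on_receive_event(indicated):
    """Phase 1: look at the header and return a buffer for the payload."""
    (length,) = HEADER.unpack_from(indicated, 0)
    return bytearray(length)      # client-owned buffer, allocated by the client


def complete_receive(stream, buf):
    """Phase 2: the single copy of the payload into the client's buffer."""
    payload = memoryview(stream)[HEADER.size:HEADER.size + len(buf)]
    buf[:] = payload
    return buf


stream = HEADER.pack(5) + b"hello" + b"next message..."
buf = on_receive_event(stream[:HEADER.size])
complete_receive(stream, buf)
print(bytes(buf))
```

The buffer is sized by the client after it has seen the header, so the transport never has to guess how much memory to allocate.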

There are also other operation modes in TDI, like chained receive etc.

If the provider of some address family (like IrDA) is implemented as a
kernel-mode TDI transport, then it automatically reuses the AFD.SYS layer,
which a) exposes it to user-mode WinSock b) implements buffers/listen
backlog/lingering close.

The second part of this "default WinSock provider" is the user-mode provider
DLL, called MSAFD.DLL, which consists almost entirely of DeviceIoControl
calls to AFD.SYS.

Surely the default WinSock provider cannot support protocol-dependent stuff
like socket options other than SOL_SOCKET. To implement them, MSAFD requires
an address-family-specific helper DLL, which is - for TCPIP - given as the
WSHSMPLE sample in the DDK.

The differences with Linux:

- Berkeley calls are syscalls in Linux, but are user-mode wrappers around
DeviceIoControl in Windows: MSAFD.DLL is a wrapper (and, if you use only TCPIP,
all of the WinSock userland is a wrapper), and AFD.SYS is the kernel module
where the calls arrive. It was also so in SunOS 5.2/Solaris 2.x with their
/dev/nit - so AFD is the same as the "nit" kernel module in SunOS, which is in
turn the same as "sockfs" in Linux.

- In Linux and FreeBSD, kernel-mode clients talking directly to TCP without
sockets - with their own listen backlog and buffering - are not permitted. They
are permitted in Windows. On the other hand, the socket API is absent from
kernel mode in Windows altogether, though it is implementable as a wrapper
around TDI.

You cannot use _select()_ in Windows on anything but sockets, since select() in
Windows is not a generic kernel notion, but a socket-only notion, possibly
built around WSAEventSelect. You also cannot use select with 3 NULLs as sleep.

But you can use Read/WriteFile on a socket.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
http://www.storagecraft.com

Comments

  • OSR_Community_User
    >> TDI is much more low-level then socket calls. For instance, the TDI
    >> transports usually (surely this is true on TCPIP) have no buffering
    >> at all. So, TDI_SEND operation is kept pending _till all ACKs will
    >> arrive_. The reason is that, while the ACKs have not arrive yet,
    >> there is a possibility that there will be a need for retransmit. Now
    >> note that the transport does no buffering, no data copies, so, if it
    >> would complete the original send request - it will have the data for
    >> the retransmit no more. So, TDI_SEND on unbuffered (the usual way,
    >> TCPIP's too) transport pends till all ACKs will arrive, and retransmits are handled off the same send request's buffer.

    I call TdiBuildSend with TDI_SEND_NO_RESPONSE_EXPECTED | TDI_SEND_NON_BLOCKING in InFlags, but it still seems to wait for an acknowledgment of the send from the remote node until a timeout. Can these InFlags be used to avoid the Nagle algorithm or not?

    Thanks,

    Tao
  • Maxim_S._Shatskih
    >acknowledgment of the send from the remote node until timeout, is this InFlags
    >able to avoid the Nagle algorithm or not?

    I doubt it. I would not be surprised if TCP does not care about some of the flags.

    Try switching Nagle off with IOCTL_TCP_SET_INFORMATION_EX.

    --
    Maxim Shatskih, Windows DDK MVP
    StorageCraft Corporation
    http://www.storagecraft.com
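
The reply above works at the TDI level. For comparison only, the user-mode way to switch the Nagle algorithm off on a socket is the standard TCP_NODELAY option. The Python sketch below shows that user-mode path; it does not show IOCTL_TCP_SET_INFORMATION_EX, and whether it helps a kernel-mode TDI client is not something this thread establishes.

```python
import socket

# User-mode way to disable the Nagle algorithm on one TCP socket.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
finally:
    s.close()

print(nodelay != 0)
```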