Spin Lock

Thanks to all. In a few days I will test this and inform you about the results of my experiments.

@Tim_Roberts said:

All memory allocated from UM can be paged out if the system is under memory pressure.

You are almost never wrong, but I have to dispute this assertion. Pages locked with VirtualLock will remain in physical memory until the process exits. Even the documentation says they are “guaranteed not to be written to the page file while they are locked”.

Raymond Chen covered the confusing nature of VirtualLock here:

https://devblogs.microsoft.com/oldnewthing/20071106-00/?p=24573

With an update/correction here:

https://devblogs.microsoft.com/oldnewthing/20140207-00/?p=1833
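As an aside, the VirtualLock guarantee those posts discuss is tied to the process minimum working set: the lock fails if it would exceed that minimum, so the documented approach is to grow the working set first. A minimal user-mode sketch (the 16 KB size is illustrative, error handling omitted):

#include <windows.h>

void LockSixteenKb(void)
{
    SIZE_T extra = 16 * 1024, minWs, maxWs;

    // VirtualLock fails if the request would exceed the minimum
    // working set, so grow it first.
    GetProcessWorkingSetSize(GetCurrentProcess(), &minWs, &maxWs);
    SetProcessWorkingSetSize(GetCurrentProcess(), minWs + extra, maxWs + extra);

    void *buf = VirtualAlloc(NULL, extra, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (buf != NULL && VirtualLock(buf, extra)) {
        // Pages stay resident while locked (per the documentation).
    }
}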

To the OP’s question of:

Can I access [the user buffer locked with VirtualLock] from the driver routine, working in arbitrary thread context (DISPATCH_LEVEL)

That would be a no for at least two reasons:

  1. If you’re in an arbitrary thread context, you can’t access a user data pointer, because it is only meaningful in the specific process context it came from.
  2. Nothing guarantees the user data pointer won’t become invalid while you’re using it (e.g. the app could call VirtualFree, or call VirtualProtect and make it read-only). If you’re running at IRQL < DISPATCH_LEVEL, the memory manager will raise an exception that you can catch. If you’re at DISPATCH_LEVEL, the system crashes.

If you want to access a user data buffer in an arbitrary context at arbitrary IRQL you use the MmProbeAndLockPages/MmGetSystemAddressForMdlSafe pattern.
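For reference, that pattern looks roughly like this in a driver. This is a minimal sketch, where userBuffer and length stand for whatever the caller passed in and error handling is abbreviated:

#include <ntddk.h>

NTSTATUS LockAndMapUserBuffer(PVOID userBuffer, ULONG length,
                              PMDL *outMdl, PVOID *outSystemVa)
{
    PMDL mdl = IoAllocateMdl(userBuffer, length, FALSE, FALSE, NULL);
    if (mdl == NULL) return STATUS_INSUFFICIENT_RESOURCES;

    __try {
        // Probe for write access and pin the pages; this raises an
        // exception on a bad buffer, which we catch here.
        MmProbeAndLockPages(mdl, UserMode, IoWriteAccess);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        IoFreeMdl(mdl);
        return GetExceptionCode();
    }

    // The system address is valid in any thread context and at
    // IRQL <= DISPATCH_LEVEL, for as long as the pages stay locked.
    PVOID systemVa = MmGetSystemAddressForMdlSafe(mdl, NormalPagePriority);
    if (systemVa == NULL) {
        MmUnlockPages(mdl);
        IoFreeMdl(mdl);
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    *outMdl = mdl;           // caller later calls MmUnlockPages + IoFreeMdl
    *outSystemVa = systemVa;
    return STATUS_SUCCESS;
}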

Of course. Thank you. The user virtual addresses can be used only in a dispatch routine, running in the process context of the initiator, at IRQL 0 (PASSIVE_LEVEL).
But I cannot understand why I should use MmProbeAndLockPages if the buffer is already locked (VirtualLock).

The most basic answer to your question of why to probe and lock is one of trust. By definition, KM code does not trust UM code, so everything that UM passes to KM must be checked for validity and protected against interference until it can no longer cause KM to fail.

As to the question of VirtualLock, I have re-read those old posts from Raymond, and the present documentation, and I guess it comes down to how strenuously the system actually enforces the minimum working set for a process. I’m not convinced that this has not changed between Windows versions, but my only evidence is vague memories. The current documentation clearly indicates how it should work. The use cases are still very limited.

Well. Thank You

I read the article “Sharing Memory Between Drivers and Applications”. Thank you for that useful article. I have a question. I want to do the same as the CreateAndMapMemory function, but at device creation time. But how can I send the virtual address of the user buffer at creation time, if the function
HANDLE CreateFileW(
[in] LPCWSTR lpFileName,
[in] DWORD dwDesiredAccess,
[in] DWORD dwShareMode,
[in, optional] LPSECURITY_ATTRIBUTES lpSecurityAttributes,
[in] DWORD dwCreationDisposition,
[in] DWORD dwFlagsAndAttributes,
[in, optional] HANDLE hTemplateFile
);
does not have any extra parameter for it.

You don’t do it at “CreateFile” time, of course. You open the driver using “CreateFile”, and then send a custom ioctl to receive the mapped buffer address. The article is showing you the driver code you would use to help handle that ioctl.
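In user mode that sequence might look like the sketch below; IOCTL_MAP_SHARED_BUFFER and the \\.\MyDriver device name are hypothetical placeholders for whatever your driver actually defines:

#include <windows.h>
#include <winioctl.h>

// Hypothetical control code; your driver defines the real one.
#define IOCTL_MAP_SHARED_BUFFER \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

void OpenAndMap(void)
{
    HANDLE hDevice = CreateFileW(L"\\\\.\\MyDriver",
                                 GENERIC_READ | GENERIC_WRITE, 0, NULL,
                                 OPEN_EXISTING, 0, NULL);

    // Ask the driver for the user-mode address of the buffer it mapped.
    PVOID mappedBase = NULL;
    DWORD bytesReturned = 0;
    DeviceIoControl(hDevice, IOCTL_MAP_SHARED_BUFFER,
                    NULL, 0,
                    &mappedBase, sizeof(mappedBase),
                    &bytesReturned, NULL);
    // On success, mappedBase now points into the shared region.
}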

Thanks

Sorry, Mr. Tim. You said receive; do you mean send? Because the user program must open the device, allocate memory, and send the user virtual address of that buffer to the driver. I am organizing bidirectional Direct I/O. I want to use the same user buffer for sending to the driver and receiving from the driver, the whole time the driver is active.

That’s one way to do it, but that’s not what the article you quoted does. That article has the kernel driver allocate the memory, and then map that memory into the user-mode process. Doing it that way is required if you need physically-contiguous memory for DMA, for example.
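A rough sketch of that driver-side approach follows; the article’s actual code differs in details, and error and cleanup paths are abbreviated here:

#include <ntddk.h>

// Allocate a contiguous buffer and map it into the calling process.
// Must run in the context of the requesting process, e.g. in the
// dispatch routine handling that process's ioctl.
PVOID CreateAndMapSharedBuffer(SIZE_T size, PMDL *outMdl, PVOID *outKernelVa)
{
    PHYSICAL_ADDRESS highest;
    highest.QuadPart = ~0ULL;    // accept any physical address

    PVOID kernelVa = MmAllocateContiguousMemory(size, highest);
    if (kernelVa == NULL) return NULL;

    PMDL mdl = IoAllocateMdl(kernelVa, (ULONG)size, FALSE, FALSE, NULL);
    if (mdl == NULL) { MmFreeContiguousMemory(kernelVa); return NULL; }
    MmBuildMdlForNonPagedPool(mdl);

    PVOID userVa = NULL;
    __try {
        // Mapping into UserMode can raise an exception, so guard it.
        userVa = MmMapLockedPagesSpecifyCache(mdl, UserMode, MmCached,
                                              NULL, FALSE, NormalPagePriority);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        userVa = NULL;
    }

    *outMdl = mdl;               // keep for MmUnmapLockedPages at teardown
    *outKernelVa = kernelVa;
    return userVa;               // hand back to the app in the ioctl output
}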

Yet another way is to have the application send down a METHOD_IN_DIRECT ioctl very early on, with the desired buffer as the second buffer in the ioctl, and then have the driver keep that ioctl pending for a long time. That way, the I/O system handles the mapping of the memory, and keeps it mapped as long as the ioctl is pending.
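A sketch of the application side of that scheme; IOCTL_REGISTER_BUFFER is a hypothetical control code, and the device handle is assumed to have been opened with FILE_FLAG_OVERLAPPED:

#include <windows.h>
#include <winioctl.h>

// Hypothetical code; what matters is the METHOD_IN_DIRECT transfer type,
// which makes the I/O manager probe, lock, and map the second buffer.
#define IOCTL_REGISTER_BUFFER \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_IN_DIRECT, FILE_ANY_ACCESS)

void RegisterSharedBuffer(HANDLE hDevice, BYTE *buffer, DWORD size, OVERLAPPED *ov)
{
    // *ov must stay valid until the (long-pending) call finally completes.
    // The driver marks the IRP pending and queues it; the buffer's pages
    // stay locked and Irp->MdlAddress stays valid until completion.
    DeviceIoControl(hDevice, IOCTL_REGISTER_BUFFER,
                    NULL, 0,         // first (input) buffer unused
                    buffer, size,    // second buffer: the one to share
                    NULL, ov);       // returns FALSE / ERROR_IO_PENDING
}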

The very long pending IRP is a much better design choice. The content of whatever memory buffer you share is entirely up to you.

But consider that whatever mechanism you invent, it is unlikely to be more efficient or effective than standard ReadFile / WriteFile or DeviceIoControl calls. The shared-memory design has advantages only in very specific use cases, and you should be sure that yours is one of them before you go to the significant effort of implementing a scheme like this.

If your application can tolerate lost, duplicated, and corrupted data, then shared memory is easy to implement. If not, then you will end up re-implementing the standard calls.

Thanks

Can I send the chain of MDLs (from a cloned chain of NBLs) to the user program via the one Irp->MdlAddress?
Can I complete the MDL from Irp->MdlAddress received from the user (I/O manager) and replace that address with a new one from the newly cloned MDLs extracted from the send path of the NDIS filter driver? Do I need to lock the cloned MDLs before returning them to the user? Thank you very much.

The short answer is no you can’t do this and don’t want to try.

The long answer is much longer.

With respect, I suggest you take some training and / or do more research. The questions that you are asking indicate a lack of fundamental knowledge in some areas, and it is not a good idea to attempt advanced / non-standard work without a solid understanding of the basics.

Thank you. I will copy the entire payload from the NBLs into the one MDL from the IRP and return it to the user. If this is not a good way, please let me know.
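A hedged sketch of that copy for a single NET_BUFFER; chain walking is omitted, and dst is assumed to be the system address of the IRP’s MDL:

#include <ndis.h>

// Copy one NET_BUFFER's payload into a flat destination buffer.
// Returns the number of bytes copied, or 0 if it doesn't fit.
ULONG FlattenNetBuffer(NET_BUFFER *nb, PUCHAR dst, ULONG dstSize)
{
    ULONG len = NET_BUFFER_DATA_LENGTH(nb);
    if (len == 0 || len > dstSize) return 0;

    // If the payload is contiguous, NdisGetDataBuffer returns a direct
    // pointer; otherwise it copies into 'dst' and returns 'dst'.
    PUCHAR p = (PUCHAR)NdisGetDataBuffer(nb, len, dst, 1, 0);
    if (p == NULL) return 0;
    if (p != dst) RtlCopyMemory(dst, p, len);
    return len;
}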

As I already said, I allocate a 16000-byte buffer in the user program, fill it, and call DeviceIoControl with Direct I/O. The system converts it to an MDL and passes it to the NDIS filter driver. The driver, in its dispatch routine, converts it to NBLs and sends them to the network. At the same time, I mark that IRP pending, queue it, and return STATUS_PENDING. Later, when NDIS calls my FilterSendNetBufferLists, I take that IRP from the queue, fill the MDL from the IRP, and return it to the user with the new network payload via IoCompleteRequest. It works well for a while (last time, two days). But sometimes it turns out that the total length of all NBs across all NBLs is greater than 16000. I could increase the user buffer, but that is not a good way to do it. What can you advise me?
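Sketched out, the completion side of such a scheme looks something like the following; DequeuePendingIrp and FlattenNbls are hypothetical helpers, and all locking is omitted:

#include <ndis.h>

PIRP  DequeuePendingIrp(VOID);                                    // hypothetical
ULONG FlattenNbls(PNET_BUFFER_LIST nbl, PUCHAR dst, ULONG size);  // hypothetical

// Called from FilterSendNetBufferLists once a cloned NBL's payload
// is ready to hand to the waiting application.
VOID CompleteOnePendingRead(PNET_BUFFER_LIST nbl)
{
    PIRP irp = DequeuePendingIrp();
    if (irp == NULL) return;

    ULONG copied = 0;
    PUCHAR dst = (PUCHAR)MmGetSystemAddressForMdlSafe(irp->MdlAddress,
                                                      NormalPagePriority);
    if (dst != NULL) {
        copied = FlattenNbls(nbl, dst, 16000);
    }

    irp->IoStatus.Status = (dst != NULL) ? STATUS_SUCCESS
                                         : STATUS_INSUFFICIENT_RESOURCES;
    irp->IoStatus.Information = copied;
    IoCompleteRequest(irp, IO_NO_INCREMENT);  // legal at <= DISPATCH_LEVEL
}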

You should queue multiple IRPs at the same time. After one is completed, data that arrives will have another one ready.

Thank you, Mr. MBond2. I have done it. But I wanted to send with one IRP, because splitting the data into many NBs and sending every NB separately will decrease the speed.

There may be a limit on the total length of the entire payload, I mean all NBs from all NBLs. Perhaps this is a property of the network card. How can I query the card?

Yes, you can find out the MTU on the interface and make sure your buffer is larger than that. For most hardware, the largest jumbo frame is around 9,000 bytes.
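From user mode, one way to read the interface MTU is GetIfEntry2 from iphlpapi. This is a sketch; obtaining the interface index is assumed to happen elsewhere:

#include <winsock2.h>
#include <iphlpapi.h>
#include <stdio.h>
#pragma comment(lib, "iphlpapi.lib")

void PrintMtu(NET_IFINDEX ifIndex)   // index assumed already known
{
    MIB_IF_ROW2 row;
    ZeroMemory(&row, sizeof(row));
    row.InterfaceIndex = ifIndex;

    if (GetIfEntry2(&row) == NO_ERROR) {
        printf("MTU: %lu bytes\n", row.Mtu);  // e.g. 1500, ~9000 for jumbo
    }
}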

But what I mean is not that you should split the data for a single packet into multiple buffers, but that you should use OVERLAPPED I/O and send multiple buffers down to the driver while there is no data. Then, as data arrives, fill the first buffer and complete it back to UM. After the UM code runs, it sends that buffer back down to the driver. The point is that because the speed at which the KM code can detect packets and the speed at which the UM code can process them are mismatched, you want a queue of pending buffers to even out spikes.
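A minimal user-mode sketch of that pending-buffer queue; IOCTL_GET_PACKET is hypothetical (using METHOD_OUT_DIRECT, since here the driver writes into the buffer), the device must be opened with FILE_FLAG_OVERLAPPED, and error handling is omitted:

#include <windows.h>
#include <winioctl.h>

#define IOCTL_GET_PACKET \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x802, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)
#define NUM_BUFFERS 8
#define BUF_SIZE    16000

void PumpPackets(HANDLE hDevice)
{
    static BYTE buffers[NUM_BUFFERS][BUF_SIZE];   // static: too big for stack
    OVERLAPPED  ov[NUM_BUFFERS];
    HANDLE      events[NUM_BUFFERS];

    // Prime the driver with a queue of pending requests.
    for (int i = 0; i < NUM_BUFFERS; i++) {
        ZeroMemory(&ov[i], sizeof(ov[i]));
        events[i] = ov[i].hEvent = CreateEventW(NULL, FALSE, FALSE, NULL);
        DeviceIoControl(hDevice, IOCTL_GET_PACKET, NULL, 0,
                        buffers[i], BUF_SIZE, NULL, &ov[i]);
    }

    for (;;) {
        // Whichever buffer the driver completes first wakes us up.
        DWORD i = WaitForMultipleObjects(NUM_BUFFERS, events,
                                         FALSE, INFINITE) - WAIT_OBJECT_0;
        DWORD bytes = 0;
        GetOverlappedResult(hDevice, &ov[i], &bytes, FALSE);

        // ... process buffers[i][0 .. bytes) ...

        // Resubmit immediately so the driver never runs dry.
        DeviceIoControl(hDevice, IOCTL_GET_PACKET, NULL, 0,
                        buffers[i], BUF_SIZE, NULL, &ov[i]);
    }
}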