Preprocess IRPs in KMDF behaviour

Hi,
I want to understand the implications of implementing a preprocess callback for IRPs in KMDF.
What does KMDF do before it calls a driver's preprocess callback function?
A. Does it handle cancellation?
B. Does it synchronize the IRP with Remove_Device/Surprise_Removal?

  1. Is the IRP rejected by the framework if the device is being removed or has already been removed?
  2. If Remove_Device is called while this IRP is being processed (whether synchronously in the callback or asynchronously, with the callback returning STATUS_PENDING), will the framework wait until the IRP is completed, or will it continue its Remove_Device handling regardless of the IRP in progress?

Thanks,
Eran.

The preprocess routine is just like a WDM dispatch routine in your driver; nothing is implemented for you, so…

a) no, cancellation is not supported
b) no, it does not sync the callback w/ pnp state. But if the i/o is handle-based, it will never be racing with a remove irp, since remove only comes after all handles are closed. KMDF does not track this irp in this callback, so it is completely uncoordinated with KMDF's pnp state machine

the preprocess routine is invoked before KMDF does any processing, so none of the usual state guarantees apply.
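
for reference, a minimal sketch of what registering a preprocess callback and handing the IRP back to the framework looks like; the function names (MyEvtIoctlPreprocess, MyRegisterPreprocess) and the choice of IRP_MJ_DEVICE_CONTROL are placeholders, and the registration call belongs in EvtDriverDeviceAdd before WdfDeviceCreate:

#include <ntddk.h>
#include <wdf.h>

EVT_WDFDEVICE_WDM_IRP_PREPROCESS MyEvtIoctlPreprocess;   // hypothetical name

NTSTATUS
MyEvtIoctlPreprocess(
    _In_ WDFDEVICE Device,
    _Inout_ PIRP Irp
    )
{
    // This runs before any KMDF processing: no WDFREQUEST exists yet,
    // cancellation is not handled for you, and nothing is coordinated with
    // the pnp/power state machine.

    // ... examine or handle the IRP here, WDM-style ...

    // Hand the IRP back to the framework so it goes through the normal
    // dispatch path (and eventually the driver's I/O queue callbacks).
    return WdfDeviceWdmDispatchPreprocessedIrp(Device, Irp);
}

// Called from EvtDriverDeviceAdd, before WdfDeviceCreate.
NTSTATUS
MyRegisterPreprocess(
    _Inout_ PWDFDEVICE_INIT DeviceInit
    )
{
    return WdfDeviceInitAssignWdmIrpPreprocessCallback(
               DeviceInit,
               MyEvtIoctlPreprocess,
               IRP_MJ_DEVICE_CONTROL,
               NULL,    // no specific minor codes
               0);
}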

d

Thanks,
Even if the I/O is handle-based (which it should be in my case), I still think that I/O can arrive after surprise removal is invoked, so I still need to do some syncing on my own.
I am checking IRP preprocessing because I am a bit worried about the performance implications of having KMDF allocate a WDFREQUEST for each IRP. I guess that the allocation is pretty fast, as look-aside lists are involved (did anyone run benchmarks?). However, I am worried about the scenario in which every device in the stack is KMDF; in that case, if the stack has 8 devices (not uncommon in storage stacks), we will have 8 WDFREQUEST allocations per IRP.
Did the KMDF team think about such scenarios, and are there any thoughts about somehow sharing the same WDFREQUEST object among several devices in the stack, in a similar manner to how IRPs are shared?

The bottom line is that I just need some reassurance that KMDF scales and performs well even with many IRPs in flight (perhaps even 1000 at the same time), so that I have winning arguments for my boss as to why my driver should be KMDF and not plain WDM.

Thanks,
Eran

we have measured around a 10% CPU increase with KMDF. you can still fully saturate the bus (pci, usb, etc), but with slightly higher CPU usage…and if you are already doing sync of pnp/power state while processing i/o under a lock, the increase is smaller, since KMDF is doing the same thing you were doing beforehand. in the end, you need to measure. one of the big value adds for KMDF is that i/o is synchronized with pnp/power. removing that value add by preprocessing all i/o means that you must reimplement all of the WDM behavior again (and again, if you need a lock to do it, that is not much different than KMDF).

we would like to make WDFREQUESTs as light as possible. sharing a wdfrequest among all stack locations is not currently feasible.

now, KMDF has been tested in a storage stack, but KMDF does not make fwd progress guarantees. if we can’t alloc a WDFREQUEST for the PIRP (which does come from a lookaside list), it will be completed with failure. for storage, that is an issue (failing paging i/o could be disastrous) while for most other stacks, not such a big deal b/c they don’t have fwd progress requirements to begin with.
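
purely as a conceptual illustration (not actual framework source), the failure mode described above amounts to something like this, assuming a hypothetical lookaside list backing the WDFREQUEST allocations:

#include <ntddk.h>

// Conceptual only: roughly what "completed with failure" means when the
// framework cannot allocate a request object for an incoming PIRP.
NTSTATUS
ConceptualRequestDispatch(
    _In_ PNPAGED_LOOKASIDE_LIST RequestLookaside,
    _Inout_ PIRP Irp
    )
{
    PVOID requestMemory = ExAllocateFromNPagedLookasideList(RequestLookaside);

    if (requestMemory == NULL) {
        // No forward-progress guarantee: the IRP is failed on the spot.
        Irp->IoStatus.Status = STATUS_INSUFFICIENT_RESOURCES;
        Irp->IoStatus.Information = 0;
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    // ... in the real framework the allocation would back a WDFREQUEST for the
    // lifetime of the i/o; it is freed here only to keep the sketch balanced ...
    ExFreeToNPagedLookasideList(RequestLookaside, requestMemory);
    return STATUS_SUCCESS;
}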

outside of the FS driver, you do not see handle-based i/o in the storage stack. yes, i/o can still come after the surprise remove has been sent and before the remove has been sent.
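
since preprocessed i/o is uncoordinated with KMDF's pnp state machine and can arrive after the surprise remove, one way to do that syncing yourself is a plain WDM remove lock in the device context. a sketch only, with hypothetical names (MY_DEVICE_CONTEXT, MyGetContext, MyEvtPreprocess):

#include <ntddk.h>
#include <wdf.h>

// Hypothetical device context carrying a WDM remove lock. Register the context
// type via WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE when calling WdfDeviceCreate
// and call IoInitializeRemoveLock on it in EvtDriverDeviceAdd.
typedef struct _MY_DEVICE_CONTEXT {
    IO_REMOVE_LOCK RemoveLock;
} MY_DEVICE_CONTEXT, *PMY_DEVICE_CONTEXT;

WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(MY_DEVICE_CONTEXT, MyGetContext);

NTSTATUS
MyEvtPreprocess(
    _In_ WDFDEVICE Device,
    _Inout_ PIRP Irp
    )
{
    PMY_DEVICE_CONTEXT ctx = MyGetContext(Device);
    NTSTATUS status = IoAcquireRemoveLock(&ctx->RemoveLock, Irp);

    if (!NT_SUCCESS(status)) {
        // Removal has already started; fail the i/o instead of racing it.
        Irp->IoStatus.Status = status;
        Irp->IoStatus.Information = 0;
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
        return status;
    }

    // ... do whatever preprocessing is needed while the lock is held ...

    IoReleaseRemoveLock(&ctx->RemoveLock, Irp);
    return WdfDeviceWdmDispatchPreprocessedIrp(Device, Irp);
}

// In the removal path (e.g. EvtDeviceSelfManagedIoCleanup), acquire the lock
// once with a tag and then call IoReleaseRemoveLockAndWait with the same tag
// to block until all in-flight preprocessed IRPs have drained.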

d

By a 10% increase, I guess you mean 10% relative to a WDM driver and not 10% of total CPU.
For example, if a WDM driver took 30% CPU, you would expect its WDF equivalent to take 33% and not 40%. Is this correct?
Regarding the lookaside list: I understand there are certain thresholds at which the list starts releasing memory back to the non-paged pool. Is there some way to tune KMDF (or the OS itself) to use other settings? In my case I expect prolonged periods in which at least a thousand WDFREQUESTs will be in flight at the same time, and I am worried about the implications of too many accesses to the non-paged pool.

Thanks,
Eran.

There is no way to manually control when lookaside lists are reclaimed
by the OS. Yes, the % was relative. If you are going to have 1000
active requests, I think the memory used by the WDFREQUESTs will fall
out in the wash compared to the other components using memory for the
requests.

d

Thanks for your reply.
Regarding the memory concerns, my driver will sit on a dedicated machine (x64) with a lot of memory, so there shouldn't be any problems there. The only concerns are performance and scalability.
Thanks to your replies, I think I have ample ammunition to convince my boss to go with KMDF.

Thanks again,
Eran.