Thread model for NDIS driver -- What's your opinion?

Barak_Mendelevich · August 4, 2011, 3:04pm

Hi everybody,

I’m about to write an NDIS miniport driver for WLAN, and I was wondering if there are certain guidelines or “dos and don’ts” for the execution scheme of the driver.

To be more specific, I came up with the following 3 models.
Can you please give your opinion about them?
Is there another way which may be better?

I know that there is more than one “correct answer”, but I need your
insight, to avoid trouble.

option 1:
run everything in a single-thread, PASSIVE IRQL
[i.e.: queue a work item (or similar) for every IO request / interrupt, and do the actual job in the context of the work item]

If I choose this option, I understand that the scheduler may preempt my
driver in favor of a user app. Is there a way to make sure my driver
will always get the CPU before any user app?

Another issue: is the context switch (from the caller context to the work item)
something I should consider? Do you have an estimate of how much time
this will take (assuming my work item is scheduled to run right after
the caller exits)?

option 2:
run everything in the context of the caller, and sync
everything via a lock [e.g.: have a global lock in the system,
and whenever the driver gets control (OID, interrupt) - just
grab the global spinlock, do the job, and release the spinlock]

option 3:
write a “multithreaded” driver, with spinlock for every shared
data / mutual exclusion.

My thoughts are about option 1, but it feels too risky.
Option 2 seems the most “straight-forward”, but also very
non-friendly to other drivers / apps.

What do you think?

Thanks !

Barak

Peter_Viscarola_OSR · August 4, 2011, 3:24pm

Option 3. The OS is an inherently reentrant and multi-threaded entity. Drivers are part of the OS, and hence are also inherently reentrant and multi-threaded.

That’s the usual Windows driver way.

Peter
OSR

Jeffrey_Tippet_MSFT · August 4, 2011, 5:07pm

Peter is right (of course) about option 3 being the “natural” model for NT drivers. And option 1 is very unnatural – you’ll spend a lot of time fighting the system if you go that route. (Even in your short description, you mentioned worrying about thread priority – you’ve already begun fighting the system’s design!)

But option 2 isn’t terrible. Especially if you’re new to writing multithreaded code, or if you need to hammer out a reliable driver in a hurry, there are worse things you can do than have one giant lock. You’ll never have the fastest or most responsive driver, and you still won’t be exempt from needing careful design and thought. But it’s better to ship a slow and safe driver than to ship a big mess of erratic race conditions. It’s worth noting that early versions of NDIS forced you into a model like option 2. Those ancient serialized miniports can still run on Win7 today, and an end-user won’t really notice, as long as they don’t try to push more than 100 Mbps.

So if you have the time and motivation to learn to write drivers properly, and you’re willing to tolerate a few buggy “teachable moments” along the way, then go with option 3. But if you just need to hammer out a driver quickly and move on with your life, option 2 will get you the most reliable driver in the shortest amount of time.
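
To make the "one giant lock" model of option 2 concrete, here is a minimal user-mode sketch. It is not NDIS code: `model_lock` is a hypothetical stand-in built on C11 `atomic_flag` (a real miniport would use a kernel spinlock such as NDIS_SPIN_LOCK), and `adapter_state`, `handle_oid_set_channel` and `handle_interrupt` are made-up names for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-mode stand-in for a kernel spinlock. */
typedef struct { atomic_flag held; } model_lock;

static void model_lock_acquire(model_lock *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait until the current holder releases the lock */
    }
}

static void model_lock_release(model_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Option 2: ALL driver state sits behind one lock. */
typedef struct {
    model_lock lock;
    unsigned channel;
    unsigned oids_handled;
    unsigned interrupts_handled;
} adapter_state;

static void adapter_init(adapter_state *a) {
    assert(a != NULL);
    atomic_flag_clear(&a->lock.held);
    a->channel = 0;
    a->oids_handled = 0;
    a->interrupts_handled = 0;
}

/* Every entry point follows the same shape: grab, work, release. */
static void handle_oid_set_channel(adapter_state *a, unsigned channel) {
    model_lock_acquire(&a->lock);
    a->channel = channel;
    a->oids_handled++;
    model_lock_release(&a->lock);
}

/* Returns the channel that was current when the interrupt was serviced. */
static unsigned handle_interrupt(adapter_state *a) {
    unsigned channel_seen;
    model_lock_acquire(&a->lock);
    a->interrupts_handled++;
    channel_seen = a->channel;
    model_lock_release(&a->lock);
    return channel_seen;
}
```

The appeal is that no entry point can ever see half-updated state. The cost is that an OID handler and an interrupt handler on different CPUs always wait for each other, even when they touch unrelated fields.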

Barak_Mendelevich · August 6, 2011, 3:19pm

Hi everybody, and thanks for your replies.

I spent some more time thinking of the problem (and the solutions), and I have some more questions and thoughts, if I may…

The first thought was that I’m probably trying to find a common solution for the entire driver, and that might not work. After all, handling the data path has different requirements than handling the control path.

So, I’d like to focus on handling the control path.
By this I mean the OIDs, and also the control data that goes to and from the air
(scanning, authentication, etc…).

Will serializing this path (where operations are usually serialized anyway?)
be acceptable?

Do you think that the “handle everything in PASSIVE mode” method I suggested in option 1 would work here?

My concerns are mainly about performance:

Spin locks are expensive (busy wait, flushing cache).

Running at DISPATCH level for a long time hurts performance.

It seems like MS (slowly but assuredly) directs developers away from DISPATCH
level, toward PASSIVE level or even user-mode (e.g.: for Win8 logo, there are
limitations on the amount of time a driver may run at DISPATCH level), so I’d
like to avoid DISPATCH level as much as I can.

In this way, data path will never lose CPU to control path (if there is CPU time for the driver).

I don’t know why, but it “feels” very right to me to run control path in PASSIVE and serialized.
Perhaps that’s because I come from the world of embedded systems, where the CPU
runs only the device’s apps, and where data is more important than control.

Sorry for being lengthy… it’s a sign of confusion:-)

Your thoughts and ideas are welcome.

Thanks,

Barak

Pavel_A1 · August 7, 2011, 3:51pm

The NDIS (and DISPATCH/DPC) rules are pretty well known, there isn’t a
lot of wheel to (re)invent.
What is variable: your hardware interface. The driver basically consists
of the hardware interface, NDIS interface and glue between them.
You want to decouple execution flows of these parts so that they won’t
block and hold each other.
If the hardware is cheap and simple (like USB 2.0, SDIO), it would be
reasonable to serialize it: make a single thread servicing the data
pump, or a set of DPC routines around the hardware that need to be
serialized.

In the NDIS part, you’ll have send, control (OID) and packet return
paths running in parallel; they will feed data to the h/w interface part
(which runs asynchronously).

The h/w interface will decode received data (network data and output of
OIDs). If it is data, indicate it up; if it is an OID, complete it.
Keep in mind that the NDIS APIs which indicate data up or complete requests
may immediately call you back to submit a new request (yes, recursion!
This is where you need to decouple the paths).

It also depends on whether your driver is NDIS6 or NDIS5.
Synchronization rules there are different.

At the end the design will likely have several queues filled by various
inbound paths, and emptied by the h/w interface “engine”.
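
One way to picture the "queues filled by inbound paths, emptied by the engine" design, including the recursion hazard mentioned above, is the following user-mode sketch. Everything in it is hypothetical: `request_queue`, `queue_submit`, `queue_drain` and `demo_complete` are illustration names, and the C11 `atomic_flag` only models a spinlock. The idea shown is a `draining` flag, so that a completion callback which submits a new request merely enqueues it instead of re-entering the engine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8

typedef struct request_queue request_queue;
typedef void (*complete_fn)(request_queue *q, int request);

struct request_queue {
    atomic_flag lock;          /* models a spinlock */
    int items[QUEUE_CAPACITY]; /* ring buffer of pending requests */
    size_t head;
    size_t count;
    bool draining;             /* true while some caller runs the engine */
    complete_fn on_complete;
};

static void q_lock(request_queue *q) {
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire)) { }
}
static void q_unlock(request_queue *q) {
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static void queue_init(request_queue *q, complete_fn fn) {
    atomic_flag_clear(&q->lock);
    q->head = 0;
    q->count = 0;
    q->draining = false;
    q->on_complete = fn;
}

/* The "engine": only one caller at a time actually drains. */
static void queue_drain(request_queue *q) {
    q_lock(q);
    if (q->draining) {          /* someone (maybe us, further up the stack) */
        q_unlock(q);            /* is already draining: do not recurse      */
        return;
    }
    q->draining = true;
    while (q->count > 0) {
        int request = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
        assert(q->count < QUEUE_CAPACITY);
        q_unlock(q);            /* never call out with the lock held */
        q->on_complete(q, request);
        q_lock(q);
    }
    q->draining = false;
    q_unlock(q);
}

/* Inbound path: enqueue, then try to run the engine. */
static bool queue_submit(request_queue *q, int request) {
    q_lock(q);
    if (q->count == QUEUE_CAPACITY) {
        q_unlock(q);
        return false;
    }
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = request;
    q->count++;
    q_unlock(q);
    queue_drain(q);
    return true;
}

/* Demo callback: completing request 1 submits request 2 from inside the
   completion, the way an upper layer may call straight back into a driver. */
static int g_depth, g_max_depth, g_completed, g_order[4];

static void demo_complete(request_queue *q, int request) {
    g_depth++;
    if (g_depth > g_max_depth) g_max_depth = g_depth;
    if (g_completed < 4) g_order[g_completed] = request;
    g_completed++;
    if (request == 1) queue_submit(q, 2);
    g_depth--;
}
```

Submitting request 1 through `queue_submit` with `demo_complete` installed completes both requests in order, and the callback never nests: the inner `queue_submit` finds `draining` set and returns after enqueuing, leaving the outer loop to pick request 2 up.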

In any case you cannot run entirely at PASSIVE because NDIS entry points
can be called at dispatch. Delaying them to passive is a waste.
Most hardware interfaces (even USB) also call their callbacks at
dispatch. So spinlocks are the natural choice. There is nothing to fear; you
just need to learn and get some experience.

If your hardware is very high-throwput (PCIe or like that), it should be
specially designed to cooperate with the host driver better. NDIS has
a few special mechanisms for such devices (TCP offloads, RSS and others).

> I don’t know why, but it “feels” very right for me to run control
> path in PASSIVE and serialized.
> Perhaps that’s because I come from the world of embedded systems,

Yes, very likely. In many of these systems drivers are tasks,
interrupts are like events, so communication among drivers is like
communication among tasks. But Windows NT is different.

Good luck,
– pa

Peter_Viscarola_OSR · August 7, 2011, 4:06pm

Actually, quite the contrary. Spinlocks are very INexpensive.

While I agree with Mr. Tippet’s general advice that it’s better to over-serialize and have your driver work correctly than under-serialize and have your driver work WRONG, it’s best to serialize the right amount and have your driver work correctly.

So, ONLY hold locks when you need to, and drop them when you don’t. Don’t spend extended amounts of time at IRQL DISPATCH_LEVEL, or with a spinlock held, and you’ll be just fine.
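
As an illustration of "hold the lock only while you need it", here is a small user-mode sketch. It is a model, not driver code: `packet`, `packet_list`, `list_take_all` and `process_pending` are hypothetical names, and the C11 `atomic_flag` stands in for a spinlock. The lock covers only the pointer manipulation; the per-packet work happens after the lock has been dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct packet {
    struct packet *next;
    int length;
} packet;

typedef struct {
    atomic_flag lock;   /* models a spinlock */
    packet *head;
    packet *tail;
} packet_list;

static void list_lock(packet_list *l) {
    while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire)) { }
}
static void list_unlock(packet_list *l) {
    atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

static void list_init(packet_list *l) {
    atomic_flag_clear(&l->lock);
    l->head = NULL;
    l->tail = NULL;
}

/* Producer side: a few pointer writes under the lock, nothing more. */
static void list_push(packet_list *l, packet *p) {
    assert(p != NULL);
    p->next = NULL;
    list_lock(l);
    if (l->tail != NULL) {
        l->tail->next = p;
    } else {
        l->head = p;
    }
    l->tail = p;
    list_unlock(l);
}

/* Detach the whole chain in O(1) while holding the lock. */
static packet *list_take_all(packet_list *l) {
    packet *chain;
    list_lock(l);
    chain = l->head;
    l->head = NULL;
    l->tail = NULL;
    list_unlock(l);
    return chain;
}

/* Consumer side: the (potentially slow) per-packet work runs with the
   lock released, so producers are never blocked behind it. */
static int process_pending(packet_list *l) {
    packet *p = list_take_all(l);
    int total = 0;
    while (p != NULL) {
        packet *next = p->next;
        p->next = NULL;
        total += p->length;   /* stand-in for real processing */
        p = next;
    }
    return total;
}
```

The time spent with the lock held is a handful of pointer assignments regardless of how many packets are queued, which is the property the advice above is after.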

Peter
OSR

Pavel_A1 · August 7, 2011, 4:25pm

On 07-Aug-2011 22:44, Pavel A wrote:

> If your hardware is very high-throwput

Throughput. It happens even *with* a keyboard.

–pa

virendra_arya · August 8, 2011, 2:08am

It's hard to avoid DISPATCH level for the data path. From Vista onwards, if you spend too much time in a DPC, the system will assert in KeUpdateRuntime().

If your In-packet ISR keeps queuing DPCs back to back without relinquishing the CPU for a long time, the system will assert in KeUpdateSystemTime(). NDIS handles this issue efficiently when you queue the DPC through the NDIS API. In the case of NDIS with a WDM lower edge, you need to put in your own mechanism to relinquish the CPU after enough DPC work. You will need to find out what this _enough is :).
Moral of the story: don't be scared of DISPATCH level, just use it wisely :).
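
The "relinquish the CPU after enough DPC work" idea can be modelled with a per-pass budget. The sketch below is plain user-mode C, not a real DPC routine: `rx_ring`, `rx_dpc_pass` and `rx_drain_in_passes` are hypothetical names, and the budget value is an assumption that would have to be tuned for real hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical receive state: how many packets wait, how many are done. */
typedef struct {
    unsigned pending;
    unsigned processed;
} rx_ring;

/* One bounded pass, modelling one DPC invocation.
   Returns true when work remains, i.e. the caller should schedule
   ANOTHER pass instead of looping here and hogging the CPU. */
static bool rx_dpc_pass(rx_ring *r, unsigned budget) {
    unsigned done = 0;
    while (r->pending > 0 && done < budget) {
        r->pending--;      /* stand-in for indicating one packet up */
        r->processed++;
        done++;
    }
    return r->pending > 0;
}

/* Test helper: counts how many bounded passes it takes to go idle.
   In a driver each pass would be a separately queued DPC, so other
   work gets a chance to run in between. */
static unsigned rx_drain_in_passes(rx_ring *r, unsigned budget) {
    unsigned passes = 0;
    assert(budget > 0);
    do {
        passes++;
    } while (rx_dpc_pass(r, budget));
    return passes;
}
```

With 10 pending packets and a budget of 4, the work is split into three passes (4, 4 and 2 packets) rather than one long run at elevated IRQL.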

Regards,
Ardneriv