Mark,
I think you have wholly missed the mark on why most drivers are crappy. The
BIG reason is that the API is way too robust (read: it's way too easy to hang
oneself). Add to this a SEVERE lack of documentation and samples, and of
course you are going to end up with a mess. I am surprised that it all works
as well as it does on the Windows platforms.
But, as an example of what I say, I would be willing to bet a shiny new
quarter that MOST driver writers out there don’t understand all or most of
the following:
1.) asynchronous I/O
2.) pending flag and related issues
3.) IRP cancellation
4.) IOCTL control code setup and buffering methods, especially the differences
between METHOD_IN_DIRECT and METHOD_OUT_DIRECT, or METHOD_NEITHER overall.
5.) StartIo recursion
6.) IRP stack locations and how they are used
7.) map registers
8.) virtual/logical/physical addresses and the differences between them
9.) power handling such as power policy owner, system-to-device state mappings,
and idle detection
10.) how to properly handle things in a filter driver, such as passing on
everything, propagating the DeviceObject I/O flags, and propagating the power
flags.
11.) proper registry access in WDM drivers
12.) work items and system worker thread starvation
13.) pool fragmentation
I haven’t even gotten into the difficult aspects yet, or into specifics like
USB, 1394, or scatter/gather. The list goes on.
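To make item 4 in the list above a little more concrete, here is a minimal sketch in C of how a control code is packed and what the method bits imply for buffering. The CTL_CODE macro and the METHOD_*/FILE_* values are copied from the public DDK headers so the sketch builds without the DDK. The two IOCTL_SAMPLE_* codes and the ioctl_* helper functions are hypothetical names made up for illustration, and the buffer summaries are a short reminder, not a substitute for the DDK documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrored from the public DDK headers so this builds without the DDK.
   A real driver includes the DDK headers instead of redefining these. */
#define FILE_DEVICE_UNKNOWN 0x00000022u
#define METHOD_BUFFERED     0u
#define METHOD_IN_DIRECT    1u
#define METHOD_OUT_DIRECT   2u
#define METHOD_NEITHER      3u
#define FILE_ANY_ACCESS     0u
#define FILE_READ_ACCESS    1u

/* Bit layout: device type 31..16, access 15..14, function 13..2, method 1..0 */
#define CTL_CODE(DeviceType, Function, Method, Access) \
    (((DeviceType) << 16) | ((Access) << 14) | ((Function) << 2) | (Method))

/* Hypothetical control codes for an imaginary device. Function numbers
   0x800 and up are the range left for non-Microsoft drivers. */
#define IOCTL_SAMPLE_GET_INFO \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800u, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_SAMPLE_READ_BLOCK \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801u, METHOD_OUT_DIRECT, FILE_READ_ACCESS)

/* Hypothetical helpers that pull the fields back out of a control code. */
uint32_t ioctl_device_type(uint32_t code) { return (code >> 16) & 0xFFFFu; }
uint32_t ioctl_access(uint32_t code)      { return (code >> 14) & 0x3u; }
uint32_t ioctl_function(uint32_t code)    { return (code >> 2) & 0xFFFu; }
uint32_t ioctl_method(uint32_t code)      { return code & 0x3u; }

/* Short reminder of where the I/O manager leaves the buffers per method. */
const char *ioctl_buffer_summary(uint32_t code)
{
    switch (ioctl_method(code)) {
    case METHOD_BUFFERED:
        return "input and output share Irp->AssociatedIrp.SystemBuffer "
               "(copied by the I/O manager)";
    case METHOD_IN_DIRECT:
        return "input in SystemBuffer; second buffer locked down and "
               "described by Irp->MdlAddress, probed for read access";
    case METHOD_OUT_DIRECT:
        return "input in SystemBuffer; second buffer locked down and "
               "described by Irp->MdlAddress, probed for write access";
    default: /* METHOD_NEITHER: the only remaining 2-bit value */
        return "raw user-mode pointers (Type3InputBuffer in the stack "
               "location, Irp->UserBuffer); the driver must validate them";
    }
}

/* Sanity check: the fields packed by CTL_CODE come back out unchanged. */
void ioctl_self_check(void)
{
    assert(ioctl_device_type(IOCTL_SAMPLE_READ_BLOCK) == FILE_DEVICE_UNKNOWN);
    assert(ioctl_access(IOCTL_SAMPLE_READ_BLOCK) == FILE_READ_ACCESS);
    assert(ioctl_function(IOCTL_SAMPLE_READ_BLOCK) == 0x801u);
    assert(ioctl_method(IOCTL_SAMPLE_READ_BLOCK) == METHOD_OUT_DIRECT);
}
```

With these definitions the hypothetical IOCTL_SAMPLE_GET_INFO works out to 0x00222000. The thing to notice is that the two low bits are the whole difference between the I/O manager copying your data for you and handing you raw user-mode addresses, which is exactly the kind of detail that gets missed.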
This isn’t a problem of stupid driver writers or time to market. These are
errors you can find in the DDK samples, which is another problem that
begets problems. BTW, I don’t think MS is any more immune to this problem.
Their drivers seem no less buggy than those available commercially. It’s
easy to get to the point where your drivers are bulletproof when you run on
100 million platforms. But take some code that doesn’t get exercised all
that often, like the 1394 stack, and I would say Microsoft’s hitting average
is no better than anyone else’s. No offense to MS either; I am just trying
to point out that no one gets it right – most of the time. It just is that
complex.
It just is too difficult to write even a simple driver. Then it’s either
hard, almost impossible, or flat out impossible to figure out how all of the
above works in a WDM driver from the DDK by itself.
Another solution Microsoft has tried is to force people into a simpler
driver model. That doesn’t work too well either. Take NDIS, SCSI, or
video as examples. All of these miniport models are severely limited, to
the point of actually adding complexity if your device falls outside of what
was intended to be supported. My point is that the simpler driver
model approach is good provided you have some outstandingly brilliant folks
writing the model interface, so that if you had to, you could step outside
the model and still plug into the stack. The alternative is to see the
future and have the model encompass every conceivable device, which of
course isn’t possible. Add performance to this, and you had better have some
outstandingly brilliant people with outstandingly brilliant Windows driver
knowledge doing this.
All in all this IS an unsolvable problem. The best you can hope for is to
propagate information and better samples more efficiently, strengthen your
quality tests, and make it easier for hardware companies and such to go to
some trusted third-party driver writers instead of writing their own
drivers.
No OS has ever had to deal with the matrix of hardware that Windows has
seen. It is unprecedented, so I think all in all Windows is doing a fair
job.
--
Bill McKenzie
“Roddy, Mark” wrote in message news:xxxxx@ntdev…
>
> The performance issue might be overstated but it is not to be ignored. A 3%
> hit on performance is significant, a 20% hit is unacceptable. Moving drivers
> to their own protected space would most likely be closer to the 20% than the
> 3%. The answer is that we have to stop writing crappy drivers.
>
> Which brings this around full circle. ‘We’ are writing crappy drivers for at
> least two reasons: some of ‘we’ are incompetent and don’t understand how to
> do this right, and a lot of our employers don’t give a rat’s ass about
> reliability.
>
> We can’t solve the first problem without forcing the issue somehow on the
> second. Driver certification partially forces vendors to at least pay lip
> service to quality, although my experience is pretty consistent:
> certification is currently routinely ignored by hardware vendors.
> Driver-writer certification might help as well: if vendors could not get
> their drivers signed unless their source code had been audited by certified
> driver developers, perhaps the level of competency would be forced to
> increase.
>
> Until vendors care enough to make sure that competent programmers are
> producing reliable drivers, nothing is going to change. I agree with the
> consumer boycott sentiment in spirit, but in reality I install unsigned
> drivers all the time, as they frequently fix bugs in signed drivers. Why is
> that?
>
> > -----Original Message-----
> > From: xxxxx@acm.org [mailto:xxxxx@acm.org]
> > Sent: Monday, April 29, 2002 12:34 PM
> > To: NT Developers Interest List
> > Subject: [ntdev] Re: Philosophical Rant [was Re: Writing
> > Drivers in Java]
> >
> >
> > Greg Dyess wrote:
> >
> > > We SHOULDN’T have to take ANY performance penalty to have a system
> > > that didn’t crash! I’ve had Alpha-based machines that dual-booted NT
> > > and VMS. VMS, despite its reputation as a CPU and memory hog, ran
> > > circles around NT in terms of responsiveness and it NEVER crashed on
> > > the exact same hardware! Something’s wrong here when we start to even
> > > think we should accept moving device drivers to outer rings to prevent
> > > the systems from crashing. Everyone should start buying only hardware
> > > that has successfully passed the MS quality tests. I know there are
> > > those on this list that will disagree primarily because they violate MS
> > > guidelines for whatever reasons and cannot get certification. I
> > > myself have stopped buying anything not certified. If everyone did
> > > that, we could force these fly-by-nighters out of the business.
> > >
> >
> > Sorry, the “we shouldn’t need a performance hit” attitude is exactly
> > why many drivers do crash, and things are so crappy. I was
> > involved with various standards efforts for fault tolerance;
> > it was amazing to hear large vendors say “We can’t take a 3%
> > performance hit, just to get 50% more reliable”. I am not
> > kidding when I say that some of the largest system vendors
> > screamed this at the top of their lungs.
> >
> > Good drivers are going to take a performance hit, because
> > they are going to check and recheck data, and make no
> > assumptions that the world is either safe or secure. I agree
> > with Peter that the performance question is overstated; we
> > have the bandwidth to do things well, we just aren’t doing them.
> >
> > Finally, while I agree with supporting vendors who certify their
> > drivers, that isn’t enough. I have the source of a certified
> > driver in front of me as I work; so far I have identified
> > over 100 likely BSODs in the driver, and I suspect there
> > are at least 1000 probable BSODs. Testing will never
> > replace careful planning and coding in developing a driver.
> >
> > Don Burn
> > Egenera, Inc
> >