Re: Flushing DMA Buffer Allocated with AllocateCommonBuffer

Hi Mats, thanks for the thorough answer. I know absolutely nothing
about Windows video drivers, so I’m mostly just curious what the reasons
are for doing this.

The other reason would be maintaining the different builds in some sensible way, and avoiding errors. If you split the source up into different source files, some update somewhere will be missed in a different version.

Presumably any code that needed to be shared among the different versions would be in a common place (perhaps a library) to avoid “copy & paste” updates and other modification propagation errors.

If you have lots of #if in the code, it becomes hairy for other reasons. Having one large source file that contains all different variations without any conditional compile makes it relatively easy to maintain.

This is true, although in those places where you need different code for
different hardware, it has to be done somehow – be it a compile-time
#if or a run-time if (meaning a conditional of some sort).
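The two styles can be sketched side by side in plain C. Everything here is invented for illustration (the board names, the FIFO values, and the `BUILD_FOR_FOO` build flag are not from any real driver); the point is only to show where the decision is made in each style:

```c
/* Hypothetical board IDs -- illustrative only. */
#define BOARD_FOO 1
#define BOARD_BAR 2

/* Compile-time variant: the value is baked in when you build,
 * so you need one binary per board. */
#ifdef BUILD_FOR_FOO
enum { FIFO_DEPTH = 32 };          /* Foo hardware */
#else
enum { FIFO_DEPTH = 64 };          /* Bar hardware (default build) */
#endif

/* Run-time variant: one binary serves all boards, at the cost of
 * a test-and-branch each time the decision is consulted. */
static int fifo_depth(int board)
{
    return (board == BOARD_FOO) ? 32 : 64;
}
```

In the compile-time variant the conditional disappears from the shipped binary entirely; in the run-time variant it survives as an ordinary branch, which is the cost being weighed in this thread.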

There’s also the fact that for each package you have, you need separate WHQL certification, which means that if you do a different build for a particular board/SKU etc., you need to run the same tests on this build as you did on the main build, so for each variation you add one lot of WHQL runs. A run of WHQL for a display driver takes around a day, assuming all goes well… Add to this that you have to send the logs to MS, get them to certify you, and wait for the results from MS.

This is probably the most compelling of all the reasons you mention.
=^)

If the driver is updated often (during the Beta stage when you’re developing a new board + ASIC, for instance), you also get problems with tracking which versions of which variation of the driver have which fixes included, as someone may have updated something during the daily build stage, and the second of the builds for that day has a different set of fixes than the first build.

These are of course real issues, but there are procedures and processes
you can use to alleviate them. For example, having a dedicated build
engineer; building from a source “snapshot” (so you don’t get builds
when the source code is in an “intermediate” state); and building all
binaries during a single build from the source snapshot.

Now multiply all this by the number of OS’s that you support (WinNT, 2K, XP, 9X, etc.) and you start seeing why having one driver is a real nice thing.

This is also true. I have written application code that handles the various OS flavors at run-time as opposed to compile-time. I’m just surprised that something as performance-critical as a video driver would do this too. Having a single binary implies at least three potentially performance-sapping side effects: (1) run-time conditionals (extra instructions for tests, comparisons, and branches; CPU branch effects such as incorrectly predicted branches, instruction fetch queue and pipeline flushes, etc.); (2) increased cache misses due to lower spatial proximity as well as a larger footprint; and (3) swapping or memory resource consumption due to a larger binary footprint. Do you have any thoughts on how, in a practical “real world” situation, having a unified driver binary affects performance? For example, if I had a driver compiled specifically for my hardware, how many more frames per second would I see playing Half Life? =^)

Chuck

> -----Original Message-----
> From: Chuck Batson [mailto:xxxxx@cbatson.com]
> Sent: Thursday, November 20, 2003 2:48 PM
> To: Windows System Software Devs Interest List
> Subject: [ntdev] Re: Flushing DMA Buffer Allocated with AllocateCommonBuffer
>
>
> Perhaps this is a dumb question, but is there any particular reason why
> you don’t segregate into different builds? Why is it necessary to cram
> everything into a single driver binary?
>
> Chuck
>
> ----- Original Message -----
> > From: “Calvin Guan”
> > To: “Windows System Software Devs Interest List”
> > Sent: Wednesday, November 19, 2003 2:26 AM
> > Subject: [ntdev] Re: Flushing DMA Buffer Allocated with AllocateCommonBuffer
> >
> >
> > > To add to what Alberto said, our miniport has to support a huge
> > > list of desktop and mobile ASICs. Each ASIC has a different video
> > > BIOS to handle. The biggest headache for me is the mobile ASICs on
> > > notebooks. Different OEMs have different LCD panels, and different
> > > OEMs require different features. Also, there are many awesome
> > > features implemented in the miniport.
> > >
> > > Instead of “miniport”, I would call it a giantport. It’s even
> > > larger than ntfs.sys in size. I really miss the days when I was
> > > with NDIS miniports and wrote every single line of code for my
> > > driver :-)
> > >
> > > -----Original Message-----
> > > From: Moreira, Alberto [mailto:xxxxx@compuware.com]
> > > Sent: Tuesday, November 18, 2003 10:40 AM
> > > To: Windows System Software Devs Interest List
> > > Subject: [ntdev] Re: Flushing DMA Buffer Allocated with AllocateCommonBuffer
> > >
> > >
> > > There’s a lot of functionality in a Miniport; it does most of the
> > > non-time-critical functions of driving a graphics subsystem. Some
> > > people put support for several different chips in the same piece of
> > > code, but even if you only have one chip, your Miniport may end up
> > > being pretty big. Some of the actual space is taken by tables; for
> > > example, every graphics driver supports several resolutions and bit
> > > depths, and one must keep tables of register settings that set up
> > > your chip for the corresponding video mode. There are also tables
> > > with configuration and capability settings, and they take space.
> > > You must handle initialization, capabilities, mode changes, power
> > > management, multiple screens, resource management, you name it. You
> > > must also manage the retrace interrupt. In WinXP there’s even new
> > > support for DMA. BTW, Calvin, do you guys implement and use the new
> > > DMA calls that WinXP added to the Miniport?
> > >
> > > Alberto.
> > >
> > >
> > > -----Original Message-----
> > > From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf
> > > Of Maxim S. Shatskih
> > > Sent: Monday, November 17, 2003 10:14 PM
> > > To: Windows System Software Devs Interest List
> > > Subject: [ntdev] Re: Flushing DMA Buffer Allocated with AllocateCommonBuffer
> > >
> > >
> > > Wow! Am I right that this huge amount of code is due to supporting
> > > all video card hardware models and maintaining backward
> > > compatibility, so that the newest binary can work with even the old
> > > hardware?
> > >
> > > Maxim Shatskih, Windows DDK MVP
> > > StorageCraft Corporation
> > > xxxxx@storagecraft.com
> > > http://www.storagecraft.com
> > >
> > >
> > > ----- Original Message -----
> > > From: Calvin Guan
> > > To: Windows System Software Devs Interest List
> > > Sent: Tuesday, November 18, 2003 4:02 AM
> > > Subject: [ntdev] Re: Flushing DMA Buffer Allocated with AllocateCommonBuffer
> > >
> > >
> > > Well, a video miniport is a lot of code :-).
> > > Our Radeon x86 free build miniport (ati2mtag.sys) is more than
> > > 600 KB; the chk build doesn’t fit on a floppy…
> > >
> > > Calvin Guan, Software Developer xxxxx@nospam.ati.com
> > > SW2D-Radeon NT Core Drivers
> > > ATI Technologies Inc.
> > > 1 Commerce Valley Drive East
> > > Markham, Ontario, Canada L3T 7X6
> > > Tel: (905) 882-2600 Ext. 8654
> > > Find a driver: http://www.ati.com/support/driver.html
> > >
> > > > -----Original Message-----
> > > > From: Maxim S. Shatskih [mailto:xxxxx@storagecraft.com]
> > > > Sent: Monday, November 17, 2003 7:20 PM
> > > > To: Windows System Software Devs Interest List
> > > > Subject: [ntdev] Re: Flushing DMA Buffer Allocated with AllocateCommonBuffer
> > > >
> > > >
> > > > > Miniport. For example, look at the Permedia P3 sample in the
> > > > > DDK; the DMA rendering is handled in the driver and not in the
> > > > > Miniport. There’s not
> > > >
> > > > Then why is nVidia’s miniport THIS huge (500 KB or such)?
> > > >
> > > > Maxim Shatskih, Windows DDK MVP
> > > > StorageCraft Corporation
> > > > xxxxx@storagecraft.com
> > > > http://www.storagecraft.com
> > > >
> > > >
> > > > —
> > > > Questions? First check the Kernel Driver FAQ at
> > > > http://www.osronline.com/article.cfm?id=256
> > > >
> > > > You are currently subscribed to ntdev as: xxxxx@ati.com
> > > > To unsubscribe send a blank email to xxxxx@lists.osr.com
> > >
> > >
> >
> >
> >
>

The effect of run-time conditionals is often exaggerated. They are only costly if they are encountered very often, in performance-critical sections, and this is usually not the case. (There are other potential inefficiencies, such as allocating memory that you don’t need in a particular environment.)

In the vast majority of cases, the conditional is not in a
performance-critical region. For example, conditionals in
initialization / configuration / capability queries, etc. However, for
the sake of argument, if we assume that there ARE some regions where
performance is critical, and that these regions differ significantly,
then it still isn’t a problem. Inside a single driver image, you can
have multiple copies of the same device-specific code, specialized with
#if statements, and you can choose which chunk to use at run-time. (For
example, during initialization, you could decide which vector of
function pointers to use, based on which hardware was discovered.) And
of course you don’t want to cut-n-paste code – a mortal sin! – but
there are lots of work-arounds for this.

For example, if you had video cards, models Foo, Bar, and Zub, you could write your core implementation in model-implementation.c, which would have #if statements testing which model to specialize for. Then you could have three model-specific files that just set up certain #define statements and then #include “model-implementation.c”. Easy as could be, and no code duplication.
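A minimal sketch of that layout in C, with invented model names and values. In a real tree the two pieces would live in separate files as described above; they are inlined into one fragment here so the sketch stands alone:

```c
/* In a real tree, model-foo.c would contain only:
 *
 *     #define MODEL_NAME    "Foo"
 *     #define MODEL_VRAM_KB 8192
 *     #include "model-implementation.c"
 *
 * and model-bar.c / model-zub.c would do the same with their own
 * values. Both "files" are inlined below; the names and numbers are
 * made up for illustration.
 */

/* --- what model-foo.c would define --- */
#define MODEL_NAME    "Foo"
#define MODEL_VRAM_KB 8192

/* --- the shared core (model-implementation.c), specialized by the
 *     macros set up by whichever model file included it --- */
static const char *model_name(void)  { return MODEL_NAME; }
static int         model_vram_kb(void) { return MODEL_VRAM_KB; }
```

Each model-specific translation unit compiles the shared core once with its own constants, so all three specialized copies can be linked into a single binary with no duplicated source.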

– arlie

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Chuck Batson
Sent: Thursday, November 20, 2003 10:45 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Re: Flushing DMA Buffer Allocated with AllocateCommonBuffer

[… deletia …]


“Chuck Batson” wrote in message news:xxxxx@ntdev…

[snip]

> This is also true. I have written application code that handles the
> various OS flavors at run-time as opposed to compile-time. I’m just
> surprised that something as performance-critical as a video driver
> would do this too. Having a single binary implies at least three
> potentially performance-sapping side effects: (1) run-time conditionals
> (extra instructions for tests, comparisons, and branches; CPU branch
> effects such as incorrectly predicted branches, instruction fetch queue
> and pipeline flushes, etc.); (2) increased cache misses due to lower
> spatial proximity as well as a larger footprint; and (3) swapping or
> memory resource consumption due to a larger binary footprint. Do you
> have any thoughts on how, in a practical “real world” situation, having
> a unified driver binary affects performance? For example, if I had a
> driver compiled specifically for my hardware, how many more frames per
> second would I see playing Half Life? =^)

I’m not going to answer for Mats, but I can describe one technique
to ameliorate the first two of the three: init-time specialization.

That means that you abstract the hardware differences across a common
interface, then you implement a separate module for each hardware variant
against that interface. At init time, you select the module that matches
the hardware, pop the function pointers into your dispatch table, and then
just call them unconditionally when you need them. No run-time decisions
required. In fact, that’s pretty much what the various miniports do, but
the miniport loads first, and the common part (the port) is linked to it by
the loader. The technique I described is the same thing, only you are doing
the linking explicitly, instead of letting the OS do it for you.
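A bare-bones sketch of that init-time specialization in C. The interface, chip names, device IDs, and mode limits are all invented for illustration; the technique, not the names, is the point:

```c
/* A common hardware-ops interface, implemented once per variant. */
struct hw_ops {
    int (*max_width)(void);
    int (*set_mode)(int width, int height);  /* returns 1 on success */
};

/* Module for the hypothetical "Foo" chip. */
static int foo_max_width(void) { return 1024; }
static int foo_set_mode(int w, int h) { return w <= 1024 && h <= 768; }
static const struct hw_ops foo_ops = { foo_max_width, foo_set_mode };

/* Module for the hypothetical "Bar" chip. */
static int bar_max_width(void) { return 1600; }
static int bar_set_mode(int w, int h) { return w <= 1600 && h <= 1200; }
static const struct hw_ops bar_ops = { bar_max_width, bar_set_mode };

/* At init time, pick the table that matches the probed device ID.
 * After this one decision, every call goes straight through the
 * pointers -- no per-call hardware test, as Phil describes. */
static const struct hw_ops *probe_ops(unsigned device_id)
{
    return device_id == 0x0F00 ? &foo_ops : &bar_ops;
}
```

The one run-time decision is paid once, at probe time; the steady-state cost is an indirect call, which is essentially what the OS loader does for you when it binds a miniport to its port driver.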

Phil

Philip D. Barila Windows DDK MVP
Seagate Technology, LLC
(720) 684-1842
As if I need to say it: Not speaking for Seagate.