Dear Sirs,
Can anyone tell me the difference between scatter/gather DMA and common buffer DMA? Specifically, I would like to know the respective benefits and weaknesses of each approach.
Thank you!
Hi.
There is a series of entries regarding DMA on the blog of Peter Wieland
(thanks Peter!). The first one is here
http://blogs.msdn.com/peterwie/archive/2006/02/27/540252.aspx
It’s quite difficult to write down a list of benefits vs disadvantages of
each solution. It depends a lot on the requirements of your hardware (if you
are designing the hardware).
Some incomplete notes:
- If you use a common buffer, as the name implies you just have one buffer
to deal with. You usually pass the base physical pointer of the buffer to
your hardware and that's it (this is a big oversimplification).
Disadvantages: it's one single buffer. If it needs to be rather big (and
this could mean just 2-3MB), you could have problems allocating it. A DMA
buffer is a physically contiguous buffer, and over time physical memory gets
fragmented. Also, you always need to deal with ownership of the buffer. Who
owns the buffer (or part of it) at any given time? Hardware or host? How do
you manage synchronization? Getting these things right is not trivial. (A
minimal common-buffer allocation sketch follows after these notes.)
- Scatter/gather solves the problem of the contiguous buffer: you allocate
the buffer in smaller chunks. Obviously you need to pass the physical
pointer of each such small buffer to your hardware. But usually
synchronization and ownership come for free (each chunk is owned by the
hardware or by the host, and the hardware usually relinquishes ownership of
a buffer by raising an interrupt).
- Consider that a lot of hardware uses a mixed approach, using both a
common buffer and scatter/gather.
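To make the common-buffer case concrete, here is a minimal KMDF sketch. It assumes a DMA enabler has already been created with WdfDmaEnablerCreate; the routine name and the 64KB size are illustrative only.

#include <ntddk.h>
#include <wdf.h>

NTSTATUS
CreateExampleCommonBuffer(
    _In_  WDFDMAENABLER DmaEnabler,
    _Out_ WDFCOMMONBUFFER *CommonBuffer,
    _Out_ PVOID *VirtualAddress,            /* address the driver uses */
    _Out_ PHYSICAL_ADDRESS *LogicalAddress  /* address programmed into the device */
    )
{
    NTSTATUS status;

    /* One physically contiguous, DMA-visible allocation. */
    status = WdfCommonBufferCreate(DmaEnabler,
                                   64 * 1024,
                                   WDF_NO_OBJECT_ATTRIBUTES,
                                   CommonBuffer);
    if (!NT_SUCCESS(status)) {
        return status;
    }

    *VirtualAddress = WdfCommonBufferGetAlignedVirtualAddress(*CommonBuffer);
    *LogicalAddress = WdfCommonBufferGetAlignedLogicalAddress(*CommonBuffer);
    return STATUS_SUCCESS;
}

Everything after that - who owns which part of the buffer at what time - is up to the driver and the hardware, which is exactly the synchronization problem mentioned above.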
Have a nice day
GV
–
Gianluca Varenni, Windows DDK MVP
CACE Technologies
http://www.cacetech.com
Thanks, Gianluca Varenni.
I will continue to search for differences.
Does anyone know of more detailed or other differences? Thank you.
Try this doc: http://www.microsoft.com/whdc/driver/wdf/dma.mspx
It’s about DMA in KMDF, but the intro section does an admirable job
describing the general basics.
-scott
Scott Noone
Software Engineer
OSR Open Systems Resources, Inc.
http://www.osronline.com
Common buffer DMA is DMA which runs over a specially allocated driver
buffer, not over the IRP's buffers.
Scatter-gather DMA is when the DMA engine on the hardware is capable of
running over a physically discontiguous buffer, taking its scatter-gather
list (bus address + length pairs) in some form and interpreting this list
in hardware.
Often, this list is located in a common buffer in main RAM. This is also
called "chain DMA" (a hypothetical descriptor layout is sketched below).
USB controllers, the TI OHCI 1394 controller and many ATA/SATA controllers
are examples of scatter-gather-DMA-capable hardware. Adaptec's good old
aic78xx too.
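To illustrate the "list in a common buffer" idea, a chain-DMA descriptor often looks something like the following. This layout is purely hypothetical - every controller defines its own descriptor format.

#include <ntddk.h>

#pragma pack(push, 1)
typedef struct _HW_DMA_DESCRIPTOR {
    ULONG64 BusAddress;      /* bus-logical address of one data chunk */
    ULONG   Length;          /* length of that chunk in bytes */
    ULONG   Flags;           /* e.g. interrupt-on-completion, end-of-chain */
    ULONG64 NextDescriptor;  /* bus-logical address of the next descriptor */
} HW_DMA_DESCRIPTOR, *PHW_DMA_DESCRIPTOR;
#pragma pack(pop)

The driver fills a ring of these in the common buffer from the scatter-gather list it gets from the OS, and hands the bus address of the first descriptor to the device.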
–
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com
xxxxx@gmail.com wrote:
Thank Gianluca Varenni.
Continue to search for differences.
Does anyone know more detailed or other difference? Thank you.
GV’s response was pretty detailed. The difference is very clear-cut;
it’s not like there is any mystery. Are you still confused?
The issue is that DMA operations have to use physical addresses, but the
memory addresses in our programs and drivers are virtual addresses. Are
you familiar with the difference?
When you allocate a 2MB buffer, it will consist of roughly 500 pages of
memory. The virtual addresses in that buffer are all consecutive; they
might start at E0620000 and go through E081FFFF. However, the physical
addresses of those pages are NOT consecutive. The first 4k bytes might
be from 01234000 through 01234FFF, the next 4K at 06920000 through
06920FFF, and so on. Drivers and applications use the virtual
addresses, and the page tables in the processor do the conversion.
Those page tables are not available to PCI devices. When you do a DMA
transfer, you have to provide the physical address to use. For this
buffer, you would tell it 01234000. However, your device cannot just
transfer 2 megabytes from that spot, because after the first 4k it has
to change to a different physical address.
Now, with common buffer DMA, you use special APIs to allocate a buffer
that has consecutive addresses in BOTH virtual and physical space. So,
if we had a common buffer from E0620000 to E081FFFF, its physical
address range might be 01234000 through 01433FFF. Now, you can give
your hardware the starting address and have it do the entire 2 megabyte
transfer.
However, after a system has been running a while, it is difficult to
find a set of pages with consecutive physical addresses. Plus, in many
cases you don’t have control over how the buffer was allocated; you have
to use a buffer that was handed to you. Because of that, many devices
let you do DMA by chopping up the transfer into a set of smaller
pieces. You give it a whole list of physical address and size pairs.
In my first example, I would tell it to transfer 4k from 01234000, then
4k from 06920000, and so on.
That’s called scatter/gather, because it is “gathering data” from
various places in memory, or “scattering data” in the case of a write.
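In Windows terms, the "list of physical address and size pairs" is a SCATTER_GATHER_LIST of SCATTER_GATHER_ELEMENT entries (declared in wdm.h). A rough sketch of walking one follows; ProgramDeviceDescriptor is a hypothetical stand-in for whatever descriptor programming your device actually needs.

#include <wdm.h>

/* Hypothetical device-specific routine; not part of any Windows API. */
static VOID ProgramDeviceDescriptor(PHYSICAL_ADDRESS BusAddress, ULONG Length);

VOID
ProgramScatterGatherList(
    _In_ PSCATTER_GATHER_LIST SgList
    )
{
    ULONG i;

    for (i = 0; i < SgList->NumberOfElements; i++) {
        /* e.g. 4k at 01234000, then 4k at 06920000, and so on */
        ProgramDeviceDescriptor(SgList->Elements[i].Address,
                                SgList->Elements[i].Length);
    }
}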
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
Can you give us more information on what specific differences you are
looking for, or what you are trying to design? Are you designing some
DMA-enabled card? Any specific requirements as to data transfer to/from the
piece of hardware?
Have a nice day
GV
–
Gianluca Varenni, Windows DDK MVP
CACE Technologies
http://www.cacetech.com
Tim, this is a really, really nice explanation. You should start writing books, and I will buy all of them.
Christiaan
----- Original Message -----
From: “Tim Roberts”
To: “Windows System Software Devs Interest List”
Sent: Friday, November 30, 2007 7:23 PM
Subject: Re: [ntdev] Difference between Scatter/Gather DMA and Common Buffer DMA
> xxxxx@gmail.com wrote:
>> Thank Gianluca Varenni.
>> Continue to search for differences.
>> Does anyone know more detailed or other difference? Thank you.
>>
>
> GV’s response was pretty detailed. The difference is very clear-cut;
> it’s not like there is any mystery. Are you still confused?
>
> The issue is that DMA operations have to use physical addresses, but the
> memory addresses in our programs and drivers are virtual addresses. Are
> you familiar with the difference?
>
> When you allocate a 2MB buffer, it will consist of roughly 500 pages of
> memory. The virtual addresses in that buffer are all consecutive; they
> might start at E0620000 and go through E081FFFF. However, the physical
> addresses of those pages are NOT consecutive. The first 4k bytes might
> be from 01234000 through 01234FFF, the next 4K at 06920000 through
> 06920FFF, and so on. Drivers and applications use the virtual
> addresses, and the page tables in the processor do the conversion.
>
> Those page tables are not available to PCI devices. When you do a DMA
> transfer, you have to provide the physical address to use. For this
> buffer, you would tell it 01234000. However, your device cannot just
> transfer 2 megabytes from that spot, because after the first 4k it has
> to change to a different physical address.
>
> Now, with common buffer DMA, you use special APIs to allocate a buffer
> that has consecutive addresses in BOTH virtual and physical space. So,
> if we had a common buffer from E0620000 to E081FFFF, its physical
> address range might be 01234000 through 01433FFF. Now, you can give
> your hardware the starting address and have it do the entire 2 megabyte
> transfer.
>
> However, after a system has been running a while, it is difficult to
> find a set of pages with consecutive physical addresses. Plus, in many
> cases you don’t have control over how the buffer was allocated; you have
> to use a buffer that was handed to you. Because of that, many devices
> let you do DMA by chopping up the transfer into a set of smaller
> pieces. You give it a whole list of physical address and size pairs.
> In my first example, I would tell it to transfer 4k from 01234000, then
> 4k from 06920000, and so on.
>
> That’s called scatter/gather, because it is “gathering data” from
> various places in memory, or “scattering data” in the case of a write.
>
> –
> Tim Roberts, xxxxx@probo.com
> Providenza & Boekelheide, Inc.
>
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer
That’s a nice explanation, Tim. But I’d like to add that the physical
address of the memory may not be the bus address used by the device. This
distinction has always been in the NT driver model. You need to use DMA
logical addresses when programming the device.
This has been a minor point for many years, as only the RISC machines (and
AGP devices) had to care about differences between device logical addresses
and physical memory addresses, since all the common machines in the world
had a one-to-one identity mapping between system physical memory and I/O
busses.
With the advent of device virtualization, which is happening now in the PCI
SIG, device logical addresses are going to be departing from system physical
addresses much more often than they now do. (See the relatively new
“Address Translation Services” and “Single-Root I/O Virtualization” specs in
the SIG publications.)
So please, please, please. Be careful when programming your drivers so that
they conform to the NT DMA APIs. These APIs will not only protect you from
problems relating to devices which can’t address the entire physical space
(as with 32-bit devices in a greater-than-4GB machine) but also from
problems relating to virtualized remapping of system memory.
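For what it's worth, a bare-bones KMDF sketch of the kind of path Jake is describing is shown below. It assumes a WDFDMAENABLER already exists; error and cleanup paths are omitted. The point is that the addresses delivered to the EvtProgramDma callback are device logical addresses produced by the DMA abstraction (map registers, IOMMU, and so on), not raw physical addresses.

#include <ntddk.h>
#include <wdf.h>

EVT_WDF_PROGRAM_DMA EvtProgramWriteDma;

NTSTATUS
StartDmaForRequest(
    _In_ WDFDMAENABLER DmaEnabler,
    _In_ WDFREQUEST Request
    )
{
    WDFDMATRANSACTION transaction;
    NTSTATUS status;

    status = WdfDmaTransactionCreate(DmaEnabler,
                                     WDF_NO_OBJECT_ATTRIBUTES,
                                     &transaction);
    if (!NT_SUCCESS(status)) {
        return status;
    }

    status = WdfDmaTransactionInitializeUsingRequest(transaction,
                                                     Request,
                                                     EvtProgramWriteDma,
                                                     WdfDmaDirectionWriteToDevice);
    if (NT_SUCCESS(status)) {
        status = WdfDmaTransactionExecute(transaction, WDF_NO_CONTEXT);
    }
    return status;
}

BOOLEAN
EvtProgramWriteDma(
    _In_ WDFDMATRANSACTION Transaction,
    _In_ WDFDEVICE Device,
    _In_ WDFCONTEXT Context,
    _In_ WDF_DMA_DIRECTION Direction,
    _In_ PSCATTER_GATHER_LIST SgList
    )
{
    ULONG i;

    UNREFERENCED_PARAMETER(Transaction);
    UNREFERENCED_PARAMETER(Device);
    UNREFERENCED_PARAMETER(Context);
    UNREFERENCED_PARAMETER(Direction);

    for (i = 0; i < SgList->NumberOfElements; i++) {
        /* SgList->Elements[i].Address is a device logical address;
           program one hardware descriptor per element here. */
    }
    return TRUE;
}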
- Jake Oshins
Windows Kernel Guy
Virtualization I/O Architect
Microsoft
The problem with using the NT DMA API is that it binds the driver to
Windows. That can be a rather undesirable thing, for example, in my case
where I need to make sure that most of my DMA runs just the same under
Windows, Linux, Solaris and MacOS. If all I use from the OS API is a service
layer, things are easy to port; if I need to adhere to a pre-provided
design, compatibility may be very hard to achieve, if not impossible.
The main problem I find is that the NT DMA API, like the whole of the WDM
and WDF complexes, is a “fill in the blanks” thing: the Windows team has
designed a subsystem, and we driver writers are supposed to fill in the
blanks with bits and pieces that negotiate the unique features of the
hardware. That approach is not good for compatibility. What I need is a set
of services which allow me to ask the OS for help when I need it, while
letting me design my DMA the way I need it to be designed. This also spills
into hardware design - we need a piece of hardware that runs just as well
under multiple operating systems, not just a “built for Windows” piece of
silicon. So, in this case of Virtualization what I would like to see is a
few unobtrusive Memory Management calls that allow me to negotiate the
differences in memory addressing, but I will not appreciate if that
functionality is tied up to an obligation of doing DMA the way the Windows
team designed it, because I’m going to have a heck of a time converting it
to Linux, Solaris or MacOS - that is if that’s at all possible.
Take a simple example: I need to be able to read and write to my PCI Config
Registers, and my hardware may have more registers than the PCI spec asks
for. I do not like to have to ask a bus driver to do it; that generates the
need to have two pieces of code, one for Windows and the other for the
Unixes. What I would like to be able to do is to use the hardware level
mechanisms supplied by the PCI standard, and if that generates any kind of
synchronization issue, I would have expected the OS to provide me with some
mutex, semaphore or spinlock that would allow me to indivisibly access the
PCI config hardware. This is, incidentally, similar to the situation with
the APIC ICR which I talked about in my recent posts: this is another case
where I would have expected the OS to provide some way of synchronizing
access to such a piece of hardware, so that driver writers can talk to the
hardware and not depend on other drivers which isolate us from the hardware
and create a substantial impediment to easy compatibility.
Besides, it is a jolly nightmare to try to circumvent chip bugs while being
straightjacketed inside someone else’s design. Some chip bugs demand that I
have the ability to manipulate things at the lowest level, and the last
thing I want is an OS forcing me to adhere to its own I/O design, which may
be eons away from the way my chip works, and preventing me from being able
to negotiate my chip’s bugs through a mixture of needing to stick to the API
and not having enough documentation to know how to bend it to do what’s
needed. This is why in many instances I’m rather ready to take on myself the
responsibility to handle the lowest level, unless I want to write multiple
drivers for all the different OS configurations I need to support!
Alberto.
Alberto,
The problem with using the NT DMA API is that it binds the driver to Windows.
That can be a rather undesirable thing, for example, in my case where I need to
make sure that most of my DMA runs just the same under Windows, Linux,
Solaris and MacOS.
Unfortunately, there is no such thing as absolute portability - you have to choose between
hardware portability and software portability…
You have two options here:
1. Write your code the way the OS designers tell you to. If you do it this way, your source-level version of a driver for system X may be very different from the one for system Y, but at least you will be able to use the same source for building drivers for every hardware platform that the target OS supports.
2. Ignore the system-provided API altogether, and do everything yourself. If you do it this way, you will be able to use the same source for building drivers for every OS that runs on the target hardware platform. However, you have to write separate versions of a driver for every hardware platform the target OSes may run on. For example, your solution with a custom MSI is not going to work if the target machine does not support the APIC, and there are still quite a few of these machines around (not to mention that you have to write separate versions of a driver for x86, x64 and IA64).
In other words, you have to choose between cross-hardware and cross-OS portability - there is no way to build drivers for every hardware and software platform in existence from the same source. In practical terms, it is much easier to write separate “supported” versions of a driver for few well-known OSes, rather than to write separate “unsupported” versions of a driver for every hardware platform in existence…
Anton Bassov
Alberto:
That being said, I really like your picture of OS provided resources, I
can certainly understand your wishes, most of them would please me as
well, and I don’t see most of them as incompatible with the existing
resources provided by the OS, as long as one is willing to trust kernel
developers, but I have absolutely no idea of why you would expect things
to work this way. Fundamentally, source compatibility for drivers is
not exactly a high priority for most people developing Windows device
drivers, and given that, expecting developer’s to adhere to a driver
model seems pretty reasonable to me. As far as buggy chips, I really
have no idea of how common this is, but it seems like it would be an
issue for almost no one, relatively speaking, as Windows is for the most
part all about commodity hardware. Such a set of primitives would
essentially be orthogonal to the way that everything in Windows works,
with the idea being, in my opinion, that it is easier for developers to
achieve platform independence, and it mitigates a source of error, as
Microsoft pretty much categorically seems unwilling to trust kernel
developers. As far as the abstractions of the model to make it easier
to achieve platform independence, I think they work, but they have at
least to date ended up being not much more than minor inconveniences due
to the way the market had worked out, but who knew that however many
years ago. With regard to the latter, if it is true, I would say that it,
combined with the lack of documentation, is for the most part worse than
the problems it attempts to solve, as those with either the need or just
the curiosity are going to do it anyway, and the lack of documentation on
implementation makes those doing such things basically doomed to failure
unless they really know what they are doing, and are willing to spend a
lot of time in WinDbg. Finally, I think that the combination of
abstraction and documentation can be frustrating, in some cases - like
the Mm functions - because the end result is that the architecture is so
much harder to understand.
All that being said, look at how many devices Windows supports. While
it is true that to some extent this is a forced march, as there really
is no choice other than making it work on Windows if you want your device to
sell at all, I nevertheless find it very difficult to dismiss the
amazing degree of device support, and have to believe that Microsoft’s
approach had something to do with that.
mm
Martin,
That being said, I really like your picture of OS provided resources, I can
certainly understand your wishes, most of them would please me as well, and
I don’t see most of them as incompatible with the existing resources provided by the OS,
I cannot really agree with the statement about most of Alberto's wishes being not necessarily incompatible with the existing OS model. Can you imagine the number of
potential hardware conflicts if driver writers were encouraged to configure their devices via Configuration Space, i.e. to assign resources, interrupt vectors, etc., at their own discretion? Therefore, I think the wish to be provided with synchronization objects so that you can safely access Configuration Space and other critical internal system structures contradicts the very concept of the Windows kernel…
Anton Bassov
All I was saying is that, there is no reason why providing, say, a
function to read PCI configuration space, would be incompatible with the
larger driver model, in and of itself; indeed, there is a function on
the OS to do this, it just isn’t exported. That’s really all I’m
talking about: exported v. private. That being said, you are quite
correct that this could be used incorrectly quite easily and cause
catastrophic problems. I just think that the type of low level
functions he wishes could be exposed for people who had unusual
circumstances, knew what they were doing, or just didn't care, not that
the results would necessarily be very good in some cases, but nor would
doing so inherently break anything in the larger picture, assuming you
used them correctly, which is a very large assumption, but certainly
possible for someone like Alberto. It would just give you options if
you needed them, and while they would be rather unforgiving, in the end,
in my opinion, it’s still a better situation, because people still do
things like hammer on the PCI Configuration Space whether you expose
functions or not, and if you don’t expose them, you can’t synchronize
anything. If your driver model is convenient and the documentation
makes sense, then most people aren’t going to drop down to the metal on
a whim, I don’t think at least.
mm
> You have 2 options here:
+1.
I would also say that, for now, any PCI device designer should have some input
from Windows (and possibly Linux) kernel specialists.
Yes, Windows really imposes restrictions on PCI hardware design, and it has been so
since at least 1999, when I first saw the MS’s PPT file on “PCI hardware design
guidelines” or such.
The restrictions are not tyrannical at all, they are very sane, and I suspect
that following them will just simplify the silicon sometimes.
For instance, there was a deprecation by MS of the index/value register
architecture (like the one used in legacy VGA). Instead, the hardware designers
were recommended to just expose all internal registers directly in the BAR
space, like, say, the OHCI1394 controller does (it really exposes the
control/status block of each of its 32 independent isoch channels at a
separate position in BAR space).
Is this bad? No, this is OK. Why? An index/value register pair occupies silicon.
Or let's take the deprecation of ports in favor of memory-mapped IO. Is this bad?
The port space is 64KB; the memory aperture for PCI is at least 2000 times
larger.
So, the MS’s guidelines for PCI hardware design are good, and reading these
rules for hardware designers is a matter of good taste after all.
The question of “why the hardware designers must follow the software ones” is a
moot point now. It was valid when the OS was developed by the same company as
the hardware, and is possibly valid now on Macintosh.
But on x86… sorry, MS is more influential in this world than any hardware
maker - Dell, Asus, Toshiba or such. Hardware makers are plenty, and the OS is
one.
The only serious competitor for MS is the Linux community, and it is also more
powerful than any chip-making company.
The less powerful have no choice but to obey the more powerful; this is a natural
law. Thus, the PCI hardware designers should follow the MS and Linux
guidelines.
–
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com
> The problem with using the NT DMA API is that it binds the driver to
> Windows.
NT DMA API uses a very sane set of ideas - namely the SGL and the common
buffer. I think that these ideas can be mapped 1-to-1 to the DMA APIs of
other OSes.
> Windows, Linux, Solaris and MacOS. If all I use from the OS API is a service
> layer, things are easy to port; if I need to adhere to a pre-provided
> design, compatibility may be very hard to achieve, if not impossible.
That’s why Adaptec developed some common layer to make the SCSI controller
driver portable across OSes.
Their driver binary, as it is obvious from the symbols present at Microsoft for
some of them, contains the hardware-independent common OS abstraction layer,
and then the hardware-dependent code which talks to particular hardware and is
OS-independent.
This is normal. OSes do differ.
> letting me design my DMA the way I need it to be designed. This also spills
> into hardware design - we need a piece of hardware that runs just as well
> under multiple operating systems, not just a “built for Windows” piece of
> silicon.
The Windows requirements on hardware are rather simple and I don’t think they
will contradict the other OS requirements.
They really make the hardware better. For instance, MS's deprecation of
random DMA addressing limits (like alignment and so on) is just plain good; it
simplifies driver coding on any OS.
> Take a simple example: I need to be able to read and write to my PCI Config
> Registers, and my hardware may have more registers than the PCI spec asks
> for.
Using the Config Space in the main IO operations (like, say, in ISRs) is
deprecated by MS - “no control and status registers in config space”. Isn’t
this good? They are very slow, and require some locks to be taken to access
them.
They are good for rarely changed properties and modes (by
IOCTL_SET_SOME_PROPERTY) only, not for general IO operations.
> I do not like to have to ask a bus driver to do it;
Why is this bad? After all, you just do not know the PCI Config Space access
mechanism on this particular chipset. 0xcf8/0xcfc are deprecated.
> Unixes. What I would like to be able to do is to use the hardware level
> mechanisms supplied by the PCI standard,
I think that for now, only the ACPI spec defines the hardware ways of accessing
PCI Config Space, and 0xcf8/0xcfc are deprecated. So, very soon the only way of
accessing the Config Space will be to execute some ACPI bytecode. The Windows
API will do this for you.
> PCI config hardware. This is, incidentally, similar to the situation with
> the APIC ICR which I talked about in my recent posts:
Firing interrupts by running DMA over some APIC register, thus triggering an
MSI-style thing, seems to be extremely bad design.
This is like a rectal way of doing stomatology and yes, Windows is not so
friendly about such “rectal ways”.
Why not use the classic PCI interrupt support?
> and create a substantial impediment to easy compatibility.
Impediments across OSes? Yes.
Impediments across chipsets? Sorry, no - the generic OS API is better than coding
for all chipsets and bridges in your driver.
–
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com
The mechanism to read/write your own config space is supported and
documented. The deprecated HalGetBusData is, well, deprecated, and for good
reasons. Alberto always wants to write his own OS, and for the things he is
working on now, which appear to be platforms more or less dedicated to
operating a specific custom hardware device, that isn’t inappropriate. It is
when these sorts of techniques are then applied to commodity device drivers
for general purpose platforms that things fall apart.
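For reference, the supported path looks roughly like this in KMDF: query the bus driver for BUS_INTERFACE_STANDARD and use its GetBusData routine instead of touching 0xCF8/0xCFC yourself. This is only a sketch; error handling is trimmed.

#include <ntddk.h>
#include <wdf.h>
#include <wdmguid.h>   /* GUID_BUS_INTERFACE_STANDARD */

NTSTATUS
ReadOwnConfigSpace(
    _In_ WDFDEVICE Device,
    _Out_writes_bytes_(Length) PVOID Buffer,
    _In_ ULONG Offset,
    _In_ ULONG Length
    )
{
    BUS_INTERFACE_STANDARD busIf;
    NTSTATUS status;
    ULONG bytesRead;

    status = WdfFdoQueryForInterface(Device,
                                     &GUID_BUS_INTERFACE_STANDARD,
                                     (PINTERFACE)&busIf,
                                     sizeof(busIf),
                                     1,      /* interface version */
                                     NULL);
    if (!NT_SUCCESS(status)) {
        return status;
    }

    bytesRead = busIf.GetBusData(busIf.Context,
                                 PCI_WHICHSPACE_CONFIG,
                                 Buffer,
                                 Offset,
                                 Length);

    /* The interface comes back referenced; release it when done. */
    busIf.InterfaceDereference(busIf.Context);

    return (bytesRead == Length) ? STATUS_SUCCESS : STATUS_UNSUCCESSFUL;
}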
–
Mark Roddy
Martin,
I just think that the type of low level functions he wishes could be exposed for
people who had unusual circumstances, knew what they were doing,
It would be, indeed, great…
However, as I already said, this would contradict the very concept of the Windows kernel, i.e. of everyone doing only those things that he is allowed to. Let's face it - unfortunately, Windows is just not the kind of OS that is meant to be used for research purposes and adjusted to your "not-so-conventional" needs (in fact, I am afraid this is the problem with all proprietary OSes). This is why, if you want to do research and experimentation, it is better just to forget about Windows altogether and target Linux or FreeBSD. More on this below.
… but certainly possible for someone like Alberto.
Alberto's general attitudes and aspirations are very, very similar to mine - we both resent the concept of thoughtlessly doing only what we are told to by MSFT. Therefore, I would advise him
to take my approach, i.e. to start learning Linux/FreeBSD and leave Windows programming to MSFT aficionados who just love doing what they are told and resent the very concept of independent thinking. Unlike Windows, Linux and FreeBSD are just written for programmers - if you don't like something, no one holds you back from modifying the OS source and adjusting it to your particular needs. Therefore, anyone who wants to be able to experiment and test new ideas is going to love them - no wonder that, unlike Windows, these systems are so popular in the academic community…
Anton Bassov
Max,
You got the order of things switched. The PCI bus was designed by people who
know their job, and they didn’t design it just for Windows. The PCI Config
mechanisms include the ability to ask for an I/O space just as well as they
include the ability to ask for memory space. I see nothing wrong with using
I/O space instead of memory space.
The designers exposed a few access mechanisms to be used by people who drive
i/o devices at hardware level: meaning, device drivers. I belong to a
generation of people who used to believe that software is written for the
hardware - not the other way around! So, once the standard hardware access
mechanisms are provided by the bus designers, I expect the OS to support
them and to allow device driver writers to have full and synchronized access
to those hardware level facilities.
So, no, it’s not that PCI designers should get OS writers’ input: it’s the
OS designers who should listen to the PCI device writers and adapt their
OS’s to the new hardware. Or so I do believe.
As it happens, much of a complex driver is common to different OS’s. No
point duplicating that code. OS APIs should exist to interface the driver
with the OS, not with the underlying hardware! Or so I do believe.
Alberto.
> I see nothing wrong with using
> I/O space instead of memory space.
Until you have seen a System BIOS come crashing to its knees because it
can’t allocate enough I/O space to all the cards plugged in and you can’t
even get past the BIOS startup at all!
Keep in mind that the minimum I/O space that can be passed to a secondary
PCI bus is 4k, even though a single function can only use 256 bytes in a
single BAR.