I did point out that the formula for naive copy is number of bytes *
(source access cost + target access cost), because in MMIO, the access to
memory on the PCI bus is slower than access to main memory, and is often
mapped uncached. Nonetheless, such copies are usually done in DWORD
chunks for the main copy, with any unaligned bytes at the head or tail
being moved explicitly. So it is not as fast as RAM-to-RAM copy in main
memory, but it is almost certainly faster than using an external DMA
mechanism, particularly one that is not actually intended for this
purpose.
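To make the point above concrete, here is a minimal sketch (mine, not from the original post) of the DWORD-chunk style of MMIO copy being described: the aligned middle moves in 32-bit units, and ragged head/tail bytes move one at a time. The function name `mmio_copy` is hypothetical; a real driver would obtain `dst` from a mapped BAR and might need platform-specific access routines rather than raw pointer stores.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a DWORD-chunked copy into an MMIO window.
 * Per the cost formula above (bytes * (source cost + target cost)),
 * fewer, wider accesses to the slow, often-uncached side are the win:
 * one 32-bit bus write moves four bytes instead of four one-byte writes. */
static void mmio_copy(volatile uint8_t *dst, const uint8_t *src, size_t len)
{
    /* Head: move leading bytes until dst is 4-byte aligned. */
    while (len > 0 && ((uintptr_t)dst & 3) != 0) {
        *dst++ = *src++;
        len--;
    }
    /* Main copy in DWORD chunks. */
    while (len >= 4) {
        uint32_t w;
        memcpy(&w, src, 4);             /* safe unaligned read from RAM */
        *(volatile uint32_t *)dst = w;  /* one bus write per DWORD */
        dst += 4;
        src += 4;
        len -= 4;
    }
    /* Tail: move any trailing bytes. */
    while (len > 0) {
        *dst++ = *src++;
        len--;
    }
}
```

On normal memory this is just a slow `memcpy`; the structure only pays off when each access to the target has a fixed per-transaction cost, as on an uncached PCI mapping.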
When CMU acquired its IBM 360/67 (the virtual-memory version of the 360/65)
we did not buy a paging drum, but instead bought 8MB of bulk core memory,
and a special memory-to-memory DMA processor. We also had a special-order
750K main memory, having determined via simulation that the stock 512K
memory would be too small. So instead of paging to the drum, we paged to
bulk memory. Main memory was 750ns, bulk memory was 8us, an order of
magnitude slower. After a year of performance measurement, we did two
things: (a) did the paging via the equivalent of RtlCopyMemory (b) didn’t
bother to do paging at all for most user pages, but executed directly from
bulk memory.
We supported 60 users concurrently on a machine with memory and computing
power comparable to a 286.
When people reminisce about the “good ol’ days” of computing, I point out
that the best feature of the good ol’ days is that they are in the past.
It turns out what killed performance was the overhead of setting up and
responding to the DMA transfer. This overhead was so high that it cost
more to bring a page into fast main memory than to execute it directly in
the order-of-magnitude-slower bulk memory.
This machine had no caching, an 8-slot TLB, no instruction prefetch or
pipeline, no speculative execution, essentially none of the cool features
that make rep movsd run screamingly fast, and it was STILL faster to use 8
MVC instructions (the equivalent of rep movsb but with an upper bound of
256 bytes in a transfer) or execute directly out of 8us bulk memory.
It took a year of careful instrumentation and analysis to determine the
correct solution. So I tend to look with great skepticism on discussions
of the form “I have to move an undefined number of bytes between RAM and
MMIO, and {my gut tells me; I feel; a trusted friend told me; I saw in a
Web search; my manager heard; …} that I should try to use some kind of
DMA transfer mechanism to make this more efficient”. The OP rarely posts
any critical information essential to evaluating the proposal. This
discussion started the same way. And it has ended in the expected way:
don’t waste your time solving non-problems.
joe
> Processors today copy memory really danged fast. Even the simple
> “rep movsd” instruction moves 4 bytes per cycle, which is 8 GB/s on a 2
> GHz machine. Copying 100 megabytes a second is less than 2% CPU load.
>
Classic DMA is not designed for RAM-to-RAM copy. It’s mainly used to move
data between a device and RAM without the CPU’s involvement. Hence a fairer
comparison is the time to move a DWORD between device and RAM: over a PCI
bus, a single-DWORD target memory cycle without any retries; over a PCIe
bus, a 4-byte MRd or MWr TLP. Of course it is still much faster than
system DMA on ISA, but it’s not going to be nearly as impressive as
RAM-to-RAM “rep movs” on a modern CPU.
NTDEV is sponsored by OSR