memory-to-memory DMA

Maxim_S_Shatskih · November 25, 2005, 7:52am

Yes. PCI bus masters usually support some form of scatter-gather DMA to
bypass this.
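To picture what scatter-gather buys a bus master: instead of one physically contiguous buffer, the device is handed a list of (physical address, length) pairs, one per physically contiguous run. Below is a minimal sketch in C, assuming 4 KB pages. The names sg_element, build_sg_list and the page_frames input are hypothetical and only illustrate the idea; they are not modeled on any specific DDK structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SG_PAGE_SIZE 4096u   /* assumption: 4 KB pages */

/* Hypothetical scatter-gather element: one physically contiguous run. */
typedef struct {
    uint64_t phys_addr;
    uint32_t length;
} sg_element;

/* Build an SG list for a buffer that starts `offset` bytes into its first
 * page and spans `len` bytes. page_frames[i] is the physical address of the
 * i-th page backing the buffer (hypothetical input; a real driver would get
 * this from the OS) and must cover the whole buffer. Physically adjacent
 * pages are merged into one element. Returns the number of elements. */
static size_t build_sg_list(const uint64_t *page_frames, size_t offset,
                            size_t len, sg_element *out, size_t max_out)
{
    size_t count = 0;
    size_t page = 0;

    assert(offset < SG_PAGE_SIZE);
    while (len > 0) {
        size_t chunk = SG_PAGE_SIZE - offset;       /* bytes left in this page */
        if (chunk > len)
            chunk = len;
        uint64_t addr = page_frames[page] + offset;

        if (count > 0 &&
            out[count - 1].phys_addr + out[count - 1].length == addr) {
            out[count - 1].length += (uint32_t)chunk; /* contiguous: extend */
        } else {
            assert(count < max_out);
            out[count].phys_addr = addr;               /* gap: new element */
            out[count].length = (uint32_t)chunk;
            count++;
        }
        len -= chunk;
        offset = 0;                                    /* later pages start at 0 */
        page++;
    }
    return count;
}
```

In this model the device walks the list itself, so the transfer does not need the buffer to be physically contiguous.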

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

----- Original Message -----
From: “Kannan, Raja”
To: “Windows System Software Devs Interest List”
Sent: Friday, November 25, 2005 8:22 AM
Subject: RE: [ntdev] memory-to-memory DMA

The HAL uses map registers to translate a device (logical) address to a
physical address. Since the system DMA controller sits on the same
system bus as physical memory, it may not need this translation. A
bus-master DMA controller does need it, because it can only generate
logical (device) addresses.

Is my assumption right?
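Kannan's picture of map registers can be sketched as a small per-page translation table. The device emits a logical address, and the page-number bits select a register that holds the physical page it maps to. This is a toy model under stated assumptions: 4 KB pages, one register per page, and the hypothetical names map_registers and translate. It neither confirms nor refutes his assumption, and the real HAL may implement map registers quite differently, for example as software bounce buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MR_PAGE_SHIFT 12u                        /* assumption: 4 KB pages */
#define MR_PAGE_MASK  ((1u << MR_PAGE_SHIFT) - 1u)

/* Toy model: map register i maps logical page i to one physical page.
 * Hypothetical names, not a HAL interface. */
typedef struct {
    const uint64_t *frame;   /* frame[i] = physical base of logical page i */
    size_t count;            /* number of map registers */
} map_registers;

/* Translate a logical (device) address into a physical address. */
static uint64_t translate(const map_registers *mr, uint32_t logical)
{
    size_t index = logical >> MR_PAGE_SHIFT;            /* which map register */
    assert(index < mr->count);
    return mr->frame[index] + (logical & MR_PAGE_MASK); /* keep page offset */
}
```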

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Mark Roddy
Sent: Thursday, November 24, 2005 6:10 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] memory-to-memory DMA

Almost all PCI devices, at this point in time, perform bus-master DMA.

For your studies you should spend as little time as possible on ‘system
DMA’, as it is obsolete legacy PC DOS crap, a relic from the 16-bit
platform era.

=====================
Mark Roddy DDK MVP
Windows 2003/XP/2000 Consulting
Hollis Technology Solutions 603-321-1032 www.hollistech.com

> -----Original Message-----
> From: xxxxx@lists.osr.com
> [mailto:xxxxx@lists.osr.com] On Behalf Of Kannan, Raja
> Sent: Thursday, November 24, 2005 3:48 AM
> To: Windows System Software Devs Interest List
> Subject: RE: [ntdev] memory-to-memory DMA
>
> Thank you all who responded to my questions.
>
> I asked this question only for study purposes. One thing I can’t
> understand: if system DMA is a legacy feature, then most peripherals
> (PCI/ISA) presumably come with bus-master DMA. Or does the PCI bus
> itself not support system DMA handshakes?
>
> -----Original Message-----
> From: xxxxx@lists.osr.com
> [mailto:xxxxx@lists.osr.com] On Behalf Of Maxim S.
> Shatskih
> Sent: Wednesday, November 23, 2005 3:47 PM
> To: Windows System Software Devs Interest List
> Subject: Re: [ntdev] memory-to-memory DMA
>
> >Is it possible to do a memory-to-memory copy using the system DMA
> >controller?
>
> I think no, but even if yes, “system DMA” (in fact, ISA DMA) is a
> slow-like-a-turtle legacy feature, used only to support legacy
> hardware like the floppy, the LPT port in ECP mode and the ancient
> SoundBlaster 16.
>
> memcpy() is surely better.
>
> >Does any Windows system API support this feature? Or do we have to
> >do our own assembly language coding for that?
>
> Forget this. This DMA logic is obsolete and slow (due to sitting 2
> bridges away from the memory).
>
> memcpy() is better.
>
> >...feature. The Intel 8237 supports this feature. But do today’s PCs
> >come with an 8237 DMA controller?
>
> The 8237-compatible logic is now part of the PCI-to-ISA (or
> PCI-to-LPC) bridge, which is in turn part of the chipset’s “south
> bridge” chip.
>
> —
> Questions? First check the Kernel Driver FAQ at
> http://www.osronline.com/article.cfm?id=256
>
> You are currently subscribed to ntdev as:
> xxxxx@networkgeneral.com To unsubscribe send a blank email to
> xxxxx@lists.osr.com

OSR_Community_User · November 25, 2005, 11:58am

The processor itself contains a far better implementation of this. I
already covered cache coherency and cache pollution (unwanted cache
modification). Read up on the AMD K7 (and later) and Pentium 4 (and
later) architectures. Processor-driven transfers are the most efficient
way to blit data within system memory.
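On x86, one common way for a processor-driven copy to limit cache pollution is to use non-temporal (streaming) stores. These hint the CPU not to pull the destination into its caches. Below is a minimal sketch that assumes SSE2 intrinsics from <emmintrin.h> and a 16-byte aligned destination. The name copy_nontemporal is hypothetical, and on targets without SSE2 the function simply falls back to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#endif

/* Copy len bytes using non-temporal (streaming) stores where SSE2 is
 * available. Assumes dst is 16-byte aligned; elsewhere this is memcpy. */
static void copy_nontemporal(void *dst, const void *src, size_t len)
{
#if defined(__SSE2__) || defined(_M_X64)
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;
    size_t blocks = len / 16;

    assert(((uintptr_t)d & 15u) == 0);   /* _mm_stream_si128 needs alignment */
    for (size_t i = 0; i < blocks; i++) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + 16 * i)); /* load */
        _mm_stream_si128((__m128i *)(d + 16 * i), v); /* non-temporal store */
    }
    _mm_sfence();                        /* order streaming stores before later stores */
    memcpy(d + 16 * blocks, s + 16 * blocks, len % 16);  /* leftover tail bytes */
#else
    memcpy(dst, src, len);
#endif
}
```

Whether this beats plain memcpy() depends on the CPU and the buffer size. For data that will be read again soon, normal cached stores are usually the better choice.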

– arlie

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Alberto Moreira
Sent: Thursday, November 24, 2005 4:59 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] memory-to-memory DMA

If you have to move n items from memory to memory, you will have to do
2kn memory accesses (for some multiplier k) no matter how you do it. The
difference is whether those accesses are burst in one go or distributed
over time. Will the copy go directly from memory to memory, or will you
hog the front-side bus (FSB) as well? Will you leave your caches alone,
or are you going to fill them up as well? Caches and memory are on
opposite sides of the FSB, so any movement of data from memory to cache
may end up costing a fair amount of FSB bandwidth. If the data being
moved won’t be used by a processor, it may be best to keep that data
movement on the same side of the FSB and not involve the processor at
all. Also, in a well-designed machine, cache coherency is preserved by
the MESI protocol (or equivalent), which is implemented in hardware and
hence by and large transparent to the software!
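Alberto's 2kn count can be made concrete with a simple model. Each of the n items is read once and written once, and each read or write takes k bus transactions, where k is the item size divided by the bus width, rounded up. Below is a sketch in C. The helper copy_accesses is hypothetical, and the model deliberately ignores bursts, cache-line fills and write-allocate reads, all of which change the real count.

```c
#include <assert.h>
#include <stddef.h>

/* Simple model of the 2kn count: each of the n items is read once and
 * written once, and each read or write needs k bus transactions, where
 * k = ceil(item_bytes / bus_bytes). Ignores bursts, cache-line fills and
 * write-allocate reads. */
static size_t copy_accesses(size_t n, size_t item_bytes, size_t bus_bytes)
{
    assert(bus_bytes > 0);
    size_t k = (item_bytes + bus_bytes - 1) / bus_bytes;  /* round up */
    return 2 * k * n;                                     /* reads + writes */
}
```

For example, 1000 16-byte items over an 8-byte bus gives k = 2 and 4000 accesses, whichever agent does the copy. The argument in this thread is about where that traffic flows, not how much of it there is.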

Alberto.

OSR_Community_User · November 25, 2005, 1:38pm

Again, I repeat my point: that may be OK (1) if you care about response
time, (2) if you can spare the processor cycles, (3) if the memory isn’t
in a peripheral where the transfer is better handled by the peripheral
itself, and (4) if the system memory controller doesn’t have an efficient
blitter. So, no, I don’t want to involve my processor in a bitblt, nor
do I want to waste processor cycles (even on system memory) if I need
those cycles for some CPU-intensive scientific computation. This has
nothing to do with processor architecture; it’s an issue between the
bridge and the memory. A well-designed and well-implemented system-memory
blitter, one that keeps the traffic south (or east/west) of the North
Bridge, cannot possibly be slower than moving the information north, all
the way to the processor. Just look at a graphics chip on an AGP channel
if you doubt it: I bet that a well-implemented texture engine will fill
from system memory faster than a processor can. That’s why we use DMA,
no? To move data without getting it across the FSB?

Alberto.

Mark_Roddy · November 25, 2005, 3:04pm

The system DMA controller may not be able to access all of physical
memory. I really can’t remember what its address capabilities are, but I
think the standard AT 8237 setup had exactly 24 bits of address space,
i.e. 16M and lower only. If it can’t address all of physical memory, then
you actually need an intermediate buffer or buffers within its limited
address space to perform the copy. This would render the whole concept
of ‘offloading the CPU with the 8 MHz System DMA Controller’ absurd, in
addition to being stupid. I’d take a look at the address registers on
this thing before I got too carried away with the idea.
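These addressing limits can be checked mechanically. On the AT design, an 8-bit page register supplies address bits 16-23 and the 8237's 16-bit address counter supplies bits 0-15. The page register does not increment during a transfer, so an 8-bit channel (0-3) can only reach the first 16 MB and cannot cross a 64 KB boundary. Below is a sketch of that check, assuming exactly this model. The 16-bit channels 5-7 use word addressing and a 128 KB boundary, which this sketch does not model. The helper isa_dma8_reachable is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can an 8-bit ISA DMA channel (0-3) of an 8237-style controller transfer
 * this buffer in one go? Model: page register = bits 16-23 (fixed during
 * the transfer), 16-bit address counter = bits 0-15. */
static bool isa_dma8_reachable(uint64_t phys, uint32_t len)
{
    assert(len > 0);
    uint64_t last = phys + len - 1;       /* last byte touched */
    if (last >= (1ull << 24))             /* beyond the 24-bit (16 MB) window */
        return false;
    return (phys >> 16) == (last >> 16);  /* must stay inside one 64 KB page */
}
```

A buffer that fails this check would need the kind of intermediate buffer described above before the controller could touch it.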

Here is a good explanation of just what a piece of crap the system DMA
controller is:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/dma.html

Mark_Roddy · November 25, 2005, 3:10pm

Perhaps this thread has wandered a bit, but aren’t we discussing the
merits, or lack thereof, of the hideous obsolete legacy system DMA device
as a system-memory-to-system-memory offload engine? That thing is the
opposite of a ‘well implemented system-memory blitter’.

Calvin_Guan-2 · November 26, 2005, 3:25am

— Arlie Davis wrote:

> That’s absurd. If the DMA hardware is busy
> reading/writing system memory,
> then the processor will have little or no access to
> system memory.

Not true. Hook up a bus analyzer and you’ll find out
how smartly the chipset handles this.

> Also, the front-end instruction decoders on most
> modern processors recognize
> REP MOV sequences and convert these into burst
> read/write cycles.

This is questionable. Are you talking about device
memory or system memory? For device memory, the burst
you “may” be seeing is the result of flushing the
posted-write buffer, or of a cache-line fill when reading
cacheable memory. WC (write-combining) is another possibility.
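For reference, here is what an explicit REP MOVSB copy looks like. This is a minimal sketch that assumes GCC or Clang inline-assembly syntax on x86/x86-64, with a fallback to memcpy() on other targets. The name copy_rep_movsb is hypothetical. Whether a given CPU turns the string instruction into wide or burst transfers (Intel documents this as fast-string operation) is exactly the question raised here, and the answer depends on the processor and the memory type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes with REP MOVSB. Buffers must not overlap (as with memcpy).
 * Assumes GCC/Clang inline asm on x86/x86-64; other targets use memcpy.
 * The ABI guarantees the direction flag is clear, so the copy runs forward. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    assert(n == 0 || d + n <= s || s + n <= d);   /* no overlap */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* RDI = destination, RSI = source, RCX = byte count */
    __asm__ __volatile__("rep movsb"
                         : "+D"(dst), "+S"(src), "+c"(n)
                         :
                         : "memory");
#else
    memcpy(dst, src, n);
#endif
}
```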

Some other CPU architectures actually have
instructions to initiate DMA from the CPU if the
chipset is willing to cooperate.

Calvin Guan (Windows DDK MVP)
NetXtreme Longhorn Miniport Prime
Broadcom Corp. www.broadcom.com
