Cached and non-cached common buffers

Hi all,
I want to ask: what is the difference between cached and non-cached common buffers,
and if they are cached, is there any coherence problem that requires flushing the cache?

Any help will be appreciated.

Thanks in advance,

Hesham


You are currently subscribed to ntdev as: $subst(‘Recip.EmailAddr’)
To unsubscribe send a blank email to leave-ntdev-$subst(‘Recip.MemberIDChar’)@lists.osr.com

The cache coherence should work as defined by the Intel specs if it's a PCI device. If you look up the x86 version of the macro the DDK defines for forcing cache coherency, you'll find it's a no-op.

You should read all of the Intel documentation on write ordering, serializing instructions, out of order reads, etc., carefully.

When I tried this, I got a measurable performance improvement, and no stability problems.

-DH


Please give me more details:
where can I find this macro, and why is it a no-op?

Do you mean that there are no precautions needed to use cached memory with DMA?

thanks in advance

Hesham

On an X86, the caches are coherent, but you still must worry about:

  1. reads being reordered WRT writes
  2. writes by different CPUs being in a different order than you expect.

On the x86, KeFlushIoBuffers() is a no-op (it wasn't on the Alpha). If I believe the W2K (not XP)
DDK, on IA64 this is now a function (and therefore could do something).

Section 16.5 in the design guide:

"In some platforms, the processor and system DMA controller (or busmaster DMA adapters) exhibit cache coherency anomalies. To maintain data integrity during DMA operations, lowest-level drivers must follow these guidelines:
1. Call KeFlushIoBuffers before beginning a transfer operation to maintain consistency between data that might be cached in the processor and the data in memory.
If a driver calls AllocateCommonBuffer with the CacheEnabled parameter set to TRUE, the driver must call KeFlushIoBuffers before beginning a transfer operation to/from its buffer.

2. Call FlushAdapterBuffers at the end of each device transfer operation to be sure any remainder bytes in the system DMA controller’s buffers have been written into memory or to the slave device.
Or, call FlushAdapterBuffers at the end of each transfer operation for a given IRP to be sure all data has been read into system memory or written out to a busmaster DMA device. "

I suspect that FlushAdapterBuffers issues the serializing instruction needed to prevent read-aheads in the CPU pipeline.

I’m unable to come up with an example where the write order across CPUs is a problem. (Each CPU issues its own writes in order.) Read the gory Intel documentation if you want to go further.
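For what it's worth, the two DDK rules quoted above might look something like this in a WDM driver (a sketch only, not from the DDK; the devExt fields, ProgramDeviceDma, and the surrounding setup are hypothetical and assume a bus-master device using a cached common buffer with map registers):

```c
/* Sketch: assumes the driver already called IoGetDmaAdapter() and
 * AllocateCommonBuffer() with CacheEnabled = TRUE, and built an MDL
 * for the buffer.  All devExt field names are made up for illustration. */

VOID StartDeviceToMemoryTransfer(PMY_DEVICE_EXTENSION devExt)
{
    /* Rule 1: flush processor caches before the device touches the
     * buffer.  On x86 this compiles to nothing, but calling it keeps
     * the driver correct on platforms where it is not a no-op. */
    KeFlushIoBuffers(devExt->CommonBufferMdl,
                     TRUE,    /* ReadOperation: device -> memory */
                     TRUE);   /* DmaOperation */

    ProgramDeviceDma(devExt); /* hypothetical: kick off the bus-master DMA */
}

VOID DmaTransferComplete(PMY_DEVICE_EXTENSION devExt)
{
    /* Rule 2: make sure any bytes still buffered in the adapter have
     * reached system memory before the CPU reads the data. */
    devExt->DmaAdapter->DmaOperations->FlushAdapterBuffers(
        devExt->DmaAdapter,
        devExt->CommonBufferMdl,
        devExt->MapRegisterBase,
        MmGetMdlVirtualAddress(devExt->CommonBufferMdl),
        devExt->TransferLength,
        FALSE);   /* WriteToDevice = FALSE: device wrote into memory */
}
```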


>Do you mean that there are no precautions needed to use cached memory with DMA?

It’s been a long time since I used ISA bus system DMA (like perhaps still
used by an ECP parallel port?), but as I remember, cache coherency was not
guaranteed.

Also seems like AGP bus transfers didn’t guarantee coherency either.

PCI bus master transfers are supposed to be coherent on an x86.

When I think about it, I can come up with a scenario that I’m unsure about
on PCI buses. I’ll assume the processor chipset designers got it right,
but may research it a bit. Specifically, I’m unsure whether a PCI
WRITE_INVALIDATE could be half completed when the processor tries to load
the same cache line range, effectively loading the cache with only
partially updated data. My understanding was WRITE_INVALIDATE initiated the
cache snoop flush at the beginning of the PCI transaction, but don’t know
if the write destination is then locked from access until the whole range
is filled. Memory is now LOTS faster than a PCI bus. I especially wonder on
newer processors with larger than 32 byte cache line sizes (like the P4).

A slightly related area is caching PCI target addresses and cache
coherency. If you have a PCI target device and specify the memory window to
be cached, there is no magic that makes the cache coherent with the target
memory when it changes. You either have to not cache it (which is slow
because you don’t get burst reads) or else you have to ask for a cache
writeback+invalidate (which is really slow, like I believe tens of
thousands of clock cycles).

To really complicate matters, if you have cached memory for a bus master
destination, the wonders of out of order execution may have the processor
reading the data buffers BEFORE you read the memory with status info,
unless you specifically cause instruction stream synchronization (some
specific instructions). An example would be a bus master device that sets
a bit in a buffer header to indicate the buffer is filled. A normal
instruction stream may order the physical read of the buffer data before
the physical read of the status bit, so you don’t really know if the status
bit was set when the buffers were physically read. It’s strange to think in
terms of the physical memory access ordering being different than the
logical ordering of your source code, but for device drivers, you have to
pay attention. In this case, the caches ARE coherent, but since memory is
changing behind the processor’s back, instruction execution ordering becomes
important too.
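The status-bit scenario above can be illustrated with C11 atomics (a user-mode sketch, not kernel code; in a driver you would use the kernel's barrier primitives instead, and the `dma_desc_t` layout here is hypothetical):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical layout: the device DMAs the payload first, then sets
 * the "done" flag last.  The CPU must not read the payload until it
 * has observed done == 1, and the flag read must not be reordered
 * after the payload reads. */
typedef struct {
    uint8_t    data[64];
    atomic_int done;     /* set by the device when data[] is valid */
} dma_desc_t;

/* Returns the byte sum of the payload, or -1 if the buffer is not
 * ready.  The acquire load forbids the compiler and CPU from hoisting
 * the data[] reads above the flag check. */
int read_if_ready(dma_desc_t *d)
{
    if (atomic_load_explicit(&d->done, memory_order_acquire) == 0)
        return -1;       /* not filled yet: must not touch data[] */

    int sum = 0;
    for (size_t i = 0; i < sizeof d->data; i++)
        sum += d->data[i];
    return sum;
}
```

A plain `if (d->done)` check would leave the compiler and CPU free to interleave the payload reads with the flag read; the explicit acquire ordering is what pins the logical source order to the physical access order.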

Instruction ordering and cache coherency bugs are also super hard to find.
It’s best to simply not create them in the first place.

I think one general answer would be: you need to pay close attention to how
you access DMAed data, and the hardware has built-in support to help.

- Jan


Thank you all for your much-appreciated help.

Since I am not an expert in this low-level hardware stuff, I will explain
my situation:

I have a bus-master DMA PCI card. I want to transfer data from the card memory
to the system memory.

I have allocated a common buffer in my driver to hold the incoming data from
the device.

There is a boolean argument to the common buffer allocation that controls
caching. I think this caching matters when the memory is copied from its
contiguous system memory to another location in system memory (if I am
wrong, please tell me).

I think that if I have enabled caching, this internal memory transfer will be
faster. So all I need to know, if I am right, is whether there are any
precautions to take care of. And if I am wrong, how can I speed up memory
transfers between system memory locations?

That's all!

thanks in advance

Hesham


----- Original Message -----
From: “hesham”
To: “NT Developers Interest List”
Sent: Friday, July 20, 2001 12:19 PM
Subject: [ntdev] Re: Cached and non cached common buffer

> Thank you all for your much-appreciated help.
>
> Since I am not an expert in this low-level hardware stuff, I will explain
> my situation:
>
> I have a bus-master DMA PCI card. I want to transfer data from the card memory
> to the system memory.
I assume you will be using the card to DMA into the system memory.

>
> I have allocated a common buffer in my driver to hold the incoming data from
> the device.
The common buffer is in system memory, and is the target of the DMA.

>
> There is a boolean argument to the common buffer allocation that controls
> caching. I think this caching matters when the memory is copied from its
> contiguous system memory to another location in system memory (if I am
> wrong, please tell me).
If the CPU moves the data from the common buffer elsewhere, or uses the data directly
out of the common buffer, enabling the cache will make this transfer/access MUCH faster.
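For reference, allocating such a cached common buffer might look like this (a hedged WDM sketch, not from the original posts; it assumes devExt->DmaAdapter was obtained earlier via IoGetDmaAdapter() with a DEVICE_DESCRIPTION describing the bus-master device, and all devExt field names are hypothetical):

```c
/* Sketch only: allocate a cache-enabled common buffer that the
 * bus-master device will DMA into.  Returns the kernel virtual
 * address in devExt->CommonBufferVa and the device-visible logical
 * address in devExt->CommonBufferPa. */
NTSTATUS AllocateMyCommonBuffer(PMY_DEVICE_EXTENSION devExt, ULONG length)
{
    devExt->CommonBufferVa =
        devExt->DmaAdapter->DmaOperations->AllocateCommonBuffer(
            devExt->DmaAdapter,
            length,
            &devExt->CommonBufferPa,  /* logical address for the device */
            TRUE);                    /* CacheEnabled: CPU access is fast,
                                         but KeFlushIoBuffers is then
                                         required around transfers */
    if (devExt->CommonBufferVa == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    devExt->CommonBufferLength = length;
    return STATUS_SUCCESS;
}
```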

>
> I think that if I have enabled caching, this internal memory transfer will be
> faster. So all I need to know, if I am right, is whether there are any
> precautions to take care of.
If you follow the rules in the DDK on DMA exactly, you will be safe on all platforms. If
you try to shortcut the DDK at all, you need to understand all of the issues Jan refers to.

-DH
