PCI/PCIe throughput performance

Hi All,

We are using the “WRITE_REGISTER_BUFFER_ULONG” macro to transfer data
from Windows/PC to our device. I was wondering whether this is the most
optimized way of transferring data, or whether we could do some MMX
programming to improve the throughput. I assume that this macro transfers
one word at a time, so if we could somehow transfer more than one word at
once, that might improve the throughput over PCI/PCIe drastically.

Thanks & Regards
Abhishek Joshi

Probably it’s obvious and your hardware cannot be changed but… the most optimized way to transfer data to/from a PCI/PCIe device is using DMA. Memory mapped I/O to/from a device is slow, and it’s generally used to either control/program the device, or to transfer data at very low rates.
What is the amount of data that you are trying to transfer with memory mapped i/o?

Have a nice day
GV


Gianluca Varenni, Windows DDK MVP

CACE Technologies
http://www.cacetech.com

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

We are planning to send out 50 MBps of data.

Thanks & Regards
Abhishek Joshi



I would say this is too much traffic for the non-DMA case.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com


I understand and agree that the device's DMA should be what transfers
the data. But going back to my previous question: is this macro already
highly optimized, or is there some room for improvement?

Thanks & Regards
Abhishek Joshi


I don’t think you will ever be able to get 50Mbps out of memory mapped I/O. And even if you get to transfer 50Mbps, your CPU will be quite loaded transferring data, not a good thing. You need to redesign your device and use DMA.

Have a nice day
GV


Gianluca Varenni, Windows DDK MVP

CACE Technologies
http://www.cacetech.com

: kd> u nt!WRITE_REGISTER_BUFFER_ULONG
nt!WRITE_REGISTER_BUFFER_ULONG:
80872604 8bc6 mov eax,esi
80872606 8bd7 mov edx,edi
80872608 8b4c240c mov ecx,dword ptr [esp+0Ch]
8087260c 8b742408 mov esi,dword ptr [esp+8]
80872610 8b7c2404 mov edi,dword ptr [esp+4]
80872614 f3a5 rep movs dword ptr es:[edi],dword ptr [esi]
80872616 f0094c2404 lock or dword ptr [esp+4],ecx
8087261b 8bfa mov edi,edx
8087261d 8bf0 mov esi,eax
8087261f c20c00 ret 0Ch

Good luck with getting 50Mbps.


Mark Roddy
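
For readers following the disassembly above: the core of the routine is `rep movs dword ptr`, which stores one 32-bit value per iteration, so the OP's assumption that the data goes out one word at a time appears to be right. A minimal C sketch of the same loop follows. The function name `write_register_buffer_u32` and the plain array standing in for device memory are hypothetical, and the sketch leaves out the `lock or` that follows the copy in the listing (presumably there to order the writes).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel routine, for illustration only.
 * Copies `count` 32-bit values from an ordinary buffer to consecutive
 * 32-bit locations starting at `reg`, one store per element. This is the
 * C analogue of the `rep movs dword ptr` instruction in the listing.
 */
void write_register_buffer_u32(volatile uint32_t *reg,
                               const uint32_t *buf,
                               size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* volatile destination: the compiler must issue every store */
        reg[i] = buf[i];
    }
}
```

On a real device each of those stores becomes a bus write issued by the CPU, which is why the replies keep steering toward DMA rather than a faster copy loop.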

MMX implies floating point state, which we generally try to avoid in
the kernel, since it increases the cost of a context switch and forces
a flush of floating point state. So the internal implementation of
WRITE_REGISTER_BUFFER_ULONG doesn’t use it.

  • Jake Oshins
    Windows Kernel Guy


On Fri, Mar 28, 2008 at 03:54:10PM -0400, Mark Roddy wrote:

Good luck with getting 50Mbps.

I don’t understand all the pessimism in this thread. 50Mbps is only
6 megabytes per second. rep movsd can nearly saturate a PCI bus, certainly
in excess of 20 megabytes per second.

Tim Roberts, xxxxx@probo.com
Providenza & Boeklheide, Inc.

Tim Roberts wrote:

I don’t understand all the pessimism in this thread. 50Mbps is
only 6 megabytes per second. rep movsd can nearly saturate
a PCI bus, certainly in excess of 20 megabytes per second.

As usual, the OP has left the thread, but if you look closely, he said 50MBps, which is technically 50 megabytes per second, not megabits.
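
The Mbps/MBps mix-up is an eightfold difference, so it is worth spelling out the arithmetic. The sketch below converts between the two and compares a target rate against the theoretical peak of a conventional 32-bit, 33 MHz PCI bus (about 132 MB/s). The bus parameters are an assumption, since the thread does not say which PCI or PCIe variant the OP's device uses, and the function names are made up for illustration.

```c
/* Convert megabits per second to megabytes per second (8 bits per byte). */
double mbps_to_mbytes_per_sec(double megabits_per_sec)
{
    return megabits_per_sec / 8.0;
}

/*
 * Theoretical peak of a parallel PCI bus in megabytes per second:
 * one transfer of (bus_width_bits / 8) bytes per clock. This is a burst
 * peak; it ignores arbitration, address phases and wait states.
 */
double pci_peak_mbytes_per_sec(double clock_mhz, unsigned bus_width_bits)
{
    return clock_mhz * (bus_width_bits / 8.0);
}

/* Fraction of the theoretical peak that a target rate would consume. */
double bus_utilization(double target_mbytes_per_sec,
                       double peak_mbytes_per_sec)
{
    return target_mbytes_per_sec / peak_mbytes_per_sec;
}
```

With these numbers, `mbps_to_mbytes_per_sec(50.0)` gives 6.25 MB/s, while 50 MBps would be more than a third of the assumed bus's theoretical peak, which is the rate several posters doubt programmed I/O can sustain.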

I mistyped: mps ought to be MBS. The OP wants to get 50 megabytes/second
without having DMA capability on his device. I don’t think he is going to
get that.


Mark Roddy

I mistyped again! “mps”? mbs. Anyway, the OP was asking for way too much.


Mark Roddy