PCIe DMA: Problem of out of order TLPs with jumbled data

Hi all,
I am using the Jungo 11.10 PCIe driver on x86 (Windows XP). I am trying to transfer data from the host PC to an Altera Arria 2 GX FPGA. There are two scenarios:

  1. Normal WD_Transfer(): I allocate a 1 KB buffer filled with counter data (00000000h-000000FFh) and transfer it from the PC to the FPGA. I verified the data arriving at the FPGA by tapping it with SignalTap (STP file). The data kept its order and integrity.

  2. DMA transfer: I again allocate a 1 KB buffer filled with counter data and transfer it from the PC to the FPGA, this time using DMA. The data reaching the FPGA is out of order (jumbled), which I again verified with SignalTap. Example: the received data ran from 00000000h to 0000003Bh, then continued from 00000040h; the values 0000003Ch-0000003Fh were missing and only appeared after 0000007Fh. All of the data sent by the application arrives, but out of order.

So, do I have to do anything specific to guarantee the ordering of the data, such as setting some bits in the PCIe configuration space or the DMA control registers, or any other settings?

Additional information regarding Transaction layer Packets (TLPs):

Normal WD_Transfer(): The TLP header indicates a Memory Write request (MWr), as seen from the format and type field (40h), and the payload length (number of DWORDs) is constant across all TLP headers.

DMA transfer: The TLP header indicates CplD (Completion with Data), as seen from the format and type field (4Ah), and the payload length (number of DWORDs) varies from one TLP header to the next.
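
For anyone checking the same fields in a capture, here is a minimal sketch of how the first DWORD of a TLP header breaks down: the format field sits in bits 31:29, the type in bits 28:24, and the payload length (in DWORDs) in bits 9:0. The function name, the lookup table, and the two sample header values are made up for illustration; only the 40h and 4Ah codes come from the description above.

```python
# Illustrative decoder for the first DWORD (DW0) of a PCIe TLP header.
# decode_dw0 and TLP_NAMES are names made up for this sketch.

# (fmt, type) -> mnemonic, for a handful of common TLPs only
TLP_NAMES = {
    (0b000, 0b00000): "MRd",   # memory read, 3DW header
    (0b001, 0b00000): "MRd",   # memory read, 4DW header
    (0b010, 0b00000): "MWr",   # memory write, 3DW header
    (0b011, 0b00000): "MWr",   # memory write, 4DW header
    (0b000, 0b01010): "Cpl",   # completion without data
    (0b010, 0b01010): "CplD",  # completion with data
}


def decode_dw0(dw0):
    """Split DW0 into format, type and payload length (in DWORDs)."""
    fmt = (dw0 >> 29) & 0x7        # bits 31:29
    tlp_type = (dw0 >> 24) & 0x1F  # bits 28:24
    length_dw = dw0 & 0x3FF        # bits 9:0
    if length_dw == 0:
        length_dw = 1024           # a length field of 0 encodes 1024 DWORDs
    return {
        "fmt": fmt,
        "type": tlp_type,
        "name": TLP_NAMES.get((fmt, tlp_type), "other"),
        "length_dw": length_dw,
    }


if __name__ == "__main__":
    # Made-up sample headers: first byte 40h (MWr) and 4Ah (CplD)
    for dw0 in (0x40000010, 0x4A000004):
        print(hex(dw0), decode_dw0(dw0))
```

The top byte of DW0 is the "format and type" byte, so 40h decodes as a 3DW-header write with data and 4Ah as a completion with data.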

Lastly, can I control the payload length field (number of DWORDs) in the TLP header from the driver end? If so, how?

Please provide me any relevant information regarding this issue.
Thanks,
Arvind

Try a Jungo support forum. We have very few people here who know much that’s useful about Jungo.

My advice: ditch Jungo and use Microsoft’s supported WDF. Just as easy, more reliable, better support, and more well known.

Peter
OSR

xxxxx@tataelxsi.co.in wrote:

> Hi all,
> I am using the Jungo 11.10 PCIe driver on x86 (Windows XP). I am trying to transfer data from the host PC to an Altera Arria 2 GX FPGA. There are two scenarios:
>
>   1. Normal WD_Transfer(): I allocate a 1 KB buffer filled with counter data (00000000h-000000FFh) and transfer it from the PC to the FPGA. I verified the data arriving at the FPGA by tapping it with SignalTap (STP file). The data kept its order and integrity.
>
>   2. DMA transfer: I again allocate a 1 KB buffer filled with counter data and transfer it from the PC to the FPGA, this time using DMA. The data reaching the FPGA is out of order (jumbled), which I again verified with SignalTap. Example: the received data ran from 00000000h to 0000003Bh, then continued from 00000040h; the values 0000003Ch-0000003Fh were missing and only appeared after 0000007Fh. All of the data sent by the application arrives, but out of order.
>
> So, do I have to do anything specific to guarantee the ordering of the data, such as setting some bits in the PCIe configuration space or the DMA control registers, or any other settings?

What you have described here is a bug in your FPGA. Many FPGA designers
use a huge assembly buffer to gather transfers up into blocks of 64
bytes at a time, or even larger. Being hardware types isolated from the
realities of software, they often assume that every transfer will be
perfectly aligned, beginning on a 0 mod N boundary, and occupying an
even multiple of N bytes. The realities of poorly aligned transfers
then cause reassembly problems.
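
To make the alignment point concrete, here is a rough sketch of how the data for one DMA read can come back in unequal pieces. It assumes a 64-byte Read Completion Boundary and a completer that splits at every such boundary (real root complexes may return larger pieces); the function name and the addresses are made up for illustration.

```python
# Sketch: how one memory read request may be answered by several
# completions when the buffer is not aligned to the Read Completion
# Boundary (RCB). Assumptions: RCB = 64 bytes, and the completer
# splits at every RCB boundary. split_completions is a made-up name.

def split_completions(addr, nbytes, rcb=64):
    """Return a list of (start_address, byte_count) completion chunks."""
    chunks = []
    end = addr + nbytes
    while addr < end:
        next_boundary = (addr // rcb + 1) * rcb  # next RCB-aligned address
        chunk_end = min(next_boundary, end)
        chunks.append((addr, chunk_end - addr))
        addr = chunk_end
    return chunks


if __name__ == "__main__":
    # Aligned buffer: four equal 64-byte pieces
    print(split_completions(0x1000, 256))
    # Buffer starting 16 bytes before a boundary: short first and last pieces
    print(split_completions(0x10F0, 256))
```

A receive path that assumes every piece is a full, aligned 64-byte block handles the first case and misplaces data in the second.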

> Lastly, can I control the payload length field (number of DWORDs) in the TLP header from the driver end? If so, how?

No. That’s a PCI Express implementation detail. It is negotiated
between the root complex and your hardware’s endpoint. Your
configuration space has a register for your desired TLP size, but the
root complex can reduce that.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

@Tim Roberts:

  1. Thanks for your reply. My question is: if there is a bug in the FPGA, such as an alignment problem, then why does the normal WD_Transfer() work correctly? Would the alignment problems arise only for the DMA transfer?

  2. You said we can’t control the payload length (number of DWORDs) in the TLP, but that there is still a register for it. Is that the “Device Control register” in the PCIe configuration space?

  3. Finally, the Device Control register has fields for the max payload size and the max read request size. In what combination do we need to set these two fields? (The Altera IP has been generated with 256 bytes as the max payload size.)

@Peter: I have a compulsion to use Jungo.

Thanks,
Arvind

xxxxx@tataelxsi.co.in wrote:

> @Tim Roberts:
>
>   1. Thanks for your reply. My question is: if there is a bug in the FPGA, such as an alignment problem, then why does the normal WD_Transfer() work correctly? Would the alignment problems arise only for the DMA transfer?

Absolutely. The BAR-based transfer almost certainly uses a different
path to memory than the DMA-based transfers. DMA happens quickly, so
there are almost always FIFOs involved, and it’s easy to make
assumptions between the FIFO and the memory. I’m on a project right now
where we are dealing with this exact problem.

>   2. You said we can’t control the payload length (number of DWORDs) in the TLP, but that there is still a register for it. Is that the “Device Control register” in the PCIe configuration space?

Correct.

>   3. Finally, the Device Control register has fields for the max payload size and the max read request size. In what combination do we need to set these two fields? (The Altera IP has been generated with 256 bytes as the max payload size.)

That’s entirely dependent on what your FPGA can handle. Note, however,
that this is just your offer in a negotiation. The root complex looks at
your limit and its own limit, and uses the minimum of those two. The
early PCIe root complexes only supported a 128-byte size (which is the
minimum), so that’s all you could ever get. Most endpoints are set to
256. Some PCIe motherboards can now handle 512 (the specification
itself allows up to 4096).
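
As a rough illustration of those two fields: in the Device Control register, Max_Payload_Size is bits 7:5 and Max_Read_Request_Size is bits 14:12, and each 3-bit code n means 128 << n bytes (codes 0 to 5). The sketch below decodes a raw register value and takes the minimum of the limits on each side; the function names and the sample register value are made up for illustration.

```python
# Sketch: decode the size fields of the PCIe Device Control register and
# work out the payload size that ends up in effect. Function names and
# the sample register value are made up for this example.

def decode_size(code):
    """Translate a 3-bit size code into bytes (0 -> 128 ... 5 -> 4096)."""
    if not 0 <= code <= 5:
        raise ValueError("reserved size encoding: %d" % code)
    return 128 << code


def decode_device_control(devctl):
    """Return (max_payload_size, max_read_request_size) in bytes."""
    max_payload = decode_size((devctl >> 5) & 0x7)        # bits 7:5
    max_read_request = decode_size((devctl >> 12) & 0x7)  # bits 14:12
    return max_payload, max_read_request


def effective_max_payload(*supported_sizes):
    """The size in effect is the smallest one every party supports."""
    return min(supported_sizes)


if __name__ == "__main__":
    # Made-up register value: payload code 1 (256), read request code 2 (512)
    print(decode_device_control(0x2020))
    # Endpoint offers 256 bytes, root complex only supports 128
    print(effective_max_payload(256, 128))
```

The two fields are independent: the max payload size bounds the data in any single TLP, while the max read request size bounds how much one read request may ask for.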

> @Peter: I have a compulsion to use Jungo.

What does that mean? The Jungo framework is crap. If your management
is insisting that you use it, then they are intentionally crippling
their product and should be fired. KMDF provides a solution that is
nearly as easy, but is reliable, better documented, and happens to be
fully supported by Microsoft.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

In my view, far too many hardware-focused companies see Jungo as a viable approach to the problem of writing a Windows driver. This may be because Jungo is “in bed with” some of the hardware vendors, and provides starter tools or something.

In my opinion, any reason to use Jungo disappeared with the introduction of WDF. KMDF is an exceptionally stable, easy to use, flexible, performant, framework for driver development. It is well supported by a very active community and by Microsoft. The architect at Microsoft who is the manager to whom the WDF team reports is perhaps THE most active participant in NTDEV.

You should tell whoever gave you that “compulsion” that it would be better, faster, more maintainable, and far easier if you dropped Jungo and used KMDF. Who knows… they might give you a promotion for making a great suggestion.

Peter
OSR

Peter,

Don’t forget the “Jungo consultants”: the company had folks who
signed up and pushed the product, then bled the customer dry by being
the only ones able to support the driver they had written. What got to me
about these characters is that they would leave the customer blowing in the
wind when the client asked for things like WMI support before Jungo
brought it out (about 3 years after Microsoft delivered it). There are
corporate HR filters I know of that reject a resume as soon as they see
Jungo on it, because of this practice.

Don Burn
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
