Jungo PCIe driver - Writing to pUserDirectMemAddr causes the system to hang

I am developing a PCIe driver for an Altera (Intel) Avalon-MM FPGA. My OS is Windows 10 64-bit, and I am currently using Jungo’s Windriver framework (hopefully that is not a mistake, as indicated here).

The hardware maps 4096 bytes of its DDR memory to BAR2. I can successfully use the Windriver API functions WDC_WriteAddr32 or WDC_WriteAddrBlock to write to this memory range.

My final goal is to obtain a physical pointer (bus address) to this mapped region and hand it to another device driver, so that it can perform DMA transfers directly to the FPGA DDR (for instance, by using AMD DirectGMA technology).

As a preliminary step, I am trying to work solely with user-mode virtual addresses and simply write to the pointer I receive from the Windriver API. Here is the code:

PWDC_DEVICE device = static_cast<PWDC_DEVICE>(_hWDCDevice); // _hWDCDevice is of type WDC_DEVICE_HANDLE

const WDC_ADDR_DESC& bar2AddressDescriptor = device->pAddrDesc[2];

static_assert(sizeof(UPTR) == sizeof(uint32_t*));
uint32_t* ptr = reinterpret_cast<uint32_t*>(bar2AddressDescriptor.pUserDirectMemAddr);

ptr[0] = 123; // this line causes the system to hang.

A different approach I tried was obtaining the virtual address from the WD_PCI_CARD_INFO structure:

WD_PCI_CARD_INFO deviceInfo{};
deviceInfo.pciSlot = slot;

WDC_CHECK(WDC_PciGetDeviceInfo(&deviceInfo));
WDC_CHECK(WDC_PciDeviceOpen(&_hWDCDevice, &deviceInfo, &_deviceContext));

for (auto i = 0u; i < deviceInfo.Card.dwItems; i++)
{
    const auto& item = deviceInfo.Card.Item[i];

    if (item.item == ITEM_MEMORY)
    {
        if (item.I.Mem.dwBar == 2)
        {
            _busAddress = item.I.Mem.pPhysicalAddr; // is this the bus address?

            auto ptr = item.I.Mem.pUserDirectAddr;

            assert(ptr); // this fails, pUserDirectAddr is 0!
        }
    }
}

With this approach, pUserDirectAddr is 0, even though qwBytes of the same struct equals the expected 4096 bytes.

To sum up, here are my specific questions:

  • What is the correct method to obtain a valid pointer to the memory-mapped range?
  • Why does the system hang when writing to the pointer obtained from pUserDirectMemAddr?
  • What is the difference between pUserDirectMemAddr obtained by the first method and pUserDirectAddr obtained by the second method? And why is pUserDirectAddr equal to 0?
  • How can I get the bus address for this memory range?

Link to the same question on Stack Overflow:
https://stackoverflow.com/questions/53189471/jungo-pcie-driver-writing-to-puserdirectmemaddr-causes-the-system-to-hang

I am currently using Jungo’s Windriver framework. (hopefully it is not a mistake…

Sorry. It is a mistake. A seriously big mistake.

One thing that makes it such a big mistake is that there’s nobody on this forum, or pretty much anywhere outside of Jungo, who can help you. Given that Windows now has WDF, which has been around since… oh… 2005… there’s much less need for things like Jungo.

Sorry…

Peter

To add to what Peter said: about 10 years ago I analyzed the Jungo source code for a client. The number of bugs per function was horrific. Keep the algorithms you have developed, throw out Jungo, and start over with WDF.

Don Burn
Windows Driver Consulting
Website: http://www.windrvr.com

Elad8a wrote:

The hardware maps 4096 bytes of its DDR memory to BAR2. I can successfully use Windriver API WDC_WriteAddr32 or WDC_WriteAddrBlock to write to this memory range.

My final goal is to be able to receive a physical pointer (bus address) to this mapped region, hand it to another device driver so it can perform DMA transfers directly to the FPGA DDR (by using AMD DirectGMA technology for instance).

It would be unusual to let another driver do your DMA.  Also, if another
driver does it, you can only DMA to BAR addresses.  If you only have 4k
exposed, that makes DMA rather uninteresting.  The way to exploit DMA is
to use the DMA engine in your own hardware.  It can suck from any PCIe
address and blow into the board’s onboard memory, even if it is not exposed.
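To put the 4 KiB window in perspective, here is the arithmetic for one uncompressed HD frame. This is a minimal C++ sketch that assumes 1920×1080 pixels at 4 bytes per pixel and 60 frames per second; the thread never states the pixel format or frame rate, so these numbers are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Assumed frame format (illustrative, not from the thread): 1920x1080, 4 bytes/pixel.
constexpr std::uint64_t kWidth = 1920;
constexpr std::uint64_t kHeight = 1080;
constexpr std::uint64_t kBytesPerPixel = 4;
constexpr std::uint64_t kWindowBytes = 4096;  // size of the BAR2 window

constexpr std::uint64_t FrameBytes()
{
    return kWidth * kHeight * kBytesPerPixel;
}

// Number of 4 KiB windows needed to cover one frame, rounded up.
constexpr std::uint64_t WindowsPerFrame()
{
    return (FrameBytes() + kWindowBytes - 1) / kWindowBytes;
}

// Sustained bandwidth needed at a given frame rate, in bytes per second.
constexpr std::uint64_t BytesPerSecond(std::uint64_t fps)
{
    return FrameBytes() * fps;
}

static_assert(FrameBytes() == 8294400, "1920 * 1080 * 4");
static_assert(WindowsPerFrame() == 2025, "8294400 / 4096 is exactly 2025");
```

Under these assumptions a single frame spans 2025 separate 4 KiB windows, and 60 fps needs about 498 MB/s, so a 4 KiB BAR is presumably only useful as a single-page smoke test.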

As a preliminary stage, I am trying to work solely with user mode virtual addresses and simply write to the pointer I receive from Windriver API. Here is the code:

You must have retyped this from memory, because there are several syntax
errors here.

PWDC_DEVICE device = static_cast(_hWDCDevice); // _hWDCDevice is of type WDC_DEVICE_HANDLE
const WDC_ADDR_DESC& bar2AddressDescriptor = device->pAddrDesc[2];
static_assert(sizeof(UPTR) == sizeof(uint32_t));
uint32_t ptr = reinterpret_cast(bar2AddressDescriptor.pUserDirectMemAddr);
ptr[0] = 123; // this line causes the system to hang.

Are you building a 32-bit application here?  Your assert concerns me. 
Any library that limits virtual and physical addresses to 32 bits is a
library with no future at all.  I assume in the line where you wrote
uint32_t ptr, you actually meant this:

    uint32_t * ptr = reinterpret_cast<uint32_t *>(bar2AddressDescriptor.dwUserDirectMemAddr);

I’m not sure why you would choose reinterpret_cast here, but that’s
irrelevant. Did you verify that dwUserDirectMemAddr had a valid address?
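On the assert: here is a C++17 sketch of what the check presumably intended. It assumes UPTR is a pointer-sized integer (the question’s own static_assert compares it with sizeof(uint32_t*)). std::uintptr_t stands in for it so the snippet compiles without the Windriver headers. If the field really is an integer, reinterpret_cast (or an equivalent C-style cast) is the way to turn it into a pointer, because static_cast cannot convert between integers and pointers.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for Windriver's UPTR (assumption: a pointer-sized unsigned integer).
using UptrStandIn = std::uintptr_t;

// Intended form of the check: the address type must be as wide as a pointer.
// sizeof(std::uint32_t*) is 8 in an x64 build, while sizeof(std::uint32_t) is
// always 4, so comparing against the latter only passes in a 32-bit build.
static_assert(sizeof(UptrStandIn) == sizeof(std::uint32_t*),
              "address type must be pointer-sized");
static_assert(sizeof(std::uint32_t) == 4, "uint32_t is exactly 4 bytes");

// Integer address -> typed pointer. static_cast would not compile here.
inline std::uint32_t* ToU32Ptr(UptrStandIn address)
{
    return reinterpret_cast<std::uint32_t*>(address);
}
```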

When you say “hang”, do you literally mean it locked up and would accept
no more input, or do you mean you got a blue screen? Are you quite sure
BAR2 is just DDR, as opposed to a set of registers?

> A different approach I tried is obtaining the virtual address from the WD_PCI_CARD_INFO structure:

The WD_PCI_CARD_INFO data won’t be mapped unless you have specifically
asked it to be mapped.

> * What is the correct method to obtain a valid pointer to the memory mapped range?

These are Jungo questions. We can tell you how to do this in the driver
world. We don’t know how Jungo handles it.

Tim Roberts, thanks for the input. I will address each of your points separately:

  • It would be unusual to let another driver do your DMA.
    Maybe I was a bit unclear, so I’ll give some background. At first, I tried configuring the FPGA (which can act as a bus master) to perform the DMA transfers from system RAM to its DDR. This caused the system to hang as soon as the DMA was initiated. After reading this work, which mentions using the GPU DMA engine to copy data straight to the PCIe-mapped BAR, I thought I’d give this method a try. But currently, this too causes the system to hang as soon as I try writing to the relevant pointer.

  • If you only have 4k exposed, that makes DMA rather uninteresting.
    That is right. Our final goal is to transfer HD frames from an AMD GPU straight to the FPGA memory. The current hardware is only used for testing that the DMA (using Jungo) actually works for a single page.

  • You must have retyped this from memory, because there are several syntax
    errors here.
    I am so sorry; I am new to this forum and just copy-pasted my Stack Overflow question. Somehow the ‘*’ of the pointer type was omitted.
    You can have a look at the original question on SO; I don’t think there are any mistakes there. The OS is x64, of course.

  • Did you verify that dwUserDirectMemAddr had a valid address?
    I verified that it is not nullptr. Are there any further checks I can do? Either way, since this is ordinary user-mode code accessing a virtual address, I would expect an access violation, not a system hang.

  • When you say “hang”, do you literally mean it locked up and would accept
    no more input, or do you mean you got a blue screen?
    No blue screen; the whole system hangs. On my HP workstation, the system hangs, then crashes, and I get a screen reporting a PCIe timeout error.

  • Are you quite sure BAR2 is just DDR, as opposed to a set of registers?
    Pretty sure. I can use the parts of the Jungo API that write a contiguous chunk of memory, but I will verify with the hardware team.
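As for further checks beyond a non-null test, here is a minimal C++ sketch of a guarded 32-bit write into a mapped BAR window. The function name WriteBar32 and its parameters are hypothetical. barBase would be the mapped address (pUserDirectMemAddr in the question) and barBytes the BAR size (4096). The sketch checks that the mapping is non-null and that the access is 4-byte aligned and lies inside the window, then stores through a volatile pointer so the compiler cannot drop or merge the write. None of this can prevent a hang caused by the device itself; it only rules out the software-side suspects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: one guarded 32-bit store into a memory-mapped BAR window.
// barBase  : user-mode virtual address of the mapped BAR
// barBytes : size of the window in bytes
// offset   : byte offset of the target word inside the window
// Returns false, without touching memory, if the access is not safe to issue.
bool WriteBar32(std::uintptr_t barBase, std::size_t barBytes,
                std::size_t offset, std::uint32_t value)
{
    if (barBase == 0)
        return false;  // window was never mapped
    if ((barBase + offset) % sizeof(std::uint32_t) != 0)
        return false;  // misaligned 32-bit access
    if (offset > barBytes || barBytes - offset < sizeof(std::uint32_t))
        return false;  // word would extend past the end of the window

    // volatile: the compiler may not drop, merge, or cache this store; for an
    // aligned uint32_t, mainstream x86/x64 compilers emit a single 32-bit move.
    auto* word = reinterpret_cast<volatile std::uint32_t*>(barBase + offset);
    *word = value;
    return true;
}
```

In the question’s context the call would presumably look like WriteBar32(bar2AddressDescriptor.pUserDirectMemAddr, 4096, 0, 123), assuming UPTR converts implicitly to std::uintptr_t.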

Elad8a wrote:

Maybe I was a bit unclear, so I’ll give some background. At first, I tried configuring the FPGA (which can act as a bus master) to perform the DMA transfers from system RAM to its DDR. This caused the system to hang once the DMA was initiated.

That says you have a bug, either in the FPGA or in the setup.  We all
know bus master DMA works, but there are a lot of ways to get it wrong. 
A hang these days almost always means a PCI bus error, and that’s going
to be a problem no matter who is doing the DMA. You should get your
RAM-to-device DMA working first, then you can add the GPU into the mix. 
Note that the DirectGMA stuff also allows outside DMA engines (like
yours) to access GPU memory.
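One way to structure the “get RAM-to-device DMA working first” step is a loopback check. Fill a host buffer with a known pattern, let the FPGA DMA it into its DDR, read it back over the BAR, and compare. This is a minimal C++ sketch of the pattern side only. PatternWord, FillPattern and FirstMismatch are hypothetical names, and the DMA and read-back steps are left to the actual driver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Word i gets seed ^ (i * odd constant). Multiplying by an odd constant is a
// bijection modulo 2^32, so every word in a buffer of up to 2^32 words is
// distinct. Dropped, duplicated or shifted words therefore show up as
// mismatches, which a constant fill would hide.
std::uint32_t PatternWord(std::uint32_t seed, std::size_t i)
{
    return seed ^ static_cast<std::uint32_t>(i * 0x9E3779B1u);
}

void FillPattern(std::vector<std::uint32_t>& buf, std::uint32_t seed)
{
    for (std::size_t i = 0; i < buf.size(); ++i)
        buf[i] = PatternWord(seed, i);
}

// Index of the first word that differs from the expected pattern,
// or buf.size() if the whole buffer matches.
std::size_t FirstMismatch(const std::vector<std::uint32_t>& buf, std::uint32_t seed)
{
    for (std::size_t i = 0; i < buf.size(); ++i)
        if (buf[i] != PatternWord(seed, i))
            return i;
    return buf.size();
}
```

A 1024-word buffer is exactly 4 KiB, which matches the BAR2 window, so a single-page transfer can be verified end to end before scaling up.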

After reading this work, which mentions using the GPU DMA engine to copy data straight to the PCIe-mapped BAR, I thought I’d give this method a try. But currently, this too causes the system to hang once I try writing to the relevant pointer.

Then perhaps your hardware is handling that memory access incorrectly.

No blue screen; the whole system hangs. On my HP workstation, the system hangs, then crashes, and I get a screen reporting a PCIe timeout error.

That’s a problem in your hardware.  You have a PCIe bus protocol violation.
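In the bus-master DMA case, one possible source of such a violation is a DMA engine issuing memory requests that break two PCIe rules: a single request must not cross a 4 KiB address boundary, and a write payload must not exceed the negotiated Max_Payload_Size. This is an assumption; the thread does not establish the actual cause. Below is a C++ sketch of splitting a transfer under those two rules. SplitForPcie and Chunk are hypothetical names, and the vendor DMA IP may already do this splitting itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Chunk {
    std::uint64_t addr;  // bus address of this request
    std::uint32_t len;   // payload length in bytes
};

// Splits [addr, addr + len) into memory-write requests that never cross a
// 4 KiB boundary and never carry more than maxPayload bytes.
// maxPayload is the negotiated Max_Payload_Size (a power of two, 128..4096).
std::vector<Chunk> SplitForPcie(std::uint64_t addr, std::uint64_t len,
                                std::uint32_t maxPayload)
{
    assert(maxPayload > 0 && maxPayload <= 4096);
    std::vector<Chunk> out;
    while (len > 0) {
        const std::uint64_t toBoundary = 4096 - (addr % 4096);
        const std::uint64_t n = std::min<std::uint64_t>({len, toBoundary, maxPayload});
        out.push_back({addr, static_cast<std::uint32_t>(n)});
        addr += n;
        len -= n;
    }
    return out;
}
```

Read requests follow the same 4 KiB rule but are bounded by Max_Read_Request_Size instead of Max_Payload_Size.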