Two identical devices with common buffers, but DMA is very slow on one device.

I’m currently developing a driver for an XMC device that operates via a carrier in a PCIe slot. I’ve encountered a performance issue with DMA writes (device to RAM) when two devices are installed but only one device is running.

Setup Details:

Each device uses a common buffer for DMA operations. A scatter-gather list of descriptors is written to the common buffer, and the hardware is then told where to find that buffer; it reads through the descriptors and transfers the data.
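To make the mechanism concrete, here is a minimal sketch of filling a common buffer with a descriptor chain. The descriptor layout, the `SG_FLAG_LAST` bit, and the names `sg_desc`, `sg_fragment`, and `sg_build_chain` are all assumptions for illustration; the actual hardware format is not described here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor layout; the real hardware format will differ. */
typedef struct {
    uint64_t bus_addr;  /* bus address of one data fragment */
    uint32_t length;    /* fragment length in bytes */
    uint32_t flags;     /* bit 0 marks the last descriptor in the chain */
} sg_desc;

#define SG_FLAG_LAST 0x1u  /* assumed end-of-chain marker */

/* One entry of the scatter-gather list produced for the transfer. */
typedef struct {
    uint64_t bus_addr;
    uint32_t length;
} sg_fragment;

/*
 * Write one descriptor per fragment into the common buffer.
 * Returns the number of descriptors written, or 0 if the list is
 * empty or does not fit in the buffer.
 */
size_t sg_build_chain(sg_desc *common_buf, size_t capacity,
                      const sg_fragment *frags, size_t nfrags)
{
    if (nfrags == 0 || nfrags > capacity)
        return 0;

    for (size_t i = 0; i < nfrags; i++) {
        common_buf[i].bus_addr = frags[i].bus_addr;
        common_buf[i].length   = frags[i].length;
        common_buf[i].flags    = (i + 1 == nfrags) ? SG_FLAG_LAST : 0u;
    }
    return nfrags;
}
```

After the chain is built, the driver would write the common buffer's bus address to the device to start the transfer; that register write is device specific and is left out of the sketch.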

Each device has a unique set of common buffers (one for reading, one for writing) and these are created during driver initialization.

Both devices share the same interrupt (IRQ number is identical for both).

Issue Observed:

The slowdown occurs between providing the hardware with the common buffer’s starting address and receiving the interrupt indicating transfer completion.

The Interrupt Service Routine (ISR) promptly checks which device triggered the interrupt and exits if it’s not the correct device, ruling out multiple ISR calls as the cause.
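The check described above can be sketched roughly as follows. This is a user-mode stand-in, not the actual driver code: `device_ctx`, `INT_STATUS_DMA_DONE`, and `shared_isr` are hypothetical names, and the status register is modelled as a plain struct field rather than a memory-mapped register.

```c
#include <stdbool.h>
#include <stdint.h>

#define INT_STATUS_DMA_DONE 0x1u  /* assumed "DMA complete" status bit */

typedef struct {
    volatile uint32_t int_status;  /* stand-in for the device's status register */
    unsigned completions;          /* completed transfers seen by this device */
} device_ctx;

/*
 * Called once per device sharing the interrupt line.
 * Returns false immediately if this device did not raise the interrupt,
 * so the next handler on the line gets its turn.
 */
bool shared_isr(device_ctx *dev)
{
    uint32_t status = dev->int_status;

    if ((status & INT_STATUS_DMA_DONE) == 0)
        return false;  /* not our interrupt */

    /* Acknowledge by clearing the bit; real hardware may use write-1-to-clear. */
    dev->int_status = status & ~INT_STATUS_DMA_DONE;
    dev->completions++;
    return true;
}
```

In a Windows ISR the same decision is expressed through the routine's BOOLEAN return value: returning FALSE tells the system the interrupt belongs to another device on the shared line.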

The performance disparity is significant: the operation normally takes about 0.0006 seconds, whereas with two devices installed it takes about 0.01 seconds, roughly 17 times longer (but only on one device; the other device runs fine). In fact, when you swap the devices, the slowness follows the slot not the device, and we have tried this on multiple machines of different types.

Additional Context:

This issue hasn’t arisen with our other devices, which typically follow a “base” and “channel” design because they have multiple channels. In this case, however, there’s only one channel, so the base and channel designs were merged.

I’m at a loss as to what could be causing this slowdown for one device but not the other, especially since there doesn’t appear to be any additional driver code executing during this time. Any insights or advice on this matter would be greatly appreciated!

Are these legacy interrupts or MSI? How are you measuring the time? Is it from setting the “GO” register to receiving the completion interrupt? Have you put a scope on these devices to try to see where the delay happens? The delay could be (a) getting the request started, (b) completing the transfer, (c) firing the “completion” interrupt, (d) acknowledging the “completion” interrupt.
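One way to answer the timing questions from software alone is to timestamp the points the driver can see and compare the intervals on the fast and slow slots. The sketch below assumes three hypothetical timestamps in performance-counter ticks (for example, values read with `KeQueryPerformanceCounter` in a Windows driver); the struct and function names are made up for illustration. It cannot separate stages (a) through (c), which happen inside the hardware, so a scope or bus analyzer is still needed for those.

```c
#include <stdint.h>

/* Hypothetical timestamps, all in performance-counter ticks. */
typedef struct {
    uint64_t go_written;   /* just after the GO register write */
    uint64_t isr_entered;  /* first line of the ISR for this device */
    uint64_t irq_acked;    /* just after the interrupt is acknowledged */
} dma_timing;

/* Convert a tick count to microseconds for a given counter frequency. */
uint64_t ticks_to_us(uint64_t ticks, uint64_t ticks_per_second)
{
    return (ticks * 1000000u) / ticks_per_second;
}

/* Covers stages (a) to (c): start, transfer, and interrupt delivery. */
uint64_t go_to_isr_us(const dma_timing *t, uint64_t ticks_per_second)
{
    return ticks_to_us(t->isr_entered - t->go_written, ticks_per_second);
}

/* Covers stage (d): time spent acknowledging the interrupt. */
uint64_t isr_to_ack_us(const dma_timing *t, uint64_t ticks_per_second)
{
    return ticks_to_us(t->irq_acked - t->isr_entered, ticks_per_second);
}
```

With a 10 MHz counter, a GO-to-ISR gap of 6,000 ticks corresponds to the 0.0006 seconds reported for the fast case, and 100,000 ticks corresponds to the 0.01 seconds seen on the slow slot.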

the slowness follows the slot not the device

Hmmmm…

when you swap the devices, the slowness follows the slot not the device, and we have tried this on multiple machines of different types

(my bolding added)

REALLY machines of different TYPES? As in, different CPU types and different support chipsets?

Since you said “IRQ” I’m guessing these are LBIs (not MSIs) that you’re dealing with? You said “Both devices share the same interrupt (IRQ number is identical for both)” … Really? How do you know? How does that even, ah, happen on Windows?

Tell us more, please?

MSIs are never shared; LBIs on PCI are always potentially shared. LBI on PCIe is sort of stupid, but also shareable.