32-bit device, 64-bit Win7 DMA Issue

Hello:

Some months ago, and with guidance from the members of this forum, I undertook an effort to develop a WDK8 Win7 driver for a 32-bit device that supported scatter-gather DMA from an FPGA board hosting a Xilinx PCIe Endpoint Block (v 1.15). This effort is close to success, but I am running into a problem that I can’t resolve, which occurs somewhere between the driver, the kernel and the firmware on the device. It appears to me that the firmware is writing to an incorrect location in memory, resulting in a blue screen.

The design employs the WdfDmaProfileScatterGather profile to generate a scatter-gather list. The 32-bit physical memory addresses and lengths returned in the scatter-gather elements are written into a FIFO on the board, and the firmware writes MWR3 packets targeting these addresses. Firmware has been designed to place addresses and the MWR3 header into register space. Comparing the physical addresses in the scatter-gather list as seen by the driver (via Traceview) with the physical addresses written into the MWR3 headers by the firmware shows that they are consistent. The MWR3 header is as expected. However, memory targeted by the host-side test program is not filled by the DMA. Naturally I started with very short transfers (8 bytes), and worked up in size until the machine crashed at transfers on the order of 0x1000 bytes. For what it’s worth, I verified via Traceview that the scatter-gather list is composed of a single element for transfers up to 0x1000 (i.e. page size).
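The single-element observation above depends on where the buffer starts within its page. A minimal sketch of the page-span arithmetic, assuming 4 KiB pages; `pages_spanned` and `PAGE_SIZE_BYTES` are hypothetical names, not WDK identifiers:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 0x1000u  /* assumption: 4 KiB pages */

/* Number of pages touched by a buffer that starts at address `va` and is
   `len` bytes long. This is also the worst-case number of scatter-gather
   elements, reached when no two of those pages are physically adjacent. */
static uint32_t pages_spanned(uint64_t va, uint32_t len)
{
    assert(len > 0);
    uint64_t offset_in_page = va & (PAGE_SIZE_BYTES - 1u);
    return (uint32_t)((offset_in_page + len + PAGE_SIZE_BYTES - 1u) / PAGE_SIZE_BYTES);
}
```

A 0x1000-byte buffer is guaranteed to fit in one element only when it is page aligned; started 8 bytes into a page, it touches two pages and may produce two elements.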

I interpret this crash as a sign that the firmware has targeted an incorrect address.

Briefly, the WDF routines I’m calling are modelled after the widely used PLX9x5x example:

/// EvtDeviceAdd

NTSTATUS status;
PDEVICE_CONTEXT devCtx;
WDF_OBJECT_ATTRIBUTES attributes;
WDF_DMA_ENABLER_CONFIG dmaConfig;

/* code omitted */

devCtx = DeviceGetContext(Device);

WdfDeviceInitSetIoType(DeviceInit, WdfDeviceIoBuffered);

WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&deviceAttributes, DEVICE_CONTEXT);

status = WdfDeviceCreate(&DeviceInit, &deviceAttributes, &Device);
if (!NT_SUCCESS(status)) {
TraceEvents(TRACE_LEVEL_INFORMATION,
TRACE_TOSC_DRIVER,
"%!FUNC! Error in WdfDeviceCreate");
}

WdfDeviceSetAlignmentRequirement( Device,
FILE_64_BYTE_ALIGNMENT);

WDF_DMA_ENABLER_CONFIG_INIT( &dmaConfig,
(WDF_DMA_PROFILE)(WdfDmaProfileScatterGather) ,
MaxTransferLength );

status = WdfDmaEnablerCreate( Device,
&dmaConfig,
WDF_NO_OBJECT_ATTRIBUTES,
&(devCtx->DMAparms.DMAEnabler));

if (!NT_SUCCESS (status)) {
TraceEvents(TRACE_LEVEL_ERROR,
TRACE_TOSC_DRIVER,
"%!FUNC! WdfDmaEnablerCreate failed: %!STATUS!",
status);
return status;
}

WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&attributes, TRANSACTION_CONTEXT);

status = WdfDmaTransactionCreate( devCtx->DMAparms.DMAEnabler,
&attributes,
&(devCtx->DMAparms.DMAReadDmaTransaction));

if(!NT_SUCCESS(status)) {
TraceEvents(TRACE_LEVEL_ERROR,
TRACE_TOSC_DRIVER,
"%!FUNC! WdfDmaTransactionCreate A failed: %!STATUS!",
status);
return status;
}

/// EvtIoRead
WDF_REQUEST_PARAMETERS params;
WDFDEVICE device;
PDEVICE_CONTEXT devCtx;
NTSTATUS status;

/* code omitted */

status = WdfDmaTransactionInitializeUsingRequest(
devCtx->DMAparms.DMAReadDmaTransaction,
Request,
EvtProgramReadDma,
WdfDmaDirectionReadFromDevice );

if(!NT_SUCCESS(status)) {
TraceEvents(TRACE_LEVEL_ERROR,
TRACE_TOSC_QUEUE,
"%!FUNC! failed: WdfDmaTransactionInitializeUsingRequest %!STATUS!", status);
break;
}

status = WdfDmaTransactionExecute(devCtx->DMAparms.DMAReadDmaTransaction,
WDF_NO_CONTEXT);

if(!NT_SUCCESS(status)) {
TraceEvents(TRACE_LEVEL_ERROR,
TRACE_TOSC_QUEUE,
"%!FUNC! failed: WdfDmaTransactionExecute %!STATUS!",
status);
break;
}

Let’s start at the top then, shall we?

a) Do you have WDF Verifier and Windows Driver Verifier (all options ON except low resource simulation) properly enabled for your driver?

b) What’s the crash that you’re getting? Post the basics of the !analyze -v output.

c) How much memory is on the target system? If pages of the user data buffer are located at or above 4GB, the transfer will be intermediately buffered (by the HAL) and you won’t be writing directly into the user’s data buffer.
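To make the 4GB condition concrete, here is a rough sketch of the test a 32-bit-only device imposes on each physical range. `needs_intermediate_buffer` and `DEVICE_ADDR_LIMIT` are hypothetical names for illustration; this is not how the HAL itself is written:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEVICE_ADDR_LIMIT 0x100000000ull  /* 4 GB: first address a 32-bit device cannot reach */

/* True when any byte of [pa, pa + len) lies at or above 4 GB, i.e. when a
   32-bit-only device could not DMA to that range directly and the data
   would have to pass through an intermediate buffer below 4 GB. */
static bool needs_intermediate_buffer(uint64_t pa, uint32_t len)
{
    assert(len > 0);
    return pa + len > DEVICE_ADDR_LIMIT;
}
```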

Uhhhh… let’s start with that.

Peter
OSR

Apologies - the above post was incomplete. To continue:

/// EvtProgramReadDma

if(SgList->NumberOfElements==0)
return FALSE;

devCtx = DeviceGetContext(Device);

regVal.ulong = READ_REGISTER_ULONG((volatile ULONG *)&(devCtx->toscRegs->tcbFifoDepthRegister.ulong));

if(regVal.ulong==0)
tcbFifoDepth = TOSCHTG_DEFAULT_TCB_FIFO_DEPTH;
else
tcbFifoDepth = regVal.ulong;

if(devCtx->SGIndex==0) devCtx->SGList = SgList;

maxIndex = (tcbFifoDepth) < devCtx->SGList->NumberOfElements ?
(tcbFifoDepth) : devCtx->SGList->NumberOfElements;

WdfInterruptAcquireLock(devCtx->Interrupt);
for(i=0; i<maxIndex; i++) {
/* Write transfer control blocks (i.e. buffer descriptors) to the FIFO register */

}
WdfInterruptReleaseLock(devCtx->Interrupt);

The final transfer control block contains a signal indicating to the board that it should issue an MSI after completion of the final transfer. This interrupt is handled by the ISR, which checks to see whether there are additional scatter gather elements that have not been written. All of this is a bit academic, since the DMA fails for a single TCB (i.e. a single 8 byte transfer).
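The batching scheme described above can be sketched in plain C, outside the driver. Everything here is an assumption made for illustration: the `tcb_t` layout, the `sg_elem_t` stand-in for a scatter-gather element, and the `build_tcb_batch` helper are hypothetical, and the interrupt flag is set on the last TCB of each batch so that the ISR can queue the next one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {                 /* hypothetical TCB layout */
    uint32_t addr_hi;
    uint32_t addr_lo;
    uint32_t length;
    bool     irq_on_done;
} tcb_t;

typedef struct {                 /* stand-in for a scatter-gather element */
    uint64_t pa;
    uint32_t len;
} sg_elem_t;

/* Builds TCBs for at most fifo_depth elements starting at *sg_index,
   advances *sg_index past them, and returns how many were built.
   Only the last TCB of the batch requests an interrupt. */
static uint32_t build_tcb_batch(const sg_elem_t *sg, uint32_t count,
                                uint32_t *sg_index, uint32_t fifo_depth,
                                tcb_t *out)
{
    assert(fifo_depth > 0 && *sg_index <= count);
    uint32_t remaining = count - *sg_index;
    uint32_t batch = remaining < fifo_depth ? remaining : fifo_depth;

    for (uint32_t i = 0; i < batch; i++) {
        const sg_elem_t *e = &sg[*sg_index + i];
        out[i].addr_hi = (uint32_t)(e->pa >> 32);
        out[i].addr_lo = (uint32_t)e->pa;
        out[i].length = e->len;
        out[i].irq_on_done = (i == batch - 1);
    }
    *sg_index += batch;
    return batch;
}
```

Note that the batch size here is taken from the elements still remaining rather than from the total element count, which matters once the ISR calls back in with a nonzero index.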

A couple more notes:

- I have tried maxTransferLength of 0x00200000 and 0x04000000 with no change in behavior. With a single TCB, this was unlikely to affect the outcome.

- I tried using WdfDmaProfilePacket with no change in behavior

- I actually tried WdfDmaProfileScatterGather64 and had the firmware issue MWR4 packets instead. While I doubt this is valid practice, it produced some interesting behavior:

– By modifying the physical addresses written into the TCB FIFO, I was able to write to nonexistent memory (e.g. 32-bit upper address 0x80000000). In doing so, I verified via Traceview that the entire DMA transfer successfully proceeded through EvtProgramDMA, ISR, and DPC, and all elements of the scatter gather list were successfully processed.

– I found by trial and error that the DMA succeeded when the 32-bit upper address was nonzero. That is, data were accurately written to physical addresses above 4 GB as seen by the host-side test program.

These observations appear to indicate that the driver and firmware sections of the DMA may be programmed correctly, but that there is some misconfiguration that is occurring in transitioning between the 32-bit device address space and the 64-bit OS address space.
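One point that may bear on the MWR4 experiment: the PCIe Base Specification requires a requester to use the 32-bit (3DW header) address format for any address below 4 GB, so a 4DW write with an all-zero upper address is not a valid way to reach low memory. A small sketch of choosing the header format per address follows; the enum and function names are hypothetical, and only the Fmt/Type encoding of the first header byte is taken from the specification.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MWR_3DW = 3, MWR_4DW = 4 } mwr_format_t;

/* Addresses below 4 GB must use the 3DW (32-bit address) header;
   only addresses at or above 4 GB use the 4DW (64-bit address) header. */
static mwr_format_t mwr_format_for(uint64_t pa)
{
    return (pa >> 32) == 0 ? MWR_3DW : MWR_4DW;
}

/* First header byte: Fmt in bits 7:5, Type in bits 4:0.
   Memory write has Type 00000b; Fmt is 010b for 3DW with data
   and 011b for 4DW with data. */
static uint8_t mwr_header_byte0(mwr_format_t fmt)
{
    assert(fmt == MWR_3DW || fmt == MWR_4DW);
    return fmt == MWR_3DW ? 0x40 : 0x60;
}
```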

Any advice appreciated.

Matthew

Hello Peter:

Thank you for your quick response, and my apologies for splitting the original post. I can provide more information on the crash, but I’m afraid that the manifestation occurs on the firmware side rather than on the host driver side. Specifically:

1 and 2) The PCIe endpoint block provides a signal called trn_tdst_rdy_n, which indicates when PCIe packets may be issued by the FPGA. For transfers to nonexistent memory space or to addresses above 32 bits, this flag remains high for ~10 usec and then bobbles between high and low as the transfer proceeds. For transfers to memory space below 32 bits, this flag transitions low after ~10 usec and remains low. Since the firmware is unable to transmit packets, it cannot provide a completion packet when the driver issues a subsequent read request. The absence of a completion packet blue screens the machine. Because of this failure mode, I’m not sure that Driver Verifier or !analyze -v is going to provide useful information (i.e. I can already see how it fails).

3) There are 6 GB of RAM on the target system. I do see occurrences of pages above 4GB, and DMA transfers to these addresses succeed, with data verified in the host app. I suspect that the intermediate buffering is what is saving the transfer (somehow).

Many thanks
Matthew

Your buffer will only contain the result after you call WdfDmaTransactionDmaCompleted(WithLength).

Alex:

Apologies for my incomplete posting. In the DPC routine, I did issue the call you mention after completion of the DMA (i.e. after all TCBs are processed and the final interrupt has been issued). In the case of a single TCB, there’s a single interrupt handled by the ISR, which detects that this was the last TCB and drops into DPC.

Specifically, the DPC code is:

/// EvtInterruptDpc

/* code omitted */

transactionComplete = WdfDmaTransactionDmaCompleted(devCtx->DMAparms.DMAReadDmaTransaction,
&status);

if (transactionComplete) {
bytesTransferred = WdfDmaTransactionGetBytesTransferred(devCtx->DMAparms.DMAReadDmaTransaction);

TraceEvents(TRACE_LEVEL_INFORMATION,
TRACE_TOSC_DRIVER,
"%!FUNC! Transaction complete: Request %p, Status %!STATUS!, "
"bytes transferred %d\n",
request,
status,
(int) bytesTransferred );

status = WdfDmaTransactionRelease(devCtx->DMAparms.DMAReadDmaTransaction);

if(!NT_SUCCESS(status)) {
TraceEvents(TRACE_LEVEL_ERROR,
TRACE_TOSC_DRIVER,
"%!FUNC! WdfDmaTransactionRelease failed: %!STATUS!",
status);
}

//
// Unmark request cancellable
//
status = WdfRequestUnmarkCancelable(request);

if(status != STATUS_CANCELLED) {

TraceEvents(TRACE_LEVEL_INFORMATION,
TRACE_TOSC_DRIVER,
"%!FUNC! Completing request\n");

queue = WdfRequestGetIoQueue(request);
qstate = WdfIoQueueGetState(queue,
&QueueRequests,
&DriverRequests);

TraceEvents(TRACE_LEVEL_INFORMATION,
TRACE_TOSC_DRIVER,
"%!FUNC! qstate %x qr %x dr %x\n",
qstate,
QueueRequests,
DriverRequests);

//
// Complete this Request.
//
WdfRequestCompleteWithInformation( request,
status,
bytesTransferred);

}

Many thanks
Matthew

You say that your device only supports 32-bit physical addresses. This is quite a bad design, but Windows will help you with that at the cost of doing an intermediate copy. You MUST specify WdfDmaProfileScatterGather, NOT WdfDmaProfileScatterGather64. Or get with the times and fix the device architecture, so it will support full 64-bit physical addresses.

How do you perform writes to your TCB FIFO?

Alex:

I’m certainly not disagreeing with you about getting with the times, and will be looking to upgrade the firmware to support 64 bit physical addresses. It would help me to be able to show some level of performance, even if it entailed an intermediate copy. And as far as I can tell, WDK DMA framework is supposed to be able to support 32-bit devices.

I have specified WdfDmaProfileScatterGather in the above call to WDF_DMA_ENABLER_CONFIG_INIT. I find that for this profile the physical addresses returned in the scatter gather list all lie below the 32-bit boundary. (It seems to me that physical addresses lying above 4 GB in my 6 GB of RAM must be mapped to lower memory by WDF via map registers so as to present a 32 bit address to the device, courtesy of WdfDmaProfileScatterGather). This all seems fine, but executing the DMA transfer does not succeed in populating the host-side memory - as described above.

I’m writing to the TCB FIFO via four successive calls to WRITE_REGISTER_ULONG: high address, low address, length, and interrupt enable. The firmware reads these via a 128-bit bus, correctly places these fields into hardware registers, and issues the PCIe MWR3 packets until the trn_tdst_rdy_n flag deasserts. Subsequently, this flag never reasserts and the machine blue screens.
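For reference, the four-word sequence can be sketched as follows. The word order (high address, low address, length, interrupt enable) is as described above; the `pack_tcb` helper and the 0/1 encoding of the interrupt-enable word are assumptions for illustration. In the driver, each word would then be written to the FIFO register in turn with WRITE_REGISTER_ULONG.

```c
#include <assert.h>
#include <stdint.h>

#define TCB_WORDS 4

/* Splits one scatter-gather element into the four 32-bit words that are
   written to the TCB FIFO, in write order. */
static void pack_tcb(uint64_t pa, uint32_t len, uint32_t irq_enable,
                     uint32_t words[TCB_WORDS])
{
    assert(len > 0);
    words[0] = (uint32_t)(pa >> 32);          /* high address: 0 for any address below 4 GB */
    words[1] = (uint32_t)(pa & 0xFFFFFFFFu);  /* low address */
    words[2] = len;                           /* transfer length in bytes */
    words[3] = irq_enable ? 1u : 0u;          /* assumed encoding of interrupt enable */
}
```

With WdfDmaProfileScatterGather, the high-address word should always come out as zero; a nonzero value there would mean the address handed to the device is not one it can reach.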

Note: in the above code I have called

WdfDeviceInitSetIoType(DeviceInit, WdfDeviceIoBuffered);

prior to my call to WdfDeviceCreate. I’m not sure whether this guarantees that an intermediate copy will take place.

Many thanks
Matthew

Yikes! That’s certainly a problem. You need to change your call to WdfDeviceInitSetIoType to specify Direct I/O… You can’t really do DMA with Buffered I/O.

Try that, report back.

BTW did you answer my original questions about whether Verifier is enabled?

Peter
OSR