PCIe Device to Host DMA not working (Windows 7)

Hi,

We are writing a Windows 7 KMDF PCIe device driver for a storage device that is capable of DMA read and write. We have followed the KMDF DMA model throughout and verified the driver with Driver Verifier and the !dma debugger extension. We are tracking the data on a protocol analyzer that shows the packets traveling between the PC and the device.

Problem:
We have an issue with our DMA read.
DMA write (host to device) works: we see the DMA write command to the device and the data on the protocol analyzer.
On a DMA read, we see the DMA read command from the host and the data from the device on the protocol analyzer, but the data never shows up in the allocated buffer.

Our steps:

We followed the steps for creating DMA transactions from the PLX9x5x sample:

  1. Allocate non-paged memory for the buffer with MmAllocateContiguousMemorySpecifyCache.
  2. Build an MDL for the buffer with MmBuildMdlForNonPagedPool, from which the SGL is generated.
  3. Initialize the DMA transaction with WdfDmaTransactionInitialize.
  4. Call WdfDmaTransactionExecute.

When the ProgramReadDma callback is called, we take the bus logical address from the SGL passed in as input and communicate this address, along with the data direction, to the device with a BAR register write.

We don’t have multiple SGL elements, since our data buffer is page aligned and the total data is 512 bytes. The same procedure is followed for DMA from host to device, and that works fine.

Questions:

  1. Are there any suggestions regarding this?
  2. Is there some kind of memory protection that prevents the RAM update?
  3. We are not explicitly flushing the CPU cache or the DMA adapter cache, assuming that KMDF takes care of it. Is this assumption correct?
  4. Are there any tools that can track the DMA data inside the host?

Does your device support DMA for address over 4GB?

No. Device DMA does not support address over 4GB.
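
Because the device cannot address memory above 4 GB, every byte of the DMA buffer has to sit below that limit, not just its first byte. A minimal sketch in plain C of that range check follows; the macro and function names are made up for illustration, and it models only the arithmetic, not any kernel API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Highest byte address a 32-bit-only DMA engine can generate. */
#define DMA32_MAX_ADDR UINT64_C(0xFFFFFFFF)

/*
 * Hypothetical helper: returns true when every byte of
 * [base, base + len) lies below 4 GB.
 */
static bool dma32_range_ok(uint64_t base, size_t len)
{
    if (base > DMA32_MAX_ADDR)
        return false;
    if (len == 0)
        return true;
    /* Last byte is base + len - 1; compared this way to avoid overflow. */
    return (uint64_t)(len - 1) <= DMA32_MAX_ADDR - base;
}
```

For the 512-byte buffer discussed here, dma32_range_ok(physical_address, 512) should hold for whatever address is handed to the device.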

Shouldn’t you be using AllocateCommonBuffer instead of
MmAllocateContiguousMemorySpecifyCache so that the memory is accessible from
both the system and the device ?

//Daniel


Actually the driver allocates the buffer, address is converted to physical address and passed to the dma controller. The dma engine does not care; it writes at a given address wherever this is.

Can you make sure about the address in the pcie packets?



NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

>Can you make sure about the address in the pcie packets?
I also think that is very likely the problem. If the OP sees the read transaction on a protocol analyzer, it is very likely going to the wrong address. The OP should carefully debug the DMA read routine.

Igor Sharovar

>> Can you make sure about the address in the pcie packets?
> I think also it is very likely a problem. If OP sees Read transaction by a
> protocol analyzer it very likely goes to a wrong address.

I can verify on the protocol analyzer that the packets sent from the device are addressed to the same physical address as the allocated host buffer.

> OP should carefully debug DMA Read routine.

The driver does not have a Read routine; it has an IOCTL. Once I allocate a buffer and communicate its address to the device, nothing more is done in the driver. I just wait for 2 seconds and then check the buffer for the new data (the device does not support interrupts, so I simply wait for the DMA to complete). By that time I can see on the protocol analyzer that the DMA data has been transferred, but the allocated buffer still holds the old data! So where should I be debugging?

In DMA chain you must have two physical addresses. Are they both correct?
Could you publish your code which does Read I/Os?

Igor

Do you flush the cache (KeFlushIoBuffers) ?
Though, this should not be the problem on Intel machines.
– pa


xxxxx@gmail.com wrote:

> We followed the steps to create DMA transactions (followed PLX9x5x example).
> We are creating a Non paged memory using MmAllocateContiguousMemorySpecifyCache for buffer
> Build an Mdl for the buffer and generate SGL using MmBuildMdlForNonPagedPool
> Initialize Dma using WdfDmaTransactionInitialize
> call WdfDmaTransactionExecute.

If you are using a common buffer, you don’t need a scatter/gather list.
You know the buffer is physically contiguous.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

xxxxx@gmail.com wrote:

> I can verify in the Protocol Analyzer that the packets sent from the device are addressed to the same physical address as that of the allocated host buffer.

Where did you get your PCIExpress IP? Are you sure you’re handling the
packet sizes correctly?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Yeah, that is the key point here. We deal daily with polishing away the
“non-conformity” to the spec in our PCI Express designs! Quite a few times
we have had packets dropped when upgrading our DMA controller to native 64
bits; the devil is in the details :). But in this case the OP sees the
packet arriving on the host, assuming this is a software protocol analyzer.
Packets can easily be dropped by PCIe bridges, but if a packet reaches the
root complex, then it should be all right!
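
On the packet-size question, two PCIe rules are worth ruling out: a receiver treats a memory write whose payload exceeds its Max_Payload_Size as a malformed TLP, and a memory request must not cross a 4 KB boundary. If the negotiated Max_Payload_Size is 128 or 256 bytes, the 512-byte transfer described in this thread has to go out as several writes. Below is a minimal sketch in plain C of how a DMA engine might do the split. The struct and function names are hypothetical and are not taken from the OP's device or from any real IP core.

```c
#include <stddef.h>
#include <stdint.h>

/* One memory-write TLP: start address and payload length in bytes. */
struct tlp_chunk {
    uint64_t addr;
    uint32_t len;
};

/*
 * Hypothetical helper: split [addr, addr + len) into chunks that are at
 * most max_payload bytes long and never cross a 4 KB boundary.
 * Returns the number of chunks required; only the first `cap` chunks
 * are stored in `out`.
 */
static size_t split_into_tlps(uint64_t addr, uint32_t len,
                              uint32_t max_payload,
                              struct tlp_chunk *out, size_t cap)
{
    size_t n = 0;

    if (max_payload == 0)
        return 0;

    while (len > 0) {
        /* Bytes left before the next 4 KB boundary. */
        uint32_t to_boundary = 4096u - (uint32_t)(addr & 4095u);
        uint32_t chunk = len;

        if (chunk > max_payload)
            chunk = max_payload;
        if (chunk > to_boundary)
            chunk = to_boundary;

        if (n < cap) {
            out[n].addr = addr;
            out[n].len = chunk;
        }
        n++;
        addr += chunk;
        len -= chunk;
    }
    return n;
}
```

With a page-aligned 512-byte buffer and a Max_Payload_Size of 128 bytes, this yields four 128-byte writes. A single 512-byte MWr on the analyzer would then be a reason to check the Max_Payload_Size field in the Device Control register on both ends of the link.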


> We are creating a Non paged memory using MmAllocateContiguousMemorySpecifyCache for buffer
> Build an Mdl for the buffer and generate SGL using MmBuildMdlForNonPagedPool

Use ->AllocateCommonBuffer instead of both.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> In DMA chain you must have two physical addresses. Are they both correct?
> Could you publish your code which does Read I/Os?

What are the two physical addresses? I only know about the physical address of the buffer that I have created. Also, as I mentioned, there is no Read I/O in my driver; I am doing the DMA in an IOCTL.
Here is the code anyway.

Initialization of DMA transaction:

    TraceEvents(TRACE_LEVEL_INFORMATION, DBG_PNP,
                "Alignment returned from WdfDeviceGetAlignmentRequirement = 0x%x, FILE_OCTA_ALIGNMENT = 0x%x",
                WdfDeviceGetAlignmentRequirement(DeviceContext->WdfDevice),
                FILE_OCTA_ALIGNMENT);
    WdfDeviceSetAlignmentRequirement(DeviceContext->WdfDevice,
                                     // WdfDeviceGetAlignmentRequirement(DeviceContext->WdfDevice));
                                     FILE_OCTA_ALIGNMENT);

    //
    // Create a new DMA Enabler instance.
    // The packet profile (WdfDmaProfilePacket) is used here; the
    // scatter/gather duplex profile is left commented out.
    //
    {
        WDF_DMA_ENABLER_CONFIG dmaConfig;

        DeviceContext->MaximumTransferLength = DMA_MAX_TRANSFER_LENGTH;
        WDF_DMA_ENABLER_CONFIG_INIT(&dmaConfig,
                                    WdfDmaProfilePacket, // WdfDmaProfileScatterGatherDuplex,
                                    DeviceContext->MaximumTransferLength);

        TraceEvents(TRACE_LEVEL_INFORMATION, DBG_PNP,
                    " - The DMA Profile is WdfDmaProfilePacket");

        status = WdfDmaEnablerCreate(DeviceContext->WdfDevice,
                                     &dmaConfig,
                                     WDF_NO_OBJECT_ATTRIBUTES,
                                     &DeviceContext->DmaEnabler);

        if (!NT_SUCCESS(status)) {
            // %!STATUS! is a WPP-only format; KdPrint needs a plain specifier.
            KdPrint(("WdfDmaEnablerCreate failed: 0x%08X\n", status));
            return status;
        }
    }

    WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&attributes, TRANSACTION_CONTEXT);

    status = WdfDmaTransactionCreate(DeviceContext->DmaEnabler,
                                     &attributes,
                                     &DeviceContext->ReadDmaTransaction);

    if (!NT_SUCCESS(status)) {
        KdPrint(("WdfDmaTransactionCreate(read) failed: 0x%08X\n", status));
        return status;
    }

Transaction creation for data read:

    pDevContext->pMemVA = MmAllocateContiguousMemorySpecifyCache(DataSize, PhyMin, PhyMax, PhyMin, MmNonCached);
    if (pDevContext->pMemVA == NULL) {
        TraceEvents(TRACE_LEVEL_VERBOSE, DBG_INIT,
                    "MmAllocateContiguousMemorySpecifyCache() FAILED to allocate buffer. Exiting\n");
        return ERROR;
    }
    memset(pDevContext->pMemVA, 0, DataSize);
    PhysicalAddr = MmGetPhysicalAddress(pDevContext->pMemVA);

    // Initialize DMA transaction object for this buffer
    Mdl = IoAllocateMdl((PUCHAR)pDevContext->pMemVA, DataSize, FALSE, FALSE, NULL);
    if (Mdl == NULL) {
        TraceEvents(TRACE_LEVEL_VERBOSE, DBG_INIT,
                    "Mdl for common buffer could not be created. Exiting\n");
        return ERROR;
    }
    MmBuildMdlForNonPagedPool(Mdl);
    TraceEvents(TRACE_LEVEL_VERBOSE, DBG_INIT,
                "Initializing DMA read transaction, Flushing cache for data buffer\n");
    KeFlushIoBuffers(Mdl, TRUE, TRUE);

    status = WdfDmaTransactionInitialize(pDevContext->ReadDmaTransaction,
                                         NMMPEvtProgramReadDmaDummy,
                                         WdfDmaDirectionReadFromDevice,
                                         Mdl,
                                         MmGetMdlVirtualAddress(Mdl),
                                         MmGetMdlByteCount(Mdl));
    if (status != STATUS_SUCCESS) {
        TraceEvents(TRACE_LEVEL_ERROR, DBG_INIT,
                    "WdfDmaTransactionInitialize() Failed. Exiting\n");
        return ERROR;
    }

    TraceEvents(TRACE_LEVEL_VERBOSE, DBG_INIT, "Initialized DMA read transaction\n");
    status = WdfDmaTransactionExecute(pDevContext->ReadDmaTransaction, NULL);
    if (status != STATUS_SUCCESS) {
        TraceEvents(TRACE_LEVEL_ERROR, DBG_INIT,
                    "WdfDmaTransactionExecute() Failed. Exiting\n");
        return ERROR;
    }
    TraceEvents(TRACE_LEVEL_ERROR, DBG_IOCTLS,
                "Virt Address of non paged mem = 0X%X\n", pDevContext->pMemVA);
}

/*
NMMPEvtProgramReadDmaDummy is a dummy function that does not do anything. In it I only verify that the physical address given in the SgList is the same as the one I get from MmGetPhysicalAddress(pDevContext->pMemVA).
After this callback has been called, I send the physical address of the buffer to the device as part of a command that I write to the device BAR register.
*/

When your driver knows the dev-to-host DMA is done, break into the debugger.
(Note that I avoid the term “DMA read”, which is host-centric; it gets very
confusing when looking at a PCI trace, which is device-centric.)

Then look at the corresponding MWr burst packet captured on your analyzer.
Find its target memory address and expand the data field of the packet. Check
whether the memory address matches your PhysicalAddress. If it does, do a !dd
on the memory address to see if the data matches the data found in the packet.
Pay attention to the endianness configuration of your analyzer. Then do a dd
(no bang) on the virtual address to check the data from virtual space. Tell us
your result.

Calvin
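
The endianness caveat above can be checked mechanically once the analyzer payload and the !dd output are both on hand. Below is a minimal sketch in plain C, assuming the analyzer displays the payload as 32-bit DWORDs and the memory dump is available as raw bytes. The enum and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum payload_match {
    PAYLOAD_MISMATCH = 0,
    PAYLOAD_MATCH_LE,   /* DWORD's least significant byte is at the lowest address */
    PAYLOAD_MATCH_BE    /* analyzer shows each DWORD byte-swapped */
};

/*
 * Hypothetical helper: compare `ndw` DWORDs copied from the analyzer
 * display against 4 * ndw bytes dumped from host memory, trying both
 * byte orders. An empty payload (ndw == 0) reports PAYLOAD_MATCH_LE.
 */
static enum payload_match compare_payload(const uint32_t *dwords,
                                          const uint8_t *mem, size_t ndw)
{
    int le_ok = 1;
    int be_ok = 1;

    for (size_t i = 0; i < ndw; i++) {
        const uint8_t *p = mem + 4 * i;
        uint32_t le = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                      ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        uint32_t be = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                      ((uint32_t)p[2] << 8) | (uint32_t)p[3];

        if (dwords[i] != le)
            le_ok = 0;
        if (dwords[i] != be)
            be_ok = 0;
    }

    if (le_ok)
        return PAYLOAD_MATCH_LE;
    if (be_ok)
        return PAYLOAD_MATCH_BE;
    return PAYLOAD_MISMATCH;
}
```

A PAYLOAD_MISMATCH in both byte orders, with the address confirmed, would point at the write never landing in RAM rather than at a display quirk.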


@Calvin Guan
I checked the buffer through both the virtual and the physical address, as you said. The new data (as shown on the protocol analyzer) is not in it. It still contains the pattern of 0 to f that I had initialized it with earlier. Also, the memory address in the packet matches the buffer’s physical address exactly.

It sounds like the packet was dropped somewhere along its way to the root
complex (RC). Bummer. The last time I had this problem (as I remember it) was
when DMA’ing from a Hyper-V guest to the host in an early version of Windows.
You aren’t in a Hyper-V guest, are you?

What do the TLP attribute bits read, in particular “no snoop” and
“relaxed ordering”? How many bytes are in the packet? Do the first and last
BE (byte enables) look right? Have you tried AllocateCommonBuffer instead of
MmAllocateContiguousMemorySpecifyCache?

Calvin
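
For checking the first and last BE against the trace, the expected values follow directly from the start address and byte count of the write. Below is a minimal sketch in plain C; the struct and function names are hypothetical, and the length is returned as a plain DWORD count rather than as the encoded 10-bit header field.

```c
#include <stdint.h>

/* Expected header fields of one memory-write TLP. */
struct mwr_header_fields {
    uint32_t length_dw;  /* payload length in DWORDs */
    uint8_t first_be;    /* First DW Byte Enables, bit 0 = lowest byte */
    uint8_t last_be;     /* Last DW Byte Enables, 0 for a one-DWORD write */
};

/*
 * Hypothetical helper: derive the expected length and byte enables for
 * a write of len_bytes starting at byte address addr. Zero-length
 * writes are not modelled and return all zeros.
 */
static struct mwr_header_fields mwr_expected_fields(uint64_t addr,
                                                    uint32_t len_bytes)
{
    struct mwr_header_fields f = {0, 0, 0};

    if (len_bytes == 0)
        return f;

    uint32_t start = (uint32_t)(addr & 3u);              /* offset in first DW */
    uint32_t end = (start + ((len_bytes - 1u) & 3u)) & 3u; /* offset in last DW */
    uint8_t first = (uint8_t)((0xFu << start) & 0xFu);
    uint8_t last = (uint8_t)(0xFu >> (3u - end));

    f.length_dw = (uint32_t)(((uint64_t)start + len_bytes + 3u) / 4u);
    if (f.length_dw == 1) {
        /* Single DWORD: both limits fold into First BE, Last BE is 0000b. */
        f.first_be = (uint8_t)(first & last);
        f.last_be = 0;
    } else {
        f.first_be = first;
        f.last_be = last;
    }
    return f;
}
```

For a DWORD-aligned 512-byte transfer sent as one write, this gives 128 DWORDs with both byte enables at 0xF, so anything else in the captured header would deserve a closer look.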


@Calvin Guan
No, we are not in a Hyper-V guest. “No snoop” and “relaxed ordering” are both set to zero in the packet attributes.

One interesting thing that happened last week: a couple of times we did see the data sent from the device in host memory! We verified that the data was at the proper physical address (the one that we allocate using the common buffer), but after we restarted the host we never saw the data in RAM again. We tried running the same application and driver many, many times without success. Would that suggest anything?

Have you validated your hardware in a simpler environment, e.g. DOS? I would
do that, if only to gain some confidence in the hardware and the platform as
a whole.

Calvin
