DMA transfer size limitation issue

I am writing a WDF driver for a PCIe device on Windows 7, 64-bit. We need high-performance data transfer, so we are using scatter/gather DMA transactions. I am facing an issue with larger DMA transfer sizes. We have to do both write and read operations with the device.

The issue is that I am able to read only a fixed amount of data (3792 bytes) from the device,
whatever the requested data size (so far I have tested only with 16 KB). We are using scatter/gather DMA.

I will explain what I am doing.

Trial 1:
I receive the application buffer in the IOCTL handler with WdfRequestRetrieveOutputWdmMdl().
In the read DMA function I initialize the DMA transaction by calling WdfDmaTransactionInitialize().
I execute the transaction using WdfDmaTransactionExecute().
In the EvtProgramReadDma() callback I set up the descriptor table and the device registers. After DMA completion I call WdfDmaTransactionRelease() and WdfRequestCompleteWithInformation(). Finally I print the data in the sample application, but I can see only 3792 bytes there.

WdfDmaEnablerGetFragmentLength() returns 0x19000 which is 100 KB.
I also took the return value of MmGetSystemAddressForMdlSafe() and printed the contents from the driver in DebugView. There, too, I could see only 3792 bytes transferred.

Trial 2:
The result is different when I create a separate buffer in the driver with WdfMemoryCreate() in nonpaged pool, allocate an MDL for it using IoAllocateMdl(), build the MDL using MmBuildMdlForNonPagedPool(), and then do the DMA initialize and DMA execute.
After DMA completion, when I print the buffer, I can see the full data (16 KB) in DebugView.

Can anybody tell me why this is happening and how I can increase the transfer size?
The final transfer size needs to be several MB for us.

xxxxx@gmail.com wrote:

> The issue is, I am able to read only a fixed size of data (3792 bytes) from the device,

I’d like to point out, just for completeness, that you are ASSUMING that
the problem is that your transfer stopped at 3792 bytes. It is equally
possible that the transfer completed to its full length, but that the
rest of the bytes were written to other pages in memory. DMA problems
are tricky.

> whatever be the requested data size (right now I have tested only with 16 KB). We are using scatter-gather DMA.
>
> WdfDmaEnablerGetFragmentLength() returns 0x19000 which is 100 KB.
> I have taken return value of MmGetSystemAddressForMdlSafe() and printed the content from the driver in Debugview. There also I could see only 3792 bytes transfer.

The most likely possibility is that you are programming your hardware
incorrectly. The process of translating the page entries in an MDL to
whatever descriptor format your hardware requires is not necessarily
trivial. Perhaps you should post that code. It’s likely, for example,
the 3792 is just the amount of data in the first page of your MDL. If
you had messed up the addresses of succeeding page entries, that might
cause what you see.
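
To make the first-page hypothesis concrete: 3792 = 4096 - 304, so a buffer that begins 304 (0x130) bytes into a 4 KB page has exactly 3792 bytes in its first page. The following is a minimal user-mode sketch of that arithmetic, not driver code. The names `fragment_t` and `split_by_page` and the 0x130 offset are hypothetical, chosen only for illustration, and the sketch assumes 4 KB pages and no merging of physically adjacent pages.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KB pages */

/* Hypothetical stand-in for one scatter/gather element. */
typedef struct {
    size_t offset;  /* byte offset of the fragment from the buffer start */
    size_t length;  /* fragment length in bytes */
} fragment_t;

/*
 * Split a buffer that starts `page_offset` bytes into a page into
 * per-page fragments. Returns the number of fragments written to
 * `out` (never more than `max_out`).
 */
size_t split_by_page(size_t page_offset, size_t total_len,
                     fragment_t *out, size_t max_out)
{
    size_t count = 0;
    size_t done = 0;

    assert(out != NULL);
    page_offset %= PAGE_SIZE_BYTES;

    while (done < total_len && count < max_out) {
        /* Room left in the current page; only the first page starts mid-page. */
        size_t room = PAGE_SIZE_BYTES - page_offset;
        size_t remaining = total_len - done;
        size_t len = remaining < room ? remaining : room;

        out[count].offset = done;
        out[count].length = len;
        count++;

        done += len;
        page_offset = 0;  /* every later fragment starts on a page boundary */
    }
    return count;
}
```

For a 16 KB buffer at page offset 0x130 this yields five fragments of 3792, 4096, 4096, 4096, and 304 bytes. A device that was handed only the first fragment would be expected to stop after 3792 bytes.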

Is there a chance your hardware stops transferring when it gets to an
entry less than 4096 bytes? That would be a fatal design flaw, but it’s
an assumption that a hardware designer might make (“all pages except the
last must be full pages, right?”).


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Thanks, Tim, for your reply.

> Is there a chance your hardware stops transferring when it gets to an
> entry less than 4096 bytes?

But as I mentioned in trial 2, when I create a buffer in the driver and allocate an MDL for it,
I am able to get the full data in the driver. So I think the hardware is always sending the complete data.

> It is equally possible that the transfer completed to its full length, but that the
> rest of the bytes were written to other pages in memory.

I suspect that could be the issue. Anyway, I will post the code. Please check it.

Here is the code:

#define DEV_TO_HOST_DATA_SIZE 16*1024
#define DMA_MAX_TRANSFER_LENGTH 100*1024

// DMA initialization function, called during driver installation.
NTSTATUS PCIInitializeDMA( PDEVICE_CONTEXT DeviceContext )
{
    NTSTATUS status;
    WDF_OBJECT_ATTRIBUTES attributes;
    WDF_DMA_ENABLER_CONFIG dmaConfig;

    WdfDeviceSetAlignmentRequirement( DeviceContext->WdfDevice, FILE_LONG_ALIGNMENT );
    DeviceContext->MaximumTransferLength = DMA_MAX_TRANSFER_LENGTH;
    WDF_DMA_ENABLER_CONFIG_INIT( &dmaConfig,
                                 WdfDmaProfileScatterGatherDuplex,
                                 DeviceContext->MaximumTransferLength );
    status = WdfDmaEnablerCreate( DeviceContext->WdfDevice,
                                  &dmaConfig,
                                  WDF_NO_OBJECT_ATTRIBUTES,
                                  &DeviceContext->DmaEnabler );
    WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&attributes, TRANSACTION_CONTEXT);
    status = WdfDmaTransactionCreate( DeviceContext->DmaEnabler,
                                      &attributes,
                                      &DeviceContext->ReadDmaTransaction);
    return status;
}

// Receive the buffer from the application in the IOCTL handler.
PCIIoctlFpgaToHostDMA(
    __in WDFQUEUE Queue,
    __in WDFREQUEST Request,
    __in size_t OutBufferSize,
    __in size_t InBufferSize)
{
    PMDL mdl = NULL;
    status = WdfRequestRetrieveOutputWdmMdl( Request, &mdl );
    PCIEvtIoRead(Queue, Request, mdl);
    return;
}

PCIEvtIoRead(
    IN WDFQUEUE Queue,
    IN WDFREQUEST Request,
    OUT PMDL mdl
    )
{
    MmProbeAndLockPages(mdl, KernelMode, IoReadAccess); // Lock the pages
    length = MmGetMdlByteCount(mdl);
    virtualAddress = MmGetMdlVirtualAddress(mdl);
    // Initialize DMA transaction
    status = WdfDmaTransactionInitialize( deviceContext->ReadDmaTransaction,
                                          PCIEvtProgramReadDma,
                                          WdfDmaDirectionReadFromDevice,
                                          mdl,
                                          virtualAddress,
                                          length );
    // Execute DMA transaction
    status = WdfDmaTransactionExecute( deviceContext->ReadDmaTransaction, WDF_NO_CONTEXT);
    bytesTransferred = WdfDmaTransactionGetBytesTransferred( deviceContext->ReadDmaTransaction );
    MmUnlockPages(mdl);
    WdfDmaTransactionRelease(deviceContext->ReadDmaTransaction);
    WdfRequestCompleteWithInformation( Request, status, bytesTransferred);
}

// EvtProgramDma callback function
BOOLEAN PCIEvtProgramReadDma(
    IN WDFDMATRANSACTION Transaction,
    IN WDFDEVICE Device,
    IN WDFCONTEXT Context,
    IN WDF_DMA_DIRECTION Direction,
    IN PSCATTER_GATHER_LIST SgList
    )
{
    PHYSICAL_ADDRESS PhyMin, PhyMax;
    PhyMin.LowPart = 0;
    PhyMin.HighPart = 0;
    PhyMax.LowPart = 0xFFFFFFFF;
    PhyMax.HighPart = 0x0;
    deviceContext = PCIGetDeviceContext(Device);
    // Allocate the buffer for the device descriptor table entries (DTE).
    RcBufDT = MmAllocateContiguousMemorySpecifyCache( DEV_TO_HOST_DATA_SIZE, PhyMin, PhyMax, PhyMin, MmNonCached);
    // Program the device DTE
    RcBufDT[i] = SgList->Elements[0].Address.HighPart;
    RcBufDT[i] = SgList->Elements[0].Address.LowPart;

    // Program the device DMA registers.

    // Wait for the DMA completion
    // Return from the callback function
}

One useful check would be to find out what address you were reading into.
For example, if it is 3792 bytes below a page boundary, that would confirm
the hypothesis that the data is somehow appearing elsewhere. Or try another
trick: increment the buffer pointer by 1 on each read call,
making sure you have enough buffer to cover the transfer. For example, if
you have a 4096-byte transfer, I’d try

LPBYTE buffer;

buffer = new BYTE [2 * MAX_TRANSFER_SIZE];

DWORD bytesRead;

for(int i = 0; i < MAX_TRANSFER_SIZE; i++)
{
    // ReadFile takes a pointer to the DWORD that receives the byte count.
    ReadFile(handle, &buffer[i], MAX_TRANSFER_SIZE, &bytesRead, NULL);
}

and then, on each ReadFile, see if the buffer truncates at 3792, 3791,
3790, etc. bytes. What is the “bytes transferred” result based on?
For example, if ReadFile always says MAX_TRANSFER_SIZE bytes are read, but the
buffer does not contain all the bytes, then a trick like filling the
buffer with some bizarre pattern (such as 0xEF, or as one company did
many years ago, 0xDEADBEEF), performing the ReadFile, then seeing how much
of this bogus pattern is left, will help identify the problem. Just
knowing what this experiment would produce might be informative in other
ways.
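
The fill-pattern trick can be sketched in portable C as follows. This is only an illustration under stated assumptions: `fill_sentinel` and `overwritten_length` are hypothetical helper names, 0xEF is the fill byte mentioned above, and the check scans from the end of the buffer, so real data that happens to end in 0xEF bytes would be under-counted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILL_BYTE 0xEF  /* sentinel value; any unlikely byte would do */

/* Fill the whole buffer with the sentinel before issuing the read. */
void fill_sentinel(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, FILL_BYTE, len);
}

/*
 * After the read, return the length of the region that was overwritten,
 * measured from the buffer start to the last byte that no longer holds
 * the sentinel. Returns 0 if the buffer is untouched.
 */
size_t overwritten_length(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0 && buf[len - 1] == FILL_BYTE) {
        len--;
    }
    return len;
}
```

In the application, the idea would be to call `fill_sentinel` on the buffer, issue the read, and compare `overwritten_length` against the byte count the read reported. A result of 3792 alongside a reported 16384 would suggest that the remaining data went somewhere else or was never written.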

Page-sized boundary errors are always suspect, especially when doing DMA.
If you add some _tprintf() statements to your app and some
DbgPrint/KdPrint or whatever to your driver, and look for correlations,
this could also be informative.
joe



NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Have you tried running driver verifier with DMA verification on?

Jan

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@gmail.com
Sent: Saturday, July 28, 2012 12:04 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] DMA transfe size limitation Issue

Thanks, Tim, for your reply.

> Is there a chance your hardware stops transferring when it gets to an
> entry less than 4096 bytes?

But as I mentioned in trial 2, when I create a buffer in the driver and allocate an MDL for it, I am able to get the full data in the driver. So I think the hardware is always sending the complete data.

> It is equally possible that the transfer completed to its full length,
> but that the rest of the bytes were written to other pages in memory.

I suspect that could be the issue. Anyway, I will post the code. Please check it.



I would have assumed that had already been done. Yes, this is essential.
Until this has been done, there is no point in doing additional debugging.
joe


xxxxx@gmail.com wrote:

> Here is the code…
>
> #define DEV_TO_HOST_DATA_SIZE 16*1024
> #define DMA_MAX_TRANSFER_LENGTH 100*1024

DEV_TO_HOST_DATA_SIZE is the size of the buffer that hold the DMA
descriptors. Why is this 16k? If the maximum transfer is 100k, that is
at most 25 entries in the scatter/gather list. Even with an 8-byte
physical address and an 8-byte length, that’s only 400 bytes.
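
The sizing arithmetic above can be checked with a small user-mode sketch. The names `dma_descriptor_t`, `pages_spanned`, and `descriptor_table_bytes` are hypothetical, and the 16-byte descriptor layout is just the 8-byte address plus 8-byte length mentioned above, not the format of any particular device. One caveat the sketch makes visible: a 100 KB buffer that does not start on a page boundary can touch 26 pages rather than 25 (in the WDK this count is what the ADDRESS_AND_SIZE_TO_SPAN_PAGES macro computes), which still needs far less than 16 KB of descriptors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KB pages */

/* Hypothetical descriptor layout: 8-byte physical address + 8-byte length. */
typedef struct {
    uint64_t address;
    uint64_t length;
} dma_descriptor_t;

/* Number of pages touched by `length` bytes starting `page_offset` bytes into a page. */
size_t pages_spanned(size_t page_offset, size_t length)
{
    assert(page_offset < PAGE_SIZE_BYTES);
    return (page_offset + length + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Bytes needed for a table holding one descriptor per page touched. */
size_t descriptor_table_bytes(size_t page_offset, size_t length)
{
    return pages_spanned(page_offset, length) * sizeof(dma_descriptor_t);
}
```

For a page-aligned 100 KB transfer this gives 25 entries and 400 bytes; for the same transfer starting 0x130 bytes into a page it gives 26 entries and 416 bytes.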

> PCIEvtIoRead(
> IN WDFQUEUE Queue,
> IN WDFREQUEST Request,
> OUT PMDL mdl
> )

The MDL is an “IN” parameter to this routine, not an “OUT”.

> // Execute DMA transaction
> status = WdfDmaTransactionExecute( deviceContext->ReadDmaTransaction, WDF_NO_CONTEXT);
> bytesTransferred = WdfDmaTransactionGetBytesTransferred( deviceContext->ReadDmaTransaction );
> MmUnlockPages(mdl);
> WdfDmaTransactionRelease(deviceContext->ReadDmaTransaction);
> WdfRequestCompleteWithInformation( Request, status, bytesTransferred);

You are assuming that WdfDmaTransactionExecute is synchronous – that it
will not return until the entire transaction is complete. That’s not
correct. It returns once the transaction has STARTED. If you unlock
the pages while the transaction is underway, the hardware might write to
pages that are no longer mapped.

When the DMA completes, presumably you will get an interrupt. It is up
to your ISR’s DPC to release the transaction and complete the request.
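
The ordering described above can be modelled in plain C. This is a user-mode model of the sequencing only, not KMDF code: `read_op_t`, `start_read`, `dma_done_dpc`, and `is_safe` are hypothetical names. In a real KMDF driver the DPC would typically call WdfDmaTransactionDmaCompleted() and, once that reports the transaction finished, read the byte count, release the transaction, and complete the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one in-flight read; these are not KMDF types. */
typedef struct {
    bool   pages_locked;       /* buffer pages are pinned for DMA */
    bool   dma_in_flight;      /* hardware may still write to the buffer */
    bool   request_completed;  /* request has been handed back to the caller */
    size_t bytes_transferred;
} read_op_t;

/* Dispatch path: start the transfer and return immediately. */
void start_read(read_op_t *op)
{
    assert(op != NULL);
    op->pages_locked = true;
    op->dma_in_flight = true;  /* hardware programmed; the transfer has only STARTED */
    op->request_completed = false;
    op->bytes_transferred = 0;
    /* No unlock, release, or completion here: the device is still writing. */
}

/* Models the DPC queued by the ISR once the device signals "DMA done". */
void dma_done_dpc(read_op_t *op, size_t bytes)
{
    assert(op != NULL && op->dma_in_flight);
    op->dma_in_flight = false;
    op->bytes_transferred = bytes;
    op->pages_locked = false;      /* safe now: the hardware has finished */
    op->request_completed = true;  /* complete the request last */
}

/* Invariant from the text: pages must stay locked while DMA is in flight. */
bool is_safe(const read_op_t *op)
{
    assert(op != NULL);
    return !op->dma_in_flight || op->pages_locked;
}
```

The point of the model is that unlocking and completing happen only in `dma_done_dpc`. Doing either in `start_read`, as the quoted code effectively does right after WdfDmaTransactionExecute returns, would break the `is_safe` invariant.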


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.