Common buffer DMA driver and application communication

Hello,

My name is Florian and I am a computer science student from Germany.
In my job as a software developer I took on the task of developing a device driver for custom hardware.
There is a Linux driver for this device, and now a Windows driver has to be developed.

This is my first contact with driver development, so I did a lot of reading.

The device is a PCI card with an FPGA acting as DSP to process audio data. This data should then be transferred to a user application via DMA.
Furthermore, the firmware for the FPGA must be uploaded to the device on initialization.

My conclusion is that I want to develop a KMDF driver that uses a common buffer to transfer data. The hardware does not support scatter/gather, so a common buffer should be the next most efficient solution.
Now I am able to allocate a common buffer as shown in the WinDDK examples.

The questions I ask myself are:

  1. How does the driver communicate with the user application? In the end the application should configure how big the common buffer is, so the driver should allocate the buffer as soon as the application tells it how big the buffer has to be.
    (I read about IOCTLs in this context, but I am not sure how they are supposed to be used.)
  2. My understanding is that I simply give the buffer address to the hardware and let it start the DMA transfer, and it notifies the driver when it is finished. Then the driver will notify the application, and the application can process the data in the common buffer.

Are my assumptions correct? Is there example code anywhere for such a combination of driver/device, common buffer and user application?
I can only find examples where scatter/gather is used, and almost none that explain the communication between application and driver.

Thanks in advance for any help,
Florian

Be certain to read the article from The NT Insider about basic DMA concepts in Windows:

http://www.osronline.com/article.cfm?article=539

I suspect this will help you quite a lot.

Let’s settle the basic architecture question first:

Hmmmm… MAYbe. But unless your device is DESIGNED as a common buffer device, that is, the data is described by a series of descriptors that are located in the common buffer, this is unlikely to be the best choice.

In ANY case, what you almost certainly do NOT want to do is copy any data from the user data buffer to the common buffer (if you can at all avoid it).

To your other questions:

  1. How does the driver communicate with the user application?

Well, the application sends reads, writes, and/or device controls to the driver, of course. Think of how an application reads or writes a disk: The user calls ReadFile specifying the base address and length of the data buffer into which the storage stack (handwave) will place the data to be read. Write works the same, just in the other direction. “Device Control”? That’s simple! That’s just the function used for “everything else” that’s not a read or a write. If you were writing a device driver for a missile launcher, you’d need a function that a user can send to the driver to launch missiles… and that wouldn’t be a read or a write, right? It’d be some OTHER function. That’s a Device Control, and the control code – which can be defined by the driver – tells both the driver and the application WHAT that “other” function is.
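
As a rough illustration of such a control code: on Windows it is a 32-bit value packed from a device type, a function number, a transfer method and an access mask by the CTL_CODE macro. The sketch below repeats the macro and a few constants from the Windows headers only so that it compiles on its own; in a real project they come from the SDK/WDK headers. IOCTL_CM_SET_BUFFER_SIZE, the function number 0x800 and the three helper functions are hypothetical, chosen purely for illustration.

```c
#include <stdint.h>

/* These mirror the definitions in the Windows headers; they are repeated
 * here only so that the sketch is self-contained. */
#define FILE_DEVICE_UNKNOWN 0x00000022u
#define METHOD_BUFFERED     0u
#define FILE_ANY_ACCESS     0u

#define CTL_CODE(DeviceType, Function, Method, Access)               \
    (((uint32_t)(DeviceType) << 16) | ((uint32_t)(Access) << 14) |   \
     ((uint32_t)(Function) << 2) | (uint32_t)(Method))

/* Hypothetical control code: "tell the driver how big the buffer should be".
 * Function numbers from 0x800 upwards are available to vendors. */
#define IOCTL_CM_SET_BUFFER_SIZE \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

/* Hypothetical helpers that take a control code apart again. */
static inline uint32_t ioctl_device_type(uint32_t code) { return code >> 16; }
static inline uint32_t ioctl_function(uint32_t code)    { return (code >> 2) & 0xFFFu; }
static inline uint32_t ioctl_method(uint32_t code)      { return code & 0x3u; }
```

The application would pass such a code to DeviceIoControl together with its input and output buffers, and the driver would switch on the same value in its device control handler.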

  2. My understanding is, that I simply give the buffer address to the hardware
    and let it start the DMA transfer, and it notifies the driver when it is
    finished.

Modulo lots of handwaving, that’s correct. You want to follow the established programming patterns in KMDF here:

  1. Create a DMA Enabler for your device (typically done in EvtDriverDeviceAdd)
  2. Create a DMA Transaction – If your device handles one request at a time, this can also be done quite easily in EvtDriverDeviceAdd and the WDF Transaction can be re-used.

Then… for each request you get from the user:

  1. Initialize the DMA Transaction

  2. Execute the DMA Transaction – This will result in your getting a callback with a scatter gather list that you’ll use to program your device. You do that, and then you tell your device to execute the operation you programmed.

Now, you said your device doesn’t support scatter/gather. That’s fine… when you create your DMA Enabler you’ll describe (as part of the profile you indicate) whether your device supports Scatter/Gather and 64-bit addresses or not. If you do NOT support scatter/gather… presto! Windows will ensure your scatter/gather list never comprises more than one element per call to your execution routine.

  3. Complete the transaction

  4. Complete the request.

The KMDF documentation in the WDK is VERY helpful on this… It really DOES show you step-by-step what you need to do. Read and re-read. The examples in the WDK (the code examples in the documentation and the sample drivers) are only fair, I’m sorry to say.

I hope that helps… post back if you have more specific questions. But DO be sure you read that article from The NT Insider and then re-read the WDK’s description of DMA operations in KMDF.

Peter
OSR

An application will not have a direct view of the common buffer. So to transfer data you’ll just have to copy it to/from the app’s IOCTL buffer. With typical sound data rates this extra copying cost is not a problem.

You could keep the application’s IOCTL pending, and when the transfer has collected enough data, copy it to the IOCTL buffer and complete the IOCTL. This way the application gets notified AND gets the data at the same time.

Hello,

I read the suggested articles and deepened my knowledge of Windows driver development.

I came to the conclusion that the best way to start is with the driver/application communication, so I began to implement IOCTLs.
This went fine and I can call my own IOCTLs from a test application.

The next step was to read and write registers.

I implemented an IOCTL that takes a memory offset and returns the value:

=================================================
case IOCTL_CM_REGISTER_READ:

TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Called IOCTL_CM_REGISTER_READ\n");

status = WdfRequestRetrieveInputBuffer(Request, 0, &inBuf, &bufSize);
if(!NT_SUCCESS(status)) {
status = STATUS_INSUFFICIENT_RESOURCES;
break;
}
ASSERT(bufSize == InputBufferLength);

regdata = *(PREGDATA)inBuf;

TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Application sent: %08X\n", regdata.offset);

status = WdfRequestRetrieveOutputBuffer(Request, 0, &outBuf, &bufSize);
if(!NT_SUCCESS(status)) {
status = STATUS_INSUFFICIENT_RESOURCES;
break;
}

ASSERT(bufSize == OutputBufferLength);

// Writing to the buffer over-writes the input buffer content
//

TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Reading at: \"%p\"\n", devCtx->RegsBase+regdata.offset);

regdata.data = *(unsigned int*)(devCtx->RegsBase+regdata.offset);
TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Driver read: \"%08X\" at offset \"%08X\"\n", regdata.data, regdata.offset);

regdata.data = (unsigned int)READ_REGISTER_UCHAR(devCtx->RegsBase+regdata.offset);
TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Driver read (READ_REGISTER_ULONG): \"%08X\" at offset \"%08X\"\n", regdata.data, regdata.offset); // Unexpected data

RtlCopyMemory(outBuf, &regdata, OutputBufferLength);

// Assign the length of the data copied to IoStatus.Information
// of the request and complete the request.
//
WdfRequestSetInformation(Request,
OutputBufferLength < datalen ? OutputBufferLength:datalen);

// When the request is completed the content of the SystemBuffer
// is copied to the User output buffer and the SystemBuffer is
// is freed.

break;

As you can see, I tried two ways of reading: simply dereferencing the mapped address, or using READ_REGISTER_UCHAR.
Both ways lead to the same result: I am getting values in the dumps, and these are the same every time I try it.

So far so good.

Then I wanted to write registers. But it seems the registers are read-only or something. No matter how I try it, the value in the registers remains unchanged.
Here's my code:

=================================================
case IOCTL_CM_REGISTER_WRITE:

TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Called IOCTL_CM_REGISTER_WRITE\n");

status = WdfRequestRetrieveInputBuffer(Request, 0, &inBuf, &bufSize);
if(!NT_SUCCESS(status)) {
status = STATUS_INSUFFICIENT_RESOURCES;
break;
}
ASSERT(bufSize == InputBufferLength);

TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "InputBuffer: %p , BufferSize: %08X", inBuf, bufSize);

regdata = *(PREGDATA)inBuf;

TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Regdata: Offset-> %08X Size->%08X", regdata.offset, sizeof(regdata.data));

TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Application wants to write \"%08X\" to offset \"%08X\"\n", regdata.data, regdata.offset);

// Writing to the buffer over-writes the input buffer content
//
memPointer = (PULONG)(devCtx->RegsBase + regdata.offset);
TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Address: %p\n", memPointer);
TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Data at %p before copy: 0x%08X", memPointer, *memPointer);

regdata.data = *(PULONG)(devCtx->RegsBase+regdata.offset);

regdata.data = regdata.data | ( 1 << 1);

//memcpy(memPointer, &regdata.data, sizeof(regdata.data));
WRITE_REGISTER_ULONG(memPointer, regdata.data);

TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Data at %p after copy: 0x%08X", memPointer, *memPointer);
regdata.data = 0;
regdata.offset = 0;
TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Data at %p after reset: 0x%08X", memPointer, *memPointer);

// Assign the length of the data copied to IoStatus.Information
// of the request and complete the request.
//
WdfRequestSetInformation(Request,
InputBufferLength < datalen ? InputBufferLength:datalen);

// When the request is completed the content of the SystemBuffer
// is copied to the User output buffer and the SystemBuffer is
// is freed.

break;

Again I tried two methods: direct writing and using WRITE_REGISTER_ULONG. The result was the same: no changes...

My question is now: Is my code correct and maybe I'm missing something related to my hardware?
Or is something wrong in the way I try to read and write registers?

mapping of registers:

// Map in the Registers Memory resource: BAR1
devCtx->RegsBase = (PUCHAR) MmMapIoSpace( regsBasePA,
regsLength,
MmNonCached );

devCtx->RegsLength = regsLength;

// Set separate pointer to PCI5020_REGS structure.
devCtx->Regs = (PREG5020) devCtx->RegsBase;

Thanks in advance,
Florian

xxxxx@eurokey.de wrote:

> I read the suggested articles and deepened my knowledge in Windows driver development.
>
> I came to the conclusion, that it’s the best way to start with the driver/application communication and began to implement IOCTLs.
> This went fine and I can call my own IOCTLs from a test application.
>
> The next step was to read and write registers.
>
> I implemented an IOCTL that takes a memory offset and returns the value:
>
> regdata.data = *(unsigned int*)(devCtx->RegsBase+regdata.offset);
> TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Driver read: \"%08X\" at offset \"%08X\"\n", regdata.data, regdata.offset);
>
> regdata.data = (unsigned int)READ_REGISTER_UCHAR(devCtx->RegsBase+regdata.offset);
> TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Driver read (READ_REGISTER_ULONG): \"%08X\" at offset \"%08X\"\n", regdata.data, regdata.offset); // Unexpected data

The first one is reading a 4-byte value. The second one is reading a
1-byte value and zero-extending it. Did you intend to use
READ_REGISTER_ULONG?
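
To make the width difference concrete, here is a small user-mode model, not driver code. It assumes a little-endian register window (as on PCI) backed by an ordinary byte array; model_read8 and model_read32 are hypothetical stand-ins for what READ_REGISTER_UCHAR and READ_REGISTER_ULONG would return for the same offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wide read: only the byte at `offset` is fetched, then zero-extended. */
static inline uint32_t model_read8(const volatile uint8_t *base, size_t offset)
{
    return (uint32_t)base[offset];
}

/* 32-bit read of a little-endian register. A real driver performs this as
 * ONE 32-bit bus access; the bytes are combined individually here only so
 * that the model gives the same answer on any host CPU. */
static inline uint32_t model_read32(const volatile uint8_t *base, size_t offset)
{
    return  (uint32_t)base[offset]
         | ((uint32_t)base[offset + 1] << 8)
         | ((uint32_t)base[offset + 2] << 16)
         | ((uint32_t)base[offset + 3] << 24);
}
```

For a register holding 0x12345678, the 32-bit read returns the whole value while the byte read returns only 0x78, so two trace lines that claim to show the same register can legitimately disagree.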

> Then I wanted to write registers. But it seems the registers are read-only or something. No matter how I try it, the value in the registers remains unchanged.
> Here’s my code:
>
> memPointer = (PULONG)(devCtx->RegsBase + regdata.offset);
> TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Address: %p\n", memPointer);
> TraceEvents(TRACE_LEVEL_VERBOSE, DBG_IOCTLS, "Data at %p before copy: 0x%08X", memPointer, *memPointer);
>
> regdata.data = *(PULONG)(devCtx->RegsBase+regdata.offset);
> regdata.data = regdata.data | ( 1 << 1);
>
> //memcpy(memPointer, &regdata.data, sizeof(regdata.data));
> WRITE_REGISTER_ULONG(memPointer, regdata.data);

You are not using the regdata.data value that the user provided. You
are throwing that away, then reading the old contents of the register,
setting bit 1, and writing it back out. It’s no surprise this sets the
same value every time. What are those two lines in the middle supposed
to be doing?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

=================================================
regdata.data = *(PULONG)(devCtx->RegsBase+regdata.offset);

regdata.data = regdata.data | ( 1 << 1);

//memcpy(memPointer, &regdata.data, sizeof(regdata.data));
WRITE_REGISTER_ULONG(memPointer, regdata.data);

This code was test code. I simply wanted to take the current value and set bit 1, to see if I can actually write to the register.
When I tried to write the value provided in regdata.data, nothing changed at all.

So no matter what I try to write to the registers, the content remains unchanged.
So my question is: Is there something wrong with my code, or do I have to look for the problem in the hardware itself?

xxxxx@eurokey.de wrote:

> =================================================
> regdata.data = *(PULONG)(devCtx->RegsBase+regdata.offset);
>
> regdata.data = regdata.data | ( 1 << 1);
>
> //memcpy(memPointer, &regdata.data, sizeof(regdata.data));
> WRITE_REGISTER_ULONG(memPointer, regdata.data);
>
> This code was test code. I simply wanted to take the current value and set bit 1, to see if I can actually write to the register.
> When I tried to write the value provided in regdata.data, nothing changed at all.
>
> So no matter what I try to write to the registers, the content remains unchanged.
> So my question is: Is there something wrong with my code, or do I have to look for the problem in the hardware itself?

Well, in that particular circumstance, you are reading from
RegsBase+regdata.offset and writing to memPointer. We don’t have any
assurance that the two addresses actually point to the same place. And
if they do, then why use two different expressions?
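
One way to rule that out is to compute the register address exactly once and use it for the read, the write and the read-back. The following is a user-mode sketch under stated assumptions, not the poster's driver code: the register window is modelled as a plain array, reg_offset_ok and reg_set_bits are hypothetical helpers, and a real driver would use READ_REGISTER_ULONG and WRITE_REGISTER_ULONG on the mapped address instead of plain dereferences. The offset check is an addition that the original IOCTL handler does not have.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a 32-bit access at `offset` is 4-byte aligned and lies
 * completely inside a register window of `window_len` bytes. */
static inline int reg_offset_ok(size_t offset, size_t window_len)
{
    return (offset % sizeof(uint32_t)) == 0 &&
           window_len >= sizeof(uint32_t) &&
           offset <= window_len - sizeof(uint32_t);
}

/* Read-modify-write through ONE pointer. Returns 0 on success and stores
 * the value read back afterwards in *readback; returns -1 if the offset
 * supplied by the caller is rejected. */
static inline int reg_set_bits(volatile uint32_t *window, size_t window_len,
                               size_t offset, uint32_t mask,
                               uint32_t *readback)
{
    volatile uint32_t *reg;

    if (!reg_offset_ok(offset, window_len))
        return -1;

    reg = window + offset / sizeof(uint32_t); /* the only address computation */
    *reg = *reg | mask;                        /* read old value, write new one */
    *readback = *reg;                          /* verify through the same pointer */
    return 0;
}
```

If the value read back still does not change on the real device, the code path is at least known to be consistent, and the register itself (read-only bits, write-enable sequences, wrong BAR) becomes the more likely suspect.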


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Hello, it’s me again.

Communication with our driver is now fine. We implemented IOCTLs for reading and writing registers, to reset the hardware etc.

We are now trying to implement complete tasks, such as uploading a firmware.
In the existing Linux driver this is achieved by mapping the PCI BAR0 of the device into application space and then the firmware is copied page-by-page into that memory.

I learned from many posts here that this is not a good idea to do in Windows.

So how would I transfer the firmware to the PCI memory of my device?

There are 256k of memory available, and the firmware is about 3 times this size. In the existing driver after each page, the hardware is notified and copies the chunk of firmware to some internal memory. After the transfer is complete, the card is restarted and loads the firmware.

What is common (and good) practice here?

Thanks in Advance,
Florian

Application issues write of 256K chunk to driver.
Driver pends the IO request.
Driver copies 256K to device.
Device notifies driver it is ready for next chunk.
Driver completes write.

Repeat as needed.
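
Seen from the application side, the loop above might look roughly like the following sketch. It is plain C with the actual I/O call abstracted behind a callback; in a real application that callback would issue one blocking write or device control per chunk on the driver handle and return once the driver has completed it. The names upload_in_chunks, chunk_sender, upload_log and log_chunk are hypothetical, and log_chunk is only a stand-in that records what it was given.

```c
#include <stddef.h>
#include <stdint.h>

/* Sends one chunk to the driver; returns 0 on success, non-zero on failure.
 * The call is expected to block until the driver completes the request. */
typedef int (*chunk_sender)(const uint8_t *chunk, size_t len, void *ctx);

/* Splits `image` into pieces of at most `chunk_len` bytes and sends them
 * one after the other. Returns 0 on success, -1 on the first failure. */
static inline int upload_in_chunks(const uint8_t *image, size_t image_len,
                                   size_t chunk_len, chunk_sender send,
                                   void *ctx)
{
    size_t done = 0;

    if (chunk_len == 0)
        return -1;

    while (done < image_len) {
        size_t n = image_len - done;
        if (n > chunk_len)
            n = chunk_len;              /* the last chunk may be shorter */
        if (send(image + done, n, ctx) != 0)
            return -1;
        done += n;
    }
    return 0;
}

/* Stand-in sender for demonstration: counts chunks and bytes. */
struct upload_log { size_t chunks; size_t bytes; size_t last_len; };

static inline int log_chunk(const uint8_t *chunk, size_t len, void *ctx)
{
    struct upload_log *log = (struct upload_log *)ctx;
    (void)chunk;
    log->chunks += 1;
    log->bytes += len;
    log->last_len = len;
    return 0;
}
```

Because each call only returns when the driver completes the request, the application needs no separate notification mechanism for "device is ready for the next chunk".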

Mark Roddy

On Fri, Jun 10, 2011 at 5:12 AM, wrote:
> Hello, it’s me again.
>
> Communication with our driver is now fine. We implemented IOCTLs for reading and writing registers, to reset the hardware etc.
>
> We are now trying to implement complete tasks, such as uploading a firmware.
> In the existing Linux driver this is achieved by mapping the PCI BAR0 of the device into application space and then the firmware is copied page-by-page into that memory.
>
> I learned from many posts here that this is not a good idea to do in Windows.
>
> So how would I transfer the firmware to the PCI memory of my device?
>
> There are 256k of memory available, and the firmware is about 3 times this size. In the existing driver after each page, the hardware is notified and copies the chunk of firmware to some internal memory. After the transfer is complete, the card is restarted and loads the firmware.
>
> What is common (and good) practice here?
>
> Thanks in Advance,
> Florian

Uhm well…

The sequence of events is not the problem.
The question is, with which technique the data should be transferred. Transfer the data with IOCTLs and buffers or…

I am (as one can see in my previous posts) not a pro in driver development.
So I am asking for a mechanism to complete this task.

Thanks
Florian

> The question is, with which technique the data should be transferred. Transfer the data with IOCTLs and buffers

Yes, this is best.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

xxxxx@eurokey.de wrote:

> The sequence of events is not the problem.
> The question is, with which technique the data should be transferred. Transfer the data with IOCTLs and buffers or…

One METHOD_OUT_DIRECT ioctl should do it. You could use WriteFile if
that makes more sense to you, but that seems a bit silly.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

“Tim Roberts” wrote in message news:xxxxx@ntdev…
> xxxxx@eurokey.de wrote:
>> The sequence of events is not the problem.
>> The question is, with which technique the data should be transferred.
>> Transfer the data with IOCTLs and buffers or…
>
> One METHOD_OUT_DIRECT ioctl should do it. You could use WriteFile if
> that makes more sense to you, but that seems a bit silly.
>
> –
> Tim Roberts, xxxxx@probo.com
> Providenza & Boekelheide, Inc.

DeviceIoControl is very powerful and a real blessing… until a need arises
to port the solution to another OS.
People with a Linux background prefer reads & writes because they are told to
avoid ioctl.

/* Some concrete details for those who like them :
http://unix.stackexchange.com/questions/4711/what-is-the-difference-between-ioctl-unlocked-ioctl-and-compat-ioctl
*/

Linux folks may think that what is “silly” is the requirement to write a custom
kernel driver, instead of being able to just open and mmap a PCI BAR from usermode :)

Regards,
– pa