DMA from virtual memory to hardware

My PCI hardware generates an interrupt every ~10 ms. On each interrupt, the application has to write a buffer to the hardware through the device driver.
I thought I would allocate a virtually contiguous buffer in the application and send its address to the driver using an IOCTL request.
On each interrupt the driver will start a DMA transfer from this address to the hardware. On completion, the application and the driver will advance the address by a predefined amount.
The assumption is that a new buffer will always have been written before an interrupt arrives.
This way the application will not have to handle each interrupt.
But what if the virtual memory is paged out to disk (not in RAM)? Will DMA still work?
Is this a wise strategy?

Thanks.

That’s usually the way it is done. For example, you use WriteFile, with
your device specifying DO_DIRECT_IO in the device object flags. You will
get an MDL representing the virtual user buffer. You do your DMA transfer
from this buffer.

If you require multiple DMA transfers to satisfy the request, then yes, you
will advance the internal pointer by the size of the previous DMA transfer,
and start the next DMA transfer. When the entire buffer has been transferred,
an error occurs, or there is some other condition (such as EOF), you complete
the IRP appropriately.
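
As a rough illustration of the bookkeeping described above, here is a minimal sketch in portable C. It is not WDK code: `dma_request`, `next_chunk`, `on_dma_complete` and `count_transfers` are hypothetical names, and `max_dma` stands in for whatever per-transfer limit the hardware imposes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-request state; illustrative only, not a WDK type. */
typedef struct {
    size_t total_len;   /* bytes the application asked to write */
    size_t offset;      /* bytes already transferred by earlier DMA runs */
} dma_request;

/* Size of the next DMA transfer, or 0 when the whole request is done. */
static size_t next_chunk(const dma_request *r, size_t max_dma)
{
    size_t remaining = r->total_len - r->offset;
    return remaining < max_dma ? remaining : max_dma;
}

/* Called when one DMA transfer finishes: advance by what was moved. */
static void on_dma_complete(dma_request *r, size_t transferred)
{
    assert(transferred <= r->total_len - r->offset);
    r->offset += transferred;
}

/* Counts how many DMA transfers a request of total_len bytes needs. */
static size_t count_transfers(size_t total_len, size_t max_dma)
{
    dma_request r = { total_len, 0 };
    size_t n, transfers = 0;

    assert(max_dma > 0);
    while ((n = next_chunk(&r, max_dma)) != 0) {
        on_dma_complete(&r, n);   /* a real driver does this in its DPC */
        transfers++;
    }
    return transfers;             /* at this point the IRP would be completed */
}
```

With a 10000-byte request and an assumed limit of 4096 bytes per transfer, this takes three transfers (4096 + 4096 + 1808) before the request would be completed.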

During the DMA transfer, the virtual memory you are using is always locked
in RAM and cannot move (it cannot be paged out, or have its physical address
changed). When you call IoCompleteRequest, the MDL is released, and the pages
of the buffer become pageable again.

This is the simplest model. You can use DeviceIoControl or WriteFile as
appropriate, depending upon a number of design decisions (for example, if
your data is best expressed as a ‘packet’, with a specific format that must
be interpreted by the driver to make the device function, DeviceIoControl is
a good choice; if the data is merely a byte stream, WriteFile is the
preferred choice).
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@elta.co.il
Sent: Saturday, December 20, 2008 1:29 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] DMA from virtual memory to hardware


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer


This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

A wise strategy for an electronic engineer
is to find somebody software-minded to do the driver.
“It’s better together” (c)
Or… take a course.

Interaction between driver and app is better done with a queue
of pending requests - as in many replies you’ve already received.
The driver completes the request when its data is transferred,
the app recycles the request.
It’s possible to get away with one common cyclic buffer -
in the sense that nothing is impossible. But why, when
there’s a simpler and safer way?

The term “common” or “shared” buffer in WDK means locked,
non-pageable, often non-cached kernel memory shared between
the PC and the DMA master (your PCIe card).
This does not mean that the sharing should occur also between
kernel and user mode.

Have a good week, and Seasonal Greetings,
–PA

> From: Pavel A.
> A wise strategy for an electronic engineer
> is to find somebody software-minded to do the driver.
> “It’s better together” (c)
> Or… take a course.

Chip guys usually have no clue about board-level troubleshooting. Both board designers and chip designers usually have very little idea how Windows works. Many sw folks have trouble understanding flip-flops, D-triggers, gates, transistors, clocking, PLLs, data/clock settling…

Coming from an electrical engineering background, although it’s in the field of radar/microwave where digital circuits are less of a focus, I’m glad that I understand a little bit of both Windows and hardware, such that I can entertain both sw and hw guys in their own language. One big reason not to work for pure sw firms, where anyone with a CS degree could do much better :-)


Calvin Guan
Broadcom Corp.
Connecting Everything(r)


> My PCI hardware creates interrupt every ~10ms. Upon interrupt the application has to write a buffer to the hardware using the device driver.
> I thought to allocate a virtual continuous buffer in application and to send its address to the driver using IOCTL request.
> Upon interrupt the driver will start a DMA from this address to the hardware. Upon completion the application and driver will advance the address by a predefined number.

Do you actually read the responses that you get??? Look - you have already started two threads, and on both of them Joe and I are having tough discussions. One of them is “inverted call vs event”, and, although we disagree on the actual method of informing an app in your particular situation, we both completely agree that, no matter what this mechanism is, you DON’T need to share a buffer, because in the vast majority of cases this is a really bad idea. What do you think we are supposed to feel when we see that you still want to share a buffer???

As you have stated yourself, Windows is not an RTOS, so you cannot know in advance when the target consumer thread will run. Here we are speaking about an interrupt every ~10 ms, i.e. unless the thread that is running at a given moment yields the CPU voluntarily, your interrupt will fire 3 times before the CPU gets taken away from this thread. You don’t know the number of active threads in the system, and you don’t know how their priorities fare against that of your thread either. Therefore, it may very well happen that the buffer simply gets exhausted, i.e. completely filled with data, while the consumer thread is still in the ready state. What are you going to do then???

Anton Bassov

Output is easier to manage than input.

Nothing fancy is required, certainly not something of this complexity.

Using WriteFile or DeviceIoControl, use Direct I/O mode. When the transfer
for the current IRP finishes, complete that IRP and start the next one from
the queue of pending IRPs.

Note that starting the next IRP from the queue is a very fast operation,
probably, realistically, < 10us.

Pre-queue a bunch of messages. Note that the space required for buffers is
no greater than the space required by your proposed solution (in fact, you
can arrange it so that you don’t use any more space in either design).

You are hung up on the notion that the application has to process an
interrupt. This is a completely bogus line of reasoning. The application
doesn’t even CARE about the concept of interrupt, nor should it.

A simple FIFO design (no more complex than the old
IoStartPacket/IoStartNextPacket model) is sufficient. You would want to
use KMDF and cancel-safe queues, but even so the base complexity is about an
order of magnitude smaller than that of the idea you are proposing. Small and
simple beats baroque and grotesque every time.
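
To make the FIFO model concrete, here is a small user-space model in portable C. It is only a sketch under stated assumptions, not driver code: `request` stands in for a pending IRP, `fifo_submit` for the application queuing a write, and `fifo_on_interrupt` for the driver's reaction to the device interrupt (complete what was in flight, start the next). All names and the queue depth are hypothetical, and a real driver would get queuing, locking and cancellation from KMDF queues or cancel-safe queues rather than from a hand-rolled ring.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 8   /* assumed number of pre-queued write requests */

/* Hypothetical stand-in for a pending write request (an IRP in a driver). */
typedef struct {
    const void *data;
    size_t      len;
    int         completed;
} request;

typedef struct {
    request *slots[QUEUE_DEPTH];
    size_t   head, count;
    request *current;      /* request whose DMA is in flight, or NULL */
    unsigned underruns;    /* interrupts that found nothing queued */
} fifo;

/* Application side: queue a request; returns 0 if the queue is full. */
static int fifo_submit(fifo *q, request *r)
{
    assert(q != NULL && r != NULL);
    if (q->count == QUEUE_DEPTH)
        return 0;
    r->completed = 0;
    q->slots[(q->head + q->count) % QUEUE_DEPTH] = r;
    q->count++;
    return 1;
}

/* Driver side, once per device interrupt: complete the request that was in
 * flight, then start the next one from the queue if there is one. */
static void fifo_on_interrupt(fifo *q)
{
    if (q->current != NULL) {
        q->current->completed = 1;   /* IoCompleteRequest in a real driver */
        q->current = NULL;
    }
    if (q->count == 0) {
        q->underruns++;              /* the app did not keep up */
        return;
    }
    q->current = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    /* A real driver would program the DMA engine for q->current here. */
}
```

The application never sees an interrupt in this model. It only keeps the queue topped up and recycles requests as they complete, and the memory tied up is just the buffers of the queued requests.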

The app still has to generate data fast enough, but the simple FIFO queue
model does it in a manner consistent with the philosophy of Windows I/O, and
the solutions of using a large buffer as you describe result in NO
performance improvement in user-space/kernel-space interaction, and just
create a bizarre structure that will be hard to program and maintain.

Keep the solution simple.

As I already pointed out, the buffer will be locked down (in either case) so
the memory will never be “allocated [on] disk”, that is, paged out. The
total buffer consumption is essentially identical in your model and the
simple FIFO queue model.
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Pavel A.
Sent: Saturday, December 20, 2008 3:05 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] DMA from virtual memory to hardware


I’ve often said one of my strengths is I can talk from silicon to system.
Hardware types build “test cases” for their hardware on dedicated chips on
their development machine (often MS-DOS), convince themselves they have a
clue as to what is going on, and get unhappy when real OS programmers
(Windows, linux, Unix, Mac OS X, Solaris…) try to explain why their
hardware is completely unprogrammable on any real system…
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Calvin Guan
Sent: Saturday, December 20, 2008 8:58 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] DMA from virtual memory to hardware


> I thought to allocate a virtual continuous buffer in application and to send its address to the driver using IOCTL request.

Use IoGetDmaAdapter to get the DMA adapter object for the device; do this in IRP_MN_START_DEVICE.

Use METHOD_IN_DIRECT in your IOCTL code; in this case the IO manager itself will lock the pages of the app-provided buffer and describe them with the MDL in Irp->MdlAddress.
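
For reference, the transfer method is encoded in the IOCTL code itself. The sketch below shows how such a code is composed with the CTL_CODE macro. On Windows the macro and constants come from the system headers; they are repeated here only so the fragment compiles anywhere. The IOCTL name, the function number 0x800 and the helper functions are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Normally provided by <winioctl.h> or the WDK headers. */
#ifndef CTL_CODE
#define CTL_CODE(DeviceType, Function, Method, Access) \
    (((DeviceType) << 16) | ((Access) << 14) | ((Function) << 2) | (Method))
#define METHOD_BUFFERED     0
#define METHOD_IN_DIRECT    1
#define METHOD_OUT_DIRECT   2
#define METHOD_NEITHER      3
#define FILE_DEVICE_UNKNOWN 0x00000022
#define FILE_WRITE_ACCESS   0x0002
#endif

/* Hypothetical IOCTL for "write this buffer to the device by DMA". */
#define IOCTL_MYDEV_WRITE_DMA \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_IN_DIRECT, FILE_WRITE_ACCESS)

/* The low two bits of an IOCTL code carry the transfer method. */
static unsigned ioctl_method(uint32_t code)
{
    return code & 3u;
}

/* Builds a direct-I/O write IOCTL for a vendor-defined function number. */
static uint32_t make_direct_write_ioctl(uint32_t function)
{
    assert(function >= 0x800 && function <= 0xFFF);  /* vendor range */
    return CTL_CODE(FILE_DEVICE_UNKNOWN, function,
                    METHOD_IN_DIRECT, FILE_WRITE_ACCESS);
}
```

The application would pass such a code to DeviceIoControl, and the driver would find the locked pages described by Irp->MdlAddress when the request arrives.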

Then pass this MDL to either:

  • ->AllocateAdapterChannel and ->MapTransfer, called in a loop to cover the whole MDL, or
  • ->GetScatterGatherList

You will get a scatter-gather list, feed it to the hardware and tell it to start.

If your hardware can walk the scatter-gather list in main RAM by itself, DMAing the list to itself and then DMAing the data (the so-called “chain DMA”), then you also need to allocate a common buffer in IRP_MN_START_DEVICE and place the descriptor chains there.
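
To show what a scatter-gather list looks like for a virtually contiguous buffer whose pages are scattered in physical memory, here is a portable C model. It is not the WDK API: in a driver the list comes from the DMA adapter routines named above. The `sg_element` type, `build_sg_list`, and the 4096-byte page size are assumptions made for illustration. The array of page frame numbers plays the role of the page list an MDL carries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumed page size */

/* One physically contiguous run, as a DMA engine would consume it. */
typedef struct {
    uint64_t phys_addr;
    uint32_t length;
} sg_element;

/* Builds a scatter-gather list for `length` bytes that start `byte_offset`
 * bytes into the first page. `pfns` holds the physical page frame number of
 * each locked page. Physically adjacent pages are merged into one element.
 * Returns the number of elements written, or 0 if the list does not fit in
 * `max_elems` or the pages do not cover `length` bytes. */
static size_t build_sg_list(const uint64_t *pfns, size_t page_count,
                            uint32_t byte_offset, size_t length,
                            sg_element *out, size_t max_elems)
{
    size_t n = 0;

    assert(byte_offset < PAGE_SIZE_BYTES);
    for (size_t i = 0; i < page_count && length > 0; i++) {
        uint32_t off   = (i == 0) ? byte_offset : 0;
        size_t   chunk = PAGE_SIZE_BYTES - off;
        uint64_t addr  = pfns[i] * PAGE_SIZE_BYTES + off;

        if (chunk > length)
            chunk = length;

        if (n > 0 && out[n - 1].phys_addr + out[n - 1].length == addr) {
            out[n - 1].length += (uint32_t)chunk;   /* extend previous run */
        } else {
            if (n == max_elems)
                return 0;
            out[n].phys_addr = addr;
            out[n].length    = (uint32_t)chunk;
            n++;
        }
        length -= chunk;
    }
    return length == 0 ? n : 0;
}
```

For a buffer spanning page frames 100, 101 and 300, the first two pages collapse into one element and the third becomes a second element. The hardware is then given two (address, length) pairs instead of one virtual address.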


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com