PCIe DMA driver with possible memory leak?

Hi

I have a PCIe DMA BusMaster driver which I got from the Xilinx XAPP1052 application note. It’s an older WDM driver used for DMA transfers to and from an FPGA (asynchronous / overlapped mode). I’m only transferring data from the FPGA to the PC. The DMA TX buffer on the PC is allocated by the application and a pointer to it is sent to the driver.

The problem is this:
If I do 1000 DMA transfers (2 kB each), kernel memory also increases by 1000 kB according to the Performance tab in Task Manager, i.e. about 1 kB per transfer. If my application continues at this rate, Windows XP runs out of system resources within a few minutes.
If I exit the application before it runs out of system resources, the application still remains in the task list.

I believe this is caused by a memory leak in the driver. I have tried another application that uses the same driver, and it does the same.

I have checked with WinDbg that the driver behaves as expected (as far as I could see from the KdDebug messages) and I was not able to detect any pending IRPs after the application was closed.

The problem only occurs if at least one DMA transfer is done. If I break my application before it starts the DMA BusMaster transfer, but after setting up the buffer address in the FPGA, there is no leak and the application can be unloaded without problems.

Should a DMA BusMaster transfer allocate any memory in the driver during a transfer, and if so, how do I free this memory again before issuing the next transfer?
Aren’t the scatter/gather lists freed when PutScatterGatherList is called?

Any ideas on how to find this leak are appreciated.

Thanks in advance

Best regards.
Brian Rasmussen

Step one is to enable Driver Verifier for this driver; this should
help identify whether the driver is leaking.

Don Burn
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr


Do you return STATUS_PENDING from your dispatch routines? Make sure to properly call IoMarkIrpPending when appropriate.

Hi guys

I have already tried Driver Verifier, but the only counters that increase are the global counters for “Synchronized Runs”, “IRQL-raises”, “Spinlocks” and “Trims” (sorry for my translations, but my Windows XP is in Danish).
Do you have any hints on how to get more information out of Driver Verifier?

I have also verified that I return STATUS_PENDING from the Control routine when I issue an IOCTL_DMA_START and from the Dispatch routine (at IRP_MJ_WRITE). I call IoMarkIrpPending in the Dispatch routine too.

BR
Brian

Have your application issue a single write request only. Set a breakpoint in the write dispatch routine and issue the !thread command to note the address of the current thread object. After the request has completed, issue

!thread

again, with that thread address as its argument, and see whether its IRP list still shows any pending IRP.

Hi Alex

I managed to get the breakpoint set in the Dispatch routine at IRP_MJ_WRITE. The first time I issue “!thread” there is only one IRP in the list:

IRP List:
89f88f00: (0006,0100) Flags: 40000a00 Mdl: 88c87a58

The DMA transmission does not start until I call StartDMA in the driver’s Control routine. The !thread output there shows two IRPs:

IRP List:
89f1ef00: (0006,0100) Flags: 40000030 Mdl: 00000000
89f88f00: (0006,0100) Flags: 40000a00 Mdl: 88c87a58

DMA transmission is then done and in the DpcForIsr routine there is no IRP list in the !thread output.

After issuing a new write request, the !thread output in the Dispatch routine at IRP_MJ_WRITE is:

IRP List:
8b3caf00: (0006,0100) Flags: 40000a00 Mdl: 88c87a58
89f1ef00: (0006,0100) Flags: 40000030 Mdl: 00000000

In the Control routine at StartDMA the !thread output is:

IRP List:
8a444f00: (0006,0100) Flags: 40000030 Mdl: 00000000
8b3caf00: (0006,0100) Flags: 40000a00 Mdl: 88c87a58
89f1ef00: (0006,0100) Flags: 40000030 Mdl: 00000000

In DpcForIsr the !thread output is:

IRP List:
8b298e00: (0006,01fc) Flags: 40000000 Mdl: 00000000

Another write request will show another IRP in the Dispatch routine:

IRP List:
89f7ef00: (0006,0100) Flags: 40000a00 Mdl: 88c87a58
8a444f00: (0006,0100) Flags: 40000030 Mdl: 00000000
89f1ef00: (0006,0100) Flags: 40000030 Mdl: 00000000

etc.

So the problem is definitely that the IRP list grows. I will try to figure out why the IRPs aren’t removed properly. Any hints would be a great help.

Thanks for your help so far!

BR
Brian

Are you building your own IRPs?

Hi Alex

No, not as far as I know (I’m new to driver development, so I might have missed something).

I have two places in the driver where it calls IoMarkIrpPending. The first is in the Dispatch routine when it receives an IRP_MJ_WRITE. The second is in the Control routine of the driver, where it receives an IOCTL code from a DeviceIoControl call. This sets a register in the FPGA and the DMA transmission starts. For this IOCTL the driver also calls IoMarkIrpPending and returns STATUS_PENDING.

I expect the first IRP to be completed in the DpcForIsr routine, but I have no idea where the last one would get cleared. I have tried to remove the last one and return SUCCESS on the IOCTL call, but I’m not sure if it is the right solution. It seems to be working, except for some timing issues I haven’t seen before…

BR
Brian

You understand that regardless of what you return from your dispatch routine, you have to make sure to call IoCompleteRequest() on the IRP itself too, right?
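
As a rough illustration of that rule, here is a small user-mode model (not kernel code) of the pattern described earlier in the thread: each transfer pends one write request and one start-DMA IOCTL request, and only the write is completed. `Request`, `dispatch_pend`, `complete_request` and `simulate_transfers` are hypothetical names; they stand in for an IRP, a dispatch routine that calls IoMarkIrpPending and returns STATUS_PENDING, and a later call to IoCompleteRequest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an IRP; it only tracks the pending state. */
typedef struct Request {
    bool pending;    /* set when the dispatch routine pends the request */
    bool completed;  /* set when the request is completed */
} Request;

/* Models a dispatch routine that marks the request pending and returns
 * the equivalent of STATUS_PENDING. */
void dispatch_pend(Request *r)
{
    assert(!r->pending && !r->completed);
    r->pending = true;
}

/* Models IoCompleteRequest: must run exactly once per pended request. */
void complete_request(Request *r)
{
    assert(r->pending && !r->completed);
    r->pending = false;
    r->completed = true;
}

/* Counts requests that were pended but never completed. */
size_t count_outstanding(const Request *reqs, size_t n)
{
    size_t outstanding = 0;
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].pending && !reqs[i].completed)
            outstanding++;
    }
    return outstanding;
}

/* Simulates `transfers` cycles. Each cycle pends a write request and an
 * IOCTL request; the IOCTL is completed only if `complete_ioctl` is true.
 * Returns how many requests were left outstanding in total. */
size_t simulate_transfers(size_t transfers, bool complete_ioctl)
{
    size_t leaked = 0;
    for (size_t i = 0; i < transfers; i++) {
        Request pair[2] = { { false, false }, { false, false } };
        dispatch_pend(&pair[0]);       /* write request */
        dispatch_pend(&pair[1]);       /* start-DMA IOCTL request */
        complete_request(&pair[0]);    /* completed from the DPC */
        if (complete_ioctl)
            complete_request(&pair[1]);
        leaked += count_outstanding(pair, 2);
    }
    return leaked;
}
```

In this model, `simulate_transfers(1000, false)` leaves one request outstanding per transfer, which is the same shape as the growing IRP list in the !thread output above. Whether the real driver leaks for exactly this reason is an assumption the model cannot confirm.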

-p

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@amfitech.dk
Sent: Thursday, October 13, 2011 11:48 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] PCIe DMA driver with possible memory leak?



NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Hi Peter

Yes, I know I have to call IoCompleteRequest for each IRP. I’m just a bit confused about how I get back to a specific IRP again.

If I receive an IRP_MJ_WRITE in the dispatch routine and mark the IRP as pending, how do I get back to that IRP later so it can be completed? Is that done in the DpcForIsr routine in the driver? Can it be done elsewhere? Take, for example, the pending IRP from the Control routine that I mentioned above: how can I get back to it and complete it?

BR
Brian

I just wanted to make sure.

There’s no magic for “getting back to the IRP”. You need to make sure you have a pointer to it stored somewhere. It could be in a KDEVICE_QUEUE if you’re using those for serialization. It could be in a completion routine when you send the IRP to a lower driver. It could be stored in your device extension: either a single pointer to the one request of that type that you process at a time (presumably you would also have a queue of the pending requests), or some sort of list if you’re processing N requests of that type at a time.

If your device can only do one thing at a time then you can only really run one request at a time. So you store the pointer in your device extension (and queue or reject the rest).
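
A minimal user-mode sketch of the single-request case, assuming a hypothetical `DeviceExtension` with one `current` slot. `Request`, `start_request` and `complete_current` are illustrative names, not routines from the XAPP1052 driver or the WDK. A real driver would also have to synchronize access to the slot between the dispatch routine and the DPC and handle cancellation, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an IRP. */
typedef struct Request {
    int  id;
    bool completed;
} Request;

/* Hypothetical device extension with a single in-flight slot. */
typedef struct DeviceExtension {
    Request *current;  /* NULL while the device is idle */
} DeviceExtension;

/* Dispatch side: remember the request so it can be found again later.
 * Returns true if the request was accepted (and would be pended),
 * false if the device is busy and the caller must queue or reject it. */
bool start_request(DeviceExtension *ext, Request *r)
{
    assert(ext != NULL && r != NULL);
    if (ext->current != NULL)
        return false;
    ext->current = r;
    return true;
}

/* DPC side: fetch the stored pointer, clear the slot, complete the
 * request. Returns the completed request, or NULL if nothing was in
 * flight (for example, a spurious interrupt). */
Request *complete_current(DeviceExtension *ext)
{
    assert(ext != NULL);
    Request *r = ext->current;
    if (r == NULL)
        return NULL;
    ext->current = NULL;
    r->completed = true;
    return r;
}
```

The slot is cleared before the request is marked complete so that a new request can be accepted as soon as the old one is finished.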

If your device can do N things at a time then it probably has some sort of control packet that you write to it. And that probably contains something you can use as a unique identifier on completion to figure out what just completed. If the something is a 64-bit tag value you set, then you could store the IRP pointer there. If it’s a smaller tag, or a unique value you don’t control (like the physical address of the original control packet in your common buffer) then you need to manage a list yourself and do a lookup.
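
For the small-tag case, the lookup could look roughly like the user-mode sketch below. It assumes the device reports a small integer tag on completion; `PendingTable`, `pending_insert`, `pending_take` and `MAX_IN_FLIGHT` are hypothetical names chosen for illustration, and a real driver would need to protect the table with a lock.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IN_FLIGHT 8  /* assumed limit on concurrent requests */

/* Hypothetical stand-in for an IRP. */
typedef struct Request {
    int id;
} Request;

/* Table mapping a small tag (the slot index) to the pending request. */
typedef struct PendingTable {
    Request *slots[MAX_IN_FLIGHT];
} PendingTable;

/* Stores the request in the first free slot and returns its tag,
 * or -1 if MAX_IN_FLIGHT requests are already outstanding. */
int pending_insert(PendingTable *t, Request *r)
{
    assert(t != NULL && r != NULL);
    for (int tag = 0; tag < MAX_IN_FLIGHT; tag++) {
        if (t->slots[tag] == NULL) {
            t->slots[tag] = r;
            return tag;
        }
    }
    return -1;
}

/* Looks up the request for a tag reported by the device and removes it
 * from the table. Returns NULL for an unknown or already-taken tag. */
Request *pending_take(PendingTable *t, int tag)
{
    assert(t != NULL);
    if (tag < 0 || tag >= MAX_IN_FLIGHT)
        return NULL;
    Request *r = t->slots[tag];
    t->slots[tag] = NULL;
    return r;
}
```

Taking the entry out of the table in the same step as the lookup means a duplicate completion for the same tag finds nothing and cannot complete the request twice.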

If you’re completing all the outstanding IRPs yourself then you need to look at whether your application is picking up all those completions. If it’s not then you may have a pending problem rather than an IRP completion problem.

If you use KMDF for this, the management of I/O flow and pending is handled for you.

-p




Hi Peter

As far as I can see in the code, there is only one handle to an active (pending?) IRP in the device context. There is also a write queue there, where IRPs can be queued, but it is not used. I’m only processing one pending IRP at a time.

I have now made a “hack” in the driver so the application can issue an IOCTL to get the state of the current DMA transmission before it issues another one. I’m quite sure this is not the right way to do it, but it works for now at the required speed. The project is behind schedule, so I might have to dig into this another time… I think I would prefer a blocking IOCTL call, which would make the application completely independent of the driver state.

Thanks for your help.

BR
Brian

Does the driver call IoStartPacket() or KeInsertDeviceQueue() anywhere? Both of those are routines commonly used to queue requests from the dispatch routines.
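
For readers new to device queues, the busy/not-busy contract of KeInsertDeviceQueue and KeRemoveDeviceQueue can be modelled in user mode roughly as follows. This is a sketch of the documented behaviour, not the kernel implementation: `DeviceQueue`, `queue_insert` and `queue_remove` are hypothetical names, and the fixed-size ring buffer is a simplification (the real queue is a linked list protected by a spin lock).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16  /* simplification; the real queue is a list */

/* Hypothetical stand-in for an IRP. */
typedef struct Request {
    int id;
} Request;

typedef struct DeviceQueue {
    Request *items[QUEUE_CAPACITY];
    size_t   head;   /* index of the oldest queued request */
    size_t   count;  /* number of queued requests */
    bool     busy;   /* true while a request is being processed */
} DeviceQueue;

/* Models the insert contract: if the queue is not busy, the request is
 * NOT queued, the queue becomes busy, and false is returned so the
 * caller starts the request immediately. If the queue is busy, the
 * request is queued and true is returned. */
bool queue_insert(DeviceQueue *q, Request *r)
{
    assert(q != NULL && r != NULL);
    if (!q->busy) {
        q->busy = true;
        return false;
    }
    assert(q->count < QUEUE_CAPACITY);
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = r;
    q->count++;
    return true;
}

/* Models the remove contract: returns the next queued request, or NULL
 * if the queue is empty, in which case the queue is marked not busy. */
Request *queue_remove(DeviceQueue *q)
{
    assert(q != NULL);
    if (q->count == 0) {
        q->busy = false;
        return NULL;
    }
    Request *r = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return r;
}
```

In this model the dispatch routine would call `queue_insert` and start the hardware itself when it returns false, while the DPC would call `queue_remove` after completing a request and start the next one if the result is not NULL.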

-p


