WDF Serial driver writes to user stack

I have purchased a PC104+ (PCI) HDLC card which comes with driver source on git-com. Unfortunately, the driver sometimes writes a ULONG with the value 06 to the user stack of the calling function while it is processing a WriteFile or ReadFile call.

I use an overlapped file handle, but the corrupting write occurs before the user program returns from WriteFile.

I have never developed a driver before. I have used VS2013 to run WinDbg through a null modem cable. The verbose WDF log shows nothing out of the ordinary that I can see; I can post more when I am back at work. I am also able to connect to the driver source and add a breakpoint.

I am using a partial checked build of Windows 7 x64 with a checked build of the driver.
It uses version 1.09 of WDF, built with the msbuild command-line interface.

The driver works through IOCTL calls to set up one of the two FSCC ports on the card, and then ReadFile to read the next HDLC frame from a circular buffer or WriteFile to put an HDLC frame in a transmit buffer.

My bug seems to have been present through two generations of the driver.

I can see the ULONG that is corrupted in the user program, but I don't know how to relate this to the driver or how to pinpoint it, and that is before I get anywhere near a strategy for the fix.

The manufacturers want to support us, but the original developer has left. There is not really an alternative on the market, as the card is actually a good one. We had used their previous version of the card, but it only runs on Windows XP. Any pointers would be great.

Another NT Insider article about WDM or WDF recommends Russinovich and Solomon.
Is this Windows Internals? Could you help with exactly which book would be best, as Amazon is not very clear?

Any suggestions or articles to check would be gratefully received, with many thanks.

Best Wishes Nick

Finding memory corruption can be tricky. What I would suggest is that you find a way to make the corruption reproducible. Is it always the same location that's corrupted, does it vary based on some factor, or is it random? I'm assuming that since you know corruption happens, you know how to find it, so it is not just some random memory location changing in your process. It also sometimes helps to see what conditions the corruption is sensitive to, so if the WriteFile parameters are currently stored in stack locals, allocate some heap blocks to store them instead. The goal is to figure out what the corruption is "connected" to. Once you can influence the corruption, it is sometimes possible to come up with tests that extract more details about it.

The goal is also to make a simple program that can reproduce the corruption, so if the normal program that does it is 800K lines of code, try to extract just enough into a tiny test program to stimulate the problem. Keep reducing the complexity of the problem until it is within the range of what can be observed and understood.

If you get a simple reproducible case, you should perhaps give that test to your device vendor and ask them to actually fix it. It is hard for vendors to fix things they can't reproduce, but if it can be easily reproduced then the vendor should be responsible for fixing it, unless perhaps your use case is very unusual. For bugs in rare use cases, the vendor has to weigh the cost of losing you as a customer against the cost of fixing the bug. Sometimes it's not cost-justified to fix a rarely stimulated bug. Vendors are sometimes expected to fix bugs that turn out to be bugs in customer code, so they are often unwilling to even look at an issue until you, the customer, can demonstrate the bug in a simple test case.

The WriteFile path in the driver may also not be that complex, so you could also walk through the code, in a text editor or live with the debugger.

You might also put asserts in the driver to validate that everything it touches is what it thinks it is. If you have buffers or structures, putting a unique signature on them helps.

You should also be running Driver Verifier with the knobs turned fairly far up (but without simulated memory allocation failures).

If you can get things to be repeatable, so that you know in advance the memory address that will be corrupted, you can set a hardware memory write breakpoint (an x86 processor feature) on that location with the debugger. If your device is doing DMA, and the corruption is coming from the hardware, a write breakpoint won't detect it. One thing that can sometimes help detect DMA corruption is to enable DMA checking in Driver Verifier. This causes DMA operations to be done against a temporary buffer with extra memory at both ends, which is checked after the transfer.

You might also try varying the environment, like 32-bit vs 64-bit or single vs multiple CPUs. For a DMA device, there are OS switches to force the OS to change its memory allocation strategy (like allocating all memory above the 4 GB line).

Jan

On Aug 30, 2014, at 10:13 AM, xxxxx@googlemail.com wrote:

> I have purchased a PC104+ (PCI) HDLC card which comes with driver source on git-com. Unfortunately, the driver sometimes writes a ULONG with the value 06 to the user stack of the calling function while it is processing a WriteFile or ReadFile call.
>
> I use an overlapped file handle, but the corrupting write occurs before the user program returns from WriteFile.

My gut instinct tells me this isn't the driver; the driver doesn't have direct access to the user-mode stack. Rather, it is typically a user-mode mistake where on-stack buffers (for the OVERLAPPED structure, I/O buffers, etc.) are used for async I/O when they should be allocated from the heap. While the corruption may occur before the I/O API returns, it could be the /previous/ I/O operation that is corrupting the stack, because the previous call unwound and its stack space was reused for the current API call.

d

Bent from my phone




>the verbose WDF log

Note that KMDF does not support completing the IRP inline without pending, unless you do this using EvtIoInCallerContext.

So, if the serial port driver writer does not care about EvtIoInCallerContext, then the pending status is always returned, which breaks some stuff like MSCOMM.OCX and probably your app too.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

>Finding memory corruption can be tricky

Verifier’s Special Pool helps a lot.


Maxim S. Shatskih

Hi, thank you for your help. I need to do some more tests, but I believe the problem was that I was not completing asynchronous ReadFile calls with CancelIo. The new card completed each pending ReadFile when I next executed a WriteFile, by which time my OVERLAPPED structure, which had been on the stack, was gone, and only a new one for the new WriteFile was there.
The previous card quietly forgot about ReadFile transactions if no HDLC frame was ready.