SPECIAL_POOL_DETECTED_MEMORY_CORRUPTION

Hi,
I’m working on a driver for a webcam. I started getting memory corruption issues and so enabled special pool in Driver Verifier. With special pool enabled, I started getting BSOD 0xC1, sometimes with Parameter4 = 0x23 and sometimes with 0x24. The system bug checks only after several thousand transactions, so it is not easily predictable.

There are two linked lists which I maintain in my driver: one for the pending IRPs (my own _IRPCONTEXT struct) and the other for pending URBs (my own _URBCONTEXT struct). Both of these are allocated from NonPagedPool. The _IRPCONTEXT struct always has a fixed size (16 bytes). The _URBCONTEXT varies in size depending on the type of transfer (control, bulk or ISO). In one of the resolutions of the webcam, the transfer buffer length of an ISO transfer is as large as 630k bytes!! Generally there are at least 6 to 8 ISO requests pending on the device.

I see that the verifier wraps my small allocations (_IRPCONTEXT) with non-accessible pages and also fills them with a pattern. However, memory is allocated from large pages for my _URBCONTEXT, whose length is more than a page.

My understanding is that the verifier does NOT wrap large pages with inaccessible pages. Is this understanding correct?

Most of the memory corruption happens in my _IRPCONTEXT and a few cases in the _URBCONTEXT. The verifier bug checks when I try to delete the context in the completion routine. By this time, the memory has already been corrupted. The observation is that there is always a memory corruption of 32 bytes or 64 bytes. I have taken a look at these values (in windbg). These values do not match any of my strings/structs. Also, this corruption ALWAYS happens within the “valid” page boundary, never in the inaccessible pages. The actual data in these pages remains valid. The corruption happens somewhere outside my valid data boundary, but within the same page. I have tried the two options, “verify start” and “verify end” (in GFlags), and the results are still the same.

So is there any other tool/technique by which I can detect who is corrupting the memory?
Any comments/hints are welcome.

Thanks,
Sri

Indeed, in the current Windows versions, Special Pool doesn’t attempt to guard large pool allocations (with size > approximately 1 page) using a non-accessible page, the way it does for smaller pool blocks.

Bugcheck 0xC1 subcode 0x24 is typically a “simple” buffer overrun. A typical way to start debugging it:

1: kd> !analyze -v
SPECIAL_POOL_DETECTED_MEMORY_CORRUPTION (c1)
Special pool has detected memory corruption. Typically the current thread’s
stack backtrace will reveal the guilty party.
Arguments:
Arg1: abf08fd0, address trying to free
Arg2: abf08ff9, address where bits are corrupted
Arg3: 00e50029, (reserved)
Arg4: 00000024, caller is freeing an address where bytes after the end of the allocation have been overwritten

1: kd> !pool abf08fd0
Pool page abf08fd0 region is Special pool
*abf08fd0 size: 29 non-paged special pool, Tag is Tst1

1: kd> db abf08fd0 l29
abf08fd0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …
abf08fe0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …
abf08ff0 00 00 00 00 00 00 00 00-00 …

1: kd> db
abf08ff9 00 e5 e5 e5 e5 e5 e5 ??-?? ?? ?? ?? ?? ?? ?? ?? …??? <<<<<<<<<<<- 1 byte buffer overrun
abf09009 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???

Note from this output that Verifier/Special Pool detected that my driver zeroed out byte 0x2a of my allocation, although I allocated just 0x29 bytes. Perhaps you can find a similar buffer overrun in your driver.
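To make the arithmetic in that dump concrete, here is a small user-mode sketch of the kind of tail check that produces subcode 0x24. It is only a simulation, not how Special Pool is actually implemented: the names sim_alloc_at_end and sim_check_tail are made up, and the 8-byte alignment and the 0xE5 fill byte are assumptions chosen because they match the addresses and bytes shown above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_SIM 4096u  /* assumed page size for this simulation */
#define FILL_PATTERN  0xE5u  /* fill byte seen in the dump above */

/* Place an allocation of `size` bytes as close to the end of `page` as
 * 8-byte alignment allows (an assumption), after filling the whole page
 * with the pattern. Returns a pointer to the simulated allocation. */
static uint8_t *sim_alloc_at_end(uint8_t *page, size_t size)
{
    size_t start = (PAGE_SIZE_SIM - size) & ~(size_t)7;
    memset(page, FILL_PATTERN, PAGE_SIZE_SIM);
    return page + start;
}

/* Return the offset, relative to the allocation, of the first damaged
 * byte after the end of the allocation, or -1 if the tail is intact. */
static long sim_check_tail(const uint8_t *page, const uint8_t *p, size_t size)
{
    size_t base = (size_t)(p - page);
    for (size_t i = base + size; i < PAGE_SIZE_SIM; i++) {
        if (page[i] != FILL_PATTERN)
            return (long)(i - base);
    }
    return -1;
}
```

With a 0x29-byte request the simulated block starts at page offset 0xfd0 and ends at 0xff8, leaving 7 pattern bytes before the page boundary. Zeroing 0x2a bytes instead of 0x29 damages the byte at offset 0x29, which is the single 00 in front of the e5 bytes in the db output.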

Bugcheck 0xC1 subcode 0x23 is usually harder to figure out. It typically means corruption *behind* the pool block that the driver is freeing. This probably means that:

  • Someone (let’s call him caller1) called ExAllocatePool (LargerSize) and got back a pool allocation p1, inside this virtual memory page.
  • Caller1 called ExFreePool (p1).
  • Caller2 called ExAllocatePool (SmallerSize) and got back a pool allocation p2, which happened to live inside the same virtual memory page.
  • Caller1 continued to use p1 after freeing it (caller1 wrote to the memory block that he had already freed). So caller1 is the culprit.
  • Caller2 now calls ExFreePool (p2). Special Pool detects unexpected memory contents behind p2.
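The sequence above can be replayed in user mode with a rough sketch like the following. It is a simulation under stated assumptions, not Special Pool's real code: the helper names, the 8-byte alignment, the 0xE5 fill byte, and the sizes 256, 16 and 32 are all hypothetical, and "behind p2" is taken here to mean the bytes in front of p2 within the same page.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE 4096u  /* assumed page size */
#define SIM_FILL 0xE5u  /* assumed fill pattern */

/* Reuse the page for a new allocation of `size` bytes placed at its end;
 * everything else in the page is reset to the fill pattern. */
static uint8_t *sim_alloc(uint8_t *page, size_t size)
{
    size_t start = (SIM_PAGE - size) & ~(size_t)7;
    memset(page, SIM_FILL, SIM_PAGE);
    return page + start;
}

/* Count the bytes in front of allocation `p` that no longer hold the
 * fill pattern. A nonzero result models the 0x23 condition. */
static size_t sim_damage_before(const uint8_t *page, const uint8_t *p)
{
    size_t damaged = 0;
    for (const uint8_t *q = page; q < p; q++) {
        if (*q != SIM_FILL)
            damaged++;
    }
    return damaged;
}

/* Replay the caller1/caller2 sequence and return the number of damaged
 * bytes that would be found when caller2 frees p2. */
static size_t sim_use_after_free(void)
{
    static uint8_t page[SIM_PAGE];
    uint8_t *p1 = sim_alloc(page, 256); /* caller1: larger block */
    /* caller1 "frees" p1 here but keeps the stale pointer */
    uint8_t *p2 = sim_alloc(page, 16);  /* caller2: smaller block, same page */
    memset(p1, 0xAB, 32);               /* caller1 writes through stale p1 */
    return sim_damage_before(page, p2); /* check made when p2 is freed */
}
```

In this run caller2's own 16 bytes stay intact and the 32 damaged bytes sit earlier in the same page, so the free of p2 is only where the damage gets noticed; the writer is whoever still holds p1.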

These are the bugcheck parameters:
0x23 : caller is freeing an address where nearby bytes within the same page have been corrupted
1 - address trying to free
2 - address where bits are corrupted
3 - (reserved)

The “dc” command will show the corruption pattern. Maybe you can recognize that pattern, and figure out that way who wrote to that memory.

If you use Vista or a newer OS, “!verifier 0x80” followed by an address will show stack traces for any recent ExFreePool calls for a pool block that included that address. You might notice who caller1 was from these stack traces.

Let me know if you need more help.

Thanks Dan for your input. Sorry for the delayed response.

Currently this is possible only on XP, so I can’t try it out on Vista even if it looks promising!

More inputs on this - First of all, this webcam is actually enumerated on a virtual USB bus. This is a usb-over-network kind of scenario. The webcam is plugged in to a pc (say pc1 -device is enumerated on MS USB bus). This will get redirected and enumerated on different pc (say, pc2 - device is enumerated on virtual USB bus).

The webcam has 3 interfaces: one for video streaming (interface 1, 1 different alt settings with varying ISO bandwidth requirements), one for audio streaming (interface 2 for the built-in microphone, with 2 alt settings) and interface 3 for the HID “snapshot button”.

If I capture ONLY video or audio, I do not face any problem. The camera runs for hours on end. The moment I try to record audio and video together (with a record button in the application), my physical side (pc1) crashes with bug check C1. Please note here that PC1 crashes and not PC2 (virtual side). When I try to record audio and video together, interface 1 will be active, streaming isochronous video data of 128 packets (the packet size depends on the resolution selected). Also, interface 3 will become active, streaming isochronous audio data of 10 packets (max packet size is 36 bytes, whereas the length is always 32 bytes on return from the host controller).

I have tried whatever you have suggested on XP, but could not find the culprit.

Are there any special considerations when there are 2 active interfaces with different packets?

Any comments/suggestions are welcome.

Thanks,
Sri

If you can’t try Vista or Windows 7 Beta, but you are willing to show me a memory.dmp file from this break, please send email to verifier @ microsoft.com or to my own microsoft.com address (dmihai). Maybe I can find something useful in the memory dump.