To whom it may help!
I was facing random crashes, mostly BAD_POOL_HEADER but also other bug checks, on one of our 32-bit test machines. 64-bit Windows was fine. Note that the kernel driver uses only nonpaged memory.
The crashes were not really reproducible but occurred randomly only when my driver was loaded.
* My first assumption was a data type overrun somewhere. Nope, couldn't find anything.
* Crash dump analysis didn't really help, as it always pointed somewhere inside the Windows OS, never at our code.
* Double-checking every ExAllocatePoolWithTag() and ExFreePoolWithTag() call turned up nothing.
* Driver Verifier's special pool didn't help either: no hints, everything worked fine (even with low resources simulation).
After reading a lot, these pages from ntdev gave me some clues:
I could finally narrow the issue down by making the crash reproducible:
* Use 32-bit Windows 10 in a VM with only 384 MB of memory.
* Create a restart loop: use auto login, let it run for a minute, then auto restart. Bingo: after 1 to 4 restarts it always crashed.
Note: It never crashed in six months on my 64-bit Windows production machine with 32 GB, as it has enough memory! Make memory scarce and it will crash eventually!
Now I could narrow the issue down by disabling all code paths, re-enabling them one by one, and letting each run in the VM. After a few cycles I eventually found it: we modified a USB descriptor even when the request was not for the actual descriptor but only for the memory size of that descriptor. In that case we accidentally modified > only one byte < at a 16-byte offset, outside the valid request buffer.
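For anyone who wants to see the pattern, here is a minimal user-mode sketch of this kind of bug and its guard. It is not our driver code: the function name patch_descriptor, the PATCH_OFFSET constant and the reading of "16 bytes offset" as byte index 16 from the start of the buffer are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: index of the descriptor byte the driver wants to change. */
#define PATCH_OFFSET 16u

/*
 * Patch one byte of a returned descriptor, but only when the request
 * actually transferred that byte.
 * Returns 1 if the byte was patched, 0 if the request was left untouched.
 */
int patch_descriptor(uint8_t *buf, size_t transferred, uint8_t value)
{
    if (buf == NULL || transferred <= PATCH_OFFSET) {
        /* Size query or short read: byte 16 lies outside the buffer,
         * so writing it would corrupt whatever is stored next to it. */
        return 0;
    }
    buf[PATCH_OFFSET] = value;
    return 1;
}
```

Without the length check, a short size-only request (for example a few header bytes) still gets byte 16 written, which lands in neighbouring pool memory. In a real driver the length would come from the transfer length reported for the completed request, not from the size the full descriptor is expected to have.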
After the fix the crashes were gone.
Hope this helps somebody out there.