StorPort and ambakdrv.sys random BSOD

So my driver has been doing OK for some time, but a user discovered that it will often crash when loading/unloading if Aomei's Backupper is also installed.

I am assuming the problem is in my driver, and not in a commercially available driver like ambakdrv.sys.

If I do not call StorPortInitialize() in my driver, the rest of it works fine and can load and unload, even with ambakdrv.sys present. I can load and unload until I grow bored.

If I do manage to load the driver with StorPortInitialize() and ambakdrv.sys present without crashing, the full driver also works well.

So, with increasing debugging desperation, I have hacked away all my other code, so that DriverEntry() now only calls StorPortInitialize(). I intercept DriverUnload just so I can print that it is unloading, then call the original StorPort unload function.
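For reference, the stripped-down version now amounts to roughly the sketch below. This is only an outline, assuming a virtual miniport built against ntddk.h/storport.h like the Service Fabric sample; MyDriverUnload and g_StorPortUnload are placeholder names, and the real miniport entry points are filled in exactly as in the full driver.

#include <ntddk.h>
#include <storport.h>

static PDRIVER_UNLOAD g_StorPortUnload;   // unload routine that StorPort installs during init

// Wrapper so the unload can be logged before handing control back to StorPort.
static VOID MyDriverUnload(PDRIVER_OBJECT DriverObject)
{
    DbgPrint("miniport: DriverUnload\n");
    if (g_StorPortUnload != NULL) {
        g_StorPortUnload(DriverObject);
    }
}

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    VIRTUAL_HW_INITIALIZATION_DATA hwInit;
    ULONG status;

    RtlZeroMemory(&hwInit, sizeof(hwInit));
    hwInit.HwInitializationDataSize = sizeof(VIRTUAL_HW_INITIALIZATION_DATA);
    hwInit.AdapterInterfaceType     = Internal;   // virtual adapter
    // HwFindAdapter, HwInitialize, HwStartIo, HwResetBus, HwAdapterControl,
    // extension sizes, etc. are set here as in the full driver / SFBDVMiniport.c
    // (omitted to keep the sketch short).

    status = StorPortInitialize(DriverObject, RegistryPath,
                                (PHW_INITIALIZATION_DATA)&hwInit, NULL);
    if (!NT_SUCCESS((NTSTATUS)status)) {
        return (NTSTATUS)status;
    }

    // StorPort has installed its own DriverUnload by now; wrap it so the unload gets logged.
    g_StorPortUnload = DriverObject->DriverUnload;
    DriverObject->DriverUnload = MyDriverUnload;

    return STATUS_SUCCESS;
}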

At this point I grabbed a copy of SFBDVMiniport.c from Microsoft-Service-Fabric and call into that instead, and it can still BSOD. So perhaps it isn't the code; maybe I am not registering something else correctly. Are there important things in the .INF file, or somewhere similar, needed to make StorPort drivers work, or some other non-code piece of information? Does something have to go into the Registry, such that missing it will cause a crash?
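For context, my understanding is that the install side of a StorPort miniport mostly boils down to something like the INF fragment below. This is only a sketch with placeholder names (MyMiniport, mydriver.sys), not my actual INF, and it is exactly the kind of thing I am asking about.

[Version]
Signature   = "$WINDOWS NT$"
Class       = SCSIAdapter
ClassGuid   = {4d36e97b-e325-11ce-bfc1-08002be10318}
Provider    = %ManufacturerName%
CatalogFile = mydriver.cat

[MyMiniport_Service_Inst]
DisplayName    = %ServiceDesc%
ServiceType    = 1                ; SERVICE_KERNEL_DRIVER
StartType      = 3                ; SERVICE_DEMAND_START
ErrorControl   = 1                ; SERVICE_ERROR_NORMAL
ServiceBinary  = %12%\mydriver.sys
LoadOrderGroup = SCSI Miniport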

The BSODs themselves are generally indicative of stack or pool corruption, or overflow, and are often not related to my driver at all.

For example:

BSOD1

06 ffffbe8b`fc90e600 fffff807`38d20b8c     nt!KiRaiseSecurityCheckFailure+0x346
07 ffffbe8b`fc90e790 fffff807`3b602535     nt!RtlRemoveEntryHashTable+0x4c
08 ffffbe8b`fc90e7c0 fffff807`3b6013ce     NETIO!FreeSomeCacheBucketEntries+0x7d
09 ffffbe8b`fc90e810 fffff807`3b607218     NETIO!FreeSomeCacheEntries+0x3e
0a ffffbe8b`fc90e840 fffff807`3b60442b     NETIO!UpdateCacheLruBucket+0x828
0b ffffbe8b`fc90e910 fffff807`3b6118f7     NETIO!LruCleanupWorkItemRoutine+0x9b
0c ffffbe8b`fc90e940 fffff807`38ce0afe     NETIO!NetiopIoWorkItemRoutine+0x57
0d ffffbe8b`fc90e990 fffff807`38cf4b35     nt!IopProcessWorkItem+0x8e

BSOD2

: 00000000`00000000 ffffe308`6e5d8720 ffffbe8b`00000000 00000000`00000000 : nt!ExFreePoolWithTag+0x1e5
: ffffe308`6e5d8938 ffffe308`6e5d8050 ffffe308`6d325030 ffffe308`6bc1d550 : storport!RaidDeleteDeferredQueue+0x1b
: ffffe308`6e5d81a0 ffffe308`6e5d8050 ffffe308`6d325030 ffffe308`6bc1d550 : storport!RaidDeleteAdapter+0x12e
: ffffe308`6bc1d550 00000000`00000002 00000000`00000000 00000000`00010000 : storport!RaidAdapterRemoveDeviceIrp+0x8e
: ffffe308`61dfd6f0 fffff807`3a6361f5 ffffe308`61dfd710 ffffbe8b`fc5c4250 : storport!RaidAdapterPnpIrp+0x16100
: ffffe308`61dfd710 ffffbe8b`fc5c4250 ffffe308`61dfd710 fffff807`39766400 : storport!RaDriverPnpIrp+0x94
: ffffe308`6bc1d550 ffffe308`6e5d8050 ffffe308`6b7ddb40 00000000`00000000 : nt!DifIRP_MJ_PNPWrapper+0xdb
: 00000000`00000000 ffffe308`6b7ddca0 00000000`00000000 fffff807`37795180 : nt!IopfCallDriver+0x53
: ffffe308`6bc1d6f8 ffffe308`6bc1d550 ffffe308`6b7ddc98 00000000`00000000 : nt!IovCallDriver+0x5f
: ffffe308`6b7ddca0 ffffbe8b`fc5c4430 ffffbcd2`2b6bddf7 fffff807`38d0ca21 : nt!IofCallDriver+0x2272df
: 00000000`00000000 ffffe308`6b7dd9f0 ffffe308`6e5e0040 ffffe308`6aed4180 : ambakdrv+0x16da
: ffffe308`6bc1d740 fffff807`3bbfd73e ffffe308`6bc1d550 ffffe308`6b7dd9f0 : ambakdrv+0x15b4
: ffffe308`6bc1d550 fffff807`3bbfd73e ffffe308`6b7dd9f0 ffffe308`6e5e0190 : ambakdrv+0x1478
: 00000000`00000000 fffff807`38c20f8d 00000000`00000000 00000000`00000000 : nt!IopfCallDriver+0x53
: ffffe308`6b7dd9f0 00000000`00000001 00000000`00000000 00000000`00000000 : nt!IovCallDriver+0x5f
: ffffe308`6b7dd9f0 00000000`00000001 ffffe308`6bc1d550 fffff807`390dbede : nt!IofCallDriver+0x2272df
: ffffe308`6bc1d550 fffff807`3915ae44 ffffe308`6e5e0040 ffff9f04`636be000 : volsnap!VolSnapPnp+0x9e
: ffffbe8b`fc5c46e0 00000000`00000000 ffffe308`6bc1d550 fffff807`394e8c35 : nt!IopfCallDriver+0x53
: ffffe308`6e5e0040 00000000`00000000 ffffe308`6e5e0040 00000000`00000000 : nt!IovCallDriver+0x5f
: ffffe308`6e5e0040 00000000`00000000 ffffbe8b`fc5c46e0 ffffe308`6bc1d550 : nt!IofCallDriver+0x2272df
: 00000000`00000002 ffffe308`6e5e1300 ffffe308`6a8329f0 ffffe308`6e5e1300 : nt!IopSynchronousCall+0xf8
: ffff9f04`63f0a010 ffffe308`6a8329f0 00000000`00000200 00000000`0000000a : nt!IopRemoveDevice+0x108
: ffffe308`6e5e1300 ffffe308`6a8329f0 00000000`00000000 cb3a4008`00200001 : nt!PnpRemoveLockedDeviceNode+0x1a8
: ffffe308`6a8329f0 ffffbe8b`fc5c4830 ffffe308`6e5e1300 00000000`00000000 : nt!PnpDeleteLockedDeviceNode+0x52
: ffffe308`6e5e1300 00000000`00000002 ffffe308`6e5e1300 00000000`00000000 : nt!PnpDeleteLockedDeviceNodes+0xd3
: ffffbe8b`fc5c49b0 ffffe308`6a832900 ffffe308`6bceb400 ffff9f04`00000001 : nt!PnpProcessQueryRemoveAndEject+0x3b3
: ffff9f04`63f0a010 ffff9f04`63781a10 ffffe308`6189aa00 00000000`00000000 : nt!PnpProcessTargetDeviceEvent+0x109
: ffffe308`6189aaa0 ffffe308`6684c340 ffffbe8b`fc5c4b00 00000000`00000000 : nt!PnpDeviceEventWorker+0x2ca
: ffffe308`6684c340 00000000`00000235 ffffe308`6684c340 fffff807`38cf49e0 : nt!ExpWorkerThread+0x155

BSOD3

BAD_POOL_CALLER (c2)
The current thread is making a bad pool request.  Typically this is at a bad IRQL level or double freeing the same allocation, etc.
Arguments:
Arg1: 000000000000000d, Attempt to release quota on a corrupted pool allocation.
Arg2: ffffe308669b45a0, Address of pool
Arg3: 00000000fffff807, Pool allocation's tag
Arg4: 7ac30bb7ed7893e2, Quota process pointer (bad).

: 00000000`000000c2 00000000`0000000d ffffe308`669b45a0 00000000`fffff807 : nt!KeBugCheckEx+0x107
: ffffe308`61202000 ffffbe8b`fffff807 00000000`00000d40 fffff807`38e46257 : nt!ExReturnPoolQuota+0x222647
: a6068e68`00000000 e22a33bf`00000000 01000000`00100000 00000000`00000000 : nt!ExFreeHeapPool+0x4b8
: ffff9f04`5ee849e6 00000000`00000000 ffffe308`6bcefe20 fffff807`38c325a4 : nt!ExFreePool+0x25
: ffffbe8b`fc80f000 00000000`00000000 ffffe308`6bcefe20 00000000`00000000 : nt!ExFreeToNPagedLookasideList+0x2a
: ffffe308`669b45e8 00000000`00000000 00000000`00000000 00000000`00000000 : nt!FsRtlFreeExtraCreateParameter+0xa7
: ffffe308`669b45e8 ffffe308`669b45e8 ffffe308`669b45e8 00000000`00000000 : FLTMGR!FreeTargetedIoCtrl+0x61
: ffffbe8b`fc814690 00000000`00000404 ffffe308`729ce3f8 00000000`00000000 : FLTMGR!FltpCleanupFileObjectContextForClose+0xb8
: 00000000`00000000 ffffe308`669b45e8 ffffe308`6672d010 fffff807`38c31de8 : FLTMGR!FltpPassThrough+0x31f
: ffffe308`729ce010 ffffe308`66b2bb30 00000000`00000000 ffff9f04`5eb7a690 : FLTMGR!FltpDispatch+0x8b
: 00000000`00000000 00000000`00000020 00000000`0000000c fffff807`38e58eb9 : nt!IopfCallDriver+0x53
: ffffe308`6b20b670 ffffe308`66b148f0 ffffe308`6d68e630 00000000`00000000 : nt!IovCallDriver+0x5f
: ffffe308`6b20b670 ffffe308`729ce010 00000000`00000000 00000000`00000000 : nt!IofCallDriver+0x2272df
: ffffe308`619fd140 ffffe308`6d68e630 ffffe308`6b20b640 ffff9f04`5eb7a690 : nt!IopDeleteFile+0x13c

1: kd> !pool ffffe308669b45a0
Pool page ffffe308669b45a0 region is Nonpaged pool
 ffffe308669b4000 size:   50 previous size:    0  (Free)       ....
 ffffe308669b4050 size:   c0 previous size:    0  (Allocated)  RaDf
 ffffe308669b4110 size:   c0 previous size:    0  (Allocated)  RaDf
 ffffe308669b41d0 size:   c0 previous size:    0  (Allocated)  RaDf
 ffffe308669b4290 size:   c0 previous size:    0  (Allocated)  Ipur
 ffffe308669b4350 size:   c0 previous size:    0  (Allocated)  PPMp
 ffffe308669b4410 size:   c0 previous size:    0  (Allocated)  Muta

ffffe308669b44d0 doesnt look like a valid small pool allocation, checking to see
if the entire page is actually part of a large page allocation...

ffffe308669b44d0 is not a valid large pool allocation, checking large session pool...
Unable to read large session pool table (Session data is not present in mini and kernel-only dumps)
ffffe308669b44d0 is not valid pool. Checking for freed (or corrupt) pool
Bad previous allocation size @ffffe308669b44d0, last size was 0

As you can see, they are fairly random, and the corruption is often noticed by NETIO first, probably because it is called so frequently. If more details are wanted for a specific crash, it isn't hard for me to reproduce.

I have run verifier.exe, ticking my driver, ambakdrv.sys and volsnap.sys, but nothing triggers, and nothing stands out after it crashes. (Which is a little surprising, as it is the first time I have run my driver under verifier. The only thing it pointed out was that I should allocate from the *Nx pools.)
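For the record, the command-line equivalent of what I ticked in the GUI should be roughly the following, with mydriver.sys standing in for my driver's real name: the first command enables the standard checks (which include special pool) for those drivers, the second confirms what is enabled, and the last clears everything again. The settings only take effect after a reboot.

verifier /standard /driver mydriver.sys ambakdrv.sys volsnap.sys
verifier /querysettings
verifier /reset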

So I am reaching out to see if there is something else I can do to try to figure out why my driver calling StorPortInitialize() will corrupt the system.

Host: Windows 10 19045.3930, VisualStudio 2022 Pro 17.10.3, Clang 17.0.3.
Target: Windows 11 22635.3209
Recently updated VS/clang in debugging desperation.

I wouldn’t make that assumption. Commercially available does not imply without defects.

Storport miniport drivers are mostly protected from the complications of driver removal and unloading by their containing port driver. It’s possible this is your bug, of course, but the fact (I think) that load/unload works just fine without the backup driver present tends to point more at the backup driver than at your driver.

The BSODs are all consistent with memory corruption: somebody is writing to memory it no longer owns. Verifier memory checking can help and I would enable it for both drivers.

It is somewhat unusual for storport drivers to get unloaded. What sort of storage controller is this?

Verifier memory checking can help and I would enable it for both drivers.

Oh, I just saw the top post about Verifier when using the GUI. I'll set up verifier again, in case it can give clues after all.

It is somewhat unusual for storport drivers to get unloaded. What sort of storage controller is this?

Mostly when updating the driver to a new version. I could force a reboot to avoid the unload, but it does also crash at times when loading. The miniport part is only a small piece, used to deal with virtual volumes.

If I am allowed to post links, I am working on it over here: