Bug check in NDIS / mrxsmb driver (ARM64 only)

I maintain the tap-windows6 driver. Since a recent commit, it has started to bug check on ARM64 machines:

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M (1000007e)

5: kd> kn

Child-SP RetAddr Call Site

00 fffff30214353430 fffff80312bc4a20 nt!KeBugCheck2+0x234
01 fffff30214353a10 fffff8031280e538 nt!PspSystemThreadStartup$filt$0+0x58
02 fffff30214353a20 fffff80312a67dcc nt!_C_ExecuteExceptionFilter+0x38
03 fffff30214353a80 fffff8031280cf34 nt!_C_specific_handler+0xcc
04 fffff30214353ae0 fffff80312905d48 nt!RtlpExecuteHandlerForException+0x14
05 fffff30214353b00 fffff80312877fd4 nt!RtlDispatchException+0x2e8
06 fffff30214354160 fffff8031287856c nt!KiDispatchException+0x3f4
07 fffff30214354650 fffff803128c07d0 nt!KiDispatchExceptionOnExceptionStack+0xc4
08 fffff30214354680 fffff80312803c00 nt!KiSynchronousException+0xd0
09 fffff30214354770 fffff8031280285c nt!KzSynchronousException+0x24
0a fffff302143547d0 fffff80317009adc nt!KiArm64ExceptionVectors+0x5c
0b fffff30214354b40 fffff803170e1ff8 ndis!ndisValidOid+0x24
0c fffff30214354b40 fffff8031700957c ndis!ndisMiniportOidIoctl+0x128
0d fffff30214354cc0 fffff80312856160 ndis!ndisDeviceControlHandler+0x17c
0e fffff30214354d50 fffff80312d3ce38 nt!IofCallDriver+0x30
0f fffff30214354d80 fffff80312d3e390 nt!IopSynchronousServiceTail+0x170
10 fffff30214354e10 fffff80312d49d3c nt!IopXxxControlFile+0x658
11 fffff30214355070 fffff8031280c460 nt!NtDeviceIoControlFile+0x2c
12 fffff302143550a0 fffff8031280bf60 nt!KiSystemServiceCopyEnd+0x38
13 fffff30214355110 fffff802d88d7248 nt!KiServiceInternal+0x60
14 fffff30214355470 fffff802d88d6f28 mrxsmb!MRxSmbQueryLbfoTeamCapability+0x1d8
15 fffff302143555c0 fffff80317175754 mrxsmb!MRxSmbIPv4AddressChangeHandler+0x108
16 fffff302143558c0 fffff80317382228 NETIO!NsiParameterChange+0x244
17 fffff30214355970 fffff80317335564 tcpip!IppNotifyAddressChangeAtPassive+0x2d8
18 fffff30214355a90 fffff8031716f63c tcpip!IppCompartmentNotificationWorker+0x74
19 fffff30214355ac0 fffff8031285a56c NETIO!NetiopIoWorkItemRoutine+0x7c
1a fffff30214355b20 fffff80312980604 nt!IopProcessWorkItem+0xec
1b fffff30214355b80 fffff803128fa300 nt!ExpWorkerThread+0x114
1c fffff30214355d50 fffff8031280be4c nt!PspSystemThreadStartup+0x50
1d fffff30214355d90 0000000000000000 nt!KiStartSystemThread+0x24

While there are no traces of the tap-windows6 driver in the stack (its frames would show up as tap0901!xxx), I found that the bug check started to happen after I added the NDIS_RECEIVE_FLAGS_RESOURCES flag to the NdisMIndicateReceiveNetBufferLists() call. This flag means that the driver retains ownership of the passed NBL after the NDIS call returns, which allows the driver to free the NBL (which the driver itself allocated) and complete the I/O request in place. I added the flag to work around a TCP performance issue I’ve experienced on Windows Server 2019 and 2022.
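
To make the ownership rule concrete, here is a minimal user-mode toy model in C. It is not NDIS code: toy_nbl, toy_indicate, toy_return_handler, toy_free and TOY_FLAG_RESOURCES are all hypothetical names invented for illustration, and the real constant and calls live in ndis.h. It only models who owns the buffer at each step.

```c
/* User-mode toy model of NBL ownership. NOT the NDIS API:
 * every name below is hypothetical and exists only for illustration. */
#include <assert.h>
#include <stdbool.h>

#define TOY_FLAG_RESOURCES 0x1u /* stand-in for NDIS_RECEIVE_FLAGS_RESOURCES */

typedef enum { OWNER_DRIVER, OWNER_STACK } toy_owner;

typedef struct {
    toy_owner owner; /* who may touch the buffer right now */
    bool freed;      /* set once the driver has released it */
} toy_nbl;

/* Models the receive indication. With the flag set, the driver owns the
 * NBL again as soon as the call returns. Without it, the stack keeps the
 * NBL until it hands it back through the return handler. */
static void toy_indicate(toy_nbl *nbl, unsigned flags)
{
    nbl->owner = (flags & TOY_FLAG_RESOURCES) ? OWNER_DRIVER : OWNER_STACK;
}

/* Models the return handler: the stack gives the NBL back to the driver. */
static void toy_return_handler(toy_nbl *nbl)
{
    nbl->owner = OWNER_DRIVER;
}

/* The driver may free only an NBL it currently owns. Returns true on success. */
static bool toy_free(toy_nbl *nbl)
{
    if (nbl->owner != OWNER_DRIVER || nbl->freed)
        return false;
    nbl->freed = true;
    return true;
}
```

In this model, an NBL indicated with TOY_FLAG_RESOURCES can be freed right after toy_indicate() returns, while one indicated with flags 0 can be freed only after toy_return_handler() has run.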

It appears that this flag causes the bug check in the NDIS/mrxsmb driver when it is used for packets generated by the driver itself (mostly DHCP responses). Not using the flag for those packets doesn’t affect performance, since driver-generated packets do not go through the TCP stack, which according to my analysis introduces a one-second delay in some cases. I removed that flag (and had to bring back ReturnNetBufferListsHandler and the code which deals with in-flight packets), and I don’t see the bug check anymore.
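
The workaround can be sketched roughly as follows, assuming the flag is dropped only for driver-generated packets and that "in-flight" handling amounts to a counter. This is a user-mode model, not the actual tap-windows6 code: sketch_adapter, sketch_receive_flags, sketch_indicate, sketch_return_handler, sketch_can_halt and SKETCH_FLAG_RESOURCES are hypothetical names.

```c
/* User-mode sketch of per-packet flag selection plus in-flight tracking.
 * All names are hypothetical; this is not the tap-windows6 implementation. */
#include <assert.h>
#include <stdbool.h>

#define SKETCH_FLAG_RESOURCES 0x1u /* stand-in for NDIS_RECEIVE_FLAGS_RESOURCES */

typedef struct {
    int in_flight; /* NBLs currently owned by the stack */
} sketch_adapter;

/* Driver-generated packets (e.g. DHCP responses) are indicated without
 * the flag; everything else keeps the flag. */
static unsigned sketch_receive_flags(bool driver_generated)
{
    return driver_generated ? 0u : SKETCH_FLAG_RESOURCES;
}

/* Models an indication and returns the flags that were used. An NBL sent
 * without the flag stays with the stack, so it is counted as in flight. */
static unsigned sketch_indicate(sketch_adapter *a, bool driver_generated)
{
    unsigned flags = sketch_receive_flags(driver_generated);
    if (!(flags & SKETCH_FLAG_RESOURCES))
        a->in_flight++;
    return flags;
}

/* Models the return handler: one NBL has come back from the stack. */
static void sketch_return_handler(sketch_adapter *a)
{
    assert(a->in_flight > 0);
    a->in_flight--;
}

/* The adapter can be torn down only when nothing is still in flight. */
static bool sketch_can_halt(const sketch_adapter *a)
{
    return a->in_flight == 0;
}
```

The counter is what the return handler exists for in this model: once some packets are indicated without the flag, the driver has to wait for them to come back before it can free them or shut down.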

I am curious why a seemingly unrelated change (adding the NDIS_RECEIVE_FLAGS_RESOURCES flag) causes a bug check in the mrxsmb driver, and only on ARM64. x64 works just fine. Is this a bug in the ARM64 version of the mrxsmb driver? How can I investigate it further?

Have you tried running the driver verifier since your last change? I’m always surprised when I make a minor change and it seems to work fine. Then I turn on the driver verifier and it blows up in my face.

@Rob_Wallace said:
Have you tried running the driver verifier since your last change?

Yes, I did (Driver Verifier, verifier.exe). No complaints. The driver is WDM-based, so unfortunately the other one, the KMDF verifier, won’t help.

From the doc: “NDIS_RECEIVE_FLAGS_RESOURCES flag indicates that an underlying driver is running low on receive resources.”

I would guess this is causing the bug check in mrxsmb driver. I think you can avoid setting this flag if you write your data using an ioctl call instead of IRP_MJ_WRITE. There are also some performance advantages such as sending many packets with one call.

Sorry if this is not a great suggestion but I’ve never had much success using IRP_MJ_WRITE.