Backporting NDIS driver causes bugcheck on old system

Hello!

I’m porting an NDIS protocol driver written for Windows 10 to work on Windows 7. Basically this meant replacing the ExXxxTimer calls with KeXxxTimer calls, since those were the only incompatible calls. The ported driver still works on Windows 10, but it bugchecks on Windows 7, and I have no idea what could cause the issue, since the bugcheck happens in code unrelated to the timers and callbacks I replaced.

The hardware I test it on is identical between the Windows 7 and Windows 10 machines.

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: 00000000c0000005, Exception code that caused the bugcheck
Arg2: fffff88001ad23b8, Address of the instruction which caused the bugcheck
Arg3: fffff88009ef7c00, Address of the context record for the exception that caused the bugcheck
Arg4: 0000000000000000, zero.

0: kd> k
 # Child-SP          RetAddr           Call Site
00 fffff880`09ef7338 fffff800`032c9f29 nt!KeBugCheckEx
01 fffff880`09ef7340 fffff800`032c987c nt!KiBugCheckDispatch+0x69
02 fffff880`09ef7480 fffff800`032f699d nt!KiSystemServiceHandler+0x7c
03 fffff880`09ef74c0 fffff800`032f5775 nt!RtlpExecuteHandlerForException+0xd
04 fffff880`09ef74f0 fffff800`03306cfd nt!RtlDispatchException+0x415
05 fffff880`09ef7bd0 fffff800`032ca00e nt!KiDispatchException+0x17d
06 fffff880`09ef8260 fffff800`032c8b7a nt!KiExceptionDispatch+0xce
07 fffff880`09ef8440 fffff880`01ad23b8 nt!KiPageFault+0x23a
08 fffff880`09ef85d0 fffff880`01ad2846 ndis!ndisMSendCompleteNetBufferListsInternal+0x158
09 fffff880`09ef8670 fffff880`01ad25d5 ndis!ndisMFakeSendNetBufferLists+0x76
0a fffff880`09ef86b0 fffff880`05042b66 ndis!NdisSendNetBufferLists+0x85
0b fffff880`09ef8710 fffff880`05042e20 PacketDriver!PacketDriver::StartDiscovery+0x336
0c fffff880`09ef87b0 fffff800`035da57a PacketDriver!PacketDriver::UserIoControl+0xf0
0d fffff880`09ef8850 fffff800`035eea5c nt!IopSynchronousServiceTail+0xfa
0e fffff880`09ef88c0 fffff800`035eeaf6 nt!IopXxxControlFile+0xc49
0f fffff880`09ef8a00 fffff800`032c9c13 nt!NtDeviceIoControlFile+0x56
10 fffff880`09ef8a70 00000000`74c52e09 nt!KiSystemServiceCopyEnd+0x13
11 00000000`0029e978 00000000`00000000 0x74c52e09

(Most details left out)

Obviously something is going wrong in my StartDiscovery function where I call NdisSendNetBufferLists, though I’m fairly certain the implementation matches the ndisprot samples in the Windows driver samples repo.

There shouldn’t be a difference between what Windows 7 and Windows 10 expect in NdisSendNetBufferLists, right?

Thanks in advance. Maybe somebody has had similar issues?

Perhaps a code snippet and the rest of the analysis might help.

There shouldn’t be a difference between what Windows 7 and Windows 10 expect in NdisSendNetBufferLists, right?

There’s ~9 years of active development between Windows 7 and the latest Windows 10, so of course there are inevitable differences. But NDIS makes a profound effort to avoid breaking compatibility within an NDIS contract version. So if your driver is version 6.20, NDIS tries to treat your driver the same on Windows 7 as on Windows 10. If you changed the version as part of your backport, then there will be more differences.

Usually, though, the differences are that we make future versions more strict; I can’t think of many cases where 6.20 would be more strict than the latest 6.8x. One case that does come to mind is that Windows 8+ makes more of an effort to fix a bad NET_BUFFER_LIST::SourceHandle. I suggest verifying that you are correctly setting NET_BUFFER_LIST::SourceHandle to your Open handle.

One thing that jumps out at me from the callstack is that it looks like your driver might be trying to transmit NBLs while the stack is paused or the media is disconnected. (ndisMFakeSendNetBufferLists is the hint.) !ndiskd.miniport would disambiguate the exact reason(s) that the Tx path was shunted. This could be a clue that your driver doesn’t properly stop its datapath when it needs to (i.e., pause). I suggest checking whether this repros only while the Open is paused; if so, you need to fix the synchronization in your NetEventPause handler.

One case that does come to mind is that Windows 8+ makes more of an effort to fix a bad NET_BUFFER_LIST::SourceHandle. I suggest verifying that you are correctly setting NET_BUFFER_LIST::SourceHandle to your Open handle.

Thank you for the replies! You’re spot on - I found the issue: NET_BUFFER_LIST::SourceHandle wasn’t set. After setting it everywhere a NET_BUFFER_LIST is created and sent, everything works fine now.
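
For anyone who lands here with the same bugcheck, the fix amounts to stamping every NET_BUFFER_LIST in the chain with the protocol's binding ("Open") handle before the chain is passed to NdisSendNetBufferLists. The sketch below is a user-mode model so that it compiles without the WDK: `model_nbl` and `stamp_source_handle` are hypothetical stand-ins, not NDIS definitions. In a real driver the field is NET_BUFFER_LIST::SourceHandle and the chain is walked with NET_BUFFER_LIST_NEXT_NBL. Which handle to store is an assumption here, based on the advice quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NET_BUFFER_LIST: only the two fields used here. */
typedef struct model_nbl {
    struct model_nbl *Next;         /* real code: NET_BUFFER_LIST_NEXT_NBL(nbl) */
    void             *SourceHandle; /* real code: NET_BUFFER_LIST::SourceHandle */
} model_nbl;

/*
 * Set SourceHandle on every NBL in the chain and return how many were stamped.
 * binding_handle models the handle the protocol received when it opened the
 * adapter; it must not be NULL.
 */
static size_t stamp_source_handle(model_nbl *chain, void *binding_handle)
{
    size_t count = 0;

    assert(binding_handle != NULL);
    for (model_nbl *nbl = chain; nbl != NULL; nbl = nbl->Next) {
        nbl->SourceHandle = binding_handle;
        count++;
    }
    return count;
}
```

Doing this in one helper that every send path calls, rather than at each allocation site, makes it harder to miss an NBL when a new send path is added later.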

Another (unrelated) strictness difference apparently affects DeviceIoControl, ReadFile and WriteFile: on Windows 7 at least one of the last two parameters has to be set, whereas passing NULL for both worked fine on Windows 10.

I’ll take another look at the possible pausing issue, but I didn’t see any other issues after setting the SourceHandle.

Again, thank you for your time!

I’ll take another look at the possible pausing issue

There is an inevitable race between your protocol’s Tx path and the NIC’s detection of media disconnect. So it’s not illegal to transmit a packet when the datapath is shunted for link-down. However, it’s a waste of everyone’s time & CPU to be sending packets to a disconnected NIC, so it’s best if you can avoid doing unnecessary work as soon as you do notice that the link is down.

In contrast, it is a contract violation (and thus bug) to transmit packets after your protocol has returned success from a NetEventPause.

The callstack you posted does not distinguish between the two cases; it could be either pause or link down. So I can’t tell whether it’s just an inefficiency or a bona fide bug.
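
One way to honor the pause contract described above is a small "send gate" that counts in-flight sends and refuses new ones once a pause has started. The following is a user-mode model using C11 atomics, not WDK code: `send_gate` and the `gate_*` functions are hypothetical names. It assumes the driver calls `gate_try_begin_send` before NdisSendNetBufferLists, `gate_end_send` from its send-complete handler, and `gate_begin_pause` from its NetEventPause handling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define GATE_PAUSING 0x80000000u  /* top bit: pause requested; low bits: in-flight sends */

typedef struct send_gate { atomic_uint state; } send_gate;

static void gate_init(send_gate *g) { atomic_init(&g->state, 0u); }

/* Call before sending. Returns false if a pause has started: do not send. */
static bool gate_try_begin_send(send_gate *g)
{
    unsigned v = atomic_load(&g->state);
    do {
        if (v & GATE_PAUSING)
            return false;
    } while (!atomic_compare_exchange_weak(&g->state, &v, v + 1u));
    return true;
}

/* Call when a send completes. Returns true if this was the last in-flight
 * send and a pause is waiting on it, so the caller should finish the pause. */
static bool gate_end_send(send_gate *g)
{
    unsigned prev = atomic_fetch_sub(&g->state, 1u);
    assert((prev & ~GATE_PAUSING) != 0u);  /* must pair with a begin */
    return prev - 1u == GATE_PAUSING;
}

/* Call when the pause event arrives. Returns true if nothing is in flight and
 * the pause may succeed right away; otherwise wait for gate_end_send. */
static bool gate_begin_pause(send_gate *g)
{
    return atomic_fetch_or(&g->state, GATE_PAUSING) == 0u;
}

/* Call on restart to let sends through again. */
static void gate_restart(send_gate *g)
{
    atomic_fetch_and(&g->state, ~GATE_PAUSING);
}
```

Exactly one of `gate_begin_pause` or the final `gate_end_send` returns true for a given pause, so the code that reports the pause as complete runs once. The link-down case does not need this gate for correctness; a driver could simply skip sends while it knows the media is disconnected.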