I am starting a discussion on a Windows Filtering Platform (WFP) callout driver I have been working on that has two registered filters (INBOUND and OUTBOUND MAC NATIVE). Each registered filter has an assigned thread that, in turn, has an assigned LIST_ENTRY. The inbound filter does a quick check for specific packet types, and valid packets are added to the linked list (LIST_ENTRY). I have already verified that the threading portion works, with no memory leaks or the like, and it has been verified working when using cloned packets with filters at the IPv4 DISCARD filter layer. However, I am encapsulating packets and need to be able to create NBL chains to improve performance for large file transfers and the like (i.e. during an SMB file transfer one typically has to generate at least 2 packets per original packet because of MTU issues). Because of this and the additional processing that occurs, it made sense to do a full "deep copy" of all valid inbound layer 2 (native) packets and then do the typical packet absorption for the original packets. The driver can be compiled to use either cloning or deep copy, and the code base works fine with cloned packets.
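To make the MTU point concrete, here is a minimal user-mode C sketch of the arithmetic. It is not code from the driver: the function name `packets_needed` and the 28-byte overhead used in the note below are assumptions chosen only for illustration, and it models a plain split of the inner packet across outer packets rather than real IPv4 fragmentation rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many outer packets are needed to carry one
 * inner packet of inner_len bytes when every outer packet adds `overhead`
 * bytes of encapsulation headers and may not exceed `mtu` bytes.
 * Returns 0 when the inputs make no sense. */
size_t packets_needed(size_t inner_len, size_t mtu, size_t overhead)
{
    if (inner_len == 0 || mtu <= overhead)
        return 0;

    size_t room = mtu - overhead;              /* payload bytes per outer packet */
    size_t count = inner_len / room;           /* full outer packets */
    if (inner_len % room != 0)
        count++;                               /* one more for the remainder */

    assert(count * room >= inner_len);         /* everything fits */
    return count;
}
```

With an assumed overhead of 28 bytes and an MTU of 1500, a full-size 1500-byte inner packet needs two outer packets, while anything up to 1472 bytes still fits in one.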
The problem I am running into is that, when compiled for deep copy, the copied packets have well-formed Ethernet and IPv4 headers (I have already done a memory watch and inspected the constructed NBL, the single NB, and the MDL assigned to the net buffer, and all looks good). However, when I inject them using FwpsInjectMacSendAsync (typically to the outbound filter path and on a separate NIC index), the call returns STATUS_SUCCESS and the completion callback shows STATUS_SUCCESS, but no packets are detected on the expected NIC (or any other NIC on the kernel debug machine). It almost appears that, after the TTL period expires for the first few packets injected, there is a crash in the WFP lower LWF NDIS driver (it breaks upon freeing an MDL).
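For readers less familiar with the term, the "deep copy" idea (one new contiguous buffer in place of a chain of source buffers) can be modelled in user-mode C as below. This is only a sketch under stated assumptions: `struct segment` is a hypothetical stand-in for an MDL chain, `flatten_chain` is an invented name, and `malloc` stands in for whatever pool allocation the driver actually uses. None of it is the driver's real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one buffer descriptor in a chain. */
struct segment {
    const unsigned char *data;
    size_t len;
    const struct segment *next;
};

/* Copy `length` bytes, starting `offset` bytes into the chain, into one
 * newly allocated contiguous buffer. Returns NULL if the chain is too
 * short, if length is 0, or if allocation fails. Caller frees the result. */
unsigned char *flatten_chain(const struct segment *seg,
                             size_t offset, size_t length)
{
    if (length == 0)
        return NULL;

    unsigned char *out = malloc(length);
    if (out == NULL)
        return NULL;

    size_t copied = 0;
    for (; seg != NULL && copied < length; seg = seg->next) {
        if (offset >= seg->len) {       /* segment lies before the data start */
            offset -= seg->len;
            continue;
        }
        size_t avail = seg->len - offset;
        size_t want = length - copied;
        size_t n = avail < want ? avail : want;
        memcpy(out + copied, seg->data + offset, n);
        copied += n;
        offset = 0;                     /* later segments are read from byte 0 */
    }

    assert(copied <= length);
    if (copied != length) {             /* chain ended before length bytes */
        free(out);
        return NULL;
    }
    return out;
}
```

The point of the model is the ownership rule: the copy owns its buffer outright, so it must stay allocated until the send completes and be freed exactly once afterwards.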
The interesting part is that, as a sanity check, I added an additional OID request code that sends pre-constructed packets and is invoked by a user-mode application. While the driver is started and running in the same "proxy" mode as when it crashes, I can send hundreds of packets using the same methods used to construct new packet copies for inbound traffic. Basically, I have verified that the construction of a new packet (self-crafted or copy) and the freeing of the allocated memory (context structure, NBL, NB, and MDL) all work fine (or appear to) when self-crafting packets from the user-mode app and sending them.
Memory alignment (compiled for 8-byte alignment) of the copied packets, the MAC and IP addresses, proper network byte order of fields like the IPv4 TotalLength, and so on have already been verified as valid. So I have narrowed it down to one of two possibilities (that I can think of) as the problem (obviously something wrong on my side), and before posting a bunch of code and crash dumps I was hoping someone could verify and/or provide additional insight in the following areas:
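The kind of header check described above can be sketched in portable user-mode C as follows. This is an illustration, not the driver's code: `frame_looks_like_ipv4` is an invented name, and the sketch assumes an untagged Ethernet II frame (no VLAN tag) carrying IPv4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ETH_HDR_LEN = 14, IPV4_MIN_HDR_LEN = 20 };

/* Read a 16-bit value stored in network (big-endian) byte order. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical check: returns 1 if the frame looks like Ethernet II + IPv4
 * with a TotalLength that fits inside the frame, 0 otherwise. */
int frame_looks_like_ipv4(const uint8_t *frame, size_t frame_len)
{
    if (frame == NULL || frame_len < ETH_HDR_LEN + IPV4_MIN_HDR_LEN)
        return 0;
    if (read_be16(frame + 12) != 0x0800)        /* EtherType must be IPv4 */
        return 0;

    const uint8_t *ip = frame + ETH_HDR_LEN;
    if ((ip[0] >> 4) != 4)                      /* IP version field */
        return 0;

    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;    /* header length in bytes */
    assert(ihl <= 60);                          /* 4-bit field, so at most 15 * 4 */
    if (ihl < IPV4_MIN_HDR_LEN)
        return 0;

    size_t total = read_be16(ip + 2);           /* TotalLength, network order */
    if (total < ihl)
        return 0;
    if (total > frame_len - ETH_HDR_LEN)        /* larger frames may carry padding */
        return 0;
    return 1;
}
```

A TotalLength written in host (little-endian) order is the typical mistake this catches: 20 stored as `14 00` reads back as 5120 and fails the bounds check.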
1.) When sending packets from a kernel-mode thread (created during driver start/initialization), other than typical thread-safe practices, is there anything "special" one must do in order to inject a packet? For example, can you pass an injection handle created during initialization to the thread in question, or does the handle need to be created by the thread itself?
2.) The driver receives and queues from an inbound filter and then injects to the same layer ID but into a registered outbound filter (NATIVE MAC). Are there known issues with having multiple Layer 2 filters (Inbound/Outbound)?
I can post a crash dump (I realize I am not posting code, so it could be hard to answer), but the crash occurs much later, after packets have been injected and returned no errors but did not show up in Wireshark. As well, it seems I can self-craft and send packets on any NIC with Wireshark running. But when I try to send a packet to the debug box, the packet is received, unwrapped (i.e. it arrives already encapsulated), and then the inner packet (smaller than the MTU) is sent, and then there is a BSOD in some net buffer list clone routine, even though I am not cloning packets. This only happens if Wireshark is running and if I copy inbound packets, extract the encapsulated packet, and then inject the extracted packet outbound on a different NIC index (NDIS port 0/default). It is a different BSOD, but again it is not in my code; it is elsewhere in the LWF NDIS layer. Of course, if I take the exact same inner packet, self-craft it, and just send it (via OID code and data to the driver), then it sends with no problem, Wireshark does not crash, and memory is cleaned up with no problems. All of this uses the same code to construct a new packet (from the data buffer of the OID request invoked by the user-mode application), to send the packet, and to clean up once it is sent.
The only thing I haven't tried is sending a self-crafted packet as a queued item for a thread to process and send, as opposed to sending it under the OID invocation context. I am running that test tomorrow to see if it changes the behavior.
For now I am just opening this up for anyone who might have run across a similar situation when dealing with Layer 2 WFP callout filters and threading. Again, I can post a crash dump, but it doesn't crash immediately (as would be typical of a malformed MDL, NB, or NBL, or bad packet data) and requires that I send several encapsulated packets (from a client machine running the same driver in "NBL clone" mode) before it crashes. If Wireshark is running, it crashes immediately, but not when I self-craft the packets (OID path). The crash is also not in any MDL, NB, or NBL that I have created or inspected, which leads me to think that perhaps memory is being stomped. Then again, the same code that copies inbound traffic is used for the OID-invoked self-constructed packets, and using that path I can send thousands of packets for long durations with no crashes.
So, for now, that is the summary. I might just ditch WFP and drop down to an NDIS LWF.