Optimal approach for writing a deep packet inspection firewall on Windows? (UDP/TCP)

I want to write a firewall that does deep packet inspection for UDP/TCP packets (IPv4 & IPv6), and blocks/allows packets based on packet inspection.

Based on my knowledge (so correct me if there are other and better approaches) there are two ways to do this:

  1. NDIS LWF driver
  2. WFP driver

My main concern is choosing the approach that causes the least bandwidth reduction. I have two questions:

  1. In terms of bandwidth reduction, is there any difference between LWF and WFP?
  2. Considering that I only want to inspect UDP/TCP packets, can I inspect every packet at STREAM + DATAGRAM_DATA layer in WFP? Or do I need to inspect the INBOUND+OUTBOUND TRANSPORT layer?

When you say ‘deep’, how deep are you intending? That’s probably the most important question, other than what you want to do when you find something interesting. If you plan to modify the content, then you must buffer the content until enough has arrived for your algorithm to detect whatever it needs to detect. If you plan on terminating the connection instead, you might decide that allowing a certain amount of interesting content to flow through is okay, and you can analyze asynchronously.

Given your description, I would suggest WFP and the stream / datagram layers. You won’t see every packet, but you will see the effective content of those packets. You probably don’t care about retransmissions.

This is a complicated sort of problem, and the best method to use will depend on what sort of data is interesting.
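
To illustrate the difference between seeing every packet and seeing the effective content, here is a minimal Python sketch. It is not WFP or NDIS code; `stream_view` and its `(seq, payload)` tuples are hypothetical names invented for the illustration, and sequence-number wraparound is ignored. It takes segments as they might appear on the wire (duplicated, out of order) and returns the bytes that a stream-level view would present once, in order.

```python
def stream_view(segments, initial_seq=0):
    """Return the in-order byte stream for (seq, payload) segments given in arrival order.

    Hypothetical illustration only: no sequence wraparound, no window limits.
    """
    delivered = bytearray()
    next_seq = initial_seq      # next byte offset the stream expects
    pending = {}                # seq -> payload, segments not yet consumed

    for seq, payload in segments:
        # Keep the longest payload seen for a given starting sequence number.
        if len(payload) > len(pending.get(seq, b"")):
            pending[seq] = payload
        # Drain everything that is now contiguous with the delivered stream.
        for s in sorted(pending):
            if s > next_seq:
                break           # gap: wait for the missing bytes
            data = pending.pop(s)
            if s + len(data) > next_seq:
                delivered += data[next_seq - s:]   # skip bytes already delivered
                next_seq = s + len(data)
            # otherwise it was a pure retransmission and is dropped
    return bytes(delivered)


wire = [(0, b"GET "), (8, b"HTTP"), (0, b"GET "), (4, b"/ab ")]
print(len(wire), "segments on the wire ->", stream_view(wire))
```

A packet-level filter would be handed all four segments, including the retransmission; a stream-level filter only has to look at the twelve reassembled bytes.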

By deep I mean I just need to scan the packet contents (every TCP/UDP packet) and look for something like an exploit. If it is found, I need to block it right away (so I can’t let the packet be received and terminate the connection later; I need to block it and terminate the connection immediately).

So based on this, should I use WFP and only monitor the stream/datagram layer? Will I receive every TCP/UDP packet in this case? If not, which type of TCP/UDP packets will I miss?

And what about bandwidth reduction compared to NDIS LWF (considering that I need to inspect every TCP/UDP packet)? Will there be any difference, for example on 1 Gb/s or 10 Gb/s connections?

This may be a bit pedantic, but first some basics. UDP is the simpler protocol, so start there. UDP itself has no concept of packets; it transmits datagrams. Each datagram has a maximum size of 65535 bytes (the limit of the 16-bit length field), which exceeds the typical size of an IP packet, so each datagram is carried as one or more IP fragments. NDIS filters can see fragments, including those that are ultimately discarded because not all of the fragments arrive, and those that arrive in the wrong order. WFP, at the datagram data layer, will see only complete datagrams. Presumably, when analyzing UDP traffic, you will start with single-datagram analysis and then go deeper to look for patterns between requests.
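
As a rough worked example of the datagram versus fragment distinction, the Python sketch below estimates how many IPv4 fragments a packet-level filter would see for one UDP datagram. The function name `ipv4_fragment_count` is made up for this illustration, and the defaults are assumptions: a 1500-byte Ethernet MTU and a 20-byte IPv4 header with no options. IPv6 and path MTU discovery are not modelled.

```python
UDP_HEADER_LEN = 8  # bytes


def ipv4_fragment_count(udp_payload_len, mtu=1500, ip_header_len=20):
    """Estimate the number of IPv4 packets needed to carry one UDP datagram.

    Assumes a fixed MTU and an IPv4 header of ip_header_len bytes on every fragment.
    """
    max_payload = 65535 - ip_header_len - UDP_HEADER_LEN
    if not 0 <= udp_payload_len <= max_payload:
        raise ValueError("UDP payload does not fit in a single IPv4 datagram")

    total = udp_payload_len + UDP_HEADER_LEN      # bytes carried inside IP
    if total <= mtu - ip_header_len:
        return 1                                  # fits without fragmentation

    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header_len) // 8 * 8
    return -(-total // per_fragment)              # ceiling division


for size in (512, 1472, 1473, 65507):
    print(size, "byte payload ->", ipv4_fragment_count(size), "IP packet(s)")
```

Under these assumptions a datagram-level filter is called once per datagram, while a packet-level filter may be handed dozens of fragments for the largest datagrams.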

TCP is more complex, but also has no concept of packets. TCP sends a bidirectional stream of bytes. Many higher-level protocols carve TCP streams into messages of various kinds, but on the lower side, TCP transmits segments. Each segment may be as small as 1 byte and as large as the detected MSS, but can still be fragmented when that calculation is incorrect. Each segment is presented only once, and in order, into the stream of data that the application can read, but the bytes may be sent several times over the network and may arrive in the wrong sequence. Again, NDIS filters can see packets that are duplicated, out of order, fragmented, etc. Presumably, when analyzing TCP traffic, you will start by trying to parse certain known protocols from the stream, and then try to detect known patterns in that data. This is where the performance cost really comes into play. Detecting patterns requires that you look at portions of the stream that do not correspond to the segments transferred. That requires buffering of some kind if you are going to do it before the application has a chance to get data from the stream. The depth of that buffer will be related to the longest pattern that you plan to detect, but it is important to remember to ‘push’ shorter data along when the actual application protocol does not conform to what you expect. This can get complicated quickly.
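
The buffering idea can be sketched in a few lines of Python. This is only an illustration of the technique, not driver code: `StreamScanner`, `feed`, and `flush` are hypothetical names, the patterns are plain byte strings, and the scanner simply holds back the last (longest pattern length minus one) bytes so that a pattern split across two segments is still found before those bytes reach the application.

```python
class StreamScanner:
    """Hold back just enough of the stream to catch patterns that span segments."""

    def __init__(self, patterns):
        self.patterns = [bytes(p) for p in patterns]
        # A match can start at most (longest pattern - 1) bytes before the end.
        self.hold = max(len(p) for p in self.patterns) - 1
        self.pending = b""

    def feed(self, segment):
        """Return (verdict, bytes that are safe to forward to the application)."""
        data = self.pending + segment
        if any(p in data for p in self.patterns):
            self.pending = b""
            return "block", b""          # caller would drop the data and reset the flow
        keep = min(self.hold, len(data))
        cut = len(data) - keep
        self.pending = data[cut:]        # tail that might still begin a pattern
        return "allow", data[:cut]

    def flush(self):
        """Push held bytes along, e.g. on close or when the protocol is not one we parse."""
        out, self.pending = self.pending, b""
        return out


scanner = StreamScanner([b"EVIL"])
print(scanner.feed(b"hello EV"))   # tail " EV" is held back
print(scanner.feed(b"IL world"))   # pattern completed across the segment boundary
```

Holding back a fixed-length tail is the simplest choice; a real filter would likely hold back only a tail that is actually a prefix of some pattern, and would call something like `flush` on a timer so that short messages are not delayed indefinitely.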