Can WFP drivers do all the functionality that is possible with NDIS LWF drivers?

Based on what Microsoft says, they introduced WFP to replace the NDIS drivers, and developers are encouraged to use WFP from Windows Vista onward.

My question is: can WFP drivers do everything that is possible with NDIS drivers, including NDIS LWF drivers?
If so, is there any performance advantage to using WFP drivers instead of NDIS LWF drivers, especially in very high-bandwidth networks (mostly for deep packet inspection)?

I also should mention that we are not interested in inspecting layer 2 packets that much, and are mostly interested in higher layers for deep packet inspection and blocking malicious packets (inbound and outbound). It is also very important for us to be able to have our driver inspect packets as early in boot as possible.

It would be an easier question if you just said what you’re trying to do, and we could tell you if it’s possible in WFP.

@Jason_Stephenson said:
It would be an easier question if you just said what you’re trying to do, and we could tell you if it’s possible in WFP.

We want to do deep packet inspection in 10/40G networks and we want it to be as optimized as possible, so whichever solution gives the least bottleneck is the best for us. Is there any benefit of using WFP instead of NDIS LWF?

What kind of malicious packets do you want to detect in higher layers? E.g. SQL injections, cookies, malformed HTML, photos of dead kittens?
(Trying to understand whether this filtering belongs in the kernel or at the application level.)
40G is serious bandwidth; maybe you’d want a network filtering appliance here.

@Pavel_A said:
What kind of malicious packets do you want to detect in higher layers? E.g. SQL injections, cookies, malformed HTML, photos of dead kittens?
(Trying to understand whether this filtering belongs in the kernel or at the application level.)
40G is serious bandwidth; maybe you’d want a network filtering appliance here.

It’s mostly for inspecting the content of TCP/UDP payloads (when there is no application layer) and the content of popular application-layer protocols such as HTTP, SNMP, DNS, SSH, and SMB, for signature-based detection.
We mostly have 1G and 10G links, and 40G is really rare for us, so just assume the maximum is 10G. Obviously we don’t want our driver to waste half of the bandwidth; we want it to be as efficient as possible, with as little CPU usage as possible. Also, as I said, running as early in boot as possible is very important for us.

So, which route should we go for this? WFP? NDIS LWF?

So, which route should we go for this? WFP? NDIS LWF?

A special content filtering appliance for the whole customer location and any device & OS that they use there.
These boxes usually come with a handy management interface and help for installation and configuration.
Zero development time, zero blue screens. Pay and play.

– pa

C’mon… That’s not helpful, Mr @Pavel_A … this isn’t the “product management interest list”, it’s the “Windows system software developers interest list”. I think it’s pretty safe to assume the OP wants the benefit of our experience developing software on Windows, not to be told to buy another company’s product.

If you can’t help him that’s fine. I can’t help him either… I am entirely ignorant of network filtering.

Hang in there @brad_H … we have folks with filtering experience AND folks from the NDIS team regularly wander by.

Peter

I’m fairly certain you could get access to parts of the packet you’re interested in by registering at the appropriate WFP layers (https://docs.microsoft.com/en-us/windows/win32/fwp/management-filtering-layer-identifiers-?redirectedfrom=MSDN).

How or where you’d put your sig checking / scanning logic would then be a decision you have to make (in kernel or usermode).
In terms of performance, I’ve never dealt with traffic of that volume on windows so can’t advise.

My naive understanding of WFP is that it is a simpler framework for writing network drivers, with clear rules around arbitration (https://docs.microsoft.com/en-us/windows/win32/fwp/filter-arbitration?redirectedfrom=MSDN) that allow better integration with other providers.

Disclaimer: I wouldn’t know where to start with a full-blown NDIS driver, so I could be very wrong.

@Pavel_A said:

So, which route should we go for this? WFP? NDIS LWF?

A special content filtering appliance for the whole customer location and any device & OS that they use there.
These boxes usually come with a handy management interface and help for installation and configuration.
Zero development time, zero blue screens. Pay and play.

– pa

Thank you for the suggestion Mr Pavel, but as Mr Viscarola said, we want to write it ourselves and don’t want to purchase any product.

@Jason_Stephenson said:
I’m fairly certain you could get access to parts of the packet you’re interested in by registering at the appropriate WFP layers (https://docs.microsoft.com/en-us/windows/win32/fwp/management-filtering-layer-identifiers-?redirectedfrom=MSDN).

How or where you’d put your sig checking / scanning logic would then be a decision you have to make (in kernel or usermode).
In terms of performance, I’ve never dealt with traffic of that volume on windows so can’t advise.

My naive understanding of WFP is that it is a simpler framework for writing network drivers, with clear rules around arbitration (https://docs.microsoft.com/en-us/windows/win32/fwp/filter-arbitration?redirectedfrom=MSDN) that allow better integration with other providers.

Disclaimer: I wouldn’t know where to start with a full-blown NDIS driver, so I could be very wrong.

Will there be any issue in high-bandwidth networks with WFP vs NDIS LWF, for example in 1G or 10G networks? I heard some stuff about 100MB limitations in certain scenarios, but I’m not sure.

Is there any sample code out there for WFP or NDIS that works well in 1G/10G networks? I assume the driver needs to fully utilize multiprocessor environments, and there’s no way to pull 1G or 10G with just one thread, right?

https://docs.microsoft.com/en-us/windows-hardware/drivers/network/multiprocessor-support-in-network-drivers

Because network packets can arrive in any order with any arbitrary data size, deep inspection implies buffering. Buffering implies a performance impact, smaller or larger depending on how much you buffer and on the application design. My advice to you is that if you have an established algorithm for the inspection that you are trying to do, then tell us more and we can help you with performance issues. If not, then develop that first and don’t worry about performance until you have that.

you will find that there is an enormous difference between the performance impact of inspection on many connections with aggregate bandwidth sufficient to saturate a 10 / 40 Gb/s link, versus a single or small number of connections that transmit
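To make the buffering point above concrete, here is a minimal sketch of how a signature that is split across two segments can still be caught by carrying over only the last (signature length - 1) bytes per flow. It is plain user-mode C, not WFP or NDIS code, and every name in it (`stream_matcher`, `matcher_init`, `matcher_feed`, `MAX_SIG_LEN`) is hypothetical. It assumes the caller feeds the payload bytes of one direction of one flow in order, and it uses a naive search where a real scanner would likely use a multi-pattern algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SIG_LEN 64 /* hypothetical upper bound on signature length */

/* Per-flow state: only the last (sig_len - 1) bytes seen are retained. */
typedef struct {
    const uint8_t *sig;
    size_t sig_len;
    uint8_t tail[MAX_SIG_LEN - 1];
    size_t tail_len;
} stream_matcher;

void matcher_init(stream_matcher *m, const uint8_t *sig, size_t sig_len) {
    assert(sig_len > 0 && sig_len <= MAX_SIG_LEN);
    m->sig = sig;
    m->sig_len = sig_len;
    m->tail_len = 0;
}

/* Naive substring search; returns 1 if sig occurs in buf. */
static int contains(const uint8_t *buf, size_t len,
                    const uint8_t *sig, size_t sig_len) {
    for (size_t i = 0; i + sig_len <= len; i++)
        if (memcmp(buf + i, sig, sig_len) == 0)
            return 1;
    return 0;
}

/* Feed the next in-order segment; returns 1 if the signature was seen,
 * either inside this segment or straddling the previous boundary. */
int matcher_feed(stream_matcher *m, const uint8_t *seg, size_t seg_len) {
    size_t keep = m->sig_len - 1;
    size_t head = seg_len < keep ? seg_len : keep;
    uint8_t window[2 * (MAX_SIG_LEN - 1)];

    /* Boundary window: saved tail followed by the start of this segment. */
    memcpy(window, m->tail, m->tail_len);
    memcpy(window + m->tail_len, seg, head);

    int hit = contains(window, m->tail_len + head, m->sig, m->sig_len) ||
              contains(seg, seg_len, m->sig, m->sig_len);

    /* Save the last `keep` bytes of the stream for the next call. */
    if (seg_len >= keep) {
        memcpy(m->tail, seg + seg_len - keep, keep);
        m->tail_len = keep;
    } else {
        /* Short segment: window already holds tail + whole segment. */
        size_t total = m->tail_len + seg_len;
        size_t start = total > keep ? total - keep : 0;
        memcpy(m->tail, window + start, total - start);
        m->tail_len = total - start;
    }
    return hit;
}
```

The per-flow memory cost here is bounded by the signature length rather than by the segment size, which is one way to keep the buffering overhead small when there are many concurrent connections.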

@MBond2 said:
Because network packets can arrive in any order with any arbitrary data size, deep inspection implies buffering. Buffering implies a performance impact, smaller or larger depending on how much you buffer and on the application design. My advice to you is that if you have an established algorithm for the inspection that you are trying to do, then tell us more and we can help you with performance issues. If not, then develop that first and don’t worry about performance until you have that.

you will find that there is an enormous difference between the performance impact of inspection on many connections with aggregate bandwidth sufficient to saturate a 10 / 40 Gb/s link, versus a single or small number of connections that transmit

But should we solve this with an NDIS LWF or WFP? Is there any difference for this problem?

As you wrote “we are not interested in inspecting layer 2 packets that much, and are mostly interested in higher layers” - this means WFP (callout drivers, possibly with something at the BFE level).

@Pavel_A said:
As you wrote “we are not interested in inspecting layer 2 packets that much, and are mostly interested in higher layers” - this means WFP (callout drivers, possibly with something at the BFE level).

So is there any performance difference between WFP and NDIS, especially in high-bandwidth networks such as 1G/10G?
Also, any tips for implementing the driver so that it doesn’t waste most of the bandwidth? For example, I assume we really have to utilize all the CPU cores in the kernel as well as in user mode, right?

What on earth are you planning to build here? I assume you realize that most networks do not sustain 90% bandwidth for long periods of time.

I will ask my question again: do you have an algorithm for what you want to ‘inspect’ and are seeking to improve its performance; or are you simply casting around for what’s possible?

If you have an inspection algorithm and are seeking to improve its performance, then we may be able to help you if you can describe the inputs that it needs and the results that it produces. Without knowing anything about the algorithm itself, we can help you adapt it to the windows IO model and general KM environment.

But if not, don’t start your effort with performance optimization. Start your effort with functional requirements and functionality that provides value for your prospective users.

If you really want a tip on performance, understand that the majority of data sent over modern networks is TCP data. That data has to be processed in order within each combination of source IP + source port + destination IP + destination port. Grouping together packets that have these common characteristics and processing them together on a single thread or core will help you tremendously, but that’s not news, since switch and router manufacturers have been doing it for the last 20 years or so.
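The grouping described above can be sketched as hashing the 4-tuple and using the hash to pick a worker, so that every packet of a given flow lands on the same thread or core. This is an illustrative user-mode C sketch, not driver code: `flow_key`, `flow_hash`, and `flow_to_worker` are hypothetical names, and FNV-1a is used only because it is short, not because any NIC or Windows component is known to use it here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: the TCP/UDP 4-tuple, in host byte order. */
typedef struct {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
} flow_key;

/* FNV-1a step over a byte range, continuing from hash value h. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len) {
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u; /* 32-bit FNV prime */
    }
    return h;
}

/* Hash each field separately so struct padding never affects the result. */
uint32_t flow_hash(const flow_key *k) {
    uint32_t h = 2166136261u; /* 32-bit FNV offset basis */
    h = fnv1a(h, &k->src_ip, sizeof k->src_ip);
    h = fnv1a(h, &k->dst_ip, sizeof k->dst_ip);
    h = fnv1a(h, &k->src_port, sizeof k->src_port);
    h = fnv1a(h, &k->dst_port, sizeof k->dst_port);
    return h;
}

/* Map a flow to one of worker_count workers; the same flow always
 * maps to the same worker, which preserves per-flow ordering. */
size_t flow_to_worker(const flow_key *k, size_t worker_count) {
    assert(worker_count > 0);
    return flow_hash(k) % worker_count;
}
```

Note that this hash is directional: the two directions of one connection have source and destination swapped and may land on different workers. If both directions need to be inspected together, one option is to order the two endpoints before hashing.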

@MBond2 said:
I will ask my question again: do you have an algorithm for what you want to ‘inspect’ and are seeking to improve its performance; or are you simply casting around for what’s possible?

If you have an inspection algorithm and are seeking to improve its performance, then we may be able to help you if you can describe the inputs that it needs and the results that it produces. Without knowing anything about the algorithm itself, we can help you adapt it to the windows IO model and general KM environment.

But if not, don’t start your effort with performance optimization. Start your effort with functional requirements and functionality that provides value for your prospective users.

If you really want a tip on performance, understand that the majority of data sent over modern networks is TCP data. That data has to be processed in order within each combination of source IP + source port + destination IP + destination port. Grouping together packets that have these common characteristics and processing them together on a single thread or core will help you tremendously, but that’s not news, since switch and router manufacturers have been doing it for the last 20 years or so.

@Tim_Roberts said:
What on earth are you planning to build here? I assume you realize that most networks do not sustain 90% bandwidth for long periods of time.

My main question is not about optimizing our algorithms. Right now I am just asking whether switching to WFP from NDIS LWF will improve performance or not. Let’s assume everything else is the same, and let’s also assume the task is really simple: just blocking certain TCP packets that contain specific bytes in their payload.

whether switching to WFP from NDIS LWF will improve performance or not.
No, it won’t. I think the largest benefit of doing this is (potentially) a simpler, more maintainable solution.