Best approach to have a dynamic buffer that we use to send the received packets from NDIS to users?

@MBond2 said:
There is another way to approach this problem though. Instead of scanning data in advance of forwarding it up, you can copy it and scan it after it has been forwarded. If anything of interest is found, then you can flag the connection and take whatever remedial action seems good to you (black holing and tar pit schemes are especially effective as they waste the resources of your attacker).

This is not gonna work, because in the case of many exploits, if you let one packet slide then the attacker all of a sudden has kernel code execution.

Could there be a workaround for the speed reduction in SMB-based file transfers from shares? Obviously we can’t just stop looking at port 445/SMB packets, and we can’t inspect only the first few hundred packets of an SMB transfer either, since the attacker/malware might attempt the exploit after many packets have been sent/received, and there is no fixed number of packets we can inspect before declaring the connection safe…

I also found an interesting old thread in OSR:

https://community.osr.com/discussion/290695/wfp-callout-driver-layer2-filtering

In it @NILC says:

However, I am encapsulating packets and needed the ability to be able to create NBL chains in order to improve performance when dealing with large file transfers and the like (i.e. typically for every 1 packet during an SMB file transfer one needs to generate at least 2 packets per 1 original packet because of MTU issues)

But I don’t get why “for every 1 packet during an SMB file transfer one needs to generate at least 2 packets per 1 original packet because of MTU issues”. And what does it mean to encapsulate packets to overcome this issue?
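
One plausible reading of the MTU remark (an assumption, since @NILC's setup is not described here): encapsulation wraps every packet in an extra outer header, and SMB bulk transfers mostly consist of full-size packets that already fill the MTU. A full-size packet plus the outer header no longer fits in one frame, so it has to be split into at least two. The sketch below only shows that arithmetic; the function name and the 1500-byte MTU and 40-byte overhead used in the note after it are illustrative values, not numbers from that thread.

```c
#include <assert.h>
#include <stddef.h>

/* Number of on-wire packets needed to carry one original packet of
 * `orig_len` bytes once `encap_overhead` bytes of outer header are
 * added to every packet, on a link whose MTU is `mtu` bytes.
 * Hypothetical helper; all values are illustrative assumptions. */
static size_t packets_after_encapsulation(size_t orig_len,
                                          size_t encap_overhead,
                                          size_t mtu)
{
    assert(mtu > encap_overhead);        /* there must be room for payload */
    size_t room = mtu - encap_overhead;  /* payload bytes per outer packet */
    if (orig_len == 0)
        return 0;
    return (orig_len + room - 1) / room; /* ceiling division */
}
```

With a 1500-byte MTU and 40 bytes of outer header, a 1500-byte original packet needs 2 packets, while a 1000-byte one still fits in 1. That would explain why the doubling shows up specifically during large transfers, where nearly every packet is full size.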

@brad_H said:

@craig_howard said:
I would definitely build the “inspect” WFP sample and have it do the same kind of packet scan (or even just a short delay, to simulate a scan) and see what the result is …

Well, I tried Microsoft’s “inspect” WFP sample to see how well WFP performs.

But to my surprise it’s even worse than NDIS. After I removed the IP-based filter (because the default code only monitors one IP), the file transfer speed from shares dropped from 80MB/s to 20MB/s, and this is without me implementing any packet content checking! Note that in the NDIS case the speed only dropped after I implemented a simple packet check.

So is this normal? Why is WFP so much worse than NDIS in terms of reducing the transfer speed of files through shares?

Note that the only thing I changed in the inspect source code was the IP-based filter.

So… what should I test next? Does this mean NDIS is better for deep packet inspection?

(We want to monitor every IP and port, so adding port/IP filters is not gonna work)

Hmm … OK, my interest has been piqued so I’m going to load up the WFP filter and see what’s up … so that we’re on the same sheet of music, could you verify some of my assumptions …

  • You’re using the “inspect” WFP sample on the github repo, which was updated two months ago
  • You’re using the latest VS2019 … which you’ll know is the latest because the sample won’t compile without putting the line “#define _NO_CRT_STDIO_INLINE” in the main header before anything else due to a bug that the MS folks introduced in the latest update
  • You’re doing all of the packet introspection in the context of the filter; no passing data into userland, no IOCTL’s, nothing, everything entirely done in the kernel context. I plan on simply sniffing the packet header looking for the IP of a machine on my network and putting out a debug message when that hits, so as to minimize any performance impact from the introspection
  • You’re examining raw TCP and UDP packets, without any SSL decodes
  • You’re running in debug mode under the KMDF verifier

Are these assumptions accurate?

We want to monitor every IP and port, so adding port/IP filters is not gonna work

What are you going to do with this info in your NDIS driver? Unless we are speaking about requests on some port known in advance that is associated with a certain well-known service, the address/port combination alone may not always be sufficient to identify your target recipient/sender process, which means you need a WFP part of the solution anyway. This “upper” WFP part is going to relate the address/port combination to a particular process, which allows the “lower” NDIS filter to make actual use of this info.

In fact, the WFP part alone should normally suffice. Introducing an NDIS filter is nothing more than an optimisation that allows you to block the “undesirable” packets before they have even had a chance to reach the protocol stack, effectively improving performance. However, the opposite is not true - as long as you want to relate your target packets of interest to some particular process, the WFP-level part of the solution is an absolute must.

This is not gonna work, because in the case of many exploits, if you let one packet slide then the attacker all of a sudden has kernel code execution.

Well, if this is the case, it means that the target machine has already been compromised right at the kernel level, right? This, in turn, automatically implies that it is already too late to do anything about it. Just to give you an idea, what holds the KM attacker back from simply modifying the callback chain, effectively disabling your driver(s) and turning them into a piece of dead code that never actually gets executed, despite being physically loaded in RAM???

Anton Bassov

@brad_H said:

@MBond2 said:
There is another way to approach this problem though. Instead of scanning data in advance of forwarding it up, you can copy it and scan it after it has been forwarded. If anything of interest is found, then you can flag the connection and take whatever remedial action seems good to you (black holing and tar pit schemes are especially effective as they waste the resources of your attacker).

This is not gonna work, because in the case of many exploits, if you let one packet slide then the attacker all of a sudden has kernel code execution.

Umm … no. A packet is simply a chunk of data in a stream that is then assembled and acted upon by the recipient. What I think you’re thinking of is shellcode, which takes advantage of a program “mistake” like a buffer overflow or use-after-free to execute a stack pivot and begin calling gadgets; there’s a good starting point for GoogleFu to understand this here [https://securityimpact.net/2017/01/20/exploit-development-3-rop-buffer-overflow/]

What’s important to note about this (and it’s the only realistic way that an attacker can get kernel code execution due to the way that memory has execution protection these days) is that first the attacker has to spray the heap with the shellcode, then that program “mistake” has to be able to access that sprayed shellcode … neither of these is likely (or really possible, frankly) from one little packet flying around waiting to be assembled into a stream …

As a good reference, here’s an article that is describing just that [https://ivanitlearning.wordpress.com/2018/10/07/exploiting-buffer-overflow-to-run-shellcode-on-ftp-client/] … but note that this example is no longer possible on Win10, but it’s instructive nonetheless …

Can you monitor for a spray attempt? Yes (just remember that it’s hard to tell a spray from legitimate data). Can you tell if a data stream is going to attempt to pivot on a program “mistake” that is using the stream? Nope, that’s up to the receiving program, totally out of your domain. As @Anton mentioned, if you’ve got a vulnerable program and there’s an attack underway then the battle has already been effectively lost …

@craig_howard said:

@brad_H said:

@craig_howard said:
Hmm … OK, my interest has been piqued so I’m going to load up the WFP filter and see what’s up … so that we’re on the same sheet of music, could you verify some of my assumptions …

  • You’re using the “inspect” WFP sample on the github repo, which was updated two months ago
  • You’re using the latest VS2019 … which you’ll know is the latest because the sample won’t compile without putting the line “#define _NO_CRT_STDIO_INLINE” in the main header before anything else due to a bug that the MS folks introduced in the latest update
  • You’re doing all of the packet introspection in the context of the filter; no passing data into userland, no IOCTL’s, nothing, everything entirely done in the kernel context. I plan on simply sniffing the packet header looking for the IP of a machine on my network and putting out a debug message when that hits, so as to minimize any performance impact from the introspection
  • You’re examining raw TCP and UDP packets, without any SSL decodes
  • You’re running in debug mode under the KMDF verifier

Are these assumptions accurate?

The only inspect sample I found was this:
https://github.com/microsoft/Windows-driver-samples/tree/master/network/trans/inspect
which was updated 5 months ago, not 2. Are you talking about the same project?

I built it using VS2019, and I didn’t have to add _NO_CRT_STDIO_INLINE for it to compile; it compiled without any problem as far as I remember.
Then I tried three things:

  1. I removed the IP-based filter from the TLInspectAddFilter function (by setting the number of conditions to 0), so it monitors every IP and not just a single one. This causes the file transfer speed from network shares to go from 110MB/s to 60MB/s (without even implementing any packet inspection), tested on an i7-6700K in a Windows 10 x64 VM.

  2. Then I added a filter to only monitor TCP connections, hoping it would improve the performance, but there was no change; the file transfer speed is still reduced by 50-60MB/s…

  3. I removed the ALE callouts, thereby registering only the transport callouts; still no luck, and I got the same result.

Did you read the thread that was started by @NILC? It seems like he was experiencing the same issue with large file transfers using SMB, but I don’t understand how he solved it. He says “I am encapsulating packets and needed the ability to be able to create NBL chains in order to improve performance when dealing with large file transfers”. Hmm, encapsulating packets and creating NBL chains to solve this issue? What?

@anton_bassov said:

We want to monitor every IP and port, so adding port/IP filters is not gonna work

What are you going to do with this info in your NDIS driver? Unless we are speaking about requests on some port known in advance that is associated with a certain well-known service, the address/port combination alone may not always be sufficient to identify your target recipient/sender process, which means you need a WFP part of the solution anyway. This “upper” WFP part is going to relate the address/port combination to a particular process, which allows the “lower” NDIS filter to make actual use of this info.

But right now matching a packet to a process is not a high priority for me; my main concern right now is to implement signature matching with packet content inspection. Later on I might add the ability to match it to a process, but right now I’m mostly trying to solve the issue of speed reduction of large file transfers through shares.

Well, if this is the case, it means that the target machine has already been compromised right at the kernel level, right? This, in turn, automatically implies that it is already too late to do anything about it. Just to give you an idea, what holds the KM attacker back from simply modifying the callback chain, effectively disabling your driver(s) and turning them into a piece of dead code that never actually gets executed, despite being physically loaded in RAM???

Take CVE-2017-0144 for example: a remote attacker can gain kernel code execution by exploiting the SMB vulnerability, so if we let some SMB packets slide on a vulnerable machine and inspect them with a delay, then the shellcode has already executed in the kernel. So if we inspect it first, and let it through only if it is OK, then this attack would fail, but if we let it go through and inspect it later on, then it’s already too late.

But right now matching a packet to a process is not a high priority for me, my main concern right now is to implement signature matching with packet content inspection

???

What is the point of doing ANY packet filtering/inspection/etc on the target machine then???

Let’s face it - if you want your decisions to be based solely upon the address/port combination, then the most obvious solution seems to be moving the whole thing to the external gateway/router/firewall/etc. More on it below

I’m mostly trying to solve the issue of speed reduction of large file transfers through shares.

Not only is such a device more than likely to be optimised specifically for tasks of this kind, but it may even utilise special-purpose hardware acceleration designed for exactly this purpose…

Anton Bassov

@anton_bassov said:

But right now matching a packet to a process is not a high priority for me, my main concern right now is to implement signature matching with packet content inspection
Let’s face it - if you want your decisions to be based solely upon the address/port combination, then the most obvious solution seems to be moving the whole thing to the external gateway/router/firewall/etc. More on it below

I apologize Anton, I think I explained the problem poorly.
No, I want my decisions (block or allow) to be based solely on whether the packet content matches malicious signatures. So right now I don’t care which process is sending/receiving the packet, or what the src/dst port/IP is; I only care about inspecting packet contents to match them against signatures.
So, for example, if we know a particular exploit contains certain bytes in some of its packets, then we want to block it.
Yes, I agree that this can be implemented via an external firewall, etc., but for now let’s assume that we can only solve this by deploying a driver on endpoints. And if process information is also available (WFP) then obviously we will use it, but as I said, that’s another problem…

@brad_H said:

@anton_bassov said:

… snip …

Take CVE-2017-0144 for example: a remote attacker can gain kernel code execution by exploiting the SMB vulnerability, so if we let some SMB packets slide on a vulnerable machine and inspect them with a delay, then the shellcode has already executed in the kernel. So if we inspect it first, and let it through only if it is OK, then this attack would fail, but if we let it go through and inspect it later on, then it’s already too late.

Umm … no …

The SMB vulnerability was not a function of the data in the packet, it was a program “mistake” that was triggered by the contents of the packet … there’s a big difference between the two that it’s important to understand …

Remember, data in a packet is just that: data. How that data is used by the recipient program is entirely up to the recipient. In the FTP example I cited earlier, the FTP data itself was just that, data … it’s how the FTP program used that data that caused the problem, as with the SMB attack. The data itself was blameless, it was the recipient program that had the problem …

What this means for your packet scanning is that it’s not enough for you to simply scan for a data pattern (like the SMB packet), you also need to know that that packet is going to a recipient program which has a flaw when it gets that packet, which is basically impossible for you to know from the scanner …

If you’re going to be filtering out all packets which might cause problems for some recipient program then as @Anton said you’re writing a firewall, not a packet scanner. If you’re going to be filtering out packets which you know to be causing problems then you’re again writing a firewall …

The only real value in a packet scanner is actually for indicators of compromise in a command and control chain, and that’s going to be extremely tough to find as those are heavily obfuscated (and there are companies filled with entire buildings full of engineers working on that kind of scanning) …

Hmmm … and we’ve really veered far off into the weeds from the original question, so I’ll go back to my WFP experiment now … :slight_smile:

@craig_howard said:

@brad_H said:

@anton_bassov said:

The SMB vulnerability was not a function of the data in the packet, it was a program “mistake” that was triggered by the contents of the packet … there’s a big difference between the two that it’s important to understand …

I agree, but just assume that my goal is signature matching against packets that are being sent to my processes from remote machines, or packets that are sent by my processes, and that’s it.
So for a simpler case, let’s assume that a simple malware has specific bytes in its communication, and if I see them in any packet I want to block it. Obviously if I can also see which process is sending or receiving it then it’s a plus for me, but that’s not the main concern right now. And for now let’s not think about obfuscated packets and such; let’s just try to solve the simple scenario.

Basically the problem that i am trying to solve is just how to do packet scanning without reducing the speed of file transfer via SMB.
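
To make the “simple scenario” concrete, here is a minimal user-mode sketch in C of matching one byte signature against one packet payload. It assumes the payload is already available as a single flat buffer; in a real NDIS or WFP driver the data may be spread across an MDL chain, which this sketch does not model, and the function name is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the byte sequence `sig` occurs anywhere in `payload`.
 * Naive scan: fine for a sketch, but a real scanner matching many
 * signatures would want a multi-pattern algorithm instead. */
static bool payload_has_signature(const unsigned char *payload,
                                  size_t payload_len,
                                  const unsigned char *sig,
                                  size_t sig_len)
{
    assert(payload != NULL && sig != NULL);
    if (sig_len == 0 || sig_len > payload_len)
        return false;
    for (size_t i = 0; i + sig_len <= payload_len; i++) {
        /* Cheap first-byte check before the full comparison. */
        if (payload[i] == sig[0] && memcmp(payload + i, sig, sig_len) == 0)
            return true;
    }
    return false;
}
```

Note that a per-packet check like this misses a signature that is split across two TCP segments, which is one reason the per-packet cost is only part of the picture.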

Ah, got it! So in essence your question is not “how do I scan NDIS packets quickly”, it’s actually “how do I quickly scan SMB file transfer packets”

With that information, I would do some GoogleFu here and in general for WFP filtering of SMB traffic … just a few keystrokes here found this [https://community.osr.com/discussion/290695/wfp-callout-driver-layer2-filtering] for example, plus loads of others … that might give some insights on how to accomplish SMB packet scanning …

But I already mentioned the same thread that you linked several times in this thread, and I even asked you a question regarding it :smiley: :

Did you read the thread that was started by @NILC? It seems like he was experiencing the same issue with large file transfers using SMB, but I don’t understand how he solved it. He says “I am encapsulating packets and needed the ability to be able to create NBL chains in order to improve performance when dealing with large file transfers”. Hmm, encapsulating packets and creating NBL chains to solve this issue? What?

So I’m not sure what he means when he says he encapsulated the packets and created NBL chains to solve the SMB file transfer problem?!

Yes, I saw that thread and mentioned it several times in this thread:

Did you read the thread that was started by @NILC? It seems like he was experiencing the same issue with large file transfers using SMB, but I don’t understand how he solved it. He says “I am encapsulating packets and needed the ability to be able to create NBL chains in order to improve performance when dealing with large file transfers”. Hmm, encapsulating packets and creating NBL chains to solve this issue? What?

So what do you think? What does he mean when he says he encapsulated the packets and created NBL chains to solve SMB’s speed problem?!

NBL chains look tremendously complex from the very helpful codemachine link, and remember you’ll need to decode the NDIS 5.0 stuff as well … I’d really stick with the WFP “inspect” sample, it might be slow but at least it hides all that.

I’d also look at the “WFPSampler” as it contains a full firewall which is what you’re actually building …

this thread is full of a lot of nonsense and hyperbole - including some of my own :wink:

consider the axiom that it is impossible to ‘scan’ in any way any TCP stream without impacting the performance of the transfer. It is axiomatic and if you can’t get over that, then anything else is a waste of time

the next question you will ask is how to do that scanning in ‘the least impactful way’. Or in the way that is least deleterious to performance. However you frame it, there must always be a serious impact to performance of the transfer if any meaningful inspection is to be done. The design of the TCP protocol necessitates this and the impact will be the greatest if the inspection is done on one of the hosts involved in the connection as opposed to a network intermediary - this also is axiomatic although less obvious
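
One simplified way to see why TCP is sensitive to added latency (a rough model, not a claim about what any particular stack does): a sender can have at most one window of data in flight per round trip, so throughput is bounded by window / RTT, and any inspection delay that lengthens the effective round trip lowers that bound. The sketch below just evaluates the bound; the function name is hypothetical and the figures in the note after it are illustrative assumptions, not measurements from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on the throughput of a single TCP connection, in bytes
 * per second: at most one window of data per round trip.
 * `rtt_us` is the effective round-trip time in microseconds,
 * including any delay added by inspection. */
static uint64_t tcp_throughput_bound(uint64_t window_bytes, uint64_t rtt_us)
{
    assert(rtt_us > 0);
    return window_bytes * 1000000ULL / rtt_us;
}
```

For example, with a 64 KiB window the bound is 131,072,000 bytes/s at a 500 µs round trip and 65,536,000 bytes/s at 1000 µs, so in this model adding half a millisecond to each round trip halves the ceiling.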

the next question is about the efficacy of the inspection. If it involves only short portions of the TCP stream, protocol determination and content inspection will be of no value - this type of thing has been available for 20+ years even in the commodity market

what you should do is be like the engineers frustrated by the thermodynamics of ordinary computing - change the rules of the game. They postulated quantum computing, which is still a quanta out there I think, but you can provide nearly as effective asynchronous protection with your algorithm as you can synchronous protection, at a fraction of the cost in terms of latency and bandwidth. I can’t make your value judgement on this point, but it is vexatious that you can’t accept that there is one to be made
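
For readers trying to picture the asynchronous approach, here is a minimal single-threaded sketch in C of the copy-then-scan-later idea: the fast path only copies the payload into a queue and returns so the packet can be forwarded, and a separate slow path scans the copies and reports which connection to flag. Every name here (`scan_queue`, `enqueue_copy`, `scan_next`, the `conn_id` field) is hypothetical, and a real driver would also need locking and a worker thread, which are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_SLOTS 8
#define MAX_COPY    256

struct scan_item {
    unsigned      conn_id;         /* hypothetical connection identifier */
    size_t        len;
    unsigned char data[MAX_COPY];
};

struct scan_queue {
    struct scan_item items[QUEUE_SLOTS];
    size_t head, count;
};

/* Fast path: copy the payload and return at once so the packet can be
 * forwarded. Returns 0 if the queue is full (packet goes unscanned). */
static int enqueue_copy(struct scan_queue *q, unsigned conn_id,
                        const unsigned char *data, size_t len)
{
    assert(q != NULL && data != NULL);
    if (q->count == QUEUE_SLOTS)
        return 0;
    struct scan_item *it = &q->items[(q->head + q->count) % QUEUE_SLOTS];
    it->conn_id = conn_id;
    it->len = len < MAX_COPY ? len : MAX_COPY;   /* truncate long payloads */
    memcpy(it->data, data, it->len);
    q->count++;
    return 1;
}

/* Slow path: scan the oldest queued copy.
 * Returns -1 if the queue is empty, 0 if the copy is clean, and 1 if
 * the signature was found (the connection is stored in *flagged). */
static int scan_next(struct scan_queue *q, const unsigned char *sig,
                     size_t sig_len, unsigned *flagged)
{
    assert(q != NULL && sig != NULL && flagged != NULL && sig_len > 0);
    if (q->count == 0)
        return -1;
    struct scan_item *it = &q->items[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    for (size_t i = 0; i + sig_len <= it->len; i++) {
        if (memcmp(it->data + i, sig, sig_len) == 0) {
            *flagged = it->conn_id;
            return 1;
        }
    }
    return 0;
}
```

The trade-off is the one argued over in this thread: the transfer only pays for a copy, but by the time `scan_next` flags a connection the matching packet has already been delivered.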

[begin quote]

this thread is full of a lot of nonsense and hyperbole - including some of my own :wink:

what you should do is be like the engineers frustrated by the thermodynamics of ordinary computing - change the rules of the game. They postulated quantum computing, which is still a quanta out there I think, but you can provide nearly as effective asynchronous protection with your algorithm as you can synchronous protection, at a fraction of the cost in terms of latency and bandwidth. I can’t make your value judgement on this point, but it is vexatious that you can’t accept that there is one to be made

[end quote]

Don’t you find the combination of these two excerpts from the same post … ugh … let’s say, “hilarious”???

Anton Bassov

@MBond2 said:
consider the axiom that it is impossible to ‘scan’ in any way any TCP stream without impacting the performance of the transfer. It is axiomatic and if you can’t get over that, then anything else is a waste of time

I never said I expected zero impact on performance, but 60-70% is obviously a lot. I already solved the issue, though, and reduced the impact to 20% via a combination of direct I/O and NBL chaining. I opened another thread asking for thoughts on the effects of NBL chaining; if anyone is experienced with NDIS and knows exactly how indicating works, please share your thoughts:

https://community.osr.com/discussion/292969/using-ndisfindicatereceivenetbufferlists-for-every-packet-vs-chaining-them-all-together-to-receive#latest
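
As a rough illustration of why chaining can help, the user-mode C model below compares one indication call per packet against a single call that carries the whole chain. `fake_nbl` is a hypothetical stand-in that models only the link field of a NET_BUFFER_LIST, and `indicate_chain` is a stand-in for an indication such as NdisFIndicateReceiveNetBufferLists, which accepts a linked chain; nothing here calls the real NDIS API or measures real cost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a NET_BUFFER_LIST: only the link is modelled. */
struct fake_nbl {
    struct fake_nbl *next;
    int id;
};

struct stats {
    unsigned calls;    /* indication calls made (fixed per-call overhead) */
    unsigned packets;  /* packets delivered */
};

/* Stand-in for one indication up the stack: one call, any chain length. */
static void indicate_chain(struct fake_nbl *chain, struct stats *st)
{
    assert(st != NULL);
    st->calls++;
    for (struct fake_nbl *p = chain; p != NULL; p = p->next)
        st->packets++;
}

/* Link the n entries of `arr` into one chain and return its head. */
static struct fake_nbl *link_chain(struct fake_nbl *arr, size_t n)
{
    if (n == 0)
        return NULL;
    for (size_t i = 0; i + 1 < n; i++)
        arr[i].next = &arr[i + 1];
    arr[n - 1].next = NULL;
    return &arr[0];
}

/* Unbatched variant: every packet is indicated on its own. */
static void indicate_one_by_one(struct fake_nbl *arr, size_t n,
                                struct stats *st)
{
    for (size_t i = 0; i < n; i++) {
        arr[i].next = NULL;
        indicate_chain(&arr[i], st);
    }
}
```

For 4 packets the unbatched path makes 4 calls and the chained path makes 1, with the same 4 packets delivered either way. Whatever fixed cost each indication carries is paid once per chain instead of once per packet, which is consistent with the improvement reported above.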

@craig_howard said:
NBL chains look tremendously complex from the very helpful codemachine link, and remember you’ll need to decode the NDIS 5.0 stuff as well … I’d really stick with the WFP “inspect” sample, it might be slow but at least it hides all that.

Complexity is not a problem for me; if there’s something to learn then I’ll take the opportunity. And for this problem I don’t need to worry about NDIS 5.0; this solution is only for Vista+ (LWF).