I will create Endpoint security solution software

lastime1650 · December 8, 2023, 9:37am

I’m going to create endpoint security software. I’m going to develop a personal information solution or DLP solution through Windows kernel drivers. Any recommended APIs and ideas?

Tim_Roberts · December 8, 2023, 8:57pm

There are literally dozens and dozens of industrial-quality endpoint security packages. Do you really have anything to add that isn’t already being provided? The current products have had tens or hundreds of man-years of research and development effort. Wouldn’t it be more satisfying to go to work for Symantec or Cisco instead?

MBond2 · December 9, 2023, 1:37am

Assuming you mean Data Loss Prevention (DLP), the good ones are implemented in hardware. The software ones are mostly malware, IMHO.

Dejan_Maksimovic · December 9, 2023, 8:33am

More like slowmeware

lastime1650 · December 9, 2023, 12:14pm

Thank you for your comments. I was asking these questions to build a small portfolio while preparing for my first job! My plan is to gather information in kernel drivers using callback mechanisms such as “mini-filter”, “WFP”, “Notify”, and “WSK”, and pass it to a server, where it is processed with TensorFlow to build an AI model for detecting personal information leakage. I will develop it on my own and post the code and logic later. Before that, I will contribute to the community by posting several questions here! You are my benefactors! Thank you!!!

Dejan_Maksimovic · December 9, 2023, 1:14pm

Alone?

lastime1650 · December 9, 2023, 1:24pm

Yes!!! I am always alone, because I don’t have a mentor at my school. I think working on my own is enough, because I know how to study by myself.

Tim_Roberts · December 9, 2023, 6:37pm

I’m just acting as a devil’s advocate here, not intentionally to discourage you, but to point out pitfalls that may not have occurred to you.

How will you determine what is “personal information”? Name? Address? Credit cards? Whose? Does spelling matter?

Consider the huge performance implications of what you suggest. You’re going to intercept every outgoing network packet and put the kernel processing on hold while you ship that packet to a user-mode service, presumably in Python if you’re using TensorFlow, which will then have to run a model and send a signal BACK to kernel mode so the packet can continue. A busy network can handle tens of thousands of packets per second. Your computer will be so busy doing this checking that it will not have time for anything else.
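
To put rough numbers on that, here is a back-of-the-envelope sketch. The packet rate and the per-packet round-trip cost are assumed figures for illustration, not measurements, and `inspection_load` is just a name made up for this example.

```python
def inspection_load(packets_per_second: float, round_trip_seconds: float) -> float:
    """Seconds of blocking inspection work generated per second of traffic.

    A result above 1.0 means a single serialized inspection path cannot keep up.
    """
    return packets_per_second * round_trip_seconds


# Assumed figures, for illustration only:
# 20,000 packets/s and 100 microseconds per kernel -> user mode -> kernel round trip.
load = inspection_load(20_000, 100e-6)
print(f"inspection load: {load:.2f} seconds of work per second of traffic")
```

Any combination where the product exceeds 1.0 means packets queue up faster than they can be checked, and that is before the model itself has done any work.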

And you’ll pay that performance penalty for EVERY packet, not just for the extremely rare bad ones. It’s a lot like insurance, which you buy hoping you will never need it. I’m thinking the cost of your plan will FAR outweigh the benefits.

Now, it is perfectly reasonable to attempt a WFP filter on your own, but the pipeline to user mode is a seriously complicated task. Perhaps you should write something that does a simpler task (like looking for some variant of your name) that can be done entirely in kernel mode. That’s a task you could achieve on your own and have some success.
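
As a sketch of what "some variant of your name" could mean in practice, here is the matching logic in user-mode Python. A kernel filter would have to do the equivalent in C without a regex library; the function name, the separator set, and the three spellings covered are assumptions chosen for illustration.

```python
import re


def name_pattern(first: str, last: str) -> re.Pattern:
    """Build a case-insensitive byte pattern for a few spellings of one name."""
    f = re.escape(first.encode("ascii"))
    l = re.escape(last.encode("ascii"))
    initial = re.escape(first[:1].encode("ascii"))
    sep = rb"[\s._-]*"  # optional whitespace, dot, underscore or hyphen
    alternatives = [
        f + sep + l,                 # "Jane Doe", "jane_doe", "JaneDoe"
        l + rb",?" + sep + f,        # "Doe, Jane"
        initial + rb"\.?" + sep + l  # "J. Doe"
    ]
    return re.compile(b"|".join(alternatives), re.IGNORECASE)


pattern = name_pattern("Jane", "Doe")
payload = b"POST /form HTTP/1.1\r\n\r\nname=JANE_DOE&city=x"
print(bool(pattern.search(payload)))
```

Even this small case shows how quickly the list of "variants" grows, which is a good reason to start with a single hardcoded string.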

lastime1650 · December 9, 2023, 7:54pm

Thank you for all the comments!

First of all, I agree that the most important question is how to decide what counts as personal information.

So let me share the ideas I have for making that decision.

The Windows kernel driver stores key data entered on the keyboard at regular intervals in a buffer and sends it to the server. It then uses AI to detect personal information (credit card, social security number, phone number, etc.) based on the string.
(Pending) For “image” files detected by the mini-filter, send the image file to the server (Reason: for the detection of credit cards, ID cards, etc.)
Use “OCR” detection to extract strings from photographs and learn classification based on strings (Softmax).
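
For the credit card case in particular, a fixed check can serve as a first pass before any model is involved. A minimal sketch, assuming card numbers appear as 13 to 19 ASCII digits optionally separated by spaces or hyphens; the Luhn checksum is the standard check digit scheme for card numbers, while the function names and the candidate pattern are only illustrative.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in `text` that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


print(find_card_numbers("card 4111 1111 1111 1111, phone 555-0100"))
```

The checksum only filters out random digit runs. It says nothing about whose number it is or whether sending it is a leak.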

Next, about the TCP connection setup between the kernel driver and Python (kernel<->Python):

The data sent from the kernel driver (the client) is passed to a Python server. Python then uses the “struct” module to split the data into fixed-size fields and interpret them, and processes the parsed data. It creates threads so that this processing is asynchronous. (The server takes on the load so that clients wait less for responses.) After that, the connection stays open.
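
A minimal sketch of the parsing side of that idea. The wire format here (a 2-byte event type and a 4-byte payload length, little-endian, followed by the payload) is a hypothetical layout invented for this example, not an existing protocol; `pack_event` and `unpack_events` are likewise illustrative names.

```python
import struct

# Hypothetical header (an assumption): event type (uint16) + payload length (uint32),
# little-endian with no padding, followed by `length` bytes of payload.
HEADER = struct.Struct("<HI")


def pack_event(event_type: int, payload: bytes) -> bytes:
    """Serialize one event the way the driver side is assumed to send it."""
    return HEADER.pack(event_type, len(payload)) + payload


def unpack_events(buffer: bytes):
    """Return (events, leftover): complete events plus any trailing partial bytes."""
    events = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        event_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet; wait for more data
        events.append((event_type, buffer[offset + HEADER.size:end]))
        offset = end
    return events, buffer[offset:]


stream = pack_event(1, b"abc") + pack_event(2, b"hello")
events, leftover = unpack_events(stream[:-2])  # simulate a short TCP read
print(events, leftover)
```

TCP delivers a byte stream, not messages, so the server has to keep the leftover bytes and prepend them to the next read. Fixed-size slicing alone will not do that.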

That is as far as I have thought it through for now.

MBond2 · December 10, 2023, 12:44am

With respect, I think you are quite naïve in assessing the difficulty of this task. ‘AI’ won’t help if your algorithm can’t understand the format of the data. Something as simple as using a character encoding like EBCDIC probably defeats your detection. And what if I use a custom one that is not a published standard? And that’s only an obfuscation technique - never mind actual encryption. And what if the data in question wasn’t entered via the keyboard? And for that matter which keyboard?
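
The encoding point is easy to demonstrate. This sketch uses cp037, one of the EBCDIC code pages that ships with Python, and a deliberately simple ASCII digit pattern standing in for the detector; both choices are assumptions made for illustration.

```python
import re

# Stand-in detector: any run of four ASCII digits.
ASCII_DIGIT_RUN = re.compile(rb"[0-9]{4}")

text = "4111 1111 1111 1111"
as_ascii = text.encode("ascii")
as_ebcdic = text.encode("cp037")  # same characters, EBCDIC byte values

found_in_ascii = ASCII_DIGIT_RUN.search(as_ascii) is not None
found_in_ebcdic = ASCII_DIGIT_RUN.search(as_ebcdic) is not None
print(found_in_ascii, found_in_ebcdic)
```

The pattern finds the digits in the ASCII bytes and misses them in the EBCDIC bytes, even though both buffers decode to the same string.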

Remember that all so called AI algorithms are not intelligent at all. They are dynamic heuristic detection and application algorithms. At best they can implement the ‘fool me once shame on you, fool me twice shame on me’ level of detecting anything. But if the algorithm only speaks English, and I send the message in Viking runes, it won’t find any patterns at all.

Once you consider the feasibility problems, then think about the performance issues. For any practical system, the performance must be ‘reasonable’ and commensurate with the level of protection provided. If not, users will uninstall, disable or bypass your code. All OCR, AI and other heuristic analysis is time consuming - and more importantly non-deterministically time consuming. Most network traffic is TCP based (although that is changing with QUIC and other UDP based protocols) and TCP throughput suffers greatly from high latency (and more so from latency inconsistency) even when there is plenty of bandwidth available.
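
The latency effect follows from a simple bound: a single TCP connection can have at most one window of data in flight per round trip, so throughput cannot exceed window size divided by RTT. A rough sketch, where the 64 KiB window (no window scaling) and the RTT values are assumed figures:

```python
def max_tcp_throughput_bytes_per_s(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP connection: at most one window in flight per RTT."""
    return window_bytes / rtt_seconds


WINDOW = 64 * 1024  # assumed 64 KiB window, no window scaling

for rtt_ms in (1, 10, 50):  # assumed round-trip times in milliseconds
    bound = max_tcp_throughput_bytes_per_s(WINDOW, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>2} ms -> at most {bound * 8 / 1e6:.1f} Mbit/s")
```

Every millisecond an inspection step adds to the round trip lowers this ceiling, no matter how fast the link is.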

Tim_Roberts · December 10, 2023, 7:11am

> The Windows kernel driver stores key data entered on the keyboard at regular intervals in a buffer and sends it to the server. It then uses AI to detect personal information (credit card, social security number, phone number, etc.) based on the string.

Not a chance. How do you think you are going to intercept keyboard data? That’s simply not available to you. And what about people that handle mailing lists that have to type in numbers all the time? No, the ONLY possible way you could do that is to have a UI that lets you register the exact personal data you want to look for. You can’t determine this automatically.

> For “image” files detected by the mini-filter, send the image file to the server (Reason: for the detection of credit cards, ID cards, etc.) Use “OCR” detection to extract strings from photographs and learn classification based on strings (Softmax).

Again, not a chance. Images get split across many packets, and even if you could reconstruct the image, OCR is not that good, and you most certainly do not have the time to do OCR.

If you intend to do a proof-of-concept product, then start by doing a kernel filter that looks for hardcoded text. That in itself will be quite an accomplishment. You can then try to get a user-mode service to interact with that.
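
Even the hardcoded-text version has one wrinkle worth planning for: the text can straddle two packets. Below is a sketch of the scanning logic in user-mode Python, only to show the idea. A real filter would implement the same carry-over in C inside its callback, and the class name and the pattern used are made up for this example.

```python
class StreamScanner:
    """Find a fixed byte pattern in data that arrives in arbitrary chunks."""

    def __init__(self, pattern: bytes):
        if not pattern:
            raise ValueError("pattern must not be empty")
        self.pattern = pattern
        self.tail = b""  # last len(pattern) - 1 bytes seen so far

    def feed(self, chunk: bytes) -> bool:
        """Return True if the pattern ends somewhere inside this chunk."""
        window = self.tail + chunk
        found = self.pattern in window
        keep = len(self.pattern) - 1
        # Keep just enough bytes to complete a match that starts in this chunk.
        self.tail = window[-keep:] if keep else b""
        return found


scanner = StreamScanner(b"SECRET")
for chunk in (b"xxSEC", b"RETyy", b"zz"):
    print(chunk, scanner.feed(chunk))
```

Because the carried tail is always shorter than the pattern, a match is reported once, on the chunk that completes it. State like this has to be kept per connection, which is one more thing the driver must manage.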

Dejan_Maksimovic · December 10, 2023, 10:42am

Are you so unaware of the capabilities of computing, physics and maths?
Either way, discussing this in technical terms is pointless.

lastime1650 · December 10, 2023, 10:54am

Thank you so much for your interest. First of all, I’m going to try anything. I’m also blogging and recording every task, so I expect to learn a lot from the experience of failing! I know this project is too broad in scope. The minimum goal is to at least implement an agent-based security control system! I love you all (for real, 99999/100%)!