Registry Filtering Using CmRegisterCallbackEx and Usermode Notification

ChrisG · September 21, 2016, 11:17am

I need to be able to monitor all registry operations from the kernel and notify a usermode application when they occur along with details about each operation. For example, any RegNtSetValueKey operation would need to report back the process that initiated the call to RegSetValue(Ex), the root key and value name, the type of data being set, and the actual data set.

I’m able to achieve this using CmRegisterCallbackEx and processing the data in my callback function, but my real question is: what is the best way to get the data back to the usermode application? My main concern is avoiding heavy kernel memory usage for event data that is waiting to be retrieved by the usermode application.

My current method is to store the event data from the callback temporarily in a sequenced singly linked list and then signal a synchronization event object, which wakes a blocked thread in my usermode application; that thread then issues an IOCTL to retrieve the event data from the list. I am doing this for lack of knowledge of a more appropriate method, and wanted to know what the best method is for sharing this kind of event data.

The registry is quite spammy, especially when it comes to RegNtQueryValueKey operations, and there are obvious problems with storing potentially large data in a list while waiting for another component to retrieve it so it can be freed. I had considered filtering in my driver for only the specific operations I am interested in, but I’d rather leave that to the usermode application, which means I basically have a firehose of data being sent up from the kernel.

Peter_Viscarola_OSR · September 21, 2016, 11:32am

These “help me with the architecture and design of my solution” questions are always difficult. So much depends on the details of what you want to achieve. And answering a question on a list – even a good list like this one – isn’t likely to get you the level of carefully considered analysis and opinion that such questions deserve. In an actual consulting assignment, I’d want to spend at least an afternoon considering what you want to accomplish, what the constraints are, and discussing the various trade-offs involved. That’s engineering. Here, you’re going to get “off the top of my head” answers – which may be right, wrong, or likely somewhere in between. That’s not really engineering. It’s random data, often no more than simple bloviating. Remember: Everybody has an opinion.

So, having said that… what we usually do in cases like what you describe is create small “easy as possible to build in kernel-mode” records for each event. We have the user app send us up (multiple, preferably) big buffers (1MB or more, depending on the required latency) using Direct I/O. We have the kernel-mode driver store the data records directly into the user buffer, packing as many records as will fit. We complete the Direct I/O when the buffer is full or a maximum latency timer expires, whichever comes first.
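To make the record-packing idea concrete, here is a minimal sketch in portable C of appending variable-length records to one large buffer. The header layout and the names `event_header`, `event_buffer`, and `event_buffer_append` are hypothetical, not taken from any post in this thread. In a real driver the buffer would be the system-space mapping of the app's Direct I/O buffer (for example via MmGetSystemAddressForMdlSafe on the IRP's MDL), and the append would need a lock because registry callbacks run concurrently; neither is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header written in front of each variable-length payload. */
typedef struct {
    uint32_t total_size;   /* header + payload, rounded up to 8 bytes */
    uint32_t op;           /* operation code, e.g. a REG_NOTIFY_CLASS value */
    uint32_t pid;          /* process that issued the registry call */
    uint32_t payload_size; /* bytes of payload that follow the header */
} event_header;

/* State for the one buffer currently being filled (hypothetical). */
typedef struct {
    uint8_t *base;     /* start of the mapped user buffer */
    size_t   capacity; /* size of that buffer in bytes */
    size_t   used;     /* bytes already occupied by packed records */
} event_buffer;

#define ALIGN8(n) (((n) + 7u) & ~(size_t)7u)

/* Appends one record. Returns 1 on success, or 0 if the record does not fit,
   in which case the caller would complete this buffer and move to the next. */
int event_buffer_append(event_buffer *b, uint32_t op, uint32_t pid,
                        const void *payload, uint32_t payload_size)
{
    assert(b->used <= b->capacity);
    if (payload_size > UINT32_MAX - 32u)
        return 0; /* keeps total_size representable in 32 bits */

    size_t need = ALIGN8(sizeof(event_header) + (size_t)payload_size);
    if (need > b->capacity - b->used)
        return 0;

    event_header h = { (uint32_t)need, op, pid, payload_size };
    uint8_t *dst = b->base + b->used;
    memcpy(dst, &h, sizeof h);
    if (payload_size != 0)
        memcpy(dst + sizeof h, payload, payload_size);
    /* Zero the alignment padding so no stale bytes are handed to user mode. */
    memset(dst + sizeof h + payload_size, 0, need - sizeof h - payload_size);

    b->used += need;
    return 1;
}
```

When `event_buffer_append` returns 0, or when the latency timer fires with `used` greater than zero, the driver would complete the pending request with `used` as the byte count and continue with the next buffer the app has posted.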

This places the vast majority of the interpretation of the data records and such into user mode, where Gxd intends it to be. You know, maybe you want the ultimate event log to be in XML… but you sure as hell don’t want to be formatting XML data in your driver. The user-mode app is also responsible for ensuring that it posts enough buffers so that the kernel-mode side doesn’t run out and start dropping events.
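On the user-mode side, interpreting a completed buffer then amounts to walking the packed records. The sketch below assumes a hypothetical 16-byte header in front of each record (`event_header`, with a `total_size` that includes padding); the layout and the name `next_event` are illustrative assumptions, not something specified in this thread. It validates each header before trusting it, so a truncated or corrupt buffer ends the walk instead of reading out of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; must match whatever the driver writes. */
typedef struct {
    uint32_t total_size;   /* header + payload + padding */
    uint32_t op;           /* operation code */
    uint32_t pid;          /* originating process */
    uint32_t payload_size; /* bytes of payload after the header */
} event_header;

/* Reads the record at *off from a buffer of len valid bytes.
   On success fills *h and *payload, advances *off, and returns 1.
   Returns 0 at the end of the data or on a malformed record. */
int next_event(const uint8_t *buf, size_t len, size_t *off,
               event_header *h, const uint8_t **payload)
{
    assert(*off <= len);
    if (len - *off < sizeof *h)
        return 0;

    /* Copy the header out rather than casting, to avoid alignment issues. */
    memcpy(h, buf + *off, sizeof *h);

    if (h->total_size < sizeof *h || h->total_size > len - *off)
        return 0; /* record claims to extend past the valid data */
    if (h->payload_size > h->total_size - sizeof *h)
        return 0; /* payload would not fit inside the record */

    *payload = buf + *off + sizeof *h;
    *off += h->total_size;
    return 1;
}
```

A caller would loop on `next_event` with `len` set to the byte count returned for the completed request, and do all formatting (XML or otherwise) inside that loop.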

So… there’s one approach for you to consider.

Peter
OSR
@OSRDrivers

OSR_Community_User · September 21, 2016, 12:11pm

Store only a limited number of events in memory and the remaining on disk when possible.

Tim_Roberts · September 21, 2016, 12:28pm

xxxxx@protonmail.com wrote:

> I’m able to achieve this using CmRegisterCallbackEx and processing the data in my callback function, but my real question is: what is the best way to get the data back to the usermode application? My main concern is avoiding heavy kernel memory usage for event data that is waiting to be retrieved by the usermode application.
>
> My current method is to store the event data from the callback temporarily in a sequenced singly linked list and then signal a synchronization event object, which wakes a blocked thread in my usermode application; that thread then issues an IOCTL to retrieve the event data from the list.

Let me second Peter’s suggestion. There’s no need to have two separate operations here (that is, “wait for notification” and “fetch the data”). Do them both at once. Have the app send down a couple of ioctls with empty buffers and wait for them. Now, your kernel driver can copy into user memory instead of kernel memory, and as soon as you complete the request, the user mode thread will already have the data.

--
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

ChrisG · September 21, 2016, 12:41pm

Thanks for that info. I didn’t want to frame the question in a “help me with architecture and design” way; I was just trying to get a little more insight into how people traditionally solve these problems. The only information I usually find from Microsoft or elsewhere about sharing data from kernel -> user involves shared event objects for signaling plus IOCTLs, so I tried to bend that as best I could to accomplish what I needed. It just felt like there was a better way that I wasn’t aware of.

Peter, for your approach it sounds like you are doing something similar to what is suggested here: https://support.microsoft.com/en-us/kb/191840 under the Shared Memory Object Method. I’ll take a look at implementing something like that as it sounds like the better of the two.

It’s funny you mentioned you don’t want to be formatting XML in the driver. It reminded me of some bugs I saw recently with Symantec Endpoint Protection where they were implementing their file unpackers and PE file parsing in their kernel driver. Clearly that turned out to be a pretty bad decision for them.

Peter_Viscarola_OSR · September 21, 2016, 12:56pm

No, no, no, no, no, a thousand times NO. Please.

Just have the user app allocate the memory, using VirtualAlloc. Then have the user app send up an IOCTL that’s specified with Direct I/O. It’ll take you, oh, all of 4 minutes to code and then you’ll be done. Close to zero chance for bugs.
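A rough user-mode sketch of that pattern follows: allocate the buffers with VirtualAlloc, then keep a couple of overlapped IOCTLs pending so the driver always has somewhere to write. The control code `IOCTL_REGMON_GET_EVENTS`, the function `pump_events`, the buffer count and size, and the `consume` callback are all hypothetical names and values chosen for illustration. METHOD_OUT_DIRECT is used here as one way to get Direct I/O on the output buffer. The non-Windows branch only mirrors the winioctl.h constants so the control code value can be inspected elsewhere; the I/O loop itself is Windows-only.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#include <winioctl.h>
#else
/* Stand-ins mirroring winioctl.h, only for inspecting the code off Windows. */
#define FILE_DEVICE_UNKNOWN 0x00000022
#define METHOD_OUT_DIRECT   2
#define FILE_READ_ACCESS    0x0001
#define CTL_CODE(dev, fn, method, access) \
    (((dev) << 16) | ((access) << 14) | ((fn) << 2) | (method))
#endif

/* Hypothetical control code; function numbers from 0x800 are for vendor use. */
#define IOCTL_REGMON_GET_EVENTS \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_OUT_DIRECT, FILE_READ_ACCESS)

enum { BUF_COUNT = 2, BUF_SIZE = 1024 * 1024 }; /* assumed values */

#ifdef _WIN32
/* Posts one overlapped request; returns 1 if it is pending or already done. */
static int post_buffer(HANDLE dev, void *buf, OVERLAPPED *ov)
{
    if (DeviceIoControl(dev, IOCTL_REGMON_GET_EVENTS, NULL, 0,
                        buf, BUF_SIZE, NULL, ov))
        return 1;
    return GetLastError() == ERROR_IO_PENDING;
}

/* Keeps BUF_COUNT requests pending and hands each completed buffer to
   consume(). Returns 0 when the loop ends, -1 on a setup failure. Freeing
   the buffers and events is omitted for brevity. */
int pump_events(const wchar_t *path, void (*consume)(const void *, DWORD))
{
    HANDLE dev = CreateFileW(path, GENERIC_READ, 0, NULL, OPEN_EXISTING,
                             FILE_FLAG_OVERLAPPED, NULL);
    if (dev == INVALID_HANDLE_VALUE)
        return -1;

    void *buf[BUF_COUNT];
    HANDLE ev[BUF_COUNT];
    OVERLAPPED ov[BUF_COUNT];
    for (int i = 0; i < BUF_COUNT; i++) {
        buf[i] = VirtualAlloc(NULL, BUF_SIZE, MEM_COMMIT | MEM_RESERVE,
                              PAGE_READWRITE);
        ev[i] = CreateEventW(NULL, TRUE, FALSE, NULL);
        if (buf[i] == NULL || ev[i] == NULL) { CloseHandle(dev); return -1; }
        ZeroMemory(&ov[i], sizeof ov[i]);
        ov[i].hEvent = ev[i];
        if (!post_buffer(dev, buf[i], &ov[i])) { CloseHandle(dev); return -1; }
    }

    for (;;) {
        DWORD w = WaitForMultipleObjects(BUF_COUNT, ev, FALSE, INFINITE);
        if (w >= WAIT_OBJECT_0 + (DWORD)BUF_COUNT)
            break;                         /* wait failed */
        DWORD i = w - WAIT_OBJECT_0, got = 0;
        if (!GetOverlappedResult(dev, &ov[i], &got, FALSE))
            break;                         /* request failed or was cancelled */
        consume(buf[i], got);              /* parse the records in user mode */
        if (!post_buffer(dev, buf[i], &ov[i]))
            break;                         /* hand the buffer straight back */
    }
    CancelIo(dev);
    CloseHandle(dev);
    return 0;
}
#endif
```

Because at least one other request is still pending while `consume` runs, the driver has a buffer to fill in the meantime; posting more or larger buffers trades memory for headroom against dropped events.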

There’s absolutely, positively, no reason to get fancy and complicated. There’s nothing to be gained.

That KB, by the way, (KB 191840) is fundamentally correct… but so concise that it’s easily misleading.

Peter
OSR
@OSRDrivers

ChrisG · September 21, 2016, 2:42pm

But Microsoft says “This method is simple and easy”. Anyway, thanks for the clarification and guidance.