Windows System Software -- Consulting, Training, Development -- Unique Expertise, Guaranteed Results
The free OSR Learning Library has more than 50 articles on a wide variety of topics about writing and debugging device drivers and Minifilters. From introductory level to advanced. All the articles have been recently reviewed and updated, and are written using the clear and definitive style you've come to expect from OSR over the years.
Check out The OSR Learning Library at: https://www.osr.com/osr-learning-library/
We have a file protection minifilter that protects some files from getting written on. The problem is that on very few systems that have both kasperseky and our minifilter, sometimes the system "freezes", but its not a full freeze, mouse signals seems to still work because they can move the mouse, but thats the only thing that works, and for example cntrl+alt+delete doesnt work to bring up task manager. This happens both in Win7 and Win10 machines, And if they unload our driver, the machines no longer freeze.
The problem is that when the machine freezes like this, we can only move the mouse so the output of dbgview or procmon cannot be captured anymore.
So we set up the required keys in the kbdhid service's registry to force BSOD, and indeed that one worked and we managed to get a full kernel dump to work it and find out what is happening.
So I thought i ask the experts here on how should i go on about finding the root cause of this freeze? What windbg commands do you guys suggest i should use to find what is causing this?
Upcoming OSR Seminars | ||
---|---|---|
OSR has suspended in-person seminars due to the Covid-19 outbreak. But, don't miss your training! Attend via the internet instead! | ||
Kernel Debugging | 30 January 2023 | Live, Online |
Developing Minifilters | 20 March 2023 | Live, Online |
Internals & Software Drivers | 17 April 2023 | Live, Online |
Writing WDF Drivers | 22 May 2023 | Live, Online |
Comments
My go to behavior is pretty "brute force and ignorance" but FWIW:
.logappend
to a file (the next two commands product a lot of output will be long)!locks
!process 0 7
(this will take a fair chunk of time)Then open the log file in an editor where it’s easy to search.
9 times out of 10 the output of
!locks
will lead you to a deadlock then its just a matter of working out what you are holding that Kaspersky is taking umbrage to.From your description it feels like something is blocked while holding the system volume locked - all the other threads will be backed up behind that - it's a matter of working out why that thread isn't moving
If
!locks
doesn't help you are going to have to spelunk through all the threads (the output of the!process
command) looking for candidates for a deadlock -look for threads with your driver on the stack or with kaspersky on the stack.!stacks 2 <yourdrvier.sys>
just lists the threads where you are on the stack and can be helpful but I find that I usually need to be able to peruse all the threads at some stage and so doing it up front saves me time in the long run.Hi Rod, thank you for the reply.
I am looking through the output of the commands you suggested as i am writing this, but as a side note, i found our driver and kaspersky's klfle in the stack of one of the cpu cores, although i am not sure whether its related to the problem or not, but in this stack we are calling FltCreateFile and then kaspersky is calling FltGetFileNameInformation, probably on the filename that we just tried to open. And we are opening the file using FILE_SHARE_READ/WRITE/DELETE and also with the READ+WRITE+SYNCHRONIZE access and FILE_SYNCHRONOUS_IO_NONALERT flag, and the file is a local file and not a network file.
output of locks:
Looking at the output of !locks, it seems like there are no deadlocks, and looking through the output of !process 0 7, this is the only thread that i can find both our driver and kaspersky in the stack:
Interestingly, i just remembered a while ago we had a BSOD issue with another AV (manual BSOD, not a Windows BSOD), that also had a similar call stack:
So in both cases, this starts in our preclose callback in the minifilter when we try to open a handle to the backup of file, using the name of the backup file that we stored on the context (and the name is correct), and in both cases, the target AV is calling FltGetFileNameInformation on that file after we use the FltCreateFile, anyone has any idea what is happening? I explained the flags that we pass to FltCreateFile in the above post for anyone interested.
What is thread ffffb408e3bbc080 doing?
-scott
OSR
This is a chrome.exe thread, and it seems like its doing some stuff one of its tmp files in appdata, and its kernel time seem to be relatively high compared to other threads:
This is its callstack:
And after further investigating its IRP list and their corresponding FILE_OBJECT, it seems like all related to some tmp files in its cachestore, and all the IRPs (close,cleanup,create for \FileSystem\FltMgr) are pending?
or Cleanup was always involved, the other IO could be any.
There is always a deadlock, just not visible with !locks sometimes. Not our
locks, NTFS locls on the file.
It looks like some combo of file IO can cause it, but I also saw it with
some AVs only, never with just our filter. Iirc, Avast and Symantec SEP.
Not directly helpful, but you are not alone.
We moved some IO in the end, we had no other choice.
Dejan.
What status is NTFS raising there?
And I can tell you where to go from there...
Any chance you can put the dump somewhere? It would be faster than trying to do it over the forum.
-scott
OSR
I asked the customer and unfortunately we are not allowed to share the dump file.
So it seems like the status is 0xC0000005 = STATUS_ACCESS_VIOLATION:
(based on the fact that the EDI is the status value and its content and the rbx+10h both are 0xC0000005)
Rdi is 80000005 there and that is a normal error for Ntfs to cause an
exception on (stupid, but it does).
It's buffer overflow, quite common code for an Information Query request.
I apologize, indeed it is a buffer overflow error (0x80000005), i thought rdi was the classic 0xC0000005, seems like I'm going blind.
So can this be happening because of some sort of loop in the minifilter stack? How can i find who's fault is it? We are following the rules and using FltCreate instead of NtCreate, thus the request should only go below us and not loop back, and i don't see anyone not following the rule on the call stack.
So do you think this is because of windows running out of kernel stack limit? How can i check if this is what's causing it? Could this error even be related to why the system is freezing?
I might also try to reduce our altitude below the AV and see if that helps or not.
call (both file info query and directory listing query).
I don't see how it relates to a stack limit?