Upper filter not loading on Win10 RS2 (version 1703)

Jamey_Kirby · July 1, 2017, 7:38pm

Not yet. I have to setup a test/debug environment for this situation. I am
reporting on initial dump file analysis and reports from test department.
I’ll know more after the weekend.

On Sat, Jul 1, 2017 at 2:47 PM hgfhhgf jhgfgbbgf
wrote:

> > The dump file indicates a spinlock might have been held too long, and
> the call stack shows it is when OurDriver is calling PsCreateSystemThread.
>
> PsCreateSystemThread.is a PASSIVE_LEVEL routine. Don’t you have a code
> analysis warning ? You can’t call this routine while a spin-lock is held.
> — NTDEV is sponsored by OSR Visit the list online at: MONTHLY seminars
> on crash dump analysis, WDF, Windows internals and software drivers!
> Details at To unsubscribe, visit the List Server section of OSR Online at

Jamey_Kirby · July 1, 2017, 7:38pm

I’m not calling it with a spinlock help. That is what is odd.

On Sat, Jul 1, 2017 at 7:37 PM Jamey Kirby wrote:

> Not yet. I have to setup a test/debug environment for this situation. I am
> reporting on initial dump file analysis and reports from test department.
> I’ll know more after the weekend.
>
> On Sat, Jul 1, 2017 at 2:47 PM hgfhhgf jhgfgbbgf
> wrote:
>
>> > The dump file indicates a spinlock might have been held too long, and
>> the call stack shows it is when OurDriver is calling PsCreateSystemThread.
>>
>> PsCreateSystemThread.is a PASSIVE_LEVEL routine. Don’t you have a code
>> analysis warning ? You can’t call this routine while a spin-lock is held.
>> — NTDEV is sponsored by OSR Visit the list online at: MONTHLY seminars
>> on crash dump analysis, WDF, Windows internals and software drivers!
>> Details at To unsubscribe, visit the List Server section of OSR Online
>> at
>
>

Javier_Guerrero · July 4, 2017, 4:29am

Hi again,

So I have been investigating this issue, making all sort of changes, removing unnecessary operations from my filter, until having the minimum stuff for attaching and detaching, but the problem is still happening.

My driver is based on the WDM Toaster class upper filter included in WDK 7, so I removed my filter and put that sample (CLSUPPER.SYS) instead, and the issue is also reproduced with that driver.

Note that, just in case, I also tried modifying the WDK sample so, during IRP_MN_REMOVE_DEVICE, the call to IoReleaseRemoveLockAndWait would be made AFTER passing the irp down with IoCallDriver, but the problem still occurs.

So are you aware of any bug in the WDM version for the Toaster upper filter sample? Any wrong behavior arising only in Win10 Rs2?

I know the recommended way of filtering would be KMDF but at this moment I can’t afford porting my driver…any thoughts?

Thanks in advance!

Javier_Guerrero · July 4, 2017, 5:07am

FWIW, the problem goes away if I use Verifier on the filter…sounds like a race condition, but where?

Thanks!

Javier_Guerrero · July 4, 2017, 7:37am

Since Toaster WDM filter is also leaking a reference, I have tried this: put the additional call to ObDereferenceObject, during IRP_MN_REMOVE_DEVICE, after detachment and device deletion, and then right after that I break with debugger and then put a memory breakpoint, on the object reference count pointer (it is 0x18 bytes before DEVICE_OBJECT ptr) like this:

ba w4 0x82d52e88 “k;g”

After that, I can see these 2 stacks:

ChildEBP RetAddr

00 8b7d7968 8132447f nt!ObfDereferenceObjectWithTag+0x25
01 8b7d79a4 814b5423 nt!IopSynchronousCall+0xcb
02 8b7d7a00 811ea11f nt!IopRemoveDevice+0xe3
03 8b7d7a2c 81499c56 nt!PnpRemoveLockedDeviceNode+0x1f5
04 8b7d7a40 81499d6d nt!PnpDeleteLockedDeviceNode+0x63
05 8b7d7a80 814997ea nt!PnpDeleteLockedDeviceNodes+0xb9
06 8b7d7ab0 81499bdd nt!PipRemoveDevicesInRelationList+0x61
07 8b7d7acc 81499b49 nt!PnpDelayedRemoveWorker+0x6c
08 8b7d7ae8 811ed1f7 nt!PnpChainDereferenceComplete+0xe4
09 8b7d7b14 814ad8e9 nt!PnpIsChainDereferenced+0xcc
0a 8b7d7b88 8142abc3 nt!PnpProcessQueryRemoveAndEject+0x80c
0b 8b7d7ba8 8138be9a nt!PnpProcessTargetDeviceEvent+0xf6c31
0c 8b7d7be8 810de216 nt!PnpDeviceEventWorker+0x1f8
0d 8b7d7c38 8110bbec nt!ExpWorkerThread+0xc0
0e 8b7d7c70 81186cf5 nt!PspSystemThreadStartup+0x4a
0f 8b7d7c7c 00000000 nt!KiThreadStartup+0x15

ChildEBP RetAddr

00 8b7d79f8 810c4ece nt!ObfDereferenceObjectWithTag+0x25
01 8b7d7a00 811ea1b3 nt!ObfDereferenceObject+0x12
02 8b7d7a2c 81499c56 nt!PnpRemoveLockedDeviceNode+0x289
03 8b7d7a40 81499d6d nt!PnpDeleteLockedDeviceNode+0x63
04 8b7d7a80 814997ea nt!PnpDeleteLockedDeviceNodes+0xb9
05 8b7d7ab0 81499bdd nt!PipRemoveDevicesInRelationList+0x61
06 8b7d7acc 81499b49 nt!PnpDelayedRemoveWorker+0x6c
07 8b7d7ae8 811ed1f7 nt!PnpChainDereferenceComplete+0xe4
08 8b7d7b14 814ad8e9 nt!PnpIsChainDereferenced+0xcc
09 8b7d7b88 8142abc3 nt!PnpProcessQueryRemoveAndEject+0x80c
0a 8b7d7ba8 8138be9a nt!PnpProcessTargetDeviceEvent+0xf6c31
0b 8b7d7be8 810de216 nt!PnpDeviceEventWorker+0x1f8
0c 8b7d7c38 8110bbec nt!ExpWorkerThread+0xc0
0d 8b7d7c70 81186cf5 nt!PspSystemThreadStartup+0x4a
0e 8b7d7c7c 00000000 nt!KiThreadStartup+0x15

But I’m not sure I understand the meaning of this, because it’s the system who is touching the count…any ideas?

OSR_Community_User · July 4, 2017, 11:17am

Can you unplug and then replug the usb stick when your filter isn’t loaded ?

Can you unplug and then replug the usb stick with a pass-through filter ?

Have you read this ?

https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/object-reference-tracing

Javier_Guerrero · July 4, 2017, 2:40pm

Yes, I can unplug and then replug the usb stick with my filter unloaded, no problem.

With the Toaster passthru filter, I can unplug the stick but when I replug it again then the driver can’t load, because of the leaked reference. That’s the whole point of this issue.

Remember, it also happens with Toaster WDM filter, not only with my driver.

And yep, I have used the !obtrace extension, but the output didn’t throw much light on the subject.

Thanks for your help!

Javier_Guerrero · July 5, 2017, 6:26am

This is weird. The KMDF version of Toaster filter is also showing the same behavior.

It attaches the first time I plug the thumbdrive, but if I unplug and replug again, then the filter doesn’t load because of outstanding references.

Any thoughts?

Mikae · July 5, 2017, 9:36am

Javier,

How does the whole stack looks like? Do you install it on ‘clean’ OS? Is that possible that some user mode app opens a handle that pins the DO and the DRVO in memory? Also, you could try to put a hardware bp on access to DO’s ReferenceCount and then kb on every stop. I guess you won’t get too many stops, so it looks doable and might shed some light on who references you.

Doron_Holan · July 5, 2017, 10:32am

Do you have AV software installed? Do you get this behavior in other device classes?

Bent from my phone

From: xxxxx@lists.osr.com on behalf of xxxxx@yahoo.com
Sent: Wednesday, July 5, 2017 6:35:52 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Upper filter not loading on Win10 RS2 (version 1703)

[You don’t often get email from xxxxx@YAHOO.COM. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.]

Javier,

How does the whole stack looks like? Do you install it on ‘clean’ OS? Is that possible that some user mode app opens a handle that pins the DO and the DRVO in memory? Also, you could try to put a hardware bp on access to DO’s ReferenceCount and then kb on every stop. I guess you won’t get too many stops, so it looks doable and might shed some light on who references you.

—
NTDEV is sponsored by OSR

Visit the list online at: https:

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at https:

To unsubscribe, visit the List Server section of OSR Online at https:</https:></https:></https:>

Javier_Guerrero · July 5, 2017, 3:43pm

@Mikae, yes I am testing on a fully clean OS, Windows 10 Enterprise Creators Update (90 days evaluation version).
I also tried putting breakpoints on access to refcounts, you can check the output earlier in this same thread.
I don’t have my work machine at this moment (sorry, different time zones), next time I will put !devstack output.

@Doron, no AV software or any other kind installed, OS is fully clean, with no 3rd party programs involved.
I haven’t tried with other device classes but I suspect it has something to do with storage stack.

It’s also worth noting, that if I set the driver as LowerFilter, the problem goes away.
It also works OK if I change the order of UpperFilter values, so the Toaster filter is first and “partmgr” is second.

Thanks guys for your time!

Jamey_Kirby · July 5, 2017, 4:14pm

Try to place a space in front of the filter drivers name in the
UpperFIlters. For some reason, this fixes the odd issue I am seeing.

On Wed, Jul 5, 2017 at 3:44 PM wrote:

> @Mikae, yes I am testing on a fully clean OS, Windows 10 Enterprise
> Creators Update (90 days evaluation version).
> I also tried putting breakpoints on access to refcounts, you can check the
> output earlier in this same thread.
> I don’t have my work machine at this moment (sorry, different time zones),
> next time I will put !devstack output.
>
> @Doron, no AV software or any other kind installed, OS is fully clean,
> with no 3rd party programs involved.
> I haven’t tried with other device classes but I suspect it has something
> to do with storage stack.
>
> It’s also worth noting, that if I set the driver as LowerFilter, the
> problem goes away.
> It also works OK if I change the order of UpperFilter values, so the
> Toaster filter is first and “partmgr” is second.
>
> Thanks guys for your time!
>
> —
> NTDEV is sponsored by OSR
>
> Visit the list online at: <
> http://www.osronline.com/showlists.cfm?list=ntdev>
>
> MONTHLY seminars on crash dump analysis, WDF, Windows internals and
> software drivers!
> Details at http:
>
> To unsubscribe, visit the List Server section of OSR Online at <
> http://www.osronline.com/page.cfm?name=ListServer>
></http:>

Javier_Guerrero · July 5, 2017, 4:38pm

@Jamey, just tried that: bugcheck INACCESSIBLE_BOOT_DEVICE (7b) when rebooting.
Note that it’s set as an upper filter for disk devices.

Jamey_Kirby · July 5, 2017, 4:51pm

OK. Mine is volume filter and adding a space fixes the issue for my filter.
There is something going on.

On Wed, Jul 5, 2017, 4:38 PM wrote:

> @Jamey, just tried that: bugcheck INACCESSIBLE_BOOT_DEVICE (7b) when
> rebooting.
> Note that it’s set as an upper filter for disk devices.
>
> —
> NTDEV is sponsored by OSR
>
> Visit the list online at: <
> http://www.osronline.com/showlists.cfm?list=ntdev>
>
> MONTHLY seminars on crash dump analysis, WDF, Windows internals and
> software drivers!
> Details at http:
>
> To unsubscribe, visit the List Server section of OSR Online at <
> http://www.osronline.com/page.cfm?name=ListServer>
></http:>

Mikae · July 5, 2017, 6:23pm

@Javier

I had slightly different idea: you put the ‘ba w’ with ‘kb’ just after the DO creation and then disconnect the device. Thus you monitor all refs/derefs on the device and print stack frames. Something like !obtrace but with more control.

Also, you better do it on 32bit OS since it works better with stack frames. Then paste the output here, it could be interesting to try to figure out what happens there.

OSR_Community_User · July 5, 2017, 9:24pm

@Jamey

OK. Mine is volume filter and adding a space fixes the issue for my filter.
There is something going on.

Look at the service name in the registry and INF file. Check if it begins with a space.

@Javier

USB sticks must be prepared for safe removal. But with an outstanding reference for the disk upper filter DO, you will not receive the message telling you that the device can be safely unplugged. That is why I think that you should not be able to safely unplug the device if you have a reference leak.

When you have a usb mass storage device, you have at least two devices being created: a USB one (enumerated by the USB enumerator) and a disk device enumerated by USBSTOR. When the usb stick is prepared for safe removal, the disk device disappears and the usb device is bangged. But in your case, the disk device cannot be deleted.

Javier_Guerrero · July 6, 2017, 4:08am

@Mikae

Tried that, output it’s odd because traces keep coming out long time after extracting device. Funny thing is, after doing the break on memory thing, the problem goes away and the driver reloads ok after reinserting the thumbdrive. I guess it’s something about timing and race condition.

Oh, and the safe removal procedure is always working fine, I get the message telling me I can remove the device.

OTOH, this is the output I get from !devstack command:

(this is right after creating DO):

3: kd> !devstack 0x8c5b2b70
!DevObj !DrvObj !DevExt ObjectName

8c5b2b70 \Driver\PSINDvct 8c5b2c28
883b1bf8 \Driver\partmgr 883b1cb0
8a16d7c8 \Driver\Disk 8a16d880 DR1
883357f0 \Driver\USBSTOR 883358a8 0000008b
!DevNode 8d004320 :
DeviceInst is “USBSTOR\Disk&Ven_General&Prod_UDisk&Rev_5.00\8&19613ac9&0&_&0”
ServiceName is “disk”

(and this is after safe removal)

2: kd> !devstack 0x8c5b2b70
!DevObj !DrvObj !DevExt ObjectName

8c5b2b70 \Driver\PSINDvct 8c5b2c28

(this is the object info after removal)

0: kd> !devobj 8c5b2b70
Device object (8c5b2b70) is for:
\Driver\PSINDvct DriverObject 8a04d870
Current Irp 00000000 RefCount 0 Type 00000007 Flags 00002010
Vpb 8c5016b8 SecurityDescriptor 852749b0 DevExt 8c5b2c28 DevObjExt 8c5b2c98
ExtensionFlags (0x00000812) DOE_DELETE_PENDING, DOE_START_PENDING,
DOE_DEFAULT_SD_PRESENT
Characteristics (0x00040101) FILE_REMOVABLE_MEDIA, FILE_DEVICE_SECURE_OPEN,
FILE_PORTABLE_DEVICE
Device queue is not busy.

I noticed that “DOE_START_PENDING” extension flag, I think it shouldn’t be there at that point, right?

Slava_Imameev · July 6, 2017, 5:59am

I believe this flag is set by the PNP Manager in IRP_MN_REMOVE phase in addition to DOE_DELETE_PENDING.

In other cases this flag usually means that IRP_MN_START has not been sent or failed. If the flag is set a device is unavailable to user mode applications as NtCreateFile returns with an error.

Scott_Noone_OSR · July 6, 2017, 11:47am

OK, it wasn’t that I didn’t believe you, but I don’t believe anyone So, I
had to set this up myself and see if I could reproduce the behavior…

I think there’s definitely a bug in PartMgr on build 1703. Can you try an
experiment and see if it solves the problem? I want to try disabling the
routine partmgr!PmSendTelemetry on your target machine.

First enter the assembler:

a partmgr!PmSendTelemetry

Then type:

ret

And hit enter.

If you want to undo the change and revert to the old behavior just run the
following:

!chkimg -f partmgr.sys

If you can no longer reproduce the behavior with that routine disabled,
here’s the long story that sort of explains it (and a mitigation at the
end):

I put an access breakpoint on the device object’s reference count and upon
surprise removal PartMgr went, for lack of a better phrase, completely up
its own ass. The “notification work item” was endlessly trying to “send
telemetry” and thus referencing/deferencing the top of the device stack:

nt!IoGetAttachedDeviceReference+11dde3
partmgr!PmCreatePartition+9d
partmgr!PmUpdatePartitionsLegacyRemovable+bb
partmgr!memset+9fb
partmgr!PmGetDriveLayoutEx+6c
partmgr!PmCollectTelemetry+e3
partmgr!PmSendTelemetry+67
partmgr!PmNotificationWorkItem+380
nt!IopProcessWorkItem+fb
nt!ExpWorkerThread+d8
nt!PspSystemThreadStartup+47
nt!KiStartSystemThread+16

It would eventually stop, but only after overflowing the object reference
trace buffer.

Doing some stepping through here, in the surprise remove path PartMgr wakes
up the notification work item (partmgr!PmWakeupNotificationWorkItem). The
work item then proceeds to call partmgr!PmSendTelemetry, which does some
work and…queues the work item. it keeps doing this until a list is empty.
Looking at the list, the only thing on it at this point is a reference to
the device object that we’re trying to surprise remove.

I’m running on a single CPU VM, so the re-queuing of the work item at the
end of the work item causes the surprise remove thread to stall and you get
the “up its ass” behavior (i.e. high priority work item continues to queue
itself until the list is empty, but the list won’t be empty until some other
thread gets to run).

Going back to 1607 this call to partmgr!PmSendTelemetry doesn’t exist and,
given that this doesn’t look quite right, my SWAG is that there’s a race
here causing problems with the reference count on remove. Disabling the
routine seems to cause the system to behave much more sanely on surprise
removal and properly clean up the reference count, hence my suggestion
above.

Of course, this doesn’t actually solve your problem because you can’t go
disabling things on client machines…If this DOES work then I would suggest
simply removing your Unload entry point and preventing your filter from
unloading. You’ll still get dangling device object references out there, but
(if I’m right) this isn’t your bug and exists even without your filter.

-scott
OSR
@OSRDrivers

Javier_Guerrero · July 7, 2017, 2:52am

@scott,

Thanks, that did the trick! So I guess we can confirm the bug. Hopefully it will be addressed soon

Thanks a lot to all of you guys: H. G. , Jamey, Peter, Doron, Slava, Mikae, Scott, for taking time to read and answer my questions.

Cheers!