It certainly looks like a reference is leaking somewhere. The ba (break on
access) trick doesn’t work very well on an NDIS miniport’s dev_obj because
the stack and some OS WMI junk love to ref/deref it a zillion times. It can
be done if you are desperate enough. For every bp hit, dump the call stack
and go. Do this in a “composite” command instead of hitting the keyboard.
It’s slow and painful, and you will have a lot to read. You may want to
unbind all protocols and intermediate drivers from your miniport to improve
your SNR.
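To cut down on the reading, a small script can collapse the log for you.
Here is a minimal Python sketch, not a finished tool. It assumes you logged
the session (.logopen) and that the bp command echoes a marker line before
each stack, e.g. something along the lines of
ba w4 <address of PointerCount> ".echo ---HIT---; k; g". The marker text, the
function names below, and the idea of classifying a hit by its top frame are
all my assumptions; adjust them to whatever your log really looks like.

```python
from collections import Counter

HIT_MARKER = "---HIT---"  # assumed marker echoed by the breakpoint command


def parse_hits(log_text):
    """Split a debugger log into one tuple of module!symbol names per hit."""
    hits, current = [], None
    for line in log_text.splitlines():
        line = line.strip()
        if line == HIT_MARKER:
            if current:
                hits.append(tuple(current))
            current = []
        elif current is not None:
            # Keep only frames carrying a module!symbol token; drop "+0x..."
            symbols = [tok for tok in line.split() if "!" in tok]
            if symbols:
                current.append(symbols[-1].split("+")[0])
    if current:
        hits.append(tuple(current))
    return hits


def classify(stack):
    """Guess ref vs. deref from the top frame name (a heuristic only)."""
    top = stack[0].lower()
    if "dereference" in top:
        return "deref"
    if "reference" in top:
        return "ref"
    return "other"


def summarize(hits, depth=4):
    """Count hits per (kind, first `depth` frames) so repeated paths collapse."""
    return Counter((classify(s), s[:depth]) for s in hits)


def net_references(hits):
    """References minus dereferences seen in the log."""
    kinds = Counter(classify(s) for s in hits)
    return kinds["ref"] - kinds["deref"]


def report(log_text, depth=4):
    """Build a short text summary, most frequent call paths first."""
    hits = parse_hits(log_text)
    lines = ["hits=%d net_refs=%d" % (len(hits), net_references(hits))]
    for (kind, frames), count in summarize(hits, depth).most_common():
        lines.append("%5d %-5s %s" % (count, kind, " <- ".join(frames)))
    return "\n".join(lines)
```

Feed it the log text, e.g. print(report(open("bp.log").read())). A call path
that only ever shows up as "ref" with no matching "deref" path is the first
place I would look, though the pairing still has to be judged by eye.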
From what you have said, it’s not clear to me whether it has anything to do
with your function dependency. I would try running the test with all
secondary physical functions disabled to rule this in or out.
Having a dependency between physical PCI functions violates the “Design for
Windows” rule. Your hardware team shouldn’t have delivered such a thing in
the first place. From the end customer’s perspective, this is very
frustrating: say I wanted to disable only port 0, but it disables
0,1,2,3,4,5,6,7, all without warning.
If it has not been taped out, fix the RTL. If it has already shipped, fix it
in the next revision. Function dependency should be dealt with in fw and hw,
not in Windows. There are many ways of doing it.
Good luck,
Calvin
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Leonid Keller
Sent: Monday, May 31, 2010 9:54 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] NDIS miniport SurpriseRemoval problem in win2k3
I’d like to bother you once more with this problem, because I haven’t found
a solution so far.
Here is the full story.
We have a HW device, which is handled by the WDF bus driver mx4bus and its
child, the NDIS miniport driver mx4eth.
The HW device can expose several functions, which, from the OS’s point of
view, are fully independent.
But in fact only one of them is a primary function (PPF); all the rest are
secondary ones (PF), which depend on the PPF.
So for every HW function I have a pair of mx4bus/mx4eth instances handling
it.
If one first removes the PFs (secondary devices) and then the PPF (the
primary one), all is OK.
But if one first removes the PPF, the driver *must* first force the removal
of all PFs, because they can’t work without the PPF.
That was the problem in the first place.
I solved it by sending a “RemoveItself” event from the PPF to all PFs while
handling the REMOVE_DEVICE IRP (for the PPF).
I.e., the PPF sends the events and waits for all PFs to exit.
The PFs call WdfDeviceSetFailed and remove themselves.
Then the PPF happily exits.
It works in Vista, but not in Windows 2003.
As I found out, Windows 2003 doesn’t start Surprise Removal of the PFs until
it has finished removing the PPF.
(Even though they are independent devices from its point of view!)
Then I tried another solution.
The PFs release all of the PPF’s resources upon the “RemoveItself” event and
tell it that they have exited.
Then the PPF exits successfully and the OS starts the surprise removal of
all PFs.
It seems like it should work. But it doesn’t.
What happens?
For some unclear reason, it always works the first time: all the devices
get removed.
But if I enable/disable them all once more, all PFs get marked as “Needed
reboot” in Device Manager (the PPF is removed OK).
What I see in the debugger when it happens:
1. The REMOVE_DEVICE IRP was handled in both drivers (Halt of mx4eth and
EvtReleaseHardware of mx4bus were called).
2. The Ethernet devices are removed from the device tree (‘!devnode 0 1’
does not show them).
3. Neither Unload function is called, so both drivers are still in memory.
4. !ndiskd.miniports doesn’t show the mx4eth of the PFs, but
!ndiskd.gminiports does.
5. !ndiskd.miniport <pf_mx4eth_device> shows no protocols and all counters
are zero, but it shows the FDO and PDO.
6. !object on the FDO shows PointerCount=2 (OK?); on the PDO it shows
PointerCount=3 (leak?).
7. !devnode shows:
   State = DeviceNodeDeletePendingCloses (0x313)
   Previous State = DeviceNodeRemovePendingCloses (0x311)
   StateHistory[13] = DeviceNodeRemovePendingCloses (0x311)
   StateHistory[12] = DeviceNodeStarted (0x308)
   Flags (0x00000120) DNF_IDS_QUERIED, DNF_NO_RESOURCE_REQUIRED
   CapabilityFlags (0x00000a01) DeviceD1, SurpriseRemovalOK, WakeFromD1
8. !devnode of its “father” mx4bus shows:
   State = DeviceNodeRemovePendingCloses (0x311)
   Previous State = DeviceNodeStarted (0x308)
   StateHistory[13] = DeviceNodeStarted (0x308)
   Flags (0x00000030) DNF_ENUMERATED, DNF_IDS_QUERIED
   UserFlags (0x00000004) DNUF_NEED_RESTART
   CapabilityFlags (0x00000210) Removable, SurpriseRemovalOK
9. !object on its PDO shows PointerCount=4 (OK?)
10. ‘!devnode 1 1’ returns “Error reading pending relations list entry”
My guess is that PointerCount=3 (in 6.) is a leak, which causes the whole
problem.
But I don’t know how to verify it, or how to find out who caused the leak.
Vista has a facility for PointerCount tracing, but Windows 2003 doesn’t,
AFAIK.
Could it be a bug in the OS?
What’s the way to investigate the problem?
Is there any other solution for the entire problem?
Thanks in advance.
P.S.
1. mx4bus without mx4eth gets removed OK, so mx4eth is somehow “guilty”.
2. mx4eth can be removed with “pnpdtest ‘SurpriseRemoval’” without
problems.
3. mx4bus gets marked “Need reboot” after the first run of “pnpdtest
‘SurpriseRemoval’”.
Maybe there is a bug in pnpdtest; maybe it is the same problem as with the
surprise removal of the PFs.
The latter case may be simpler to investigate, because there is only one
mx4bus/mx4eth pair (only the PPF).
But how can I do it?
The test passes OK, but the device node of mx4bus gets marked with
DNUF_NEED_RESTART…
> -----Original Message-----
> From: Leonid Keller [mailto:xxxxx@mellanox.co.il]
> Sent: Monday, May 24, 2010 8:43 PM
> Subject: RE: NDIS miniport SurpriseRemoval problem in win2k3
>
> Thank you, David.
>
> I’ll try it, but I’m not too optimistic.
> The problem is that it gets stuck somewhere between Halt and Unload.
> I mean that both the Halt routine of the Ethernet driver and
> EvtReleaseHardware of the bus driver get called.
> So, from both drivers’ point of view, IRP_MN_REMOVE has been handled.
> But neither Unload routine is called.
> The devnode of the Ethernet driver is left in the
> DeviceNodeRemovePendingCloses state…
> Device Manager says “driver is still in memory. Make reboot”.
>
> > -----Original Message-----
> > From: David R. Cattley [mailto:xxxxx@msn.com]
> > Sent: Monday, May 24, 2010 6:43 AM
> > Subject: RE: NDIS miniport SurpriseRemoval problem in win2k3
> >
> > Be sure to verify that all resources (packets, etc.) passed across any
> > active bindings have been properly returned as well. NDIS will not halt
> > the adapter until all received packets indicated by the adapter have been
> > returned by bound protocols and all packets sent from bound protocols
> > have been completed by the adapter. A pending
> > Miniport{Set|Get}Information can also delay halt as it will also be a
> > reference on the ‘open’ (binding).
> >
> > The !ndiskd extension can help in diagnosing such things. You might also
> > try unbinding the adapter from everything (use BindView) and see if that
> > changes the test result.
> >
> > Good Luck,
> > Dave Cattley
> >
> > -----Original Message-----
> > From: xxxxx@lists.osr.com
> > [mailto:xxxxx@lists.osr.com] On Behalf Of Doron Holan
> > Sent: Sunday, May 23, 2010 7:36 PM
> > To: Windows System Software Devs Interest List
> > Subject: RE: [ntdev] NDIS miniport SurpriseRemoval problem in win2k3
> >
> > You get stuck in the surprise removed state when there is an open handle
> > on the device that is not closed. !object on the fdo and pdo for the nic
> > will give the handle count, !devnode 1 1 might give you more info on the
> > orphaned devnode.
> >
> > d
> >
> > sent from a phpne with no keynoard
> >
> > -----Original Message-----
> > From: Leonid Keller
> > Sent: May 23, 2010 3:12 PM
> > To: Windows System Software Devs Interest List
> > Subject: [ntdev] NDIS miniport SurpriseRemoval problem in win2k3
> >
> >
> > Hi All,
> >
> > We have a WDF bus driver, which creates an NDIS miniport Ethernet
> > adapter as a child.
> > When the bus driver is surprise-removed, it works the first time, but
> > gets stuck the second time (after disabling/enabling the bus driver).
> > The devnode of the Ethernet driver remains in the SurpriseRemove state,
> > as if it is waiting for some resource to be released.
> > I ran pnpdtest from the WDK with the “SurpriseRemove” option, with the
> > same result: the first time the test passes, the second time it fails to
> > attach the filter driver.
> > All PnP stuff in the bus driver is handled by the WDF DLL; in the
> > Ethernet driver, by the NDIS DLL.
> > My question is: what’s the way to debug the problem?
> > (As a reminder: it happens in Windows 2003.)
> >
> > Thank you.
> >
> > Leonid
> >
> >
> > —
> > NTDEV is sponsored by OSR
> >
> > For our schedule of WDF, WDM, debugging and other seminars visit:
> > http://www.osr.com/seminars
> >
> > To unsubscribe, visit the List Server section of OSR Online at
> > http://www.osronline.com/page.cfm?name=ListServer