Port-based harddisk access

Ken_Johnson · November 29, 2009, 2:09pm

Another DPC coming in will cause a problem if two users in the system attempt such a scheme simultaneously, ultimately resulting in a deadlock as neither party makes forward progress on getting their DPC routines spun up across all processors.

I’m certain this would be great fun to debug on a live system – all processors stuck spinning at HIGH_LEVEL with zero debuggability.

S
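
The deadlock described above can be illustrated with a tiny model. This is a sketch under stated assumptions, not kernel code: two actors (A and B) each queue a corral DPC on both CPUs of a two-CPU system, and whichever DPC is at the head of a CPU's queue raises to HIGH_LEVEL and spins, so the DPC queued behind it never starts. The function name and the two-CPU restriction are hypothetical.

```c
#include <stdbool.h>

enum { ACTOR_A = 0, ACTOR_B = 1 };

/*
 * head_cpu0 / head_cpu1: which actor's DPC sits first in each CPU's
 * DPC queue. Returns true if neither actor can ever capture both CPUs.
 */
bool corral_deadlocks(int head_cpu0, int head_cpu1)
{
    int captured[2] = { 0, 0 };   /* CPUs spinning on behalf of each actor */

    /* The head DPC on each CPU runs and spins at HIGH_LEVEL; the DPC
       queued behind it is starved until the spinner is released. */
    captured[head_cpu0]++;
    captured[head_cpu1]++;

    /* An actor makes forward progress only once it holds every CPU. */
    return captured[ACTOR_A] != 2 && captured[ACTOR_B] != 2;
}
```

In this model, `corral_deadlocks(ACTOR_A, ACTOR_B)` is true (each actor holds one CPU and waits forever for the other), while `corral_deadlocks(ACTOR_A, ACTOR_A)` is false because A captures both CPUs, finishes, and releases them before B's DPCs run.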

-----Original Message-----
From: xxxxx@hotmail.com
Sent: Sunday, November 29, 2009 5:01
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Port-based harddisk access

> It’s during the setup of the corralling I was referring to. The way I take control of all the cpu’s (when doing live migration under Xen) is:

> 1. Throw the switch to stop new requests being processed in the scsi and ndis drivers
> 2. Wait until outstanding requests are complete
> 3. Schedule a DPC onto each CPU
> 4. In the DPC for CPU != 0, go to HIGH_LEVEL, InterlockedIncrement a counter and then spin in a while() loop
> 5. In the DPC for CPU 0, go to HIGH_LEVEL and wait until all other CPU’s are spinning
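
Steps 3 to 5 above can be sketched in user mode as follows. This is only an illustration of the rendezvous logic, under these assumptions: POSIX threads stand in for CPUs, a thread start stands in for queuing a DPC, and nothing here raises IRQL or masks interrupts. `corral_t`, `corral_dpc` and `run_corral` are hypothetical names, not anything from the poster's driver.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_CPUS 16   /* arbitrary cap for this sketch */

/* Shared rendezvous state. All names are hypothetical. */
typedef struct {
    int        ncpus;        /* number of simulated CPUs                  */
    atomic_int parked;       /* step 4: count of CPUs != 0 now spinning   */
    atomic_int release;      /* set by CPU 0 once its exclusive work ends */
    int        parked_seen;  /* what CPU 0 observed when it started work  */
} corral_t;

typedef struct { corral_t *c; int cpu; } dpc_arg_t;

/* Stand-in for the per-CPU DPC routine (one thread per "CPU"). */
static void *corral_dpc(void *p)
{
    dpc_arg_t *a = p;
    corral_t *c = a->c;

    if (a->cpu != 0) {
        /* Step 4: announce arrival, then spin until released. */
        atomic_fetch_add(&c->parked, 1);
        while (!atomic_load(&c->release))
            sched_yield();   /* user-mode courtesy only */
    } else {
        /* Step 5: wait until every other CPU is spinning. */
        while (atomic_load(&c->parked) != c->ncpus - 1)
            sched_yield();
        c->parked_seen = atomic_load(&c->parked);  /* exclusive work would go here */
        atomic_store(&c->release, 1);
    }
    return NULL;
}

/* Runs one corral; returns how many CPUs were parked while CPU 0 worked,
   or -1 if ncpus is out of range. */
int run_corral(int ncpus)
{
    pthread_t th[MAX_CPUS];
    dpc_arg_t args[MAX_CPUS];
    corral_t c;

    if (ncpus < 1 || ncpus > MAX_CPUS)
        return -1;

    c.ncpus = ncpus;
    c.parked_seen = -1;
    atomic_init(&c.parked, 0);
    atomic_init(&c.release, 0);

    /* Step 3: "schedule a DPC onto each CPU". */
    for (int i = 0; i < ncpus; i++) {
        args[i].c = &c;
        args[i].cpu = i;
        int rc = pthread_create(&th[i], NULL, corral_dpc, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < ncpus; i++)
        pthread_join(th[i], NULL);

    return c.parked_seen;
}
```

Calling `run_corral(4)` returns 3: CPU 0 starts its exclusive work only after the other three have incremented the counter. The gap between thread creation and the thread actually running is the user-mode analogue of the window between queuing a DPC and the CPU getting around to it.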

Actually, as far as consistency is concerned, steps 1 and 2 are not really needed …

> It is between steps 3 and 4 above that a new request could come through

So what??? Depending on its position in the DPC queue relative to your DPC, its processing will be either completed or not yet started by the time you start your operation on the resource. Neither case poses a problem - problems would arise if your processing and the one by the “legitimate owner” could somehow overlap. The one by the resource owner cannot interfere with yours because you disable interrupts, and your processing can interfere with the one by the resource owner only if the “legitimate” resource owner accesses it in a non-atomic fashion, which is a situation in which you cannot even think about any “corralling” scheme…

Anton Bassov

—
NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Mark_Roddy · November 29, 2009, 2:39pm

Yes, this is indeed an issue, and the assumption with processor corrals
is that only one actor is involved. More recent Windows releases open
up IPI-based callbacks, which provide a better mechanism for such
activities.

Mark Roddy
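
The single-actor assumption can be made explicit with a guard that a corral must win before it queues anything. The sketch below is a user-mode illustration with C11 atomics. `corral_try_begin` and `corral_end` are hypothetical names, and the guard only helps if every would-be corraller shares the same flag, which unrelated drivers normally would not.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One flag shared by everyone who might corral the processors
   (an assumption: independent drivers have no such shared flag). */
static atomic_flag corral_busy = ATOMIC_FLAG_INIT;

/* Returns true if the caller is now the only actor allowed to corral. */
bool corral_try_begin(void)
{
    /* test_and_set returns the previous value: false means we won. */
    return !atomic_flag_test_and_set(&corral_busy);
}

/* Called by the winner after it has released all processors. */
void corral_end(void)
{
    atomic_flag_clear(&corral_busy);
}
```

A second actor that loses `corral_try_begin()` backs off and retries later instead of queuing competing DPCs, which is what avoids the two-actor deadlock.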

On Sun, Nov 29, 2009 at 2:09 PM, Skywing wrote:
> Another DPC coming in will cause a problem if two users in the system attempt such a scheme simultaneously, ultimately resulting in a deadlock as neither party makes forward progress on getting their DPC routines spun up across all processors.
>
> I’m certain this would be great fun to debug on a live system – all processors stuck spinning at HIGH_LEVEL with zero debuggability.
>
> - S

James_Harper · November 29, 2009, 5:17pm

> I’m certain this would be great fun to debug on a live system – all
> processors stuck spinning at HIGH_LEVEL with zero debuggability.

The debugger does actually work when the system is at HIGH_LEVEL, but it
is about 100 times slower.

James

James_Harper · November 29, 2009, 5:22pm

> > It’s during the setup of the corralling I was referring to. The way
> > I take control of all the cpu’s (when doing live migration under
> > Xen) is:
> >
> > 1. Throw the switch to stop new requests being processed in the scsi
> >    and ndis drivers
> > 2. Wait until outstanding requests are complete
> > 3. Schedule a DPC onto each CPU
> > 4. In the DPC for CPU != 0, go to HIGH_LEVEL, InterlockedIncrement a
> >    counter and then spin in a while() loop
> > 5. In the DPC for CPU 0, go to HIGH_LEVEL and wait until all other
> >    CPU’s are spinning
>
> Actually, as far as consistency is concerned, steps 1 and 2 are not
> really needed …

Not for the OP, but they are for me.

> > It is between steps 3 and 4 above that a new request could come
> > through
>
> So what??? Depending on its position in the DPC queue relative to
> your DPC, its processing will be either completed or not yet started
> by the time you start your operation on the resource. Neither case
> poses a problem - problems would arise if your processing and the one
> by the “legitimate owner” could somehow overlap. The one by the
> resource owner cannot interfere with yours because you disable
> interrupts, and your processing can interfere with the one by the
> resource owner only if the “legitimate” resource owner accesses it in
> a non-atomic fashion, which is a situation in which you cannot even
> think about any “corralling” scheme…

Well… we agree that there have to be no outstanding requests when you
start corralling the system, right? Otherwise the notification of the
completed request would be lost.
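
Steps 1 and 2 (throw the switch, then wait for outstanding requests) amount to a gate plus an in-flight counter. The following is a minimal user-mode sketch with C11 atomics, not the actual Xen driver code, and every name in it is hypothetical. The one subtle point is the ordering in `request_begin`: it counts the request first and checks the gate second, so a close-then-drain sequence cannot miss a request that slipped past the gate.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool closed;    /* step 1: the "switch"                    */
    atomic_int  inflight;  /* requests started but not yet completed  */
} gate_t;

void gate_init(gate_t *g)
{
    atomic_init(&g->closed, false);
    atomic_init(&g->inflight, 0);
}

/* Called at the start of every request.
   Returns false if the request must be held back. */
bool request_begin(gate_t *g)
{
    atomic_fetch_add(&g->inflight, 1);      /* count first ...           */
    if (atomic_load(&g->closed)) {          /* ... then check the gate   */
        atomic_fetch_sub(&g->inflight, 1);  /* back out: we never ran    */
        return false;
    }
    return true;
}

/* Called when a request completes. */
void request_end(gate_t *g)
{
    atomic_fetch_sub(&g->inflight, 1);
}

/* Step 1: stop new requests from being started. */
void gate_close(gate_t *g)
{
    atomic_store(&g->closed, true);
}

/* Step 2: poll this until it returns true before queuing the DPCs. */
bool gate_drained(gate_t *g)
{
    return atomic_load(&g->inflight) == 0;
}
```

With the default sequentially consistent atomics, either a new request observes the closed gate and backs out, or the drain check observes its in-flight count. A request cannot be both admitted and invisible to the drain.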

The problem I was referring to was if you schedule the DPC’s on each
processor, but a request is sent between the time you put your DPC on
the DPC queue and the time the CPU actually runs it. Remember that just
because you schedule a DPC doesn’t mean the system drops everything to
run it straight away.

All of the above is probably solvable, though; it’s bug check 0x101
(CLOCK_WATCHDOG_TIMEOUT) I’d be worried about.

James
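
One hedge against spinning long enough to trip a watchdog (not something proposed in this thread, so treat it as an assumption) is to bound the coordinator's wait and abandon the corral if the other processors do not arrive in time. A minimal sketch in C11 follows. `wait_for_parked` and the spin budget are hypothetical, and a real driver would have to pick a budget well inside the platform's watchdog interval.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Spin until `parked` reaches `expected`, checking at most max_spins
 * times. Returns true on a full rendezvous, false if the budget ran
 * out; in that case the caller must release any CPUs that did park
 * and retry later.
 */
bool wait_for_parked(atomic_int *parked, int expected, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load(parked) == expected)
            return true;
    }
    return false;
}
```

Giving up and retrying also breaks the two-actor deadlock, at the cost of the corral occasionally failing.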

Ken_Johnson · November 29, 2009, 6:00pm

The Windows kernel debugger won’t work if all processors are frozen spinning (i.e. with timer interrupts off) and cannot enter into the break-in command dispatch loop.

S

David_J_Craig · November 29, 2009, 6:39pm

If the device is a legacy HD port controller, then even once the DPC gets
control it is possible that the owner of those ports has already started a
command sequence at PASSIVE_LEVEL. That would leave the controller in an
unknown state. I think I remember that those task registers are mostly
write-only, so you can’t restore the state or even determine it.

This sounds like a very bad idea, though being able to monitor all requests
as they go up and down the stack could solve this issue. And to think, I am
finding that 10 internal SATA ports is my desired motherboard configuration.
I also like one ATAPI port for optical drives. Also when wishing for
everything I also want 4 eSATA ports without touching my 10 internal ports.
I want a box with 28TB of storage, and 14 ports should work.

James_Harper · November 29, 2009, 6:57pm

> The Windows kernel debugger won’t work if all processors are frozen
> spinning (i.e. with timer interrupts off) and cannot enter into the
> break-in command dispatch loop.

Yes, you are right. You can’t break into the debugger at HIGH_LEVEL, but
you can single step it if you break in before the cpu’s go to
HIGH_LEVEL, which is what I was thinking of.

James