Does this look correct?

all,

i am reviewing a sample driver where I find that the folks have tried to
serialize IRP completion.

So this code, when it gets an IRP_MJ_WRITE, goes ahead and spawns another
IRP to write some meta data on the disk corresponding to the original IRP.
So, in essence one write becomes into two writes (on the same disk).

This is done in the main write handler, and these folks are calling their
IRP rolling function from inside the write handler of the main IRP.

To make these two IOs synchronous and so that the original IRP is blocked
till they update their own meta data, they do the following:

KEVENT event;



KeInitializeEvent(&event, NotificationEvent, FALSE);



irp = IoAllocateIrp( TopOfDeviceStack->StackSize, FALSE );



IoSetCompletionRoutine(irp,

CompRoutine,

(PVOID ) &event,

TRUE,

TRUE,

TRUE);

status =IoCallDriver(TopOfDeviceStack, irp);

if (status == STATUS_PENDING)

{

// wait for the result

(void) KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL);

// set the final result of this Write IRP

status = irp->IoStatus.Status;

}



IoFreeIrp(irp);

Now in the completion routine CompRoutine() they are sending a varaible
allocated from teh stack of this function, the event, I am not sure if that
is valid. is there a possibility that the irp they roll will be completed
in a different thread context?

here is the code of the completion routine, which seems to be a only
setting the event for this the original thread is waiting in the
KeWaitForSingleObject…

NTSTATUS

CompRoutine(

IN PDEVICE_OBJECT DeviceObject,

IN PIRP Irp,

IN PVOID Context

)

{

PKEVENT pEvent= (PKEVENT)Context;

PDEVICE_OBJECT VDev= DeviceObject;

if(VDev==NULL || Irp ==NULL)

{

}

if(NULL == Context)

{

//Something looks to be gone bad ,lets hang the system to findout what.

}else

{

KeSetEvent(pEvent, (KPRIORITY) 0, FALSE);

}

return STATUS_MORE_PROCESSING_REQUIRED;

}

//

Well, I’m not sure about the whole issue of two IRPs and such… but the code you show above is a very common programming pattern. It’s often used for handling PnP IRPs such as IRP_MN_START_DEVICE that need to be “bounced” off the underlying PDO before they can be processed by the function driver.

Peter
OSR

A P wrote:

Now in the completion routine CompRoutine() they are sending a
varaible allocated from teh stack of this function, the event, I am
not sure if that is valid. is there a possibility that the irp they
roll will be completed in a different thread context?

So what if they are? The function above cannot exit until
KeWaitForSingleObject returns, and that cannot happen until the
completion routine runs and sets the event. All kernel stacks are
global memory, so the address of the KEVENT is valid no matter what
thread is running.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

thanks peter, tim, for clarifying this.

We are trying to design this disk driver which keeps track of sectors
modified by IOs on a disk/partition.

The basic idea is, if an IO(write) comes to the disk, in teh write
dispatcher, we hold on to that IO and spin an IRP ourselves which will go
and mark a bit on a bitmap corresponding to that sector for which the
original IO came. Till this new IRP completes the main IO should be paused
and once the meta data updation is completed we let he main IO go thru.

So we implemented the synchronous IO compltion code discuss in the email
before.

I am a bit worried about disk performance degradation because of the
following:

  1. we are essentially doubling the number of IOs, for each write we will
    also update our bitmap making it two writes.
  2. sicne the second write is on teh same disk, we cant achieve parallalism,
    and there will be head movement and thrashing.

So increase thruput, an engineer has designed the following algo:

1.irp comes in

  1. take Ourbitmap lock

  2. update ourbitmap

  3. release Ourbitmap lock

  4. take Ourbitmap lock

  5. take ToBeWrittenBitmapMapLock

  6. copy Ourbitmap into ToBeWrittenBitMap

  7. release ToBeWrittenBitMapLock

  8. release Ourbitmap lock

  9. take ToBeWrittenBitMapLock

  10. write ToBeWrittenBitMap to disk (blocking
    call for this thread)

  11. release ToBeWrittenBitMapLock

  12. irp goes out

my qn is, is this a good way to achieve thruput, or is there a better
approach? so essentially, multiple irps will be updating ‘ourbitmap’ and
possbly be flushed in one go go ToBeWrittenBitmap, so we save on
duplication.

thanks

ap

On Fri, Jan 13, 2012 at 11:30 PM, Tim Roberts wrote:

> A P wrote:
> >
> >
> > Now in the completion routine CompRoutine() they are sending a
> > varaible allocated from teh stack of this function, the event, I am
> > not sure if that is valid. is there a possibility that the irp they
> > roll will be completed in a different thread context?
> >
>
> So what if they are? The function above cannot exit until
> KeWaitForSingleObject returns, and that cannot happen until the
> completion routine runs and sets the event. All kernel stacks are
> global memory, so the address of the KEVENT is valid no matter what
> thread is running.
>
> –
> Tim Roberts, xxxxx@probo.com
> Providenza & Boekelheide, Inc.
>
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>

thanks peter, tim, for clarifying this.

We are trying to design this disk driver which keeps track of sectors
modified by IOs on a disk/partition.

The basic idea is, if an IO(write) comes to the disk, in teh write
dispatcher, we hold on to that IO and spin an IRP ourselves which will go
and mark a bit on a bitmap corresponding to that sector for which the
original IO came. Till this new IRP completes the main IO should be paused

On Fri, Jan 13, 2012 at 11:30 PM, Tim Roberts wrote:

> A P wrote:
> >
> >
> > Now in the completion routine CompRoutine() they are sending a
> > varaible allocated from teh stack of this function, the event, I am
> > not sure if that is valid. is there a possibility that the irp they
> > roll will be completed in a different thread context?
> >
>
> So what if they are? The function above cannot exit until
> KeWaitForSingleObject returns, and that cannot happen until the
> completion routine runs and sets the event. All kernel stacks are
> global memory, so the address of the KEVENT is valid no matter what
> thread is running.
>
> –
> Tim Roberts, xxxxx@probo.com
> Providenza & Boekelheide, Inc.
>
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>

In the i/o path (READ or WRITE IRPs), it would not be a good idea to
process another irp synchronously. You could be at DISPATCH_LEVEL in the
irp handler and waiting would give you blue screens.

You might want to queue the original irps and return pending from dispatch
routines.

Then in a worker thread, process the queued irps in which you can create
another irp, process it synchronously and then complete the original irp.

Of course, there will be performance penalty.

On Sat, Jan 14, 2012 at 1:56 AM, A P wrote:

> thanks peter, tim, for clarifying this.
>
> We are trying to design this disk driver which keeps track of sectors
> modified by IOs on a disk/partition.
>
> The basic idea is, if an IO(write) comes to the disk, in teh write
> dispatcher, we hold on to that IO and spin an IRP ourselves which will go
> and mark a bit on a bitmap corresponding to that sector for which the
> original IO came. Till this new IRP completes the main IO should be paused
> and once the meta data updation is completed we let he main IO go thru.
>
> So we implemented the synchronous IO compltion code discuss in the email
> before.
>
> I am a bit worried about disk performance degradation because of the
> following:
> 1. we are essentially doubling the number of IOs, for each write we will
> also update our bitmap making it two writes.
> 2. sicne the second write is on teh same disk, we cant achieve
> parallalism, and there will be head movement and thrashing.
>
> So increase thruput, an engineer has designed the following algo:
>
> 1.irp comes in
>
> 2. take Ourbitmap lock
>
> 3. update ourbitmap
>
> 4. release Ourbitmap lock
>
> 5. take Ourbitmap lock
>
> 6. take ToBeWrittenBitmapMapLock
>
> 7. copy Ourbitmap into ToBeWrittenBitMap
>
> 8. release ToBeWrittenBitMapLock
>
> 9. release Ourbitmap lock
>
> 10. take ToBeWrittenBitMapLock
>
> 11. write ToBeWrittenBitMap to disk (blocking
> call for this thread)
>
> 12. release ToBeWrittenBitMapLock
>
> 13. irp goes out
>
>
>
> my qn is, is this a good way to achieve thruput, or is there a better
> approach? so essentially, multiple irps will be updating ‘ourbitmap’ and
> possbly be flushed in one go go ToBeWrittenBitmap, so we save on
> duplication.
>
> thanks
>
> ap
>
> On Fri, Jan 13, 2012 at 11:30 PM, Tim Roberts wrote:
>
>> A P wrote:
>> >
>> >
>> > Now in the completion routine CompRoutine() they are sending a
>> > varaible allocated from teh stack of this function, the event, I am
>> > not sure if that is valid. is there a possibility that the irp they
>> > roll will be completed in a different thread context?
>> >
>>
>> So what if they are? The function above cannot exit until
>> KeWaitForSingleObject returns, and that cannot happen until the
>> completion routine runs and sets the event. All kernel stacks are
>> global memory, so the address of the KEVENT is valid no matter what
>> thread is running.
>>
>> –
>> Tim Roberts, xxxxx@probo.com
>> Providenza & Boekelheide, Inc.
>>
>>
>> —
>> NTDEV is sponsored by OSR
>>
>> For our schedule of WDF, WDM, debugging and other seminars visit:
>> http://www.osr.com/seminars
>>
>> To unsubscribe, visit the List Server section of OSR Online at
>> http://www.osronline.com/page.cfm?name=ListServer
>>
>
> — NTDEV is sponsored by OSR For our schedule of WDF, WDM, debugging and
> other seminars visit: http://www.osr.com/seminars To unsubscribe, visit
> the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>

>>In the i/o path (READ or WRITE IRPs), it would not be a good idea to process another irp >>synchronously. You could be at DISPATCH_LEVEL in the irp handler and waiting would give you blue screens

No it will not; the dispatch routine is not going to be @ DISPATCH_LEVEL when the event is signaled. Completion routine could be at DISPATCH but than KeSetEvent can be called as wait is false.

In fact this is quite common code and you can find in WDK samples as well.

I should have qualified my response with my assumption.

Based on the problem statement, I assumed that a disk filter driver was
being discussed (to get access to the sector level info).

At those low levels in the storage stack and when the driver is in the
paging path, it is not guaranteed that the READ/WRITE dispatch function
would be at PASSIVE_LEVEL.

From an earlier discussion at
http://www.osronline.com/showthread.cfm?link=159907

“Disk/volume stack’s dispatch routines can be called at any time with IRQL
<= DISPATCH_LEVEL, especially with the presence of 3rd party filters.”
From MSDN.

http://msdn.microsoft.com/en-us/library/windows/hardware/ff544039(v=vs.85).aspx

On Tue, Jan 17, 2012 at 7:33 PM, wrote:

> >>In the i/o path (READ or WRITE IRPs), it would not be a good idea to
> process another irp >>synchronously. You could be at DISPATCH_LEVEL in the
> irp handler and waiting would give you blue screens
>
> No it will not; the dispatch routine is not going to be @ DISPATCH_LEVEL
> when the event is signaled. Completion routine could be at DISPATCH but
> than KeSetEvent can be called as wait is false.
>
> In fact this is quite common code and you can find in WDK samples as well.
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>

> The function above cannot exit until KeWaitForSingleObject returns, and that cannot happen

until the completion routine runs and sets the event. All kernel stacks are global memory, so the address
of the KEVENT is valid no matter what thread is running.

True, but consider what happens if a completion routine signals the event at elevated IRQL and stack of the waiting thread where event has been allocated gets swapped out while thread is waiting - by the very action of signaling the event it is going to cause page fault at elevated IRQL, right. This is what ‘WaitMode’ parameter to
KeWaitXXX is for - it specifies whether the kernel stack of the sleeping thread can be swapped out. I think making this clarification is very important in this context…

Anton Bassov

>No it will not; the dispatch routine is not going to be @ DISPATCH_LEVEL
when the event is signaled
This is not a safe assumption. Cannot see a situation where block
read/write come at higher IRQL, but a production quality driver cannot make
this assumption since you dont know the driver above you does.

-Sisimon

On Wed, Jan 18, 2012 at 10:16 AM, wrote:

> > The function above cannot exit until KeWaitForSingleObject returns, and
> that cannot happen
> > until the completion routine runs and sets the event. All kernel stacks
> are global memory, so the address
> > of the KEVENT is valid no matter what thread is running.
>
>
> True, but consider what happens if a completion routine signals the event
> at elevated IRQL and stack of the waiting thread where event has been
> allocated gets swapped out while thread is waiting - by the very action of
> signaling the event it is going to cause page fault at elevated IRQL,
> right. This is what ‘WaitMode’ parameter to
> KeWaitXXX is for - it specifies whether the kernel stack of the sleeping
> thread can be swapped out. I think making this clarification is very
> important in this context…
>
> Anton Bassov
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>