Hi Guys,
I have a typical driver where
- I fire the command from passive level.
- I wait for the command using KeWaitForSingleObject.
- I trigger the above event from DPC using KeSetEvent
This entire setup works great on VMs with multiple CPUs but when I just give the VM a single CPU, my command always times out and KeWaitForSingleObject returns with timeout status.
I am waiting at passive level like
status = KeWaitForSingleObject(
WaitEvt,
Executive,
KernelMode,
FALSE,
&timeout); /* 5000 ms, relative to current time */
I am setting the event with this call.
KeSetEvent(
WaitObject,
0,
FALSE);
Not sure what I am missing here.
Any pointers appreciated.
AJ
You need to provide more details about what you’re doing: What does “fire the command” mean/do? What work needs to be done between “fire the command” and “set the event”? You don’t show the value of “timeout”… How are you setting it? (Hint: In my experience, if it’s not with WDF_REL_TIMEOUT_IN_XXX then it’s probably wrong.)
What does “fire the command” mean/do?
It means that a “context” is prepared in the device extension. This context holds the PKEVENT on which we are waiting. When the hardware completes the command, we retrieve the “context” in the DPC and call KeSetEvent.
I calculate the timeout like this:
LARGE_INTEGER timeout = { .QuadPart = 10000 * -1};
KeClearEvent(WaitEvt);
timeout.QuadPart *= 5000;
status = KeWaitForSingleObject(
WaitEvt,
Executive,
KernelMode,
FALSE,
&timeout);
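For anyone checking the arithmetic: kernel wait timeouts are expressed in 100-nanosecond units, and a negative value means "relative to now". The snippet below is only a portable sketch of that conversion so it can be verified outside a driver; rel_timeout_from_ms is a hypothetical helper name, not a WDK function, and plain int64_t stands in for LARGE_INTEGER.QuadPart.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the WDK): convert milliseconds into a
 * relative timeout expressed in 100-nanosecond units.
 * 1 ms = 10,000 units of 100 ns; the negative sign marks it as relative. */
static int64_t rel_timeout_from_ms(int64_t ms)
{
    return -(ms * 10000);
}

/* Same two-step computation as in the post: start at -10000, then scale. */
static int64_t rel_timeout_two_step(int64_t ms)
{
    int64_t quad = 10000 * -1;
    quad *= ms;
    return quad;
}
```

Both functions give -50000000 for 5000 ms, so the value passed to KeWaitForSingleObject in the post is a valid 5-second relative timeout. The timeout itself does not look like the cause of the problem.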
When the hardware completes the command …
Is this real hardware? Do you see an interrupt happening? Do you see the DPC starting?
Where exactly (in code lines) do you send off an event that needs to be
handled or how does it “queue” for handling?
The code below looks like a typical race condition to me: the event is
set in the DPC but then gets reset by KeClearEvent before the KeWait happens.
If you move the clear to after the KeWait, it is less obvious, but it might be
the same case, depending on how you handle the event wait.
This is just a guess, of course.
Dejan.
@Dejan_Maksimovic
I do not believe that there is a race condition. The code that I posted:-
LARGE_INTEGER timeout = { .QuadPart = 10000 * -1};
KeClearEvent(WaitEvt);
timeout.QuadPart *= 5000;
status = KeWaitForSingleObject(
WaitEvt,
Executive,
KernelMode,
FALSE,
&timeout);
is single-shot code that gets executed only when the driver is loading. This means:
The above code is a function that gets called only during driver initialization, and the calls happen strictly one after another. Once a command is pending, other callers wait for that command to finish.
fire-> wait -> dpc -> wakeup -> process complete -> fire-> wait -> dpc -> wakeup -> process complete ...
It would be a bug in the driver if the fire happened twice, in which case I would run into the race condition that you mentioned. I will look for it.
@Tim_Roberts : I am debugging your suggestions as well. Will post once I have the results.
Thanks
Aj
Hang on, what do you mean by “only when the driver is loading”? You can’t block for a hardware event until the hardware is initialized enough to have interrupts enabled. That’s way late in the process, after DriverEntry, after EvtStartDevice, etc. There is code inside PnP that serializes certain requests, so it won’t send down some of the startup ioctls until you have successfully handled the earlier ones.
I meant to say this code path gets executed only in EvtD0Entry. By this time the interrupts are claimed and all the bar resources are in place.
Remember guys, the driver works fine when I give more than one virtual CPU to the VM.
EvtDeviceD0Entry is called after interrupts are claimed, but BEFORE interrupts are enabled. That’s the problem. You need to move this to a later callback.
I am sorry. You are right. The initialization is happening in PostInterruptEnable not in D0Entry.
There is no reason at all to call KeClearEvent prior to calling KeWaitForSingleObject. As pointed out, this creates a race condition with your ISR/DPC code, and as we now know, interrupts are in fact enabled when you are doing this. So don’t do that. Also why are you using manual reset events?
Why do you need to block for this result? Why can’t you just fire-and-forget?
@Mark_Roddy : I have been thinking about the race condition that is being outlined here. I think I understand it now. Basically, if the hardware raises an interrupt before KeClearEvent is called, the wait will always time out. This is particularly likely on a single CPU: the interrupt is serviced by the same core, so my thread is preempted immediately, the DPC sets the event, and then my KeClearEvent wipes it out. It will always appear that the command timed out. I will remove the KeClearEvent and then try this out.
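The ordering described above can be sketched with a toy user-mode model. This is not kernel code: model_event and its helpers are hypothetical stand-ins for KeSetEvent, KeClearEvent and KeWaitForSingleObject, and the "wait" simply reports whether the event is signaled, assuming nothing else will signal it before the timeout expires.

```c
#include <stdbool.h>

/* Toy model of a kernel event: only the signaled flag is modeled. */
typedef struct {
    bool signaled;
} model_event;

/* Stands in for KeSetEvent called from the DPC. */
static void model_set(model_event *e)   { e->signaled = true; }

/* Stands in for KeClearEvent called by the waiting thread. */
static void model_clear(model_event *e) { e->signaled = false; }

/* Stands in for KeWaitForSingleObject with a timeout, under the assumption
 * that no further completion arrives: true = satisfied, false = timeout. */
static bool model_wait(const model_event *e) { return e->signaled; }

/* Ordering from the thread: the command completes and the DPC runs
 * before the waiter reaches its clear. */
static bool fire_clear_then_wait(void)
{
    model_event e = { false };
    model_set(&e);    /* DPC preempts the thread and signals completion */
    model_clear(&e);  /* waiter discards the completion signal */
    return model_wait(&e);
}

/* Same ordering without the clear before the wait. */
static bool fire_then_wait(void)
{
    model_event e = { false };
    model_set(&e);    /* DPC signals completion */
    return model_wait(&e);
}
```

In this model fire_clear_then_wait() reports a timeout even though the command completed, while fire_then_wait() is satisfied. On a multi-CPU VM the DPC can run on another core after the clear, which is presumably why the problem stayed hidden there.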
"Also why are you using manual reset events?"
Is there anything else which I can use to achieve the same effect?
@Tim_Roberts : I have to wait for the command to finish because its result contains information that is needed before the next command can fire. These commands are used to initialize the hardware.
Thanks guys. This indeed was a race condition between the KeClearEvent and KeWaitForSingleObject. I also moved to AutoResetEvent and it all worked.
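As a rough illustration of why the auto-reset event removes the need for the explicit clear, here is a toy user-mode model of the two event types. It is not kernel code: toy_event and its helpers are hypothetical names, with auto_reset = true mimicking a SynchronizationEvent and auto_reset = false mimicking a NotificationEvent (manual reset).

```c
#include <stdbool.h>

/* Toy model, not kernel code. auto_reset = true mimics a
 * SynchronizationEvent, false mimics a NotificationEvent. */
typedef struct {
    bool signaled;
    bool auto_reset;
} toy_event;

/* Stands in for KeSetEvent from the DPC. */
static void toy_set(toy_event *e) { e->signaled = true; }

/* Stands in for a timed wait, assuming nobody signals the event while
 * waiting: true = satisfied, false = would time out. */
static bool toy_wait(toy_event *e)
{
    bool satisfied = e->signaled;
    if (satisfied && e->auto_reset)
        e->signaled = false;  /* auto-reset: the waiter consumes the signal */
    return satisfied;
}

/* Counts how many of two back-to-back waits succeed when only ONE
 * completion is ever signaled. */
static int waits_satisfied_by_one_completion(bool auto_reset)
{
    toy_event e = { false, auto_reset };
    int satisfied = 0;

    toy_set(&e);                        /* single DPC completion */
    satisfied += toy_wait(&e) ? 1 : 0;  /* wait for command 1 */
    satisfied += toy_wait(&e) ? 1 : 0;  /* wait for command 2, nothing new signaled */
    return satisfied;
}
```

With the manual-reset model both waits succeed, so a stale signal from command 1 would wrongly complete command 2 unless someone clears it, and that clear is where the race came from. With the auto-reset model only the first wait succeeds, because the signal is consumed by the waiter itself and no separate KeClearEvent is needed.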