Below is a snippet of my code. The ioctls IOCTL_ENABLE_CALLBACK and IOCTL_DISABLE_CALLBACK just set and reset a global volatile variable, g_enableCB. This variable is checked in a PsSetCreateProcessNotifyRoutine callback function: the callback is processed only if g_enableCB is set; otherwise it just returns.
LONG volatile g_enableCB = FALSE;

NTSTATUS
DeviceControl(
    IN PDEVICE_OBJECT DeviceObject,
    IN PIRP Irp)
{
    PIO_STACK_LOCATION irpSp;
    ULONG inBufLength;
    ULONG outBufLength;
    NTSTATUS ntStatus = STATUS_SUCCESS;

    irpSp = IoGetCurrentIrpStackLocation(Irp);
    inBufLength = irpSp->Parameters.DeviceIoControl.InputBufferLength;
    outBufLength = irpSp->Parameters.DeviceIoControl.OutputBufferLength;

    switch (irpSp->Parameters.DeviceIoControl.IoControlCode)
    {
    case IOCTL_ENABLE_CALLBACK:
        // g_enableCB = TRUE;
        InterlockedCompareExchange(&g_enableCB, TRUE, FALSE);
        break;
    case IOCTL_DISABLE_CALLBACK:
        // g_enableCB = FALSE;
        InterlockedCompareExchange(&g_enableCB, FALSE, TRUE);
        break;
    default:
        // reject control codes this driver does not handle
        ntStatus = STATUS_INVALID_DEVICE_REQUEST;
        break;
    }

    Irp->IoStatus.Status = ntStatus;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return ntStatus;
}

// PsSetCreateProcessNotifyRoutine callback function
VOID
ProcessNotifyRoutine(
    IN HANDLE ParentId,
    IN HANDLE ProcessId,
    IN BOOLEAN Create)
{
    if (InterlockedCompareExchange(&g_enableCB, TRUE, TRUE) == FALSE)
    // if (g_enableCB == FALSE)
    {
        return;
    }
    // log the process create and terminate
}
Although a load or store of g_enableCB is atomic per Intel, I am not sure how quickly a change made to g_enableCB by one thread (in DeviceControl) becomes visible to another thread running ProcessNotifyRoutine in parallel on another core. From MSDN: “The InterlockedCompareExchange function generates a full memory barrier (or fence) to ensure that memory operations are completed in order.” So instead of a simple, direct assignment such as “g_enableCB = TRUE” or “g_enableCB = FALSE” (please see the commented-out lines), I think the interlocked form of assignment is faster, because there is no out-of-order execution and the change is reflected in the caches of all cores immediately. Can anyone please tell me whether my assertion is correct?
xxxxx@gmail.com wrote:
> Below is a snippet of my code. The ioctls IOCTL_ENABLE_CALLBACK and IOCTL_DISABLE_CALLBACK just set and reset a global volatile variable, g_enableCB. This variable is checked in a PsSetCreateProcessNotifyRoutine callback function: the callback is processed only if g_enableCB is set; otherwise it just returns.
> …
> Although a load or store of g_enableCB is atomic per Intel, I am not sure how quickly a change made to g_enableCB by one thread (in DeviceControl) becomes visible to another thread running ProcessNotifyRoutine in parallel on another core.
The caches on x86 systems are coherent. Once the value gets written,
the next instruction to read that memory value will get the modified
value, no matter what processor it’s on.
> From MSDN: “The InterlockedCompareExchange function generates a full memory barrier (or fence) to ensure that memory operations are completed in order.” So instead of a simple, direct assignment such as “g_enableCB = TRUE” or “g_enableCB = FALSE” (please see the commented-out lines), I think the interlocked form of assignment is faster, because there is no out-of-order execution and the change is reflected in the caches of all cores immediately. Can anyone please tell me whether my assertion is correct?
I can’t honestly tell what you’re asserting here, but I can tell you the
code you have there is silly. Just use normal code:
g_enableCB = TRUE;
and
g_enableCB = FALSE;
and
if (!g_enableCB)
And, in this particular case, even if the write was delayed for a few
cycles, who on earth cares? You’re hyperoptimizing a totally
unimportant case.
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
I don’t think it matters. Why? Because I think you want the code after the enabled check to be atomic with respect to other operations and state in the driver, which includes knowing that, by the time you complete the disable IOCTL, any potentially in-flight callback has finished processing. If it needs to be atomic beyond the variable itself, you are talking about a lock, and then interlocked operations are not necessary.
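To make the point about a lock concrete, here is a minimal user-mode sketch, not driver code. It assumes a pthread mutex as a stand-in for whatever kernel lock the driver would actually use (a fast mutex, for example), and the names g_lock, g_enabled, g_logged, set_enabled and on_process_event are hypothetical. Because the flag is only read and written while the lock is held, it can be a plain variable, and a disable request cannot return while a callback is still in the middle of its logging work.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names; a pthread mutex stands in for a kernel lock. */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static bool g_enabled = false;   /* only touched while g_lock is held */
static unsigned g_logged = 0;    /* stands in for the driver's logging state */

/* Models the enable/disable IOCTLs. Blocks while a callback holds the lock. */
void set_enabled(bool enable)
{
    pthread_mutex_lock(&g_lock);
    g_enabled = enable;
    pthread_mutex_unlock(&g_lock);
}

/* Models the notify callback: the check and the work happen under one lock. */
bool on_process_event(void)
{
    bool handled = false;

    pthread_mutex_lock(&g_lock);
    if (g_enabled) {
        g_logged++;              /* "log the process create and terminate" */
        handled = true;
    }
    pthread_mutex_unlock(&g_lock);
    return handled;
}

/* Reads the state under the same lock. */
unsigned logged_count(void)
{
    unsigned n;

    pthread_mutex_lock(&g_lock);
    n = g_logged;
    pthread_mutex_unlock(&g_lock);
    return n;
}
```

The trade-off is that every callback now takes the lock, even when logging is disabled.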
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@gmail.com
Sent: Monday, November 2, 2015 12:46 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] interlocked vs assignment
…
This is just incredible …
As recently as 3 days ago Mr. Aseltine made a reference to an epic “InterlockedRead / InterlockedWrite” thread, and here we go again - a poster on that thread was wondering why there is no InterlockedRead() function available, and the OP on this one believes assignments can be made to run faster by means of interlocked operations…
Anton Bassov
> Raja Kannan wrote:
> Although a load or store of g_enableCB is atomic per Intel, I am not sure how quickly
> a change made to g_enableCB by one thread (in DeviceControl) becomes visible to
> another thread running ProcessNotifyRoutine in parallel on another core.
Modern CPUs have special buffers (store buffers) for lazy writes to memory.
So a write to memory by one processor may not be visible to another processor
immediately; this requires a memory barrier or a special instruction (or an interrupt).
This behavior is strictly defined in the Intel and AMD architecture specifications.
For example, see the “Memory Ordering” chapter of the
“Intel® 64 and IA-32 Architectures Software Developer’s Manual”.
> case IOCTL_ENABLE_CALLBACK:
>     // g_enableCB = TRUE;
>     InterlockedCompareExchange(&g_enableCB, TRUE, FALSE);
>     break;
Other possible solutions:
InterlockedExchange(&g_enableCB, TRUE);
or
g_enableCB = TRUE;
MemoryBarrier(); // Full hardware fence (MS specific).
> // PsSetCreateProcessNotifyRoutine callback function
> VOID
> ProcessNotifyRoutine(
>     IN HANDLE ParentId,
>     IN HANDLE ProcessId,
>     IN BOOLEAN Create)
> {
>     if (InterlockedCompareExchange(&g_enableCB, TRUE, TRUE) == FALSE)
>     // if (g_enableCB == FALSE)
>     {
>         return;
>     }
An interlocked function is not required here.
The visibility of the ‘g_enableCB’ variable is already provided by the
memory fence in the example above and by the use of the ‘volatile’ keyword.
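For comparison, here is roughly how the same writer/reader split looks in portable C11, as a sketch only. Using <stdatomic.h> is an assumption made for illustration, since the driver itself would use the WDK primitives quoted above, and the function names are made up. The writer side uses a sequentially consistent store, which plays the role of InterlockedExchange or of an assignment followed by MemoryBarrier(); the reader side is a plain atomic load with no read-modify-write operation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical portable counterpart of "LONG volatile g_enableCB". */
static atomic_long g_enable_cb = 0;

/* Writer side (the IOCTL handler): a sequentially consistent store. */
void enable_callback(void)
{
    atomic_store(&g_enable_cb, 1);
}

void disable_callback(void)
{
    atomic_store(&g_enable_cb, 0);
}

/* Reader side (the notify callback): an atomic load, no compare-exchange. */
bool callback_enabled(void)
{
    return atomic_load_explicit(&g_enable_cb, memory_order_acquire) != 0;
}
```

Neither form guarantees that a callback already running on another core observes the new value at any particular instant; they only constrain the ordering of the surrounding memory operations.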
You don’t need the barriers or the interlocked instructions. Should your
callback not proceed immediately after the value was set because of a memory
ordering problem, then it will surely proceed next round.
If you think that’s a problem then realize that after setting the enable
value in your IOCTL handler, when your callback fires, it may well receive
events that have executed before the enable value was set.
You can improve this sample, not by using barriers or interlocked
instructions, but by removing the polling from the callback and calling
PsSetCreateProcessNotifyRoutineEx right from your IOCTL handler to enable or
disable it. That also avoids having your callback called for nothing.
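A rough sketch of that approach, assuming the non-Ex callback signature from the first post (PsSetCreateProcessNotifyRoutineEx takes the same Remove argument but expects a different callback prototype). SetProcessLogging is a hypothetical helper name, and the fragment needs the WDK headers to build.

```c
#include <ntddk.h>

/* Callback from the original post. */
VOID ProcessNotifyRoutine(IN HANDLE ParentId, IN HANDLE ProcessId, IN BOOLEAN Create);

/* Hypothetical helper: register the callback to enable logging and
 * unregister it to disable logging. Call at PASSIVE_LEVEL. */
NTSTATUS
SetProcessLogging(IN BOOLEAN Enable)
{
    /* Remove == FALSE registers the routine, Remove == TRUE removes it. */
    return PsSetCreateProcessNotifyRoutine(ProcessNotifyRoutine,
                                           Enable ? FALSE : TRUE);
}

/* In DeviceControl, the two cases would then become:
 *   case IOCTL_ENABLE_CALLBACK:  ntStatus = SetProcessLogging(TRUE);  break;
 *   case IOCTL_DISABLE_CALLBACK: ntStatus = SetProcessLogging(FALSE); break;
 */
```

With this shape the callback no longer needs to test g_enableCB at all. The IOCTL can simply return whatever status the registration call reports, for instance when an enable request arrives twice, and the driver still has to remove the routine before it unloads if it is registered at that point.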
//Daniel
What Mr. Terhell said. Who cares? There are FAR bigger timing issues that could cause you to miss a single, particular, operation. Consider what would happen if the thread that’s sending the IOCTL happens to get rescheduled while in the process of sending that IOCTL?
Either I miss the point of your question entirely (which is certainly possible), or your question is worrying about something of no consequence in a real system in the real world given the other issues at play.
Peter
OSR
@OSRDrivers
Thanks for the replies.
Going through all of them, the question I posted above certainly concerns only a very tiny window of the problem, and even then the consequence (that is, the callback thread(s) proceed for one more round, which is wasted) is minimal. I will go ahead with the simple code below:
g_enableCB = TRUE;
and
g_enableCB = FALSE;
and
if (!g_enableCB)