Yielding to guarantee full quantum execution?

Hi gurus,

What I’m trying to achieve…

We have a powerfail interrupt in our driver. Once signalled, the power is
guaranteed to be up for at least a further 25ms.

There’s one critical IOCTL call in the driver that must be an atomic
operation. Once it starts, it must finish. If it doesn’t start at all, no
problem. If power fails during the operation - problem!

I was thinking that the best way to maximise the possibility that it
would complete is to do this within the driver IOCTL call:

* raise thread priority to real-time
* yield
* check powerfail status
* do atomic operation
* lower thread priority to normal

My thinking is that by yielding, the thread is guaranteed to execute for
its full quantum when it is re-scheduled (barring pre-emption). So the
atomic operation should have enough time to complete, even if power fails
immediately after the thread has checked the status.
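For concreteness, here is the proposed sequence as a hypothetical user-mode mock. The raise/yield/lower functions are stand-ins for the real kernel calls; the sketch only shows the intended ordering, not real scheduling behaviour:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mock of the proposed IOCTL sequence. A real driver would
 * call KeSetPriorityThread, some yield primitive and the hardware access
 * here; these stand-ins only record the order of the steps. */

static char log_buf[128];

static void record(const char *step) { strcat(log_buf, step); strcat(log_buf, ";"); }

static void raise_priority(void)   { record("raise"); }
static void yield_thread(void)     { record("yield"); }  /* hope: fresh quantum on reschedule */
static void atomic_operation(void) { record("op"); }     /* must finish once started */
static void lower_priority(void)   { record("lower"); }

/* Returns 1 if the atomic operation was performed, 0 if it was skipped
 * because powerfail was already signalled. */
int critical_ioctl(int powerfail_signalled)
{
    int did_op = 0;
    raise_priority();
    yield_thread();
    if (!powerfail_signalled) {   /* check powerfail status */
        atomic_operation();
        did_op = 1;
    }
    lower_priority();
    return did_op;
}
```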

Anyone have any thoughts on this? Am I right?

And lastly, how do I yield so that I can guarantee a full quantum of
execution (barring pre-emption) when re-scheduled (as opposed, I guess, to
the current thread simply having its current quantum decremented)? How
about KeDelayExecutionThread(arbitrary-time)?

TIA
Regards,


Mark McDougall, Engineer
Virtual Logic Pty Ltd
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266

> My thinking is that by yielding, the thread is guaranteed to execute for
> its full quantum when it is re-scheduled (barring pre-emption).

What makes you believe so? If there is any thread of higher priority that is currently eligible for execution, your thread will get pre-empted immediately. And if your thread’s priority is in the real-time range, then, as long as no other thread of the same or higher priority is eligible for execution, your thread will run until it voluntarily yields execution. Therefore, in the real-time range an unused quantum can have any effect only when “fighting for CPU” with threads of the same priority as yours.
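The three cases above can be captured in a tiny decision function (a sketch with plain integer priorities, not real KPRIORITY values):

```c
#include <assert.h>

/* Sketch of the scheduling rules described above: what happens to the
 * currently running thread, given the best-priority thread that is
 * eligible to run. Plain ints, higher = more important. */

enum outcome { PREEMPTED_NOW, RUNS_UNTIL_QUANTUM_END, RUNS_UNTIL_YIELD };

enum outcome fate(int my_priority, int best_waiting_priority, int any_waiting)
{
    if (any_waiting && best_waiting_priority > my_priority)
        return PREEMPTED_NOW;           /* higher priority: immediate pre-emption */
    if (any_waiting && best_waiting_priority == my_priority)
        return RUNS_UNTIL_QUANTUM_END;  /* same priority: round-robin on quantum expiry */
    return RUNS_UNTIL_YIELD;            /* nothing eligible: run until voluntary yield */
}
```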

BTW, don’t forget that, apart from context switches, there are also interrupts - your thread may get interrupted by any interrupt with a priority above your current IRQL. Therefore, if you want your operation to be really atomic, you have either to disable interrupts or raise IRQL to the highest possible level, although at this point you get severely restricted in what you can do…

Anton Bassov

xxxxx@hotmail.com wrote:

What makes you believe so??? If there is any thread of higher priority
that is currently eligible for execution, your thread will get
pre-empted immediately, and if your thread priority is in the real-time
range, then, as long as no other thread of the same or higher priority
that is eligible for execution is around, your thread will run until it
voluntarily yields execution. Therefore, in the real-time range an unused
quantum may have any effect only when “fighting for CPU” with threads
of the same priority as yours.

The situation I’m trying to avoid is having the writes start just before
the thread quantum is due to expire, and get swapped out. So by yielding,
I thought it would start the writes immediately after it got re-scheduled,
and hence maximise the probability that it would complete before power
dropped out…

Are you saying “real-time” threads don’t have a quantum, and will execute
until pre-emption or voluntary yield?

BTW, don’t forget that, apart from context switches, there are also
interrupts - your thread may get interrupted by any interrupt of
priority that is above your current IRQL.

Understand.

operation to be really atomic, you have either to disable interrupts or
raise IRQL to the highest possible level, although at this point you get
severely restricted in what you can do…

Not really a problem when you’ve just lost power! ;-)

Regards,


Mark McDougall, Engineer
Virtual Logic Pty Ltd
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266

Mark McDougall wrote:

There’s one critical IOCTL call in the driver that must be an atomic
operation. Once it starts, it must finish. If it doesn’t start at all, no
problem. If power fails during the operation - problem!

I was thinking that the best way to maximise the possibility that it
would complete is to do this within the driver IOCTL call:

* raise thread priority to real-time
* yield
* check powerfail status
* do atomic operation
* lower thread priority to normal

My thinking is that by yielding, the thread is guaranteed to execute for
its full quantum when it is re-scheduled (barring pre-emption). So the
atomic operation should have enough time to complete, even if power fails
immediately after the thread has checked the status.

Anyone have any thoughts on this? Am I right?

Why don’t you just grab a spinlock to raise your IRQL? That won’t
protect against other interrupts, but it should protect you from being
rescheduled.

(Interesting side trip: while trying to answer the question “how would I
yield the CPU in a kernel driver”, I Googled for “yield kernel” and was
presented with many pages of agricultural reports on grains and nuts.)


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> So by yielding, I thought it would start the writes immediately after it got re-scheduled,
> and hence maximise the probability that it would complete before power dropped out…

You should never make any assumptions about thread scheduling - everything depends on how your thread fares against all the other threads in the system, and, for practical purposes, you cannot make any judgement about the system’s current state, because the state of the system tables may change at any moment without your knowledge. Therefore, any info about the system’s current state is potentially obsolete and cannot be used for any practical purpose by anyone apart from the OS itself…

> Are you saying “real-time” threads don’t have a quantum…

I am not saying that - any thread has its quantum. The only reason I made special mention of real-time threads is that they are not subjected to the various manipulations of *current* priority that the scheduler performs on threads in the dynamic range. When it comes to real-time threads, their current priorities are always equal to their base ones.

> … and will execute until pre-emption or voluntary yield?

If no threads of the same or higher priority are eligible for execution, then your thread will run until it yields the CPU. If some thread of the same priority is eligible for execution, it will get scheduled only after the currently running thread’s quantum expires (or the currently running thread yields execution). If some thread of higher priority becomes eligible for execution, the currently running thread will get pre-empted straight away.

> although at this point you get severely restricted in what you can do…

Not really a problem when you’ve just lost power! ;-)

Think about it carefully, and you will understand that if your IOCTL is unable to do what it has to do without raising unrecoverable exceptions, it does not really make sense to worry about its completion timeframe - it will never complete successfully anyway, and I believe that is not something you are interested in…

Anton Bassov

How long does your atomic operation take to complete? If the answer is less
than half a millisecond or so, just wrap it in:

KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);

/* … atomic operation … */

KeLowerIrql(oldIrql);

- Jake Oshins

“Mark McDougall” wrote in message news:xxxxx@ntdev…

What’s the effective difference between KeRaiseIrql(DISPATCH_LEVEL, …) and KeRaiseIrqlToDpcLevel()?

Is KeRaiseIrqlToDpcLevel more efficient?

> What’s the effective difference between KeRaiseIrql(DISPATCH_LEVEL, …) and
> KeRaiseIrqlToDpcLevel()?
>
> Is KeRaiseIrqlToDpcLevel() more efficient?

It depends on definitions…

If you look at wdm.h, you will see the following declarations:

#define KeRaiseIrql(a,b) *(b) = KfRaiseIrql(a)

NTKERNELAPI VOID KeRaiseIrql ( IN KIRQL NewIrql, OUT PKIRQL OldIrql);

Therefore, depending on which definition is in effect, KeRaiseIrql() in your code may result either in calling the KeRaiseIrql() export (which calls KfRaiseIrql(), which actually writes to the Task Priority Register, so a few CPU cycles are wasted on the extra call), or in calling the KfRaiseIrql() export directly.

However, KeRaiseIrqlToDpcLevel() in your code always results in calling the KeRaiseIrqlToDpcLevel() export, and this function writes to the Task Priority Register itself.

If you compare the efficiency of KeRaiseIrqlToDpcLevel() vs KfRaiseIrql(), the latter has to do one more memory access than the former. IRQL is just an index into the array where the values to be written to the Task Priority Register are stored, so KfRaiseIrql() has to fetch that value first. KeRaiseIrqlToDpcLevel() does not need it - it just writes a pre-defined value of 0x41 to the TPR.

Anton Bassov

Jake Oshins wrote:

How long does your atomic operation take to complete? If the answer is
less than half a millisecond or so, just wrap it in:

KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);
…
KeLowerIrql(oldIrql);

Heh - actually, I’ve been asked how long they’re allowed to take!

Basically, there is a background task that consists of continually
performing a set of atomic operations. After powerfail is detected, any
outstanding ‘set’ needs to complete. Consistency is the key here.

I’m trying to ascertain the maximum length of such an operation that can
typically be performed under these conditions.

Each individual access takes about 15us. However, to be useful, they must
be able to issue a set of accesses - to be considered ‘atomic’ - and they
want to know how many of these accesses they have to play with.

So if I tell them 1000 accesses, which will take in the order of 15ms,
then the “guarantee” is that all accesses will (typically) complete before
power is lost. So that would mean I need to ensure that this thread will
get either 0 or at least 15ms of quanta after powerfail is signalled and
before power is really lost.
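Checking that arithmetic in a quick sketch (the 15us-per-access and 25ms-window figures are the ones quoted in this thread):

```c
#include <assert.h>

/* Budget check for the figures quoted in the thread:
 * each access takes ~15 us, power stays up for >= 25 ms. */

#define ACCESS_US      15     /* one access, microseconds */
#define WINDOW_US   25000     /* guaranteed power-up window, microseconds */

/* Time for n accesses, in microseconds. */
long set_duration_us(long n_accesses) { return n_accesses * ACCESS_US; }

/* Largest set of accesses that fits entirely inside the window. */
long max_accesses(void) { return WINDOW_US / ACCESS_US; }
```

So a 1000-access set takes 15 ms and fits, but only with 10 ms of headroom for latency.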

Regards,


Mark McDougall, Engineer
Virtual Logic Pty Ltd
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266

xxxxx@hotmail.com wrote:

If no threads of the same or higher priority that are eligible for
execution are around, then your thread will run until it yields the
CPU. If there is some thread of the same priority that is eligible for
execution, then it will get scheduled only after currently running
thread’s quantum expires (or currently running thread yields
execution). If some thread of higher priority is eligible for
execution, the currently running thread will get pre-empted straight away.

Consider this scenario:

The system has two threads with real-time priority. Between them, they get
round-robin scheduled. If one of them is my thread, then it may get to the
point where it starts these ‘atomic’ accesses, and then its quantum
expires and the other thread is scheduled. By the time the other thread’s
quantum expires, power could be lost… and the accesses don’t complete.

Now, same scenario but my thread yields immediately before starting the
accesses. When it is re-scheduled, it has an entire quantum to complete
the atomic accesses, since I know it’s not due to get swapped out.

(Ignoring pre-emption of higher priorities, IRQs etc, since that’s outside
my control)…

Does this make sense?

Regards,


Mark McDougall, Engineer
Virtual Logic Pty Ltd
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266

Tim Roberts wrote:

Why don’t you just grab a spinlock to raise your IRQL? That won’t
protect against other interrupts, but it should protect you from being
rescheduled.

The process involves an IOCTL call executing in the calling thread’s
context, including data from user-mode buffers, so I don’t think this is
an option. :-(

Regards,


Mark McDougall, Engineer
Virtual Logic Pty Ltd
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266

Mark,

I am afraid you have been on the wrong track from the very beginning…

I would rather do things the following way:

I would logically break the operation into pre-processing, processing and post-processing, and define the goal as ensuring atomicity throughout the processing stage - after all, as you said yourself, if power goes off before processing begins or after it has already finished, there is no problem whatsoever.

My pre-processing stage would involve building MDLs for all the user-mode buffers I have to access during the processing stage, and locking them into RAM - if this part gets aborted, there is no problem whatsoever. After all the buffers I need have been locked into RAM and are safely accessible at elevated IRQL, I would raise IRQL to DISPATCH_LEVEL, and at this point I would consider the actual operation to have commenced. Once the current IRQL == DISPATCH_LEVEL, the operation cannot be interrupted by context switches, which increases the probability that it will complete successfully before the machine goes off. When it is over, I would unlock the memory - again, if this part gets aborted, there is no problem whatsoever.

Certainly, you cannot spend very long at elevated IRQL. As you say, you have to calculate the right number of accesses. Therefore, just make sure that it falls within the MSFT-defined limit of 100 microseconds - otherwise, your driver has no chance of getting certified…
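The shape of that bracket, as a hypothetical mock: the Mm*/Ke* calls are stand-ins that only record ordering (a real driver would use IoAllocateMdl/MmProbeAndLockPages, KeRaiseIrql and MmUnlockPages). The point is that everything that can fail safely happens before the IRQL is raised:

```c
#include <assert.h>
#include <string.h>

/* Mock of the pre-process / process / post-process bracket described
 * above. The stubs only record the order of the steps so the shape of
 * the bracket is visible. */

static char trace[160];
static void step(const char *s) { strcat(trace, s); strcat(trace, ";"); }

static int lock_user_buffers(int buffers_ok)   /* pre-processing: may fail safely */
{ step("lock"); return buffers_ok; }
static void raise_to_dispatch(void) { step("raise"); }  /* operation "commences" here */
static void do_accesses(void)       { step("access"); } /* no context switches now */
static void lower_irql(void)        { step("lower"); }
static void unlock_buffers(void)    { step("unlock"); } /* post-processing */

/* Returns 1 on success, 0 if aborted during harmless pre-processing. */
int atomic_ioctl(int buffers_ok)
{
    if (!lock_user_buffers(buffers_ok))
        return 0;          /* aborting here loses nothing */
    raise_to_dispatch();
    do_accesses();
    lower_irql();
    unlock_buffers();      /* aborting here loses nothing either */
    return 1;
}
```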

Anton Bassov

xxxxx@hotmail.com wrote:

I would logically break my operation into pre-processing, processing
and post-processing, and define my goal as ensuring atomicity of
operation throughout the processing stage - after all, as you said it
yourself, if power goes off before processing begins or after it has
been already done, there is no problem whatsoever.

My pre-processing stage would involve building MDLs for all user-mode
buffers that I have to access during processing stage, and locking
them in RAM - if this part gets aborted, there is no problem
whatsoever. After all buffers that I need have been locked in RAM and
are safely accessible at elevated IRQL, I would raise IRQL to DPC
level, and, at this point, I would consider the actual operation having
commenced. Once the current IRQL == DISPATCH_LEVEL, the operation cannot be
interrupted by context switches, which increases the probability that
operation will get completed successfully before the machine is off.
When it is over, I would unlock memory - again, if this part gets
aborted, there is no problem whatsoever.

Certainly, you cannot spend too long at elevated IRQL. As you say, you
have to calculate the right number of accesses. Therefore, just make
sure that it falls within MSFT-defined limit of 100 microseconds -
otherwise, your driver has no chance to get certified…

I suspect you are correct, and in an ideal world, I would probably have
the luxury of following your advice and starting over.

In this case, however, the original driver was written by the customer. We
actually had nothing at all to do with the software at the start (we
weren’t even privy to the requirements) - we did the firmware - but as
the job is winding up, several “outstanding” tasks have been assigned to us
because their original software resources are busy elsewhere.

One of those tasks involves profiling the (existing) driver performance to
ascertain how much processing we can be “guaranteed” to do in 25ms after
power has failed. This processing is done continuously - not just after
power fail - so it can’t be too greedy with resources. When power fails,
any incomplete atomic operation is completed, otherwise the thread hangs.

As for certification, that’s not an issue - it’s a closed platform.

Regards,


Mark McDougall, Engineer
Virtual Logic Pty Ltd
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266

Anton touched on this, but the buffers in the user mode application
could be paged out and the act of bringing in those pages could take
longer than your max time requirements. If you are using buffered io in
your IOCTLs, the allocation for the double buffer could also fail. At
what point do you accept the limitations of the OS, especially an OS
that is not real time?

d

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Mark McDougall
Sent: Sunday, August 05, 2007 10:20 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Yielding to guarantee full quantum execution?

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Mark,

> One of those tasks involves profiling the (existing) driver performance to
> ascertain how much processing we can be “guaranteed” to do in 25ms after
> power has failed.

Don’t you understand that *ANY* high-precision profiling at low IRQL is just infeasible in itself?

To begin with, as long as you are at IRQL < DISPATCH_LEVEL, your thread may take a page fault at any moment (let’s say your thread is of the highest possible priority and no other threads of the same priority are around). Page faults have to be processed synchronously. Therefore, the Memory Manager will issue an IRP to the FSD, and your thread will get blocked until the IRP gets completed, so the CPU will be given to another thread. As you can see, pretty much everything depends on the memory status - the number of context switches is at least equal to the number of page faults that your thread generates, so the results may be rather different across two runs. I don’t even mention the fact that another thread may decide that it does not want to be pre-empted and lower your thread’s priority while it is in the waiting state - you have no control over that whatsoever.

As you can see, nothing here depends on you - in order to get more-or-less precise results you have to profile your code at least at DPC level…

Anton Bassov

Doron Holan wrote:

Anton touched on this, but the buffers in the user mode application
could be paged out and the act of bringing in those pages could take
longer than your max time requirements. If you are using buffered io in
your IOCTLs, the allocation for the double buffer could also fail. At
what point do you accept the limitations of the OS, especially an OS
that is not real time?

It’s looking more and more like the only option is to have each IOCTL call
memcpy the data into non-pageable memory in the driver. Under normal
circumstances the driver could then continue in the current thread context
and then mark the atomic operation as ‘complete’.

On a powerfail, the DPC for the power fail interrupt in the driver could
then check for incomplete accesses and re-start the entire atomic
operation within the DPC context, minimising the chance of getting
swapped-out. I guess it would then flag the operation as complete, so that
if the original thread ever got re-scheduled before power loss, it
wouldn’t bother with any further accesses…

Regards,


Mark McDougall, Engineer
Virtual Logic Pty Ltd
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266

What is the buffering type of the ioctl? Buffered or direct? If it is
buffered, the buffer (which is a union of input and output) you get is
already non paged. For a direct ioctl, the input buffer is non paged as
well. So, I am a bit confused about where you will do the memcpy and
what that buys you unless you are getting a UM VA and then manually
probing/locking and then mapping it into system VA.

My point was really referring to the initial call in your UM application
to DeviceIoControl. The buffer it passes to this API can be paged out,
and there is nothing you can do about that in the driver beforehand.
The application must either pend the IOCTL before power is lost, or it
must use VirtualLock to lock the pages down (assuming the app has
sufficient rights to do this).

d

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Mark McDougall
Sent: Sunday, August 05, 2007 11:27 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Yielding to guarantee full quantum execution?


Doron Holan wrote:

What is the buffering type of the ioctl? Buffered or direct? If it is
buffered, the buffer (which is a union of input and output) you get is
already non paged.

It’s buffered.

Hmm, I guess I should’ve thought about it more - I’ve been too focused on
the scheduling/quantum thing and switched to this line of thinking based
on the feedback to my original post… thanks!

My point was really referring to the initial call in your UM application
to DeviceIoControl. The buffer it passes to this API can be paged out
and there is nothing you can do about that in the driver before hand.
The application must either pend the ioctl before power is lost or the
application uses VirtualAlloc and specifies that the pages are locked
down (assuming the app has sufficient rights to do this).

If the power fail is signalled before the driver starts the writes, then
that’s OK, because the first thing this particular IOCTL does is check that
power hasn’t failed. If it has, it stops before doing any accesses.

So I guess I need to modify the behaviour of this IOCTL to firstly store
the buffer address in the devext, and flag an ‘atomic’ operation in
progress. In the normal course of events, the IOCTL can then complete in
the original context and then flag that the ‘atomic’ operation has
completed before returning.

In the case of a power fail, the driver gets an interrupt. The DPC can
detect an atomic operation has started, and then re-start the accesses and
complete the entire IOCTL call in the DPC context. This will maximise (not
guarantee) the probability that the operation will complete after a powerfail.
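That state machine might look roughly like the following mock. The field names in the mock `devext` are made up for illustration, and `do_accesses()` stands in for the hardware writes; the point is that whichever side finishes first marks the operation complete, so the other side does no further accesses:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mock of the scheme described above: the IOCTL records the
 * operation in the (mock) device extension and marks it in progress; the
 * powerfail DPC re-runs any incomplete operation; a completion flag
 * prevents the work being done twice. */

struct devext {
    char buffer[64];   /* stands in for the non-paged copy of the data */
    int  in_progress;
    int  complete;
    int  access_runs;  /* how many times the access set was performed */
};

static void do_accesses(struct devext *dx) { dx->access_runs++; }

void ioctl_start(struct devext *dx, const char *data)
{
    strcpy(dx->buffer, data);  /* memcpy into non-pageable storage */
    dx->in_progress = 1;
    dx->complete = 0;
}

void ioctl_finish(struct devext *dx)   /* normal path, original thread context */
{
    if (!dx->complete) { do_accesses(dx); dx->complete = 1; }
    dx->in_progress = 0;
}

void powerfail_dpc(struct devext *dx)  /* runs at DISPATCH_LEVEL on powerfail */
{
    if (dx->in_progress && !dx->complete) {
        do_accesses(dx);   /* restart the whole atomic set in DPC context */
        dx->complete = 1;  /* original thread will skip its accesses */
    }
}
```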

Thanks again for your input!
Regards,


Mark McDougall, Engineer
Virtual Logic Pty Ltd
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266

xxxxx@hotmail.com wrote:

Don’t you understand that *ANY* high-precision profiling at low IRQL is
just infeasible in itself?

As you can see, nothing depends on you here - in order to get
more-or-less precise results you have to profile your code *at least*
at DPC level…

I understand this, yes.

The original intention was to run a statistically significant number of
tests on a system under a known load and profile the performance of this
IOCTL call.

My initial tests showed a scattering of results - the worst of which
exhibited over 40ms latency before even starting. That’s not surprising,
but I had to start somewhere.

What I’m trying to do now is formulate a strategy that will maximise the
probability that a certain number of accesses will complete within a
certain time period. And from feedback it would appear DPC is the way to
go - an angle I hadn’t considered originally, mainly because I was
originally asked to merely profile the performance of this driver I
didn’t write. Now that it is clear it’s not going to work for them, I’ve
been asked to look at improving the odds.

I understand that the code is at the whim of windows scheduling, paging
and other tasks on the system. Which is why we’ve characterised a heavy
load as a ‘worst-case scenario’. Again, I’m not after any guarantees at all.

And in case you’re about to throw Heisenberg into the mix - the profiling
is done completely within an FPGA so there’s no software to taint the
results. ;-)

Regards,


Mark McDougall, Engineer
Virtual Logic Pty Ltd
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266

> I understand that the code is at the whim of windows scheduling, paging
> and other tasks on the system. Which is why we’ve characterised a heavy
> load as a ‘worst-case scenario’.

Now, once you have agreed that you should do your tests at DPC level, you have a chance of getting more-or-less usable results. To improve the reliability of your tests even further, I would advise you to run two types of test:

  1. Test DPC latency, i.e. the longest period that may elapse between your ISR returning and the DPC it queued starting to run. Please note that this may take quite a while under some circumstances (for example, when network traffic is really heavy).

  2. Subtract the estimated latency from your 25 ms, and check how many accesses can be made at DPC level in the remaining time. At this point interrupts are your only distraction - the DPCs that those interrupts’ ISRs queue don’t affect you, since they get a chance to execute only after your DPC routine has returned.

If you do it this way, I believe you can get pretty reliable results.
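Step 2’s arithmetic, sketched below. The 3 ms DPC latency in the example is a made-up figure; the measured value from test 1 is what belongs there:

```c
#include <assert.h>

/* Step 2 of the plan above: subtract the measured DPC latency from the
 * 25 ms window, then see how many 15 us accesses fit in what is left.
 * All times in microseconds. */

long usable_accesses(long window_us, long dpc_latency_us, long access_us)
{
    long budget = window_us - dpc_latency_us;
    if (budget <= 0)
        return 0;              /* the DPC may not even start in time */
    return budget / access_us;
}
```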

Anton Bassov