Sleep function

Hi, all,

I have a question about the resolution of the sleep function.

Actually, I want to add a sleep in the usbsamp.exe code.
But MSDN says that the Sleep function’s resolution is determined by hardware.

So I tried to query the resolution with timeGetDevCaps.
But the build failed with an “unresolved external symbol” error for that function.
I have included:
//#include "Mmsystem.h"
//#pragma comment (lib,"winmm.lib")
but still could not successfully generate the exe.

Meanwhile, I changed to using the following code in place of timeGetDevCaps to measure the real time delay of a 10 ms sleep.
///
QueryPerformanceFrequency(&nFreq);
dqFreq=(double)nFreq.QuadPart;
printf("Freq: %d\n", nFreq.QuadPart);

QueryPerformanceCounter(&time_start);
Sleep(10);
QueryPerformanceCounter(&time_over);
printf("Time consuming: %f\n",(time_over.QuadPart-time_start.QuadPart)/dqFreq);

///

My questions are:
1. Why can I not use timeGetDevCaps even though I include the needed .h and .lib?
Is it that the usbsamp console app cannot use this function?
Or that this function cannot be used on a Win7 system?
2. Does the replacement method measure the time delay correctly?
3. With the replacement method, Sleep(10) always measures less than 10 ms, such as 9.1 ms. Why?
4. MSDN says the PC tick can be 15 ms; does that mean Sleep(1) will wait less than 1 ms?
But someone disagreed and said it could be 15 ms, not less, when you ask for a value smaller than one tick.
But in the real case, measured with the QueryPerformance functions, it is less than 10 ms. Why?

And what PC hardware needs a 15 ms tick? That is so long. Or is it just an example in MSDN?

workingmailing@163.com wrote:

I have a question about the resolution of the sleep function.

It depends on a number of factors, including server vs workstation.
These days, it’s in the range of 10ms to 16ms.

Actually, I want to add a sleep in the usbsamp.exe code.
But MSDN says that the Sleep function’s resolution is determined by hardware.

Well, sort of.

So I tried to query the resolution with timeGetDevCaps.
But the build failed with an “unresolved external symbol” error for that function.

That’s not a real error message. Please tell us the exact message.

My question is:
1. Why can I not use timeGetDevCaps even though I include the needed .h and .lib?
Is it that the usbsamp console app cannot use this function?

You must have made a mistake, because it works fine.

C:\tmp>type x.cpp
#include <windows.h>
#include <stdio.h>
#include <mmsystem.h>
#pragma comment( lib, "winmm.lib" )

int main()
{
TIMECAPS p;
UINT sz = 0;
timeGetDevCaps( &p, sz );
return 0;
}

C:\tmp>cl x.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01
for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.

x.cpp
Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation. All rights reserved.

/out:x.exe
x.obj

C:\tmp>

> 2. Does the replacement method measure the time delay correctly?

Depends on what you mean. Remember that there can be a thread switch or
interrupt at any time. QueryPerformanceCounter measures real elapsed
time, so it continues to tick even while the CPU is busy handling
another task.

Note that it is a huge mistake to attempt to do any kind of real-time
processing in user mode. Windows is not a real-time system (especially
in user mode), and it never will be. You need to rethink what you’re doing.

> 4. MSDN says the PC tick can be 15 ms; does that mean Sleep(1) will wait less than 1 ms?

No. Think about it this way. The timer counts are only checked during
a “scheduler interval”, which is every 10ms to 16ms. If you do a
Sleep(1), your thread is put to sleep, and another thread is given the
CPU. At the next scheduler interval, the scheduler will check the
current timer wait list. In your case, your thread’s wait time has
expired, so your thread becomes “ready to run”. So, at that point, 16ms
have already elapsed. However, your thread might not be started
immediately. You are “ready to run”, but you are competing with all of
the other “ready to run” threads.

Typically, a Sleep(1) does sleep for exactly one scheduler interval.
The contract is that your process sleeps for “at least” the number of
milliseconds you ask for.

> But someone disagreed and said it could be 15 ms, not less, when you ask for a value smaller than one tick.
> But in the real case, measured with the QueryPerformance functions, it is less than 10 ms. Why?

Hard to say. Could be a rounding issue.

> And what PC hardware needs a 15 ms tick? That is so long. Or is it just an example in MSDN?

It’s a trade-off. If you reduce the scheduler interval, then you
increase the system overhead. You’re giving each process a shorter time
to run, and incurring the overhead of the scheduler process much more often.

Also, remember that these interval times date from WAY back, when a hot
processor was a 90 MHz Pentium. At that rate, 15ms gets you about 1.5
million cycles. Today, 15ms gets you about 45 million cycles, so it’s
not unreasonable to think about reducing the interval.

The 16-bit systems (Windows 95 and 98) used a 55ms interval.

You CAN reduce the scheduler interval by using timeBeginPeriod and
timeEndPeriod, but that has a negative impact on overall system performance.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim covered all the high points, but the basic question remains: what do
you think a Sleep(1) is going to accomplish? What problem are you trying
to solve?
joe



NTDEV is sponsored by OSR

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

My problem is:

In my USB device, there is an ISO endpoint.

I use bulk OUT and ISO IN to verify the correctness of the ISO transfer.

It is the USBSamp.exe write/read/compare, for example with 16 KB.

After running for some time, for example 10,000 iterations, the compare fails, because the first DP of the ISO IN is a 0-length packet.

The device returns a 0-length packet for an ISO IN request if it has not finished preparation.

So I added a delay between Write and Read: Sleep(5).

But it still fails after running for a while.

So I want to know: does the host really wait 5 ms or not?

If the host does wait 5 ms, then the device has a problem.
If a 5 ms wait is not guaranteed, then maybe sometimes the host waits much less, which is not enough time for the device to prepare the ISO IN data, so it returns 0 length for the first DP of the 16 KB (16 DPs, each DP 1 KB).

> My problem is:

In my USB device, there is an ISO endpoint.

I use bulk OUT and ISO IN to verify the correctness of the ISO transfer.

It is the USBSamp.exe write/read/compare, for example with 16 KB.

After running for some time, for example 10,000 iterations, the compare
fails, because the first DP of the ISO IN is a 0-length packet.

The device returns a 0-length packet for an ISO IN request if it has not
finished preparation.

So I added a delay between Write and Read: Sleep(5).

But it still fails after running for a while.

As Tim pointed out, the minimum wait is typically one motherboard timer
tick, so whatever value you give is “rounded up” to the next tick quantum,
so 1…15 => 15, 16…30 => 30, etc.

Simple answer: it won’t wait LESS than 5ms, but it could wait somewhere
between 15ms and forever (run the program void main(){ while(1); } as a
priority 31 thread and plan on waiting a while).

If your driver malfunctions under such conditions, the fix properly
belongs in the driver. Essentially, you block starting a read until a
write has completed.

Something I used to tell my app-level students: “If your program doesn’t
work unless you have strategically-placed Sleep() calls, your design is
wrong”. I would suggest the same is true at driver level. You cannot
rely on the app programmer putting the correct Sleep() calls in, and as an
ISO transmission, you are already time-sensitive. Adding additional
unknown and unpredictable delays at app level cannot possibly help this.
So I’d suggest that read IRPs are not dequeued if a write IRP is in
progress. Fix the problem at the source. This is like painting over a
joint between two boards instead of gluing them together.

Essentially, what you have said is, “My driver seems to work by accident,
and every once in a while the accident fails and so my driver fails. So I
want to kludge a solution that preserves the characteristics of this
accident”. But the real solution is that your driver has to enforce any
sequencing behavior the device requires.

joe

So I want to know: does the host really wait 5 ms or not?

If the host does wait 5 ms, then the device has a problem.
If a 5 ms wait is not guaranteed, then maybe sometimes the host waits much
less, which is not enough time for the device to prepare the ISO IN data,
so it returns 0 length for the first DP of the 16 KB (16 DPs, each DP
1 KB).



  1. The failure to link “timeGetDevCaps” is fixed.

Because I use the WDK Win7 x86 checked-build environment, the target libs have to be listed in the SOURCES file.

  2. For the TIMECAPS values returned by timeGetDevCaps on my PC, Min is 1 and Max is 1,000,000.
    Which value gives high resolution when passed to timeBeginPeriod? Min or Max?

If I want Sleep() delays between 1 ms and 10 ms, what value should I pass to timeBeginPeriod?

  3. From Tim’s help text:
    Suppose I do not know the original resolution of my PC timer, whatever it is.
    Can I assume that “Sleep(5)” can only wait more than 5 ms, and never less than 5 ms? Because the original scheduler interval is more than 5 ms, and “ready but not running immediately” makes the time even longer?

  4. “processor was a 90 MHz Pentium. At that rate, 15ms gets you about 1.5 million cycles.”
    Sorry, how did you calculate the cycles?
    I recalculated it:
    it should be 90,000,000 × 0.015 = 1,350,000 cycles.

  5. How can I measure the correctness of the Sleep delay?
    After I call “timeBeginPeriod(1)”,
    then:
    /////
    QueryPerformanceFrequency(&nFreq);
    dqFreq=(double)nFreq.QuadPart;
    printf("Freq: %d\n", nFreq.QuadPart);

QueryPerformanceCounter(&time_start);
Sleep(5);
QueryPerformanceCounter(&time_over);
printf("Time consuming: %f\n",(time_over.QuadPart-time_start.QuadPart)/dqFreq);
/////

the print result is:
Freq: 2435927
Time consuming 0.004306

Does this value really express the Sleep(5) delay or not?

I understand Tim’s text as:
When my app calls Sleep(5), my app relinquishes its time slice, and after 5 ms it returns and gets the CPU, but during this idle time the tick keeps going,
and the QueryPerformanceCounter function reads absolute time, so this value is right?
Is my understanding right?

And it is weird:

When I set timeEndPeriod() to the max resolution returned by timeGetDevCaps,

the print result is:
Freq: 2435927
Time consuming xxxxx

which does not differ much from setting the min resolution.

Why?

In my opinion, if I set it to the max, 1,000,000,

the printed time consumed should be at least 1,000,000 ms, but it is not. Why?

  6. Sorry, it is timeBeginPeriod in the post above, not timeEndPeriod.

  7. Another question: how can I get the original resolution value, and does timeEndPeriod restore the original resolution?

>And it is weird:

In my opinion, if I set it to the max, 1,000,000,
the printed time consumed should be at least 1,000,000 ms, but it is not. Why?

“Windows uses the lowest value (that is, highest resolution) requested by
any process”.
It means you cannot use timeBeginPeriod to lower the clock interrupt
frequency; you can only use it to raise it (that is, shorten the timer
interval). The docs for ExSetTimerResolution, which this boils down to,
explain it better: “The routine changes the clock interrupt frequency only
if the specified DesiredTime value is less than the current setting.”

You can use the clockres utility to check the current setting.

//Daniel

Excuse me, which functions is the clockres utility based on?

> 1. The failure to link “timeGetDevCaps” is fixed.

Because I use the WDK Win7 x86 checked-build environment, the target libs
have to be listed in the SOURCES file.

  2. For the TIMECAPS values returned by timeGetDevCaps on my PC, Min is 1
    and Max is 1,000,000.
    Which value gives high resolution when passed to timeBeginPeriod? Min or Max?

If I want Sleep() delays between 1 ms and 10 ms, what value should I pass
to timeBeginPeriod?

  3. From Tim’s help text:
    Suppose I do not know the original resolution of my PC timer, whatever
    it is.
    Can I assume that “Sleep(5)” can only wait more than 5 ms, and never
    less than 5 ms? Because the original scheduler interval is more than
    5 ms, and “ready but not running immediately” makes the time even
    longer?

As far as I know, the time honored by Sleep can never be less than the
number of ms you request. But it is unbounded as far as maximum delay is
concerned.

Note that many, perhaps nearly all, kernel threads run at priorities >=
16. Unless you have the privileges to do otherwise, the threads in your
app all run at < 16. This means that if your driver is running in an
environment that needs a disproportionate amount of kernel thread
execution, you could be talking about delays two or three orders of
magnitude greater than your request. Also, look at the Multimedia Class
Scheduler Service if you are running Vista+, which allows threads in a
“normal” app to get priorities between 16 and 25 or 26 (I don’t have my
course notes handy). This can work for you, allowing you to get
better-than-expected response time, or against you, because other apps
using MMCSS could interfere with your app.

If your thread is marked “ready to run” it is placed at the tail of all
other threads of that priority, and will run after all higher-priority
threads have run and all the threads ahead of it have run (which
description naively ignores multicore scheduling effects).

  4. “processor was a 90 MHz Pentium. At that rate, 15ms gets you about 1.5
    million cycles.”
    Sorry, how did you calculate the cycles?
    I recalculated it:
    it should be 90,000,000 × 0.015 = 1,350,000 cycles.

Which is about 1.5 million, depending on what you mean by “about”. Also,
CPU cycles are not a good measure of expected performance because all
architectures beyond and including the Pentium II have at least caches,
instruction pipes, and dynamic register renaming. In later architectures,
the notions of superscalar architectures with opportunistic execution,
hyperthreading, speculative reads, and branch prediction got added until
the modern chips resemble supercomputer mainframes of the 1980s. So
“cycles”, as such, do not allow direct cross-comparison of performance
across chip generations. Even the hoary old MIPS designations are no
longer good indicators. Then add in multicore performance bottlenecks
such as cache coherency management and (over)use of the LOCK prefix, and
you find that clock speed, memory speed, etc. are at best indicators of
performance, but the complexity of execution patterns does not let you say
“machine X will be this much faster than machine Y”. Then you end up, in
some cases, where refactoring a computation to be cache-aware can buy you
factors of 20-50, and refactoring anything that pages to be paging-aware
can show improvements of 3 to 5 orders of magnitude for disproportionately
small coding effort. In recent years,
add offloading to GPUs, and the discussions have to end up at “wall clock
time”.

So don’t be overly fixated on precision of cycle computations.

  5. How can I measure the correctness of the Sleep delay?
    After I call “timeBeginPeriod(1)”,
    then:
    /////
    QueryPerformanceFrequency(&nFreq);
    dqFreq=(double)nFreq.QuadPart;
    printf("Freq: %d\n", nFreq.QuadPart);

QueryPerformanceCounter(&time_start);
Sleep(5);
QueryPerformanceCounter(&time_over);
printf("Time consuming: %f\n",(time_over.QuadPart-time_start.QuadPart)/dqFreq);
/////

the print result is:
Freq: 2435927
Time consuming 0.004306

Does this value really express the Sleep(5) delay or not?

I understand Tim’s text as:
When my app calls Sleep(5), my app relinquishes its time slice, and after
5 ms it returns and gets the CPU, but during this idle time the tick keeps
going, and the QueryPerformanceCounter function reads absolute time, so
this value is right?
Is my understanding right?

Yes, but then note that on multicore machines, each core has its own
cycle counter, and the core you are executing on when you ask QPC for the
end time may not be the core you were executing on when you read the start
time, unless you use SetThreadAffinityMask to pin the thread to one core.
And while this renders the RDTSC-based time more correct, it also disallows
opportunistic thread execution on whatever core is available, which means
that although your timing may be more precise, the actual
time-to-completion may be artificially biased high because only one core is
allowed to run the thread.

I still think Sleep is the wrong approach here; I’d be looking at how to
modify the driver so that the offending IRP cannot be dispatched if there
is a conflict.
joe



workingmailing@163.com wrote:

  2. For the TIMECAPS values returned by timeGetDevCaps on my PC, Min is 1 and Max is 1,000,000.
    Which value gives high resolution when passed to timeBeginPeriod? Min or Max?

Did you even read the documentation? Like, where it says “wPeriodMin is
the minimum supported resolution in milliseconds”?

If I want Sleep() delays between 1 ms and 10 ms, what value should I pass to timeBeginPeriod?

You ought to be able to answer this question. If you want the timer to
have a resolution of 1ms, what value do you need to send to timeBeginPeriod?

However, as we have pointed out several times in this thread, you will
NEVER be able to turn Windows into a real-time operating system. If you
think you are going to tune your device by adjusting the sleep values
between 1 and 10, then you are just fooling yourself. It’s not going to
happen. Plus, you are working with a USB device. While you are
micro-tuning your applications, remember that there are undefined delays
in communicating from your app through the kernel stack, plus more
delays while you wait for USB scheduling, plus more delays while the
device responds, plus more delays while the requests complete, plus more
delays while it switches back into user-mode to notify you.

The only way to have success with this kind of thing is to think of all
delays less than about 25ms as identical – a “short delay”.

  3. From Tim’s help text:
    Suppose I do not know the original resolution of my PC timer, whatever it is.
    Can I assume that “Sleep(5)” can only wait more than 5 ms, and never less than 5 ms? Because the original scheduler interval is more than 5 ms, and “ready but not running immediately” makes the time even longer?

Right.

  4. “processor was a 90 MHz Pentium. At that rate, 15ms gets you about 1.5 million cycles.”
    Sorry, how did you calculate the cycles?
    I recalculated it:
    it should be 90,000,000 × 0.015 = 1,350,000 cycles.

1,350,000 is about 1.5 million, in the same way that I say “it’s 9:30”
when it’s actually 9:27.

  5. How can I measure the correctness of the Sleep delay?
    After I call “timeBeginPeriod(1)”,
    then:
    /////
    QueryPerformanceFrequency(&nFreq);
    dqFreq=(double)nFreq.QuadPart;
    printf("Freq: %d\n", nFreq.QuadPart);

QuadPart is a 64-bit value. You’re printing it with a 32-bit print
specifier. You need to write:
printf( "Freq: %I64d\n", nFreq.QuadPart );

What is your hardware? What CPU, exactly?

the print result is:
Freq: 2435927
Time consuming 0.004306

Does this value really express the Sleep(5) delay or not?

Yes, to the resolution of QueryPerformanceCounter.

I understand Tim’s text as:
When my app calls Sleep(5), my app relinquishes its time slice, and after 5 ms it returns and gets the CPU, but during this idle time the tick keeps going,
and the QueryPerformanceCounter function reads absolute time, so this value is right?
Is my understanding right?

That’s what QPC is supposed to be – a clock that keeps counting
continuously, no matter who is running.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Thank you very much, Tim and all the other people.

I want to ask more questions, because my basics are not so good.

  1. I have an ISO/bulk SS device.
    Now I have modified some code of USBSamp, both driver and app, and have my own firmware.

The FW does:
Bulk OUT data to the device; the device receives the 16 KB and generates an interrupt; in the ISR, it prepares 16 KB for ISO IN and responds to the host’s ACK IN.

The app does:
Write 16KB –> delay(time) –> Read(16KB).

Context:

  1. Actually, the device FW takes about 15 us to prepare each 1 KB of data for ISO IN, so 16 KB takes about 240 us (16 × 15 = 240), much less than 2 ms or 5 ms.

  2. If the device has no ISO buffer ready for the ACK IN, it responds with a 0-length DP.

Issues:

  1. At first, I set the delay to 2 ms; after running for some time, for example 30,000 Write&Read iterations of USBSamp, the device returned a 0-length ISO DP.

  2. Even though I think 2 ms is enough for the device to prepare 16 KB for ISO IN, it happened that the first DP was a 0-length packet, and the following 15 DPs were 1 KB each in the 16 KB read.
    So I changed the delay to 5 ms, and the same case still happens.

As Tim said, no matter whether it is 2 ms or 5 ms, the host delay can only be equal to or longer than 2 ms or 5 ms.

So in this case, I think maybe the device sometimes takes more than 2 ms/5 ms to prepare the buffer/data for ISO IN.

But why does it take so long, when each 1 KB only takes 15 us?

  2. Second,

after checking the struct:

typedef union _LARGE_INTEGER
{
    struct
    {
        DWORD LowPart;
        LONG HighPart;
    };
    LONGLONG QuadPart;
} LARGE_INTEGER;

  1. in what case, we use it as struct , and in what case, we use it as LONGLONG?
  2. does LARGE_INTEGER have relationship with OS system, and hardware such CPU, and PCIe?
  3. QuadPart is a 64 bit variable, it is print with %I64d,
    but some code example type change it as:
    double dqFreq=(double)nFreq.QuadPart;

Here is my print result:
Freq-QuadPart:
2435898(d), 10462102246391808(I64d), 0.000000(f)
dqFreq:
0(d), 1094882717(I64d), 0.000000(f)

Why do they differ between %I64d and %d, and why is it 0 in %f mode?

And:
///////////////////////////////////
#define tmdelay 5
QueryPerformanceFrequency(&nFreq);
dqFreq=(double)nFreq.QuadPart;
printf("Freq-QuadPart: %d(d), %I64d(I64d), %f(f)\n", nFreq.QuadPart,nFreq.QuadPart,nFreq.QuadPart);
printf("        dqFreq: %d(d), %I64d(I64d), %f(f)\n",dqFreq, dqFreq, dqFreq);

QueryPerformanceCounter(&time_start);
Sleep(tmdelay);
QueryPerformanceCounter(&time_over);
printf("Time consuming: %f(f), %d(d), %I64d(I64d) , %d\n",
(time_over.QuadPart-time_start.QuadPart)/dqFreq*1000,
(time_over.QuadPart-time_start.QuadPart)/dqFreq*1000,
(time_over.QuadPart-time_start.QuadPart)/dqFreq*1000,
tmdelay);
///////////////////////////////////////////////////////////////////
Time consuming: 4.271115(f), 507790068(d),2180941736368477599(I64d).

Same question: which of these values is the time consumed I expected?

What is the print rule for double and for QuadPart?

Time consuming: 4.271115(f), 507790068(d),2180941736368477599(I64d), 1075006000
And why, when tmdelay is 5, does it print out 1075006000?

> Thank you very much, Tim and all the other people.

I want to ask more questions, because my basics are not so good.

  1. I have an ISO/bulk SS device.
    Now I have modified some code of USBSamp, both driver and app, and have
    my own firmware.

The FW does:
Bulk OUT data to the device; the device receives the 16 KB and generates
an interrupt; in the ISR, it prepares 16 KB for ISO IN and responds to the
host’s ACK IN.

The app does:
Write 16KB –> delay(time) –> Read(16KB).

Context:

  1. Actually, the device FW takes about 15 us to prepare each 1 KB of data
    for ISO IN, so 16 KB takes about 240 us (16 × 15 = 240), much less than
    2 ms or 5 ms.

  2. If the device has no ISO buffer ready for the ACK IN, it responds with
    a 0-length DP.
    Then it is your responsibility to ensure there is always a buffer. The
    delay indicated above will only make this situation worse, not better.

Issues:

  1. At first, I set the delay to 2 ms; after running for some time, for
    example 30,000 Write&Read iterations of USBSamp, the device returned a
    0-length ISO DP.

As pointed out, your delay was probably 15ms, and by delaying the read,
I’m amazed you don’t get failure more often.

  2. Even though I think 2 ms is enough for the device to prepare 16 KB for
    ISO IN, it happened that the first DP was a 0-length packet, and the
    following 15 DPs were 1 KB each in the 16 KB read.
    So I changed the delay to 5 ms, and the same case still happens.

The delay is the wrong approach here, so I’m not sure there is any way to
fix this at the app level.

As Tim said, no matter whether it is 2 ms or 5 ms, the host delay can only
be equal to or longer than 2 ms or 5 ms.

So in this case, I think maybe the device sometimes takes more than
2 ms/5 ms to prepare the buffer/data for ISO IN.

But why does it take so long, when each 1 KB only takes 15 us?

That’s entirely up to the firmware in the device.

  2. Second,

after checking the struct:

typedef union _LARGE_INTEGER
{
    struct
    {
        DWORD LowPart;
        LONG HighPart;
    };
    LONGLONG QuadPart;
} LARGE_INTEGER;

it is a union,

  1. In what cases do we use it as a struct, and in what cases as a
    LONGLONG?
    Due to some really fundamental design problems, 64-bit values are
    all too frequently stored as two 32-bit integers. This has not actually
    been necessary for about 20 years, but we are still stuck with this
    horrible interface. To convert two 32-bit numbers to a single 64-bit
    number, store into the LowPart and HighPart, then read from the
    QuadPart. To convert the other way, store to the QuadPart and read from
    the High/Low parts.
  2. Does LARGE_INTEGER have any relationship with the OS, or with hardware
    such as the CPU and PCIe?
    No, it is just a bridge to help you get from a bad design to workable
    code.

(concert is starting, will finish later)
joe

  3. QuadPart is a 64-bit variable and is printed with %I64d,
    but some code examples cast it, as in:
    double dqFreq=(double)nFreq.QuadPart;

Here is my print result:
Freq-QuadPart:
2435898(d), 10462102246391808(I64d), 0.000000(f)
dqFreq:
0(d), 1094882717(I64d), 0.000000(f)

Why do they differ between %I64d and %d, and why is it 0 in %f mode?

And:
///////////////////////////////////
#define tmdelay 5
QueryPerformanceFrequency(&nFreq);
dqFreq=(double)nFreq.QuadPart;
printf("Freq-QuadPart: %d(d), %I64d(I64d), %f(f)\n",
nFreq.QuadPart,nFreq.QuadPart,nFreq.QuadPart);
printf("        dqFreq: %d(d), %I64d(I64d), %f(f)\n",dqFreq, dqFreq,
dqFreq);

QueryPerformanceCounter(&time_start);
Sleep(tmdelay);
QueryPerformanceCounter(&time_over);
printf("Time consuming: %f(f), %d(d), %I64d(I64d) , %d\n",
(time_over.QuadPart-time_start.QuadPart)/dqFreq*1000,
(time_over.QuadPart-time_start.QuadPart)/dqFreq*1000,
(time_over.QuadPart-time_start.QuadPart)/dqFreq*1000,
tmdelay);
///////////////////////////////////////////////////////////////////
Time consuming: 4.271115(f), 507790068(d),2180941736368477599(I64d).

Same question: which of these values is the time consumed I expected?

What is the print rule for double and for QuadPart?



Note that the thread affinitization recommendation for
QueryPerformanceCounter has nothing to do with differing clock speeds
across multiple CPUs, but rather with ‘BIOS / HAL bugs’; otherwise read
‘shame on you, AMD’.

To the OP:

Note that while there are some legitimate uses for Sleep, your problem does
not appear to be one of them. I am not sufficiently clear as to what you
are actually doing to say anything for sure, but if you have a driver /
code that works when fed data at ‘slow’ speed and fails when fed data at
‘high’ speed, then most likely you have a bug. Note that I differentiate
between failure because the machine simply cannot process data at an
arbitrarily high rate, where errors are returned to the application, and
crashes or memory corruption resulting from feeding data too fast. In the
first case, the application should track and limit the number of pending
I/O operations (through some internal method) to avoid overrunning all
hardware resources; in the second case, you have a code bug (most likely in
synchronization code) and it needs to be found and fixed.

wrote in message news:xxxxx@ntdev…

  1. can’t not LINK function “timeGetDevCaps” is fixed

Because I use WDK win7 x86 check build, it should include target libs into
the source file.

  1. TIMECAPS value get by timeGetDevCaps, on my PC, Min is 1 and MAX is
    1,000,000
    Which value is high resolution after using timeBeginPeriod? Min or Max?

If I want to delay time with value between 1ms to 10ms of function
Sleep(), what value should I set in the timeBeginPeriod?

  1. From Tim’s help text
    If I did not known the original resolution of my PC timer, and what ever
    it is.
    Can I consider that: code “Sleep(5)”, could only be wait more than 5ms,
    but no case of less than 5ms? For the reason of original scheduler
    interval is more than 5ms and “ready but not running immediately” even
    make the time more longer?

As far as I know, the time honored by Sleep can never be less than the
number of ms you request. But it is unbounded as far as maximum delay is
concerned.

Note that many, perhaps nearly all, kernel threads run at priorities >=
16. Unless you have the privilege to do otherwise, the threads in your
app all run at < 16. This means that if your driver is running in an
environment that needs a disproportionate amount of kernel-thread
execution, you could be looking at delays two or three orders of
magnitude greater than your request. Also, look at the Multimedia Class
Scheduler Service (MMCSS) if you are running Vista+, which allows threads
in a “normal” app to get priorities between 16 and 25 or 26 (I don’t have
my course notes handy). This can work for you, allowing you to get
better-than-expected response time, or against you, because other apps
using MMCSS could interfere with your app.

If your thread is marked “ready to run”, it is placed at the tail of the
queue of all other threads of that priority, and will run after all
higher-priority threads have run and all the threads ahead of it have run
(a description that naively ignores multicore scheduling effects).

  1. “processor was a 90 MHz Pentium. At that rate, 15ms gets you about 1.5
    million cycles.”
    Sorry, how did you calculate the cycles?
    I re-calculated it: it should be 1,350,000 cycles.

Which is about 1.5 million, depending on what you mean by “about”. Also,
CPU cycles are not a good measure of expected performance, because all
architectures from the Pentium II onward have at least caches,
instruction pipelines, and dynamic register renaming. In later
architectures, superscalar designs with opportunistic execution,
hyperthreading, speculative reads, and branch prediction were added,
until modern chips resemble the supercomputer mainframes of the 1980s. So
“cycles”, as such, do not allow direct cross-comparison of performance
across chip generations. Even the hoary old MIPS designations are no
longer good indicators. Then add in multicore performance bottlenecks
such as cache-coherency management and (over)use of the LOCK prefix, and
you find that clock speed, memory speed, etc. are at best rough indicators
of performance; the complexity of execution patterns does not let you say
“machine X will be this much faster than machine Y”. In some cases,
refactoring a computation to be cache-aware can buy you factors of 20-50,
and refactoring anything that pages to be paging-aware can yield
improvements of 3 to 5 orders of magnitude for disproportionately small
coding effort. In recent years, add offloading to GPUs, and the
discussions have to end at “wall clock time”.

So don’t be overly fixated on precision of cycle computations.

  1. How can I measure the correctness of the Sleep delay?
    After I call “timeBeginPeriod(1)”,
    then:
    /////
    QueryPerformanceFrequency(&nFreq);
    dqFreq=(double)nFreq.QuadPart;
    printf("Freq: %I64d\n", nFreq.QuadPart);

QueryPerformanceCounter(&time_start);
Sleep(5);
QueryPerformanceCounter(&time_over);
printf("Time consuming: %f\n", (time_over.QuadPart-time_start.QuadPart)/dqFreq);
/////

the print result is:
Freq: 2435927
Time consuming: 0.004306

Does this value really express the Sleep(5) delay or not?

I understand Tim’s text as:
When my app calls Sleep(5), it relinquishes its time slice, and after 5ms
it returns and gets the CPU; but during this idle time the tick keeps
going, and QueryPerformanceCounter reads the absolute time, so this value
is right?
Is my understanding right?

Yes, but then note that on multicore machines each core has its own
counter, and the core you are executing on when you ask QPC for the end
time may not be the core you were executing on when you read the start
time, unless you use SetThreadAffinityMask to pin the thread to one core.
And while this renders the RDTSC-based time more correct, it also
disallows opportunistic thread execution on whatever core is available,
which means that although your timing may be more precise, the actual
time-to-completion may be artificially biased high because only one core
is allowed to run the thread.

I still think Sleep is the wrong approach here; I’d be looking at how to
modify the driver so that the offending IRP cannot be dispatched if there
is a conflict.
joe


NTDEV is sponsored by OSR

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

(continuing at intermission…)

> thank you very much Tim and all other peoples.
>
> I want to ask more questions, because my basics are not so good.
>
> 1. I have an ISO/bulk SS device.
> Now I have modified some code of USBSamp, both driver and app, and have my
> own FW code.
>
> FW is doing:
> The host bulk-OUTs data to the device; when the device has received the
> 16KB, it generates an interrupt, and in the ISR it prepares 16KB for ISO
> IN and responds to the host ACK IN.
>
> App is doing:
> Write 16KB -> delay(time) -> Read(16KB).
>
> Context:
> 1. Actually, the device FW prepares each 1KB of data for ISO IN in about
> 15us, so 16KB takes at most 300us (16*15 = 240us), much less than 2ms or
> 5ms.
>
> 2. If the device has no ISO buffer for the ACK IN, then the device will
> respond with a 0-length DP for the ISO ACK IN.
Then it is your responsibility to ensure there is always a buffer. The
delay indicated above will only make this situation worse, not better.

>
> Issues:
> 1. At first, I set the delay to 2ms; after running for some time, for
> example 30,000 iterations of Write&Read in USBSamp, the device returned a
> 0-length ISO DP.
>
As pointed out, your delay was probably 15ms, and since you are delaying
the read, I’m amazed you don’t get failures more often.

> 2. Even though I think 2ms is enough for the device to prepare 16KB for
> ISO IN, it happened that the first DP was a 0-length packet and the
> following 15 DPs were 1KB each in the 16KB read.
> So I changed the delay to 5ms, and the same thing still happens.

The delay is the wrong approach here, so I’m not sure there is any way to
fix this at the app level.

>
> As Tim said, no matter whether it is 2ms or 5ms, the host delay can only
> be equal to or longer than 2ms or 5ms.
>
> So in this case, I think maybe the device sometimes takes more than
> 2ms/5ms to prepare the buffer/data for ISO IN.
>
> But why does it take so long, when each 1KB takes only 15us?
>
That’s entirely up to the firmware in the device.

> 2. Second,
>
> After check the struct:
> typedef union _LARGE_INTEGER
> {
>     struct
>     {
>         DWORD LowPart;
>         LONG HighPart;
>     };
>     LONGLONG QuadPart;
> } LARGE_INTEGER;
>
> it is a union,
> 1) in what case do we use it as a struct, and in what case as a LONGLONG?

Due to some really fundamental design problems, 64-bit values are
all too frequently stored as two 32-bit integers. This has not actually
been necessary for about 20 years, but we are still stuck with this
horrible interface. To convert two 32-bit numbers to a single 64-bit
number, store into the LowPart and HighPart, then read from the QuadPart.
To convert the other way, store into the QuadPart and read from the
High/Low parts.

> 2) does LARGE_INTEGER have any relationship with the OS, or with hardware
> such as the CPU and PCIe?

No, it is just a bridge to help you get from a bad design to workable code.

(concert is starting, will finish later)
joe

> 3) QuadPart is a 64-bit variable; it is printed with %I64d,
> but some code examples convert its type, as in:
> double dqFreq=(double)nFreq.QuadPart;

Yes, the cast converts it to a double. This is a common transformation.

>
> Here is my print result:
> Freq-QuadPart:
> 2435898(d), 10462102246391808(I64d), 0.000000(f)
> dqFreq:
> 0(d), 1094882717(I64d), 0.000000(f)
You would not want to print a double using %I64d; it makes no sense
whatsoever. If you need to see the bits, use %I64x. Also, take a look
at my Floating Point Explorer, which you can find starting at
www.flounder.com/mvp_tips.htm.

>
> why are they different in %I64d and %d, and why is it 0 in %f mode?

Because %d prints only the low-order 32 bits and advances the argument
pointer by sizeof(int), so the %f picks up the wrong value and is thus
meaningless.

>
> And:
> ///////////////////////////////////
> #define tmdelay 5
> QueryPerformanceFrequency(&nFreq);
> dqFreq=(double)nFreq.QuadPart;
> printf("Freq-QuadPart: %d(d), %I64d(I64d), %f(f)\n",
> nFreq.QuadPart, nFreq.QuadPart, nFreq.QuadPart);
> printf("        dqFreq: %d(d), %I64d(I64d), %f(f)\n", dqFreq, dqFreq,
> dqFreq);
>
> QueryPerformanceCounter(&time_start);
> Sleep(tmdelay);
> QueryPerformanceCounter(&time_over);
> printf("Time consuming: %f(f), %d(d), %I64d(I64d), %d\n",
> (time_over.QuadPart-time_start.QuadPart)/dqFreq*1000,
> (time_over.QuadPart-time_start.QuadPart)/dqFreq*1000,
> (time_over.QuadPart-time_start.QuadPart)/dqFreq*1000,
> tmdelay);
> ///////////////////////////////////////////////////////////////////
> Time consuming: 4.271115(f), 507790068(d), 2180941736368477599(I64d).
>
> Same question: which of these values is the time consumed that I expected?
>
> What is the printing rule for double, and for QuadPart?
>
>
The problem is passing arguments of one size while using format requests
for another size, so your printouts are irrelevant. They are just nonsense.
Your format requests MUST match the size of the argument you pass.
joe



Your mistake is that you chose to use isochronous transfer. Forget about it.

Use bulk transfers. Your data exchange will be much more reliable and you will save yourself many headaches.

I don’t agree that the delay makes the ISO transfer worse, nor that I
should get rid of the ISO transfer.

I have to make my design clear now:

  1. The aim is to verify my USB device’s ISO IN transfer and that its data
    is correct.

  2. OUT data uses bulk transfer, to make sure the OUT data the device
    receives is correct (as compared with ISO OUT).

  3. Once the device gets 16KB of bulk data, in the FW ISR it starts
    preparing 16KB of data for ISO IN, in 16 DPs.

  4. If the device has not prepared the 16KB when it gets the host ACK IN,
    it will reply with a 0-length DP.
    If the device has finished the 16KB of data, it replies to the host ACK
    IN with DPs.

  5. So the delay between the Write and the Read is there to give the device
    enough time to prepare the 16KB of data, by delaying the host sending
    the ACK IN to the device.

With bulk transfer from the device, you would not have to have a delay at all. The host could be getting data as the device prepares it and makes it available for transfer. Also, if the data is corrupted on the wire, it is automatically resent; not so with ISO. This is why ISO should be discouraged except for a very few applications.