Further input…
I have now tested on three systems:
(1) my fast Win7-x64 development laptop: 7.7 seconds
(2) my old Win7-x86 laptop: 5.2 seconds
(3) a rather slow Win7-x64 desktop box: 4.4 seconds
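To put the outlier in proportion, here is a quick sketch that normalizes each machine's time to the fastest one. It only does arithmetic on the three times listed above; the labels are shortened copies of the list entries.

```python
# Elapsed programming times reported by the application (seconds), from the list above.
timings = {
    "(1) fast Win7-x64 laptop": 7.7,
    "(2) old Win7-x86 laptop": 5.2,
    "(3) slow Win7-x64 desktop": 4.4,
}

def relative_to_fastest(times):
    """Return each system's time divided by the fastest time (1.0 = fastest)."""
    fastest = min(times.values())
    return {name: t / fastest for name, t in times.items()}

for name, ratio in relative_to_fastest(timings).items():
    print(f"{name}: {ratio:.2f}x")
```

On these numbers (1) takes roughly 1.75 times as long as (3), while (2) is only about 18% slower than (3).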
This is really driving me crazy. It looks like (2) and (3) make sense. I just have to deal with the outlier (1). It looks like I must have some kind of debugging/tracing slowing things down there.
What would that be likely to be?
– Bob Ammerman
RAm Systems
-----Original Message-----
From: /o=Copeland Data Systems/ou=CDS/cn=Recipients/cn=rammerman
Sent: Sunday, September 15, 2013 9:56 PM
To: ‘Windows System Software Devs Interest List’
Subject: RE: [ntdev] 32-bit driver much faster than 64-bit
Checked: Device is only device on root hub on both machines
I could try to investigate whether there is an issue with the USB stack which slows it down on the x64 machine, but I would expect that if there were performance issues with Win7-64 vs. Win7-32 we would have heard about it by now :-)
What numbers do you want to see?
I’d still like to hear from a USB bus-level guru about the differences in the packet counts on the bus between the two systems. That looks pretty fishy to me.
– Bob Ammerman
RAm Systems
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@flounder.com
Sent: Sunday, September 15, 2013 9:17 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] 32-bit driver much faster than 64-bit
I am measuring the time it takes to program the target microcontroller
as reported by the application.
I dragged out my trusty Beagle USB analyzer and saw some interesting numbers (both
for the same operation):
Packet    32-bit    64-bit
IN        301077    397509
OUT       312328    397492
ACK         7407      6388
NAK       605999    788614
STALL          0         0
DATA      319747    401073
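For comparing the two captures, a small sketch that computes the 64-bit/32-bit ratio for each packet type and the fraction of IN and OUT tokens answered with a NAK. It only re-does the arithmetic on the counts posted above, and it assumes every NAK answers an IN or OUT token (PING tokens, if any, are not in the posted counts); it says nothing on its own about why the counts differ.

```python
# Beagle packet counts for the same programming operation, as posted above.
counts_32 = {"IN": 301077, "OUT": 312328, "ACK": 7407, "NAK": 605999, "STALL": 0, "DATA": 319747}
counts_64 = {"IN": 397509, "OUT": 397492, "ACK": 6388, "NAK": 788614, "STALL": 0, "DATA": 401073}

def ratios(base, other):
    """Return other/base for each packet type, skipping types with a zero base count."""
    return {kind: other[kind] / base[kind] for kind in base if base[kind]}

def nak_fraction(counts):
    """Fraction of IN and OUT tokens answered with NAK.

    Assumes every NAK in the capture answers an IN or OUT token; PING tokens,
    if any, are not part of the posted counts.
    """
    return counts["NAK"] / (counts["IN"] + counts["OUT"])

for kind, r in ratios(counts_32, counts_64).items():
    print(f"{kind:5} 64/32 = {r:.3f}")
print(f"NAK fraction: 32-bit {nak_fraction(counts_32):.3f}, 64-bit {nak_fraction(counts_64):.3f}")
```

On these counts the 64-bit capture has roughly 32% more IN tokens and 30% more NAKs but about 14% fewer ACKs, and in both captures roughly 99% of IN/OUT tokens drew a NAK.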
I am not quite sure how to interpret these numbers, but it looks like
the 64-bitter is so fast that it is resulting in extra packets on the
USB bus?!?
OHO! It is a USB device! So the only way to reliably measure this is to make sure this device is the only one plugged into the root hub, or that both systems have identical devices plugged into the root hub which are behaving identically. You also need to make sure it is not an artifact of the implementations of the USB stack by measuring the performance of other USB devices; if you see comparable degradations, it is not your problem.
I haven’t counted, but I think this takes me up to (p) in my previous list.
You have still not produced numbers we can draw any conclusions from.
joe
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Bob Ammerman
Sent: Sunday, September 15, 2013 9:56 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] 32-bit driver much faster than 64-bit
Joe, thanks for your time. I have tried to answer your questions as best I can, although many of them just bring up more questions.
Quoting you: “So when I hear of an inverse situation, I’m more inclined to credit it to artifacts other than code.”
Yep! Just need to figure out what the artifacts are.
– Bob Ammerman
RAm Systems
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@flounder.com
Sent: Sunday, September 15, 2013 9:08 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] 32-bit driver much faster than 64-bit
I have written a UMDF driver for a USB device, using the ATL-based
template provided by VS 2012, with the guts largely patterned on the
OSR FX2 driver. The board it supports is a simple endpoint 1 in-and-out
device (a microcontroller programmer). For some reason I haven’t been
able to determine, the 32-bit version of the driver runs significantly
faster (about 35%) than the 64-bit version. The 64-bit version is
running on a high-performance modern laptop (i7-2860QM @ 2.5GHz). The
32-bit version is running on a six-year-old, much slower machine
(Core 2 Duo T8300 @ 2.4GHz).
Not enough information here to draw any conclusions.
(a) what is your measurement method, and why do you trust it?
Clock time as reported by application program. It seems to match my perception of elapsed time :-)
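Since everything hinges on the application-reported clock time, one cheap way to make that number more trustworthy is to repeat the operation several times and look at the median and spread rather than a single run. A minimal sketch, assuming the programming step can be driven from a command line; `programmer.exe` and `firmware.hex` are hypothetical placeholders, not the real tool or file:

```python
import statistics
import subprocess
import time

def time_runs(action, runs=5):
    """Call action() `runs` times; return (median, fastest, slowest) in wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples), max(samples)

def run_programmer():
    # HYPOTHETICAL command line: substitute the real programming application and its arguments.
    subprocess.run(["programmer.exe", "firmware.hex"], check=True)
```

Usage would be something like `median, fastest, slowest = time_runs(run_programmer, runs=5)` on each machine; a wide spread between fastest and slowest suggests something other than the driver is varying from run to run.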
(b) is the device capable of generating 64-bit addresses on a 64-bit bus, or is it restricted to 32 bits?
That is in layers way below where I am, somewhere in the guts of the Windows USB stack and controllers. I don’t really know the answer to this.
(c) How much memory is on each machine?
Fast 64-bit machine: 16GB, slow 32-bit machine: 4GB
(d) is the app a 32-bit app running on a 64-bit machine?
Yes
(e) are you certain Driver Verifier is off on both machines?
Yes
(f) [just for completeness, since this question was already asked] are both compiled as debug or both compiled as release?
Yes: both are debug. I was able to get the release driver going on the 64-bit machine. There is no noticeable difference in times between debug and release. I am having an issue getting the release driver to install on the 32-bit machine. Windows keeps insisting that the driver is up to date when I try to update it using Device Manager. I suppose I’ll have to get down and do it manually.
(g) are there any passive-level threads in your completion path?
UMDF driver: the whole thing runs at passive level, doesn’t it?
(h) does the device have exclusive use of its interrupt level on both machines, or does it share interrupts on one of them?
It looks like the USB controller is sharing its interrupt with other USB controllers on both machines. In any case, there isn’t much else going on in either case. I did test with the Beagle unplugged.
(i) does the 64-bit system have any device that runs at a higher DIRQL than your device and therefore may be stealing cycles?
I don’t know, and I don’t know how to find out. The machine isn’t doing much, though.
(j) does a lower-DIRQL device, by queueing DPCs, force the 64-bit system into a priority-inversion condition?
I don’t think so, but I can’t be sure. I suppose I could look at DPC counts in perfmon.
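If perfmon's GUI is inconvenient, the same counters can be sampled from the command line with Windows' typeperf tool. A rough sketch, assuming the `\Processor(_Total)\DPCs Queued/sec` counter name (as on an English-language install) and typeperf's default CSV output; `sample_counter` only works on Windows, and the parser simply skips the header row and status messages:

```python
import csv
import io
import subprocess

# Counter path as it appears on an English-language Windows install (assumption).
COUNTER = r"\Processor(_Total)\DPCs Queued/sec"

def sample_counter(counter=COUNTER, samples=10):
    """Run typeperf (Windows only) for `samples` samples at its default interval; return stdout."""
    result = subprocess.run(
        ["typeperf", counter, "-sc", str(samples)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def parse_typeperf_csv(text):
    """Extract numeric counter values from typeperf CSV output."""
    values = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # blank lines and single-field status lines
        try:
            values.append(float(row[1]))
        except ValueError:
            continue  # header row ("(PDH-CSV 4.0)", counter path) or "Exiting, please wait..."
    return values
```

Comparing the average of `parse_typeperf_csv(sample_counter())` on the two machines while the programmer runs would show whether one system is queueing noticeably more DPCs.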
(k) is there a lot of locking, and is lock contention higher on the faster processor?
Not sure why that would happen. The machine is more or less idle.
(l) is it something else I haven’t thought of, but might yet, that represents a system difference between the two platforms? [note: when I first wrote this point, it was (h); while writing the following text, and fixing typos, I thought of a few more]
When you see major differences like this, you need to make sure you have first eliminated all knowable artifacts that may impact the validity of your measurements. You might be measuring not the performance of two drivers but the performance of two systems, and there may be artifacts that impact the behavior at the system level. My first reaction to numbers like this is “what is the measurement tool, what is it measuring, and why do you trust it?” and my next reaction is that it is not a 32/64-bit issue, but a much more global hardware and/or software issue that is causing the differences.
btw, when I have run some compute-intensive single-core apps on my 32-bit and 64-bit systems, which happen to have the same clock speed, I see t(64) ~= 0.75*t(32) in debug mode, but I’ve found a few where, with full optimizations on, t(64) ~= 0.1*t(32). So what is being measured in the second case is not the quality of the execution engine or the memory system, but the performance of the compiler (it’s only about 0.5*t(32) without /LTCG, for example). I believe a lot of the 0.75 factor comes from larger caches and TLBs.
So when I hear of an inverse situation, I’m more inclined to credit it to artifacts other than code.
joe
Any ideas what might be going on here, or how I can try to track the
issue?
– Bob Ammerman
RAm Systems
NTDEV is sponsored by OSR
Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
OSR is HIRING!! See http://www.osr.com/careers
For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars
To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer