Perhaps the Win7 machine’s power policy settings are set to throttle the CPU more aggressively compared to XP.
d
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Christiaan Ghijselinck
Sent: Tuesday, November 17, 2015 3:28 PM
To: Windows System Software Devs Interest List
Subject: Re: RE:[ntdev] Slow PCI read on Win7 vs XP
>>>Any chance you can run the two systems on exactly the same hardware?
> Can be done if I get two identical PCs. I will try.
My own golden rule is:
"If you compare software performance on different OS versions, do it on the same machine (same CPU, same RAM type, and same amount)."
In your case, you need to be able to multiboot into Win 7 and XP on the same machine.
Christiaan
----- Original Message -----
From: boris.shikhalev
To: “Windows System Software Devs Interest List”
Sent: Tuesday, November 17, 2015 11:46 PM
Subject: RE:[ntdev] Slow PCI read on Win7 vs XP
>>>Any chance you can run the two systems on exactly the same hardware?
> Can be done if I get two identical PCs. I will try.
>
>>>someVar = READ_REGISTER_ULONG(MyRegister); vs someVar = *MyRegister;
> Already tried yesterday that and the timing was identical.
>
> Thanks.
OK. So, we know it’s not the memory barrier. We really knew that already (the slowdown you’re reporting is far too dramatic for me to think it could be accounted for by a memory barrier, but you never know with these things).
Mr. Holan’s idea of CPU throttling is creative… You’d hope the system was “smart enough” to not significantly slow a CPU-intensive code loop, but as I said before “you never know with these things.”
So, where does that leave us?
Assuming the KMDF code that you’ve written is effectively the same as the DriverWorks code you have (and that’s a big assumption that we’re trusting you about): We’re confirming for you that there’s nothing intrinsically different between XP and Win7 that should, on its own, account for the slow-down. We’re also confirming that what you’re seeing is not any sort of known KMDF issue, and that what you’re seeing isn’t inherently a KMDF vs DriverWorks thing.
Again, assuming the device setup and use is *identical*, that really only leaves system interactions, which could either be hardware or software. You will only be able to eliminate this as the cause when you can run both drivers on the exact same system.
In fact, you should run both drivers on the exact same hardware AND THE SAME VERSION OF WINDOWS. Given that you’re probably re-writing the driver for a reason, I’d suggest you run both drivers on XP. There’s no reason a driver written in KMDF for Win7 can’t run on Windows XP (I just finished a major project where I did this with two drivers, so I’m speaking from experience here).
2MB/s occupying 70% of a 3GHz core. What is the core doing?
It wouldn’t hurt to check PCIe link on the new system too just to make sure things look sane (expected number of lanes and speed). There may be link issues on the newer system with the older card. It’s a stretch to think that there are enough errors to drop the throughput down to the roughly 2MB/s that OP is reporting, but it’ll only take a couple minutes to check it with SIV.
Without knowing more about the nature of the device and its interface, I’m afraid the best we can do is theorize on potential issues.
When you state that CPU utilization is 70%, can you explain this a little further? Open Windows Task Manager, go to the Performance tab, then click the ‘View’ menu item and make sure ‘Show Kernel Times’ is selected. Is your CPU usage predominantly red? I suspect that you aren’t clearing interrupts properly and you are spending a large portion of the CPU’s time handling interrupts.
Ordinarily good advice, but the PLX 9030 is a (10+ year old) traditional PCI device, not a PCIe device. While it’s possible there’s some other sort of translation going on (PCI to Express), I sort of doubt it.
>In your case , you need to be able to multiboot intoWin 7 and Xp on the same
>machine
I have made a triple-boot PC and did some tests. Win7 still performs worse than XP (10-15% slower), but not as dramatically as when compared to the older XP machines (40% slower).
Maybe our PLX9030 does not play well with newer motherboards.
>Perhaps the win7 machine’s power policy settings are set to throttle the CPU
>more aggressively compared to xp
Played with CPU throttling. Set processor performance core parking min cores and minimum/maximum processor state to 100%. No effect.
>I’d suggest you run both drivers on XP. There’s no reason a driver
>written in KMDF for Win7 can’t run on Windows XP
That would be a great test but my driver for some reason cannot run on XP. I can install it but get “Windows cannot load the device driver for this hardware. The driver may be corrupted or missing. (Code 39)”.
>2MB/s occupying 70% of a 3GHz core. What is the core doing?
The core is busy in Deferred Procedure Calls and Interrupt Service Routines.
>Is your CPU usage predominantly red? I suspect that you aren’t clearing interrupts
>properly and you are spending a large portion of the CPUs time handling
>interrupts.
Yes, because it spends all the time at Deferred Procedure Calls and Interrupt Service Routines.
I clear up the interrupts.
>Is it really READ_REGISTER_ULONG who is to blame?
It is not. I have tried to copy data directly with the same result (unless direct copy implicitly got converted to something similar to READ_REGISTER_ULONG)
>Try to surround this loop with perf counter queries. What will they show?
I haven’t tried the perf counters but ETW trace timestamps showed ~30ms per loop (64KB)
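To put the ~30 ms per 64 KB figure in perspective, the arithmetic can be sketched as below. This assumes (it is not confirmed in the thread) that the loop consists purely of 4-byte register reads in the style of READ_REGISTER_ULONG; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Figures reported in the thread: one 64 KB loop takes about 30 ms. */
#define LOOP_BYTES     (64u * 1024u)
#define LOOP_TIME_MS   30.0

/* Assumption: every access is a 4-byte (ULONG) register read. */
#define BYTES_PER_READ 4u

/* Throughput of one loop, in bytes per second. */
static double loop_throughput_bps(void)
{
    return LOOP_BYTES / (LOOP_TIME_MS / 1000.0);
}

/* Number of register reads needed to move one loop's worth of data. */
static unsigned reads_per_loop(void)
{
    assert(LOOP_BYTES % BYTES_PER_READ == 0);
    return LOOP_BYTES / BYTES_PER_READ;
}

/* Average cost of a single register read, in microseconds. */
static double microseconds_per_read(void)
{
    return (LOOP_TIME_MS * 1000.0) / reads_per_loop();
}

/* Prints the three derived figures. */
static void print_loop_cost(void)
{
    printf("throughput: %.0f bytes/s\n", loop_throughput_bps());
    printf("reads/loop: %u\n", reads_per_loop());
    printf("per read  : %.2f us\n", microseconds_per_read());
}
```

Under that assumption the numbers work out to roughly 2.18 MB/s, 16384 reads per loop, and about 1.83 microseconds per read, which is consistent with the roughly 2 MB/s mentioned earlier in the thread.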
>Look at PCI config space (!pci in WinDbg)
I did not use WinDbg but compared PCI registers using PLX Monitor. They are the same for Win7 and XP.
First, “good job” Mr. Shikhalev in answering all the accumulated pending questions… AND in taking the time to actually do the tests that we recommended. And do the necessary analysis.
It’s unfortunate that your WDF driver won’t load on XP, because that would give us the ultimate comparison, right? Note that you DO need to use KMDF V1.9 (which is I think the latest version supported by XP)… I don’t know if you tried that. If so, then your driver is calling a kernel function that’s not available on XP. Oh well.
Without some very careful hands-on analysis, I’m afraid that I don’t think *I* can provide you any more ideas/insights. Maybe some of the other clever guys here on the forum have additional ideas for you.
At least the same system performance difference isn’t as dramatic as it was between the two other systems.
Sorry I couldn’t help you find a definitive answer to this interesting issue,
Can you answer the following questions? It may help me understand your device and design.
Is this a legacy PCI interrupt?
It sounds like you are reading multiple times from a single register to drain a FIFO. You’re storing the results in a ring buffer resident in your driver. User space applications are then responsible for reading from that buffer before it fills. Is this correct?
What is the condition in hardware that initiates the interrupt? When “any” data is available in the FIFO? Perhaps when the FIFO is 75% full?
When your ISR is called, are there additional interrupt conditions that need to be handled? Or is it only when data is available in the FIFO?
What does your hardware require to clear the interrupt condition?
When do you clear the interrupt condition? As soon as your ISR is called? As soon as you’ve identified the cause and scheduled your DPC? After the DPC has completed its work?
>>a bus sniffer is all you need here.
Unfortunately we do not have that tool and are not likely to acquire it anytime soon.
>1) Is this a legacy PCI interrupt?
Yes
>2) It sounds like you are reading multiple times from a single register to drain
>a FIFO. Your storing the results in a ring buffer resident in your driver.
>User space applications are then responsible for reading from that buffer before
>it fills. Is this correct?
Yes
>3) What is the condition in hardware that initiate the interrupt? When “any”
>data is available in the FIFO? Perhaps when the FIFO is 75% full?
Configurable “almost full” watermark. Default 75%.
>4) When your ISR is called, are there additional interrupt conditions that need
>to be handled? Or is it only when data is available in the FIFO?
Only data in the FIFO.
>5) What does your hardware require to clear the interrupt condition?
Write 1 to the latched status register.
>6) When do you clear the interrupt condition? As soon as your ISR is called?
>As soon as you’ve identified the cause and scheduled your DPC? After the DPC
>has completed it’s work?
As soon as the ISR is called.
My best bet would be on the hardware firing another interrupt after every word is read. Since the DPC is already running on the processor that handles the interrupt, the ISR will be called between word reads.
To test this suspicion, raise IRQL to DIRQL before reading the buffer; don’t forget to restore it after that.
To make sure the interrupt doesn’t happen while in the DPC, disable the device interrupt by device-specific means in the ISR and re-enable it at the end of the DPC. Simply acknowledging the interrupt may not be enough.
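The mask-in-ISR, unmask-at-end-of-DPC idea can be illustrated with a small user-mode model. Everything here is hypothetical: `FakeDevice`, its fields, and the re-latching rule (status latches again whenever a FIFO access leaves the level at or above the watermark) are assumptions made for the sketch, not the PLX 9030's documented behavior, and no real kernel APIs are used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model, not a real register layout. */
typedef struct {
    unsigned fifo_level;  /* words currently in the FIFO           */
    unsigned watermark;   /* "almost full" threshold               */
    bool     int_enable;  /* device-level interrupt enable         */
    bool     int_status;  /* latched status, write-1-to-clear      */
    unsigned isr_count;   /* how many times the ISR claimed an IRQ */
} FakeDevice;

static bool line_asserted(const FakeDevice *d)
{
    return d->int_enable && d->int_status;
}

/* Assumed behavior: status latches whenever level >= watermark. */
static void update_status(FakeDevice *d)
{
    if (d->fifo_level >= d->watermark)
        d->int_status = true;
}

static void fifo_fill(FakeDevice *d, unsigned words)
{
    d->fifo_level += words;
    update_status(d);
}

static void fifo_read_word(FakeDevice *d)
{
    assert(d->fifo_level > 0);
    d->fifo_level--;
    update_status(d);
}

/* ISR: claim the interrupt, optionally mask the device, ack status. */
static bool isr(FakeDevice *d, bool mask_in_isr)
{
    if (!line_asserted(d))
        return false;            /* not our interrupt */
    d->isr_count++;
    if (mask_in_isr)
        d->int_enable = false;   /* device-specific disable */
    d->int_status = false;       /* write 1 to clear        */
    return true;
}

/* DPC: drain the FIFO. It runs below DIRQL, so the ISR may
 * preempt it between reads; that is modeled by calling isr(). */
static void dpc(FakeDevice *d, bool mask_in_isr)
{
    while (d->fifo_level > 0) {
        fifo_read_word(d);
        (void)isr(d, mask_in_isr);
    }
    if (mask_in_isr) {
        d->int_status = false;   /* drop stale latched status */
        d->int_enable = true;    /* re-enable at end of DPC   */
    }
}

/* Returns how many times the ISR ran for one FIFO burst. */
static unsigned simulate(unsigned words, unsigned watermark, bool mask_in_isr)
{
    FakeDevice d = { .fifo_level = 0, .watermark = watermark,
                     .int_enable = true, .int_status = false,
                     .isr_count = 0 };
    fifo_fill(&d, words);
    if (isr(&d, mask_in_isr))
        dpc(&d, mask_in_isr);
    return d.isr_count;
}
```

In this toy model, a burst of 8 words with a watermark of 6 makes the ISR run three times when the interrupt is only acknowledged, and once when it is masked in the ISR and re-enabled at the end of the DPC. A real driver would also have to deal with data arriving between the final acknowledge and the re-enable, which the model ignores.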
There is a slight difference between true PCI INTA behavior and PCIe INTA legacy emulation (which is also in effect if a device is behind a PCI->PCIe bridge). The INTA deassertion message may arrive at the root well after the driver made the device deassert INTA. This allows for spurious extra interrupts after you leave the ISR and go back to a lower IRQL.
>>My best bet would be on the hardware firing another interrupt after every word
>was read. Since the DPC is already running an the interrupt processor, the ISR
>will be called between the words are read.
I am not sure that it happens in my case. I have traces in both ISR and DPC and see them coming in expected order (ISR-DPC-ISR-DPC-…) in strict time intervals so the spurious interrupts should be noticeable and I did not see them.
>It’s unfortunate that your WDF driver won’t load on XP, because that would give
>us the ultimate comparison, right? Note that you DO need to use KMDF V1.9
>(which is I think the latest version supported by XP)… I don’t know if you
>tried that. If so, then your driver is calling a kernel function that’s not
>available on XP.
I am eager to test it. I have checked the KMDF version I use and it is 1.9, so I should be fine there. I suspect some problems with the driver’s registration, because the legacy driver was installed there before the new one and I saw some leftovers of its registration in the registry. I will take a closer look on Monday.
Note that the error you’re getting can also be caused by using a function that’s not available on XP. Getting a Win7 WDF driver to run on XP is possible, but it’s not necessarily trivial.
Just wanted you to know this before you spend a ton of time working on installation issues.
>>Note that the error you’re getting can also be caused by using a function that’s
>not available on XP.
I thought that WDK 7.1.0 with KMDF 1.9 does not have anything that is not supported on XP, so if I build with _NT_TARGET_VERSION=$(_NT_TARGET_VERSION_WINXP) then I should be fine.
[quote] There is slight difference between true PCI INTA behavior and PCIe INTA legacy
emulation (which is also in effect if a device is behind PCI->PCIe bridge). INTA
deassertion message may arrive to the root way after the driver made the device
deassert the INTA in the device. This allows for spurious extra interrupts after
you leave the ISR and go back to lower IRQL. [/quote]
I have personally experienced this, and that was why I asked. I can’t explain the observed difference in behavior between your two drivers under two different environments. However, in this case, since the only interrupt condition you are concerned with handling is servicing the FIFO, I would consider clearing the interrupt ‘after’ the DPC you scheduled has drained the FIFO. I suspect it would even be sufficient in this case to do all the work in the ISR, rather than the typical approach of clearing the condition as quickly as possible and then offloading the work to a DPC. However, you’d have a better feel for typical device behavior for your situation.
Yes… I agree… that’s what *I* usually prefer to do as well.
But there can be dangers and complexities with this approach, right? Where do you put the data (answer: directly into the user’s data buffer if possible), how much data can you read out (if data KEEPS showing up, you don’t want to spend TOO long at DIRQL), and what you do with data that can arrive after you’ve queued the DPC to complete the Request and before that DPC has run.
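One way to picture the "don't spend TOO long at DIRQL" concern is a drain routine with a fixed budget. The sketch below is a user-mode model only: `FakeFifo`, `DrainResult`, `isr_drain_bounded`, and the 256-word budget are all hypothetical names and values chosen for illustration, not anything from the OP's driver or from KMDF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on words copied per ISR invocation. */
#define MAX_WORDS_PER_ISR 256u

/* Stand-in for the hardware FIFO: an array consumed front to back. */
typedef struct {
    const uint32_t *data;
    size_t available;   /* total words the "device" holds */
    size_t next;        /* index of the next word to read */
} FakeFifo;

static bool fake_fifo_empty(const FakeFifo *f)
{
    return f->next == f->available;
}

static uint32_t fake_fifo_read(FakeFifo *f)
{
    assert(!fake_fifo_empty(f));
    return f->data[f->next++];
}

typedef struct {
    size_t copied;        /* words placed in the destination buffer */
    bool   more_pending;  /* FIFO still holds data after this pass  */
} DrainResult;

/* Copies at most MAX_WORDS_PER_ISR words (and never more than the
 * destination can hold), then reports whether data is left over so
 * the caller can decide how to follow up (for example, from a DPC). */
static DrainResult isr_drain_bounded(FakeFifo *fifo,
                                     uint32_t *dest, size_t dest_words)
{
    DrainResult r = { 0, false };
    size_t limit = dest_words < MAX_WORDS_PER_ISR ? dest_words
                                                  : MAX_WORDS_PER_ISR;

    while (r.copied < limit && !fake_fifo_empty(fifo))
        dest[r.copied++] = fake_fifo_read(fifo);

    r.more_pending = !fake_fifo_empty(fifo);
    return r;
}
```

With 300 words waiting, the first call copies 256 and reports leftover data, and a second call picks up the remaining 44. What to do with the leftover words, and where the destination buffer comes from, are exactly the design questions raised above; the sketch does not answer them.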
So… it depends not only on device behavior but also device usage patterns and the design of the upper edge.
And of course, this doesn’t really explain why the OP is seeing the behavior that he’s seeing. Gad, wouldn’t it SUCK if he moved his FIFO processing to the ISR and the speed was STILL bad!?