I cannot understand what the Intel folks were thinking with respect to helping external developers.
Core and even Atom class processors and their bridges are packed with features, yet developers struggle even to reach the GPIOs.
Meanwhile, almost every tiny, dirt-cheap Cortex-M chip has a sort of simple logic analyzer (the ITM), accessible over a cheap 3-pin interface
(well, through yet another dirt-cheap USB dongle).
The comments on this board have been incredibly helpful to me.
Telling me that 9.5 us was very wrong led me to carefully time each part of the code to find the problem. The sample I was using contained macros that translate to READ_REGISTER_ULONG/WRITE_REGISTER_ULONG, and each of those instructions was taking ~2 us. I don't believe I need a memory barrier in my design, so I changed to normal memory access, and that removed nearly all of the delay.
Telling me I was misunderstanding or misinterpreting what I saw got me to start changing individual settings to verify I was affecting anything: if it is difficult to get something to work, it should at least be easy to break parts of it and confirm that I'm doing what I think I'm doing.
One of the things I did was intentionally fail to reset the interrupt in the ISR, while leaving the rest of the logic in place, so that after I return from the ISR the system immediately interrupts again. I expected the system to crash, but instead I found that my diagnostic messages, which print every 5,000 interrupts, were appearing approximately every 0.045 seconds based on the timestamps in the debug output. I calculate that to mean my driver, running on a low-end i3, can service over 100,000 interrupts per second, or approximately 10 times the rate I require, while in debug mode and sending diagnostics.
I have sequence counters that verify the interrupt really can be serviced over 100,000 times per second, and I wanted to share that; it was incredibly good news to me.
In addition, I found some of my settings are having no effect at all, and I now know where to concentrate.
@Pavel_A : I have been logging timestamps from KeQueryPerformanceCounter to an array in the device extension, examining them every 5,000 interrupts, and sending the analysis to DbgView with DbgPrint. Previously, Resource Monitor indicated all cores were idle. When I use my intentionally broken driver that continuously interrupts, I see 16-20% of the CPU being used by System Interrupts and overall CPU usage of 6-7%. GPU usage goes from 0 to 30% when I engage the continuous-interrupt driver, presumably from all the updating of DbgView.
@Peter_Viscarola_(OSR) : 1) I believe I am the only device on this PCI interrupt, based on the following: I was able to set the interrupt as exclusive without any repercussions, and there are only two boards in the system (the CPU and the 9054 board). The interrupt assigned is 16-19 depending on which slot I place the 9054 board in, as expected, and the slot does not impact performance.
2) The interrupts should arrive every 100 us, i.e., 10,000 per second. They are currently arriving at 1.6 ms intervals, so I believe what I'm seeing is some sort of timeout rather than normal operation. The packet from the board should be in response to my sending a DMA packet to the board, and that is the part of my driver that is making no difference; I don't think I'm actually sending a packet.
What I suspect I have done is relied on a shortcut for scheduling the DMA to the board (that works in the DOS driver), and I need to be more meticulous in my setup.
Thank you once again for all the assistance. It has made a world of difference.
How are you measuring that? READ_REGISTER_ULONG becomes nothing more than a single CPU read instruction that becomes a bus cycle. Even on a Gen1 PCIe bus, that should be single-digit nanoseconds, unless your hardware is stalling.
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
@Tim_Roberts : I slightly modified it to use what VS2019 says the macros translate to, and to make it a code snippet. The elapsed count was consistently 20 or 21 for that line.
I have succeeded. It was a setup issue. Thank you all once again.
Yay! I’m glad you got stuff to work!
But…C’mon dude. A lot of folks here spent a lot of time trying to help you. Maybe you can tell us a bit more than “It was a setup issue”??
I did want to comment, however, on your timing code. When KeQueryPerformanceCounter returns 20 or 21, that does NOT mean 2 microseconds. It does not return 100 ns units. You have to divide that number by the counter frequency, which is returned through the pointer you pass to KeQueryPerformanceCounter (and which you are passing as NULL). That number varies a fair amount based on the operating system and sometimes on the hardware.
@Peter_Viscarola_(OSR) I had missed part of the initialization sequence. A command packet instructs the board (there is no manual) to change the interrupt frequency. The driver was working as expected, but the 9054 board was not properly configured. In the DOS driver, the board was intentionally commanded to a maximum delay to allow initialization to complete, and then later changed to the desired frequency. I was seeing that maximum delay, which is why it was so consistent and not at all random.
@Tim_Roberts I did measure the granularity and got 1.0E7:
' DbgPrint("Perf Counter Ticks Per Second = %d:%d", tps.HighPart, tps.LowPart); '
printed '= 0:10000000'
That is where the 10 MHz / 100 ns numbers came from; I wasn't very clear earlier about how I got them.
Thanks for that little bit of background. Much appreciated.
IIRC, QueryPerformanceFrequency returns a constant value from OS boot to shutdown, but the value can change across a reboot. This most often happens after a hardware change, but it has been known to happen after certain Windows updates too. It is a cheap function to call and should generally be called every time a performance counter delta is converted into a time.