This document is the best set of guidelines I've seen for supporting PCIe over Thunderbolt. It covers all the things that I learned while updating a PCIe driver and PCIe hardware for use on Thunderbolt.
Thunderbolt Device Driver Programming Guide
Yes, it is from Apple, so you can ignore all the macOS-specific stuff, but most of the issues are the same on any OS.
Tolerating PCI Latency: This is the area where we also made hardware changes to the PCIe device along with driver changes.
Using Hot Plug Operation with PCI Devices: Check for device gone everywhere.
Hopefully someone finds this helpful.
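As a rough illustration of the "check for device gone everywhere" advice: reads from a PCIe device that has been unplugged typically complete with all bits set, so a driver can route every register read through a helper that treats 0xFFFFFFFF as "possibly gone". The sketch below is plain C with a simulated device so it can be compiled anywhere. The names read_reg_checked, reg_read_fn, fake_dev and fake_read are hypothetical and are not taken from the Apple guide or any Windows API. A real register can legitimately hold 0xFFFFFFFF, so a real driver would probably confirm removal by re-reading a register that can never be all ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value a read typically returns when the PCIe device is no longer there. */
#define ALL_ONES 0xFFFFFFFFu

/* Hypothetical register accessor; a real driver would wrap its MMIO read here. */
typedef uint32_t (*reg_read_fn)(void *ctx, uint32_t offset);

/*
 * Read a register and report whether the device still appears to be present.
 * Returns false (and leaves *out untouched) when the read comes back as all
 * ones, which this sketch treats as "device gone".
 */
static bool read_reg_checked(reg_read_fn read, void *ctx,
                             uint32_t offset, uint32_t *out)
{
    uint32_t value = read(ctx, offset);
    if (value == ALL_ONES) {
        return false;   /* caller should stop touching the hardware */
    }
    *out = value;
    return true;
}

/* Simulated device, for illustration only. */
struct fake_dev {
    bool     present;   /* false once the cable has been pulled */
    uint32_t reg;       /* contents of the single simulated register */
};

static uint32_t fake_read(void *ctx, uint32_t offset)
{
    struct fake_dev *dev = (struct fake_dev *)ctx;
    (void)offset;       /* only one register in this simulation */
    return dev->present ? dev->reg : ALL_ONES;
}
```

The point of funnelling reads through one helper is that every caller is forced to handle the "gone" case, including loops that poll a status bit and would otherwise spin forever on all ones.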
Do you have any experience with the Kernel DMA Protection introduced in Windows 10 1803? I think we are encountering a failure due to this with a legacy WDM PCI driver. We are getting a DRIVER_VERIFIER_DMA_VIOLATION bugcheck with Arg1 0x26 (IOMMU detected DMA violation) when running the hardware over Thunderbolt. My suspicion is that this driver does not use the Windows DMA APIs and thus does correctly deal with DMA remapping. I found it interesting that it reports as a DRIVER_VERIFIER violation even though Driver Verifier is not running on this system. I have this vague nagging feeling that I've encountered one other bugcheck in the past that had this behavior (i.e., saying it was a DV bugcheck when DV was not running), but it still caught me by surprise.
That would, of course, cause the problem.
Are you SURE verifier isn’t running at the time of the crash? You know Windows WILL automatically enable it following certain crashes.
Yeah, I need to dig through the (maze of twisty passages) code and make sure of this, but from what I know about it, it's not a bad hunch. (I just noticed that I put "does correctly" instead of "does not correctly" in the OP.)
I haven't had my hands on the system yet, but it is with a colleague that I trust and he said that DV was not enabled. I didn't realize that it would automatically be enabled following certain crashes. Are those documented somewhere, or is it just tribal knowledge?
I dunno. I’m just telling you what I’ve experienced. You can check from the dump using !verifier.
!verifier 0x1 doesn't show any drivers being verified, and the only flag set is (0x00000000) Automatic Checks. Notably, (0x00000080) DMA checking is not enabled.
That is super interesting.
Google says there are other folks seeing issues like the one you're reporting. One guy, like you, is very clear that Verifier is not running. In some cases, people have solved their problem by flashing the BIOS with the latest version. Other cases have resulted from errors in the Dell Thunderbolt dock driver.
So, in addition to your observation (which is definitive) there's additional evidence that the IOMMU checks are being done even when Verifier is not enabled. I guess this makes sense... they can figure out if the IOMMU isn't being used (properly) without having to go to the extreme of forcing data to be double-buffered (which is what the DMA Verification option does). Assuming this is the case, I can see how they might just use the Driver Verifier bugcheck code to indicate that the error results from checking on the activity of an errant driver (there aren't an unlimited number of bugcheck codes, after all). But what they've done is make things confusing to us devs... as you've pointed out.
They can fix this in the documentation... and it'd be nice if they told us SOMEthing about what this check is and what a violation means.
Last year we were debugging a crash with DRIVER_VERIFIER_DMA_VIOLATION, and I got verification from Microsoft folks that this is indeed poorly named - the check is done irrespective of Verifier settings.
Thanks for the confirmation. Did they say that was true of all DRIVER_VERIFIER_DMA_VIOLATION subtypes, or just specific failure modes?
Sorry, I don't know about all subtypes. I was focused on my particular failure.
Thank you @Diane! That’s very helpful.
I can tell you for sure that “ordinary” DMA verification is absolutely not enabled without Verifier. The overhead would be untenable.
I did a little assembly searching for fun on whatever Win10 version of nt/hal I'm running. So, this shouldn't be taken as definitive, but it's something.
From what I see the HAL appears to generate two different DRIVER_VERIFIER_DMA_VIOLATION bugchecks, regardless of whether or not Verifier is enabled:
The kernel is a bit more confusing... Most of the bugchecks come from DMA Verifier being enabled. However, if you're running on a system where DMA is not cache coherent (as controlled by the nt!KiSystemFullyCoherent global) you might see a DRIVER_VERIFIER_DMA_VIOLATION from KeFlushIoBuffers with a Parameter1 == 4 ("Driver has freed too many simultaneous adapter channels") or 5 ("Freed too many map registers") even without Verifier enabled.
Soooo, yeah, they should have picked a different crash code. Probably seemed like a good idea at the time.
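To keep the subtypes mentioned in this thread straight when triaging a dump, a small lookup keyed on the first bugcheck parameter may help. This is only a sketch in plain C: it covers just the three Arg1 values discussed above, the description strings are the ones quoted in the thread, and the "seen without Verifier" flag records what posters observed or inferred here, not documented behavior. The function names are hypothetical. 0xE6 is the bugcheck code for DRIVER_VERIFIER_DMA_VIOLATION.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bugcheck code for DRIVER_VERIFIER_DMA_VIOLATION. */
#define DRIVER_VERIFIER_DMA_VIOLATION 0xE6u

/* Describe the Arg1 values mentioned in this thread; anything else is unknown here. */
static const char *dma_violation_arg1(uint64_t arg1)
{
    switch (arg1) {
    case 0x04: return "Driver has freed too many simultaneous adapter channels";
    case 0x05: return "Freed too many map registers";
    case 0x26: return "IOMMU detected DMA violation";
    default:   return "not covered in this thread";
    }
}

/*
 * True for the Arg1 values that posters in this thread reported (0x26) or
 * inferred from disassembly (4 and 5, on non-cache-coherent systems) as
 * possible even when Driver Verifier is not enabled. This is an assumption
 * based on the thread, not on Microsoft documentation.
 */
static bool seen_without_verifier(uint64_t arg1)
{
    return arg1 == 0x04 || arg1 == 0x05 || arg1 == 0x26;
}
```

With a table like this next to the output of !analyze -v and !verifier, it is easier to tell whether a given crash is one of the cases that can fire with Verifier off.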