Recommendations for Automated Driver Build & Test

I’m new to Windows driver development, so be gentle. :smile:

I would like to build and test a Windows PCIe (so KMDF) driver in an automated CI pipeline.

I would like to avoid (graphical) Visual Studio, as my team and I are primarily Linux developers and command-line tools seem to work best in CI automation. I’ve started with CMake + FindWDK + the EWDK. Currently I can build a driver + INF on my local machine with that scheme, but it’s untested, so I’m not sure whether what I have is usable.

I checked in on the “Containerized WDK Build Environment” discussion and it seems like folks have varying degrees of success, but lean towards VMs instead of (Docker) containers for isolated builds. That would play well with a testing idea I have (below), too.

Ideally I’d like to test as much as I can in the CI pipeline, so tests may range from custom-written “can I write to register X on device Y and observe change Z” tests, to Microsoft-written tests like the DevFund tests in the HLK.
On Linux, I can set up a VM with PCIe passthrough and run driver tests inside the VM. This allows me to run different OSs/kernels while keeping the physical test hardware identical. However, all MSDN articles I’ve found so far explicitly/implicitly say that a discrete, non-virtualized system should be used for testing.

  • If I attempt to set up VMs that run one at a time, instead of building discrete systems for testing, am I going to run into trouble down the line with the HLK?
  • Should I stick with static tools (Code Analysis for Drivers, Static Driver Verifier) in the CI, and assume HLK will be run manually before release?
  • Any critiques/gotchas/recommendations?

Thanks!

However, all MSDN articles I’ve found so far explicitly/implicitly say that a discrete non-virtualized system should be used for testing.

Yeah… that’s traditional, and probably also because Hyper-V didn’t (until recently) have the capability to pass through arbitrary PCI functions. I’ve been TOLD that it has this ability now (in its latest version), but I have not used it myself.

I’m going to follow this thread with interest, in the hopes that some of our colleagues will share useful information about their CI processes. As a rule, our community asks a lot of questions, but – sadly – doesn’t do real well when it comes to sharing.

Peter


For CI pipelines you are not restricted to testing using the same system(s) you used for building. Most CI systems can separate these functions and deliver test artifacts to test systems, wait for test results and bless or fail your build based on those results. So it is just a matter of scripting your test systems. I’ve done this ad hoc using PowerShell scripts to both drive and run tests, and I’ve helped build a WHQL test automation suite, although we did not deploy it in the pipeline because the WHQL tests are in general too unreliable and too long-winded to use in a pipeline.

CI pipelines seem to me to be vastly overrated. As are the automated tests that go with them.

Maybe that’s just me, but effective automated tests for non-deterministic interfaces seem harder by far to create than the correct code is to write.

@MBond2 said:
CI pipelines seem to me to be vastly overrated. As are the automated tests that go with them.

Maybe that’s just me, but effective automated tests for non-deterministic interfaces seem harder by far to create than the correct code is to write.

If all you are doing is developing and maintaining a single driver and perhaps an associated app, then sure, a CI pipeline is sort of overkill. If, on the other hand, a set of drivers is part of a larger collection of apps and services, all of which are developed and released on a regular basis, then your driver components and developers very likely have to live and play in some sort of structured CI build system. It is the way software is done these days.

I really don’t understand your automation complaint. Sure, writing tests is difficult and can frequently consume more time than writing the code being tested. But the point is to have a set of tests that can provide some level of confidence that new code does not contain regressions. That seems worth the effort, again particularly if your drivers are part of larger products; but even if they aren’t, why not automate what can be automated?

As @Mark_Roddy mentioned, automated tests for regression and compliance are standard and required for commercial products; you simply have to know that something you added yesterday didn’t break some bug fix put in six months ago (and “trust me” doesn’t do that).

CI is also a part of that, not only to prevent “man in the cave” syndrome but to ensure everything in the code repo builds properly. The CI pipeline can be as complex as Jenkins or as simple as a scheduled task with an attached .cmd file, but the idea is that every night (or every check-in) the tree is built and the regression tests are run …

Absolutely it’s a hassle, but doing that not only lets you sleep better at night but also satisfies a whole host of regulatory requirements …


CI, or continuous integration, does not seem to help. The whole point of that method is to integrate the efforts of several or many individuals.

@MBond2 said:
CI, or continuous integration, does not seem to help. The whole point of that method is to integrate the efforts of several or many individuals.

It isn’t the ‘whole point’, although certainly when you have more than one individual working on a software project, it is a good way to reduce regressions and coordinate integration. However, even a single developer can benefit from using a lightweight CI system, for example GitHub runners, to automate build and test for new commits. Those tests could be as simple as running Code Analysis and Static Driver Verifier before merging into your main branch. I really don’t understand why one wouldn’t want to have some minimal validation of your code automatically run for you.

I really don’t understand why one wouldn’t want to have some minimal validation of your code automatically run for you.

Agree.

Even for “simple” projects, it’s very helpful. At the very least it gives you an automated buddy build. Which is nothing but a positive thing.

Peter

I agree too - I would love to have effective automated tests that cover even some of the code paths. If anyone knows how to do that well for non-deterministic code, I will be the first in line to sign up.

The way that I know to do this is to write code that artificially causes the possible different timing effects via spin loops, thread suspension, etc. Test programs that do this are difficult to create and much harder to maintain, as they have to rely not only on the contract interface but on things that they ‘know’ about how things work internally. The problem is much worse if you are not testing something with a directly accessible interface.

I’m sure I’m missing something, but none of this has ever been useful to me.

Thanks all, for your replies so far. I appreciate it. I’m going to hijack this a little bit and bring it back to my questions, lol.

Talking about whether or not CI/CD practices are useful is out of scope here. I mentioned it as context for, not the topic of, this discussion.

I do need automated tests. And my goal at the end of the day is to prove that my driver + interface library code does what it claims to do, against a battery of hardware cards, every time I merge into “main”. And to run tools like “Static Driver Verifier” on every push, to shorten the feedback loop between when someone on my team pushes bad code and when we get warned about it. And, side note, I’m doing this for both Windows & Linux, so yes, I need to write something that says “go forth and test thy self” and get green check marks back for a set of OSes and hardware. CI runners are a modern way of implementing something like this.

I just want to get folks’ thoughts on how they have set up their automated testing. It sure feels like virtual machines (VMs) are the way to go, but being new to Windows driver development, I want to ask about tips & pitfalls before I spend money and time going down a bad path.


@Mark_Roddy wrote:

For CI pipelines you are not restricted to testing using the same system(s) you used for building. Most CI systems can separate these functions and deliver test artifacts to test systems, wait for test results and bless or fail your build based on those results.

That’s true. Wouldn’t that be twice the VM effort, though? If this were set up with VMs, then I’d need “builder” VMs with build tools and then “test” VMs with some testing tools? If I’m targeting Win 7/8.1/10/11 or something, then that’s 8 VMs to set up and maintain.

I guess I was assuming that if we both build & test in the same VM(s), then it’d be less effort to stand up initially and easier to maintain later.

Unless you’re thinking I can get away with building once, say, on Windows 7, and then testing that build on OS-specific VMs?

WHQL tests are in general too unreliable and too long-winded to use in a pipeline.

That’s interesting. Why do you say they are unreliable? False negatives/positives? I’ve had zero experience with it, just been reading MSDN.

@MBond2 said:
I would love to have effective automated tests that cover even some of the code paths. If anyone knows how to do that well for non-deterministic code, I will be the first in line to sign up

The way that I know to do this is to write code that artificially causes the possible different timing effects via spin loops, thread suspension, etc. [ … ]

Interesting idea, but I’m not sure I follow what you are trying to test in this hypothetical. But I’d like to know more. Is this a “send a command to the DUT and wait for the reply/data message back” sort of thing?

Threads? I’m new to this, but so far I’ve only seen single-threaded code in drivers. Not sure what driver you have in mind where these features are at play.

In my reading it seemed like Microsoft had written a bunch of somewhat generic tests to do this kind of thing, in the HLK. For example, I note there is a DoConcurrentIO flag in the DevFund test params. I wonder if that would help with testing the driver with overlapping interrupts and/or multiprocessor stuff.

Let me assure you that on Windows, you have never seen a single-threaded driver. Even the simplest have to deal with processors and threads to some degree.

No, I am not talking about a pattern like ‘send a command and wait’. What I’m talking about are testing the different possible orders in which operations could be conducted when two or more ‘threads’ access the same resource. In a simple case, you have some resource X which has some state and two threads A & B that will operate on it. It could be that the state of X is first affected by A and then later by B (an appropriate synchronization protocol of course being observed by both) or it could be that the state of X is first affected by B and then later by A. How do we test both situations?

Manually in a debugger I test both situations by suspending B when I want A to go first and then resuming it. And when I want B to go first, I do the opposite. That’s a comprehensive test, but it is a manual one.

Now let’s say that I want to automate that. If I know the exact way in which A & B can be triggered, and I know of some way to retard the progress of each, then I can write a program to do exactly the same thing that I would have done manually with a debugger as part of an automated test. Think of that program as an ‘automatic debugger’. But the problem with that program is that it depends on very specific knowledge of the code that it is testing, and if that code changes, even in a very trivial way, then it could be useless. The method used by the test program might continue to work after each code change, but it has to be re-evaluated, which defeats a lot of the point of automated testing. It is also a program that is harder to write than the code which it is designed to test.
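To make that ‘automatic debugger’ idea concrete, here is a minimal user-mode C sketch (my illustration, not code from this thread) that forces both orderings of two threads on a shared resource. RESOURCE_X, the thread bodies, and the crude Sleep() gate are hypothetical stand-ins; the Sleep() in particular shows exactly the fragility described above:

```c
#include <windows.h>
#include <stdio.h>

typedef struct {
    CRITICAL_SECTION Lock;
    int State;              /* hypothetical state we assert on */
} RESOURCE_X;

typedef struct {
    RESOURCE_X *X;
    HANDLE      Go;         /* event that gates when this thread may run */
    int         Tag;        /* 1 for thread A, 2 for thread B */
} THREAD_CTX;

static DWORD WINAPI Worker(LPVOID Param)
{
    THREAD_CTX *ctx = (THREAD_CTX *)Param;

    /* Retard progress until the test says "go" -- this plays the role
     * of suspending/resuming the thread in a debugger. */
    WaitForSingleObject(ctx->Go, INFINITE);

    EnterCriticalSection(&ctx->X->Lock);
    ctx->X->State = ctx->Tag;           /* last writer wins */
    LeaveCriticalSection(&ctx->X->Lock);
    return 0;
}

static int RunOrdering(int first, int second)
{
    RESOURCE_X x = { 0 };
    HANDLE goA = CreateEvent(NULL, TRUE, FALSE, NULL);
    HANDLE goB = CreateEvent(NULL, TRUE, FALSE, NULL);
    THREAD_CTX a = { &x, goA, 1 }, b = { &x, goB, 2 };
    HANDLE t[2];

    InitializeCriticalSection(&x.Lock);
    t[0] = CreateThread(NULL, 0, Worker, &a, 0, NULL);
    t[1] = CreateThread(NULL, 0, Worker, &b, 0, NULL);

    SetEvent(first == 1 ? goA : goB);   /* release the "first" thread */
    Sleep(50);                          /* crude, fragile: hope it finished */
    SetEvent(second == 1 ? goA : goB);
    WaitForMultipleObjects(2, t, TRUE, INFINITE);

    /* The "second" thread's write should be the one observed. */
    printf("ordering %d->%d: State=%d (expected %d)\n",
           first, second, x.State, second);
    DeleteCriticalSection(&x.Lock);
    CloseHandle(goA); CloseHandle(goB);
    CloseHandle(t[0]); CloseHandle(t[1]);
    return x.State == second;
}

int main(void)
{
    return (RunOrdering(1, 2) && RunOrdering(2, 1)) ? 0 : 1;
}
```

Note that even this toy depends on knowing exactly where the threads touch X; change that code and the gating has to be re-evaluated.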

Any real world driver or even UM program will not have one resource X and two threads A & B, but potentially thousands or millions of resources and certainly dozens of threads (processors).

@MBond2 said:
Let me assure you that on Windows, you have never seen a single threaded driver. Even the simplest have to deal with processors and threads to some degree

You’re right. :man_facepalming: Of course the driver needs to protect its shared memory from outside use. For some reason I thought you were suggesting driver code was spinning up child worker threads…

In a simple case, you have some resource X which has some state and two threads A & B that will operate on it. It could be that the state of X is first affected by A and then later by B (an appropriate synchronization protocol of course being observed by both) or it could be that the state of X is first affected by B and then later by A. How do we test both situations?

Gotcha. Hmm. I have not written a test for this yet, but I’ve heard suggestions from others that amount to having access to resource X in unit-test scope. Maybe that means including some “internal” header that normal code is not supposed to see, which defines a type that you can cast your void * X to, to make it usable.

The unit test would only focus on A or B, not both. Otherwise it’d be an “integration” test, I believe. But if A and B are indeed separate units, then we should be able to test that A behaves itself when X changes unexpectedly, by writing appropriate stimulus into X while it is supposedly locked. I imagine some cases may require instrumenting the UUT with hooks which can be defined in the unit test, but would be preprocessed/compiled away for release builds?
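For concreteness, one way such a compile-time hook might look; everything here (the ENABLE_TEST_HOOKS macro, the hook plumbing, the lock helpers) is a hypothetical sketch, not an established pattern from this thread:

```c
/* Hypothetical test-hook plumbing, compiled away in release builds.
 * The unit test defines ENABLE_TEST_HOOKS and installs a callback
 * that mutates resource X at a chosen point. */
#ifdef ENABLE_TEST_HOOKS
typedef void (*TEST_HOOK)(void *Context);
extern TEST_HOOK g_TestHook;                /* set by the unit test */
#define INVOKE_TEST_HOOK(ctx) \
    do { if (g_TestHook != NULL) g_TestHook(ctx); } while (0)
#else
#define INVOKE_TEST_HOOK(ctx) ((void)0)     /* no code in release builds */
#endif

/* Hypothetical helpers belonging to the code under test. */
void AcquireLockX(void *X);
void ReleaseLockX(void *X);

/* In the code under test: */
void OperateOnX(void *X)
{
    AcquireLockX(X);
    INVOKE_TEST_HOOK(X);    /* the test can alter X "unexpectedly" here */
    /* ... the operation being verified ... */
    ReleaseLockX(X);
}
```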

Many drivers do spin up worker threads for a whole range of purposes. What they generally don’t do is what the C# pattern Parallel.ForEach does - split a single large problem into multiple threads so that parts can be executed in parallel. Most problems that can be split up that way are problems that UM processes work on.
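For the record, a minimal sketch of that kernel worker-thread pattern, using the documented PsCreateSystemThread API; DEVICE_CONTEXT and the background work are hypothetical stand-ins:

```c
#include <ntddk.h>

typedef struct _DEVICE_CONTEXT {
    KEVENT   StopEvent;
    PKTHREAD WorkerThread;
} DEVICE_CONTEXT, *PDEVICE_CONTEXT;

static VOID WorkerRoutine(PVOID StartContext)
{
    PDEVICE_CONTEXT ctx = (PDEVICE_CONTEXT)StartContext;
    LARGE_INTEGER interval;
    interval.QuadPart = -10LL * 1000 * 1000;    /* 1 second, relative */

    /* Do periodic background work until StopEvent is signaled. */
    while (KeWaitForSingleObject(&ctx->StopEvent, Executive, KernelMode,
                                 FALSE, &interval) == STATUS_TIMEOUT) {
        /* ... hypothetical background work ... */
    }
    PsTerminateSystemThread(STATUS_SUCCESS);
}

NTSTATUS StartWorker(PDEVICE_CONTEXT ctx)
{
    HANDLE threadHandle;
    NTSTATUS status;

    KeInitializeEvent(&ctx->StopEvent, NotificationEvent, FALSE);
    status = PsCreateSystemThread(&threadHandle, THREAD_ALL_ACCESS,
                                  NULL, NULL, NULL, WorkerRoutine, ctx);
    if (NT_SUCCESS(status)) {
        /* Keep a referenced thread object so unload can wait for exit. */
        ObReferenceObjectByHandle(threadHandle, THREAD_ALL_ACCESS,
                                  *PsThreadType, KernelMode,
                                  (PVOID *)&ctx->WorkerThread, NULL);
        ZwClose(threadHandle);
    }
    return status;
}

VOID StopWorker(PDEVICE_CONTEXT ctx)
{
    KeSetEvent(&ctx->StopEvent, IO_NO_INCREMENT, FALSE);
    KeWaitForSingleObject(ctx->WorkerThread, Executive, KernelMode,
                          FALSE, NULL);
    ObDereferenceObject(ctx->WorkerThread);
}
```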

Yes, that’s how you create unit tests that can test this kind of code. But the point is that you have to understand the internal details of that code. And it isn’t just a matter of including some internal headers and calling some undocumented functions. You have to understand the logic at a detailed level so that you can design a method to retard the progress of each part. Figuring out how to do that is no mean feat and is usually harder than writing the code that you are trying to test. The tests don’t have to be nearly as reliable as the driver code itself, and frequently unsafe memory access and spin-wait loops are the only way.

But the big problem is that once you make any sort of change, all of the work to develop these tests has to be reevaluated. And that’s an enormous amount of work. This simple example has only A then B or B then A, but any real world system will have many more possibilities for even a single resource.

I agree that developing unit testing for drivers is generally not worth the effort. But that is not the only sort of testing there is, it is just the testing strategy that is currently used in essentially all user mode components written in languages that support things like dependency injection, which would be just about every modern user mode language.

Kernel components can of course be tested as ‘black box’ entities that respond to inputs with observable outputs, and that sort of testing does not care at all about the internal implementation; all it cares about is the correct specification of what those outputs ought to be.

Also, for what it is worth, static driver verifier is sort of a unit test/code coverage framework, although its focus is primarily kernel interface compliance and obviously not the correctness of your domain specific implementation. If SDV works at all for your components, why not use it as a test gate for commits to your main branch?

@MBond2 said:
Many drivers do spin up worker threads for a whole range of purposes.

Ah okay interesting. I don’t think ours does/will, but that’s a note I’ll keep in mind!

But the point is that you have to understand the internal details of that code.

That seems like a requirement regardless. But I understand your larger point: “integration testing” (or whichever term someone prefers), where the code is exercised in a pseudo-realistic environment with a handful of processes interacting with the driver, may be equally if not more effective at testing asynchronous operations than a hand-tuned unit test.

However, as I look at the code I’m going to be refactoring, I do see lots of things that feel like they could (should?) be unit tested. Like “does the ‘write 64-bit register’ IOCTL do what I asked it to do” feels like something I should be able to write a small sanity-check unit test for, no? Or would you not bother with that, because your integration test should be touching those bits of code as well?
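For illustration, the kind of sanity check I have in mind might look like the following user-mode sketch; the device name, IOCTL codes, and struct are all invented for the example:

```c
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

/* Hypothetical IOCTL codes; a real driver defines its own. */
#define IOCTL_MYDEV_WRITE_REG64 \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_WRITE_ACCESS)
#define IOCTL_MYDEV_READ_REG64 \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_BUFFERED, FILE_READ_ACCESS)

typedef struct { ULONG Offset; ULONGLONG Value; } REG64_IO;

int main(void)
{
    HANDLE h = CreateFileW(L"\\\\.\\MyDevice",          /* hypothetical name */
                           GENERIC_READ | GENERIC_WRITE,
                           0, NULL, OPEN_EXISTING, 0, NULL);
    REG64_IO wr = { 0x40, 0xDEADBEEFCAFEF00Dull };      /* scratch register */
    REG64_IO rd = { 0x40, 0 };
    DWORD bytes;

    if (h == INVALID_HANDLE_VALUE)
        return 1;

    /* Write the register, read it back, and compare. */
    if (!DeviceIoControl(h, IOCTL_MYDEV_WRITE_REG64, &wr, sizeof(wr),
                         NULL, 0, &bytes, NULL) ||
        !DeviceIoControl(h, IOCTL_MYDEV_READ_REG64, &rd, sizeof(rd),
                         &rd, sizeof(rd), &bytes, NULL)) {
        CloseHandle(h);
        return 1;
    }

    printf("reg 0x%lx: wrote 0x%llx, read 0x%llx\n",
           wr.Offset, wr.Value, rd.Value);
    CloseHandle(h);
    return rd.Value == wr.Value ? 0 : 1;
}
```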

@Mark_Roddy wrote:
I agree that developing unit testing for drivers is generally not worth the effort. But that is not the only sort of testing there is…

Of course. Unit testing is there to test a specific cog in the larger contraption. It doesn’t tell you if the whole contraption works. But it does seem to be a way to get feedback to a developer quickly, unlike running some 10-minute integration test. So I get the impression there is still value to be extracted via unit tests. But maybe I am on the wrong path and shouldn’t bother with them in driver code?

In the Linux world, KUnit is gaining traction. I should be able to pair that with gcov so we can begin to get an idea of how many code branches are being exercised with tests. It’s not a sure-fire proof that code works in all cases, but there is value in knowing that a coverage metric goes up or down when code is modified.

Also, for what it is worth, static driver verifier is sort of a unit test/code coverage framework, although its focus is primarily kernel interface compliance and obviously not the correctness of your domain specific implementation.

Yeah, I was looking at SDV. It looks useful for us, as we are not as familiar with Windows drivers as we are with Linux, so catching interface compliance issues could be quite helpful. But SDV seems to have a number of limitations. A few of those (e.g. 32-bit int overflow) might be found by a linter like CppCheck, though. Definitely going to be trying out SDV.

To be clear (at least in my world), there are four types of testing that need to be done for a shipping driver … and realize that my drivers are almost always talking to some piece of hardware: usually a USB device, sometimes an FPGA on a bus, rarely a kernel service or a file system/network/storage device filter … your environment may vary.

These tests are also run using a reasonably intricate PowerShell script which drives a console program that calls into the driver through one of three IOCTL interface groups: testing, public user, and private user. The testing IOCTLs are #ifdef’d out of the release driver, the public user group has a low-security DACL behind it (so low-privilege usermode apps can access it), and the private user group has a high-security DACL (admin or domain admin). The public IOCTLs are sometimes also segregated by user-class DACLs, attempting to follow the principle of least privilege …
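For what it’s worth, a minimal KMDF-flavored sketch of that #ifdef pattern; everything in it (IOCTL codes, handler names, the BUILD_WITH_TEST_IOCTLS macro) is a hypothetical illustration, not the actual interface described above:

```c
#include <ntddk.h>
#include <wdf.h>

#define IOCTL_MYDEV_PUBLIC_GET_STATUS \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_READ_ACCESS)
#define IOCTL_MYDEV_PRIVATE_CONFIGURE \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_BUFFERED, FILE_WRITE_ACCESS)
#ifdef BUILD_WITH_TEST_IOCTLS
#define IOCTL_MYDEV_TEST_LOOPBACK \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x900, METHOD_BUFFERED, FILE_ANY_ACCESS)
NTSTATUS HandleTestLoopback(WDFREQUEST Request);    /* hypothetical */
#endif
NTSTATUS HandleGetStatus(WDFREQUEST Request);       /* hypothetical */
NTSTATUS HandleConfigure(WDFREQUEST Request);       /* hypothetical */

VOID EvtIoDeviceControl(WDFQUEUE Queue, WDFREQUEST Request,
                        size_t OutputBufferLength, size_t InputBufferLength,
                        ULONG IoControlCode)
{
    NTSTATUS status;

    UNREFERENCED_PARAMETER(Queue);
    UNREFERENCED_PARAMETER(OutputBufferLength);
    UNREFERENCED_PARAMETER(InputBufferLength);

    switch (IoControlCode) {
#ifdef BUILD_WITH_TEST_IOCTLS
    case IOCTL_MYDEV_TEST_LOOPBACK:      /* exists only in test builds */
        status = HandleTestLoopback(Request);
        break;
#endif
    case IOCTL_MYDEV_PUBLIC_GET_STATUS:  /* behind the low-security DACL */
        status = HandleGetStatus(Request);
        break;
    case IOCTL_MYDEV_PRIVATE_CONFIGURE:  /* behind the admin-only DACL */
        status = HandleConfigure(Request);
        break;
    default:
        status = STATUS_INVALID_DEVICE_REQUEST;
        break;
    }
    WdfRequestComplete(Request, status);
}
```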

Each night, after the daily build task runs, the script runs, logs what it gets back from the driver, and parses the results into a comma-delimited format; that’s there for me to look at in the morning … :slight_smile:

  • Unit testing, which tests a functional “this should always work” item such as read/write a register, read/write a DMA transaction, etc. These tests are written early in development but are still very useful if the underlying hardware changes (common in the FPGA world) and you want to verify that the new firmware is holding up its end of the interface contract
  • Coverage testing, which tests all of the public and private IOCTLs to make sure they do what they are supposed to do (at least as much as possible; you can’t always tell if the robot arm really did move or the missile did launch, but at least you tried). This is also a requirement for regulatory paperwork
  • Regression testing, which re-tests known past bugs or known problematic platforms to make sure a fix really stays fixed. I have been burned by bugs that I know I fixed in the past reappearing at some point; this is also a regulatory paperwork requirement
  • Fuzz testing, which attempts to break the driver by causing a BSOD, a machine functional degradation (memory or performance), or a denial of service to the driver from usermode client(s). This is essentially a control program trying its very best to be a bad citizen (open a thousand handles, make a call from the wrong privilege level, pass bad data, create a thread and kill it while a call is pending, etc.); a toy sketch follows below. This is becoming a regulatory requirement more and more these days, and is part of a larger threat assessment [ https://docs.microsoft.com/en-us/windows-hardware/drivers/driversecurity/threat-modeling-for-drivers ]
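Along those lines, a toy user-mode “bad citizen” might look like the following; the device name is hypothetical, and real fuzzing (e.g. the HLK DevFund penetration tests with IoSpy/IoAttack) is far more systematic:

```c
#include <windows.h>
#include <winioctl.h>
#include <stdlib.h>

int main(void)
{
    BYTE junk[512];
    DWORD bytes, i, j;

    srand((unsigned)GetTickCount());
    for (i = 0; i < 1000; i++) {
        HANDLE h = CreateFileW(L"\\\\.\\MyDevice",      /* hypothetical name */
                               GENERIC_READ | GENERIC_WRITE, 0, NULL,
                               OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        for (j = 0; j < sizeof(junk); j++)
            junk[j] = (BYTE)rand();

        /* Random control code, mismatched lengths: the driver must
         * fail this cleanly rather than crash or corrupt memory. */
        DeviceIoControl(h, (DWORD)rand(),
                        junk, (DWORD)(rand() % sizeof(junk)),
                        junk, (DWORD)(rand() % sizeof(junk)),
                        &bytes, NULL);
        CloseHandle(h);     /* also exercises the cleanup paths */
    }
    return 0;
}
```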

What most of this thread seems to be discussing is testing of the actual code pathways, which is interesting but not particularly useful in “testing” … which is supposed to answer, from a high level, “does the driver do what it’s supposed to do and nothing else?”


This thread is drifting, but my experience is that static analysis is the most effective kind of automated test. It’s not a kind that you can include in a CI pipeline, because the results have to be manually reviewed.

The next thing that I’ll say is that Windows is not Linux. Obviously. And a good Linux design is often a poor Windows one, and vice versa. On Windows, an IOCTL to ‘write 64-bit register’ is generally considered a very poor design - with notable exceptions for closed systems.

On Windows, all devices are grouped into classes depending on what kind of functionality they are intended to provide. Many similar devices can provide the same interface but have radically different implementations in the hardware and driver internals. For devices that implement commodity functionality like a sound card or a NIC, the interface will be well known; many vendors will make hardware that implements it, and many will write UM software that uses it (directly or indirectly). For a device that controls specialized equipment (CNC lathe, plasma cutter, etc.), usually a single supplier will provide all of them, but the interface abstraction is still valid because it allows migration from one version of hardware to another without requiring concurrent changes to the UM programs and the KM + hardware.
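As a sketch of that abstraction (my illustration; the GUID and names are made up), the driver side publishes a device interface, and user mode opens whatever hardware happens to implement it:

```c
/* Kernel side (KMDF), typically called from EvtDriverDeviceAdd. */
#include <ntddk.h>
#include <wdf.h>
#include <initguid.h>

/* Hypothetical interface GUID; generate your own with uuidgen. */
DEFINE_GUID(GUID_DEVINTERFACE_MYDEV,
    0x12345678, 0x1234, 0x1234,
    0x12, 0x34, 0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc);

NTSTATUS RegisterInterface(WDFDEVICE Device)
{
    /* Publish the interface; PnP enables it when the device starts. */
    return WdfDeviceCreateDeviceInterface(Device,
                                          &GUID_DEVINTERFACE_MYDEV,
                                          NULL /* no reference string */);
}
```

User mode would then enumerate instances of that GUID (for example with CM_Get_Device_Interface_List) and open the returned path with CreateFile, so the UM program never hard-codes a device name tied to one hardware revision.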

None of that has much to do with testing, much less automated testing and CI pipelines.

Note that I would consider a 10-minute test to be a fast one.

There is much more that could be said on all of these points, but I’m getting tired.