About Windows driver development

Just out of curiosity, how many people are still mainly working on Windows driver development these days? Is there still significant demand for it, or are many folks shifting to Linux kernel development instead? (I've worked on Windows driver development since 2018, and now I'm getting bored; there's almost nothing to do, just some minor fixes.)

If the volume of posts here is an indication, very few people are still working on Windows driver development.

I wrote a new driver from scratch about 2 years ago using NetAdapterCx. For the most part, though, my kernel drivers are in maintenance mode these days. There are two main reasons for that. First, for most of my drivers I’ve moved the majority of the driver logic up to user space so I can easily share it between Windows and Linux. Second, the bulk of my customers are using Linux.

Just today though I was having discussions to potentially do a new kernel driver for a new PCIe FPGA design. So it’s probably only a matter of time before I write another one. But yeah, by no means am I busy writing new drivers regularly.

I doubt that Windows driver writers are migrating in bulk to Linux. Rather, I think there just aren’t very many new KINDS of devices being created, so there’s no longer a great need for custom driver innovation.

Personally, I’ve been retired for 3 years, and I think I got out at the right time. I’m disturbed by what LLM AI has done to programming.

I believe you can look at it in two different ways.

  1. The fewer people working in the field, the rarer and more valuable the talent.
  2. The more people working, the greater the demand and the competition.

There is still demand for Windows driver devs, especially in the security domain. But I believe Linux is the future. For low-level learning and reverse engineering purposes, I found learning Windows driver development valuable enough.

About 6 of the last 8 years I have been doing serious Windows kernel/driver work, on ARM64. I’ve worked for two different ARM64 CPU companies.

Some of what I did was direct driver development, but a larger part was silicon/firmware/platform/OS debugging of Windows (and Linux) running on ARM64 CPUs. The failures are often seen in the OS, especially when running standard or carefully designed stress tests, but the root causes have been spread across CPUs, firmware, the OS, and drivers. This usually requires starting the debugging from the OS side, and the failure does not always reproduce on a different OS. At one end of the spectrum was a CPU instruction that conformed to the ARM64 spec, but that Windows assumed had behavior not in the spec. At the other end was Windows crashing after a day of running stress tests, which turned out to be incorrect memory barriers in some Windows OS code. One firmware bug: run CPU profiling on the OS, then access a UEFI variable (by running bcdedit), and one of the cores stops getting profile interrupts. The cause was firmware that incorrectly saved the performance counter enable register when a call was made into secure firmware.

Recently I have been involved in a fair amount of PCIe debugging, on Linux, especially issues with bridges. One bug was in Linux code that mishandled a PCIe hierarchy with a bridge exposing a management BAR on the upstream switch bus; another was a bridge that did not follow the PCIe spec behavior. Even when hardware has bugs, you often don’t get to demand the hardware folks fix their silicon; software and firmware have to cope. I don’t work on chip design, but I have been told respinning a complex chip easily carries a $10M price tag or more. If it cost $10M to recompile a driver each time, the software development process would likely be quite different.

I’ve read NVIDIA will have ARM64 Windows desktop CPUs next year, so I assume there must be a bunch of Windows development/debug activity around that. I also assume there is a lot of GPU/NPU development/optimization work happening right now at all vendors of GPU/NPU devices, on all CPUs.

If we get ARM64 desktop systems with PCIe slots, that may motivate more companies to release ARM64 drivers for a variety of devices. Last I checked (a while ago now), the only NICs with ARM64 Windows drivers were the Mellanox cards and USB NICs using a standard USB class, although kdnet worked on some other NICs.

In theory, you can just recompile your correctly written Windows x64 driver for ARM64 and it will just work. Reality is not quite that simple.

ARM64 has weak memory ordering, which is different from x64. On ARM64, if you do two memory writes from one core, another core can see those writes in either order unless you put a memory barrier between them. Using the correct acquire/release variants of some instructions also eliminates the need for explicit barriers. There are no tools to automatically find missing memory barriers, and you don’t want to spray-paint barriers where they aren't really needed, as that degrades performance.

A missing memory barrier often looks like a memory location has an unexpected value, which then becomes correct after a tiny delay. So your code crashes because some memory value is wrong, you get a bugcheck, and when you look in the debugger/crashdump the value is correct (because the write was in flight inside the CPU), so it appears the bugcheck should never have happened. A lot of driver/kernel code has been written on x64, and when that code is recompiled for ARM64 it can have latent missing barriers. ARM64 processor A might also never trip over a missing barrier that processor B does. This can be caused by things like differences in speculative execution depth: if processor B is executing 200 instructions ahead while processor A is only 100 ahead, the timing of when a memory read happens changes, and with it the tolerance to write latencies from another core.

When you write new code, it’s not too hard to know where to insert barriers; when you have a large existing code base that was not written with this in mind, it can be a lot harder to find the missing ones. For example, I was debugging a Linux driver about 2 years ago that had the signature of a missing barrier, yet the source code seemed to do the right thing. It turned out there were pseudo-virtual function calls to a device memory write function, and someone long ago had decided those should use the non-barrier variant. The result: the driver would update a global interrupt data structure, then enable interrupts with a write to a register, the interrupt instantly fired, the write to the global data was not yet visible on the core that took the interrupt, and that core tried to do something bogus. The fix ended up being to change the pseudo-virtual function pointer initialization for the device register write function to use the normal version that includes a barrier instead of the no-barrier alternate.

If it’s any indication, a month or two ago Microsoft was looking for Windows kernel developers in the core OS team on LinkedIn.

I’ve been writing Windows code since 1988, and Windows kernel code since about 1995. It might be about time to retire. I’m not super impressed by the Linux kernel development culture; it seems like a group of people who, in general, want to do software development using technology from the distant past. Right now the Linux kernel community is having an argument about enabling some pretty innocent C compiler extensions, because the compiler option is named ms-extensions. I believe the software tools you use matter, a lot. For example, I started using a C++ subset for Windows drivers about 20 YEARS ago. It didn’t solve all issues, but it was overall a better tool than C. It might be a better use of my brain to deeply understand how matrix multiplication allows human-like conversations and pictures of cats water skiing.

-Jan


As a hardware designer who designed a PCIe FPGA device and was dragged kicking and screaming into driver development, I would not even contemplate starting with Windows after reading some of the experiences people have had getting their drivers signed.

Smaller companies appear to have little influence over the decisions Microsoft makes.

We use Windows Embedded Compact on our equipment, as well as IoT Enterprise LTSC (and in the past we used Windows Embedded).

Windows CE is being withdrawn altogether. The replacement was supposed to be IoT Core. Our Windows CE devices use a Texas Instruments ARM processor, but we cannot migrate that hardware SBC because the processor uses TI-based interrupts rather than the ARM-based interrupts required by Windows 10 IoT Core. So that hardware has an end of life that, for us, ends with Windows CE (unless we go the Linux route). SBC companies like Toradex are also extricating themselves from Microsoft.

We are migrating our products to Linux in general, although the Windows 11 IoT LTSC release does give us breathing space.

I have moved back to bare-metal embedded systems these days, and I will be happy doing that until I retire. Well, with the exception of the Eclipse IDE.


I often see your replies when I'm searching for solutions on OSR. I really want to thank you and all the experienced folks for your selfless help—it’s truly amazing!

For embedded systems, PlatformIO and VSCode (on Linux or Windows hosts) are a great alternative.

Microchip has just launched a VSCode-based alternative. Hopefully they are giving up on MPLAB X.

For what it’s worth, I work for a company that produces a variety of cyber security related products for Windows and have custodial maintenance responsibilities for 2 different device drivers in 2 different products as a result of being “the last man standing” with the skillset to maintain them. At the same time, over the past year, I’ve been implementing a new driver from the ground up to serve as a replacement for both of those drivers in our “next generation” product.

ARM64 support has been built into the new driver, while the legacy drivers and their associated products are being refactored for ARM64 support.


Great insights! It’s encouraging to hear that driver development is still active, especially with both legacy maintenance and new ARM64 support underway. Your experience offers valuable perspective. I would love to know what major challenges you faced during the ARM64 refactoring process.

[MODS: I sincerely apologize for the time it took to approve this post. It got stuck in the queue as potential spam. Welcome to the forum herry62.]