Maybe a driver issue, maybe not

I am hoping you can answer a question for me. The answer will determine whether we modify our drivers or go to our Microsoft vendor for help.

We developed a WDF KMDF PCIe driver that works (mostly) successfully in our embedded (Windows 10 IoT Enterprise LTSC) products.

While we accept that Windows is not real-time, we are actually getting fairly good real-time performance. However, sometimes after the product has run for many hours the system crashes. (We can log the time of the crash.)

After setting up performance counter recording for interrupt, DPC and privileged time, I see a big increase in all three at the time the crash occurs. The privileged time can max out at 100% for between 5 and 10 seconds.

The event log shows things like Windows Update (which can be disabled in IoT Enterprise) or Windows Security Auditing (which cannot be disabled) occurring at the time of these crashes.

I have a ring buffer in the driver to collect data when we cannot get IO requests to the driver fast enough (an interrupt supplies 256K of data every 160 ms).

So my question is: does our driver need to be able to buffer enough data to cope with 10 seconds of delay (about 16M)?
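For reference, the sizing arithmetic can be sketched in C. This is a minimal sketch that assumes "256K" means 256 KiB (262,144 bytes) per interrupt and rounds a partial 160 ms period up to a whole chunk; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the driver must park if the app posts no reads for stall_ms, given
 * one chunk of chunk_bytes every period_ms. Partial periods round up. */
static uint64_t bytes_needed(uint64_t chunk_bytes, uint64_t period_ms,
                             uint64_t stall_ms)
{
    assert(period_ms > 0);
    uint64_t chunks = (stall_ms + period_ms - 1) / period_ms; /* ceiling */
    return chunks * chunk_bytes;
}

/* Longest stall, in ms, that a buffer of capacity_bytes can absorb. */
static uint64_t stall_tolerance_ms(uint64_t capacity_bytes,
                                   uint64_t chunk_bytes, uint64_t period_ms)
{
    assert(chunk_bytes > 0);
    return (capacity_bytes / chunk_bytes) * period_ms; /* whole chunks only */
}
```

Under those assumptions, a 10-second stall needs 63 chunks, or 16,515,072 bytes (about 15.75 MiB, consistent with the "about 16M" estimate), while a 5-second stall needs 32 chunks, or 8 MiB. Conversely, `stall_tolerance_ms` reports that an 8 MiB buffer covers 5,120 ms.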

If so we will change our driver. If not we will speak to our vendor.

10s delay is ridiculous for a dedicated device. It is actually ridiculous
for a non-dedicated device. Something is wrong on your IoT device.
Mark Roddy

Thank you. I will take this up with our Microsoft Vendor. (We buy a lot of IoT and CE licenses so we have some clout).

Actually, perfmon only records every 5 seconds, so that means at two points 5 seconds apart we had process privileged time of 100% and high interrupt time. The ring buffer we have should be able to handle about 5 seconds of data, but it crashes after filling up (I used TraceView to observe this).

we are actually getting fairly good real time performance

And that is fairly typical.

sometimes after the product has ran for many hours the system crashes

And this is also, sadly, fairly typical… at least for drivers under development.

it crashes after filling up

You are absolutely certain that filling up the buffer is what’s causing the crash… not something else? You want to be sure you’re solving the root cause of the problem, and not merely guessing.

The “filling up” latency that you’re measuring is between what two events? Data arriving and some app running at some priority posting a read for that data? Assuming so, can the app not use async I/O and post multiple reads?? In most cases I see where there’s a problem like this with buffer overflows, the issue can be mitigated without changing the driver (or with few changes to the driver) by having the app be “smarter.”

I had a call last week with a company that has a driver that (a) signals the app that data is ready, and then (b) the app sends an IOCTL to retrieve the data… and then this process repeats. They were seeing buffer overruns with Win10. Our very first question to them was: Given that this isn’t Linux, why do you have your app wait for the signal to send the IOCTL? Why not send a handful of async IOCTLs in advance, so that the driver has buffers “always” available.
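Why pre-posting reads helps can be shown with a toy model in plain C. This is not WDF or Win32 code; it is a hypothetical simulation in which one chunk arrives per interrupt period, the driver completes a pending read if it holds one and otherwise parks the chunk in its ring buffer, and the app re-posts one read per completion except while it is stalled. All names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not WDF code: one data chunk arrives per interrupt period. */
typedef struct {
    unsigned pending;    /* reads posted by the app and held by the driver */
    unsigned ring_used;  /* chunks parked in the driver's ring buffer */
    unsigned ring_cap;   /* ring buffer capacity, in chunks */
    unsigned peak_used;  /* high-water mark of ring_used */
    bool overflowed;     /* a chunk arrived with nowhere to go */
} model_t;

/* App posts one read. Returns true if it completed at once from the ring. */
static bool app_post_read(model_t *m)
{
    if (m->ring_used > 0) {
        m->ring_used--;          /* oldest buffered chunk satisfies the read */
        return true;
    }
    m->pending++;                /* read waits in the driver's queue */
    return false;
}

/* DPC delivers one chunk. Returns true if it completed a pending read. */
static bool driver_chunk(model_t *m)
{
    if (m->pending > 0) {
        m->pending--;            /* data goes straight into a posted read */
        return true;
    }
    if (m->ring_used == m->ring_cap) {
        m->overflowed = true;    /* ring full: this chunk is lost */
        return false;
    }
    if (++m->ring_used > m->peak_used)
        m->peak_used = m->ring_used;
    return false;
}

/* Run `steps` periods with `depth` reads posted up front. The app handles no
 * completions during [stall_from, stall_from + stall_len). */
static model_t simulate(unsigned depth, unsigned ring_cap, unsigned steps,
                        unsigned stall_from, unsigned stall_len)
{
    model_t m = { 0, 0, ring_cap, 0, false };
    unsigned unprocessed = 0;    /* completed reads the app has not handled */

    for (unsigned i = 0; i < depth; i++)
        app_post_read(&m);

    for (unsigned t = 0; t < steps; t++) {
        bool stalled = t >= stall_from && t < stall_from + stall_len;
        /* A running app handles each completion and re-posts that read. */
        while (!stalled && unprocessed > 0) {
            unprocessed--;
            if (app_post_read(&m))
                unprocessed++;   /* completed from the ring; handle it too */
        }
        if (driver_chunk(&m))
            unprocessed++;
        assert(m.pending + unprocessed == depth); /* reads in flight conserved */
    }
    return m;
}
```

With a ring of 4 chunks and a 6-period stall, `simulate(1, 4, 30, 10, 6)` overflows, while `simulate(4, 4, 30, 10, 6)` does not (its peak ring use is 3 chunks). In this model, each extra pre-posted read buys one more interrupt period of stall tolerance without changing the ring buffer size.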

I absolutely agree with Mr. Roddy that 10 seconds is not any sort of a delay that you should expect. A couple of seconds? Maybe. Sure. But TEN?? Nah, no way.

Peter

it crashes after filling up

You are absolutely certain that filling up the buffer is what’s causing the crash… not something else? You want to be sure you’re solving the root cause of the problem, and not merely guessing.

No, I do not know what is causing the crash. When I observe the system in a "crashed" state, the buffer has filled up. However, this may merely be an observed symptom and not a diagnosis; it may have crashed before or during filling up. The logged stop time of the application coincides with the "filling up" time in TraceView.

In the DPC, if I detect that there are no valid IO requests, I add the data from the DMA buffer to the ring buffer. I then output a trace message indicating that we are in the DPC and how much data is stored in the ring buffer.
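A ring buffer of the kind described, with explicit overrun reporting, might look like the following portable C sketch. The type and function names are hypothetical, not taken from the original driver. A real KMDF driver would likely need to protect the buffer with a lock (for example a WDF spin lock), because the DPC and the read-dispatch path can touch it concurrently; that is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte ring buffer; locking is intentionally omitted. */
typedef struct {
    unsigned char *data;
    size_t cap;    /* capacity in bytes */
    size_t head;   /* index of the oldest unread byte */
    size_t used;   /* bytes currently stored (the fill level to trace) */
} ring_t;

static void ring_init(ring_t *r, unsigned char *storage, size_t cap)
{
    assert(storage != NULL && cap > 0);
    r->data = storage;
    r->cap = cap;
    r->head = 0;
    r->used = 0;
}

/* Append len bytes, all or nothing. Returns false and stores nothing if the
 * chunk does not fit, so the caller can log an overrun instead of silently
 * overwriting unread data. */
static bool ring_push(ring_t *r, const unsigned char *src, size_t len)
{
    if (len > r->cap - r->used)
        return false;
    size_t tail = (r->head + r->used) % r->cap;
    size_t first = r->cap - tail;            /* room before wrapping */
    if (first > len)
        first = len;
    memcpy(r->data + tail, src, first);
    memcpy(r->data, src + first, len - first); /* wrapped part, may be 0 */
    r->used += len;
    return true;
}

/* Copy up to len of the oldest bytes out, e.g. into a newly arrived read. */
static size_t ring_pop(ring_t *r, unsigned char *dst, size_t len)
{
    if (len > r->used)
        len = r->used;
    size_t first = r->cap - r->head;         /* bytes before wrapping */
    if (first > len)
        first = len;
    memcpy(dst, r->data + r->head, first);
    memcpy(dst + first, r->data, len - first);
    r->head = (r->head + len) % r->cap;
    r->used -= len;
    return len;
}
```

Because `ring_push` refuses a chunk that does not fit, the DPC can emit a distinct trace message at the moment of the first overrun, and `used` is the fill level to report in the existing trace message.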

The “filling up” latency that you’re measuring is between what two events? Data arriving and some app running at some priority posting a read for that data? Assuming so, can the app not use async I/O and post multiple reads?? In most cases I see where there’s a problem like this with buffer overflows, the issue can be mitigated without changing the driver (or with few changes to the driver) by having the app be “smarter.”

The app currently uses async IO with a single read request; processing the read data is followed by a single write request. Yes, we most certainly can look into posting multiple reads. That sounds like a plan.

Thank you.

I absolutely agree with Mr. Roddy that 10 seconds is not any sort of a delay that you should expect. A couple of seconds? Maybe. Sure. But TEN?? Nah, no way.

Yeah, perfmon was set up to record at 5-second intervals, so it indicated privileged time of 100% precisely at those intervals.

A weird thing, though, is that we can leave the system to run for weeks with no problem. Then we connect a network cable and this causes the system to crash. Most internet traffic is blocked until we log on; however, Microsoft traffic, e.g. Windows Update, is not blocked.

We use sequential dispatching for our driver so it is actually synchronous IO.

No I do not know what is causing the crash.

Would a crash dump not tell you conclusively?? I mean… why guess?

A weird thing is though we can leave the system to run for weeks with no problem.

That’s not SO very weird. It’s so common, in fact, that I tell stories in class about just such issues: a client with a large-format document printer that printed a black stripe "once every couple of nights during constant testing", and a client who had crashes once every week or so and had to have engineers camp out (literally) at a gas station somewhere in Europe, waiting for the problem to happen in the field.

Then we connect a network cable and this causes the system to crash.

All manner of timing problems can cause what you’re seeing. Of course, you could absolutely be seeing some odd effect that’s creating very unusual latencies. We often say that the average latencies in Windows drivers are quite low, but the worst-case latencies can be truly astonishing. Not 10 seconds’ worth of astonishing; you’d have to work hard to get latencies that bad. But, still… a-second-or-two-in-the-very-worst-case level of astonishing.

Start with a crash dump. Don’t waste your time guessing and fixing what you think “must be” the problem.

Peter

Would a crash dump not tell you conclusively?? I mean… why guess?

Okay. We have been mucking around with procdump, with the user space app stopping after detecting a missed read. It did not tell us much.

I will delete procdump from the registry, enable user space crash dump logging and allow the user space app to run itself into oblivion.

I will delete procdump from the registry, enable user space crash dump logging and allow the user space app to run itself into oblivion.

OK. I have no idea what that means.

We’re talking about an actual Windows OS Blue Screen Of Death CRASH, right? Not something else?

If so, get a crash dump and analyze it. Or, you know, run !analyze -v and post the output here (like everyone else does).

If we’re talking about something else… if you mean “the application dies” as opposed to “the system crashes” then… that’s entirely different to what I’ve been talking about.

Peter

My bad. I will aim to be more specific in future. There is no BSOD; the application does indeed die after failing to get IO requests to the driver.

When I used the term ‘system’ I meant our embedded ‘system’: our user space applications running with our drivers on a Windows 10 IoT LTSC industrial motherboard to create a dedicated product.

Your help is much appreciated though.

Glad to have been able to help.

Even if I was, ah, more than just a little confused. I still AM… but despite that it seems we managed to help you. So “all is well.”

Peter

Even if I was, ah, more than just a little confused. I still AM… but despite that it seems we managed to help you. So “all is well.”

Your comment about using async IO and posting multiple reads makes a lot of sense for our application.

Thank you.

You can see some other comments about the proper and improper use of OVERLAPPED IO on this forum and others over the last many years.

Update.

We ran a test on another instrument and it ran without crashing.

We cloned the hard drive of that instrument onto the hard drive of the instrument that was crashing, and that instrument has run for a week with no problem.

The problem of the instrument failing when connected to the network will be tested in due course. However, connecting to the network usually causes the unit to crash within minutes, so we can use xperf to get more information in that case.

@MBond2 said:
You can see some other comments about the proper and improper use of OVERLAPPED IO on this forum and others over the last many years.

Okay, now I am learning what you mean by OVERLAPPED IO.

I need to change ReadFile and WriteFile to ReadFileEx and WriteFileEx respectively (which use the OVERLAPPED structure).

Belated thanks.

Both ReadFile and WriteFile have an LPOVERLAPPED parameter; overlapped IO by itself is not a reason to move to the Ex versions.

Do some reading.

When you call CreateFile, set FILE_FLAG_OVERLAPPED in the dwFlagsAndAttributes parameter.

When you call ReadFile (or whatever), pass a pointer to an OVERLAPPED structure in the lpOverlapped parameter.

Peter

I saw that.

@“Peter_Viscarola_(OSR)” said:

Do some reading.

I agree I need to do some reading.

I was following the pcidrv a bit too slavishly.

Thank you.

Yeah, well… don’t follow that example TOO slavishly. It is not a sample of which I am overly fond… let me just leave it at that.

Peter

In general, ReadFileEx and WriteFileEx are useful only if you want completions via APC – which IMHO is a feature that has no practical use. User-mode APCs run only during alertable waits, which makes them the poor cousins of IOCP or thread-pool-based designs where any thread can handle the completion. In theory, some threading models can benefit, especially if there is a lot of legacy code. But practically, good old ReadFile and WriteFile, or ReadFileScatter and WriteFileGather, are much more useful.