Multi-core CPU interrupt routing or global locks?

I’m seeing an annoying behavior, as a user of a system, and I’m trying
to figure it out as a first step towards trying to resolve it. I’ve
convinced myself the problem is somewhere at the kernel level, but I’d
like some help theorizing or explaining why that would be.

My experience: I’ve developed drivers for other OS-es before, but never
for Windows, so if my terminology is off, please correct me. I’ve also
used WinDbg to analyze minidumps to find out how to work around driver
problems at the user level, but my Windows kernel experience stops there.

The system: Windows XP SP2 32-bit, Athlon X2 (dual core) CPU, NVIDIA 400
series chip set, ACPI 2.x BIOS, Creative X-Fi sound card, SATA hard disk
array and PATA CD/DVD drive.

The behavior: Whenever I’m playing music in iTunes for Windows (which is
horribly modal in its UI, by the way), and encoding CDs using
AudioGrabber, the sound playback stutters (repeats a buffer) when
starting and stopping the ripping process. I know that iTunes queues a
fair bit of data, so it’s likely that the stuttering is caused by the
sound driver missing an interrupt, or getting the interrupt but not
being able to handle it. There’s a small chance that the problem might
be caused higher up, such as in DirectSound not being able to re-fill
the buffer, although with two CPUs, that’s less likely to be the cause.
I don’t understand why the missing interrupt would happen, though.

First, there are two CPUs, and the documentation on the NT kernel I’ve
found says that Windows services interrupts on any CPU, so if the PATA
CD device driver disables interrupts for extended periods of time, the
interrupt service for the sound card would just go to the other CPU.

Second, I don’t think it can be any specific device contention, because
iTunes and AudioGrabber actually share no devices (except for the
graphics card and user input).
iTunes reads from an SMB file system using a network card, and plays
through a sound card.
AudioGrabber reads from a PATA device, and writes to an NTFS file system
on a SATA array device.

So either there is some global lock that both the CD digital audio
reading device and the X-Fi sound card want, OR the discussion about
sharing interrupts across CPUs that I read is only advisory or doesn’t
apply to a dual-core single-package CPU, OR there’s some totally
different cause which I don’t know about at all.

Any discussion, reference, or theories you might have would help!

Cheers,

/ h+

I can’t say that I know the answer to this, but I would say that it is
well within the range of possibility that this behavior is intentional
on the part of iTunes. What you describe is most definitely not
something that they would want, and I make no comment on whether you
are actually doing it. I have no idea whether this is the case or not,
but behavior like this is pretty common in applications that utilize
DRM. On that note, do you actually have WinDbg connected? If so, I’m
surprised, as usually any DRM playback gets sabotaged by recurring int 1
instances.

mm

>> xxxxx@mindcontrol.org 2007-01-02 13:39 >>>
[snip]

I certainly don’t intend for this to become an iTunes gossip thread, but I don’t think what you have posted is necessarily sufficient evidence of a kernel or driver problem. I’m not asserting there isn’t a problem, only that this post doesn’t meet my threshold of proof. What you’ve posted is consistent with my own experience with recent versions of iTunes. iTunes 7.x has some serious problems with audio quality and buffering, compared to 6.x. 7.0 was so bad that playback was simply unusable on most of my home machines, and these are machines running quite vanilla XP installs; 7.0.1 fixed some of these problems, but there is still a serious problem with clock synchronization / drift, which causes a buffer drop about every 30 seconds. Very annoying. If you can get hold of the installer for any version of iTunes 6.x, I would recommend doing some experiments, and seeing whether the problem still occurs with that version.

Please note that I’m posting this as a user of iTunes, and not an employee / representative of Microsoft. I am not in any way involved in Microsoft’s music / DRM efforts / WMP, and my commentary on iTunes is solely my own. Some might consider this off-topic for NTDEV, but I think your initial approach and request for advice was well-considered.

If I had to guess, I would say that iTunes is simply not buffering much, and in ordinary circumstances, this is probably a good approach, but in your specific situation, it looks like your system experiences a burst of activity, the audio driver exhausts that buffer, and for whatever reason, iTunes cannot fill the buffer fast enough. Not exactly a Sherlock Holmes-level deduction, but my guess as to why is that the ATAPI stack, especially when dealing with CD-ROMs, can sometimes consume a lot of time in DPCs.

DPCs are a complex subject. Since you have developed drivers, here’s a pub intro to DPCs. A DPC is a “Deferred Procedure Call”, and it is one of the main ways that hardware-based drivers (as opposed to virtual NICs and the like) get scheduled and get work done. Most driver code can be lumped into three domains: 1) code paths that run in response to requests from other drivers or processes, 2) code that runs as a DPC, and 3) code that runs in response to hardware interrupts (ISRs). In *nix terminology, ISRs and DPCs are roughly equivalent to the “bottom half” of a *nix driver, and code paths that run in response to processes (system calls) are the “upper half”. Driver ISRs are intended to be as simple as possible; they examine device registers, read the cause of interrupts, acknowledge them (causing the device to stop asserting the interrupt), schedule a DPC, and then return immediately.

DPCs are intended to perform small, non-blocking units of work. DPCs never block in the scheduler; if there is a need to wait for something, DPCs must set internal state, and processing must continue in some other callback (which might be another call to the same DPC, later on). DPCs take priority over all user-mode threads. More specifically, DPCs run at DISPATCH_LEVEL, an interrupt request level (IRQL) that is higher than PASSIVE_LEVEL, which is the level all user-mode threads run at when they have not entered the kernel.

So if a driver is scheduling an excessive number of DPCs, or those DPCs are running for too long, then user-mode threads won’t be scheduled often enough to keep up with the real-time workload of filling audio buffers.

Try a few experiments. Disable one processor (core) and see whether the problem still happens, is worse / better, etc. If you have access to a different SMP machine (discrete processor packages, not SMP-on-die), try the same arrangement there; I suspect it won’t make a difference, though. Use perfmon (start, run, perfmon.msc) and monitor the “% DPC Time”, “% Interrupt Time”, “% User Time”, and “DPC Rate” counters under the “Processor” object. Excessive spikes in DPC activity can prevent user-mode code from doing any work, so watch those and see if they correlate with the audio drops. If they do, there are tools (e.g. kernrate) for getting finer-grained data on DPCs, such as which specific drivers are scheduling DPCs and how much time they are using.
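To make the correlation step concrete: Performance Monitor can write its counters to a CSV log, and a short script can then pick out the samples where DPC time spikes. The Python sketch below is only an illustration. It assumes a PDH-style CSV whose first column is the timestamp and whose column headers end in the counter name; the sample rows, the HOST machine name, the 20% threshold, and the function name find_dpc_spikes are all made up for the example.

```python
import csv
import io


def find_dpc_spikes(csv_text, counter_suffix=r"\% DPC Time", threshold=20.0):
    """Return (timestamp, value) pairs where the chosen counter exceeds threshold."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Use the first column whose header ends with the counter name.
    col = next(i for i, name in enumerate(header) if name.endswith(counter_suffix))
    spikes = []
    for row in reader:
        try:
            value = float(row[col])
        except (ValueError, IndexError):
            continue  # skip blank or malformed samples
        if value > threshold:
            spikes.append((row[0], value))
    return spikes


# Hypothetical log contents, invented purely to exercise the function.
SAMPLE_LOG = r'''"(PDH-CSV 4.0) (Pacific Standard Time)(480)","\\HOST\Processor(_Total)\% DPC Time","\\HOST\Processor(_Total)\% User Time"
"01/02/2007 14:00:01.000"," ","12.0"
"01/02/2007 14:00:02.000","1.5","40.2"
"01/02/2007 14:00:03.000","63.0","3.1"
"01/02/2007 14:00:04.000","2.0","38.7"
'''

for timestamp, value in find_dpc_spikes(SAMPLE_LOG):
    print(timestamp, value)
```

The timestamps it reports can then be compared by hand against the moments when the audio stutters.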

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Jon Watte
Sent: Tuesday, January 02, 2007 10:40 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Multi-core CPU interrupt routing or global locks?

[snip]

Arlie Davis wrote:

> the ATAPI stack, especially when dealing with CD-ROMs, can sometimes consume a lot of time in DPCs.

That’s interesting to know – certainly applicable to the problem at hand.

> DPCs are intended to perform small, non-blocking units of work.

Thanks for the intro – I understand their role from your explanation
(and from having emulated them with real-time kernel-space threads on
another OS in the past). Given that DPCs lock out user threads, they are
similar to what I mean by “disabling interrupts.” However, if the sound
card was running low on data, I would assume that it would stop playing
until more data was queued. Although I know what happens when I "ass"ume
:) (and the DirectSound model is one of looping buffers, not of queued
buffers, so maybe that’s what the card does internally, too).
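
To illustrate the looping-buffer guess, here is a toy Python model. It is not the DirectSound API and makes no claim about what the X-Fi does internally. It assumes hardware that plays a fixed ring of blocks forever while software overwrites each block after it has been played; the function name and block labels are hypothetical.

```python
def play_looping(initial_blocks, refills, ticks):
    """Simulate hardware that plays a circular buffer, one block per tick.

    refills maps a tick number to the new block that software writes into
    the slot that was just played. A slot that is not refilled keeps its
    stale data, which is heard again on the next lap around the ring.
    """
    ring = list(initial_blocks)
    played = []
    for tick in range(ticks):
        slot = tick % len(ring)
        played.append(ring[slot])
        if tick in refills:
            ring[slot] = refills[tick]
    return played


# Refills arrive on time: every block is heard exactly once.
on_time = play_looping(["A", "B"], {0: "C", 1: "D", 2: "E"}, 4)

# The refill after tick 1 never happens (the writer was locked out),
# so the stale block "B" is played a second time.
stalled = play_looping(["A", "B"], {0: "C"}, 4)

print(on_time)
print(stalled)
```

With a queue model the hardware would go silent when the queue ran dry; with a loop it replays old data, which would match a stutter that sounds like a repeated buffer.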

> Use perfmon (start, run, perfmon.msc) and monitor the “% DPC Time”, “% Interrupt Time”, “% User Time”, and “DPC Rate” counters under the “Processor” object. Excessive spikes in DPC activity can prevent
That’s solid advice, and that’s what I will do to confirm/deny this
hypothesis. Thank you!

And, if it turns out to be ATAPI generated DPCs, then that’s a really
bad behavior of the ATAPI driver. I’m assuming there are no third-party
replacement drivers available, so it would, at that point, have to be a
bug report to xxxxx@microsoft.com

Cheers,

/ h+

Jon Watte writes:

> And, if it turns out to be ATAPI generated DPCs, then that’s a really
> bad behavior of the ATAPI driver. I’m assuming there are no third-party
> replacement drivers available, so it would, at that point, have to be a
> bug report to xxxxx@microsoft.com

Please do not assume that Microsoft is not interested in bugs, and fixing them. I mostly lurk here and answer the odd easy question, but there are several heavy-weights here who answer the hard questions, listen, route questions and bug reports to other internal teams, and generally follow through. Everyone (everywhere) has finite bandwidth, and of course this is not an official support channel, but the NTDEV list has long been a great resource. If you can demonstrate a real problem, especially with such a common and important driver, I’m sure someone will be interested in it.

And also, there *is* more than one ATAPI driver implementation; a lot of motherboard manufacturers ship their own ATAPI drivers on the OEM / drivers CD. Their quality (and performance, etc.) varies. I usually ignore these drivers, and only pay attention to them when I’m doing something that is really disk-perf intensive, or if the hardware is so broken that you need the manufacturer’s drivers to work around some hardware bug. That’s partly why I suggested trying some experiments on different platforms. The quality of ATAPI hardware varies far more, from “atrocious” to “makes me not miss SCSI so much”.

Also, a high DPC rate / time spent in DPC is not necessarily a driver problem; bogus hardware can generate a lot of pain, as well.

Microsoft provides buildable source code for the PCI parallel ATAPI driver as part of the Windows DDK, which is free and quite useful. The ATAPI driver code lives under %winddk%\src\storage\miniide\pciide.

http://www.microsoft.com/whdc/devtools/wdk/default.mspx - DDK front page
http://www.microsoft.com/whdc/resources/downloads.mspx - Lots of goodies
http://www.microsoft.com/whdc/DevTools/ddk/default.mspx - Download page for DDK 3790.1830

I’m especially curious about how the number of processors (and their interconnect topology) affects the problem you are seeing. DPC scheduling happens per-processor; each processor has a DPC queue, and only one DPC can run on a specific processor at a time. (There are exceptions, but that’s generally true.) A specific hardware device will usually only have one DPC runnable at a time (again, there are exceptions, of course). So if part of the problem is excessive time spent in DPCs caused by a specific device, then having more processors/cores should allow more user-mode threads to be runnable.
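
The per-processor argument can be sketched as a small Python model. It is a deliberate simplification, not a description of the real scheduler: it assumes each CPU has its own list of DPC bursts, that a thread runs on a single CPU only while no DPC is running there, and that the thread is free to start on whichever CPU finishes soonest. The function name, the 30 ms burst, the 5 ms refill cost, and the 20 ms buffer are all hypothetical numbers.

```python
def refill_finish_time(dpc_bursts_per_cpu, refill_ms):
    """Earliest time (ms) at which a thread needing refill_ms of CPU can finish.

    dpc_bursts_per_cpu holds one list per CPU of (start_ms, length_ms) DPC
    bursts. The thread makes progress on a CPU only in the gaps between
    bursts; the best CPU determines the result.
    """
    finish_times = []
    for bursts in dpc_bursts_per_cpu:
        now, remaining = 0.0, float(refill_ms)
        for start, length in sorted(bursts):
            gap = max(0.0, start - now)
            if gap >= remaining:
                break  # the thread finishes before this burst begins
            remaining -= gap
            now = max(now, start + length)
        finish_times.append(now + remaining)
    return min(finish_times)


BUFFER_MS = 20.0             # hypothetical amount of audio already queued
ATAPI_BURST = [(0.0, 30.0)]  # hypothetical 30 ms of back-to-back DPCs

one_core = refill_finish_time([ATAPI_BURST], 5.0)
two_cores = refill_finish_time([ATAPI_BURST, []], 5.0)

print("one core:", one_core, "underrun" if one_core > BUFFER_MS else "ok")
print("two cores:", two_cores, "underrun" if two_cores > BUFFER_MS else "ok")
```

If the stutter persists with both cores enabled, that would suggest the simple model is missing something, for example the audio driver's own DPC being queued behind the long-running one on the same processor.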

– arlie

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Jon Watte
Sent: Tuesday, January 02, 2007 2:07 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Multi-core CPU interrupt routing or global locks?

[snip]

Arlie,

Your contribution to the rest of this thread has been very useful, but
this bit is just silly. Yes, technically, the pciide minidriver is a
buildable component of the ATAPI driver set, consisting of ATAPI.sys,
PciIdeX.sys, and the IDE minidriver (PCIIDE, INTELIDE, CMDIDE, etc…).
The utility of the IDE minidriver to anyone but Microsoft is vanishingly
small. None of the ISR/DPC code lives in the IDE minidriver. HBA vendors
can’t add any value with it, which is why almost all the HBA vendors
provide a SCSI miniport instead.

Whatever happened to AtaPort? Delayed until LHS?

Phil

Philip D. Barila
Seagate Technology LLC
(720) 684-1842

From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Arlie Davis
Sent: Tuesday, January 02, 2007 4:00 PM
To: “Windows System Software Devs Interest List”
Subject: RE: [ntdev] Multi-core CPU interrupt routing or global locks?

[snip]

Microsoft provides buildable source code for the PCI parallel ATAPI driver
as part of the Windows DDK, which is free and quite useful. The ATAPI
driver code lives under %winddk%\src\storage\miniide\pciide.

[snip]

Hey, I readily admit my ignorance on the storage stack; my expertise lies elsewhere. All I did was a cursory scan of the DDK storage dir for Jon’s benefit; I didn’t realize it was buildable, but not enough to run an entire PCI ATAPI stack. (And that’s a shame.) Guess I should have qualified my “useful” statement.

Hopefully the perf counters, or data from kernrate, will get Jon what he needs.

– arlie

From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@seagate.com
Sent: Wednesday, January 03, 2007 12:09 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Multi-core CPU interrupt routing or global locks?

[snip]

> Whatever happened to AtaPort? Delayed until LHS?

It is included in the Vista distribution and documented in the WDK, for
3rd-party drivers for entirely non-standard ATA HBAs (instead of SCSIPORT).


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

To find out whether a driver that is stalling in its DPC is the root of
the problem (in my experience this is most often the case), you can
consider using our free utility dpclat, which is very easy to use; see
http://www.thesycon.com/eng/latency_check.shtml
Alternatively, there is Microsoft’s RATT, which is also quite easy to
use. Search the Web for RATT download.

Udo

Thank you, that looks very useful, too!

Cheers,

/ h+
