Looking for Hardware Horror Stories

To take some of the hostility out of the list for a good cause,
I am thinking of writing something on evil hardware. We all
have seen the queries on the list looking for solutions, such as:

  1. My hardware has to be touched every millisecond with
    an accuracy of plus or minus 100 nanoseconds

  2. My PCI card needs to have the motherboard bridge set
    to a specific mode and Windows doesn’t; how do I
    fix this?

So I am looking to collect tales of bad hardware. You do not
have to name firms, but the more details the better. Any takers
to help me collect the “Worst of Device Design”?

Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting

Well it was actually a design error in a first rev bridge chip, but I once
had a machine where the IO space aliased into the first 10K of memory under
certain conditions. This led to me having to figure out how to get the HAL
and the boot driver to load above 10K so the system would run…

Loren

– Interrupt registers where the interrupt isn’t cleared by writing a 1 to
the bit position, but by writing the (same) value to the register. Thus, if
an interrupt is signalled between reading the status and writing it to clear
the interrupt(s), one will clear the new interrupt as well. I forget how I
worked around that issue.
– Counters which have to be read frequently to detect rollover and which
don’t have an interrupt for a rollover event. (Thus, if you don’t read it
often enough, you won’t know if it rolled over or not.)
– Counters which don’t (or can’t be set to) clear on read. That makes
keeping track of the number of events inexact (particularly when combined
with the above one) since events might occur between reading the counter and
writing a 0 to it to clear it.
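
For the two counter cases above, one common software workaround is to never clear the counter at all and instead accumulate the difference between successive raw readings. The following is a minimal sketch in C, assuming a hypothetical 16-bit free-running counter; the type and function names are illustrative and not taken from any real device or driver. It only works if the counter is polled at least once per wrap period, which is exactly the burden the post complains about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software extension of a hypothetical 16-bit hardware event counter. */
typedef struct {
    uint16_t last_raw;  /* raw value seen at the previous poll */
    uint64_t total;     /* accumulated event count since init */
} ext_counter;

/* Start tracking from the counter's current raw value. */
static void ext_counter_init(ext_counter *c, uint16_t raw_now)
{
    assert(c != NULL);
    c->last_raw = raw_now;
    c->total = 0;
}

/*
 * Fold a new raw reading into the running total.
 * The 16-bit modular subtraction yields the correct delta across a
 * single wrap; if the counter wraps twice between polls, the extra
 * 65536 events are silently lost.
 */
static uint64_t ext_counter_poll(ext_counter *c, uint16_t raw_now)
{
    assert(c != NULL);
    uint16_t delta = (uint16_t)(raw_now - c->last_raw);
    c->last_raw = raw_now;
    c->total += delta;
    return c->total;
}
```

Because the hardware counter is never written, no events can be lost between a read and a clearing write; the cost is that the polling interval has to stay below the wrap time.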

A SCSI device which completes the I_T_L nexus ASAP, but continues to run the
internal processing after this.

Any other commands sent by the host during this time (the host assumes the
device is idle) fail with BUSY status.

The device requires a patched SCSIPORT or HBA driver to be operable under
Windows.

An unpatched SCSI stack retries the BUSY CDB once per second, several times
over, so any operation on this device takes 10 seconds or so :)

Even with a patched SCSI stack, the device requires a PCI-style retry
protocol: pinging and pinging is the only way to know that the internal
processing has completed.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

Hmmmmm … how early can we go? I can distinctly remember a CPU that had a
specific peripheral slot, the one on the right when the back was opened that
was only for the CPU. And I mean ONLY for the CPU. If you put the CPU board
in any slot to the left of that slot, and this was easy to do since they
were THICK cards and the rails were close together, you put 12v where you
didn’t want 12v. I do remember a 1/4 watt resistor the size of my little
finger turning cherry red, glowing, and then simply disappearing. Removing
the CPU card I noted that all the white paper stickers I had put on the 32
pin PROMS I had so laboriously programmed were a lovely shade of brown.

Oh well … that predated AT, ISA, and PCI busses. There’s another story
about paper tape and the fan in the bottom of a tall rack, but I think I
have told that one. :))


Gary G. Little
Seagate Technologies, LLC

> Hmmmmm … how early can we go? I can distinctly remember a CPU that had a
> specific peripheral slot, the one on the right when the back was opened that
> was only for the CPU. And I mean ONLY for the CPU. If you put the CPU board
> in any slot to the left of that slot, and this was easy to do since they
> were THICK cards and the rails were close together, you put 12v where you
> didn’t want 12v. I do remember a 1/4 watt resistor the size of my little
> finger turning cherry red, glowing, and then simply disappearing. Removing
> the CPU card I noted that all the white paper stickers I had put on the 32
> pin PROMS I had so laboriously programmed were a lovely shade of brown.

Heh. Long ago I was working on the prototype of a mainframe processor.
Thing had LED blocks on the front of all of the cards to display the
registers. It also had -2 @ 1000A and +4.75 @ 5000A supplies. It was
*really easy* to tell after card rework when they got the wrong supply
connected to the card. All the LEDs on the card got *very* bright, melted,
and ran down the face of the machine.

But again, that was ancient history. (Although the same machine had a disk
control where if the disk went seeking, you had to poll the control to find
out when the seek completed, because they didn’t think you needed an
interrupt for that case. So you used sync IO on the disks so you had CPU
time left over to do something useful.)

Loren

One I had a problem with was reading a 64-bit time-of-day structure from a
GPS board.

Problem was that the beautiful 64-bit structure was memory mapped to four
16-bit locations that had to be read separately. There was no hardware
synchronization, so between any of the four required reads the other three
could change.

Thomas
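
A usual software workaround for this kind of torn read is to read the whole structure repeatedly until two consecutive passes agree. The sketch below is in C and makes several assumptions: the `read_word_fn` accessor, the word ordering (index 0 is least significant), and the `fake_gps` simulated board are all hypothetical, invented here so the example can run without hardware. It also assumes the value changes slowly compared with the time it takes to read four words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor: returns the 16-bit word at index 0..3
 * (0 = least significant). */
typedef uint16_t (*read_word_fn)(void *ctx, unsigned index);

/* One pass over the four words; the result may be torn. */
static uint64_t read_once(read_word_fn rd, void *ctx)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < 4; i++)
        v |= (uint64_t)rd(ctx, i) << (16 * i);
    return v;
}

/* Re-read until two consecutive passes agree, or give up after
 * max_tries passes. Returns 1 on success, 0 on failure. */
static int read_time64(read_word_fn rd, void *ctx,
                       unsigned max_tries, uint64_t *out)
{
    assert(rd != NULL && out != NULL);
    if (max_tries < 2)
        return 0;
    uint64_t prev = read_once(rd, ctx);
    for (unsigned t = 1; t < max_tries; t++) {
        uint64_t cur = read_once(rd, ctx);
        if (cur == prev) {
            *out = cur;
            return 1;
        }
        prev = cur;
    }
    return 0;
}

/* Simulated board for demonstration only: the value advances by one
 * just before a chosen word read, which tears that pass. */
typedef struct {
    uint64_t value;
    unsigned reads;        /* word reads performed so far */
    unsigned bump_before;  /* advance just before this read (0 = never) */
} fake_gps;

static uint16_t fake_read(void *ctx, unsigned index)
{
    fake_gps *g = ctx;
    g->reads++;
    if (g->reads == g->bump_before)
        g->value++;
    return (uint16_t)(g->value >> (16 * index));
}
```

With the simulated board starting at 0xFFFF and advancing just before the second word read, the first pass sees a torn mix of old and new words, and the loop only returns once two later passes match. The retry bound matters: on a device that changes faster than it can be read, the loop would otherwise never terminate.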

The RTC in the PC is the same way. A chip we used for an external secure
time source had the same problem: the data could change between the time
you started to read the first byte of the date/time and the time you
finished the last byte. The last byte did have a bit that would be set if
the time changed during the sequence, so you could detect it and just keep
repeating until the bit came back clear.

I’m going to interpret this a little differently. I know what you asked
for, and I know that I’m not quite responding in kind. Instead, I’m going
to respond with a list of hardware behaviors which are really hard to deal
with under Windows. Some of these are problematic because they are simply
bad hardware design. Some are problematic because they either weren’t
designed with Windows in mind or the designer simply didn’t understand
Windows. I’ll leave it up to the list to argue about which ones land in
which category.

  1. The device’s PCI interrupt can’t be masked.

  2. The device’s PCI interrupt (INTx#) triggers whenever PME# is triggered.

  3. The device’s PME# triggers whenever INTx# is triggered.

  4. Moving a PCI device into D0 with PME_Status set will cause INTx# to be
    asserted.

  5. The PCI device decodes memory or I/O outside of what the PCI Base
    Address Registers describe.

  6. The add-in board consists of a bridge and several devices, which all
    need to be controlled by a single driver, but the bridge is a standard
    PCI to PCI bridge.

  7. The device designers ran out of PCI configuration space, so they just
    added more PCI function headers to handle the spillover.

  8. Moving a PCI device from D3 to D0 causes all internal state to be reset,
    including the reason that PME# was asserted, meaning that the driver has no
    idea why the device signaled a wakeup.

  9. Moving a PCI device from D3 to D0 causes the PCI device to instantly
    start decoding memory or I/O at whatever value is written into the Base
    Address Registers (even if those values are zero.)

  A. Moving a PCI device from D3 to D0 allows INTx# to be asserted.

  B. A PCI to PCI bridge decodes memory or I/O subtractively, but in ways not
    described in its Base Address Registers (and Programming Interface.)

  C. A PCI to PCI bridge supports 64-bit cycles downstream but not upstream.

  D. A PCI to PCI bridge supports 64-bit cycles for 64-bit cards, but doesn’t
    support Dual Address Cycle signalling.

  E. A plug-in PCI board is made out of a standard PCI to ISA bridge and an
    old ISA device, leaving it ambiguous which ISA bridge in the system (there
    is always one to begin with) will actually subtractively decode the ISA
    memory and I/O.

  F. A plug-in PCI board is made out of a PCI to ISA bridge and somebody
    wires the ISA interrupts directly to the PCI INTx# signals.

  10. A plug-in PCI board is made out of a PCI to ISA (or VME) bridge and
    somebody tries to wire interrupts to multiple INTx# signals in the same
    slot.

  11. A plug-in PCI board is made out of a legacy bridge and the board
    designer decides that the device will just decode the same fixed memory
    range it would have decoded as an ISA or VME device.

I suppose I could go on forever.

- Jake


Jake Oshins
Windows Base Kernel Team

This posting is provided “AS IS” with no warranties, and confers no rights.
OR if you wish to include a script sample in your post please add “Use of
included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm


Gee Jake, it sounds like you’ve actually SEEN some of these things. A time
or two… :)

Loren

Now I’m going to stay within the boundaries of modern technology:

Having an operating system that will not allow me to completely control a
peripheral to do such things as diagnostics or unhindered block reads/writes
using an existing specification, e.g. ATA-5/6. From 2000 on it has been
impossible to access an ATA disk as anything other than clusters. When you
make disk drives, that means you have to boot to another operating system to
see if a sector was really good or bad, because 2000 and XP would not allow
you to read any given sector on the disk via what is called ATA pass
through. So set your cluster to 512 bytes, you say, but that is still a
cluster. I can’t go read head 3, track 12, sector 11, where cluster 19,000
was and reported an error, because cluster 19,000 has moved to head 4, track
4, sector 6. Of course, having the ability to execute the ATA-5 user-defined
command set would allow something like GHOST to image your OS without
having to boot to DOS, or a defrag program to really defrag your HDD.

(PS – we have managed to convince PSS that ATA pass through is needed. It
should be released for XP in SP2 and SP5 for 2000. There is/will be a KB
article and hot fix available, but I am not sure if I can mention it now.)


Gary G. Little
Seagate Technologies, LLC

I don’t know if this is exactly a “horror” story, but it’s certainly a pet
peeve of mine. Any hardware design or peripheral component that includes
“write-only” control registers, forcing me to keep a “shadow” copy of the
register contents in RAM to make it possible to change individual bits
(e.g., to enable and disable interrupt sources out of the device).

-Dan
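
The shadow-copy technique described above can be sketched roughly as follows, in C. This is an illustration under stated assumptions, not code from any particular driver: `shadow_reg`, `shadow_init`, and `shadow_update` are hypothetical names, and the register is modelled as a plain memory-mapped 32-bit location.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A write-only control register paired with a shadow copy in RAM. */
typedef struct {
    volatile uint32_t *reg;  /* write-only hardware register (hypothetical) */
    uint32_t shadow;         /* last value written to the register */
} shadow_reg;

/* Write a known initial value so hardware and shadow start in agreement. */
static void shadow_init(shadow_reg *s, volatile uint32_t *reg, uint32_t initial)
{
    assert(s != NULL && reg != NULL);
    s->reg = reg;
    s->shadow = initial;
    *s->reg = initial;
}

/* Change only the bits selected by mask, without ever reading the hardware. */
static void shadow_update(shadow_reg *s, uint32_t mask, uint32_t bits)
{
    assert(s != NULL);
    s->shadow = (s->shadow & ~mask) | (bits & mask);
    *s->reg = s->shadow;
}
```

In a real driver every caller would have to go through the same shadow, under the same lock, since an interrupt handler and a dispatch routine may both want to flip bits. Two independent shadows of one register will drift apart the first time either side writes.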

I want to thank everybody who has been contributing, and please keep it
coming. I am looking for do’s and don’ts such as Jake’s as much as stories.
The goal is a list of good and bad things, along with stories of the
problems that can occur.

The purpose here is that I have recently spoken to a couple of managers who
complained bitterly about their driver developers. When I probed, I discovered
that the hardware that was tossed over the wall to the driver guy was full
of the design problems we are talking about. I’m hoping to collect stories
in an effort to let managers know that the driver guys need to be able to
give input to the hardware, and that there are a lot of things that HW can
do to make life miserable to impossible for the driver developer.

Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting

Not to be flippant, but it used to be that OS’s were written for the
hardware. :)

Alberto.

Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

You are currently subscribed to ntdev as: xxxxx@compuware.com
To unsubscribe send a blank email to xxxxx@lists.osr.com

The contents of this e-mail are intended for the named addressee only. It
contains information that may be confidential. Unless you are the named
addressee or an authorized designee, you may not copy or use it, or disclose
it to anyone else. If you received it in error please notify us immediately
and then destroy it.

Actually, I’ve gotten to know a number of people from the early days,
i.e. IBM 360 or before, and they tell a heck of a lot of hardware
horror stories. In some cases the bad hardware could be justified
by the limits of the technology, but there were enough where it
was just plain stupidity or lousy design.

Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting

----- Original Message -----
From: “Moreira, Alberto”
To: “Windows System Software Developers Interest List”
Sent: Monday, September 08, 2003 10:54 AM
Subject: [ntdev] Re: Looking for Hardware Horror Stories

> Not to be flippant, but it used to be that OS’s were written for the
> hardware. :)
>
> Alberto.
>

Hear, hear. I recall one nightmare situation in particular where two
libraries were involved, and each – of course – kept its own shadow of
the registers. Several dozen registers, mind you. =^)

Chuck

----- Original Message -----
From: “Daniel E. Germann”
To: “Windows System Software Developers Interest List”
Sent: Monday, September 08, 2003 7:48 PM
Subject: [ntdev] Re: Looking for Hardware Horror Stories

> I don’t know if this is exactly a “horror” story, but it’s certainly a pet
> peeve of mine. Any hardware design or peripheral component that includes
> “write-only” control registers, forcing me to keep a “shadow” copy of the
> register contents in RAM to make it possible to change individual bits
> (e.g., to enable and disable interrupt sources out of the device).
>
> -Dan

But there’s a difference between “bad” and “inconvenient”. For example, say
someone builds a video chip that’s oriented towards user-side rendering and
consequently it handles its own windowing and its own context switching. It
may be awfully inconvenient to someone trying to fit it within Windows, but
that isn’t because the chip is necessarily “bad”.

Or say for example that I want a keyboard that sends Unicode strings when
one presses the enter key, instead of sending it one character at a time. It
may be a heck of an inconvenience to interface it to Windows, but is that
“bad” hardware? I don’t know.

And I’m not sure I agree that shadowing is necessarily a bad idea. When I
need performance, any I/O input is a pain because it will break any
potential output streaming or bursting I might enjoy - and heck, if you want
the current state of the chip and you are its owner, what’s the big deal in
keeping a copy of what you write, so that your reads don’t end up hampering
performance? This is particularly true when a chip can handle multiple job
streams and you don’t want Peter’s read to impact the performance of
Alberto’s high speed writes. And wow, I’d pay to have an architecture where
every interrupt was message based so that we get away from this mess of
having to handle multiple shared interrupt lines.

Alberto.

“Don Burn” wrote in message news:xxxxx@ntdev…
> So I am looking to collect tales of bad hardware, you do not
> have to name firms, but the more details the better. Any takers
> to help me collect the “Worst of Device Design” ?
>
I can’t think of any examples right now. But a backup product I worked on
a few years ago at another company had a built-in table of “known bad” tape
drives, with special code to baby them along so they almost worked right.
The table grew larger with each new release.

Carl

I must say that one of the best software projects I worked on back in the
stone age of PDP-11s was one where us software guys were actually consulted
on the design of a new hardware board. Great collaboration all the way, and
it made programming the thing a dream.

Carl

“Taed Wynnell” wrote in message news:xxxxx@ntdev…
>
> – Interrupt registers where the interrupt isn’t cleared by writing a 1 to
> the bit position, but by writing the (same) value to the register. Thus, if
> an interrupt is signalled between reading the status and writing it to clear
> the interrupt(s), one will clear the new interrupt as well. I forget how I
> worked around that issue.

Gad! I see this all the time. If there’s just ONE THING I could get
hardware designers to do for me as a driver writer, it would be to make
their status bits “write 1 to clear”. It’s SUCH a pleasure to use “write 1
to clear” designs, and it’s soooo annoying to deal with designs that don’t
do this.

Peter
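
To make the appeal of “write 1 to clear” concrete, here is a small simulation in C. Everything in it is an assumption made for illustration: `w1c_dev` models a status register in software, and the `late_bits` parameter stands in for an interrupt that arrives in the window between reading the status and acknowledging it. It is a sketch of the idea, not of any specific device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated status register with write-1-to-clear semantics. */
typedef struct {
    uint32_t status;  /* pending interrupt sources, one bit each */
} w1c_dev;

static uint32_t dev_read_status(const w1c_dev *d)
{
    return d->status;
}

/* Each 1 bit written clears that pending bit; 0 bits are left untouched. */
static void dev_write_status(w1c_dev *d, uint32_t v)
{
    d->status &= ~v;
}

/* Simulates the hardware raising new interrupt sources. */
static void dev_raise(w1c_dev *d, uint32_t bits)
{
    d->status |= bits;
}

/*
 * ISR-style acknowledge: clear exactly the sources that were observed
 * and return them for servicing. late_bits models an interrupt that
 * arrives between the status read and the acknowledging write.
 */
static uint32_t ack_pending(w1c_dev *d, uint32_t late_bits)
{
    assert(d != NULL);
    uint32_t seen = dev_read_status(d);
    dev_raise(d, late_bits);    /* race window */
    dev_write_status(d, seen);  /* acknowledges only what was seen */
    return seen;
}
```

The late interrupt survives the acknowledge and is picked up on the next pass. With the design quoted above, where the whole register is cleared by the write, that late bit would be wiped out without ever having been seen.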