Vista "forgets" instructions ...

Here’s one that Phil and I both are scratching our heads over …

We have code that looks something like this:

ULONG V1, V2, V3, V4;

V1 = 0;
V2 = V3 = 1;
V4 = 2;

Unchecked, V2 and V3 will never be set. If you set a breakpoint on V4, V2
and V3 will not be set. If you set a bp on V1, then V2 and V3 will be set.

Now it gets even stranger: the line V2 = V3 = 1 compiles to 3 MOVs in
assembly.
00BDEF57 mov dword ptr [ebp-254h],200h
00BDEF61 mov eax,dword ptr [ebp-254h]
00BDEF67 mov dword ptr [ebp-250h],eax

If you set a bp on _57 and a bp after _67 and then let it run, V2 and V3 will
be set. Set a bp on line _61 and V2 equals V3, but not 1. Set a bp on _67
and V2 will not be set but V3 will be set to 1.

It’s like Vista “forgets” or “loses” (for lack of a better description)
instructions unless something, in this case a bp, gives it a chance to “find”
them. Has anyone got a clue as to this behaviour? The same code works fine
on XP SP2. The application was built using VS 2003 Enterprise, and though not
a driver, it is the app that exercises the drivers we have written. I’ve
even broken the line up and set V2 before V1, followed by setting V3, but the
same thing happens, except that now it throws V1 into the mix.

Anyone have any ideas?


The personal opinion of
Gary G. Little

Can you provide a little more info on what happens next with V1, V2, V3, and V4? Are the values actually consumed by any future statements? The optimizing compiler will eliminate quite a lot of code if there are no perceivable side effects. It will happily eliminate assignment of simple integers, and even long sequences of increment operators, etc.

Your statement that this works on XP doesn’t jibe with my suspicion about the optimizer, though. Can you provide any more info? A complete function listing, in C or asm?
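To illustrate the optimizer behaviour described above, here is a minimal sketch in C. The function names are hypothetical and are not taken from the application under discussion. In an optimized build a compiler may drop the stores in dead_stores entirely, because nothing observable depends on them, so a debugger could show the locals as never set. Declaring the locals volatile (or genuinely consuming their values) makes each store an observable side effect.

```c
#include <assert.h>

/* Hypothetical example: nothing observable depends on v1..v4, so an
   optimizing compiler is free to remove every one of these assignments. */
void dead_stores(void)
{
    unsigned long v1, v2, v3, v4;

    v1 = 0;
    v2 = v3 = 1;
    v4 = 2;

    /* These casts only silence "set but not used" warnings. */
    (void)v1; (void)v2; (void)v3; (void)v4;
}

/* volatile makes every access an observable side effect, so the
   assignments must be emitted even in an optimized build. */
unsigned long live_stores(void)
{
    volatile unsigned long v1, v2, v3, v4;
    unsigned long sum;

    v1 = 0;
    v2 = v3 = 1;
    v4 = 2;

    sum = v1 + v2 + v3 + v4;   /* reads each variable once */
    assert(sum == 4);
    return sum;
}
```

This only matters for optimized builds; with optimization disabled the compiler normally emits every store as written.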

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Gary G. Little
Sent: Tuesday, October 17, 2006 3:22 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Vista “forgets” instructions …

[snip]

Questions? First check the Kernel Driver FAQ at http://www.osronline.com/article.cfm?id=256

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Gary G. Little wrote:

> Here’s one that Phil and I both are scratching our heads over …
>
> We have code that looks something like this:
>
> ULONG V1, V2, V3, V4;
>
> V1 = 0;
> V2 = V3 = 1;
> V4 = 2;
>
> Unchecked, V2 and V3 will never be set. If you set a breakpoint on V4, V2
> and V3 will not be set. If you set a bp on V1, then V2 and V3 will be set.

When you say “will be set” and “will not be set”, what you mean is that
“the debugger shows they are/are not set”, right? Note that there is a
HUGE difference between “a variable is not getting set” and “the
debugger is not showing me the proper value”. Which debugger are you
using? VS or WinDbg?

> Now it gets even stranger: the line V2 = V3 = 1 compiles to 3 MOV’s in
> assembly.
> 00BDEF57 mov dword ptr [ebp-254h],200h
> 00BDEF61 mov eax,dword ptr [ebp-254h]
> 00BDEF67 mov dword ptr [ebp-250h],eax

Based on that exact code, I don’t believe you. The compiler would not
store 200h into a ULONG when asked to store a “1”. This may be
leftovers from some other instruction. Was this with optimization
turned on?

As with Arlie, I’d like to see the code and a more complete assembly.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> ----------

From: xxxxx@lists.osr.com[SMTP:xxxxx@lists.osr.com] on behalf of Tim Roberts[SMTP:xxxxx@probo.com]
Reply To: Windows System Software Devs Interest List
Sent: Wednesday, October 18, 2006 1:24 AM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Vista “forgets” instructions …

> When you say “will be set” and “will not be set”, what you mean is that
> “the debugger shows they are/are not set”, right? Note that there is a
> HUGE difference between “a variable is not getting set” and “the
> debugger is not showing me the proper value”. Which debugger are you
> using? VS or WinDbg?

I’d agree. Have you tried printing the values with DbgPrint instead of using the debugger, Gary? Unfortunately, that can change the result of optimization, too. What is the real problem? Do you get wrong values at the place where they’re actually used?
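As a rough sketch of the "print it instead of trusting the debugger" idea: DbgPrint is a kernel-mode routine, so for a user-mode test application any ordinary print would serve the same purpose. The example below uses only standard C; the Lengths structure and both function names are hypothetical stand-ins, not the application's real types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two length fields being inspected. */
typedef struct {
    uint32_t DataLen;
    uint32_t AllocLen;
} Lengths;

/* Formats both lengths into buf. Returns the number of characters
   written (excluding the terminator), or -1 if buf is too small. */
int format_lengths(char *buf, size_t size, const Lengths *len)
{
    int n;

    assert(buf != NULL && len != NULL);
    n = snprintf(buf, size, "DataLen=0x%08lx AllocLen=0x%08lx",
                 (unsigned long)len->DataLen,
                 (unsigned long)len->AllocLen);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Writes the formatted line to stderr so it appears without
   depending on what a debugger chooses to display. */
void log_lengths(const Lengths *len)
{
    char line[64];

    if (format_lengths(line, sizeof line, len) > 0) {
        fputs(line, stderr);
        fputc('\n', stderr);
    }
}
```

Calling log_lengths right after the assignment and again just before the values are consumed would show whether the stores happened and whether something changed them in between. As noted above, adding such calls can itself perturb a timing-sensitive bug.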

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

Yes, they are used. It’s a Debug build, so the optimizer isn’t doing this.
It’s a classic Heisenbug: if you debug it, it stops misbehaving. The
assembly that Gary provided is exactly what’s showing up in the debugger.
It’s *NOT GETTING EXECUTED* as it was emitted.

From the COD:

; 356 : tAtaIdentifyData ident = { 0};

0003f 66 c7 85 d8 fd
      ff ff 00 00      mov WORD PTR _ident$[ebp], 0
00048 b9 7f 00 00 00   mov ecx, 127 ; 0000007fH
0004d 33 c0            xor eax, eax
0004f 8d bd da fd ff
      ff               lea edi, DWORD PTR _ident$[ebp+2]
00055 f3 ab            rep stosd
00057 66 ab            stosw

; 357 : tAtaStdParameters ataParms = { 0};

00059 c7 85 a8 fd ff
      ff 00 00 00 00   mov DWORD PTR _ataParms$[ebp], 0
00063 33 c0            xor eax, eax
00065 89 85 ac fd ff
      ff               mov DWORD PTR _ataParms$[ebp+4], eax
0006b 89 85 b0 fd ff
      ff               mov DWORD PTR _ataParms$[ebp+8], eax
00071 89 85 b4 fd ff
      ff               mov DWORD PTR _ataParms$[ebp+12], eax
00077 89 85 b8 fd ff
      ff               mov DWORD PTR _ataParms$[ebp+16], eax
0007d 89 85 bc fd ff
      ff               mov DWORD PTR _ataParms$[ebp+20], eax
00083 89 85 c0 fd ff
      ff               mov DWORD PTR _ataParms$[ebp+24], eax
00089 89 85 c4 fd ff
      ff               mov DWORD PTR _ataParms$[ebp+28], eax
0008f 89 85 c8 fd ff
      ff               mov DWORD PTR _ataParms$[ebp+32], eax
00095 89 85 cc fd ff
      ff               mov DWORD PTR _ataParms$[ebp+36], eax

; 358 : ataParms.Data.pData= &ident;

0009b 8d 85 d8 fd ff
      ff               lea eax, DWORD PTR _ident$[ebp]
000a1 89 85 a8 fd ff
      ff               mov DWORD PTR _ataParms$[ebp], eax

; 359 : ataParms.Data.AllocLen = ataParms.Data.DataLen = sizeof(ident);

000a7 c7 85 ac fd ff
      ff 00 02 00 00   mov DWORD PTR _ataParms$[ebp+4], 512 ; 00000200H
000b1 8b 85 ac fd ff
      ff               mov eax, DWORD PTR _ataParms$[ebp+4]
000b7 89 85 b0 fd ff
      ff               mov DWORD PTR _ataParms$[ebp+8], eax

>>>>>>>>>>>>>>>>>

If we breakpoint here on the call to AtaCmdIdentify(ataParms), the memory
at ebp+4 and ebp+8 are filled with garbage. If we breakpoint on the
pointer assignment before we assign the lengths, then run to the
aforementioned bp on the function call, the length values are correct. It
doesn’t help to move the length assignments before the pointer assignment,
or to split them. Is it possible that we’ve found a bug in the CPU
instruction decode that is timing sensitive? Seems unlikely, but how else
do you explain a Heisenbug like this?

>>>>>>>>>>>>>>>>>

; 360 :
; 361 : eAtaError idRes = AtaCmdIdentify(ataParms);

000bd 8b f4            mov esi, esp
000bf 8d 85 a8 fd ff
      ff               lea eax, DWORD PTR _ataParms$[ebp]
000c5 50               push eax
000c6 8b 4d ec         mov ecx, DWORD PTR _this$[ebp]
000c9 8b 11            mov edx, DWORD PTR [ecx]
000cb 8b 4d ec         mov ecx, DWORD PTR _this$[ebp]
000ce ff 92 ac 00 00
      00               call DWORD PTR [edx+172]
000d4 3b f4            cmp esi, esp
000d6 e8 00 00 00 00   call __RTC_CheckEsp
000db 89 85 9c fd ff
      ff               mov DWORD PTR _idRes$[ebp], eax

As Gary mentioned, this only occurs on Vista; it has never shown up on
any other OS. Gary will have to give you the in-memory disassembly, since I
haven’t gotten around to running Vista here. It’s just what I’ve already
provided, with addresses fixed up (if there are any; I don’t think
there are).

Phil

Philip D. Barila

Seagate Technology LLC

(720) 684-1842


From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Arlie Davis


Sent: Tuesday, October 17, 2006 5:03 PM

To: “Windows System Software Devs Interest List”

Subject: RE: [ntdev] Vista “forgets” instructions …


[snip]

Gary was being a bit *too* vague, I think. That example was not what
generated the assembly. I’ve posted the relevant assembly in another
message. The annotation in the middle should address all the questions you
asked here.

Phil

Philip D. Barila

Seagate Technology LLC

(720) 684-1842


From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Tim Roberts


Sent: Tuesday, October 17, 2006 5:24 PM

To: “Windows System Software Devs Interest List”

Subject: Re: [ntdev] Vista “forgets” instructions …

[snip]

> ----------

From: xxxxx@lists.osr.com[SMTP:xxxxx@lists.osr.com] on behalf of xxxxx@seagate.com[SMTP:xxxxx@seagate.com]
Reply To: Windows System Software Devs Interest List
Sent: Wednesday, October 18, 2006 1:39 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Vista “forgets” instructions …

> If we breakpoint here on the call to AtaCmdIdentify(ataParms), the memory
> at ebp+4 and ebp+8 are filled with garbage. If we breakpoint on the
> pointer assignment before we assign the lengths, then run to the
> aforementioned bp on the function call, the length values are correct. It
> doesn’t help to move the length assignments before the pointer assignment,
> or to split them. Is it possible that we’ve found a bug in the CPU
> instruction decode that is timing sensitive? Seems unlikely, but how else
> do you explain a Heisenbug like this?

Bugs like this are usually caused by an overwritten stack. Have you checked whether ebp points to the correct place on the stack? If earlier code has corrupted it, it can mistakenly point into unused stack space, and stopping at a breakpoint then overwrites the locals it points to, because the debugger uses that unused stack. Another possibility is that something else (code running on another CPU, for example) is using a bad pointer that happens to point at these variables. I’d use memory breakpoints to check.

BTW, I see a lot of problems on Vista that weren’t there on XP :-/

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

xxxxx@seagate.com wrote:

> ; 359 : ataParms.Data.AllocLen = ataParms.Data.DataLen = sizeof(ident);
>
> 000a7 c7 85 ac fd ff
>       ff 00 02 00 00   mov DWORD PTR _ataParms$[ebp+4], 512 ; 00000200H
> 000b1 8b 85 ac fd ff
>       ff               mov eax, DWORD PTR _ataParms$[ebp+4]
> 000b7 89 85 b0 fd ff
>       ff               mov DWORD PTR _ataParms$[ebp+8], eax
>
> If we breakpoint here on the call to AtaCmdIdentify(ataParms), the memory
> at ebp+4 and ebp+8 are filled with garbage.

This code doesn’t touch ebp+4 and ebp+8. It stores into ataParms+4 and
ataParms+8, which is ebp+0xfffffdac and ebp+0xfffffdb0, or ebp-0x254 and
ebp-0x250. Is that what you meant?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Are there any other threads running in your process (intentionally or
otherwise :-)? Could they be trashing this thread’s stack? Vista
scheduling might *very* well be different from XP’s… And breakpoints
would change all of that in any case…

Heisenbugs suck.

xxxxx@seagate.com wrote:

[snip]

Ray
(If you want to reply to me off list, please remove “spamblock.” from my
email address)

Is this code paged or non-paged? Since this bug disappears when being
debugged, it’s possible that the memory manager in Vista is doing something
different with paged code. Using the debugger to observe the memory causes
the pages to get faulted in. I know the memory manager in Vista has been
changed but I don’t know the details, so this may not be relevant, but I
thought I’d throw it out there as a possibility.

Beverly

----- Original Message -----
From:
To: “Windows System Software Devs Interest List”
Sent: Tuesday, October 17, 2006 7:39 PM
Subject: RE: [ntdev] Vista “forgets” instructions …

> [snip]

I encountered this problem while testing our existing disk diagnostic tool
kit on Vista. The entire package has been developed for XP SP2 and Server
2003 and uses ATA/SCSI pass through to bludgeon a drive with most, if not
all, of the commands coming out of the T10 and T13 committees, as well as a
few we defined for our own use. One part of the project has been to write
our own HBA drivers to eliminate the file system during some phases of
testing. We make disk drives, we really do want them to work for you when
you get them, and we want to figure out why they don’t when you send them
back to us.

My apologies for the vagueness of the example and then the sudden shock of
real assembly with actual values instead of the 1, 2, and 3 literals I used
in the example. There’s nothing proprietary in it, just how we are setting
up the TFR registers in preparation for an IDENTIFY DEVICE. The two
variables that are being set to 512 are lengths for a buffer that will
eventually contain the IDENTIFY DEVICE information. Much like a
UNICODE_STRING, the structure that defines “ident” contains a pointer to a
buffer, the real length of the data, and the actual size of the
allocation.

ataParms.Data.pData= &ident;
ataParms.Data.AllocLen = ataParms.Data.DataLen = sizeof(ident);

The code above should initialize the structure with the allocated and actual
lengths being the same. It does under XP, or at least I did not run it through
enough iterations to see it NOT set the values correctly. Under Vista,
however, I see the anomalous behaviour. Before the actual send to the drive
takes place, the lengths are tested for validity, and though they should
both be equal to 0x200, the validation routine fails because DataLen is
0x5bb2ec22 and AllocLen is 0x00020000. In essence, the contents of DataLen
and AllocLen do not change to 0x200 unless I set a breakpoint. I’ve looked
at the values in downstream function calls and they remain the same, causing
errors when validated, unless I use the breakpoint to “force” the
initialization code to run.

Here is the actual code fragment …

tAtaIdentifyData ident = { 0};
tAtaStdParameters ataParms = { 0};
ataParms.Data.pData= &ident;
ataParms.Data.AllocLen = ataParms.Data.DataLen = sizeof(ident);
eAtaError idRes = AtaCmdIdentify(ataParms);

switch(idRes)

Thanks for all the input …

Gary G. Little

To my knowledge there are no other threads running at this time, though Phil can
answer that better than I, and this is not kernel code, so paged, non-paged
issues are not valid. It conceivably could be swapped out and in between
settings of variables.

Gary G. Little

“Beverly Brown” wrote in message
news:xxxxx@ntdev…
> Is this code paged or non-paged? Since this bug disappears when being
> debugged, it’s possible that the memory manager in Vista is doing
> something different with paged code. Using the debugger observes the
> memory causing the pages to get faulted in. I know the memory manger in
> Vista has been changed but I don’t know the details, so this may not be
> relevant, but I thought I’d throw it out there as a possibility.
>
> Beverly
>
> ----- Original Message -----
> From:
> To: “Windows System Software Devs Interest List”
> Sent: Tuesday, October 17, 2006 7:39 PM
> Subject: RE: [ntdev] Vista “forgets” instructions …
>
>
>>
>> Yes, they are used. It’s a Debug build, so the optimizer isn’t doing
>> this.
>> It’s a classic Heisenbug, if you debug it, it stops misbehaving. The
>> assembly that Gary provided is exactly what’s showing up the debugger.
>> It’s NOT GETTING EXECUTED as it was emitted.
>>
>>
>>
>> From the COD:
>>
>>
>>
>> ; 356 : tAtaIdentifyData ident = { 0};
>>
>>
>>
>> 0003f 66 c7 85 d8 fd
>>
>> ff ff 00 00 mov WORD PTR _ident$[ebp], 0
>>
>> 00048 b9 7f 00 00 00 mov ecx, 127 ; 0000007fH
>>
>> 0004d 33 c0 xor eax, eax
>>
>> 0004f 8d bd da fd ff
>>
>> ff lea edi, DWORD PTR _ident$[ebp+2]
>>
>> 00055 f3 ab rep stosd
>>
>> 00057 66 ab stosw
>>
>>
>>
>> ; 357 : tAtaStdParameters ataParms = { 0};
>>
>>
>>
>> 00059 c7 85 a8 fd ff
>>
>> ff 00 00 00 00 mov DWORD PTR _ataParms$[ebp], 0
>>
>> 00063 33 c0 xor eax, eax
>>
>> 00065 89 85 ac fd ff
>>
>> ff mov DWORD PTR _ataParms$[ebp+4], eax
>>
>> 0006b 89 85 b0 fd ff
>>
>> ff mov DWORD PTR _ataParms$[ebp+8], eax
>>
>> 00071 89 85 b4 fd ff
>>
>> ff mov DWORD PTR _ataParms$[ebp+12], eax
>>
>> 00077 89 85 b8 fd ff
>>
>> ff mov DWORD PTR _ataParms$[ebp+16], eax
>>
>> 0007d 89 85 bc fd ff
>>
>> ff mov DWORD PTR _ataParms$[ebp+20], eax
>>
>> 00083 89 85 c0 fd ff
>>
>> ff mov DWORD PTR _ataParms$[ebp+24], eax
>>
>> 00089 89 85 c4 fd ff
>>
>> ff mov DWORD PTR _ataParms$[ebp+28], eax
>>
>> 0008f 89 85 c8 fd ff
>>
>> ff mov DWORD PTR _ataParms$[ebp+32], eax
>>
>> 00095 89 85 cc fd ff
>>
>> ff mov DWORD PTR _ataParms$[ebp+36], eax
>>
>>
>>
>> ; 358 : ataParms.Data.pData= &ident;
>>
>>
>>
>> 0009b 8d 85 d8 fd ff
>>
>> ff lea eax, DWORD PTR _ident$[ebp]
>>
>> 000a1 89 85 a8 fd ff
>>
>> ff mov DWORD PTR _ataParms$[ebp], eax
>>
>>
>>
>> ; 359 : ataParms.Data.AllocLen = ataParms.Data.DataLen =
>> sizeof(ident);
>>
>>
>>
>> 000a7 c7 85 ac fd ff
>>
>> ff 00 02 00 00 mov DWORD PTR _ataParms$[ebp+4], 512 ;
>> 00000200H
>>
>> 000b1 8b 85 ac fd ff
>>
>> ff mov eax, DWORD PTR _ataParms$[ebp+4]
>>
>> 000b7 89 85 b0 fd ff
>>
>> ff mov DWORD PTR _ataParms$[ebp+8], eax
>>
>>
>>
>>>>>>>>>>>>>>>>>>>>
>>
>> If we breakpoint here on the call to AtaCmdIdentify(ataParms), the memory
>> at ebp+4 and ebp+8 are filled with garbage. If we breakpoint on the
>> pointer assignment before we assign the lengths, then run to the
>> aforementioned bp on the function call, the length values are correct.
>> It
>> doesn’t help to move the length assignments before the pointer
>> assignment,
>> or to split them. Is it possible that we’ve found a bug in the CPU
>> instruction decode that is timing sensitive? Seems unlikely, but how
>> else
>> do you explain a Heisenbug like this?
>>
>>>>>>>>>>>>>>>>>>>>
>>
>>
>>
>> ; 360 :
>>
>> ; 361 : eAtaError idRes = AtaCmdIdentify(ataParms);
>>
>>
>>
>> 000bd 8b f4 mov esi, esp
>>
>> 000bf 8d 85 a8 fd ff
>>
>> ff lea eax, DWORD PTR _ataParms$[ebp]
>>
>> 000c5 50 push eax
>>
>> 000c6 8b 4d ec mov ecx, DWORD PTR _this$[ebp]
>>
>> 000c9 8b 11 mov edx, DWORD PTR [ecx]
>>
>> 000cb 8b 4d ec mov ecx, DWORD PTR _this$[ebp]
>>
>> 000ce ff 92 ac 00 00
>>
>> 00 call DWORD PTR [edx+172]
>>
>> 000d4 3b f4 cmp esi, esp
>>
>> 000d6 e8 00 00 00 00 call RTC_CheckEsp
>>
>> 000db 89 85 9c fd ff
>>
>> ff mov DWORD PTR _idRes$[ebp], eax
>>
>>
>>
>> As Gary mentioned, this only occurs on Vista, this has never shown up on
>> any other OS. Gary will have to give you the in-memory disassembly, I
>> haven’t gotten around to running Vista here. It’s just what I’ve already
>> provided, just with addresses fixed up (if there are any. I don’t think
>> there are any.)
>>
>>
>>
>> Phil
>>
>>
>>
>> Philip D. Barila
>>
>> Seagate Technology LLC
>>
>> (720) 684-1842
>>
>>
>>
>>
>>
>>
>>
>>
______________________________
>>
>>
>>
>> From: xxxxx@lists.osr.com
>> [mailto:xxxxx@lists.osr.com] On Behalf Of Arlie Davis
>>
>>
>> Sent: Tuesday, October 17, 2006 5:03 PM
>>
>> To: “Windows System Software Devs Interest List”
>>
>> Subject: RE: [ntdev] Vista “forgets” instructions …
>>
>>
>>
>>
>>
>> Can you provide a little more info on what happens next with V1, V2, V3,
>> and V4? Are the values actually consumed by any future statements? The
>> optimizing compiler will eliminate quite a lot of code if there are no
>> perceivable side effects. It will happily eliminate assignment of simple
>> integers, and even long sequences of increment operators, etc.
>>
>>
>>
>> Your statement that this works on XP doesn’t jive with my suspicion about
>> the optimizer, though. Can you provide any more info? A complete function
>> listing, in C or asm?
>>
>> —
>> Questions? First check the Kernel Driver FAQ at
>> http://www.osronline.com/article.cfm?id=256
>>
>> To unsubscribe, visit the List Server section of OSR Online at
>> http://www.osronline.com/page.cfm?name=ListServer
>
>

It’s a User Mode App. We’re using the Visual Studio debugger. It’s
definitely paged code. But I think the probability of the memory manager
getting this particular code segment wrong all the time in exactly the
right way is extremely small. Not to mention that *executing* a page will
cause it to be made resident.

Phil

Philip D. Barila

Seagate Technology LLC

(720) 684-1842


From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of “Beverly Brown”


Sent: Tuesday, October 17, 2006 9:18 PM

To: “Windows System Software Devs Interest List”

Subject: Re: [ntdev] Vista “forgets” instructions …

Is this code paged or non-paged? Since this bug disappears when being
debugged, it’s possible that the memory manager in Vista is doing something
different with paged code. Using the debugger observes the memory, causing
the pages to get faulted in. I know the memory manager in Vista has been
changed but I don’t know the details, so this may not be relevant, but I
thought I’d throw it out there as a possibility.

Beverly

Tim, you are certainly right, as usual. The base problem remains, though:
if we don’t BP before the assignment, the zero initializer executes, but
the memory at EBP-0x250 and EBP-0x254 is consistently boned, in the same
way every time.

We did a brief test with a previous version of the app on a different
machine without the issue, so we’re beginning to suspect the hardware.

Phil

Philip D. Barila

Seagate Technology LLC

(720) 684-1842


From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Tim Roberts


Sent: Tuesday, October 17, 2006 6:14 PM

To: “Windows System Software Devs Interest List”

Subject: Re: [ntdev] Vista “forgets” instructions …

xxxxx@seagate.com wrote:

> ; 359 : ataParms.Data.AllocLen = ataParms.Data.DataLen = sizeof(ident);
>
> 000a7 c7 85 ac fd ff ff 00 02 00 00   mov DWORD PTR _ataParms$[ebp+4], 512 ; 00000200H
> 000b1 8b 85 ac fd ff ff               mov eax, DWORD PTR _ataParms$[ebp+4]
> 000b7 89 85 b0 fd ff ff               mov DWORD PTR _ataParms$[ebp+8], eax
>
> If we breakpoint here on the call to AtaCmdIdentify(ataParms), the memory
> at ebp+4 and ebp+8 are filled with garbage.

This code doesn’t touch ebp+4 and ebp+8. It stores into ataParms+4 and
ataParms+8, which is ebp+0xfffffdac and ebp+0xfffffdb0, or ebp-0x254 and
ebp-0x250. Is that what you meant?



Tim Roberts, xxxxx@probo.com

Providenza & Boekelheide, Inc.
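
The offset arithmetic in Tim's reply can be checked mechanically. The sketch below is illustrative only and not from the thread: the helper names `signed_disp32` and `show_disp32` are made up here. It reinterprets the 32-bit displacement bytes from the listing as a signed two's-complement value, which is how a 32-bit displacement relative to ebp is applied.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: interpret a raw 32-bit x86 displacement as the
 * signed offset that is actually added to the base register. */
int64_t signed_disp32(uint32_t raw)
{
    /* Values with the top bit set are negative in two's complement. */
    return (raw >= 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                : (int64_t)raw;
}

/* Print one displacement in both its unsigned and signed forms. */
void show_disp32(uint32_t raw)
{
    int64_t off = signed_disp32(raw);
    if (off < 0)
        printf("ebp+0x%08x == ebp-0x%llx\n",
               (unsigned)raw, (unsigned long long)(-off));
    else
        printf("ebp+0x%08x == ebp+0x%llx\n",
               (unsigned)raw, (unsigned long long)off);
}
```

Calling `show_disp32(0xfffffdacu)` prints `ebp+0xfffffdac == ebp-0x254`, matching the address in the debugger disassembly from the original post; `0xfffffdb0` likewise comes out as `ebp-0x250`.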




There are indeed multiple threads, but we’ve never observed any
inter-thread stack corruption before. Stray pointers are stray pointers,
but you’d think something that repeatable might have shown up a bit
earlier.

As mentioned in another message, another Vista system with a different app
rev doesn’t exhibit the same behavior, so we’re looking into the
possibility of a hardware issue.

Phil

Philip D. Barila

Seagate Technology LLC

(720) 684-1842


From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Ray Trent


Sent: Tuesday, October 17, 2006 6:20 PM

To: “Windows System Software Devs Interest List”

Subject: Re:[ntdev] Vista “forgets” instructions …

Are there any other threads running in your process (intentionally or
otherwise :-)? Could they be trashing this thread’s stack? Vista
scheduling might very well be different from XP’s… And breakpoints
would change all of that in any case…

Heisenbugs suck.

Michal,

It’s a very remote possibility of a stray pointer, but it’s extremely
repeatable without the debugger stopping before the assignments, so I’m
guessing we might have seen that before. Using a memory BP is a pretty
good idea, though.

Phil

Philip D. Barila

Seagate Technology LLC

(720) 684-1842


From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of “Michal Vodicka”


Sent: Tuesday, October 17, 2006 6:06 PM

To: “Windows System Software Devs Interest List”

Subject: RE: [ntdev] Vista “forgets” instructions …

> ----------
> From: xxxxx@lists.osr.com[SMTP:xxxxx@lists.osr.com] on behalf of xxxxx@seagate.com[SMTP:xxxxx@seagate.com]
> Reply To: Windows System Software Devs Interest List
> Sent: Wednesday, October 18, 2006 1:39 AM
> To: Windows System Software Devs Interest List
> Subject: RE: [ntdev] Vista “forgets” instructions …
>
> If we breakpoint here on the call to AtaCmdIdentify(ataParms), the memory
> at ebp+4 and ebp+8 are filled with garbage. If we breakpoint on the
> pointer assignment before we assign the lengths, then run to the
> aforementioned bp on the function call, the length values are correct. It
> doesn’t help to move the length assignments before the pointer assignment,
> or to split them. Is it possible that we’ve found a bug in the CPU
> instruction decode that is timing sensitive? Seems unlikely, but how else
> do you explain a Heisenbug like this?
>

Bugs like this are usually caused by an overwritten stack. Have you checked
whether ebp points to the correct place on the stack? It can mistakenly
(broken by some previous code) point into the unused part of the stack, in
which case stopping at a breakpoint overwrites the locals it points to (the
debugger uses the unused stack). The next possibility is that something else
(running on the other CPU, for example) is using a wrong pointer that points
at these variables. I’d use memory breakpoints to check it.

BTW, I see a lot of problems on Vista that weren’t there on XP :-/
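
One way to test the overwritten-stack theory without stopping in the debugger is to bracket the suspect variables with known guard words and check them later. This is a minimal sketch, not code from the application: the struct layout, the field names, and the guard value are all assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

#define GUARD_WORD 0xA5A5A5A5u   /* arbitrary pattern, chosen for this sketch */

/* Hypothetical stand-in for the parameter block discussed in the thread,
 * with a guard word on each side of the fields of interest. */
typedef struct {
    uint32_t guard_front;
    void    *buffer;
    uint32_t data_len;
    uint32_t alloc_len;
    uint32_t guard_back;
} guarded_parms;

/* Fill in the fields and arm both guards. */
void guarded_init(guarded_parms *p, void *buffer, uint32_t len)
{
    memset(p, 0, sizeof *p);
    p->guard_front = GUARD_WORD;
    p->buffer      = buffer;
    p->data_len    = len;
    p->alloc_len   = len;
    p->guard_back  = GUARD_WORD;
}

/* Return 1 if the guards and lengths still hold the values written by
 * guarded_init, 0 if something has overwritten them. Reads go through a
 * volatile pointer so the compiler cannot assume the values are unchanged. */
int guarded_intact(const guarded_parms *p, uint32_t expected_len)
{
    const volatile guarded_parms *v = p;
    return v->guard_front == GUARD_WORD &&
           v->guard_back  == GUARD_WORD &&
           v->data_len    == expected_len &&
           v->alloc_len   == expected_len;
}
```

A check like this only reports that the values changed, not who changed them. If only the length fields are damaged while both guards survive, that at least argues against a simple linear overrun from a neighbouring buffer.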

Update …

As it turns out, Vista was not “forgetting” anything. The culprit was a
driver issue, that may have only appeared under Vista, hence we blame the
sitting power that is on the throne.

The actual culprit was a 3 year old LSI 21320-R HBA, and specifically the
Vista drivers for that card. When I installed Vista RC2 5744, I noted that
it came with drivers for the HBA so did not install the software on the
support CD or download new drivers from their tech support site. Doing some
preliminary testing such as disc manager, formatting and general access to
the SCSI drives I was testing, everything looked good. Of course the
heaviest testing had been done under XP. Even under Vista the discs were
available. However … the first time I ran our diagnostic and testing
software, the OS started to complain about stack corruption.

I began working on it, but no matter how I modified the code, it failed.
From a desk check viewpoint, there was nothing obviously wrong with the
original form of the code. Eventually I called Phil and together we found
that a structure that was defined on the stack, with an address of around
0x2FXXXXX had two fields that never, ever changed, unless we did the
breakpoint trick described previously. Ok … so the next most obvious
answer was bad memory — so swap the DIMMs. No effect. After delving into
it further, with no more insight, I posted the original description here.
Finally, I did the old dance, open the box and start removing things till it
works. I popped the LSI card and bingo, no more stack corruption. I put the
card back in and the stack was corrupted. The next step was to place a call
to LSI tech support who took me through re-flashing the BIOS, but just as we
were about to download new drivers from the web site … Seagate security
decided to call a fire drill. Really and truly they really really did.


Once back in the office with a fresh HBA BIOS it was boot time, but still
the same problem. The next 2 hours is a haze of moving the LSI card from
slot to slot, pulling both the SiI 3132 and 3124 cards, swapping the 3124
and LSI, all of this seeming to go on forever until I finally had to leave.
Before I left though, I managed to boot to the desktop and download a new
driver from the LSI tech support site. It was the wrong one — BSOD during
the install. At that point I uttered a few well-worn expletives and stalked
out of the cube.

This morning, the komputer fairyes did not magically make things work like
they do sometimes, so I booted to safe mode, found a set of drivers I had
pulled for the LSI HBA back in August when this work started, installed
them, and held my breath when I loaded my testing software. It worked. I
turned on the power supply to the two drives, let them be discovered, tried
the test again … and it worked. I booted out of safe mode back to normal
mode and it worked.

So, Vista did not forget a damn thing. Bad Vista drivers for an LSI 21320-R
HBA turned memory into quantum Swiss cheese.


The personal opinion of
Gary G. Little

Gary,

Thanks for:

  1. taking the time to update us on this issue
  2. making it entertaining (fire drills! Sounds like Hollywood!)
  3. showing your root cause analysis process
  4. making people realize that a bad driver *can* still function for a
    long time in a system with no apparent effect until…

Glad you solved the problem. See, the “komputer fairyes” did show up…
a bit late but she did…

:-)

Fred


Have you investigated to see if the machine has a rootkit?
I know it’s Vista, but still…

-Jeff

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@seagate.com
Sent: Wednesday, October 18, 2006 12:45 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Vista “forgets” instructions …

There are indeed multiple threads, but we’ve never observed any
inter-thread stack corruption before. Stray pointers are stray
pointers,
but you’d think something that repeatable might have shown up a bit
earlier.

As mentioned in another message, another Vista system with a different
app
rev doesn’t exhibit the same behavior, so we’re looking into the
possibility of a hardware issue.

Phil

Philip D. Barila

Seagate Technology LLC

(720) 684-1842



I doubt a rootkit would have gone away with the removal of the LSI adapter.
:-)
Although, stranger things have happened.


The personal opinion of
Gary G. Little
