need help debugging PAGE_FAULT_IN_FREED_SPECIAL_POOL

Hi everybody,

I have a customer who claims that our driver, when run on a system with
Symantec AntiVirus 8.1, causes Windows 2000 SP4 to crash. We have run
all sorts of tests in our labs (with SAV 8.1) but never managed to get it to
crash -- yet our client can reproduce it every time, especially when
running a specific application.

The OS does not crash when either SAV or our driver is disabled. Below
I have copied portions of a crash dump, but I am kind of lost. Can anybody
help me get started?

thanks,

Marco

PAGE_FAULT_IN_FREED_SPECIAL_POOL (cc)
Memory was referenced after it was freed.
This cannot be protected by try-except.
When possible, the guilty driver's name (Unicode string) is printed on
the bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: bad0b0f8, memory referenced
Arg2: 00000000, value 0 = read operation, 1 = write operation
Arg3: 804945c3, if non-zero, the address which referenced memory.
Arg4: 00000000, Mm internal code.

Debugging Details:

READ_ADDRESS: bad0b0f8 Special pool

FAULTING_IP:
nt!NtWaitForSingleObject+98
804945c3 8b4048 mov eax,[eax+0x48]

MM_INTERNAL_CODE: 0

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xCC

LAST_CONTROL_TRANSFER: from 80465b91 to 804945c3

TRAP_FRAME: fa36bc88 -- (.trap fffffffffa36bc88)
ErrCode = 00000000
eax=bad0b0b0 ebx=80494531 ecx=812a1be0 edx=00000000 esi=fa36bd1c edi=00000000
eip=804945c3 esp=fa36bcfc ebp=fa36bd50 iopl=0 nv up ei pl zr na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
nt!NtWaitForSingleObject+0x98:
804945c3 8b4048 mov eax,[eax+0x48] ds:0023:bad0b0f8=????????
Resetting default scope

STACK_TEXT:
fa36bd50 80465b91 00000388 00000000 fa36bd1c nt!NtWaitForSingleObject+0x98
fa36bd50 77f82870 00000388 00000000 fa36bd1c nt!KiSystemService+0xc4
0012fe64 7c573b28 00000388 00000000 0012fe84 ntdll!ZwWaitForSingleObject+0xb
0012fe8c 7c573b50 00000388 00000064 00000000 KERNEL32!WaitForSingleObjectEx+0x5a
0012fe9c 00409895 00000388 00000064 2c000000 KERNEL32!WaitForSingleObject+0xf
WARNING: Stack unwind information not available. Following frames may be wrong.
0012fecc 0040340f 0012ea58 001337f6 7ffdf000 CVSTRANS+0x9895
0012ff24 0040e5f6 2c000000 00000000 00000001 CVSTRANS+0x340f
0012ffc0 7c581af6 0012ea58 77f843a3 7ffdf000 CVSTRANS+0xe5f6
0012fff0 00000000 0040e4a0 00000000 000000c8 KERNEL32!OpenEventA+0x63d

FOLLOWUP_IP:
nt!NtWaitForSingleObject+98
804945c3 8b4048 mov eax,[eax+0x48]

SYMBOL_STACK_INDEX: 0

FOLLOWUP_NAME: MachineOwner

SYMBOL_NAME: nt!NtWaitForSingleObject+98

MODULE_NAME: nt

IMAGE_NAME: ntoskrnl.exe

DEBUG_FLR_IMAGE_TIMESTAMP: 4047db83

STACK_COMMAND: .trap fffffffffa36bc88 ; kb

BUCKET_ID: 0xCC_BADMEMREF_nt!NtWaitForSingleObject+98

Followup: MachineOwner

--
Marco [www.neovalens.com]

This may well be VERY obvious, but sometimes it’s not easy to see the
obvious when you’re looking too closely, so I’ll say it anyway:

The fault is caused by someone waiting on an object (the dump shows
nt!NtWaitForSingleObject at the top of the stack) through a pointer to a
freed memory location.

This can be caused by many different scenarios. Here are a couple of
examples:

  1. Obviously, it could be some code like this:
        free(p);
        KeWaitForSingleObject(p, ...);
    This would obviously happen every time this code path is executed.
  2. Some code path is freeing memory that didn’t belong to it. For
    example, there is some code like this:

        somestruct *p;

        somefunc(...)
        {
            if (p)
                free(p);
        }

    Since free itself does not set p to NULL, calling somefunc twice frees
    the memory a second time. If you’re “lucky”, the system will catch it
    immediately. If you’re not so lucky, the block has been handed out again
    in the meantime and it’s someone else’s memory that gets freed, so
    something belonging to a file-system driver, or some other kernel
    structure, gets freed.
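
A minimal user-mode sketch of the second scenario and one common way to defuse it, assuming plain malloc/free stand in for the kernel pool routines. The names `free_and_clear` and `setup` are hypothetical helpers made up for this illustration. Clearing the pointer in the same step as the free makes a second call to `somefunc` a no-op rather than a double free.

```c
#include <assert.h>
#include <stdlib.h>

struct somestruct { int value; };

/* Global pointer shared by several code paths, as in the example above. */
struct somestruct *p;

/* Hypothetical helper: free the block and clear the caller's pointer so
 * that a later call sees NULL and does nothing. */
void free_and_clear(struct somestruct **pp)
{
    assert(pp != NULL);
    free(*pp);          /* free(NULL) is defined to do nothing */
    *pp = NULL;
}

/* Safe variant of somefunc: calling it twice frees the block only once. */
void somefunc(void)
{
    if (p)
        free_and_clear(&p);
}

/* Allocates p if it is not set; returns 0 on success, -1 on failure. */
int setup(void)
{
    if (!p)
        p = calloc(1, sizeof *p);
    return p ? 0 : -1;
}
```

This only helps when every code path goes through the same pointer; a second copy of the pointer held elsewhere still dangles after the free.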

It can be particularly nasty to try to find this type of bug…
Especially if it depends on other drivers.

Does it crash in the same way every time, or is this dump just one
example of many different scenarios?

Does it happen on other OSes?

Could it be a race condition when multiple threads enter your driver from
different apps?


Mats

Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

You are currently subscribed to ntdev as: xxxxx@3dlabs.com
To unsubscribe send a blank email to xxxxx@lists.osr.com


Hi Mats,

thanks a lot for the tips.

> Does it crash in the same way every time, or is this dump just one
> example of many different scenarios?

AFAIK it happens every time with one particular application, but I was told
it has also happened when running IE.

> Does it happen on other OSes?

Can’t tell for sure: they have Windows 2000 SP4 and we have been unable to
reproduce the crash.

> Could it be a race condition when multiple threads enter your driver from
> different apps?

It could: we deal with application execution and must hook NtResumeThread,
but we gave them a test version with a return and nothing else and it
crashed as well. FYI, we hook NtCreateSection, NtCreateProcess(Ex) and
NtResumeThread. We use Driver Verifier and tag memory allocations …

In this dump they ran our debug version and had Driver Verifier on.

Marco



Some more “random thoughts”…

xxxxx@lists.osr.com wrote on 10/18/2004 05:21:24 PM:

>> Does it crash in the same way every time, or is this dump just one
>> example of many different scenarios?
>
> AFAIK it happens every time with one particular application, but I was
> told it has also happened when running IE.

Is there anything particular about this app that they have?

>> Does it happen on other OSes?
>
> Can’t tell for sure: they have Windows 2000 SP4 and we have been unable
> to reproduce the crash.

Are you trying to repro with the same HW and SW setup (i.e. same version of
the motherboard, same types and versions of PCI cards, and the same versions
of the drivers)? Or are you running YOUR machine with your own HW/SW
combination (apart from the obvious: your driver and the Symantec AV
software)?

You are aware, by the way, that Symantec are naughty and change the stack
during kernel execution. Don’t know if this affects your code or not, but
it’s something to be aware of…

>> Could it be a race condition when multiple threads enter your driver
>> from different apps?
>
> It could: we deal with application execution and must hook NtResumeThread,
> but we gave them a test version with a return and nothing else and it
> crashed as well. FYI, we hook NtCreateSection, NtCreateProcess(Ex) and
> NtResumeThread. We use Driver Verifier and tag memory allocations …
>
> In this dump they ran our debug version and had Driver Verifier on.

Ok, so it’s unlikely to be a timing-only bug… But it could still be a
race condition, just that you’ve changed the chances of it happening.

What if your driver is just loaded, without any hooking at all, does it
still happen? ;-)

--
Mats


Mats,

> Are you trying to repro with the same HW and SW setup (i.e. same version of
> the motherboard, same types and versions of PCI cards, and the same versions
> of the drivers)? Or are you running YOUR machine with your own HW/SW
> combination (apart from the obvious: your driver and the Symantec AV
> software)?

I have asked them for one of their computers but I doubt we’ll get it, so we
are indeed using our test machines. All we know is that they have SP4 fully
patched, but the HW is indeed different.

> You are aware, by the way, that Symantec are naughty and change the stack
> during kernel execution. Don’t know if this affects your code or not, but
> it’s something to be aware of…

Yup, that I know … already been burned in a previous life. For functions
we hook we basically have a data structure with all the locals and we
allocate the memory on the heap. Now that I think (again) about it, I would
also have to check other functions we call to see how much stack space they
allocate/use…

> Ok, so it’s unlikely to be a timing-only bug… But it could still be a
> race condition, just that you’ve changed the chances of it happening.
>
> What if your driver is just loaded, without any hooking at all, does it
> still happen? ;-)

Good idea, I will give it a try … if only I could reproduce the bug over here.

FYI: one of the dumps (not this one) was also analyzed by MS PSS and they
mentioned an over-dereferenced object … but as far as I can tell all our
ref/deref calls are matched. How can I find such info in the crash dump?

Thank you again,

Marco

xxxxx@lists.osr.com wrote on 10/18/2004 06:34:12 PM:

> I have asked them for one of their computers but I doubt we’ll get it, so
> we are indeed using our test machines. All we know is that they have SP4
> fully patched, but the HW is indeed different.

It’s quite possible that some other driver on their hardware has a bug
(perhaps triggered by your driver, for some reason), so I would try to
debug the problem on their hardware. If they won’t let you have a machine,
maybe you can go to their location and try to track it down there…

> Good idea, I will give it a try … if only I could reproduce the bug over
> here.

Yeah, been there… It’s pretty difficult to find bugs when you can’t even
repro them… All you can do is use the “Guess’n’hope” method…

> FYI: one of the dumps (not this one) was also analyzed by MS PSS and they
> mentioned an over-dereferenced object … but as far as I can tell all our
> ref/deref calls are matched. How can I find such info in the crash dump?

I don’t know. I’ve never debugged such a problem. [All my advice comes from
either general debugging experience or debugging display drivers, and
display drivers do very little in the way of referencing and
dereferencing objects.]

Of course, if you dereference an object one time too many, it will be freed
while something is still using it, and that can cause all sorts of
interesting problems that may also be timing related (because the free
isn’t necessarily immediate, but may be done by some “clean-up process”
which can run at any time after the dereferencing happens). So depending on
the particular situation, it’s possible that adding extra code (i.e.
different drivers, another filter driver, or something) may cause the
execution to be different.
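
To make the over-dereference case concrete, here is a small user-mode sketch, assuming a hand-rolled reference count in place of the real object manager. `struct object`, `object_reference`, `object_dereference` and `over_dereference_demo` are hypothetical names for this illustration only; in a driver the corresponding calls would be ObReferenceObject and ObDereferenceObject. It shows how one extra dereference by one owner destroys the object while another owner still believes its reference is valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a real kernel structure. */
struct object {
    long  refcount;
    bool *freed_flag;   /* set when the object is destroyed (demo only) */
};

struct object *object_create(bool *freed_flag)
{
    struct object *obj = malloc(sizeof *obj);
    if (obj) {
        obj->refcount = 1;          /* the creator holds the first reference */
        obj->freed_flag = freed_flag;
        *freed_flag = false;
    }
    return obj;
}

void object_reference(struct object *obj)
{
    obj->refcount++;
}

/* Drops one reference and destroys the object when the count reaches zero.
 * Returns true if this call destroyed the object. */
bool object_dereference(struct object *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount == 0) {
        *obj->freed_flag = true;
        free(obj);
        return true;
    }
    return false;
}

/* Owner A creates the object, owner B takes a second reference, then A
 * releases twice by mistake. Returns true if the object was destroyed
 * while B had not yet released its reference. */
bool over_dereference_demo(void)
{
    bool freed = false;
    struct object *obj = object_create(&freed);
    if (!obj)
        return false;
    object_reference(obj);      /* B: count is now 2 */
    object_dereference(obj);    /* A: count is now 1 */
    object_dereference(obj);    /* A again (the bug): count 0, object freed */
    /* B still holds a pointer; touching obj now would be the
     * use-after-free that special pool reports as bugcheck 0xCC. */
    return freed;
}
```

Note that every call in the buggy owner looks locally plausible, which is why matching ref/deref pairs by eye can miss it.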

Here’s an idea. Keep a list of allocations and types, some struct like
this:

    #include <stdbool.h>  /* or use BOOLEAN/TRUE/FALSE in a driver */
    #include <stddef.h>   /* size_t */

    #define LARGE_NUMBER 2000

    struct allocation
    {
        void *p;
        size_t size;
        char *type;
        char *where;
        int lineNo;
        bool freed;
    };

    struct allocation allocationArray[LARGE_NUMBER];
    int allocationCount = 0;

    /* Record an allocation. The array is used as a ring buffer, so the
       oldest entries are overwritten once it wraps around. */
    void addAlloc(void *p, size_t size, char *type, char *file, int line)
    {
        struct allocation *pAlloc;
        pAlloc = &allocationArray[allocationCount];
        pAlloc->p = p;
        pAlloc->size = size;
        pAlloc->type = type;
        pAlloc->where = file;
        pAlloc->lineNo = line;
        pAlloc->freed = false;
        allocationCount = (allocationCount + 1) % LARGE_NUMBER;
    }

    /* Mark the matching live allocation as freed. file and line are not
       used yet; they could be recorded to show where the free happened. */
    void removeAlloc(void *p, size_t size, char *file, int line)
    {
        struct allocation *pAlloc;
        int i;
        for (i = 0; i < LARGE_NUMBER; i++)
        {
            pAlloc = &allocationArray[i];
            if (pAlloc->p == p && pAlloc->size == size && !pAlloc->freed)
            {
                pAlloc->freed = true;
                return;
            }
        }
        /* This could of course happen if LARGE_NUMBER is too small or the
           allocation is really old. */
        DebugPrint("Could not find allocation, what's wrong?");
    }

Now whenever you call anything that allocates memory, call addAlloc, and
when freeing, call removeAlloc.

It could be made a bit more (or a lot more) clever by, for instance,
searching through the list for this address and “reusing” the entry. Using
a hash of the address is also perhaps a good idea, rather than a simple
array. But I think you get the idea.
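
One possible shape for the hashed variant, as a user-mode sketch rather than a drop-in for a driver: a fixed-size open-addressing table keyed by the block address, where a re-allocated address reuses its old entry. `TABLE_SIZE`, `track_alloc`, `track_free` and the return codes are assumptions made up for this illustration, and a real driver would also need a lock around the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024u    /* assumed capacity; must be a power of two */

struct tracked {
    void       *p;      /* NULL means the slot has never been used */
    size_t      size;
    const char *file;
    int         line;
    int         freed;
};

struct tracked table[TABLE_SIZE];

/* Simple pointer hash: drop the low alignment bits, then mask. */
static size_t slot_for(const void *p)
{
    return (size_t)(((uintptr_t)p >> 4) & (TABLE_SIZE - 1));
}

/* Linear probing: return the slot already holding p, or the first
 * never-used slot on its probe path. Returns NULL if the table is full.
 * Entries are never removed, so probe paths stay valid. */
struct tracked *find_slot(void *p)
{
    size_t start = slot_for(p);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        struct tracked *t = &table[(start + i) & (TABLE_SIZE - 1)];
        if (t->p == p || t->p == NULL)
            return t;
    }
    return NULL;
}

/* Record an allocation; an address seen before reuses its old entry.
 * Returns 0 on success, -1 if p is NULL or the table is full. */
int track_alloc(void *p, size_t size, const char *file, int line)
{
    struct tracked *t = p ? find_slot(p) : NULL;
    if (!t)
        return -1;
    t->p = p;
    t->size = size;
    t->file = file;
    t->line = line;
    t->freed = 0;
    return 0;
}

/* Returns 0 on success, -1 if p was never tracked, -2 on a double free. */
int track_free(void *p)
{
    struct tracked *t = p ? find_slot(p) : NULL;
    if (!t || t->p != p)
        return -1;
    if (t->freed)
        return -2;
    t->freed = 1;
    return 0;
}
```

The lookup no longer scans the whole array on every free, and a -2 result points straight at a double free of that address.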

Once the system crashes, you print out the list of allocations and check
what the faulting address was allocated for and where. Maybe it will turn
up something… If nothing else, it may rule out some scenarios for you. I’ve
used this type of thing for a lot of different memory allocation debugging,
and it’s really helpful to have something like this for keeping track of
allocations.

Also, you could use a “delayed free” method. Instead of freeing memory,
put it on a list of freed blocks, and only really free it either when it’s
OLD (by time) or when the total size of the freed items is greater than a
set limit (16MB for instance). When a block is first put on the list, fill
it with a pattern, and check that the pattern is still there when you
actually free it. Again, this catches code that keeps writing to buffers
after they have been freed.
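
A rough user-mode sketch of the delayed-free idea, with two simplifications that are assumptions of this example: blocks are evicted by count (a fixed ring of `QUARANTINE_SLOTS` entries) rather than by age or total size, and malloc/free stand in for the pool routines. `delayed_free`, `flush_quarantine` and `FILL_BYTE` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define QUARANTINE_SLOTS 64     /* assumed limit on queued blocks */
#define FILL_BYTE        0xDD   /* assumed pattern for freed blocks */

struct quarantined {
    unsigned char *p;
    size_t         size;
};

struct quarantined quarantine[QUARANTINE_SLOTS];
size_t quarantine_head;             /* slot holding the oldest entry */
unsigned long corruption_count;     /* blocks found modified after "free" */

/* Verify the fill pattern, then really release the block.
 * Returns 1 if the pattern was intact, 0 if someone wrote to the block. */
static int release_checked(struct quarantined *q)
{
    int intact = 1;
    for (size_t i = 0; i < q->size; i++) {
        if (q->p[i] != FILL_BYTE) {
            intact = 0;
            break;
        }
    }
    if (!intact)
        corruption_count++;
    free(q->p);
    q->p = NULL;
    q->size = 0;
    return intact;
}

/* Call instead of free(): fill the block with the pattern and queue it,
 * evicting (and checking) the oldest queued block if the ring is full. */
void delayed_free(void *p, size_t size)
{
    struct quarantined *q = &quarantine[quarantine_head];
    if (!p)
        return;
    if (q->p)
        release_checked(q);
    memset(p, FILL_BYTE, size);
    q->p = p;
    q->size = size;
    quarantine_head = (quarantine_head + 1) % QUARANTINE_SLOTS;
}

/* Release everything still queued; returns the total number of blocks
 * that were modified after being handed to delayed_free. */
unsigned long flush_quarantine(void)
{
    for (size_t i = 0; i < QUARANTINE_SLOTS; i++) {
        if (quarantine[i].p)
            release_checked(&quarantine[i]);
    }
    return corruption_count;
}
```

The pattern check only detects writes; a stale read simply sees the fill byte, which is still useful because 0xDDDDDDDD stands out in a dump.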


Mats


Mats,

thank you again. I already track memory allocations by means of tags but
working my way backwards is a lot of work .. keeping track of each
allocation as you propose is a better idea.

May I ask you to elaborate on:

> You are aware, by the way, that Symantec are naughty and change the
> stack during kernel execution.

I read it too quickly the first time and assumed (somehow) that you
meant stack space .. rather than actually changing the stack. Change
the stack of what?

thanks!

Marco [www.neovalens.com]

It’s the thread stack!

xxxxx@lists.osr.com wrote on 10/19/2004 04:59:02 PM:

> thank you again. I already track memory allocations by means of tags but
> working my way backwards is a lot of work … keeping track of each
> allocation as you propose is a better idea.

Tags are great for knowing which type of allocation takes how much space,
and such, or who allocated some piece of memory. But for tracking, for
instance, lost data (i.e. freed when it shouldn’t have been), they’re pretty
much useless, since the block has most likely already been overwritten by
some other stuff by the time you see the failure…

> May I ask you to elaborate on:
>
>> You are aware, by the way, that Symantec are naughty and change the
>> stack during kernel execution.
>
> I read it too quickly the first time and assumed (somehow) that you
> meant stack space … rather than actually changing the stack.
> Change the stack of what?

They do something along the lines of:

mov [something], esp
mov esp, newValue

somewhere in their driver.

This means that the STACK SPACE for the currently running thread is
different before and after this call. Now, to almost all code I can
think of, this shouldn’t make any difference. In fact, even the code that
does NOT work with this doesn’t really make sense anyway, unless it is
written in assembler and doing REALLY “tricky stuff”, like trying to
access variables belonging to a function further back in the stack by using
offsets from ESP. Again, that is fine as long as you only have your own code
in that stack, but if, for instance, one of the calls is to an external
function, you can’t know how much stack space it uses anyway, so it breaks
as soon as you apply, for instance, a service pack.

By the way, programming languages that allow “functions within functions”,
for instance Pascal, will access the stack outside the current function
when you access a local variable of an outer function, which is a perfectly
legal and “sensible” thing to do. But then the compiler knows what’s going
on on the stack, and you can’t use a nested function as a “callback”, for
instance; only a function at the outermost level can be used that way.

We do the same as Symantec in our simulator driver (used for new graphics
chips that are not yet available in hardware, and it’s written in C++),
because rather than trying to make a 256KB stack space needed by the
simulator shrink to 12KB, we just swap the stack when we call the
simulator. This is not a nice thing to do, but it’s “OK” for internal
projects (which our simulator is, since no customer ever will want a driver
which runs 10-100 times slower than the cheapest available graphics card on
the market, and that only runs on pretty expensive 3DLabs hardware at the
same time [our simulator works by accessing a 3DLabs graphics card directly
when it’s “drawing”]).
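
For readers who want to see the stack-swap idea without writing assembler, here is a user-mode analogue using the POSIX `<ucontext.h>` calls (getcontext, makecontext, swapcontext). This is an illustration under stated assumptions, not what Symantec or the simulator driver actually does: it runs in an ordinary process on a POSIX system, and `run_on_big_stack` and `needs_big_stack` are hypothetical names. The callee gets a privately allocated stack large enough for its 128KB local buffer, and control returns to the original stack when it finishes.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* some platforms require this for <ucontext.h> */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t caller_ctx;       /* where to resume after the call */
static ucontext_t big_ctx;          /* context that runs on the private stack */
static unsigned char *stack_base;   /* start of the private stack buffer */
static size_t stack_len;

int work_result;        /* value computed while on the private stack */
int on_private_stack;   /* 1 if the callee's locals lived in our buffer */

/* Stand-in for code that needs far more stack than the caller has. */
static void needs_big_stack(void)
{
    volatile unsigned char scratch[128 * 1024];
    uintptr_t here = (uintptr_t)&scratch[0];
    uintptr_t base = (uintptr_t)stack_base;

    scratch[0] = 1;
    scratch[sizeof scratch - 1] = 2;
    on_private_stack = (here >= base && here < base + stack_len);
    work_result = scratch[0] + scratch[sizeof scratch - 1];
}

/* Allocate a stack of the given size, run needs_big_stack on it, switch
 * back, and free the stack. Returns the computed value, or -1 on error. */
int run_on_big_stack(size_t size)
{
    stack_base = malloc(size);
    if (stack_base == NULL)
        return -1;
    stack_len = size;

    if (getcontext(&big_ctx) == -1) {
        free(stack_base);
        return -1;
    }
    big_ctx.uc_stack.ss_sp = stack_base;
    big_ctx.uc_stack.ss_size = size;
    big_ctx.uc_link = &caller_ctx;      /* resume here when the callee returns */
    makecontext(&big_ctx, needs_big_stack, 0);

    /* Save the current context (including the stack pointer) and switch. */
    if (swapcontext(&caller_ctx, &big_ctx) == -1) {
        free(stack_base);
        return -1;
    }

    free(stack_base);
    stack_base = NULL;
    return work_result;
}
```

While the callee runs, anything that assumes the stack pointer still lies within the thread’s original stack limits will be confused, which is the kind of side effect described above.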


Mats


ok, that part I got ;-)

I was more after the *how* ..

--
Marco [www.neovalens.com]

"Lyndon J Clarke" wrote in message
news:xxxxx@ntdev...
> Its the thread stack!