OK What now

Charlie_Benger-Stevenson · November 7, 2008, 5:01pm

I am trying to develop a basic firewall based on the WFP example in the DDK for Windows Vista. I have been working extremely long hours and pouring my heart and soul into this project, and now that I am in the testing phase, if I see another blue screen of death I think I’ll scream.

I have learned how to analyze crash dumps using WinDbg, and now I am really up against it: possibly the least helpful error message I have seen.

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.

The full crash dump is posted below, but I have NO idea where to turn next. Surely there is a way to get more information than this and track the problem down. I’ll be heartbroken if I can’t get any further and all that hard work was for nothing.

Here’s the trace anyway:

*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: fffe9084, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000001, bitfield :
bit 0 : value 0 = read operation, 1 = write operation
bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: 81a5ca2e, address which referenced memory

Debugging Details:

WRITE_ADDRESS: GetPointerFromAddress: unable to read from 81b55868
Unable to read MiSystemVaType memory at 81b35420
fffe9084

CURRENT_IRQL: 2

FAULTING_IP:
nt!CcMoveVacbToReuseTail+8
81a5ca2e 890a mov dword ptr [edx],ecx

CUSTOMER_CRASH_COUNT: 12

DEFAULT_BUCKET_ID: COMMON_SYSTEM_FAULT

BUGCHECK_STR: 0xA

PROCESS_NAME: WLLoginProxy.ex

TRAP_FRAME: a2900744 – (.trap 0xffffffffa2900744)
ErrCode = 00000002
eax=0001030c ebx=00000000 ecx=fffe9000 edx=fffe9084 esi=000102fc edi=00100000
eip=81a5ca2e esp=a29007b8 ebp=a29007e0 iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
nt!CcMoveVacbToReuseTail+0x8:
81a5ca2e 890a mov dword ptr [edx],ecx ds:0023:fffe9084=???
Resetting default scope

LAST_CONTROL_TRANSFER: from 81a5ca2e to 81a78d24

STACK_TEXT:
a2900744 81a5ca2e badb0d00 fffe9084 81a78b54 nt!KiTrap0E+0x2ac
a29007b4 81a5ca90 000102fc 00000000 81a5c882 nt!CcMoveVacbToReuseTail+0x8
a29007c0 81a5c882 00005eb5 000102fd 81a5f656 nt!CcFreeVirtualAddress+0x41
a29007e0 81c4396f 00000001 00000000 00000007 nt!CcUnpinFileDataEx+0x1e
a2900804 87ab899b 000102fd 2534b3b8 00000000 nt!CcUnpinData+0x4e
a2900860 87ab852b 83e1c5f8 851a70d8 00000794 Ntfs!NtfsAllocateBitmapRun+0x10d
a290095c 87aba0d3 83e1c5f8 851a70d8 ae0ac8d0 Ntfs!NtfsAllocateClusters+0xb67
a2900a08 87a295d1 83e1c5f8 83c89f80 0100000c Ntfs!NtfsAddAllocation+0x34c
a2900a4c 87a221c1 83e1c5f8 83c89f80 0000000c Ntfs!NtfsAddAllocationForNonResidentWrite+0x12a
a2900b80 87a20914 83e1c5f8 83e63850 2534b020 Ntfs!NtfsCommonWrite+0x17ef
a2900bf8 81ad9fd3 851a7020 83e63850 83e63850 Ntfs!NtfsFsdWrite+0x2dc
a2900c10 875d3ba7 84fe88f0 83e63850 00000000 nt!IofCallDriver+0x63
a2900c34 875d3d64 a2900c54 84fe88f0 00000000 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x251
a2900c6c 81ad9fd3 84fe88f0 83e63850 83e63850 fltmgr!FltpDispatch+0xc2
a2900c84 81c6a615 83c89fac 83e63850 83e639e0 nt!IofCallDriver+0x63
a2900ca4 81c4591d 84fe88f0 83c89f80 00000001 nt!IopSynchronousServiceTail+0x1d9
a2900d38 81a75a1a 84fe88f0 00000000 00000000 nt!NtWriteFile+0x6fc
a2900d38 77779a94 84fe88f0 00000000 00000000 nt!KiFastCallEntry+0x12a
WARNING: Frame IP not in any known module. Following frames may be wrong.
0006e304 00000000 00000000 00000000 00000000 0x77779a94

STACK_COMMAND: kb

FOLLOWUP_IP:
nt!CcMoveVacbToReuseTail+8
81a5ca2e 890a mov dword ptr [edx],ecx

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: nt!CcMoveVacbToReuseTail+8

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nt

IMAGE_NAME: ntkrpamp.exe

DEBUG_FLR_IMAGE_TIMESTAMP: 48d1b7fa

FAILURE_BUCKET_ID: 0xA_W_nt!CcMoveVacbToReuseTail+8

BUCKET_ID: 0xA_W_nt!CcMoveVacbToReuseTail+8

Followup: MachineOwner

Ken_Johnson · November 7, 2008, 5:09pm

My guess is that you are causing pool corruption which is later resulting in secondary failures.

Can you try enabling special pool in driver verifier? (verifier.exe)

S

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Charlie_Benger-Stevenson · November 7, 2008, 5:13pm

Thanks for the feedback. Sadly I don’t understand any of that, sorry.

Ken_Johnson · November 7, 2008, 5:24pm

Special pool is a debugging feature of Driver Verifier that can help catch most pool corruption bugs as they happen, instead of when they cause secondary failures. Many “random crashes” in a completely unknown or unrelated section of code are caused by pool corruption.

Try searching the list archives for more details. There are some good resources (see the first hit) in this Google search at the time of this writing: http://www.google.com/m/search?mrestrict=xhtml&eosr=on&home=ig&q=enabling+special+pool+driver+verifier+site%3Aosronline.com

Verifier.exe is the program that ships with Windows and controls driver verifier settings.

S
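As a rough user-mode analogy of what special pool does (not how Driver Verifier implements it): give every allocation a guard region right after the bytes the caller asked for, so an overrun is noticed close to the bug rather than as a later, unrelated crash. Real special pool places the block against a no-access guard page so the overrunning instruction itself faults; this sketch only checks a hypothetical canary pattern when the block is freed, so it detects less, and later. The names guarded_alloc and guarded_free are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical canary-based guard; real special pool uses a guard page. */
#define GUARD_BYTES 16
#define GUARD_FILL  0xA5

typedef struct {
    size_t size;   /* bytes the caller asked for */
} guard_header;

/* Layout: [header][caller's bytes][GUARD_BYTES of GUARD_FILL] */
void *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(guard_header) + size + GUARD_BYTES);
    if (raw == NULL)
        return NULL;
    ((guard_header *)raw)->size = size;
    unsigned char *user = raw + sizeof(guard_header);
    memset(user + size, GUARD_FILL, GUARD_BYTES);   /* trailing guard */
    return user;
}

/* Frees the block; returns false if anything wrote past its end. */
bool guarded_free(void *p)
{
    unsigned char *user = p;
    unsigned char *raw = user - sizeof(guard_header);
    size_t size = ((guard_header *)raw)->size;
    bool intact = true;
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (user[size + i] != GUARD_FILL)
            intact = false;
    }
    free(raw);
    return intact;
}
```

The point of the design is timing: the closer the check is to the faulty write, the closer the crash report is to the faulty code, which is exactly what turning on special pool did for the second dump in this thread.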


Charlie_Benger-Stevenson · November 7, 2008, 6:28pm

Hey Ken,

I did this and got some more meaningful information in my crash dump, but I am still no further forward. I thought I could embellish the filter driver in the DDK, and I had been doing well, but now this is starting to run away from me, which is truly heartbreaking considering the hours I have put in. The crash dump makes no sense to me here:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: a8674f90, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: a41044e5, address which referenced memory

Debugging Details:

READ_ADDRESS: GetPointerFromAddress: unable to read from 81b53868
Unable to read MiSystemVaType memory at 81b33420
a8674f90

CURRENT_IRQL: 2

FAULTING_IP:
PKnox!TLInspectWorker+75 [c:\pkbuild\inspect.c @ 1661]
a41044e5 837a1001 cmp dword ptr [edx+10h],1

CUSTOMER_CRASH_COUNT: 17

DEFAULT_BUCKET_ID: VERIFIER_ENABLED_VISTA_MINIDUMP

BUGCHECK_STR: 0xD1

PROCESS_NAME: System

TRAP_FRAME: a31ffcd8 – (.trap 0xffffffffa31ffcd8)
ErrCode = 00000000
eax=a8674f80 ebx=00000000 ecx=a8674f80 edx=a8674f80 esi=870c1a40 edi=00000000
eip=a41044e5 esp=a31ffd4c ebp=a31ffd7c iopl=0 nv up ei ng nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282
PKnox!TLInspectWorker+0x75:
a41044e5 837a1001 cmp dword ptr [edx+10h],1 ds:0023:a8674f90=???
Resetting default scope

LAST_CONTROL_TRANSFER: from a41044e5 to 81a76d24

STACK_TEXT:
a31ffcd8 a41044e5 badb0d00 a8674f80 8362522c nt!KiTrap0E+0x2ac
a31ffd7c 81bf1b18 00000000 07e5ecac 00000000 PKnox!TLInspectWorker+0x75 [c:\pkbuild\inspect.c @ 1661]
a31ffdc0 81a4aa2e a4104470 00000000 00000000 nt!PspSystemThreadStartup+0x9d
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

STACK_COMMAND: kb

FOLLOWUP_IP:
PKnox!TLInspectWorker+75 [c:\pkbuild\inspect.c @ 1661]
a41044e5 837a1001 cmp dword ptr [edx+10h],1

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: PKnox!TLInspectWorker+75

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: PKnox

IMAGE_NAME: PKnox.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4914cc18

FAILURE_BUCKET_ID: 0xD1_VRF_PKnox!TLInspectWorker+75

BUCKET_ID: 0xD1_VRF_PKnox!TLInspectWorker+75

Followup: MachineOwner

The code at that line in TLInspectWorker is:

if (packet->direction == FWP_DIRECTION_INBOUND)
{
   RemoveEntryList(&packet->listEntry);
}

How on earth could this cause a blue screen like this?

OSR_Community_User · November 7, 2008, 6:59pm

Charlie - take a deep breath… packet is probably a bad value. Verifier caught the bad read, whereas without Verifier, RemoveEntryList was making a bad write. Add some code to check the validity of packet, or take a look at the structure of packet when the crash happens.

Ken_Johnson · November 7, 2008, 7:04pm

As sprochniak mentioned, “packet” is probably bogus memory [that has been released to the pool but is still being used].

My guess is that you are freeing the memory pointed to by that variable to the pool, but still keeping ahold of it and thus ending up reusing it after the memory’s released. You should look through your code and see what code paths will result in you using an already-freed memory block in this context.

-S
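To see why a stale packet turns RemoveEntryList into a stray write, here is a user-mode sketch with a hand-rolled stand-in for LIST_ENTRY (all names are hypothetical). Unlinking writes through the entry’s Flink and Blink, so if the entry was already removed or freed, those writes land in whatever the old neighbours have become. The checked version verifies that both neighbours still point back at the entry before writing, and poisons the links afterwards so a second removal is refused. Later WDK versions add a similar integrity check to RemoveEntryList, but this is not that code, and a check like this can only help while the entry’s memory is still readable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-mode stand-in for the kernel's LIST_ENTRY (hypothetical names). */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

static void init_list_head(list_entry *head)
{
    head->flink = head->blink = head;   /* empty: head points at itself */
}

static void insert_tail(list_entry *head, list_entry *e)
{
    e->flink = head;
    e->blink = head->blink;
    head->blink->flink = e;
    head->blink = e;
}

/* Same unlink as RemoveEntryList, but verifies the neighbours still point
   back at the entry before writing through them. */
static bool checked_remove(list_entry *e)
{
    list_entry *next = e->flink;
    list_entry *prev = e->blink;
    if (next == NULL || prev == NULL || next->blink != e || prev->flink != e)
        return false;                   /* stale or corrupt: refuse to write */
    prev->flink = next;
    next->blink = prev;
    /* Poison the links so a second removal is caught instead of
       scribbling over whatever the old neighbours became. */
    e->flink = e->blink = NULL;
    return true;
}
```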


Charlie_Benger-Stevenson · November 8, 2008, 2:13am

The packet memory is allocated in the callout function and added to the linked list for out-of-band processing. The packet is tested for NULL before being added to the list. The function that causes the blue screen is the worker function, which is fired out of band. Given that NULL packets cannot be added to the list, and the packet is not freed until after this line, I cannot see how this could be bogus memory.

That said, I added NULL checks anyway, so the code now reads:

if (packet != NULL)
{
   if (packet->direction != NULL)
   {
      if (packet->direction == FWP_DIRECTION_INBOUND)
      {
         RemoveEntryList(&packet->listEntry);
         // ...
      }
   }
}

The line:

if (packet->direction != NULL)

is now the one giving the blue screen.

From this I know that packet is not NULL, as the first if passed OK. Surely testing the packet->direction member for NULL before using it would prevent this blue screen?

Any thoughts here?

Michal_Vodicka-2 · November 8, 2008, 3:30am

Invalid doesn’t always mean NULL. The packet value in your dump is
(probably) 0xa8674f80, and it points to invalid memory. You don’t need to
add checks to find out that the value is wrong; that is what the BSOD
already told you. Instead, you need to find the reason why it is wrong. Ken
already gave you good advice on how to start.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]
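A minimal user-mode sketch of Michal’s point, with hypothetical names throughout: a pointer to a dead object is still non-NULL, so a NULL check passes. A common debugging aid is a signature field that is set when the object is initialized and destroyed when it is freed. To keep the example well defined, the debug “free” here poisons the object and leaves it allocated (a quarantine) instead of returning it to the heap; reading truly freed memory is undefined behaviour, and in the kernel it can fault, which is what Verifier made happen in the dump above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PACKET_MAGIC 0x504B4E58u   /* hypothetical "PKNX" signature */

typedef struct {
    uint32_t magic;      /* equals PACKET_MAGIC only while the object is live */
    int direction;       /* stand-in for an FWP_DIRECTION value */
} packet_t;

packet_t *packet_alloc(int direction)
{
    packet_t *p = malloc(sizeof *p);
    if (p != NULL) {
        p->magic = PACKET_MAGIC;
        p->direction = direction;
    }
    return p;
}

/* Debug-only "free": poison the object but keep the memory allocated, so a
   stale pointer reads a known garbage pattern instead of reused memory. */
void packet_debug_free(packet_t *p)
{
    memset(p, 0xCC, sizeof *p);
}

/* A NULL check alone cannot tell a live packet from a dead one. */
bool packet_is_live(const packet_t *p)
{
    return p != NULL && p->magic == PACKET_MAGIC;
}
```

A signature check is a diagnostic, not a fix: it tells you that you are holding a dead object, and the real work is still finding which code path freed it.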


David_J_Craig · November 8, 2008, 3:37am

Why don’t you add some KdPrint calls to your code and see whether you are
attempting to process one packet in multiple threads on multiple CPUs at the
same time? Display the addresses, the processor, and the current thread ID.
Run under WinDbg and see. From what I remember of this thread, it appears you
wrote some code (maybe all of it) before doing any testing with WinDbg,
Driver Verifier, and pool tracking. Most of us write and test code incrementally


Charlie_Benger-Stevenson · November 8, 2008, 3:05pm

OK, nine hours of diligent debugging later, I have the smoking gun but still no suspect. I will try to convey the facts as best I can in the hope that someone can point me further in the right direction.

Don’t forget that this is based on the WFP network packet inspection example in the DDK, so familiarity with that sample would help.

The driver loads fine and sets up the FWP callback routines OK. The execution path that causes the BSOD starts when the first callback function, TLInspectALEConnectClassify, fires. It creates a packet “object” (sorry, OOP programmer by nature), adds it to the global linked list gConnList with InsertTailList, and then fires the worker thread, which pulls from gConnList and does some processing on the packet. The problem is that the packet object the worker thread retrieves from gConnList is corrupt, and the first operation performed on it causes the blue screen.

Now, arguably gConnList itself is becoming corrupt, in that the global heap is somehow getting trashed, but I have no idea how. For one fleeting minute I suspected that spin locks were not being released or some such, but the code path is totally clean.

Basically the callback says

Populate packet object
Obtain spinlock
Add packet to global linked list
Release spinlock
Call out of band processing function

The out of band processing function says

Obtain spinlock
Get packet object
Interrogate member of the packet object … whiz bang bsod

This code was working like a charm the other day, and to be honest I cannot see any big changes in this code path. How can gConnList get trashed? How can I prove it or investigate it further? What tools should I use? Is there a known gotcha with global linked lists I should know about?

Thanks in advance
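The callback/worker steps listed above can be sketched in user mode as follows. Everything here is a hypothetical stand-in: list_entry for LIST_ENTRY, a C11 atomic_flag spin loop for the in-stack queued spin lock, and a local CONTAINING_RECORD macro. Two details matter: the worker unlinks the entry while it still holds the lock, so no other path can reach the packet once the worker has it, and the packet pointer is recovered with CONTAINING_RECORD rather than by casting the list entry, which only happens to work when the list entry is the first field of the structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-mode stand-ins for LIST_ENTRY, a spin lock and
   CONTAINING_RECORD, so the pattern compiles outside the kernel. */
typedef struct list_entry { struct list_entry *flink, *blink; } list_entry;
typedef struct { atomic_flag busy; } spin_lock;

#define CONTAINING_RECORD(addr, type, field) \
    ((type *)((char *)(addr) - offsetof(type, field)))

static void spin_acquire(spin_lock *l)
{
    while (atomic_flag_test_and_set(&l->busy)) {
        /* busy-wait, as a kernel spin lock does */
    }
}

static void spin_release(spin_lock *l) { atomic_flag_clear(&l->busy); }

typedef struct {
    int direction;      /* the link is deliberately NOT the first field */
    list_entry link;
} pended_packet;

static list_entry g_conn_list = { &g_conn_list, &g_conn_list };  /* empty */
static spin_lock  g_conn_lock = { ATOMIC_FLAG_INIT };

/* Producer (classify callback): insert at the tail under the lock. */
void enqueue_packet(pended_packet *p)
{
    spin_acquire(&g_conn_lock);
    p->link.flink = &g_conn_list;
    p->link.blink = g_conn_list.blink;
    g_conn_list.blink->flink = &p->link;
    g_conn_list.blink = &p->link;
    spin_release(&g_conn_lock);
}

/* Consumer (worker): unlink the head while still holding the lock. After
   the lock is dropped the worker is the packet's only owner. */
pended_packet *dequeue_packet(void)
{
    pended_packet *p = NULL;
    spin_acquire(&g_conn_lock);
    if (g_conn_list.flink != &g_conn_list) {
        list_entry *e = g_conn_list.flink;
        g_conn_list.flink = e->flink;
        e->flink->blink = &g_conn_list;
        p = CONTAINING_RECORD(e, pended_packet, link);
    }
    spin_release(&g_conn_lock);
    return p;
}
```

If a worker instead peeked at the list head, dropped the lock, and unlinked later, any other path that frees or unlinks the same packet in between would leave the worker holding exactly the kind of stale pointer seen in the dump.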

David_J_Craig · November 8, 2008, 6:56pm

I see several problems with your ‘description’ of the problem.

- “Fires the worker thread”: how, what, when, where, who, etc.? Be
specific.
- ‘InsertTailList’ does provide some specifics, but you don’t mention where
and how the list head and the other memory are allocated. Many of the drivers
in the network stack run at DISPATCH_LEVEL.
- Normally a worker thread is passed a work item that contains the data, or
pointers to the data, it is to act on. Using system worker threads is a bad
idea. Read the “NT Insider” article from many years ago about creating your
own worker threads.
- Even if a list is used, why use a doubly linked list for this problem?
That sounds like too much OOP-style design.
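A sketch of the work-item shape David describes, with hypothetical names: each item carries the routine to run and the context pointer it acts on, so the worker never has to go hunting in a shared global list, and a singly linked FIFO with a tail pointer is enough to queue the items. This version is single-threaded and has no locking; in a driver the queue would need its own lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work item: the routine receives exactly the object it must
   act on. */
typedef struct work_item {
    struct work_item *next;             /* singly linked is enough for a FIFO */
    void (*routine)(void *context);
    void *context;
} work_item;

typedef struct {
    work_item *head;
    work_item *tail;
} work_queue;

void work_queue_push(work_queue *q, work_item *w)
{
    w->next = NULL;
    if (q->tail != NULL)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
}

/* Runs every queued item in FIFO order; returns how many ran. */
int work_queue_drain(work_queue *q)
{
    int ran = 0;
    while (q->head != NULL) {
        work_item *w = q->head;
        q->head = w->next;
        if (q->head == NULL)
            q->tail = NULL;
        w->routine(w->context);
        ran++;
    }
    return ran;
}

/* Example routine: the context is an int counter owned by the caller. */
static void add_to_counter(void *context)
{
    (*(int *)context) += 1;
}
```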


OSR_Community_User · November 8, 2008, 7:34pm

Well, if packets are flying by you and you are queueing them onto a linked list to be handled by a worker thread, it’s quite likely that the next driver in the stack is taking that packet, having its way with it, and then effectively completing it. Any operation you want to perform on these packets you probably have to do synchronously.

Good for you to have some OO experience. The concept of encapsulation tends to escape people that don’t have any, and still has a lot of relevance in the world of quality WDM C programming.
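If the worker needs packet data after the callback has returned, one general approach is to take a private copy while the data is still valid. The sketch below, with hypothetical names, deep-copies a buffer that is only borrowed for the duration of the callback. In WFP terms the alternatives are to copy the data, to clone or reference the net buffer list (for example FwpsAllocateCloneNetBufferList0 or FwpsReferenceNetBufferList0), or to pend the classify as the DDK sample does; none of those WFP calls appear below.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Data the driver owns outright, independent of the caller's buffer. */
typedef struct {
    size_t length;
    unsigned char *data;
} owned_copy;

/* Deep-copies bytes that are only valid during the callback, so later
   out-of-band work never touches memory the next layer may have reused.
   Returns 0 on success, -1 if the allocation fails. */
int capture_packet(const unsigned char *borrowed, size_t length, owned_copy *out)
{
    out->data = malloc(length);
    if (out->data == NULL)
        return -1;
    memcpy(out->data, borrowed, length);
    out->length = length;
    return 0;
}

void release_packet(owned_copy *c)
{
    free(c->data);
    c->data = NULL;
    c->length = 0;
}
```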

David_J_Craig · November 8, 2008, 8:27pm

This is a very old idea. Even NDIS miniports use a lot of these techniques
without the overhead (and help) provided by OOP languages. When you are started
for a piece of hardware, you create a memory block to contain your
information about the hardware and NDIS. Every call you receive gives you
a ‘context’ so you can find the data for that device (you might control
several devices in one driver). When the hardware interrupts, the interrupt
goes to NDIS first, so the context is available to the driver even then.

Encapsulation techniques can be used in languages not considered OOP, such
as C, COBOL, and assembler. Beginners won’t get it, and even with OOP they
will find ways to write bad code, however hard the language tries to prevent
it. In any language, bad code can be written and will be written.
I have looked back at code from many years ago and seen how much I have
learned. There are many things that must be considered to write good code,
and OOP is only one of them. You have to consider the environment, resource
availability, OS interfaces, hardware limitations, and so on. Remember the
old saying: if the only tool you have is a hammer, everything looks like
a nail. Experience in the area involved is the only way to write good code,
and even then you will have bad days when you go down bad paths. No
one writes excellent code all the time.
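As a small illustration of the context pattern David describes, with hypothetical names: each device’s state lives in its own context block, and a dispatcher (standing in for NDIS) passes that block to every callback, so the handler touches no global state and one driver can serve several devices.

```c
#include <assert.h>

/* Hypothetical per-device context, in the spirit of a miniport's adapter
   context: everything the handler needs lives here, not in globals. */
typedef struct {
    int id;
    unsigned long interrupts_seen;
} device_context;

typedef void (*isr_fn)(device_context *ctx);

/* Handler: works only on the context it is given. */
static void on_interrupt(device_context *ctx)
{
    ctx->interrupts_seen++;
}

/* Stand-in for the framework routing an event to the right device. */
void dispatch_interrupt(isr_fn handler, device_context *ctx)
{
    handler(ctx);
}
```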


Charlie_Benger-Stevenson · November 9, 2008, 3:00am

As I said before, this is largely based on the inspect example in the DDK.
The linked lists are initialised at driver load.

The global variables are:

LIST_ENTRY gConnList;          // Linked list of TCP connection attempts for processing
KSPIN_LOCK gConnListLock;      // Locking mechanism for the TCP connection list
LIST_ENTRY gPacketQueue;       // Linked list of IP packets for processing
KSPIN_LOCK gPacketQueueLock;   // Locking mechanism for the IP packet list

Then, in the driver’s entry point:

InitializeListHead(&gConnList);
KeInitializeSpinLock(&gConnListLock);

InitializeListHead(&gPacketQueue);
KeInitializeSpinLock(&gPacketQueueLock);
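For reference, a user-mode mirror of the list helpers being initialised here (hypothetical names; these are not the WDK macros). An empty list is a head whose Flink and Blink point back at the head itself, which is why InitializeListHead must run before anything is inserted: a zero-filled global head has NULL links, and the first InsertTailList would write through them. The helper at the end shows a wake-only-on-empty rule of the kind the classify code uses for signalWorkerThread.

```c
#include <assert.h>
#include <stdbool.h>

/* User-mode mirror of LIST_ENTRY and its helpers (hypothetical names). */
typedef struct list_entry { struct list_entry *flink, *blink; } list_entry;

static void initialize_list_head(list_entry *h) { h->flink = h->blink = h; }
static bool is_list_empty(const list_entry *h)   { return h->flink == h; }

static void insert_tail_list(list_entry *h, list_entry *e)
{
    e->flink = h;
    e->blink = h->blink;
    h->blink->flink = e;
    h->blink = e;
}

/* The worker only needs waking when both queues go from empty to
   non-empty; otherwise it already has work pending. */
static bool queue_and_check_signal(list_entry *conn, list_entry *packets,
                                   list_entry *e)
{
    bool signal = is_list_empty(conn) && is_list_empty(packets);
    insert_tail_list(conn, e);
    return signal;
}
```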

Then set up the worker thread thus:

KeInitializeEvent(
   &gWorkerEvent,
   NotificationEvent,
   FALSE
   );
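A single-threaded model of the two kernel event types (hypothetical names, with a non-blocking try_wait in place of KeWaitForSingleObject). A NotificationEvent, as initialised here, stays signaled after KeSetEvent until something clears it (KeClearEvent or KeResetEvent), so a worker that loops waiting on it must clear it itself; a SynchronizationEvent resets automatically when it releases a waiter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two event types; no real blocking. */
typedef enum { NOTIFICATION_EVENT, SYNCHRONIZATION_EVENT } event_type;

typedef struct {
    event_type type;
    bool signaled;
} event_t;

void event_init(event_t *e, event_type type, bool initial_state)
{
    e->type = type;
    e->signaled = initial_state;
}

void event_set(event_t *e)   { e->signaled = true; }
void event_clear(event_t *e) { e->signaled = false; }

/* Returns true if a waiter would be released right now. A synchronization
   event resets itself when it releases a waiter; a notification event stays
   signaled until someone clears it. */
bool event_try_wait(event_t *e)
{
    if (!e->signaled)
        return false;
    if (e->type == SYNCHRONIZATION_EVENT)
        e->signaled = false;
    return true;
}
```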

Then set up the FWP callouts:

status = TLInspectRegisterCallouts(
   gDeviceObject
   );

I shan’t post the code for the above function, as it works fine and is irrelevant.

But so far the lists are initialised and the callouts set up. The driver loads and sits there just fine.

Then I fire a connection request at the box, and this callback function fires.

void
TLInspectALEConnectClassify(
   IN const FWPS_INCOMING_VALUES0* inFixedValues,
   IN const FWPS_INCOMING_METADATA_VALUES0* inMetaValues,
   IN OUT void* layerData,
   IN const FWPS_FILTER0* filter,
   IN UINT64 flowContext,
   OUT FWPS_CLASSIFY_OUT0* classifyOut
   )
{
   NTSTATUS status;
   KLOCK_QUEUE_HANDLE connListLockHandle;
   KLOCK_QUEUE_HANDLE packetQueueLockHandle;
   TL_INSPECT_PENDED_PACKET* pendedConnect = NULL;
   TL_INSPECT_PENDED_PACKET* pendedPacket = NULL;
   ADDRESS_FAMILY addressFamily;
   FWPS_PACKET_INJECTION_STATE packetState;
   BOOLEAN signalWorkerThread;

   // We don’t have the necessary right to alter the classify, exit.
   if ((classifyOut->rights & FWPS_RIGHT_ACTION_WRITE) == 0)
   {
      goto Exit;
   }

   if (layerData != NULL)
   {
      // We don’t re-inspect packets that we’ve inspected earlier.
      packetState = FwpsQueryPacketInjectionState0(
                       gInjectionHandle,
                       layerData,
                       NULL
                       );

      if ((packetState == FWPS_PACKET_INJECTED_BY_SELF) ||
          (packetState == FWPS_PACKET_PREVIOUSLY_INJECTED_BY_SELF))
      {
         classifyOut->actionType = FWP_ACTION_PERMIT;
         goto Exit;
      }
   }

   addressFamily = GetAddressFamilyForLayer(inFixedValues->layerId);

   if (!IsAleReauthorize(inFixedValues))
   {
      //
      // If the classify is the initial authorization for a connection, we
      // queue it to the pended connection list and notify the worker thread
      // for out-of-band processing.
      //
      pendedConnect = AllocateAndInitializePendedPacket(
                         inFixedValues,
                         inMetaValues,
                         addressFamily,
                         layerData,
                         TL_INSPECT_CONNECT_PACKET,
                         FWP_DIRECTION_OUTBOUND
                         );

Note that pendedConnect is our packet object, which we will add to the linked list in a moment. Read on with the code…

      if (pendedConnect == NULL)
      {
         classifyOut->actionType = FWP_ACTION_BLOCK;
         classifyOut->rights &= ~FWPS_RIGHT_ACTION_WRITE;
         goto Exit;
      }

      ASSERT(FWPS_IS_METADATA_FIELD_PRESENT(inMetaValues,
                                            FWPS_METADATA_FIELD_COMPLETION_HANDLE));

      //
      // Pend the ALE_AUTH_CONNECT classify.
      //
      status = FwpsPendOperation0(
                  inMetaValues->completionHandle,
                  &pendedConnect->completionContext
                  );

      if (!NT_SUCCESS(status))
      {
         classifyOut->actionType = FWP_ACTION_BLOCK;
         classifyOut->rights &= ~FWPS_RIGHT_ACTION_WRITE;
         goto Exit;
      }

Ah, now get those cheeky spin locks:

      KeAcquireInStackQueuedSpinLock(
         &gConnListLock,
         &connListLockHandle
         );

      KeAcquireInStackQueuedSpinLock(
         &gPacketQueueLock,
         &packetQueueLockHandle
         );

      signalWorkerThread = IsListEmpty(&gConnList) &&
                           IsListEmpty(&gPacketQueue);

      InsertTailList(&gConnList, &pendedConnect->listEntry);

Now test the ->direction member, as this later causes a blue screen:

      if (pendedConnect->direction == FWP_DIRECTION_INBOUND)
      {
         DbgPrint("Inbound connection added to the linked list\n");
      }
      else
      {
         DbgPrint("Outbound connection added to the linked list\n");
      }

      pendedConnect = NULL; // ownership transferred

Release those cheeky spin locks:

      KeReleaseInStackQueuedSpinLock(&packetQueueLockHandle);
      KeReleaseInStackQueuedSpinLock(&connListLockHandle);

      classifyOut->actionType = FWP_ACTION_BLOCK;
      classifyOut->flags |= FWPS_CLASSIFY_OUT_FLAG_ABSORB;

      if (signalWorkerThread)
      {
         KeSetEvent(
            &gWorkerEvent,
            0,
            FALSE
            );
      }

OK, so now we have just fired the worker thread. In answer to the how/when/where questions: this is how it is done in the DDK example, and as I say, it has been working just fine up till now.

So now we are in the worker thread.

void
TLInspectWorker(
   IN PVOID StartContext
   )
{
   NTSTATUS status;
   TL_INSPECT_PENDED_PACKET* packet;
   LIST_ENTRY* listEntry;
   KLOCK_QUEUE_HANDLE packetQueueLockHandle;
   KLOCK_QUEUE_HANDLE connListLockHandle;

   UNREFERENCED_PARAMETER(StartContext);

   while (1)
   {
      KeWaitForSingleObject(
         &gWorkerEvent,
         Executive,
         KernelMode,
         FALSE,
         NULL
         );

OK, we have been sitting here for a while, as this was set up in DriverLoad, and now that we have been kicked off from the callback, let's get going.

      if (gDriverUnloading)
      {
         break;
      }

      listEntry = NULL;

      KeAcquireInStackQueuedSpinLock(
         &gConnListLock,
         &connListLockHandle
         );

Cheeky spin lock now acquired

      if (!IsListEmpty(&gConnList))
      {
         listEntry = gConnList.Flink;

         packet = CONTAINING_RECORD(
                     listEntry,
                     TL_INSPECT_PENDED_PACKET,
                     listEntry
                     );

OK, so the list isn't empty and we have just got the first record.

This next line: 'ere be dragons. Bang, fizz, wallop, blue screen.

         if (packet->direction == FWP_DIRECTION_INBOUND)
         {
            RemoveEntryList(&packet->listEntry);
         }

Now that is exactly how it 'appened, m'lud.
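To make the failing line concrete: gConnList.Flink points at the listEntry field embedded inside a pended packet, and CONTAINING_RECORD simply subtracts that field's offset to get back to the start of the packet. Nothing along the way checks that the memory is still a live packet, so if a packet is freed while it is still linked, the next packet->direction read goes to whatever that address has become (with special pool enabled, often a no-access page). Below is a user-mode model of those primitives, for illustration only; LIST_ENTRY_M, list_init, list_insert_tail, list_remove, CONTAINING_RECORD_M and PACKET_M are hypothetical stand-ins written for this sketch, not the ntdef.h definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-mode stand-in for the DDK's LIST_ENTRY. */
typedef struct LIST_ENTRY_M {
    struct LIST_ENTRY_M *Flink;
    struct LIST_ENTRY_M *Blink;
} LIST_ENTRY_M;

/* Same idea as CONTAINING_RECORD: step back from the embedded field
   to the start of the enclosing structure. No validation happens here. */
#define CONTAINING_RECORD_M(addr, type, field) \
    ((type *)((char *)(addr) - offsetof(type, field)))

static void list_init(LIST_ENTRY_M *head) {
    head->Flink = head->Blink = head;   /* an empty list points at itself */
}

static int list_is_empty(const LIST_ENTRY_M *head) {
    return head->Flink == head;
}

/* Links entry in just before the head, i.e. at the tail. */
static void list_insert_tail(LIST_ENTRY_M *head, LIST_ENTRY_M *entry) {
    LIST_ENTRY_M *last = head->Blink;
    entry->Flink = head;
    entry->Blink = last;
    last->Flink = entry;
    head->Blink = entry;
}

/* Unlinks entry; returns nonzero if the list is now empty, mirroring
   RemoveEntryList. The entry's own pointers are left unchanged. */
static int list_remove(LIST_ENTRY_M *entry) {
    LIST_ENTRY_M *next = entry->Flink;
    LIST_ENTRY_M *prev = entry->Blink;
    prev->Flink = next;
    next->Blink = prev;
    return next == prev;
}

/* Hypothetical packet with an embedded list entry, loosely like
   TL_INSPECT_PENDED_PACKET. */
typedef struct PACKET_M {
    int direction;
    LIST_ENTRY_M listEntry;
} PACKET_M;
```

The important design point is that the list never owns the packet's memory: it only threads pointers through it, so the lifetime of every element has to be managed separately from the list itself.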

I should just add that I am using work items to log stuff.

I wrote a function called createLogEntry:

NTSTATUS createLogEntry(int errorCode, char * errorMessage)
{
   NTSTATUS status;
   PIO_WORKITEM pWorkItem;
   TL_LOG_ITEM_CONTEXT * logItemContext;

   status = STATUS_SUCCESS;

   pWorkItem = IoAllocateWorkItem(gDeviceObject);
   if (pWorkItem == NULL)
   {
      //DbgPrint the error if Debugging is turned on
#ifdef DEBUG_OUTPUT
      DbgPrint("A problem occurred in function createLogEntry. Kernel API call IoAllocateWorkItem returned NULL.\n");
#endif
      return STATUS_UNSUCCESSFUL;
   }
   else
   {
      logItemContext = ExAllocatePoolWithTag(NonPagedPool, sizeof(TL_LOG_ITEM_CONTEXT), TL_INSPECT_LOG_ITEM_TAG);
      if (logItemContext == NULL)
      {
         IoFreeWorkItem(pWorkItem);
         //DbgPrint the error if Debugging is turned on
#ifdef DEBUG_OUTPUT
         DbgPrint("A problem occurred in function createLogEntry. Kernel API call ExAllocatePoolWithTag returned NULL.\n");
#endif
         return STATUS_UNSUCCESSFUL;
      }
      else
      {
         RtlZeroMemory(logItemContext, sizeof(TL_LOG_ITEM_CONTEXT));
      }

      logItemContext->errorCode = errorCode;
      logItemContext->errorMessage = errorMessage;
      logItemContext->previousWorkItem = pWorkItem;

      //This next line causes a blue screen, why?
      IoQueueWorkItem(pWorkItem, logCallback, DelayedWorkQueue, logItemContext);
   }

   return status;
}

which queues this work item routine:

VOID logCallback(IN PDEVICE_OBJECT DeviceObject,
                 IN PVOID Context)
{
   // Retrieve the log information from the context, as this is fired from a work item
#define BUFFER_SIZE 255
   CHAR buffer[BUFFER_SIZE];
   size_t cb;
   IO_STATUS_BLOCK ioStatusBlock;
   NTSTATUS ntstatus;
   TL_LOG_ITEM_CONTEXT * logItemContext = (TL_LOG_ITEM_CONTEXT *)Context;

   UNREFERENCED_PARAMETER(DeviceObject);

   // Do not try to perform any file operations at higher IRQL levels.
   // Instead, you may use a work item or a system worker thread to perform file operations.
   if (KeGetCurrentIrql() != PASSIVE_LEVEL)
   {
      //DbgPrint the error if Debugging is turned on
#ifdef DEBUG_OUTPUT
      DbgPrint("A problem occurred in function logPacketCallback. Cannot log at this IRQ level\n");
#endif
      IoFreeWorkItem(logItemContext->previousWorkItem);
      ExFreePool(logItemContext);
      return;   // VOID routine: no status can be returned here
   }

   ntstatus = RtlStringCbPrintfA(buffer, sizeof(buffer), "%d,%s \n",
                                 logItemContext->errorCode, logItemContext->errorMessage);
   if (NT_SUCCESS(ntstatus))
   {
      ntstatus = RtlStringCbLengthA(buffer, sizeof(buffer), &cb);
      if (NT_SUCCESS(ntstatus))
      {
         ntstatus = ZwWriteFile(gLogFileHandle, NULL, NULL, NULL, &ioStatusBlock,
                                buffer, (ULONG)cb, NULL, NULL);
         if (!NT_SUCCESS(ntstatus))
         {
            //DbgPrint the error if Debugging is turned on
#ifdef DEBUG_OUTPUT
            DbgPrint("A problem occurred in function logPacketCallback. Kernel API call ZwWriteFile returned an error.\n");
#endif
         }
      }
   }
   else
   {
      //DbgPrint the error if Debugging is turned on
#ifdef DEBUG_OUTPUT
      DbgPrint("A problem occurred in function logPacketCallback. Kernel API call RtlStringCbPrintfA returned an error.\n");
#endif
   }

   IoFreeWorkItem(logItemContext->previousWorkItem);
   ExFreePool(logItemContext);
}

And I have been peppering calls to createLogEntry all over the code in an effort to create some sort of running log file. I have been putting these calls inside #ifdef directives so I can turn them off easily, and turning them off was the first thing I did when I started getting the BSODs.
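One thing worth ruling out in createLogEntry is the lifetime of errorMessage: only the pointer is stored in the context, so if any caller ever passes a stack or reusable buffer (rather than a string literal), the work item may later format memory that has already been reused. A minimal user-mode sketch of the alternative, copying the text into the context at queue time; LOG_ITEM_M, LOG_MSG_MAX and log_item_init are hypothetical names, and in the driver the copy would more likely be done with RtlStringCbCopyA than snprintf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_MSG_MAX 128   /* hypothetical fixed capacity for a copied message */

/* Hypothetical log context that owns its own copy of the message text. */
typedef struct LOG_ITEM_M {
    int  errorCode;
    char errorMessage[LOG_MSG_MAX];
} LOG_ITEM_M;

/* Copies (truncating if needed) the caller's text, so the caller's buffer
   may be reused as soon as this returns. */
static void log_item_init(LOG_ITEM_M *item, int errorCode, const char *msg) {
    item->errorCode = errorCode;
    snprintf(item->errorMessage, sizeof(item->errorMessage), "%s",
             msg != NULL ? msg : "");
}
```

The cost is a larger nonpaged context per log entry; the benefit is that the work item no longer depends on anything the caller owns.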

As I say, it is beyond my understanding what is going wrong here. As you can see, I set up the lists correctly, get the spin lock, add the packet to the list and release the spin lock. Then in the worker thread I get the spin lock, retrieve the first list item, and then BANG, as soon as I try to poke the retrieved item with a crooked stick.

Sorry for the rather lengthy post, but I just felt that perhaps people needed to see code to make learned comments.

Thanks

Pavel_A1 · November 9, 2008, 6:39pm

“David Craig” wrote in message news:xxxxx@ntdev…

> Remember the old saying that if the only tool you have is a hammer,
> everything looks like a nail.

to extend this analogy - if the only tool you have is windbg… everything
looks like a dump?

--PA

David_J_Craig · November 9, 2008, 8:59pm

I wouldn’t think you could make that analogy. Windbg has registers, memory,
source, assembly, stack, locals, globals, etc. display windows so you can
get many views into your code. I do use windbg for dumps, but most of the
time is live debug sessions using 1394a to test changed code or examine the
behavior of suspected failure conditions.

The analogy I used continues with using screws as if they were nails. If you
want to make that analogy with windbg, you could add in SoftIce somewhere
along with Vista where there is no real compatibility between the two.


Charlie_Benger-Stevenson · November 10, 2008, 5:04am

So I’m guessing no one knows the answer, and I should stop checking this thread now that we are ending up in analogy talk? I was previously asked “1. “fires the worker thread” - How, what, when, where, who, etc. Be specific.”

I have done that. I was asked to put some DbgPrints in my code and trace the path. I have done that. I was asked to check whether anywhere along the path I was freeing memory and then trying to access it once it had been freed. I have done that, and I am not. The fact remains that a global linked list is getting trashed and I can’t for the life of me see where. I have posted all the code in the execution path and am hanging on for some sort of clue from someone more knowledgeable; after all, that’s what this thread is for, right?

I am aware of the golden hammer anti-pattern, and this is quite insulting to me, as I have many long years of programming experience under my belt. This is not some rookie coder who thought he’d have a bash at writing a kernel-mode driver. I am flat out on this one; it’s not through want of effort, I can tell you. I just need a learned driver developer to cast an eye over the findings I have posted recently and the code I am poring over and give me some sort of clue.

Thanks

David_R_Cattley · November 10, 2008, 10:10am

Mr. Benger-Stevenson,

I think you are missing the point here to some extent. This is about
finding the right question. The answer will be obvious once you (we) do.

Mr. Craig suggested a very powerful and useful bit of diagnostics to install
into your list (packet queue) handling code: Log everything with
processor/thread IDs included and watch every operation on the packet
objects and packet queue itself. Have you done that?
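A rough sketch of what that suggestion could look like: a fixed-size, lock-free in-memory trace that records every operation on a packet together with the processor and thread that performed it, so the most recent events can be inspected from the debugger after a crash. This is a user-mode model under stated assumptions: TRACE_REC, trace(), TRACE_SIZE and the OP_* codes are hypothetical names, and the CPU and thread values are passed in by the caller (in a driver they might come from KeGetCurrentProcessorNumber() and PsGetCurrentThread()).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical operation codes for the packet/queue life cycle. */
enum trace_op { OP_ALLOC = 1, OP_ENQUEUE, OP_DEQUEUE, OP_FREE };

typedef struct TRACE_REC {
    unsigned    seq;     /* global order of events */
    int         op;      /* one of trace_op */
    const void *object;  /* the packet the operation touched */
    unsigned    cpu;     /* processor that performed the operation */
    uintptr_t   thread;  /* thread that performed the operation */
} TRACE_REC;

#define TRACE_SIZE 256u   /* power of two so the index can wrap with a mask */

static TRACE_REC   g_trace[TRACE_SIZE];
static atomic_uint g_trace_next;   /* static storage: starts at zero */

/* Each caller atomically claims a unique sequence number and slot; the ring
   keeps the most recent TRACE_SIZE events. */
static void trace(int op, const void *object, unsigned cpu, uintptr_t thread) {
    unsigned seq = atomic_fetch_add(&g_trace_next, 1u);
    TRACE_REC *r = &g_trace[seq & (TRACE_SIZE - 1u)];
    r->seq = seq;
    r->op = op;
    r->object = object;
    r->cpu = cpu;
    r->thread = thread;
}
```

If writers ever wrap all the way around the ring while another write to the same slot is still in progress, that record can be torn; for a post-mortem diagnostic trace this is usually an acceptable trade-off against taking a lock on every event.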

Mr. Prochniak suggested an important debugging technique of ‘self
verification’ where you embed a unique type signature into your object as an
aid in validating it is not corrupt. Think PoolTag.
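The 'self verification' idea, sketched in user-mode C under assumptions: PACKET_S, packet_alloc, packet_is_valid, packet_free and the two signature values are hypothetical. The first field carries a live marker that is overwritten just before the memory is released, so a stale or foreign pointer fails the check at the next hand-off instead of being silently used. In a driver the allocation would come from ExAllocatePoolWithTag and the check would be an ASSERT.

```c
#include <assert.h>
#include <stdlib.h>

#define PKT_SIG_LIVE  0x50434B54u  /* arbitrary "live packet" marker */
#define PKT_SIG_FREED 0xDEADBEEFu  /* written just before the memory is released */

typedef struct PACKET_S {
    unsigned signature;   /* first field, so it is easy to eyeball in a dump */
    int direction;
} PACKET_S;

static PACKET_S *packet_alloc(int direction) {
    PACKET_S *p = calloc(1, sizeof(*p));
    if (p != NULL) {
        p->signature = PKT_SIG_LIVE;
        p->direction = direction;
    }
    return p;
}

/* Call at every hand-off point (enqueue, dequeue, before use). */
static int packet_is_valid(const PACKET_S *p) {
    return p != NULL && p->signature == PKT_SIG_LIVE;
}

static void packet_free(PACKET_S *p) {
    assert(packet_is_valid(p));       /* catches a double free of a still-mapped block */
    p->signature = PKT_SIG_FREED;     /* memory reused while mapped now fails validation */
    free(p);
}
```

With special pool the freed page is typically made inaccessible, so the signature mostly helps in the other case: memory that has been freed and reused but is still mapped.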

As I read and re-read the posts in this thread I had the following thoughts
(as a network driver guy):

Are you sure you have claimed ownership of the packet from the callout such
that you even have the right to queue it?

Have you considered all paths (especially external paths) that might access
the lists and/or list elements?

Does your design treat the lists purely as queues? By this I mean that you
cannot ‘touch’ the packet in any way when it is on the queue. The only
operations are to enqueue or dequeue it. The lock then would only be used to
synchronize access to the list head (queue) and list entry fields in the
packet *and nothing else*. If that is the case, how are you synchronizing
access to the packet itself and especially its deletion? Does your design
explicitly imply that only a single activity can own a packet reference
(either the creator, the queue, or the consumer)?
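One way to make the single-owner rule explicit, sketched in user-mode C: each packet records who currently owns it, and every hand-off states who the caller believes the owner is. OWNER, PACKET_O and packet_transfer are hypothetical names. The sketch does no synchronization of its own; in the driver, transitions into and out of the queue would be made while holding the list's spin lock.

```c
#include <assert.h>

/* Hypothetical owner states: exactly one party owns a packet at a time. */
typedef enum { OWN_CREATOR, OWN_QUEUE, OWN_CONSUMER, OWN_FREED } OWNER;

typedef struct PACKET_O {
    OWNER owner;
} PACKET_O;

/* Moves ownership from expected to next. Returns 0 and changes nothing if
   the caller did not actually own the packet (in a driver: ASSERT here). */
static int packet_transfer(PACKET_O *p, OWNER expected, OWNER next) {
    if (p->owner != expected) {
        return 0;
    }
    p->owner = next;
    return 1;
}
```

A failed transfer points straight at the code path that thought it owned a packet it did not, which is exactly the kind of bug that leaves a freed packet on a list.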

I find that encapsulating all of the queue access into FORCEINLINE routines
instead of scattering CONTAINING_RECORD(), IsListEmpty(), and other code
throughout the other routines is a handy way of making an OOP design retain
some OO. It is also easier to insert the instrumentation because you have a
single (source code) point of access to key data structures and operations
(like enqueue, dequeue) where validation of these structures can be done at
access time.
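A minimal sketch of that encapsulation, assuming a queue object that bundles the list head with an element count and is only touched through a few inline helpers. PKT_QUEUE, ENTRY, queue_init, queue_push and queue_pop are hypothetical; static inline stands in for FORCEINLINE, and the spin lock acquire/release is left as a comment so the model stays user-mode. Self-linking an entry when it is popped also gives a cheap way to check later whether a packet is still queued.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ENTRY { struct ENTRY *Flink, *Blink; } ENTRY;

/* Hypothetical queue object: the list head and its element count live together
   and are only touched through the helpers below. */
typedef struct PKT_QUEUE {
    ENTRY head;
    long  count;
} PKT_QUEUE;

static inline void queue_init(PKT_QUEUE *q) {
    q->head.Flink = q->head.Blink = &q->head;
    q->count = 0;
}

/* In the driver, acquire the queue's spin lock at the top of each helper
   and release it before returning. */
static inline void queue_push(PKT_QUEUE *q, ENTRY *e) {
    e->Flink = &q->head;
    e->Blink = q->head.Blink;
    q->head.Blink->Flink = e;
    q->head.Blink = e;
    q->count++;
}

/* Removes and returns the oldest entry, or NULL when empty. The removed entry
   is left pointing at itself. */
static inline ENTRY *queue_pop(PKT_QUEUE *q) {
    ENTRY *e = q->head.Flink;
    if (e == &q->head) {
        assert(q->count == 0);       /* empty list must agree with the count */
        return NULL;
    }
    e->Flink->Blink = &q->head;
    q->head.Flink = e->Flink;
    e->Flink = e->Blink = e;
    q->count--;
    assert(q->count >= 0);
    return e;
}
```

Because every list operation goes through these helpers, adding validation or tracing later means changing three functions rather than hunting for every IsListEmpty and CONTAINING_RECORD in the driver.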

Detecting corruption of a queue based on LIST_ENTRY is pretty straightforward.
It requires that you keep a counter of the number of elements you
*think* are in the queue and simply a routine which walks the list
validating what it finds. If it walks more elements than are supposed to be
in the list, it should ASSERT(). If it does not find enough, it should
ASSERT(). If it finds an element that does not make sense (type check
fails) it should ASSERT(). Obviously if it runs across trash and causes a
bugcheck, you have found trash.
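A sketch of that validation routine, under the assumption that each element embeds a link and a signature field; LINK, ITEM, ITEM_SIG, ITEM_FROM_LINK and list_validate are hypothetical names. It returns 0 rather than asserting so the behaviour can be exercised in user mode; in a driver you would call it with the queue lock held and ASSERT on the result.

```c
#include <assert.h>
#include <stddef.h>

typedef struct LINK { struct LINK *Flink, *Blink; } LINK;

#define ITEM_SIG 0x4954454Du   /* arbitrary marker for a live item */

typedef struct ITEM {
    unsigned signature;
    LINK     link;
} ITEM;

#define ITEM_FROM_LINK(l) ((ITEM *)((char *)(l) - offsetof(ITEM, link)))

/* Walks the list and checks it against the count we believe is correct.
   Returns 1 if consistent, 0 otherwise. */
static int list_validate(const LINK *head, long expected) {
    long seen = 0;
    const LINK *prev = head;
    for (const LINK *l = head->Flink; l != head; l = l->Flink) {
        if (++seen > expected) return 0;                        /* too many, or a cycle */
        if (l->Blink != prev) return 0;                         /* back link broken */
        if (ITEM_FROM_LINK(l)->signature != ITEM_SIG) return 0; /* not one of ours */
        prev = l;
    }
    return seen == expected && head->Blink == prev;             /* too few, or bad tail */
}
```

Stopping as soon as more elements are seen than expected also keeps the walk from spinning forever on a corrupted, cyclic list.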

Interlocked{Increment|Decrement} counters on key operations are very useful
when trying to prove that your async allocation/queue/process/deallocation
routines are working. Count the number of packets allocated. Count how
many are queued. Count how many are dequeued. Count how many are in the
queue. Count how many are processed. Count how many are freed. When the
crash occurs, look at the counts. Do they add up?
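The counter idea in user-mode C11, with atomics standing in for InterlockedIncrement on LONG globals; the counter names and helper functions are hypothetical. The invariant check is only meaningful when nothing is in flight (for example, when reading the values out of a crash dump), since the individual counters are updated independently of one another.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical life-cycle counters (static storage: all start at zero). */
static atomic_long g_allocated, g_queued, g_dequeued, g_freed;

static void on_alloc(void)   { atomic_fetch_add(&g_allocated, 1); }
static void on_enqueue(void) { atomic_fetch_add(&g_queued, 1); }
static void on_dequeue(void) { atomic_fetch_add(&g_dequeued, 1); }
static void on_free(void)    { atomic_fetch_add(&g_freed, 1); }

static long in_queue_now(void) {
    return atomic_load(&g_queued) - atomic_load(&g_dequeued);
}

/* Nothing freed more often than allocated, the queue never negative, and
   never more packets queued than are still alive. */
static int counts_add_up(void) {
    long live = atomic_load(&g_allocated) - atomic_load(&g_freed);
    long queued = in_queue_now();
    return live >= 0 && queued >= 0 && queued <= live;
}
```

In particular, queued > live in a dump would mean at least one packet was freed while the counters say it was still on the list.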

I don’t think anyone here is trying to insult you. IMHO this thread has
been chock-full of useful stuff.

Good Luck,
Dave Cattley
Consulting Engineer
Systems Software Development

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@yahoo.co.uk
Sent: Monday, November 10, 2008 5:03 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] OK What now


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Ken_Johnson · November 10, 2008, 2:18pm

Please step back for a moment, and take a deep breath.

This is a volunteer mailing list and not a paid support group - and it’s much more difficult to troubleshoot your problem with the limited visibility into the issue that we have versus what you have available to you. Thus, we are relying on you to tell us what we need to know, as we don’t have the source in front of us to make inferences from. In some cases, this means we’re only able to offer general advice.

Now… is the code you posted the *only* place that references those two linked lists?

The thought pattern that you should be using here is that something is freeing the “packet” object before you get to the “if (packet->direction == FWP_DIRECTION_INBOUND)” test. It might, thus, be more productive to start working backwards from all the places where you might free a packet back to the pool and see if there’s any way that a packet could get wrongly freed while it’s still in the linked list.

Based on the limited information that I have available to me here, I’d say the most probable cause is that the packet is being freed to the pool while still being used, given where the crash happened with special pool. At this point, the problem really is in your court, for the most part, as we don’t have the source code, and we can’t thus look through all the logic that might free a packet to make sure it doesn’t have a bug. If I were in your position, that would be my next step.
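One way to act on this advice is to funnel every free through a single routine that refuses to release a packet that is still linked into a queue. That requires removal to leave the entry pointing at itself (RemoveEntryList does not do this for you; calling InitializeListHead on the entry afterwards is one way), and newly allocated packets need a self-linked entry too. The names below (NODE, PKT, pkt_unlink, pkt_is_linked, pkt_free) are hypothetical, and the code is a user-mode model, not the driver's actual free routine.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct NODE { struct NODE *Flink, *Blink; } NODE;

typedef struct PKT {
    NODE listEntry;
    int  direction;
} PKT;

/* Unlink, then point the entry at itself so "still on a list?" is answerable. */
static void pkt_unlink(PKT *p) {
    p->listEntry.Blink->Flink = p->listEntry.Flink;
    p->listEntry.Flink->Blink = p->listEntry.Blink;
    p->listEntry.Flink = p->listEntry.Blink = &p->listEntry;
}

static int pkt_is_linked(const PKT *p) {
    return p->listEntry.Flink != &p->listEntry;
}

/* The one and only place packets are freed. Returns 0 (and frees nothing)
   if the packet is still queued, which is the bug being hunted; in a driver
   this is where an ASSERT or DbgBreakPoint would go. */
static int pkt_free(PKT *p) {
    if (pkt_is_linked(p)) {
        return 0;
    }
    free(p);
    return 1;
}
```

Once every free goes through one function, a breakpoint on its failure path stops the machine at the wrong free itself, rather than later at the first use of the stale list entry.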

S
