Crash after SRB_FUNCTION_PNP in Storport Driver

Vincent_Jin Posts: 33
During driver install, I got the following stack. This happened after an SRB_FUNCTION_PNP request with SrbPnPFlags = 0x00000000.
This Storport driver is for a 16T device. If I reduce the size to 8T, the driver works fine.
The stack gives me no information related to our code. Is there any way to debug this further?

*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 0000000000000028, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, bitfield :
bit 0 : value 0 = read operation, 1 = write operation
bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff8019ef708e2, address which referenced memory

Debugging Details:
------------------


DUMP_CLASS: 1

DUMP_QUALIFIER: 0

BUILD_VERSION_STRING: 9600.17041.amd64fre.winblue_gdr.140305-1710

DUMP_TYPE: 0

BUGCHECK_P1: 28

BUGCHECK_P2: 2

BUGCHECK_P3: 0

BUGCHECK_P4: fffff8019ef708e2

READ_ADDRESS: 0000000000000028

CURRENT_IRQL: 2

FAULTING_IP:
nt!MiInsertIoSpaceMap+16a
fffff801`9ef708e2 48395a28 cmp qword ptr [rdx+28h],rbx

CPU_COUNT: 38

CPU_MHZ: 7d0

CPU_VENDOR: GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 3f

CPU_STEPPING: 2

CPU_MICROCODE: 6,3f,2,0 (F,M,S,R) SIG: 38'00000000 (cache) 38'00000000 (init)

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

BUGCHECK_STR: AV

PROCESS_NAME: System

ANALYSIS_SESSION_HOST: VINCE-PC

ANALYSIS_SESSION_TIME: 03-02-2018 17:30:45.0777

ANALYSIS_VERSION: 10.0.15063.400 amd64fre

TRAP_FRAME: ffffd00157429500 -- (.trap 0xffffd00157429500)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8019ef708e2 rsp=ffffd00157429690 rbp=ffffd00157429708
r8=fffff8019f13ab70 r9=00000000000c7840 r10=00000000000c7841
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na po cy
nt!MiInsertIoSpaceMap+0x16a:
fffff801`9ef708e2 48395a28 cmp qword ptr [rdx+28h],rbx ds:00000000`00000028=????????????????
Resetting default scope

LAST_CONTROL_TRANSFER: from fffff8019f060a46 to fffff8019efddb90

STACK_TEXT:
ffffd001`57428c08 fffff801`9f060a46 : 00000000`00000000 00000000`00000000 ffffd001`57428d70 fffff801`9eecd8cc : nt!DbgBreakPointWithStatus
ffffd001`57428c10 fffff801`9f060357 : 00000000`00000003 ffffd001`57428d70 fffff801`9efe4f80 00000000`0000000a : nt!KiBugCheckDebugBreak+0x12
ffffd001`57428c70 fffff801`9efd70a4 : 00000000`00000001 ffffd001`57429898 00000000`00000000 ffffffff`80000300 : nt!KeBugCheck2+0x8ab
ffffd001`57429380 fffff801`9efe2ae9 : 00000000`0000000a 00000000`00000028 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx+0x104
ffffd001`574293c0 fffff801`9efe133a : 00000000`00000000 00000000`00000001 00000000`00000000 ffffd001`57429500 : nt!KiBugCheckDispatch+0x69
ffffd001`57429500 fffff801`9ef708e2 : 00000000`0000014b ffffd001`580af000 00000000`00000020 00000000`000007ff : nt!KiPageFault+0x23a
ffffd001`57429690 fffff801`9ef703da : ffffd001`580af000 00000000`000c7840 00000000`00000023 00000000`00000000 : nt!MiInsertIoSpaceMap+0x16a
ffffd001`57429750 fffff801`9ef7014c : 00000000`00000000 ffffc001`8ca3b7b4 ffffc001`8ca3b7a0 00000000`00001000 : nt!MiMapIoSpace+0x286
ffffd001`57429860 fffff801`57f0d7b5 : fffff801`000008d0 ffffe000`00001003 ffffc001`8ca3b818 ffffc001`8ca3b7a0 : nt!MmMapIoSpace+0xc
ffffd001`57429890 fffff801`57f0a311 : ffffe000`627c09d0 ffffe00c`6dcd2730 00000000`003dec10 ffffd001`57429960 : pci!PciProcessStartResources+0x2765
ffffd001`57429910 fffff801`57ee5996 : ffffe000`627e8928 fffff801`57a249b0 00000000`00000000 00000000`00000000 : pci!PciDevice_Start+0x101
ffffd001`57429a90 fffff801`57a82bf0 : ffffe000`627e8928 ffffe000`6232e028 00000000`00000000 00000000`00000000 : pci!PciDispatchPnpPower+0x96
ffffd001`57429ad0 fffff801`9eed6adb : 00000000`00000000 00000000`00000000 fffff801`57a82b00 ffffd001`57429bd0 : ACPI!ACPIFilterIrpStartDeviceWorker+0xf0
ffffd001`57429b50 fffff801`9ef52794 : 00000000`00000000 ffffe000`63573880 ffffe000`63573880 ffffe000`61ba6900 : nt!ExpWorkerThread+0x293
ffffd001`57429c00 fffff801`9efdd5c6 : ffffd001`541c7180 ffffe000`63573880 ffffd001`541d3fc0 00000000`00000000 : nt!PspSystemThreadStartup+0x58
ffffd001`57429c60 00000000`00000000 : ffffd001`5742a000 ffffd001`57424000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16


STACK_COMMAND: kb

THREAD_SHA1_HASH_MOD_FUNC: 46fe3e28d5234cc637b3ee8135aeb41d83e8881e

THREAD_SHA1_HASH_MOD_FUNC_OFFSET: 2976d692767c4598a9723aab8f2f7a8dde19c594

THREAD_SHA1_HASH_MOD: 6cfd8d23c7422c9d585af32c61b05905a0dd1e59

FOLLOWUP_IP:
pci!PciProcessStartResources+2765
fffff801`57f0d7b5 49894510 mov qword ptr [r13+10h],rax

FAULT_INSTR_CODE: 10458949

SYMBOL_STACK_INDEX: 9

SYMBOL_NAME: pci!PciProcessStartResources+2765

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: pci

IMAGE_NAME: pci.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 53089439

IMAGE_VERSION: 6.3.9600.17031

BUCKET_ID_FUNC_OFFSET: 2765

FAILURE_BUCKET_ID: AV_pci!PciProcessStartResources

BUCKET_ID: AV_pci!PciProcessStartResources

PRIMARY_PROBLEM_CLASS: AV_pci!PciProcessStartResources

TARGET_TIME: 2018-03-02T09:27:54.000Z

OSBUILD: 9600

OSSERVICEPACK: 0

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK: 272

PRODUCT_TYPE: 3

OSPLATFORM_TYPE: x64

OSNAME: Windows 8.1

OSEDITION: Windows 8.1 Server TerminalServer SingleUserTS

OS_LOCALE:

USER_LCID: 0

OSBUILD_TIMESTAMP: 2014-03-06 13:18:55

BUILDDATESTAMP_STR: 140305-1710

BUILDLAB_STR: winblue_gdr

BUILDOSVER_STR: 6.3.9600.17041.amd64fre.winblue_gdr.140305-1710

ANALYSIS_SESSION_ELAPSED_TIME: 647

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:av_pci!pciprocessstartresources

FAILURE_ID_HASH: {233d600d-cab5-c458-a35a-b1b07a268848}

Followup: MachineOwner
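
As an aside on reading the arguments above: Arg3 of IRQL_NOT_LESS_OR_EQUAL is a bitfield, and the dump text documents bit 0 (write) and bit 3 (execute). A minimal C sketch of that decoding follows. The `access_info` type and the `decode_access` name are made up for illustration and are not part of any Windows API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: not a Windows structure. */
typedef struct access_info {
    bool is_write;    /* bit 0: 0 = read operation, 1 = write operation */
    bool is_execute;  /* bit 3: 0 = not an execute operation, 1 = execute */
} access_info;

/* Decode the Arg3 bitfield as described in the bugcheck text. */
static access_info decode_access(uint64_t arg3)
{
    access_info info;
    info.is_write = (arg3 & 0x1u) != 0;
    info.is_execute = (arg3 & 0x8u) != 0;
    return info;
}
```

For this dump Arg3 is 0, so `decode_access(0)` reports a read that is not an execute, which agrees with the READ_ADDRESS line.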

Comments

  • Scott_Noone Posts: 2,989
    If I'm looking at the same version as you, here's the assembly leading up to
    your faulting instruction:

    fffff801`ed710645 mov eax,dword ptr [nt!MmIoHeaderData+0x38 (fffff801`ed955b78)]
    fffff801`ed71064b mov rdx,qword ptr [nt!MmIoHeader (fffff801`ed955af0)]
    fffff801`ed710652 mov dword ptr [rbp-58h],eax

    nt!MiInsertIoSpaceMap+0x159:
    fffff801`ed710655 lea rax,[nt!MmIoHeader (fffff801`ed955af0)]

    nt!MiInsertIoSpaceMap+0x160:
    fffff801`ed71065c cmp rdx,rax
    fffff801`ed71065f je nt!MiInsertIoSpaceMap+0x29f (fffff801`ed71079b)

    nt!MiInsertIoSpaceMap+0x169:
    fffff801`ed710665 cmp qword ptr [rdx+28h],rbx

    Which would mean that the pointer stored at nt!MmIoHeader is NULL (rdx is loaded from it and then dereferenced at +28h).

    I've never debugged anything down this path before so I don't know what that
    is, but this looks like the Flink of a global list is corrupt and set to
    NULL (and something around I/O space/device memory, which is suspicious).
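
    To make the reasoning concrete, here is a small user-mode C sketch of the list walk that the disassembly suggests. It is an illustration only: `io_node`, its padding, and the function names are invented, and the only detail taken from the disassembly is that a 64-bit field at offset 0x28 of the entry is compared (the layout assumes a 64-bit build).

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the kernel's LIST_ENTRY; redefined so the sketch builds anywhere. */
typedef struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
} list_entry;

/* Hypothetical node: only "a 64-bit field at +0x28" comes from the
 * disassembly; the padding is invented for illustration. */
typedef struct io_node {
    list_entry link;        /* +0x00 */
    uint64_t   unknown[3];  /* +0x10 .. +0x27 */
    uint64_t   key;         /* +0x28 */
} io_node;

typedef enum { WALK_NOT_FOUND, WALK_FOUND, WALK_CORRUPT } walk_result;

/* Address that `cmp qword ptr [rdx+28h],rbx` would touch for a given Flink. */
static uintptr_t compared_address(const list_entry *flink)
{
    return (uintptr_t)flink + offsetof(io_node, key);
}

/* Walk the circular list the way the disassembly appears to, but report a
 * NULL link instead of dereferencing it. */
static walk_result find_key(const list_entry *head, uint64_t key)
{
    const list_entry *e;
    for (e = head->Flink; e != head; e = e->Flink) {
        if (e == NULL)
            return WALK_CORRUPT;  /* the fragment shown has no such check */
        if (((const io_node *)e)->key == key)
            return WALK_FOUND;
    }
    return WALK_NOT_FOUND;        /* empty list or key absent */
}
```

    With a NULL Flink the head comparison does not match, so the walk proceeds to the field at +0x28, and `compared_address(NULL)` gives 0x28, the same value as Arg1 in the bugcheck.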

    I'd set a write access breakpoint on nt!MmIoHeader during boot and watch
    what happens in the crashing case versus the working case. You can also
    break if the value gets set to NULL:

    ba w8 nt!MmIoHeader ".echo \"Modified...\" ; dq nt!MmIoHeader L1; .if (poi(nt!MmIoHeader) != 0) {gc}"

    Not sure that's going to be useful, but it's something to try.

    Also, what do you mean by "16T" versus "8T"? Is that the storage size? Does
    that affect the size of the PCIe device memory presented to the host?

    -scott
    OSR
    @OSRDrivers
  • Jan_Bottorff Posts: 464
    I don't seem to have the original message describing this, but I know one silly reason for crashes in StorPort SRB_FUNCTION_PNP. If I'm totally out of context, ignore this.

    If the lower driver fails the PnP start IRP, StorPort still calls the miniport StartIO function with SRB_FUNCTION_PNP, except StorPort has never allocated/initialized the device context, so any accesses crash. The docs do not spell out the need to ignore any call to StartIO with a null context, without completing the SRB. The docs say every SRB needs to get completed, which in this case is wrong.

    The underlying cause is that the PnP start failed, for example because the device vanished from the bus for a bit after PCI enumeration detected it. The WHQL tests can sometimes provoke this kind of behavior. You can also force it to happen by writing a little filter driver that fails PnP start on demand by changing the IRP result code as the IRP is going up the completion path.
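
    The guard Jan describes might look roughly like the following user-mode C sketch. Every type and function here (`mock_srb`, `mock_dev_ext`, `mock_start_io`, `mock_complete`) is a stand-in invented so the example builds without the WDK; none of them is the real StorPort API, and the return value in the null-context case is an assumption rather than something stated in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the real StorPort types so the sketch builds without the WDK. */
typedef enum {
    MOCK_SRB_FUNCTION_PNP,
    MOCK_SRB_FUNCTION_EXECUTE_SCSI
} mock_srb_function;

typedef struct mock_srb {
    mock_srb_function function;
    bool completed;              /* set when the miniport "completes" the SRB */
} mock_srb;

typedef struct mock_dev_ext {
    unsigned pnp_requests_seen;
} mock_dev_ext;

/* Hypothetical completion helper; a real miniport would notify StorPort here. */
static void mock_complete(mock_srb *srb)
{
    srb->completed = true;
}

/* Sketch of the guard: with no device context, touch nothing and do not
 * complete the SRB. Returning true here is an assumption, not from the thread. */
static bool mock_start_io(mock_dev_ext *ext, mock_srb *srb)
{
    if (ext == NULL)
        return true;

    if (srb->function == MOCK_SRB_FUNCTION_PNP)
        ext->pnp_requests_seen++;

    mock_complete(srb);
    return true;
}
```

    The point of the early return is that nothing behind the context pointer is read before the null check, and the SRB is left uncompleted in that one case.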

    Jan

    ---
    NTDEV is sponsored by OSR

    Visit the list online at: <http://www.osronline.com/showlists.cfm?list=ntdev>

    MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
    Details at <http://www.osr.com/seminars>

    To unsubscribe, visit the List Server section of OSR Online at <http://www.osronline.com/page.cfm?name=ListServer>
  • Vincent_Jin Posts: 33
    The OS version is 2008R2.

    I've done other tests. This crash happens on about 50% of driver installs.
    If the driver installs successfully, it works indefinitely. If it fails, it is always after an SRB_FUNCTION_PNP with the flags set to 0 and the action set to StorQueryCapabilities, before it starts to query the capacity.
    We can now conclude that this only happens with our PCIe Gen3 device. The same driver works fine with the Gen2 device. Block device size is not related.

    I also suspect that there's something wrong with the PnP function in the device.

  • Jan_Bottorff Posts: 464
    On the crash I was talking about, HwStartIO was called BEFORE HwStorFindAdapter. I saw it on some hardware during the WHQL tests for PCIe compliance. Since the system would crash due to a null memory access, it was not so apparent exactly where in the WHQL test it failed. Anything that causes a PnP start to fail could stimulate the behavior.

    Jan
