WINDBG & expectations around paged out memory

Hi,

I have an existing WFP callout driver that I am adding new functionality to. It has two similar-looking pieces of code in separate areas of the driver:

    // 1. Is in production and has been happily running on 32/64 bit platforms Win7-->Win10 for a year or so. And was tested with verifier etc.
    ULONG const sidSize = RtlLengthSid(pTokenAccessInfo->SidHash->SidAttr->Sid);
    pContext->userSid = (PSID)ExAllocatePoolWithTag(NonPagedPoolNx, sidSize, HANDLERS_TAG);
    RtlCopySid(sidSize, pContext->userSid, pTokenAccessInfo->SidHash->SidAttr->Sid);
    // 2. New code in development
    BYTE userSid[SECURITY_MAX_SID_SIZE];
    ULONG const sidLength = RtlLengthSid(tokenAccess.SidHash->SidAttr->Sid);
    RtlCopySid(sidLength, userSid, tokenAccess.SidHash->SidAttr->Sid);

I am attempting to flatten the SID structure into a buffer that is big enough for any SID, to make it easier to pass to user mode (Inverted Call & DeviceIoControl). My problem is that 2. will intermittently crash (roughly once every 10 minutes). When it does, it is consistently in the RtlCopySid call. Looking at !analyze -v and re-reading the docs makes the root cause apparent: I’m calling RtlCopySid at DISPATCH_LEVEL when it has a requirement of <= APC_LEVEL.

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: fffff805330ef850, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000008, bitfield :
	bit 0 : value 0 = read operation, 1 = write operation
	bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff805330ef850, address which referenced memory

Debugging Details:
------------------

CURRENT_IRQL:  2

FAULTING_IP: 
nt!RtlCopySid+0
fffff805`330ef850 ??              ???

IP_IN_PAGED_CODE: 
nt!RtlCopySid+0
fffff805`330ef850 ??              ???

FAILED_INSTRUCTION_ADDRESS: 
nt!RtlCopySid+0
fffff805`330ef850 ??              ???

STACK_TEXT:  
fffff805`35b3a988 fffff805`32d53dc2 : fffff805`330ef850 00000000`00000003 fffff805`35b3aaf0 fffff805`32bd14d0 : nt!DbgBreakPointWithStatus
fffff805`35b3a990 fffff805`32d534b7 : 00000000`00000003 fffff805`35b3aaf0 fffff805`32c831a0 00000000`0000000a : nt!KiBugCheckDebugBreak+0x12
fffff805`35b3a9f0 fffff805`32c6ec27 : 00000000`00000001 00000000`00000001 ffff8c8c`fef903c0 fffff805`35b3b2a0 : nt!KeBugCheck2+0x947
fffff805`35b3b0f0 fffff805`32c80929 : 00000000`0000000a fffff805`330ef850 00000000`00000002 00000000`00000008 : nt!KeBugCheckEx+0x107
fffff805`35b3b130 fffff805`32c7cc69 : fffff805`35b3b397 00000000`00000000 00000000`35b30340 00000000`00000042 : nt!KiBugCheckDispatch+0x69
fffff805`35b3b270 fffff805`330ef850 : fffff805`31d71a73 fffff805`35b3b820 fffff805`32bd14d0 ffff8c8c`fef903c0 : nt!KiPageFault+0x469
fffff805`35b3b408 fffff805`31d71a73 : fffff805`35b3b820 fffff805`32bd14d0 ffff8c8c`fef903c0 00000000`00000030 : nt!RtlCopySid
fffff805`35b3b410 fffff805`31d700e5 : fffff805`35b3bf88 ffff8c8c`ff3960c0 fffff805`31d70002 fffff805`35b3b530 : XXXDrv!_PopulateBlockEvent+0x93 [WFPHandler.cpp @ 1057] 
fffff805`35b3b4a0 fffff805`34fab17d : fffff805`35b3bf88 ffff8c8c`ff188d50 ffff8c8c`ff0738e0 00000000`00000000 : XXXDrv!WfpHandlerClassifyAuthConnect+0x7a5 [WFPHandler.cpp @ 744] 
fffff805`35b3b720 fffff805`34fad32a : 00000000`00000030 fffff805`35b3bf88 ffff8c8c`ff188d50 ffff8c8c`ff0738e0 : NETIO!ProcessCallout+0x6fd
fffff805`35b3b8a0 fffff805`34fa9ce1 : ffff8c8d`000229b0 fffff805`35b3bf88 00000000`00000000 fffff805`35b3bca0 : NETIO!ArbitrateAndEnforce+0xc3a
fffff805`35b3ba30 fffff805`350cb880 : 00000000`00000000 00000000`00000030 ffff8c8c`ff3889c8 00000000`00000000 : NETIO!KfdClassify+0x5a1
fffff805`35b3be40 fffff805`350c8685 : ffff8c8c`ff36dcc0 fffff805`35b3c480 fffff805`35b3cb10 00000000`00000000 : tcpip!WfpAlepAuthorizeSend+0x9cc
fffff805`35b3c320 fffff805`3509c1ac : fffff805`32fd11e4 fffff805`35b3c8e8 00000000`00000000 00000000`00000000 : tcpip!WfpAleAuthorizeSend+0x509
fffff805`35b3c7d0 fffff805`3509b05b : 00000000`00000000 00000000`00000000 ffff8c8c`ff5d5760 ffff8c8c`ff4cc281 : tcpip!ProcessALEForTransportPacket+0x6dc
fffff805`35b3ca10 fffff805`3509e773 : ffff8c8c`fffaa5ff 00000000`00000002 ffff8c8c`000fe95e 00000000`00000020 : tcpip!WfpProcessOutTransportStackIndication+0x3eb
fffff805`35b3cd20 fffff805`3509d801 : 00000000`00000000 fffff805`35b3d140 ffff8c8c`ff388ca8 fffff805`3527d2e0 : tcpip!IppInspectLocalDatagramsOut+0x763
fffff805`35b3d040 fffff805`351198b2 : fffff805`3527d200 fffff805`35b3d290 fffff805`3527d2e0 ffff8c8c`ff36dcc0 : tcpip!IppSendDatagramsCommon+0x391
fffff805`35b3d1c0 fffff805`3511902c : 00000000`00000000 ffff8c8c`ff1c57c0 ffff8c8c`ff1c57b0 ffff8c8c`ff2c8b70 : tcpip!IppProcessMulticastDiscoveryTimeoutEvents+0x3d6
fffff805`35b3d640 fffff805`350b1c55 : ffff8c8c`ff2c8b70 0000000f`77490044 ffff8c8c`ff2c8b70 00000000`00000000 : tcpip!IppMulticastDiscoveryTimeout+0x18
fffff805`35b3d670 fffff805`350ece0e : ffff8c8c`ff1c5720 ffff8c8c`ff39d370 00000000`00002f00 ffff8c8c`ff39d370 : tcpip!Ipv4pInterfaceSetTimeout+0x285
fffff805`35b3d700 fffff805`32b6e559 : 00000000`00000004 fffff805`35287140 fffff805`3298a180 0000000e`00000002 : tcpip!IppTimeout+0x7ce
fffff805`35b3d8c0 fffff805`32b6d2b9 : 00000000`00000008 00000000`00989680 00000000`0003855f 00000000`00000077 : nt!KiProcessExpiredTimerList+0x169
fffff805`35b3d9b0 fffff805`32c7264e : 00000000`00000000 fffff805`3298a180 fffff805`3303a400 ffff8c8d`047e8080 : nt!KiRetireDpcList+0x4e9
fffff805`35b3dbe0 00000000`00000000 : fffff805`35b3e000 fffff805`35b37000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x7e

My questions are as follows:

  1. Am I correct in my conclusion that this is caused by the userSid memory being paged out during the RtlCopySid call?
  2. Is the reason my first example works due to the use of NonPagedPool?
  3. When I navigate to the frame that calls RtlCopySid in WINDBG I can inspect all relevant bits of memory, and everything looks normal. Is this expected? I would have thought that if the memory was paged out it would display ?? or something equivalent.
  4. Is anyone aware of any other mechanism to copy SIDs at DISPATCH_LEVEL?

Thanks for your time,
Jason

Well, you’re definitely taking a page fault at elevated IRQL in that crash… But I seriously doubt the kernel-mode stack (the target for your copy) is being paged out.

It looks to ME like the RtlCopySid function itself is paged out. Could that be the case??

Is the source “tokenAccess.SidHash->SidAttr->Sid” in non-paged memory?

RtlCopySid is just a fancy name for memmove… there’s nothing “special” about it. You can confirm this if you step into the function. So, if you know for sure that the source and the destination are both in non-pageable memory, you could just memcpy/memmove the SID. No need to use the clever wrapper function.
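Following the suggestion above, a bounded flat copy might look roughly like the sketch below. This is a user-mode illustration, not driver code: the names demo_copy_sid_flat and DEMO_MAX_SID_SIZE and the int return convention are made up for the example, and it assumes the caller already knows the source length and that both buffers are resident. In a driver the copy itself would be RtlCopyMemory or memmove on non-paged buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant for this sketch; 68 is the value of
 * SECURITY_MAX_SID_SIZE (8-byte header + 15 sub-authorities * 4 bytes). */
#define DEMO_MAX_SID_SIZE 68u

/* Hypothetical helper: copies src_len bytes of a flattened SID into dst.
 * Returns 0 on success, -1 if the arguments are invalid or dst is too small.
 * It calls nothing but memmove, so it has no pageable-code dependency of
 * its own; the caller must still guarantee that src and dst are resident. */
int demo_copy_sid_flat(void *dst, size_t dst_size,
                       const void *src, size_t src_len)
{
    if (dst == NULL || src == NULL) {
        return -1;
    }
    /* A SID is never shorter than its 8-byte header. */
    if (src_len < 8 || src_len > dst_size) {
        return -1;
    }
    memmove(dst, src, src_len);
    return 0;
}
```

With a destination declared as a DEMO_MAX_SID_SIZE byte array, any valid SID fits, which is the same idea as the BYTE userSid[SECURITY_MAX_SID_SIZE] buffer in the original question.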

Peter

Thanks for getting back to me so quickly.

I’ve spent the day confirming what you suspected: it is RtlCopySid itself that is being paged out. After getting more familiar with this kind of issue, I can now see this was apparent in the dump I previously posted (the return address of KiPageFault is the same as the bugcheck P1 parameter, RtlCopySid).

It is my understanding that the SID structure should be treated as opaque and accessed using the documented functions with PSID indirection (void*). This made me wonder how I would go about getting the length of the SID to copy without calling RtlLengthSid - as this is also documented as being <= APC_LEVEL.

During my testing (when RtlCopySid was failing) I never had an issue with RtlLengthSid, and on further investigation I found that there does not appear to be a <= APC_LEVEL requirement on it in the SAL header definitions for the two functions, as follows:

#if (NTDDI_VERSION >= NTDDI_WIN2K)
NTSYSAPI
_Post_satisfies_(return >= 8 && return <= SECURITY_MAX_SID_SIZE)
ULONG
NTAPI
RtlLengthSid (
    _In_ PSID Sid
    );
#endif
#if (NTDDI_VERSION >= NTDDI_WIN2K)
_IRQL_requires_max_(APC_LEVEL)
NTSYSAPI
NTSTATUS
NTAPI
RtlCopySid (
    _In_ ULONG DestinationSidLength,
    _Out_writes_bytes_(DestinationSidLength) PSID DestinationSid,
    _In_ PSID SourceSid
    );
#endif

Combined with the unofficial article here: https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/rtl/sertl/lengthsid.htm I’m left wondering whether I can get away with using RtlLengthSid given I only need to support Win7+ (and server equivalents). I could also use the mentioned SeLengthSid, but I presume there was a good reason for adding RtlLengthSid in the first place.

This leaves me with a couple of questions:

  1. If the docs say <= APC_LEVEL and the headers don’t - who should I trust and/or how can I confirm?
  2. Has anyone had any positive experiences chasing MS to clarify documentation?
  3. I have SAL decorations on my function that called RtlCopySid, indicating it is called at DISPATCH_LEVEL. Should this not have told me at compile/verification time that I was potentially using a function at the wrong IRQL?

As always, I appreciate your time,
Jason

  1. Tough call… Believe the headers, I say. But… (see next answer)
  2. YES! File a bug on the doc page… it’s all in GitHub now. The doc writers can be VERY responsive to such issues. Simply tell them that constraints in the doc regarding APC don’t match the header constraints, and could they please clarify. In MY experience, they’ll often get you an answer within a week or two.
  3. Yes… But be sure you have these checks enabled in CA.

Peter

The kernel documentation pages are all in a GitHub repository. You can submit bug reports and even file pull requests to submit your own corrections. I’ve submitted a number of corrections, and all have been incorporated.

The SAL decorations are transparent to the compiler – they compile to nothing. They’re handled by the static code analysis. Are you running the static code analyzer?

“They’re handled by the static code analysis.”

That’s “Enable Code Analysis on Build” in your VS project that Mr. Roberts is referring to.

Also, you have to be SURE that the ruleset that you’ve selected actually does what you expect (and checks the constraints you expect). The default rulesets leave out a lot of otherwise useful checks.

Peter

You could always use SeLengthSid (from NTIFS.H):

//++
//
//  ULONG
//  SeLengthSid(
//      _In_ PSID Sid
//      );
//
//  Routine Description:
//
//      This routine computes the length of a SID.
//
//  Arguments:
//
//      Sid - Points to the SID whose length is to be returned.
//
//  Return Value:
//
//      The length, in bytes of the SID.
//
//--

#define SeLengthSid( Sid ) \
    (8 + (4 * ((SID *)Sid)->SubAuthorityCount))
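To make the arithmetic in that macro concrete, here is a small user-mode sketch. DemoSid is a hypothetical stand-in that mirrors the documented SID layout (1-byte revision, 1-byte sub-authority count, 6-byte identifier authority, then 32-bit sub-authorities), and demo_sid_length is a made-up name; a driver would use the real SID type from the WDK headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SID layout, for illustration only. */
typedef struct DemoSid {
    uint8_t  Revision;
    uint8_t  SubAuthorityCount;
    uint8_t  IdentifierAuthority[6];
    uint32_t SubAuthority[1];   /* really SubAuthorityCount entries */
} DemoSid;

/* Maximum number of sub-authorities a SID can carry. */
#define DEMO_SID_MAX_SUB_AUTHORITIES 15u

/* Same arithmetic as the SeLengthSid macro: an 8-byte fixed header plus
 * 4 bytes per sub-authority. It only reads one byte of the SID and calls
 * no other routine. */
uint32_t demo_sid_length(const DemoSid *sid)
{
    assert(sid != NULL);
    return 8u + 4u * (uint32_t)sid->SubAuthorityCount;
}
```

For example, S-1-5-32-544 (BUILTIN\Administrators) has two sub-authorities, so its length is 8 + 4 * 2 = 16 bytes, and the 15 sub-authority maximum gives 8 + 4 * 15 = 68 bytes, which is SECURITY_MAX_SID_SIZE. Because the computation is a plain field read, there is no routine that could be paged out, but the SID being read must still be in resident memory when this runs at DISPATCH_LEVEL.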

Thanks for your help guys, I’ve added the above to the end of my (never ending) todo list!