When will a packet be queued to the I/O completion port?

nektar80nektar80 Member Posts: 16

Assume we have:

  • The file is opened for asynchronous I/O.
  • An I/O completion port is associated with the file.
  • We invoke an asynchronous I/O operation on the file (ApcContext != 0).

So how do we detect whether a packet will be queued to the IOCP or not?

We always need to know this exactly, because we cannot free or dereference the IO_STATUS_BLOCK or OVERLAPPED (and possibly other resources) until the I/O has finished.
Usually this is done in the callback that is called when the packet is queued to the IOCP (we can service the IOCP ourselves, or use CreateThreadpoolIo or BindIoCompletionCallback).
If we know that no packet will be queued to the IOCP, we need to call this callback manually with the error status.
If we use a CreateThreadpoolIo callback, we need to call CancelThreadpoolIo if there will be no IOCP notification.

It looks like the documentation for StartThreadpoolIo and CancelThreadpoolIo gives the exact answer for when there will be no notification to the IOCP (and we must call CancelThreadpoolIo):

  • An overlapped (asynchronous) I/O operation fails (that is, the asynchronous I/O function call returns failure with an error code other than ERROR_IO_PENDING).

  • An asynchronous I/O operation returns immediately with success and the file handle associated with the I/O completion object has the notification mode FILE_SKIP_COMPLETION_PORT_ON_SUCCESS. The file handle will not notify the I/O completion port and the associated I/O callback function will not be called.

Because the Win32 layer usually sets the error code and returns FALSE when a Zw* API returns NT_ERROR(status) (except for the special case of STATUS_PENDING, which is converted to ERROR_IO_PENDING), for the NT layer this means:

  • if the API returns NT_ERROR(status) (status in the range [0xC0000000, 0xFFFFFFFF]), there will be no notification
  • otherwise, if the returned status is in the range [0, 0xC0000000), there will be one

The range [0x80000000, 0xC0000000) is a bit problematic, really.
If such a status is returned by the I/O manager because of invalid parameters, before the driver is actually called, there will be no notification.
The only case I know of here is STATUS_DATATYPE_MISALIGNMENT (0x80000002), returned for example from ZwNotifyChangeDirectoryFile (ReadDirectoryChangesW) if lpBuffer is not DWORD-aligned.
In this case there will be no completion.

Interestingly, the Win32 call

    OVERLAPPED ov{};
    ReadDirectoryChangesW(0, (void*)1, 1, TRUE, FILE_NOTIFY_VALID_MASK, 0, &ov, 0);

returns TRUE, because !NT_ERROR(STATUS_DATATYPE_MISALIGNMENT): the Win32 layer simply loses the STATUS_DATATYPE_MISALIGNMENT error.

On the other hand, if, say, NtQueryDirectoryFile returns STATUS_NO_MORE_FILES (0x80000006), there will be a notification.
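The ranges above can be sketched as a small classifier. This is only an illustration of the documented expectation, not of what the I/O manager really does: `NtStatus`, `Packet`, `Severity` and `DocumentedExpectation` are hypothetical names, the status values are copied locally so the snippet builds without Windows headers, and the warning range is reported as "unclear" because, as described above, it depends on who produced the status.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alias for illustration; the real NTSTATUS is a signed 32-bit LONG.
using NtStatus = std::uint32_t;

constexpr NtStatus kStatusSuccess              = 0x00000000; // STATUS_SUCCESS
constexpr NtStatus kStatusPending              = 0x00000103; // STATUS_PENDING
constexpr NtStatus kStatusDatatypeMisalignment = 0x80000002; // STATUS_DATATYPE_MISALIGNMENT
constexpr NtStatus kStatusNoMoreFiles          = 0x80000006; // STATUS_NO_MORE_FILES
constexpr NtStatus kStatusInvalidParameter     = 0xC000000D; // STATUS_INVALID_PARAMETER

enum class Packet { Yes, No, Unclear };

// The top two bits of an NTSTATUS encode the severity:
// 0 = success, 1 = informational, 2 = warning, 3 = error.
inline unsigned Severity(NtStatus s) { return s >> 30; }

// Expectation under the documented rule only
// (no completion-skip mode set, Fast I/O not taken into account).
inline Packet DocumentedExpectation(NtStatus s)
{
    if (s == kStatusPending) return Packet::Yes;   // completes later, a packet follows
    switch (Severity(s)) {
    case 3:  return Packet::No;       // NT_ERROR: no packet according to the documentation
    case 2:  return Packet::Unclear;  // warning: depends on where the status came from
    default: return Packet::Yes;      // success or informational
    }
}
```

With this model both STATUS_DATATYPE_MISALIGNMENT and STATUS_NO_MORE_FILES land in the "unclear" bucket, which is exactly the ambiguity: the first produces no completion, the second does.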

But this is wrong in the general case. There really can be a notification to the IOCP even in the NT_ERROR(status) case!
This is because Fast I/O, which can be invoked before an IRP is created, can return TRUE with an error as the final status.
For example, FastIoDeviceControl or FastIoLock can return TRUE (I/O finished) while the final status is an error.

For example, we can call LockFileEx or ZwLockFile on a directory: the I/O manager calls the file system's FastIoLock implementation (say, NtfsFastLock), and it just returns TRUE with STATUS_INVALID_PARAMETER.

So after such a call we get STATUS_INVALID_PARAMETER. By the documented rules there must be no notification in this case, but in the real world there will be!
Of course this is a rare situation, but it breaks the general rule.
Also, regarding SetFileCompletionNotificationModes: if we test FILE_SKIP_COMPLETION_PORT_ON_SUCCESS with LockFileEx, there will be no notification in this case.
So FILE_SKIP_COMPLETION_PORT_ON_SUCCESS prevents the notification not only on a successful return, but on any synchronous return.
A more correct name would therefore be FILE_SKIP_COMPLETION_PORT_ON_SYNCHRONOUS, and instead of
A request returns success immediately without returning ERROR_PENDING
the documentation should say
A request returns immediately without returning ERROR_PENDING
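The observations about LockFileEx can be written down as a small model. It is a sketch of the behaviour reported in this post, not of any documented contract: `PacketQueued` and the `completedByFastIoLock` flag are hypothetical, and a real caller has no way to observe that flag, which is the whole problem. The status values are copied locally so the snippet builds without Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alias for illustration; the real NTSTATUS is a signed 32-bit LONG.
using NtStatus = std::uint32_t;

constexpr NtStatus kStatusSuccess          = 0x00000000; // STATUS_SUCCESS
constexpr NtStatus kStatusPending          = 0x00000103; // STATUS_PENDING
constexpr NtStatus kStatusInvalidParameter = 0xC000000D; // STATUS_INVALID_PARAMETER

inline bool IsNtError(NtStatus s) { return (s >> 30) == 3; }

// completedByFastIoLock is a hypothetical input meaning "FastIoLock returned TRUE".
// User-mode code cannot see it; it exists only so the model can be exercised.
inline bool PacketQueued(NtStatus returned, bool skipOnSuccessMode, bool completedByFastIoLock)
{
    if (returned == kStatusPending) return true;  // asynchronous completion always queues a packet
    if (skipOnSuccessMode) return false;          // observed: any synchronous return is skipped
    if (completedByFastIoLock) return true;       // observed: queued regardless of the final status
    return !IsNtError(returned);                  // documented rule for everything else
}
```

For the directory example, `PacketQueued(kStatusInvalidParameter, false, true)` is true although the documented rule alone would say false.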

A POC with LockFileEx: https://pastebin.com/qWrMYvy4 (unfortunately too long to post here).

So how can we really detect, 100% reliably in the general case, whether there will be a notification or not?

I see the following solution (but maybe I am mistaken, or a better one exists, both for this and for the main question):

Set IO_STATUS_BLOCK.Status = STATUS_PENDING before the API call (the Win32 layer always does this, as far as I know).
Check IO_STATUS_BLOCK.Status == STATUS_PENDING after the API returns a code other than STATUS_PENDING (ERROR_IO_PENDING).
The idea is this: there will be a notification to the IOCP when and only when the I/O manager writes the status back to the user-mode IO_STATUS_BLOCK (if FILE_SKIP_COMPLETION_PORT_ON_SYNCHRONOUS is not in effect).
But we cannot simply access the IO_STATUS_BLOCK (or OVERLAPPED) after the API call, because another thread can concurrently execute the callback with this IO_STATUS_BLOCK and may already have freed it.
(This is like the I/O manager, which can no longer access the IRP after calling the driver if IRP_DEFER_IO_COMPLETION is not set in the flags: the IRP may already be completed and freed or reused.)
The solution here is reference counting on a structure that encapsulates the IO_STATUS_BLOCK (or OVERLAPPED).
We always create this structure (let's call it a user-mode IRP) with 2 references.
One reference is released after we check the result and the IO_STATUS_BLOCK of the API call (to detect whether we need to invoke the callback manually and call CancelThreadpoolIo);
the other is released in the callback.

class NT_IRP : public IO_STATUS_BLOCK 
{
    //...
    LONG m_dwRefCount; // starts at 2: one reference for the issuing thread, one for the callback

public:
    NT_IRP() : m_dwRefCount(2)
    {
        // pre-set, so that an untouched IO_STATUS_BLOCK can be recognized after the call
        Status = STATUS_PENDING, Information = 0;
    }

    void Release()
    {
        if (!InterlockedDecrement(&m_dwRefCount)) delete this;
    }

    // runs once per request: from the IOCP, or manually from CheckNtStatus
    VOID IOCompletionRoutine(NTSTATUS status, ULONG_PTR dwNumberOfBytesTransfered)
    {
        //..
        Release();
    }

    static VOID CALLBACK _IOCompletionRoutine(NTSTATUS status, ULONG_PTR dwNumberOfBytesTransfered, PVOID ApcContext)
    {
        // we must pass the NT_IRP pointer as ApcContext in the I/O call
        reinterpret_cast<NT_IRP*>(ApcContext)->IOCompletionRoutine(status, dwNumberOfBytesTransfered);
    }

    // call on the issuing thread right after the API returns
    void CheckNtStatus(NTSTATUS status, BOOL bSkippedOnSynchronous = FALSE)
    {
        // the API completed synchronously (status != STATUS_PENDING)
        // and
        // bSkippedOnSynchronous is set or the iosb was not modified (Status == STATUS_PENDING)
        if (status != STATUS_PENDING && (bSkippedOnSynchronous || Status == STATUS_PENDING))
        {
            IOCompletionRoutine(status, Information);
        }

        Release();
    }
};

or for win32 case

class WIN32_IRP : public OVERLAPPED 
{
    //...
    LONG m_dwRefCount; // starts at 2: one reference for the issuing thread, one for the callback

public:
    WIN32_IRP() : m_dwRefCount(2)
    {
        // pre-set, so that an untouched OVERLAPPED can be recognized after the call
        Internal = STATUS_PENDING, InternalHigh = 0;
    }

    void Release()
    {
        if (!InterlockedDecrement(&m_dwRefCount)) delete this;
    }

    // runs once per request: from the IOCP, or manually from CheckErrorCode
    VOID IOCompletionRoutine(NTSTATUS status, ULONG_PTR dwNumberOfBytesTransfered)
    {
        //...
        Release();
    }

    static VOID CALLBACK _IOCompletionRoutine(NTSTATUS status, ULONG_PTR dwNumberOfBytesTransfered, PVOID ApcContext)
    {
        // we must pass the WIN32_IRP pointer (as lpOverlapped) in the I/O call
        reinterpret_cast<WIN32_IRP*>(ApcContext)->IOCompletionRoutine(status, dwNumberOfBytesTransfered);
    }

    // call on the issuing thread right after the API returns
    void CheckErrorCode(ULONG dwErrorCode, BOOL bSkippedOnSynchronous = FALSE)
    {
        // the API completed synchronously (dwErrorCode != ERROR_IO_PENDING)
        // and
        // bSkippedOnSynchronous is set or the OVERLAPPED was not modified (Internal == STATUS_PENDING)
        if (dwErrorCode != ERROR_IO_PENDING && (bSkippedOnSynchronous || Internal == STATUS_PENDING))
        {
            // note: on this path the routine receives a Win32 error code, not an NTSTATUS
            IOCompletionRoutine(dwErrorCode, InternalHigh);
        }
        Release();
    }
};
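The ownership rule behind both classes can be exercised without Windows headers. The following is a portable sketch only: `IoBlock`, `Stats`, `OnComplete` and `AfterIssue` are hypothetical stand-ins for the classes above, `iosbStatus` plays the role of IO_STATUS_BLOCK.Status (in real code the kernel writes it), and std::atomic replaces InterlockedDecrement.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::int32_t kStatusPending = 0x00000103; // STATUS_PENDING

// Hypothetical counters so a caller can observe what happened to a block.
struct Stats {
    std::atomic<int> completions{0};
    std::atomic<int> destroyed{0};
};

struct IoBlock {
    std::atomic<int> refs{2};                  // issuing thread + completion callback
    std::int32_t iosbStatus = kStatusPending;  // models IO_STATUS_BLOCK.Status
    Stats* stats;

    explicit IoBlock(Stats* s) : stats(s) {}
    ~IoBlock() { ++stats->destroyed; }

    void Release()
    {
        int previous = refs.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0);                  // releasing more than twice is a bug
        if (previous == 1) delete this;
    }

    // Completion path: called by the port's callback, or manually by AfterIssue.
    void OnComplete()
    {
        ++stats->completions;
        Release();
    }

    // Issuing thread: call right after the API returns. The block is still alive here
    // because this thread holds its own reference, so reading iosbStatus is safe.
    void AfterIssue(std::int32_t returned, bool skippedOnSynchronous = false)
    {
        if (returned != kStatusPending &&
            (skippedOnSynchronous || iosbStatus == kStatusPending))
        {
            OnComplete();                      // nobody else will complete this request
        }
        Release();                             // drop the issuing thread's reference
    }
};
```

Whichever side finishes last deletes the block, so the order of the callback and the post-call check does not matter, and the completion logic runs exactly once per request.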

But this solution looks too complex. Did Microsoft forget the Fast I/O case?

Also, why are FastIoRead and FastIoWrite called only for synchronous I/O (so they cause no problems here), while FastIoLock and FastIoDetachDevice are called for any I/O?
From my point of view it would be more logical either to call FastIoLock and FastIoDetachDevice only for synchronous files as well, or to call FastIoRead and FastIoWrite for any I/O too.

Comments

  • Tim_RobertsTim_Roberts Member - All Emails Posts: 13,204

    I couldn't follow your rather long and rambling post, but I believe you are making this more complicated than it needs to be. It's certainly unclear when you're talking from user mode and when you're talking from kernel.

    If the original call fails immediately, so that the request never makes it into a driver, then there's no completion, and hence no IOCP packet. That's almost always a programming error, and thus should not be much of a concern in production. Once the request makes it into the driver, there will eventually be an IOCP entry, when the highest-level driver completes the request. After the completion is removed from the queue by GetQueuedCompletionStatus, you can free or reuse the OVERLAPPED structure.

    As long as you're using an IOCP, then why do you need to dink with STATUS_PENDING at all?

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • nektar80nektar80 Member Posts: 16

    Unfortunately, you did not understand me or my concrete example at all. Very strange.

    "If the original call fails immediately, so that the request never makes it into a driver"

    But how can we know whether the error came from the I/O manager or from the driver? And I showed a concrete example: when the driver's FastIoLock routine is called and returns an error (STATUS_INVALID_PARAMETER), there will still be a packet queued to the IOCP. It is exactly how the I/O manager handles FastIoLock and FastIoDeviceControl (when they return TRUE with an error status) that breaks the general rule.

    "Once the request makes it into the driver, there will eventually be an IOCP entry, when the highest-level driver completes the request."

    But this is of course false! If the driver returns an error immediately (a status in the range C0000000-FFFFFFFF), there will be no IOCP entry (or APC, or event set), and this is not necessarily a programming error (invalid parameters, etc.). For example, FSCTL_PIPE_LISTEN can return the error STATUS_PIPE_CONNECTED (client already connected) or even STATUS_PIPE_CLOSING (client already connected and disconnected), and there will be no IOCP entry because this is an NT_ERROR status.

    So the general rule (implicitly documented) is:

    • if the API call returns NT_ERROR, there will be no IOCP entry
    • if the API call returns STATUS_PENDING, there will be an IOCP entry
    • otherwise it depends on FILE_SKIP_COMPLETION_PORT_ON_SUCCESS: there will be an entry if this mode is not set

    But unfortunately this is not always true (as I showed in the example with LockFileEx, https://pastebin.com/qWrMYvy4). If some driver implements FastIoDeviceControl and the call to it returns TRUE, there will be an IOCP entry even if the final operation status is in the NT_ERROR range.

    The overall question is: why does the I/O manager not call FastIoRead and FastIoWrite for asynchronous requests, but does call FastIoLock and FastIoDeviceControl? (Sorry, in the original post I mistakenly wrote "FastIoDetachDevice" instead of FastIoDeviceControl.)

  • nektar80nektar80 Member Posts: 16

    And yes, we can assume that the error does not come from the I/O manager, because that is always the result of invalid parameters in the call (invalid buffers, handles, or granted access on handles). So let us assume that the request will be sent to the driver. If an IRP is created and sent to the driver, the rule for when there will be an IOCP entry is:

    if the API call returns NT_ERROR(status): NO
    if the API call returns STATUS_PENDING: YES
    otherwise, if FILE_SKIP_COMPLETION_PORT_ON_SUCCESS mode is set: NO
    otherwise: YES

    or in code

    bool IsWillBeIOCPNotification(NTSTATUS status /*returned from api call*/, bool bSkipOnSuccessMode)
    {
        if (status == STATUS_PENDING) return true;
    
        if (NT_ERROR(status)) return false;
    
        return !bSkipOnSuccessMode;
    }
    

    or, for win32 case

    bool IsWillBeIOCPNotification(ULONG dwError, bool bSkipOnSuccessMode)
    {
        if (dwError == ERROR_IO_PENDING) return true;
    
        if (dwError != NOERROR) return false;
    
        return !bSkipOnSuccessMode;
    }
    

    Again, note that "Once the request makes it into the driver, there will eventually be an IOCP entry" is false if the driver returns NT_ERROR.

    But before creating an IRP and sending it to the driver, the I/O manager can, for device I/O control and lock file requests, try Fast I/O if the driver implements it.

    And here there will be an IOCP entry if Fast I/O returns TRUE, regardless of the returned status. As a result there can be an IOCP entry even with an error status.

  • Peter_Viscarola_(OSR)Peter_Viscarola_(OSR) Administrator Posts: 7,583

    Hmmmm... I’m pretty sure you ARE making this harder than it needs to be, but OTOH I think you might be right about the ambiguity involved.

    First, ignore whether the I/O Manager returns the error, or whether it’s from Fast I/O or IRP-based processing. Sure, it’s true that some errors are returned synchronously and others asynchronously... but WHO returns the error isn’t relevant (beyond the fact that, obviously, errors from I/O Manager will always be returned synchronously).

    So... I think what you’re saying is that it’s not clear if warnings (NTSTATUS values in the range of 0x80000000 − 0xBFFFFFFF) will generate completion callbacks. Is that what you’re asking? If that’s the point of your post, then you certainly have the record for the most words to ask the simplest question.

    I wouldn’t be surprised if there were some weird edge-conditions here that result in unexpected behavior in terms of completion being called or not. Is that the issue you’re asking about?

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • nektar80nektar80 Member Posts: 16

    Unfortunately, I apparently expressed my thought poorly, and my English is not the best, if you did not understand me.

    "First, ignore whether the I/O Manager returns the error, or whether it’s from Fast I/O or IRP-based processing..but WHO returns the error isn’t relevant"

    i try say absolute another( i only describe situation with this), ok let try else one

    • I open a file for asynchronous I/O,
    • bind an IOCP to it (say, via CreateThreadpoolIo or BindIoCompletionCallback),
    • and call some asynchronous API (Zw* or its Win32 wrapper, it does not matter: a Zw* function is called in the end anyway).

    (Tim Roberts ask - unclear when you're talking from user mode and when you're talking from kernel - here this no matter, Zw api called direct or indirect anyway. i say about both)

    So now I, as a developer, need to know: will there be an IOCP entry as a result of my I/O request or not?
    i not ask are api completed synchronous or not. this is elementary - based are STATUS_PENDING (ERROR_IO_PENDING) returned.
    I ask: will an IOCP entry be queued as a result of the I/O call or not?

    (Really I am not even asking this, I know the answer, but I’m trying to draw your attention to a very interesting (as it seems to me) question.)

    • If there will be an IOCP notification, my registered callback (registered with CreateThreadpoolIo or BindIoCompletionCallback) will be called by the system (from the thread pool that listens on the IOCP).
    • Otherwise I need to call the callback manually myself.

    This is needed to free the resources allocated for the API call: the IO_STATUS_BLOCK (or OVERLAPPED), and a reference on the object that encapsulates the file handle. I also need to call CancelThreadpoolIo if I use TP_IO.

    It looks like the CancelThreadpoolIo documentation gives the answer (really, we must call CancelThreadpoolIo EXACTLY when there will be no IOCP notification):

    To prevent memory leaks, you must call the CancelThreadpoolIo function for either of the following scenarios:

    • An overlapped (asynchronous) I/O operation fails (that is, the asynchronous I/O function call returns failure with an error code other than ERROR_IO_PENDING).
    • An asynchronous I/O operation returns immediately with success and the file handle associated with the I/O completion object has the notification mode FILE_SKIP_COMPLETION_PORT_ON_SUCCESS. The file handle will not notify the I/O completion port and the associated I/O callback function will not be called.

    But as I discovered, this is incorrect.
    An I/O request really can return an NT_ERROR status synchronously and there will still be an IOCP notification!
    And there is an even worse case:
    the asynchronous I/O request returns STATUS_SUCCESS but there will be no IOCP notification (although we must wait for it in this case).

    The last case, I believe, is a direct bug in Windows, in the NtUnlockFile API. Look at the Win2003 source code (although it is very old, nothing has changed at these points, as tests on Windows 10 show).

    Look: if the driver implements FastIoUnlockSingle and it returns TRUE (say, with STATUS_SUCCESS), the I/O manager just completes the request, without the IOCP.

    So what then: the asynchronous API call returns STATUS_SUCCESS but there will be no completion. And how do we detect this?

    Compare with the NtLockFile implementation:

    Here the I/O manager posts an IOCP entry, even if the I/O completed with an error status!

    It is the same situation with NtDeviceIoControlFile/IopXxxControlFile: if FastIoDeviceControl is present and returns TRUE, IoSetIoCompletion will be called even if an NT_ERROR status is returned
    (as opposed to FastIoWrite and FastIoRead, which are called only for FO_SYNCHRONOUS_IO files).

    OK, I agree that locking a directory (which synchronously returns STATUS_INVALID_PARAMETER) is wrong by design.
    But we can call LockFileEx with LOCKFILE_FAIL_IMMEDIATELY, and in this case we can simply get STATUS_LOCK_NOT_GRANTED (C0000055). By the documentation there must be no IOCP completion in this case, so we can just release the resources (IO_STATUS_BLOCK/OVERLAPPED), must call CancelThreadpoolIo, etc. But the callback will be called as well, because an entry is queued to the IOCP. As a result there will be a double free of the IO_STATUS_BLOCK and the rest. I did, however, find a way to detect and correctly handle this case.

    But then, when we call UnlockFileEx (if we acquired the lock), it synchronously returns STATUS_SUCCESS and we must expect an IOCP notification, so we cannot release the resources. But there will be no notification and no callback, and I do not see any way to detect this case at all (OK, lock/unlock file is not used very frequently, but anyway; the FastIoDeviceControl case can be more frequent).

    So I designed a special example, in case somebody is interested in testing it and seeing the result for themselves. Here I try to describe the case in code as fully as possible. I create 2 threads that concurrently try to lock the first byte of the same file.

  • nektar80nektar80 Member Posts: 16

    Or, even better, 2 absolutely concrete questions.

    1.)

    Say we call

    LockFileEx(hFile, LOCKFILE_EXCLUSIVE_LOCK|LOCKFILE_FAIL_IMMEDIATELY, 0, *, *, lpOverlapped);

    and it returns FALSE, and GetLastError() returns ERROR_LOCK_VIOLATION:

    • will there be an IOCP notification in this case?
    • do we need to call CancelThreadpoolIo? (if I use the new thread pool callback here)
    • can we just free lpOverlapped?

    By the documentation there must be no IOCP notification and we need to call CancelThreadpoolIo, but in practice this is wrong.

    Or, if the native API is closer to someone, say we call

    ZwLockFile(_hFile, 0, 0, # , #, &ByteOffset, &Length, 'key1', TRUE, TRUE);

    and it returns STATUS_LOCK_NOT_GRANTED:

    • will there be an IOCP notification in this case?
    • do we need to call CancelThreadpoolIo? (if I use the new thread pool callback here)
    • can we just free lpOverlapped/iosb?

    2.)

    We call

    UnlockFileEx(_hFile, 0, 1, 0, lpOverlapped) (or ZwUnlockFile)

    and it returns TRUE (or STATUS_SUCCESS).

    • will there be an IOCP notification in this case?
    • do we need to call CancelThreadpoolIo? (if I use the new thread pool callback here)
    • can/must we just free lpOverlapped? Or when?

    By the documentation there must be an IOCP notification in this case, and we cannot free lpOverlapped/iosb until then,
    but in the real world there will be no further notifications here.

  • Peter_Viscarola_(OSR)Peter_Viscarola_(OSR) Administrator Posts: 7,583

    i try say absolute another

    ... Hmmmm... well, you say wrong. I am not guessing.

    i not ask are api completed synchronous or not. this is elementary - based are STATUS_PENDING

    No. You are missing the point... and I would recommend you approach asking questions of folks here on the forum with a JUST A BIT more humility. We are working hard to help you, despite it being our holiday and you being a non-native speaker of English.

    As Mr. Roberts said, you are confusing the issue by switching your analysis, viewpoint, and questions back and forth from Win32 to the NT Native API. You wrote "here this no matter, Zw api called direct or indirect anyway" -- indeed it DOES matter. While it is true that the Native API gets called eventually, your entire issue (it seems to me) is with how Win32 handles the edge cases. If you limit your work to the Native NT API (in user mode or kernel mode, "no matter") then I think you'll find things much more clear.

    You're also confusing the case by using what is perhaps one of THE most unusual file system functions in Windows: Directory Change Notification. If you get back STATUS_PENDING from a directory change notification, that effectively grants the request to notify you of the pending change, right? It's only AFTER there's a directory change that the request completes (assuming you got back STATUS_PENDING initially).

    Byte range locks are another odd case, and I suspect how they behave will vary from file system to file system.

    And now I will return to my holiday,

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • nektar80nektar80 Member Posts: 16

    Unfortunately I did not expect such a result. Maybe I explained it too badly.

    indeed it DOES matter. While it is true that the Native API gets called eventually, your entire issue (it seems to me) is with how Win32 handles the edge cases. If you limit your work to the Native NT API (in user mode or kernel mode, "no matter") then I think you'll find things much more clear.

    No, you are mistaken here. This is not a Win32 issue. Nothing changes if we call the native API directly and handle the NTSTATUS directly, rather than the (not rarely) wrong Win32 error. Maybe my mistake was that I posted both versions. I agree that the native NT API is much clearer, but here the problem is not in the Win32 layer.

    And the problem here is not in the file system implementations at all either. The problem is in the I/O manager kernel code; I have been saying this all the time. The problem is in how it handles the Fast I/O case. At one point (LockFile) the I/O manager posts a completion to the IOCP even if an error status is returned. At another point (Unlock) the I/O manager completes the request without the IOCP (even if STATUS_SUCCESS is returned). Not the driver, but the I/O manager itself! Simply look at WRK-v1.2 (I pasted links) to understand the source of the problems.

    Byte range locks are another odd case, and I suspect how they behave will vary from file system to file system.

    Again, this is not a file system problem (how it handles the request), but an I/O manager problem (how it completes the request).

    And Directory Change Notification is completely unrelated here. I wrote nothing about it as a problem; I know perfectly well how it works and have used it in async mode more than once, and there are no problems there.

    simply look win

    OK. Sorry for everything.

  • MBond2MBond2 Member Posts: 19

    your posts are certainly too long to follow, but let me say what i think your problem is

    you need to know when an IOCP notification will be queued and when it will not. This is very important

    there is an apparent ambiguity in the documentation re the use of FILE_SKIP_COMPLETION_PORT_ON_SUCCESS. In fact this feature was specifically added to Windows to remove the ambiguity you are worried about

    The name FILE_SKIP_COMPLETION_PORT_ON_SUCCESS includes the word SUCCESS. That word has nothing to do with the real meaning of this option. What it means is that if the operation never goes pending and the final state of the call is known directly, then do not queue a completion on the port. This is true regardless of whether the final result of the call was a failure, warning or success

  • nektar80nektar80 Member Posts: 16

    Yes, unfortunately I am guilty of not explaining the essence clearly. But this was not really a question; rather, as I already said, "I’m trying to draw your attention to a very interesting (as it seems to me) question".

    you need to know when an IOCP notification will be queued and when it will not. This is very important

    Absolutely agree. And it is not only I who need this, but everybody who does asynchronous I/O programming.

    FILE_SKIP_COMPLETION_PORT_ON_SUCCESS is not directly related to the problem, and my post is not about it. I know how it works, and that its name does not reflect the essence (really it prevents the IOCP entry if the operation completes without pending).

    But I am trying to say something completely different: that the I/O manager (not the drivers or the Win32 subsystem) handles the Fast I/O path wrongly (?), which leads to problems.

    certainly too long

    Look at the concrete comment with the 2 absolutely concrete questions.

    If it is still unclear what I have been trying to say all this time, OK, let a moderator delete my post entirely.

  • MBond2MBond2 Member Posts: 19

    so what about FAST IO bothers you? The IRP can complete immediately instead of being pended. It can either fail or succeed.

  • MBond2MBond2 Member Posts: 19

    I realize that I have been too sloppy in my terminology. Language barriers increase the need to be technically precise and i have not been

    But the point remains the same - what about FAST IO bothers you?

  • Peter_Viscarola_(OSR)Peter_Viscarola_(OSR) Administrator Posts: 7,583

    The IRP can complete immediately instead of being pended. It can either fail or succeed.

    True.... regardless of whether Turbo I/O is used or an IRP is sent.

    I don’t get what Mr. @nektar80 wants....

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • nektar80nektar80 Member Posts: 16

    But I explained! Is it still not clear? This breaks the general rule for when there will and will not be an IOCP entry as a result of I/O (if Fast I/O is used, no IRP is created, but this is unrelated). So again: how do we detect, based on the status returned from the I/O call, whether there will be an IOCP notification/callback or not? We must know this exactly to manage resources correctly.

    The general, semi-documented rule is this:

    bool IsWillBeIOCPNotification(NTSTATUS status /*returned from api call*/, bool bSkipOnSuccessMode)
    {
        if (status == STATUS_PENDING) return true;
    
        if (NT_ERROR(status)) return false;
    
        return !bSkipOnSuccessMode;
    }
    

    1.) If STATUS_PENDING is returned, there will be a notification. No questions here.

    2.) If NT_ERROR(status), there must be no IOCP entry.

    But I found a case where this rule is broken. I showed a concrete example with NtLockFile: it can return STATUS_INVALID_PARAMETER or STATUS_LOCK_NOT_GRANTED, but there will be an IOCP entry!

    And this is not related to the Win32 layer or the file system driver. It is related to the I/O manager only: look at the WRK source code to see how it completes the request after the Fast I/O lock. This part is still relevant for Windows 10. The same goes for FastIoDeviceControl and how the I/O manager handles it at this point: NT_ERROR(status) and an IOCP notification are possible together.

    3.) Otherwise, whether there will be a completion or not depends on FILE_SKIP_COMPLETION_PORT_ON_SUCCESS.

    But look at the NtUnlockFile source code (again from WRK; this part is still relevant in the latest Win10): if the driver implements FastIoUnlockSingle and it returns TRUE, the I/O manager completes the request without posting an IOCP entry (or setting an event, or queuing an APC).

    So the I/O call can return STATUS_SUCCESS but without an IOCP notification (let us assume FILE_SKIP_COMPLETION_PORT_ON_SUCCESS is not set, to avoid confusion).

    Look at the I/O manager code to understand!

    So I showed concrete examples, and all this time I have been "asking" how to know exactly whether there will be an IOCP notification or not.

    @Peter_Viscarola_(OSR) - sorry for this whole topic. I did not expect this, and as I said, I do not need help at all. I was trying to draw your attention to this point. But if it is still unclear, I am sorry. I do not want to spam you. Simply delete my post entirely.

  • nektar80nektar80 Member Posts: 16

    OK, forget about the Win32 layer and FILE_SKIP_COMPLETION_PORT_ON_SUCCESS: it is not set.

    The simplest possible example:

    struct MY_IRP : public IO_STATUS_BLOCK
    {
        //...
    };
    
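    VOID NTAPI IoCallback(NTSTATUS status, ULONG_PTR Information, PVOID Context)
    {
        // Context is the ApcContext we passed to the I/O call, i.e. our MY_IRP
        MY_IRP* Irp = reinterpret_cast<MY_IRP*>(Context);
        //...
        delete Irp; // must happen exactly once per request
    }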
    
    bool WillBeNoCallback(NTSTATUS status); 
    
        HANDLE hFile;
        RtlSetIoCompletionCallback(hFile, IoCallback, 0);
    
        if (MY_IRP* Irp = new MY_IRP(..))
        {
            LARGE_INTEGER ByteOffset{x}, Length{y};
            NTSTATUS status = NtLockFile(hFile, 0, 0, Irp, Irp, &ByteOffset, &Length, 'keyX', TRUE, TRUE);
            if (WillBeNoCallback(status)) // if (NT_ERROR(status)) 
            {
                IoCallback(status, 0, Irp);
            }
        }
    
    • We bind IoCallback to hFile; as a result it will be called by the system when there is an IOCP entry.
    • Every I/O operation requires a unique IO_STATUS_BLOCK, so I allocate it: MY_IRP* Irp = new MY_IRP(..)
    • Now I call an asynchronous I/O API (NtLockFile in this concrete case) and pass MY_IRP* Irp to it as ApcContext (to get it back in IoCallback).
    • But MY_IRP* Irp needs to be freed when the I/O has finished. So how and when do we do this?
    • The natural solution (as it seems to me) is to do this inside IoCallback.
    • But if IoCallback will not be called (due to an I/O error), we need to just free MY_IRP* Irp or, better, manually call IoCallback(status, 0, Irp);
    • So the QUESTION is how to detect, based on the returned status (and maybe the IO_STATUS_BLOCK), whether IoCallback will be called (as a result of an IOCP notification) or not.
    • So the question is about the implementation of bool WillBeNoCallback(NTSTATUS status);
    • Is if (NT_ERROR(status)) always correct (forget about FILE_SKIP_COMPLETION_PORT_ON_SUCCESS, it is not set)?
    • I showed that it is not always!

    Note that whether there will be an IOCP notification or not depends only on the I/O manager. It does not depend on the driver that handles the request! The driver can have any implementation of lock/unlock/ioctl, but it is not the driver that posts the IOCP entry (or queues the APC, or sets the user event). This is done exclusively by the I/O manager, based on the final status the driver returns.

  • nektar80nektar80 Member Posts: 16

    Sorry, I really was mistaken about NtUnlockFile: this is a synchronous API, so there will never be an IOCP notification here.

    But there still exists the case with FastIoDeviceControl and FastIoLock, where the API returns NT_ERROR(status) but a packet will be queued to the IOCP.

    Usually (as documented) we need to assume that there will be no IOCP notification if the API returns a status in the NT_ERROR(status) range.

    As a result we can just free the IO_STATUS_BLOCK and the other resources allocated for the call if we get NT_ERROR(status) (status in [0xC0000000, 0xFFFFFFFF]).

    But because there can be an IOCP notification even in this case, the callback bound to this IOCP will be called too, and there we again try to free the IO_STATUS_BLOCK.

    The result is a double free.


    POC in native API code:
    https://github.com/rbmm/LockFile-Poc/blob/master/NT_Api_poc.cpp

    https://github.com/rbmm/LockFile-Poc/blob/master/NT_poc.log

    What I said about where the error came from (I/O manager, Fast I/O, IRP-based I/O) of course really does not matter.
    I was simply trying to explain why/how/where the situation arises in which NT_ERROR(status) is returned but a packet is queued to the IOCP:
    it happens when FastIoDeviceControl or FastIoLock returns TRUE with NT_ERROR(status).
    The I/O manager really does queue a packet to the IOCP in this case (Fast I/O returned TRUE) and does not look at the final operation status.

    And the result does not depend on how drivers handle the lock or ioctl, only on the I/O manager, because it is its duty to decide how to complete the user request, in particular whether or not to post an entry to the IOCP.

  • nektar80nektar80 Member Posts: 16

    After more research into the kernel code, I found that this is really a bug only inside the NtLockFile API:
    after FastIoLock returns TRUE, it does not check the final status and posts an entry to the IOCP for any status. Pseudo code from Win10:

    NTSTATUS NtLockFile(... PVOID ApcContext ...)
    {
        PFILE_OBJECT FileObject;
    
        // code after FastIoLock return TRUE -
    
        PIO_COMPLETION_CONTEXT CompletionContext = FileObject->CompletionContext;
    
        if (CompletionContext &&
            ApcContext &&
            !(FileObject->Flags & FO_SKIP_COMPLETION_PORT))
        {
            IoSetIoCompletionEx2( CompletionContext->Port,
                CompletionContext->Key,
                ApcContext,
                status, // not checked !!!
                Information,
                ...);
        }
    }
    

    See NtLockFile.png

    For comparison, here is the code from IopXxxControlFile (the error was fixed in Windows 8 but still exists in Windows 7):

    NTSTATUS IopXxxControlFile(... PVOID ApcContext ...)
    {
        PFILE_OBJECT FileObject;
    
        NTSTATUS status;
        PVOID Port = 0, Key = 0;
    
        // code after FastIoDeviceControl 
    
        PIO_COMPLETION_CONTEXT CompletionContext = FileObject->CompletionContext;
    
        if (
            CompletionContext && 
            !(FileObject->Flags & FO_SKIP_COMPLETION_PORT) &&
            !NT_ERROR(status) // checked !!!
            )
        {
            Port = CompletionContext->Port, Key = CompletionContext->Key;
        }
    
        if (Port && ApcContext)
        {
            IoSetIoCompletionEx2( Port, Key, ApcContext,status, Information,...);
        }
    }
    

    Here the check !NT_ERROR(status) was added.

    So, if the error in ntoskrnl!NtLockFile is fixed (I believe it is a bug), the rule is:

    bool IsWillBeIOCPNotification(NTSTATUS status /*returned from api call*/, bool FoSkipCompletionPort )
    {
        if (status == STATUS_PENDING) return true;
    
        if (NT_ERROR(status)) return false;
    
        return !FoSkipCompletionPort;
    }
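
    To make the rule concrete, here is a minimal, portable sketch of the same predicate with a few statuses checked at compile time. It assumes the NtLockFile behaviour is fixed. The NTSTATUS alias, the NtError helper, the kStatus constants and the name WillQueueIocpPacket are local stand-ins so the snippet builds without the WDK headers; real code would use NT_ERROR and the values from ntstatus.h.

```cpp
#include <cstdint>

// Local stand-ins so the sketch builds without Windows/WDK headers (assumption).
using NTSTATUS = std::int32_t;

constexpr NTSTATUS kStatusSuccess        = 0x00000000;
constexpr NTSTATUS kStatusPending        = 0x00000103;
constexpr NTSTATUS kStatusLockNotGranted = static_cast<NTSTATUS>(0xC0000055);

// Same test as the NT_ERROR macro: status in [0xC0000000, 0xFFFFFFFF].
constexpr bool NtError(NTSTATUS s) {
    return (static_cast<std::uint32_t>(s) >> 30) == 3;
}

// status        - value returned by the asynchronous API call
// skipOnSuccess - FILE_SKIP_COMPLETION_PORT_ON_SUCCESS is set on the file
constexpr bool WillQueueIocpPacket(NTSTATUS status, bool skipOnSuccess) {
    return status == kStatusPending            // completes later: packet expected
        || (!NtError(status) && !skipOnSuccess);
}

// Pending operations notify the port even when skip-on-success is set.
static_assert(WillQueueIocpPacket(kStatusPending, true), "pending");
// Immediate success notifies the port only without skip-on-success.
static_assert(WillQueueIocpPacket(kStatusSuccess, false), "success");
static_assert(!WillQueueIocpPacket(kStatusSuccess, true), "success, skipped");
// An NT_ERROR status means no packet, so the caller cleans up itself.
static_assert(!WillQueueIocpPacket(kStatusLockNotGranted, false), "error");
```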
    
  • Tim_RobertsTim_Roberts Member - All Emails Posts: 13,204

    I am curious to know what led you to tumble down this rabbit hole. You started off asking a very general design question, but was this all triggered because your application encountered a problem with the LockFile API? If not, what led you into this investigation?

    Tim Roberts, [email protected]
    Providenza & Boekelheide, Inc.

  • nektar80nektar80 Member Posts: 16

    @Tim_Roberts I am sorry for my bad English and for my inability to express my thoughts clearly.

    I am developing a general class library for asynchronous I/O and have always been interested in this topic. But several days ago, while playing with asynchronous file locking, I stumbled on a design bug by chance: NtLockFile returned STATUS_LOCK_NOT_GRANTED, but despite this error the I/O completion port was notified and the associated I/O callback function was called. That must not happen.

    My mistake was that I initially looked only at the WRK-v1.2 source code (very old by now, of course), where I understood why this happens. When I then looked at NtDeviceIoControlFile -> IopXxxControlFile, I saw the same mistake in that code. So I concluded that the same problem must exist with IOCTLs if we use an asynchronous file and the driver implements FastIoDeviceControl and returns TRUE from it with an NT_ERROR status, which would be more serious. But when I tested IOCTL on Windows 10, there was no error. So I looked at the binary code (Windows 7, 8.1, 10) and saw that the error in IopXxxControlFile was fixed no later than 8.1, but MS forgot to fix NtLockFile. So the bug now exists only there. It is easy to fix (only one !NT_ERROR(status) check needs to be added), but it still exists. Maybe MS will fix this bug if it is reported.

    As for what I said about NtUnlockFile: I was of course too hasty and mistaken. It is a synchronous API with no ApcContext parameter, so there must be no IOCP notification at all. Sorry for that mistake.

    The bug in NtLockFile really exists, but only there. It used to be in the ioctl path too, but that was fixed a long time ago.

  • Peter_Viscarola_(OSR)Peter_Viscarola_(OSR) Administrator Posts: 7,583

    Dude... it was warnings, directory change notification, Fast I/O for read and write... I already said, I’m totally lost.

    I’m not saying the OP hasn’t found some weird edge condition or a bug in some version of Windows handling of some specific IO function code. Heaven knows there are plenty. I’m just saying I have no clue what he’s talking about. The most unfortunate part is that if he can’t explain it to US, he has almost zero chance of explaining it to MSFT in a bug report and thereby getting it fixed.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • nektar80nektar80 Member Posts: 16

    @Peter_Viscarola_(OSR) About ZwNotifyChangeDirectoryFile: it returns STATUS_DATATYPE_MISALIGNMENT if we do not pass a DWORD-aligned buffer. This is a bit confusing, because this is not an NT_ERROR status, yet there will be no IOCP notification, because the error comes from the I/O manager. But we can always avoid this error: passing a buffer that is not DWORD-aligned is a programmer error. So I mentioned this situation in vain; it only distracted from the main problem and caused confusion. The problem, as my research finally showed, is only in NtLockFile. Compare the modern IopXxxControlFile and NtLockFile implementations in my comment above and note the difference, !NT_ERROR(status):

    if (FileObject->CompletionContext && ApcContext &&
            !(FileObject->Flags & FO_SKIP_COMPLETION_PORT) &&
            !NT_ERROR(status))
    

    is checked in IopXxxControlFile after Fast I/O returns TRUE, but only

    if (FileObject->CompletionContext && ApcContext &&
            !(FileObject->Flags & FO_SKIP_COMPLETION_PORT) )
    

    is checked in NtLockFile.

    Earlier, before Windows 8, this error was in IopXxxControlFile too.
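
    The difference between the two checks can be modelled in a few portable lines. This is only a sketch of the conditions quoted above, not kernel code: FastIoResult, LockPathQueues and ControlPathQueues are hypothetical names, and NTSTATUS and NtError are local stand-ins for the real typedef and the NT_ERROR macro.

```cpp
#include <cstdint>

// Local stand-ins so the sketch builds without Windows/WDK headers (assumption).
using NTSTATUS = std::int32_t;

constexpr bool NtError(NTSTATUS s) {
    return (static_cast<std::uint32_t>(s) >> 30) == 3;
}

// Hypothetical flattened view of the inputs that both checks read.
struct FastIoResult {
    bool hasCompletionContext;  // FileObject->CompletionContext != NULL
    bool hasApcContext;         // ApcContext != NULL
    bool skipCompletionPort;    // FileObject->Flags & FO_SKIP_COMPLETION_PORT
    NTSTATUS status;            // final status left by the Fast I/O routine
};

// Condition as quoted for NtLockFile: the status is never consulted.
constexpr bool LockPathQueues(const FastIoResult& r) {
    return r.hasCompletionContext && r.hasApcContext && !r.skipCompletionPort;
}

// Condition as quoted for the modern IopXxxControlFile: same test plus
// the !NT_ERROR(status) check.
constexpr bool ControlPathQueues(const FastIoResult& r) {
    return LockPathQueues(r) && !NtError(r.status);
}
```

    For a non-error status the two functions agree. They differ only when the status is an NT_ERROR value such as 0xC0000055 (STATUS_LOCK_NOT_GRANTED): the lock-path condition still queues a packet, the control-path condition does not.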

  • nektar80nektar80 Member Posts: 16

    When I looked at the WRK source code to see how Fast I/O is handled in NtLockFile and NtDeviceIoControlFile, it was hard to say whether this was a bug or "by design". Either way, such a "design" breaks the general rule for when there will be a notification to the IOCP, which is why I asked the general question. But after I saw that the code in NtDeviceIoControlFile had changed (a check for NT_ERROR(status) was added), I understood that it was indeed an error in XP/2003 and earlier versions. This error was fixed (probably in Windows 8) for NtDeviceIoControlFile, but MS forgot to apply the same fix to NtLockFile.

  • MBond2MBond2 Member Posts: 19

    we all understand that you are frustrated by the difficulty in explaining the problem that you are looking at. we are also frustrated by our difficulty in understanding what you are trying to say

    what is not clear is whether you expect this issue to apply to all kinds of API calls that might result in IOCP completions, or whether you think this is confined to a specific call pattern

    I am almost certain that I can explain the correct behaviour to you, since I had exactly this problem before MSFT resolved the problems with Vista / Server 2008 IIRC. But as Peter says it is possible that you have uncovered a specific case that is a bug / flaw

  • nektar80nektar80 Member Posts: 16

    @MBond2 Thanks for the response.

    Let me summarize.

    The answer to my question, "When will a packet be queued to the I/O completion port?":

    // status - returned by the asynchronous API call
    // (i.e. NtXxx(HANDLE FileHandle, HANDLE Event, PIO_APC_ROUTINE ApcRoutine, PVOID ApcContext, PIO_STATUS_BLOCK IoStatusBlock, ...))
    // bSkipOnSuccess = FileObject->Flags & FO_SKIP_COMPLETION_PORT, i.e. whether FILE_SKIP_COMPLETION_PORT_ON_SUCCESS is set on the file
    
    bool Will_be_IOCP_Notification(NTSTATUS status, BOOLEAN bSkipOnSuccess = FALSE)
    {
        return (status == STATUS_PENDING) || (!NT_ERROR(status) && !bSkipOnSuccess);
    }
    

    I hope there is no objection?


    But I was confused by the unfixed bug in NtLockFile and by the WRK-v1.2 source code. Earlier (up to and including Windows 7) there was a bug in IopXxxControlFile too, which is now fixed.

    A minimal PoC, which works on Windows 10:

    VOID WINAPI IoCompletionNT(
                                      _In_    NTSTATUS status,
                                      _In_    ULONG_PTR dwNumberOfBytesTransfered,
                                      _Inout_ PVOID ApcContext
                                      )
    {
        WCHAR sz[64];
        swprintf_s(sz, L"(%x %p %p)", status, (void*)dwNumberOfBytesTransfered, ApcContext);
        MessageBoxW(0, sz, L"IoCompletionNT", MB_OK);
    
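        // ApcContext is the IO_STATUS_BLOCK allocated in PocLockFile; deleting through a void* is undefined behaviour
        delete static_cast<IO_STATUS_BLOCK*>(ApcContext);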
    }
    
    void PocLockFile()
    {
        static UNICODE_STRING ObjectName = RTL_CONSTANT_STRING(L"\\SystemRoot");
        static OBJECT_ATTRIBUTES oa = { sizeof(oa), 0, &ObjectName };
    
        HANDLE hFile;
        IO_STATUS_BLOCK iosb;
    
        if (0 <= NtOpenFile(&hFile, FILE_READ_DATA, &oa, &iosb, FILE_SHARE_VALID_FLAGS, FILE_DIRECTORY_FILE))
        {
            if (0 <= RtlSetIoCompletionCallback(hFile, IoCompletionNT, 0))
            {
                if (IO_STATUS_BLOCK* piosb = new IO_STATUS_BLOCK)
                {
                    LARGE_INTEGER ByteOffset{}, Length {1};
                    NTSTATUS status = NtLockFile(hFile, 0, 0, piosb, piosb, &ByteOffset, &Length, 'keyX', TRUE, TRUE);
    
                    if (!Will_be_IOCP_Notification(status))
                    {
                        WCHAR sz[64];
                        swprintf_s(sz, L"[%x, 0, %p]", status, piosb);
                        MessageBoxW(0, sz, L"PocLockFile", MB_OK);
                        //delete piosb;
                    }
                }
            }
            NtClose(hFile);
        }
    }
    

    Although NtLockFile returns STATUS_INVALID_PARAMETER here and there must be no IOCP notification, there actually was one.

    Of course, calling Lock on a folder makes no sense, but it serves as a PoC. A more realistic case, where we do this on a file: https://github.com/rbmm/LockFile-Poc/blob/master/NT_Api_poc.cpp


    I also mentioned ZwNotifyChangeDirectoryFile, which can return STATUS_DATATYPE_MISALIGNMENT (this is not an NT_ERROR), but we must always avoid this error by passing a correctly aligned buffer.
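
    Why STATUS_DATATYPE_MISALIGNMENT is not an NT_ERROR follows from the severity field in the top two bits of an NTSTATUS value. The sketch below decodes it; the NTSTATUS alias, the Severity enum, SeverityOf and the kStatus constants are local stand-ins (an assumption, so the snippet builds without the WDK headers), with the numeric values taken from ntstatus.h.

```cpp
#include <cstdint>

// Local stand-ins so the sketch builds without Windows/WDK headers (assumption).
using NTSTATUS = std::int32_t;

constexpr NTSTATUS kStatusPending              = 0x00000103;
constexpr NTSTATUS kStatusDatatypeMisalignment = static_cast<NTSTATUS>(0x80000002);
constexpr NTSTATUS kStatusInvalidParameter     = static_cast<NTSTATUS>(0xC000000D);

// Bits 31..30 of an NTSTATUS encode its severity.
enum class Severity : std::uint32_t {
    Success       = 0,  // 0x00000000 - 0x3FFFFFFF
    Informational = 1,  // 0x40000000 - 0x7FFFFFFF
    Warning       = 2,  // 0x80000000 - 0xBFFFFFFF
    Error         = 3   // 0xC0000000 - 0xFFFFFFFF
};

constexpr Severity SeverityOf(NTSTATUS s) {
    return static_cast<Severity>(static_cast<std::uint32_t>(s) >> 30);
}

// Equivalent to the NT_ERROR macro.
constexpr bool NtError(NTSTATUS s) {
    return SeverityOf(s) == Severity::Error;
}

// STATUS_PENDING sits in the success range.
static_assert(SeverityOf(kStatusPending) == Severity::Success, "pending");
// The misalignment status is a warning, so NT_ERROR is false for it.
static_assert(SeverityOf(kStatusDatatypeMisalignment) == Severity::Warning, "warning");
static_assert(!NtError(kStatusDatatypeMisalignment), "not an NT_ERROR");
// STATUS_INVALID_PARAMETER, as returned in the PoC, is a true error.
static_assert(NtError(kStatusInvalidParameter), "error");
```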

  • MBond2MBond2 Member Posts: 19

    You may have found a bug in this seldom used API

    byte range locking is an almost useless feature so it would not be surprising that this is less well tested than other calls

    also note that the code you posted has very poor style. At least this SAL is wrong

    _Inout_ PVOID ApcContext

  • nektar80nektar80 Member Posts: 16

    @MBond2 Yes, I agree that _Inout_ PVOID ApcContext is bad,
    but I copy-pasted it from minwinbase.h:

    typedef
    VOID
    (WINAPI *LPOVERLAPPED_COMPLETION_ROUTINE)(
        _In_    DWORD dwErrorCode,
        _In_    DWORD dwNumberOfBytesTransfered,
        _Inout_ LPOVERLAPPED lpOverlapped
        );
    

    So I took the documented callback definition for BindIoCompletionCallback and adapted it for RtlSetIoCompletionCallback, which is not declared in the WDK/SDK headers.
    I copy-pasted and simply did not pay attention to the SAL here. In this sense _Inout_ LPOVERLAPPED lpOverlapped is also wrong: it is an in-only pointer to a structure. PVOID ApcContext is likewise an ordinary pointer to a structure that at minimum contains an IO_STATUS_BLOCK. But anyway, I think SAL is off-topic here.

    About NtLockFile: yes, it is seldom used, but when I first looked at the WRK source code I concluded that the same bug also exists in NtDeviceIoControlFile when Fast I/O is used. That bug in NtDeviceIoControlFile really does still exist in Windows 7 (tested), but in Windows 8.1 it is already fixed.

  • MBond2MBond2 Member Posts: 19

    leave SAL and style aside. I mention this for the archives so that someone else won't copy / paste this code and think it is good

    I suggest you open a support case with Microsoft. If they agree that this is a real bug, and it actually has an impact on you, they will likely fix it. It will not be a fast process. Microsoft are still working on an issue I raised in SQL Native Client about 18 months ago - and that is just an unhandled exception leaking from C++ code out through a C API and so does not involve the possibility of breaking changes. your issue does involve the possibility of breaking changes for existing software.
