Issue with Inverted Call Model – First 8 IRPs Receive Null Data

Hello all,

I'm currently building an EDR (Endpoint Detection and Response) solution, which includes a Windows kernel driver and a user-mode application that acts as a static analysis engine. I'm using the Inverted Call Model with the Cancel-Safe Queue (CSQ) to handle communication between kernel and user mode.

Setup:

  • The user-mode app is multi-threaded, with 8 threads.
  • Each thread sends a request (IRP) to the driver, then waits using IOCP for completion.
  • The driver queues each IRP using IoCsqInsertIrpEx, then later dequeues and completes it with data using IoCompleteRequest.
  • Once the user-mode thread receives the response, it prints the data and immediately sends another request.

The Problem:

  • The first 8 requests (i.e., the initial IRPs sent by each of the 8 threads) are completed normally by the driver.
  • However, the data sent by the driver never reaches the user: it's received as null or empty.
  • After these first 8 IRPs, everything works perfectly. All subsequent requests and responses behave as expected.

Things I've Tried:

  • I verified the same code is executed for all IRPs (first and subsequent), both in the driver and user-mode app.
  • As a test, I sent the first 8 IRPs, shut down the user app, and let the driver complete the IRPs.
    • Later, when I restarted the user-mode app and received the responses, the data appeared normally.

So I'm stuck trying to understand why only the first 8 IRPs misbehave.

Any insights or thoughts on what might cause this would be greatly appreciated!

Thanks

First, try to use only one thread for driver communication to test the response.

What METHOD_XXX are you using for your ioctls? Are you absolutely sure you are setting pIrp->IoStatus.Information before you complete the request?

And, regarding your "things I've tried" section, IRPs submitted by one process can NEVER be received by another process. The 8 IRPs submitted by the first app would have been canceled.
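
On the METHOD_XXX question: the transfer method is encoded in the low two bits of the control code, so it can be read straight off the IOCTL value. The sketch below reproduces the CTL_CODE bit layout from the Windows headers so that it builds without the WDK. The function number 0x800 and the name kExampleCtl are assumptions, since the real KLEDR_CTL definition is not shown in the thread. With METHOD_BUFFERED, the I/O manager copies IoStatus.Information bytes from the system buffer back to the caller's output buffer at completion, which is why a missing or zero Information value produces an empty result.

```cpp
#include <cstdint>

// Values and bit layout mirror the CTL_CODE macro from the Windows headers,
// reproduced here so the sketch builds without the WDK.
constexpr std::uint32_t kFileDeviceUnknown = 0x22; // FILE_DEVICE_UNKNOWN
constexpr std::uint32_t kFileAnyAccess     = 0;    // FILE_ANY_ACCESS
constexpr std::uint32_t kMethodBuffered    = 0;    // METHOD_BUFFERED
constexpr std::uint32_t kMethodInDirect    = 1;    // METHOD_IN_DIRECT
constexpr std::uint32_t kMethodOutDirect   = 2;    // METHOD_OUT_DIRECT
constexpr std::uint32_t kMethodNeither     = 3;    // METHOD_NEITHER

// Same packing as CTL_CODE(DeviceType, Function, Method, Access).
constexpr std::uint32_t MakeCtlCode(std::uint32_t deviceType, std::uint32_t function,
                                    std::uint32_t method, std::uint32_t access) {
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

// The transfer method is the low two bits of the control code.
constexpr std::uint32_t MethodFromCtlCode(std::uint32_t code) {
    return code & 3u;
}

// Hypothetical definition: the real KLEDR_CTL value is not shown in the thread.
constexpr std::uint32_t kExampleCtl =
    MakeCtlCode(kFileDeviceUnknown, 0x800, kMethodBuffered, kFileAnyAccess);

static_assert(kExampleCtl == 0x222000, "unexpected control code layout");
```

Printing MethodFromCtlCode(KLEDR_CTL) once in the client is a cheap way to confirm which buffering rules apply before debugging the driver side.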

I have tried that before and still had the same problem: in that case only the "first" request came back without any data, and after that it worked normally.

So that answers your question:

Only the first request (on the opened device file) is misbehaving.

You have to debug it from the driver side too.

Hello,
Well, I'm using DeviceIoControl to send the request, of course, and GetQueuedCompletionStatus to watch the port for any response. Here is the thread code (please ignore the amount of printing :)):

DWORD WINAPI ThreadStartRoutine(
	LPVOID lpThreadParameter
)
{
	BOOL getQueueReturn, isSuccess;
	DWORD numofBytestTransfered = 0, threadId = 0;
	ULONG_PTR completionKey;
	OVERLAPPED *lpOverLapped = NULL;
	OVERLAPPED overLapped;
	PTHREAD_PARAMETER_CONTEXT lpThreadParameterContext;
	PIO_CONTEXT ioContext;
	DATA_TRANSFERE_FROM_KERNEL bufDataFromKernel = { 0 };
	HANDLE hFile;
	KLEDR_PE_ANALYSIS_RESULT peAanlyzeResult;

	lpThreadParameterContext = (PTHREAD_PARAMETER_CONTEXT)lpThreadParameter;

	threadId = GetCurrentThreadId();

	// wcscpy_s(bufDataFromKernel.binPath, 12, L"TestMessage");

	for(;;) {

		ZeroMemory(&overLapped, sizeof(OVERLAPPED));
		ZeroMemory(&peAanlyzeResult, sizeof(KLEDR_PE_ANALYSIS_RESULT));

		DeviceIoControl(
			lpThreadParameterContext->hDevice,
			KLEDR_CTL,
			nullptr,
			0,
			&bufDataFromKernel,
			sizeof(DATA_TRANSFERE_FROM_KERNEL),
			&numofBytestTransfered,
			&overLapped
		);

		if ((GetLastError()) == ERROR_IO_PENDING)
			std::cout << "[+] the IO control is sent successfully and in pending state, from a thread number " << threadId << std::endl;

		else
			std::cout << "[-] ERROR while sending the IOCTL, not in pending state, \n\t thread ID : " << threadId << "\n\t error code : " << GetLastError() << std::endl;

		getQueueReturn = GetQueuedCompletionStatus(lpThreadParameterContext->hCompletionPort, &numofBytestTransfered, &completionKey, &lpOverLapped, INFINITE);

		if (lpOverLapped == NULL)
		{
			std::cout << "[+++] WE RECIEVED THE REQUEST TO SHUT US DOWN, SHUTTING DOWN (THREAD ID: " << threadId << ")\n.";

			// the main thread should cancel all pending requests. So ignore and return.
			return 0;
		}

		if (!getQueueReturn)
		{
			// IT'S THE CLEAN-UP REQUEST NOWWWW..
			std::cout << "[---] ERROR FROM THE COMPLETION ROUTINE, error code: " << GetLastError() << " (THREAD ID : " << threadId << ")\n.";
			return 0;
		}

		std::cout << "[+] success, we recieved the a complition packet, let's now create another request..\n";
		std::cout << "[**] NOW IN TID: " << threadId << ", dealing with OVERLAPPED STRUCTURE: " << lpOverLapped << std::endl;

	//  CREATE ANOTHER REQUEST..
	//	ioContext = CONTAINING_RECORD(lpOverLapped, IO_CONTEXT, ov);

		// now let's print the data came from the kernel..
		std::wcout << "[*****] THE DATA CAME FROM THE KERNEL LAND IS: " << bufDataFromKernel.binPath << std::endl;
		
	//	printf("[*****] THE DATA CAME FROM THE KERNEL LAND IS: %ls, while its address : %p, and struct address : %p\n", ioContext->DataFromKernel.binPath, bufDataFromKernel.binPath, &bufDataFromKernel);
		
		// let's check if the file is signed or not.
		std::cout << "checking if the file is signed ...\n";
		KlEdrCheckSigned(bufDataFromKernel.binPath, &isSuccess);
		std::cout << "is the file signed ? => " << isSuccess << std::endl;

		// now check for the APIs and the string.
		std::cout << "checking the string and APIs...\n";
		
		peAanlyzeResult = KlEdrAnalyzePeFile(bufDataFromKernel.binPath);

		std::cout << "RESULT..\nString found: " << peAanlyzeResult.StringFound << \
			"\nWriteProcessMemory Found: " << peAanlyzeResult.WriteProcessMemoryFlag << \
			"\nCreateRemoteThread Found: " << peAanlyzeResult.CreateRemoteThreadFlag << \
			"\nVirtualAllocEx Found: " << peAanlyzeResult.VirtualAllocExFlag << \
			"\nOpenProcess Found: " << peAanlyzeResult.OpenProcessFlag << std::endl;
}
}

And this is the code in the main thread that opens a handle to the driver and creates the completion port:

	hDevice = CreateFileA(
		"\\\\.\\symKLEDR",
		GENERIC_ALL, 
		0, 
		NULL, 
		OPEN_EXISTING, 
		FILE_FLAG_OVERLAPPED, NULL
	);

	if (hDevice == INVALID_HANDLE_VALUE)
	{
		std::cout << "[-]ERROR: couldn't open a handle to the device, error code : " << GetLastError() << std::endl;
		return -1;
	}

	// let's make the IOCP
	hCompletionPort = CreateIoCompletionPort(hDevice, NULL, 0, 0);

	if (hCompletionPort == NULL)
	{
		std::cout << "[-]ERROR: while creating the IOCP, error code: " << GetLastError() << std::endl;
		return -1;
	}

	threadParamContext->hCompletionPort = hCompletionPort;
	threadParamContext->hDevice = hDevice;

On the driver side, yes, I'm pretty sure I set Irp->IoStatus.Information to the size of the response. It works fine for the other IRPs; only the first IRPs never carry any data when they are completed. (The same code runs for every IRP, first or not, but only the first ones don't hold any data sent by the driver.)

Regarding the "Things I've Tried" section, I didn't describe what I did very well. I ran the user app under a debugger and set a breakpoint just after it received the first response from the kernel, then kept it paused until the driver had completed all the other IRPs. When I resumed the user-mode app, it worked fine. But I have just realized that I only paused a single thread rather than the whole process, so sorry, please ignore that test :sweat_smile:

Yes, that's my main issue, as you said: the first requests fail in every case. I tried printing the data on the driver side just before completing the IRP, and it prints fine, but it never reaches the user side. Do you have any ideas about what I should trace when debugging the kernel driver?

Are you calling this function eight times??? This is totally wrong. GetQueuedCompletionStatus returns WHICHEVER request completed. It is not necessarily going to be the one you just submitted. With overlapped I/O, you don't use multiple threads. Instead, you run a single loop that creates and submits your N requests, and then have ONE loop that calls GetQueuedCompletionStatus, processes it, and resubmits THAT request. You just keep those 8 requests in circulation forever.

If GetLastError does not return ERROR_IO_PENDING, you go right on and call GetQueuedCompletionStatus. That's a mistake. If you don't get ERROR_IO_PENDING, then you should print out what error you DID get and exit. Also, if DeviceIoControl returns 1 (meaning "success"), then you do not call GetQueuedCompletionStatus.
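
A minimal sketch of the per-request bookkeeping this implies, not the poster's actual code. It is deliberately platform-neutral: FakeOverlapped stands in for OVERLAPPED, a std::deque stands in for the completion port, and SimulateDriverCompletion plays the driver. All of those names are hypothetical. IoContext mirrors the IO_CONTEXT / ov pair hinted at by the commented-out CONTAINING_RECORD line in the thread code. The point is that the buffer to read is the one attached to the OVERLAPPED that came back, not whichever buffer the dequeuing thread happens to own, which may be one reason the first completions looked empty.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <deque>
#include <string>
#include <vector>

// Stand-in for the Win32 OVERLAPPED so the sketch builds on any platform.
struct FakeOverlapped { void* reserved; };

// One context per in-flight request. The OVERLAPPED and the output buffer
// live in the same object, so the buffer can be found from the OVERLAPPED.
struct IoContext {
    FakeOverlapped ov;
    wchar_t binPath[260];
};

// Same idea as CONTAINING_RECORD(lpOverlapped, IO_CONTEXT, ov).
inline IoContext* ContextFromOverlapped(FakeOverlapped* ov) {
    assert(ov != nullptr);
    return reinterpret_cast<IoContext*>(
        reinterpret_cast<char*>(ov) - offsetof(IoContext, ov));
}

// Simulates the driver completing one request: that request's buffer is
// filled and a packet carrying its OVERLAPPED pointer reaches the port.
inline void SimulateDriverCompletion(std::deque<FakeOverlapped*>& port,
                                     IoContext& ctx, const wchar_t* path) {
    std::wcsncpy(ctx.binPath, path, 259);
    ctx.binPath[259] = L'\0';
    port.push_back(&ctx.ov);
}

// The single consumer loop: dequeue a packet, recover the context it
// belongs to, and read the buffer of that context.
inline std::vector<std::wstring> DrainCompletions(std::deque<FakeOverlapped*>& port) {
    std::vector<std::wstring> results;
    while (!port.empty()) {
        FakeOverlapped* ov = port.front();  // GetQueuedCompletionStatus in real code
        port.pop_front();
        IoContext* ctx = ContextFromOverlapped(ov);
        results.emplace_back(ctx->binPath);
        // A real client would now zero ctx->ov and resubmit this same context.
    }
    return results;
}
```

In a real client the pool would be allocated once (for example 8 contexts), each submitted with DeviceIoControl using &ctx->ov and ctx->binPath as the output buffer, and every context would stay alive for as long as its request is pending.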

There are several problems with this code.

The return value of GetLastError() is only valid if DeviceIoControl actually fails. If the KM side completes the request synchronously, even OVERLAPPED I/O can succeed directly. Look at SetFileCompletionNotificationModes.
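
The three possible outcomes of a submit can be written down as a small decision function. This is a sketch under stated assumptions: ClassifySubmit and SubmitOutcome are hypothetical names, and ERROR_IO_PENDING is copied as a plain constant so the block builds without <windows.h>. It relies on the documented default that a handle associated with a completion port still gets a packet queued when the call succeeds immediately, unless FILE_SKIP_COMPLETION_PORT_ON_SUCCESS was set with SetFileCompletionNotificationModes.

```cpp
#include <cstdint>

// Copied from the Windows SDK so the sketch builds without <windows.h>.
constexpr std::uint32_t kErrorIoPending = 997; // ERROR_IO_PENDING

enum class SubmitOutcome {
    CompletedInline,   // result is already in the buffer, no packet will be queued
    PacketWillArrive,  // wait for the completion port to deliver this request
    Failed             // the request was never queued; report lastError
};

// ok:                the BOOL returned by DeviceIoControl
// lastError:         GetLastError() read right after the call; only
//                    meaningful when ok is false
// skipPortOnSuccess: true if FILE_SKIP_COMPLETION_PORT_ON_SUCCESS was set
//                    on the handle
constexpr SubmitOutcome ClassifySubmit(bool ok, std::uint32_t lastError,
                                       bool skipPortOnSuccess) {
    if (ok) {
        return skipPortOnSuccess ? SubmitOutcome::CompletedInline
                                 : SubmitOutcome::PacketWillArrive;
    }
    if (lastError == kErrorIoPending) {
        return SubmitOutcome::PacketWillArrive;
    }
    return SubmitOutcome::Failed;
}
```

Only the Failed case should log the error code, and in that case the thread must not wait on the port for that request, because no packet for it will ever arrive.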

GetQueuedCompletionStatus and the Ex version are meant to be called by dedicated threads: threads that expect to complete work started by other threads. And a NULL OVERLAPPED pointer is not an appropriate shutdown signal; PostQueuedCompletionStatus with a special completion key is.
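
One way to see why a NULL OVERLAPPED is ambiguous: GetQueuedCompletionStatus also leaves the OVERLAPPED pointer NULL when the wait itself fails and no packet was dequeued. The sketch below separates the cases with a dedicated key. DecidePacket, PacketAction and kShutdownKey are hypothetical names. The value 1 is an assumption that only needs to differ from the key 0 passed to CreateIoCompletionPort in the thread's code.

```cpp
#include <cstdint>

// Hypothetical shutdown key. The device handle in the thread is associated
// with key 0, so any other value can mark a shutdown packet.
constexpr std::uintptr_t kShutdownKey = 1;

enum class PacketAction {
    WaitFailed,  // no packet was dequeued (port closed, timeout, ...)
    IoFailed,    // a packet for a failed I/O request was dequeued
    Shutdown,    // packet posted with PostQueuedCompletionStatus and kShutdownKey
    ProcessIo,   // a successful I/O completion; safe to use the OVERLAPPED
    Unexpected   // a posted packet with an unknown key; ignore or log it
};

// waitOk:        return value of GetQueuedCompletionStatus
// key:           completion key returned by the call
// hasOverlapped: whether the returned OVERLAPPED pointer is non-NULL
constexpr PacketAction DecidePacket(bool waitOk, std::uintptr_t key,
                                    bool hasOverlapped) {
    if (!waitOk && !hasOverlapped) return PacketAction::WaitFailed;
    if (!waitOk)                   return PacketAction::IoFailed;
    if (key == kShutdownKey)       return PacketAction::Shutdown;
    if (!hasOverlapped)            return PacketAction::Unexpected;
    return PacketAction::ProcessIo;
}
```

The main thread would then post one packet with kShutdownKey per worker it wants to stop, and a worker only exits on PacketAction::Shutdown.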

If you want to issue OVERLAPPED I/O and then wait or poll for the completion of a specific IRP, then use GetOverlappedResult. But the point of IOCP is that the thread issuing the I/O continues to do something else while it is in progress, and that some other thread will detect the completion and perform the next action.

Hello Mr. Tim,
First of all, I'm still learning. I don't have a mentor and studied all of this on my own, so I may make some odd mistakes. Thanks for the advice anyway.

I didn't really care which thread receives which request, because they all do the same work. The kernel driver was also sending a lot of requests, so I thought that having many threads would let me accept them as fast as possible. That's why I made the user app multi-threaded.

But you said that one thread will be enough, right? My kernel driver traces every newly created process, so it produces a lot of responses; in the future I will of course add filtering to reduce that number. And most of the threads I created did seem to work, in that they received responses.

About GetLastError: my thinking was that if DeviceIoControl fails for any unexpected reason, I would ignore it rather than stop the user app, because I want it to stay alive as long as possible. But as you said, that can cause many problems. So is the best way to keep it alive to terminate this thread and let the main thread decide whether to start another one to keep tracking the kernel driver's responses? Or is there a better way?

Hello,
Thanks a lot for these clarifications.

I understand now that I'm not using IOCP the way it is meant to be used. In my first design, the main thread sent the requests and then created another thread to monitor them, while the main thread did the rest of the job. I later changed it so that everything happens in the newly created thread. I will rework that.

My main thread just waits for my input in the terminal to decide whether to shut down those threads (by sending a NULL OVERLAPPED with PostQueuedCompletionStatus). I only chose a NULL OVERLAPPED as a test; I will change it to send a special completion key.

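If your driver is also a mini-filter (which a lot of EDRs implement), simplify your life and use the Filter Manager communication port API (FltCreateCommunicationPort and FltSendMessage in the driver, FilterConnectCommunicationPort and FilterGetMessage in user mode).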

Unfortunately it's not. I have used that API before, and it is much, much easier than this. I think I'm going to rewrite the kernel driver as a minifilter; I tried to achieve this with a WDM driver, but I keep running into problems.