Missing IRP IRP_MN_REMOVE_DEVICE after IRP_MN_QUERY_REMOVE_DEVICE

I have been testing a driver. When I attempt to disable the device, the driver receives IRP_MN_QUERY_REMOVE_DEVICE but never receives IRP_MN_REMOVE_DEVICE. When I compile the same driver for Windows XP, PnP signaling works fine and IRP_MN_REMOVE_DEVICE is received. I have no clue why Windows does not send me IRP_MN_REMOVE_DEVICE.

There is a description on Microsoft's pages, but it does not explain why Windows 10 hangs in the remove-pending state.
https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/understanding-when-remove-irps-are-issued
https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/images/rem-irps.png

The IRP dispatching looks correct to me:

static NTSTATUS DispatchPnp(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
NTSTATUS            status;
PIO_STACK_LOCATION  stack = IoGetCurrentIrpStackLocation(Irp);
DEVICEDATA         *deviceData = (DEVICEDATA*)DeviceObject->DeviceExtension;

  switch(stack->MinorFunction)
  {
    case IRP_MN_START_DEVICE:
        _DbgPrintF(DEBUGLVL_VERBOSE, "IRP_MN_START_DEVICE");
	if(deviceData->NumMemResources>0 || deviceData->NumIOResources>0 || deviceData->NumIRQResources>0)
	{
	  DeallocateResources(deviceData);	/* Double start condition!!! */
	}
	status = PnpStart(DeviceObject, Irp);
        deviceData->DevicePnPState = Started;
	break;

    case IRP_MN_QUERY_STOP_DEVICE:
        _DbgPrintF(DEBUGLVL_VERBOSE, "IRP_MN_QUERY_STOP_DEVICE");
	deviceData->DevicePnPState = StopPending;
        SendIrpSynchronously(deviceData->NextLowerDriver, Irp);	
	status = STATUS_SUCCESS;
	break;

    case IRP_MN_QUERY_REMOVE_DEVICE:	/* warning before device removing */
        _DbgPrintF(DEBUGLVL_VERBOSE, "IRP_MN_QUERY_REMOVE_DEVICE");
        deviceData->DevicePnPState = RemovePending;
        //IoSetDeviceInterfaceState(&deviceData->InterfaceName, FALSE);		// Shutdown all outstanding interfaces.
			//https://github.com/teeedubb/ScpServer/blob/master/ScpVBus/bus/buspdo.c
			//https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/handling-an-irp-mn-query-remove-device-request
        //SendIrpSynchronously(deviceData->NextLowerDriver, Irp);
	//_DbgPrintF(DEBUGLVL_VERBOSE, "SendIrpSynchronously() finished");
        //status = STATUS_SUCCESS;
	//break;

        //Irp->IoStatus.Status = STATUS_SUCCESS;
        //IoCompleteRequest(Irp, IO_NO_INCREMENT);
	//return STATUS_SUCCESS;

	IoSkipCurrentIrpStackLocation(Irp);
        status = IoCallDriver(deviceData->NextLowerDriver, Irp);
	_DbgfprintF(DEBUGLVL_VERBOSE, "query remove IoCallDriver status = %Xh\n",status);
        return status;

	//status = SendIrpAsync(deviceData->NextLowerDriver, Irp);
	//_DbgfprintF(DEBUGLVL_VERBOSE, "status = %Xh\n",status);
	//return status;

    case IRP_MN_STOP_DEVICE:
        _DbgPrintF(DEBUGLVL_VERBOSE, "IRP_MN_STOP_DEVICE");
	deviceData->DevicePnPState = Stopped;
	DeallocateResources(deviceData);

		/* Pass the IRP down so that lower drivers can stop. */
	Irp->IoStatus.Status = STATUS_SUCCESS;  // A driver must set Irp->IoStatus.Status to STATUS_SUCCESS.
	SendIrpAsync(deviceData->NextLowerDriver, Irp);
	return STATUS_SUCCESS;

    case IRP_MN_REMOVE_DEVICE:
        _DbgPrintF(DEBUGLVL_VERBOSE, "IRP_MN_REMOVE_DEVICE");
        deviceData->DevicePnPState = Deleted;
		/* Disable the device interface */
	IoSetDeviceInterfaceState(&deviceData->InterfaceName, FALSE);

	DeallocateResources(deviceData);

	Irp->IoStatus.Status = STATUS_SUCCESS;  // A driver must set Irp->IoStatus.Status to STATUS_SUCCESS.
	SendIrpAsync(deviceData->NextLowerDriver, Irp);	

            // delete our device, we have to do this after we send the request down
        IoDetachDevice(deviceData->NextLowerDriver);
	deviceData->NextLowerDriver = NULL;
        IoDeleteDevice(DeviceObject);
        return STATUS_SUCCESS;

	// This could allow to unload and load driver without rebooting. Not tested yet.
   case IRP_MN_QUERY_CAPABILITIES:
        _DbgPrintF(DEBUGLVL_VERBOSE, "IRP_MN_QUERY_CAPABILITIES");
        {
          PDEVICE_CAPABILITIES DeviceCapabilities;
          PIO_STACK_LOCATION IrpSp = IoGetCurrentIrpStackLocation(Irp);

          //TRACE(TL_TRACE, ("t1394VDev_Pnp: IRP_MN_QUERY_CAPABILITIES\n"));
          DeviceCapabilities = IrpSp->Parameters.DeviceCapabilities.Capabilities;
          if(DeviceCapabilities)
              DeviceCapabilities->SurpriseRemovalOK = TRUE;
          SendIrpSynchronously(deviceData->NextLowerDriver, Irp);
	  status = STATUS_SUCCESS;
          break;
        }

    case IRP_MN_QUERY_DEVICE_RELATIONS:
	_DbgPrintF(DEBUGLVL_VERBOSE, "IRP_MN_QUERY_DEVICE_RELATIONS");
	Irp->IoStatus.Status = STATUS_SUCCESS;  // A driver must set Irp->IoStatus.Status to STATUS_SUCCESS.
	IoSkipCurrentIrpStackLocation(Irp);
	return(IoCallDriver(deviceData->NextLowerDriver, Irp));
        //Irp->IoStatus.Status = status;
        //IoCompleteRequest(Irp, IO_NO_INCREMENT);
	//return STATUS_SUCCESS;

    case IRP_MN_CANCEL_REMOVE_DEVICE:
	_DbgPrintF(DEBUGLVL_VERBOSE, "IRP_MN_CANCEL_REMOVE_DEVICE");
	IoSkipCurrentIrpStackLocation(Irp);
	return(IoCallDriver(deviceData->NextLowerDriver, Irp));

    default :			// Asynchronous handling of unknown PnP request.
        _DbgfprintF(DEBUGLVL_VERBOSE, "Unknown IRP received %u\n",stack->MinorFunction);
	IoSkipCurrentIrpStackLocation(Irp);
	return(IoCallDriver(deviceData->NextLowerDriver, Irp));
        //break;
  }

	/* Signal OK and complete the IRP. */
  Irp->IoStatus.Status = status;
  IoCompleteRequest(Irp, IO_NO_INCREMENT);

  return(status);
}

You’re not completing the request; you’re forwarding it to the next device in the stack.
Maybe the next one is failing the request?

This is exactly the approach Microsoft prescribes:
https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/handling-an-irp-mn-query-remove-device-request

_Finish the IRP:_

_In a function or filter driver:_

  1. Set Irp->IoStatus.Status to STATUS_SUCCESS.
  2. Set up the next stack location with IoSkipCurrentIrpStackLocation and pass the IRP to the next lower driver with IoCallDriver.
  3. Propagate the status from IoCallDriver as the return status from the DispatchPnP routine.
  4. **Do not complete the IRP.**

A driver for a PCI device is not a bus driver, so this part does not apply:

_In a bus driver:_

  • Set Irp->IoStatus.Status to STATUS_SUCCESS.
  • Complete the IRP (IoCompleteRequest) with IO_NO_INCREMENT.
  • Return from the DispatchPnP routine.
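The four steps quoted for a function or filter driver can be modeled in user mode. The sketch below is only an illustration: `MockIrp`, `MockLowerDriver`, `MockHandleQueryRemove`, `MockLowerSucceeds` and `MockLowerVetoes` are hypothetical stand-ins, not WDK types or routines, and the real calls are named only in comments.

```c
#include <stdint.h>

typedef int32_t NTSTATUS;
#define STATUS_SUCCESS      ((NTSTATUS)0x00000000L)
#define STATUS_UNSUCCESSFUL ((NTSTATUS)0xC0000001L)

/* Hypothetical stand-in for an IRP; NOT the WDK structure. */
typedef struct MockIrp {
    NTSTATUS Status;    /* plays the role of Irp->IoStatus.Status */
    int      Completed; /* set by whichever driver completes the IRP */
} MockIrp;

/* Hypothetical stand-in for the next lower driver's dispatch routine. */
typedef NTSTATUS (*MockLowerDriver)(MockIrp *irp);

/* Models a function/filter driver handling IRP_MN_QUERY_REMOVE_DEVICE. */
NTSTATUS MockHandleQueryRemove(MockIrp *irp, MockLowerDriver lower)
{
    /* Step 1: agree to the removal. */
    irp->Status = STATUS_SUCCESS;

    /* Step 2: in a real driver this is IoSkipCurrentIrpStackLocation(Irp)
     * followed by IoCallDriver(NextLowerDriver, Irp).
     * Step 3: return whatever the lower driver returned.
     * Step 4: never complete the IRP at this level. */
    return lower(irp);
}

/* A lower driver that accepts the removal and completes the IRP. */
NTSTATUS MockLowerSucceeds(MockIrp *irp)
{
    irp->Completed = 1;
    return irp->Status;
}

/* A lower driver that vetoes the removal. */
NTSTATUS MockLowerVetoes(MockIrp *irp)
{
    irp->Status = STATUS_UNSUCCESSFUL;
    irp->Completed = 1;
    return irp->Status;
}
```

With `MockLowerVetoes` the failure status comes straight back to the caller, which is the situation the earlier reply was asking about when it suggested that the next driver might be failing the request.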

Are you seeing a cancel remove IRP? Once you notice that the remove IRP has not arrived, try disabling the device. If it hangs, that means the query remove IRP is stuck somewhere and you need to debug which driver is holding onto it.

This is my complete trace log (13 = IRP_MN_FILTER_RESOURCE_REQUIREMENTS):

XDaqLib: AddDevice()
XDaqLib: Unknown IRP received 13
XDaqLib: IRP_MN_START_DEVICE
XDaqLib: DispatchPnpComplete()
XDaqLib: Connecting MSI IRQ
XDaqLib: IRP_MN_QUERY_DEVICE_RELATIONS

Then I request that the device be disabled (in this run I was trying to hook an IRP completion callback):

XDaqLib: IRP_MN_QUERY_REMOVE_DEVICE
XDaqLib: DispatchPnpComplete()
XDaqLib: SendIrpSynchronously() finished

Windows 10 marks the device as stopped. No error is reported.

An attempt to start the device again fails (8 = IRP_MN_QUERY_INTERFACE):

XDaqLib: Unknown IRP received 8
XDaqLib: Unknown IRP received 8
XDaqLib: Unknown IRP received 8
XDaqLib: Unknown IRP received 8
XDaqLib: Unknown IRP received 8
XDaqLib: Unknown IRP received 8
XDaqLib: AddDevice()
XDaqLib: Cannot attach device to the stack
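Traces like the ones above are easier to read when the default branch prints a name instead of a bare number. A minimal sketch of such a lookup follows; `PnpMinorFunctionName` and `kPnpMinorNames` are hypothetical names, and the numeric values are copied from wdm.h so that the snippet compiles outside the WDK. The table only covers codes up to 0x17.

```c
#include <stddef.h>

/* PnP minor function codes 0x00..0x17, indexed by their wdm.h value. */
static const char *const kPnpMinorNames[] = {
    "IRP_MN_START_DEVICE",                 /* 0x00 */
    "IRP_MN_QUERY_REMOVE_DEVICE",          /* 0x01 */
    "IRP_MN_REMOVE_DEVICE",                /* 0x02 */
    "IRP_MN_CANCEL_REMOVE_DEVICE",         /* 0x03 */
    "IRP_MN_STOP_DEVICE",                  /* 0x04 */
    "IRP_MN_QUERY_STOP_DEVICE",            /* 0x05 */
    "IRP_MN_CANCEL_STOP_DEVICE",           /* 0x06 */
    "IRP_MN_QUERY_DEVICE_RELATIONS",       /* 0x07 */
    "IRP_MN_QUERY_INTERFACE",              /* 0x08 */
    "IRP_MN_QUERY_CAPABILITIES",           /* 0x09 */
    "IRP_MN_QUERY_RESOURCES",              /* 0x0A */
    "IRP_MN_QUERY_RESOURCE_REQUIREMENTS",  /* 0x0B */
    "IRP_MN_QUERY_DEVICE_TEXT",            /* 0x0C */
    "IRP_MN_FILTER_RESOURCE_REQUIREMENTS", /* 0x0D */
    NULL,                                  /* 0x0E: no name in this table */
    "IRP_MN_READ_CONFIG",                  /* 0x0F */
    "IRP_MN_WRITE_CONFIG",                 /* 0x10 */
    "IRP_MN_EJECT",                        /* 0x11 */
    "IRP_MN_SET_LOCK",                     /* 0x12 */
    "IRP_MN_QUERY_ID",                     /* 0x13 */
    "IRP_MN_QUERY_PNP_DEVICE_STATE",       /* 0x14 */
    "IRP_MN_QUERY_BUS_INFORMATION",        /* 0x15 */
    "IRP_MN_DEVICE_USAGE_NOTIFICATION",    /* 0x16 */
    "IRP_MN_SURPRISE_REMOVAL"              /* 0x17 */
};

/* Returns a printable name for a PnP minor function code. */
const char *PnpMinorFunctionName(unsigned char minor)
{
    size_t count = sizeof(kPnpMinorNames) / sizeof(kPnpMinorNames[0]);

    if (minor < count && kPnpMinorNames[minor] != NULL)
        return kPnpMinorNames[minor];
    return "IRP_MN_<unknown>";
}
```

In the dispatch routine the default case could then log `PnpMinorFunctionName(stack->MinorFunction)` with a `%s` format instead of printing the number with `%u`.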

Maybe this is something new in Windows 10. As I wrote, this signaling still works in Windows XP.

Maybe your logging has a bug. If the stack is being built back up, then the remove was sent to the old stack; otherwise it would not be possible to build the stack back up again. Why are you failing the second AddDevice?

@Doron_Holan said:
Why are you failing the second AddDevice?

It is simply because the previous shutdown sequence never completed, and Windows prevents a second device from being added to the driver. Maybe something hung and Windows decided not to continue the shutdown sequence.

I can fix AddDevice so that it checks internally whether the previous device has been released.

But I would still prefer to discover why the device cannot be removed and why the shutdown sequence hangs.

When your driver misses IRP_MN_REMOVE_DEVICE, what does setupapi.dev.log say? Does the log indicate that the remove was successful? I think there is a bug in your driver, not a missing or orphaned remove device IRP. Why? Because:

a) If it were missing, the stack would not be rebuilt. A stack is only rebuilt if the previous one was surprise removed or completely and gracefully removed.

b) A remove device IRP is a PnP state changing IRP. Only one can be active at a time. While active, it holds the PnP state lock. AddDevice is also a PnP state changing operation and requires the same lock.

Are you in a device class that installs upper filters above your FDO?

I have finally found the source of the problem. Thanks for the support. The PnP signaling was implemented correctly.

The problem was a missing deviceObject->Flags setting, which causes problems in Windows 10. The driver worked for many years without visible problems, but not executing deviceObject->Flags &= ~DO_DEVICE_INITIALIZING; now causes IRP_MN_REMOVE_DEVICE to be lost.


static NTSTATUS AddDevice(PDRIVER_OBJECT DriverObject, PDEVICE_OBJECT PhysicalDeviceObject)
{
NTSTATUS status;
PDEVICE_OBJECT deviceObject = NULL;
DEVICEDATA *deviceData;
POWER_STATE powerState;

  _DbgPrintF(DEBUGLVL_VERBOSE, "AddDevice()");
  
	/* create a function device object */
  status = IoCreateDevice(DriverObject,
			  sizeof(DEVICEDATA),        // device extension size
			  NULL,                      // no name
			  FILE_DEVICE_UNKNOWN,       // device type
			  FILE_DEVICE_SECURE_OPEN,   // device characteristics
			  FALSE,                     // exclusive device
			  &deviceObject);
  if(!NT_SUCCESS(status))
  {
    _DbgfprintF(DEBUGLVL_VERBOSE, "IoCreateDevice failed %X\n",status);
    return(status);
  }

	// initialize device data
  deviceData = (DEVICEDATA*) deviceObject->DeviceExtension;

  deviceData->PhysicalDeviceObject = PhysicalDeviceObject;
  deviceData->NumMemResources = deviceData->NumIOResources = deviceData->NumIRQResources = 0;
  deviceData->FlagThread = 0;
  deviceData->DevicePnPState = NotStarted;

	// attach the driver to the device stack
  deviceData->NextLowerDriver = IoAttachDeviceToDeviceStack(deviceObject, PhysicalDeviceObject);
  if(deviceData->NextLowerDriver == NULL)
  {
    _DbgPrintF(DEBUGLVL_VERBOSE, "Cannot attach device to the stack");
    IoDeleteDevice(deviceObject);
    return(STATUS_NO_SUCH_DEVICE);
  }

  deviceObject->Flags |= deviceData->NextLowerDriver->Flags & (DO_BUFFERED_IO | DO_DIRECT_IO | DO_POWER_PAGABLE);
  deviceObject->DeviceType = deviceData->NextLowerDriver->DeviceType;
  deviceObject->Characteristics = deviceData->NextLowerDriver->Characteristics;

  deviceData->DevicePowerState = PowerDeviceD0;
  powerState.DeviceState = PowerDeviceD0;
	/* Notify the power manager of the new device power state. */
  PoSetPowerState(PhysicalDeviceObject, DevicePowerState, powerState);

	// register the interface
  status = IoRegisterDeviceInterface(PhysicalDeviceObject,
				     (LPGUID) &HUDAQ_DEVINTERFACE_GUID,
				     NULL,
				     &deviceData->InterfaceName);
  if(!NT_SUCCESS(status))
  {    
    _DbgfprintF(DEBUGLVL_VERBOSE, "IoRegisterDeviceInterface failed %X\n",status);
    IoDetachDevice(deviceData->NextLowerDriver);
    deviceData->NextLowerDriver = NULL;
    if(deviceData->InterfaceName.Buffer != NULL)
        RtlFreeUnicodeString(&deviceData->InterfaceName);
    IoDeleteDevice(deviceObject);
    return(status);
  }

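	/* This was the missing line: clear DO_DEVICE_INITIALIZING once the device object is fully set up. */
  deviceObject->Flags &= ~DO_DEVICE_INITIALIZING;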

  return(STATUS_SUCCESS);
}

I, for one, want to thank you for posting the resolution. So often, we never hear what the real problem was.

I am not sure why this flag has such a big impact on subsequent PnP behaviour. The driver works fine now.
It is highly unclear why the driver worked correctly for many years.

// Define Device Object (DO) flags
#define DO_VERIFY_VOLUME                    0x00000002      
#define DO_BUFFERED_IO                      0x00000004      
#define DO_EXCLUSIVE                        0x00000008      
#define DO_DIRECT_IO                        0x00000010      
#define DO_MAP_IO_BUFFER                    0x00000020      
#define DO_DEVICE_INITIALIZING              0x00000080      
#define DO_SHUTDOWN_REGISTERED              0x00000800      
#define DO_BUS_ENUMERATED_DEVICE            0x00001000      
#define DO_POWER_PAGABLE                    0x00002000      
#define DO_POWER_INRUSH                     0x00004000  

Maybe Windows now sets the DO_DEVICE_INITIALIZING bit where it previously did not.

Absolutely not. I went back to my dusty copy of Walter Oney’s “Programming the Windows Driver Model” from 1999, and even it stresses the importance of clearing DO_DEVICE_INITIALIZING. I went back even further to the samples in the NT 4.0 DDK, and they emphasize it too.

It has ALWAYS been an important driver responsibility to clear that bit.

Tim is correct. DO_DEVICE_INITIALIZING has been present from the start of NT (pre-PnP), and clearing it was necessary to be able to create a handle to the device. In the move to PnP, that requirement was kept. I am guessing the reason you never saw an issue is that when a handle to a device interface is opened, the handle is opened against the PDO, not your FDO, so the bit being set on your FDO after AddDevice never affected the create handle path. It was always a latent bug in your driver, and it just happened that an update to Windows 10 exposed it.
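The flag arithmetic discussed in this thread can be tried in user mode. The sketch below reuses the flag values listed earlier in the thread; `FinishAddDeviceFlags` and `IsStillInitializing` are hypothetical helper names, and a plain integer stands in for deviceObject->Flags. It assumes, as the replies above describe, that a freshly created device object starts with DO_DEVICE_INITIALIZING set and that the driver is the one that clears it.

```c
#include <stdint.h>

/* Flag values as listed earlier in the thread. */
#define DO_BUFFERED_IO          0x00000004u
#define DO_DIRECT_IO            0x00000010u
#define DO_DEVICE_INITIALIZING  0x00000080u
#define DO_POWER_PAGABLE        0x00002000u

/* Mirrors the two flag statements in AddDevice: inherit the I/O and power
 * flags from the lower device, then clear the initializing bit. */
uint32_t FinishAddDeviceFlags(uint32_t ownFlags, uint32_t lowerFlags)
{
    ownFlags |= lowerFlags & (DO_BUFFERED_IO | DO_DIRECT_IO | DO_POWER_PAGABLE);
    ownFlags &= ~DO_DEVICE_INITIALIZING;
    return ownFlags;
}

/* Nonzero while the device object is still marked as initializing. */
int IsStillInitializing(uint32_t flags)
{
    return (flags & DO_DEVICE_INITIALIZING) != 0;
}
```

Masking with the complement clears only that one bit, so the flags inherited from the lower device are left untouched.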