Question: returning a double (floating point) from kernel to userland via ioctl

I've implemented a high-performance TSC acquisition driver using METHOD_NEITHER I/O transfer semantics with direct user address manipulation:

// DeviceIoControl IOCTL handler excerpt
PVOID outputBuffer = Irp->UserBuffer;
ProbeForWrite(outputBuffer, sizeof(double), TYPE_ALIGNMENT(double));
*(double *)outputBuffer = mytimer();

This architecture provides several performance characteristics:

  • Zero intermediate buffer allocation/copying
  • Single memory validation touch operation
  • Direct user VA manipulation without MDL overhead
  • Minimal instruction path from TSC register to user memory

Is this approach the optimal implementation for absolute minimum-latency TSC acquisition within the Windows driver model constraints? Are there architectural optimizations I'm overlooking that could further reduce the critical path execution time?

My primary concern is the mandatory ProbeForWrite() validation (~35-150 cycles overhead). Has anyone implemented a more efficient boundary-crossing architecture while maintaining Windows security model compliance?

I think you can skip the call to ProbeForWrite because what you do on the next line is approximately the same! Here is a thread about ProbeForRead/Write: https://community.osr.com/t/probeforwrite-vs-probeforread/57879

Why are you going to all this trouble instead of just encoding the rdtsc instruction? It's not privileged. There's even a compiler intrinsic for it:

   auto x = __rdtsc();
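For reference, a minimal user-mode sketch in C of the same idea. It assumes an x86/x64 target and either MSVC or GCC/Clang; the helper names `read_tsc` and `tsc_delta` are made up for illustration and are not part of any API.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>      /* MSVC: declares __rdtsc */
#else
#include <x86intrin.h>   /* GCC/Clang on x86: declares __rdtsc */
#endif

/* Read the time stamp counter directly from user mode; no driver involved.
 * (Hypothetical helper name.) */
static inline uint64_t read_tsc(void)
{
    return (uint64_t)__rdtsc();
}

/* Ticks elapsed between two readings. Unsigned subtraction keeps the
 * result correct even if the counter wrapped between the two reads. */
static inline uint64_t tsc_delta(uint64_t start, uint64_t end)
{
    return end - start;
}
```

Typical use would be to call `read_tsc()` before and after the code being timed and pass both values to `tsc_delta()`. The result is in TSC ticks, not in seconds.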

Also, you are radically overestimating the impact of METHOD_BUFFERED. If the input buffer length is 0 and the output buffer length is 8, you are talking about no copy on the input side and a copy of 8 bytes on the output side, and the I/O system code for doing that has been highly optimized over the decades.
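As a side note on what switching methods involves: the transfer method is just the low two bits of the IOCTL code. The sketch below mirrors the bit layout of the `CTL_CODE` macro in plain C. The `ex_`/`EX_` names are local stand-ins so the snippet compiles without Windows headers, and the function number 0x800 is a hypothetical choice, not something taken from the original driver.

```c
#include <stdint.h>

/* Numeric values mirror the Windows definitions; the EX_ prefix marks
 * them as local stand-ins for this illustration. */
enum {
    EX_METHOD_BUFFERED   = 0,
    EX_METHOD_IN_DIRECT  = 1,
    EX_METHOD_OUT_DIRECT = 2,
    EX_METHOD_NEITHER    = 3
};
#define EX_FILE_DEVICE_UNKNOWN 0x22u
#define EX_FILE_ANY_ACCESS     0u

/* Same bit layout as CTL_CODE(DeviceType, Function, Method, Access). */
static uint32_t ex_ctl_code(uint32_t device_type, uint32_t function,
                            uint32_t method, uint32_t access)
{
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}

/* The transfer method occupies the low two bits of the code. */
static uint32_t ex_method_from_code(uint32_t code)
{
    return code & 3u;
}
```

With these values, the buffered and neither variants of the same hypothetical function differ only in those two bits (0x222000 versus 0x222003). The handler is what changes: a METHOD_BUFFERED handler writes to the system buffer and lets the I/O manager copy the 8 bytes back.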


Assuming TSC is the time stamp counter (rdtsc instruction), why would you bother?

There are other sources of time, and maybe better ones. But you certainly don't need any special code to read a CPU's counter, or any of the built-in timestamps, in either user mode or kernel mode.

"I think you can skip the call to ProbeForWrite because what you do on the next line is approximately the same"

No, you absolutely can NOT skip the probe step in METHOD_NEITHER. With METHOD_NEITHER the OS does zero validation and simply provides whatever address the user specified in the call. The probe step validates that the address provided by the user is not a kernel address.

Though note that, amusingly enough, a ProbeForRead is sufficient here because all you really want to do is make sure it's a user mode address. After that you need to wrap all your accesses to the buffer in a __try/__except block because the user address may not be valid (and, given that it's a user address, the user application can arbitrarily change the validity or protections on the address). The ProbeForWrite documentation actually recommends always doing ProbeForRead because the write version touches the start of every page which isn't necessarily useful or helpful.
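To make the distinction concrete, here is a simplified user-mode model in C of the address-range and alignment check that the probe step performs. This is not the kernel's implementation: `model_probe`, `probe_result`, and `MODEL_USER_PROBE_LIMIT` are hypothetical names, and the limit value is an assumed user/kernel boundary for x64 Windows, used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed upper bound of user-mode addresses on x64 Windows
 * (illustrative value; the kernel uses its own internal limit). */
#define MODEL_USER_PROBE_LIMIT UINT64_C(0x00007FFFFFFF0000)

typedef enum {
    PROBE_OK,
    PROBE_MISALIGNED,       /* real routine raises STATUS_DATATYPE_MISALIGNMENT */
    PROBE_ACCESS_VIOLATION  /* real routine raises STATUS_ACCESS_VIOLATION */
} probe_result;

/* Models only the range and alignment check. It says nothing about
 * whether the pages are mapped or writable. */
static probe_result model_probe(uint64_t address, uint64_t length,
                                uint64_t alignment)
{
    /* Alignment must be a nonzero power of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    if (length == 0)
        return PROBE_OK;                 /* zero length: nothing is checked */
    if (address & (alignment - 1))
        return PROBE_MISALIGNED;

    uint64_t end = address + length;
    if (end < address || end > MODEL_USER_PROBE_LIMIT)
        return PROBE_ACCESS_VIOLATION;   /* wraps around or reaches kernel space */
    return PROBE_OK;
}
```

In this model a kernel-range pointer is rejected, but an unmapped or read-only user address still passes. That gap is why the actual write must sit inside `__try/__except`. Note also that a zero length skips the check entirely, so the length passed to the probe should be the real access size.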