Question: returning a double from kernel to userland via IOCTL

eden · April 6, 2025, 10:09pm

I've implemented a high-performance TSC acquisition driver using METHOD_NEITHER I/O transfer semantics with direct user address manipulation:

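// DeviceIoControl IOCTL handler excerpt; the probe and the user-mode write are wrapped in SEH
PVOID outputBuffer = Irp->UserBuffer;
__try {
    ProbeForWrite(outputBuffer, sizeof(double), TYPE_ALIGNMENT(double));
    *(double *)outputBuffer = mytimer();
} __except (EXCEPTION_EXECUTE_HANDLER) {
    Irp->IoStatus.Status = GetExceptionCode();
}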

This architecture provides several performance characteristics:

Zero intermediate buffer allocation/copying
Single memory validation touch operation
Direct user VA manipulation without MDL overhead
Minimal instruction path from TSC register to user memory

Is this approach the optimal implementation for absolute minimum-latency TSC acquisition within the Windows driver model constraints? Are there architectural optimizations I'm overlooking that could further reduce the critical path execution time?

My primary concern is the mandatory ProbeForWrite() validation (~35-150 cycles overhead). Has anyone implemented a more efficient boundary-crossing architecture while maintaining Windows security model compliance?

Bo_Branten · April 6, 2025, 10:26pm

I think you can skip the call to ProbeForWrite because what you do on the next line is approximately the same! Here is a thread about ProbeForRead/Write: https://community.osr.com/t/probeforwrite-vs-probeforread/57879

Tim_Roberts · April 7, 2025, 12:22am

Why are you going to all this trouble instead of just encoding the rdtsc instruction? It's not privileged. There's even a compiler intrinsic for it:

   auto x = __rdtsc();

Also, you are radically overestimating the impact of METHOD_BUFFERED. If the input buffer length is 0 and the output buffer length is 8, you are talking about no copy on the input side and a copy of 8 bytes on the output side, and the I/O system code for doing that has been highly optimized over the decades.
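For readers who want to try the intrinsic mentioned above, here is a minimal user-mode sketch, assuming an x86 or x64 target and MSVC, GCC, or Clang. The helper names `read_tsc` and `cycles_for` are made up for illustration and are not part of any library; only `__rdtsc()` itself comes from the compiler.

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>      // MSVC: declares __rdtsc()
#else
#include <x86intrin.h>   // GCC/Clang on x86: declares __rdtsc()
#endif

// Hypothetical helper: read the time stamp counter from user mode.
// No driver and no IOCTL are involved; rdtsc is not a privileged instruction.
inline std::uint64_t read_tsc() {
    return __rdtsc();
}

// Hypothetical helper: raw TSC ticks spent in `work`.
// This does not serialize the pipeline, so treat the result as approximate.
template <typename F>
std::uint64_t cycles_for(F&& work) {
    const std::uint64_t start = read_tsc();
    work();
    const std::uint64_t stop = read_tsc();
    return stop - start;
}
```

A call such as `cycles_for([] { /* code under test */ })` returns a tick count, not a duration; converting it to time requires knowing the TSC frequency, which this sketch does not attempt.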

MBond2 · April 7, 2025, 1:12am

Assuming TSC is the time stamp counter (rdtsc instruction), why would you bother?

There are other sources of time, and maybe better ones. But you certainly don't need any special code to get a CPU's counter, or any of the built-in timestamps, in either user mode (UM) or kernel mode (KM).
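As one example of such another time source, standard C++ offers `std::chrono::steady_clock`, which needs no driver and no architecture-specific instruction. This is only a sketch of that alternative, not a recommendation from the thread; the helper name `monotonic_ns` is hypothetical, and the clock's actual resolution depends on the platform and standard library.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: monotonic timestamp in nanoseconds.
// steady_clock never goes backwards, so differences between two
// readings are safe to use for measuring intervals.
inline std::int64_t monotonic_ns() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
               steady_clock::now().time_since_epoch())
        .count();
}
```

Unlike a raw TSC value, the result is already in time units, so two readings can be subtracted directly to get an elapsed interval.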