I've implemented a high-performance TSC acquisition driver using METHOD_NEITHER I/O transfer semantics with direct user address manipulation:
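// DeviceIoControl IOCTL handler excerpt (METHOD_NEITHER, runs in the caller's context)
PVOID outputBuffer = Irp->UserBuffer;
__try {
    ProbeForWrite(outputBuffer, sizeof(double), TYPE_ALIGNMENT(double));
    *(double *)outputBuffer = mytimer();  // user VA may become invalid at any time
} __except (EXCEPTION_EXECUTE_HANDLER) {
    Irp->IoStatus.Status = GetExceptionCode();  // fail the request instead of bugchecking
}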
This design has the following performance characteristics:
- Zero intermediate buffer allocation/copying
- A single validation step (one ProbeForWrite() call) per request
- Direct user VA manipulation without MDL overhead
- Minimal instruction path from TSC register to user memory
Is this approach the optimal implementation for absolute minimum-latency TSC acquisition within the Windows driver model constraints? Are there architectural optimizations I'm overlooking that could further reduce the critical path execution time?
My primary concern is the mandatory ProbeForWrite() validation (~35-150 cycles overhead). Has anyone implemented a more efficient boundary-crossing architecture while maintaining Windows security model compliance?
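To make the cost question concrete, here is a rough user-mode model of the kind of check I understand a probe to perform: an alignment test, a wrap-around test, and a comparison against the top of the user address range. This is only a sketch for reasoning about the instruction count, not the kernel's implementation. The names model_probe and MODEL_USER_LIMIT are made up for illustration, and the limit value is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound of the user address range (assumed for this model). */
#define MODEL_USER_LIMIT UINT64_C(0x00007FFFFFFF0000)

/*
 * Model of a probe-style check. Returns true where a real probe would
 * succeed and false where it would raise an exception.
 * 'alignment' must be a non-zero power of two.
 */
static bool model_probe(uint64_t address, size_t length, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    if (length == 0)
        return true;                         /* zero-length: nothing is checked */

    if ((address & (uint64_t)(alignment - 1)) != 0)
        return false;                        /* misaligned for the data type */

    uint64_t end = address + (uint64_t)length;
    if (end < address)
        return false;                        /* address + length wrapped around */

    return end <= MODEL_USER_LIMIT;          /* must lie entirely in user range */
}
```

If this model is close to reality, the check itself is only a handful of compares and branches, so most of the cycles I am measuring may come from the call and the exception-frame setup around it rather than from the validation logic.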