Benchmarking WinUSB, libusbk and the Cypress cyusb3 driver

I was investigating why cyusb3.sys is faster than libusbk and WinUSB for bulk transfers (IN or OUT). One theory, from Travis Robinson (the developer of libusbk), is that libusbk and WinUSB are based on KMDF, whereas Cypress cyusb3.sys is most likely based on WDM. To check that theory, I used the libusb-win32 device driver (libusb0.sys), which is also based on WDM, and indeed I got the same result as with cyusb3.sys.

Is the overhead significant in this case?

Benchmark data: SuperSpeed bulk IN/OUT transfers with the bulksrcsink example from Cypress (1024-byte packet size, burst length 16, theoretical maximum speed of up to 454,300 KB/sec as per https://www.cypress.com/file/125281/download ).

Real-world speeds, all measured on the same Windows 10 computer with the same Cypress BulkSrcSink firmware:
437,800 KB/sec for cyusb3.sys using Cypress Cystream application
437,700 KB/sec for libusb0.sys using libusbk kBench application
386,600 KB/sec for libusbk.sys using libusbk kBench application
385,300 KB/sec for WinUSB using libusbk kBench application
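
To put the gap in numbers, here is a quick Python sketch that works out each result as a percentage of the theoretical maximum and as a slowdown relative to cyusb3.sys. It is not part of Cystream or kBench; the function and variable names are only for illustration, and the figures are copied from the list above.

```python
# Throughput figures in KB/sec, copied from the measurements above.
THEORETICAL_MAX_KBPS = 454_300

MEASURED_KBPS = {
    "cyusb3.sys": 437_800,
    "libusb0.sys": 437_700,
    "libusbk.sys": 386_600,
    "WinUSB": 385_300,
}


def percent_of_max(kbps, max_kbps=THEORETICAL_MAX_KBPS):
    """Return the throughput as a percentage of the theoretical maximum."""
    return 100.0 * kbps / max_kbps


def percent_slower(kbps, baseline_kbps):
    """Return how much slower kbps is than the baseline, in percent."""
    return 100.0 * (baseline_kbps - kbps) / baseline_kbps


if __name__ == "__main__":
    # Use the fastest driver (cyusb3.sys) as the comparison baseline.
    baseline = MEASURED_KBPS["cyusb3.sys"]
    for driver, kbps in MEASURED_KBPS.items():
        print(
            f"{driver:12s} {kbps:>8,} KB/sec  "
            f"{percent_of_max(kbps):5.1f}% of max  "
            f"{percent_slower(kbps, baseline):5.2f}% slower than cyusb3.sys"
        )
```

With these figures, cyusb3.sys and libusb0.sys come out at about 96% of the theoretical maximum, while libusbk.sys and WinUSB come out at about 85%, i.e. roughly 12% slower than cyusb3.sys.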

Is the overhead of KMDF over WDM really large enough to be the root cause of the slower speed?