Windows 10: Is the driver returning to the caller before the data transfer is complete, for a user-defined ioctl?

I have deleted my previous post on this topic and posted this shorter, clearer version.

I'm an application software engineer who writes test software for SATA HDD firmware. We use Windows 10 Enterprise systems with a SATA storage port driver that was developed in-house. The person who wrote the driver still works with me, but we haven't been able to figure out this problem.

In short, the tests I write send commands to a SATA HDD, wait for the response, and verify that the response is valid. The test I'm having a problem with sends thousands of commands over the course of 3 hours. One command in particular is sent to the HDD hundreds of times during this test. Every time that command is sent, 4-7 MB of data is transferred from the HDD to the host. All of those commands complete successfully except for the last one, which causes a test failure because some of the transferred data is wrong. Oddly enough, we can readily reproduce this issue on only 1 Windows system in a lab that has close to 100 systems. We cannot find anything unique about this system. It's just another PC from what we can see.

When the last command fails, I can clearly see all of the correct data being transferred to the host, using a SATA protocol analyzer. There is nothing special about this command as it is issued 100s of times prior to the failure, all of which complete successfully.

This appears to be a race condition caused by one of two events.
1: Windows sent the interrupt to our driver before the data transfer actually completed, even though the ioctl returns a value indicating that the expected amount of data was transferred.
2: The copy from the kernel buffer to the user space buffer was interrupted. The correct amount of data did reach the user space buffer, yet somehow some of that data is wrong.
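
One way to narrow down these two possibilities might be to record exactly where the bad data sits in the user space buffer. The sketch below is only an illustration and is not part of our test framework: `find_mismatch`, `is_aligned` and `mismatch_span` are hypothetical names, and it assumes the test has a known-good copy of the expected data to compare against. If the first bad byte starts on a 4096-byte page boundary or a 512-byte sector boundary, that would hint at a buffer mapping or DMA problem rather than random corruption, although this is a heuristic and not proof.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of comparing the received buffer against a known-good reference. */
typedef struct {
    int    found;   /* 0 if the buffers are identical */
    size_t offset;  /* offset of the first differing byte */
    size_t length;  /* bytes from the first to the last differing byte, inclusive */
} mismatch_span;

/* Hypothetical helper: locate the span that contains every differing byte. */
static mismatch_span find_mismatch(const uint8_t *expected,
                                   const uint8_t *actual,
                                   size_t n)
{
    mismatch_span m = {0, 0, 0};
    size_t last = 0;

    for (size_t i = 0; i < n; i++) {
        if (expected[i] != actual[i]) {
            if (!m.found) {
                m.found = 1;
                m.offset = i;
            }
            last = i;
        }
    }
    if (m.found) {
        m.length = last - m.offset + 1;
    }
    return m;
}

/* Returns nonzero if value is a multiple of boundary (e.g. 512 or 4096). */
static int is_aligned(size_t value, size_t boundary)
{
    return boundary != 0 && (value % boundary) == 0;
}
```

Logging `offset`, `length` and the two alignment checks on every failure would show whether the corruption always lands in the same place or on the same kind of boundary.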

This driver and the test framework were in use long before Windows 10 was even on the drawing board, so I feel they are stable.

@Tim_Roberts
I saw your reply about the IOCTL_CODE here

We are using
//45000 is defined in-house for the SATA ioctl
#define IOCTL_SATA_COMMAND CTL_CODE( 45000, 0x930, METHOD_BUFFERED, FILE_ANY_ACCESS )
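
For reference, here is a small sketch of how that control code breaks down into its fields. It re-declares the standard `CTL_CODE` bit layout under the hypothetical name `SKETCH_CTL_CODE` so that it compiles without the Windows headers; `decode_ioctl` and `ioctl_fields` are also illustrative names, not part of our driver. The low two bits are the transfer method, and a value of 0 (METHOD_BUFFERED) means the I/O manager uses an intermediate system buffer and copies it to the caller's output buffer when the request completes.

```c
#include <stdint.h>

/* Same bit layout as CTL_CODE in the Windows headers, renamed so this
   sketch builds without them:
   DeviceType[31:16] | Access[15:14] | Function[13:2] | Method[1:0] */
#define SKETCH_CTL_CODE(type, func, method, access)               \
    (((uint32_t)(type) << 16) | ((uint32_t)(access) << 14) |      \
     ((uint32_t)(func) << 2) | (uint32_t)(method))

#define SKETCH_METHOD_BUFFERED 0u  /* same value as METHOD_BUFFERED */
#define SKETCH_FILE_ANY_ACCESS 0u  /* same value as FILE_ANY_ACCESS */

/* Fields of an I/O control code, split out for inspection. */
typedef struct {
    uint32_t device_type;
    uint32_t access;
    uint32_t function;
    uint32_t method;
} ioctl_fields;

/* Hypothetical helper: split a control code back into its four fields. */
static ioctl_fields decode_ioctl(uint32_t code)
{
    ioctl_fields f;
    f.device_type = (code >> 16) & 0xFFFFu;
    f.access      = (code >> 14) & 0x3u;
    f.function    = (code >> 2)  & 0xFFFu;
    f.method      = code & 0x3u;
    return f;
}
```

With our values (device type 45000, function 0x930), the device type falls in the vendor-defined range starting at 0x8000 and the function code in the vendor-defined range starting at 0x800.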

I would appreciate some help trying to understand why the data isn't being copied to user space correctly. I'm happy to post parts of the driver code as well as the middle layer which calls the driver.

Thank you

To help, I think we need some more information. Probably the first question is whether you are using OVERLAPPED I/O from the UM (user mode) side.


@MBond2
I'm happy to provide you with more information.

UM side?
The handle to the HDD for this test is NOT opened for overlapped I/O.

I suppose I would look at memory coherency next. The obvious 'tell' for this kind of problem is that it goes away when debugged, or works on one machine but not on another.

memory coherency next

Would you elaborate what you mean by that?

I would like to emphasize that many of the exact same SATA commands have been issued prior to the one which fails. All of those SATA commands were issued using the same function call in the test (user space).

I also left out that neither the test nor the middle layer uses multiple threads or sub-processes. The SATA commands are issued synchronously with a timeout value. One command needs to complete before the next one is issued.

I understand that the test is successful many times in a row and then unexpectedly fails, and that you are not using OVERLAPPED I/O or multiple threads.

The reason I suggest looking at memory coherency, or the lack thereof, is that it fits with your problem. It is also suspicious that this seems to happen only on the last call, implying that issuing a new call may do something significant.

obviously I am only guessing
