Right way to read data from a driver?

Hi,
My application has a read thread (priority above NORMAL) that polls for messages from the driver, as in the code snippet below.

This takes more than 25% of CPU time, and I guess it affects my whole application.

How can I read data on a kernel event and sleep the rest of the time? I want to keep the CPU load low. I can’t put a Sleep() inside my thread loop.

CreateFile is invoked with OVERLAPPED enabled, and I use OVERLAPPED for the WriteFile and DeviceIoControl APIs.

void MyReadClass::ReadThread()
{
    while (!bSTOP)
    {
        success = ReadFile(hDEV, pdataBuf, ReadLen, (PULONG)&nBytesRead, &ovRead);
        if (success == FALSE && GetLastError() != ERROR_IO_PENDING) {
            break;
        }
        if (GetOverlappedResult(hDEV, &ovRead, (PULONG)&nBytesRead, TRUE) != 0)
        {
            ProcessData(pdataBuf, nBytesRead);
        }
    }
} // end of thread func

There are many ways to read data in UM, but this is a terrible one. You have a tight loop polling for IO completion. If you wanted a sync read, you should populate the hEvent member of your OVERLAPPED and ReadFile will block until your IO is complete. This will avoid your problem of flatlining a core with a tight loop. Of course, this is no longer overlapped IO.

This design will be effective for a single thread. If you need more threads, or better performance, then you should look at IO completion ports or the thread pool APIs.


From: srinivaskumar.r@in.bosch.com
Sent: December 14, 2016 7:19 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] right way to read data from driver ??


NTDEV is sponsored by OSR

Visit the list online at: http:

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at http:

To unsubscribe, visit the List Server section of OSR Online at http:

I am using OVERLAPPED with events, but I could not figure out IO completion ports or the other better options. Please share some code snippets or links.

My driver sends data intermittently, and it has to reach the upper layers so that the next data can be sent to the device again. Below is the design of my read thread; I know it is in bad shape. I need your inputs to improve it.

void MyReadClass::ReadThread()
{
    BYTE by_bIn[READ_BYTES];
    USHORT wIndex = 0;
    PACKET packet;
    int nBytesRead;
    BOOL success;
    int ReadLen = READ_BYTES;
    OVERLAPPED ovRead;
    memset(&ovRead, 0, sizeof(ovRead));
    ovRead.Offset = 0;
    ovRead.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    while (!m_stop)
    {
        //Sleep(1);
        success = ReadFile(mHandle, by_bIn, ReadLen, (PULONG)&nBytesRead, &ovRead);
        if (success == 0 && GetLastError() == ERROR_IO_PENDING) // 0 as it is async
        {
            if (GetOverlappedResult(mHandle, &ovRead, (PULONG)&nBytesRead, TRUE) != 0)
            {
                if (nBytesRead > 0) {
                    ProcessPackets(&packet);
                }
            }
        }
    }
}

Sorry, I should have been more clear. ReadFile won't block, but you will wait using WaitForSingleObject and then get the results using GetOverlappedResult.

The effect is blocking IO.


From: Marion Bond
Sent: December 14, 2016 7:41 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] right way to read data from driver ??

Can there only ever be one request outstanding due to a hardware limit, or is this something that exists because of your design? If there will only ever be one and the hardware cannot proceed without a response from UM, then you are doomed to low data rates, but this might not be a problem for your application. If that is the case, then the code you have below needs only minor corrections.

Something like this:

if (ReadFile(…) == 0)
{
    dwErr = GetLastError();
    if (dwErr == ERROR_IO_PENDING)
    {
        WaitForSingleObject(…);
    }
    else
    {
        // throw new exception("handle this error somehow"); // dwErr contains the error code
    }
}

GetOverlappedResult(…);

// the read has completed and we have the results - do something with them


From: srinivaskumar.r@in.bosch.com
Sent: December 14, 2016 8:18 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] right way to read data from driver ??

Have you considered using the extended version of ReadFile (ReadFileEx)?

You would be notified of the completion ‘automagically’ by the I/O manager by means of an APC. This APC would run when the I/O operation has completed and the calling thread enters an alertable wait state.

You couldn’t use this feature for an IOCTL operation for instance.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365468(v=vs.85).aspx

The issue is still not resolved. I modified the read in my thread to use WaitForSingleObject(), but there is still no improvement.

I guess the issue could be inside my driver as well. I have pasted the skeleton of my EvtIoRead() below.

The read is simple: I have configured the USB continuous reader, which polls the USB bus. In the read completion routine, data is copied to a ring buffer under a spin lock.
THE PROBLEM IS THAT most of the time the read completion gets empty data from the device, so the ring buffer is empty. The device sends around 40-100 bytes of data for 60-80 usec.

I see that my read thread calls ReadFile() with no delay, and this in turn invokes UsbDeviceEvtIoRead() continuously. Most of the time ReadFile gets 0 bytes read because the ring buffer is empty.

VOID UsbDeviceEvtIoRead(IN WDFQUEUE Queue, IN WDFREQUEST Request, IN size_t Length) {

    PUSBDEVICE_DEVICE_EXTENSION pDevExt = NULL;
    WDFMEMORY requestBuffer;

    status = WdfRequestRetrieveOutputMemory(Request, &requestBuffer);

    // Acquire a spinlock on the ring buffer…
    // This ring buffer is filled by UsbdeviceEvtBulkInPipeReadComplete, which is configured in WdfUsbTargetPipeConfigContinuousReader

    WdfSpinLockAcquire(pDevExt->spinLockRxMsgList);

    // copy data from ring buffer… ???
    // if the ringBuff

    ringBuffLength = RingBufferGetLength(pDevExt->pRingBuffer[iProcessCount]);

    if (ringBuffLength <= 0)
    {
        completionSz = 0;
    }
    else
    {
        status = WdfRequestRetrieveOutputMemory(Request, &requestBuffer);
        if (Length < ringBuffLength)
        {
            ringBuffLength = Length;
        }
        status = RingBufferRemove(pDevExt->pRingBuffer[iProcessCount], requestBuffer, 0, ringBuffLength, FALSE);
        completionSz = (ULONG)ringBuffLength;
    }

    WdfSpinLockRelease(pDevExt->spinLockRxMsgList); // release the lock

    WdfRequestCompleteWithInformation(Request, status, completionSz);
}

How can I make UsbDeviceEvtIoRead() block until there is some data in the ring buffer? Can I loop inside this function for a few iterations or until data is available in the ring buffer?
Is there any event I can fire to user space? I am completely lost in this scenario.

Here is my latest read thread in the user-space application…

while(…)
{
    if (ReadFile(mHandle, by_bIn, ReadLen, NULL, &m_ovRead) == 0)
    {
        derror = GetLastError();
        if (derror != ERROR_IO_PENDING)
        {
            continue; // error case
        }
        else
        {
            waitState = WaitForSingleObject(m_ovRead.hEvent, 100);
            switch (waitState)
            {
            case WAIT_FAILED:
            case WAIT_TIMEOUT:
            case WAIT_ABANDONED:
                derror = GetLastError();
                break;
            case WAIT_OBJECT_0:
                if (GetOverlappedResult(mHandle, &m_ovRead, (PULONG)&nBytesRead, TRUE) != 0)
                {
                    if (nBytesRead > 0) {
                        packet.bIsPacket = FALSE;
                        for (wIndex = 0; wIndex < nBytesRead; wIndex++)
                        {
                            fHandlerRx.ExtractFrame(by_bIn[wIndex], &packet);
                            if (packet.bIsPacket == TRUE)
                            {
                                packet.bIsPacket = FALSE;
                                ProcessPacket(&packet);
                            }
                        }
                    }
                }
                break;
            }
        }
    }
}

srinivaskumar.r@in.bosch.com wrote:

Still the issue is not resolved. I modified the read in my thread with WaitForSingleObject(); but still no improvement.

I guess the issue could be inside my driver as well. I have pasted the skeleton of my EvtIoRead().

The read is simple; I have configured the USBCOntinuousReader which is polling the USB BUS. In the read completion routine, data is copied to a ringbuffer with spinLock.
THE PROBLEM IS THAT most of times read completion gets empty data from device and ring buffer will be empty. The device sends around 40-100 bytes of data for 60-80usec.

I see that my readthread is calling ReadFile() with no delay and this is in turn invoking UsbDeviceEvtIoRead() continuously. Most of times ReadFile get 0 as bytes read as ringbuffer is empty.

You have a totally synchronous mindset here.

Your user-mode code waits for 100ms, and if there is a timeout, it loops
around and submits another request with the same OVERLAPPED structure.
That’s totally wrong. If your wait times out, that means your current
request is still running. You can’t resubmit until that one completes.
You either need to continue to wait, or cancel the existing request.

In your kernel-mode code, if there is nothing in the ring buffer when
you get EvtIoRead, then you need to place the request on a manual queue
and return. Then, in your continuous reader callback, before you copy
the data to your ring buffer, you check to see whether there are
requests in the queue. If so, you pop the top request, copy the data,
and complete the request.

A clever designer will note that there is an opportunity for code reuse
here:

UsbEvtIoRead( … )
{
    unconditionally move request to manual queue;
    CheckForCompletions;
}

ContinuousReaderCallback()
{
    copy data to ring buffer;
    CheckForCompletions;
}

CheckForCompletions()
{
    while( there is more data in ring buffer )
    {
        if waiting queue is empty, return;
        pop top request
        copy data into it, updating ring buffer pointers
        if request is full
            complete it
        else
            push back on queue
    }
}
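
The pseudocode above can be modeled in portable C++ to check the bookkeeping before writing any driver code. This is only a sketch under stated assumptions: `PendingRead`, `ReadModel`, `OnRead`, and `OnDeviceData` are hypothetical names, the two `std::deque` members stand in for the ring buffer and the KMDF manual queue, and there is no locking, which a real driver would need. A partially filled request is put back at the head of the queue so that reads complete in arrival order; that ordering is an assumption, not something the pseudocode specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pended read (a WDFREQUEST in a real driver).
struct PendingRead {
    int id;                          // label used only by this model
    std::size_t capacity;            // size of the caller's buffer
    std::vector<std::uint8_t> data;  // bytes copied into it so far
};

struct ReadModel {
    std::deque<std::uint8_t> ring;       // stands in for the ring buffer
    std::deque<PendingRead> waiting;     // stands in for the manual queue
    std::vector<PendingRead> completed;  // reads handed back to the caller

    // Mirrors CheckForCompletions from the pseudocode above.
    void CheckForCompletions() {
        while (!ring.empty()) {
            if (waiting.empty()) return;  // data stays buffered for a later read
            PendingRead req = std::move(waiting.front());
            waiting.pop_front();
            assert(req.data.size() <= req.capacity);

            // Copy as much as fits, then drop those bytes from the ring.
            const std::size_t room = req.capacity - req.data.size();
            const std::size_t n = std::min(room, ring.size());
            const auto last = ring.begin() + static_cast<std::ptrdiff_t>(n);
            req.data.insert(req.data.end(), ring.begin(), last);
            ring.erase(ring.begin(), last);

            if (req.data.size() == req.capacity) {
                completed.push_back(std::move(req));  // "complete it"
            } else {
                // Assumption: requeue at the head so reads complete in order.
                waiting.push_front(std::move(req));
            }
        }
    }

    // Models EvtIoRead: always queue the request, then try to satisfy it.
    void OnRead(int id, std::size_t capacity) {
        waiting.push_back(PendingRead{id, capacity, {}});
        CheckForCompletions();
    }

    // Models the continuous reader callback: buffer the data, then drain.
    void OnDeviceData(const std::vector<std::uint8_t>& bytes) {
        ring.insert(ring.end(), bytes.begin(), bytes.end());
        CheckForCompletions();
    }
};
```

In this model, `OnRead(1, 4)` on an empty ring leaves the read pending instead of completing it with zero bytes; a later `OnDeviceData` with six bytes completes that read and keeps the remaining two bytes buffered for the next one. Because the model follows the pseudocode literally, a read completes only when its buffer is full; a driver for a device that sends short messages may prefer to complete a read as soon as it holds any data.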

Why do you need a kernel driver at all? Virtually every custom USB
device can be better handled by a user-mode app calling WinUSB. Instead
of automatically thinking of a kernel driver, you should think about a
wrapper DLL with its own friendly interface, calling WinUSB to do the
USB work.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Well, your UM code looks better.

In KM, you need to pend the read request until there is data to satisfy it. If you complete the request with zero bytes, it will cause unnecessary KM / UM transitions.

Usually in KMDF a queue is used for this. Typically, you would pend several overlapped ReadFile calls from UM; in KM, when data arrives, you either complete the next pending request or queue the data (ring buffer or otherwise). When a new request comes in from UM, either complete it directly with queued data or queue it until new data arrives from your HW.


From: srinivaskumar.r@in.bosch.com
Sent: December 20, 2016 7:46 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] right way to read data from driver ??

Hi,
Thanks for the inputs. Now I have to figure out how to implement manual queues for my scenario.
Is there any example of how to create manual queues?

But I would like to give one more piece of information. My requirement is to support 4 devices max, and 3 client processes can access one/4 device in parallel (rx and tx at the same time). Due to a few performance requirements, I am evaluating a custom driver.

Right now I create 3 queues, for IOCTL, read, and write. Below is the snippet for read:
WDF_IO_QUEUE_CONFIG_INIT(&ioQueueConfig, WdfIoQueueDispatchSequential);

WDF_OBJECT_ATTRIBUTES_INIT(&queueAttributes);

ioQueueConfig.EvtIoRead = UsbEvtIoRead;
ioQueueConfig.EvtIoStop = UsbEvtIoStop;

queueAttributes.SynchronizationScope = WdfSynchronizationScopeQueue;

status = WdfIoQueueCreate(device,
                          &ioQueueConfig,
                          &queueAttributes,
                          &pDevExt->reqQueueRead);

status = WdfDeviceConfigureRequestDispatching(device,
                                              pDevExt->reqQueueRead,
                                              WdfRequestTypeRead);

srinivaskumar.r@in.bosch.com wrote:

Thanks for the inputs. Now I have to figure out as how to implement manual queues for my scenario.
Is there any example on how to create the manual queues ?

Did you even look? Almost every KMDF sample creates manual queues.
Queues are absolutely fundamental to KMDF drivers.

But I would like to give one more information. My requirement is to support 4 device max and 3 client processes can access one/4 device in parallel ( rx, tx same time ). due to few performance requirements , i am evaluating the custom driver.

What do you mean by “4 device max”? Do you actually have 4 different
devices that get 4 different drivers, or is this one device that offers
4 different services?

Right now I create 3 queues for ioctrl, read,write. below is snipped for read,
WDF_IO_QUEUE_CONFIG_INIT(&ioQueueConfig, WdfIoQueueDispatchSequential);

The only difference between this and a manual queue is that you specify
WdfIoQueueDispatchManual. Also, requests are not automatically
dispatched; you have to fetch the next request manually.

Why do you use sequential here? With a sequential queue, only one read
request can run at a time.

Why have three separate queues? You can have one queue that has
callbacks for EvtIoRead, EvtIoWrite, and EvtIoDeviceControl.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.