Improve flash storage device performance

Hi all,
I am developing a flash storage filter driver to improve the transfer performance of our storage device.
First, I found a software tool named Txxx that can improve flash disk performance, so I installed it to see how it works.
Our product is for USB 3.x, but I tested with a USB 2.0 device instead.
After my testing, I got the following results:
In original mode: read speed is 22 MB/s
In turbo mode: read speed is 26 MB/s
That is about an 18% improvement, which seems to meet my requirement.
Then I used a USB trace tool to get more information:
The Txxx tool installs a filter driver and adds the filter to 2 devices:
USB Disk device : adds an upper filter
USB Mass storage device : adds a lower filter
In original mode : most TransferBufferLength of URBs are 0x1000 (pUrb->UrbBulkOrInterruptTransfer.TransferBufferLength)
In turbo mode : most TransferBufferLength of URBs are 0x10000
I guess this is the reason why it can improve storage performance. It increases the TransferBufferLength of each transfer, so fewer transfers are needed and the total transfer time goes down.

By the way, there is something tricky:
In original mode : most Srb->DataTransferLength in SCSI request blocks are 0x10000
In turbo mode : most Srb->DataTransferLength in SCSI request blocks are 0x200
In other words, in turbo mode the SRB DataTransferLength is smaller than in original mode. This confuses me.

My questions are as follows:

  1. Should I increase the TransferBufferLength of the URBs to improve transfer performance?
  2. How do I modify the TransferBufferLength of a URB? Should I collect several USB packets and then send them in a single transfer?
  3. Should I write only one filter for the USB storage device, or two filters, one each for the USB storage device and the disk device?
  4. Should I use an upper or a lower filter for the mass storage device/disk device?
  5. Does Srb->DataTransferLength affect transfer performance?

Does anyone know the answers to the above questions? Any suggestion is highly appreciated!

Thanks!
Best Regards,
Gordon

Srb->DataTransferLength isn’t the only key to impact performance. Did
you track the command depth?

Thanks
Wayne

Hi Wayne,
Many thanks for your response!

>Srb->DataTransferLength isn’t the only key to impact performance. Did you track the command depth?
I am not familiar with SRBs. Could you tell me which parameter or field I should track?

By the way, I have some other questions as follows:

I have written a disk lower filter for my USB storage device.
I would like to compare the relationship between MaximumTransferLength in STORAGE_ADAPTER_DESCRIPTOR and disk transfer performance, so I need to overwrite MaximumTransferLength in my filter driver.
I added a ‘DeviceIOControl’ routine to the filter to handle the ‘IOCTL_STORAGE_QUERY_PROPERTY’ request. When QueryType in PSTORAGE_PROPERTY_QUERY equals PropertyStandardQuery (0x00), I set a completion routine and pass the IRP down.
However, when it reaches my completion routine, the QueryType has changed from PropertyStandardQuery (0x00) to 0x20. Therefore, I can’t modify MaximumTransferLength in the completion routine.
My questions are as follows:

  1. How can I overwrite MaximumTransferLength in a filter driver?
  2. Why has the QueryType in the IRP changed by the time it enters the completion routine?

Any suggestion is highly appreciated.

The routines are as follows:

NTSTATUS DDKDeviceIOControl(IN PDEVICE_OBJECT pDevObj, IN PIRP pIrp)
{
    PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) pDevObj->DeviceExtension;
    NTSTATUS status;
    ULONG ioControlCode;
    ULONG inlen;
    ULONG outBufflen;
    PVOID buffer;
    PIO_STACK_LOCATION irpStack = IoGetCurrentIrpStackLocation(pIrp);

    buffer = pIrp->AssociatedIrp.SystemBuffer;
    inlen = irpStack->Parameters.DeviceIoControl.InputBufferLength;
    outBufflen = irpStack->Parameters.DeviceIoControl.OutputBufferLength;
    status = STATUS_NOT_SUPPORTED;//STATUS_INVALID_DEVICE_REQUEST;//
    ioControlCode = irpStack->Parameters.DeviceIoControl.IoControlCode;

    switch(ioControlCode)
    {
    case IOCTL_STORAGE_QUERY_PROPERTY: // 0x002d1400
        {
            PSTORAGE_PROPERTY_QUERY pQuery = (PSTORAGE_PROPERTY_QUERY)buffer;
            PSTORAGE_DESCRIPTOR_HEADER pOutput = (PSTORAGE_DESCRIPTOR_HEADER)buffer;

            status = STATUS_INVALID_PARAMETER;

            if (pQuery->QueryType == PropertyStandardQuery)
            {
                switch (pQuery->PropertyId)
                {
                case StorageAdapterProperty:
                    {
                        if (outBufflen >= sizeof(STORAGE_ADAPTER_DESCRIPTOR))
                        {
                            PSTORAGE_ADAPTER_DESCRIPTOR pAdapDesc = (PSTORAGE_ADAPTER_DESCRIPTOR)buffer;

                            IoCopyCurrentIrpStackLocationToNext(pIrp);
                            IoSetCompletionRoutine(pIrp, (PIO_COMPLETION_ROUTINE) IOCTLCompletionRoutine,
                                                   (PVOID) pdx, TRUE, TRUE, TRUE);
                            return IoCallDriver(pdx->LowerDeviceObject, pIrp);
                        }
                    }
                    break;
                } // ~ switch (pQuery->PropertyId)
            } // ~ if (pQuery->QueryType == PropertyStandardQuery)
        }
        break;
    }
    IoSkipCurrentIrpStackLocation(pIrp);
    status = IoCallDriver(pdx->LowerDeviceObject, pIrp);
    return status;
}
#pragma LOCKEDCODE
NTSTATUS IOCTLCompletionRoutine(PDEVICE_OBJECT fido, PIRP Irp, PDEVICE_EXTENSION pdx)
{ // StartDeviceCompletionRoutine
    IoAcquireRemoveLock(&pdx->RemoveLock,Irp);
    NTSTATUS status;
    ULONG ioControlCode;
    ULONG inlen;
    ULONG outBufflen;
    PVOID buffer;
    PIO_STACK_LOCATION irpStack = IoGetCurrentIrpStackLocation(Irp);
    buffer = Irp->AssociatedIrp.SystemBuffer;
    inlen = irpStack->Parameters.DeviceIoControl.InputBufferLength;
    outBufflen = irpStack->Parameters.DeviceIoControl.OutputBufferLength;
    status = STATUS_NOT_SUPPORTED;//STATUS_INVALID_DEVICE_REQUEST;//
    ioControlCode = irpStack->Parameters.DeviceIoControl.IoControlCode;

    switch(ioControlCode)
    {
    case IOCTL_STORAGE_QUERY_PROPERTY: // 0x002d1400
        {
            PSTORAGE_PROPERTY_QUERY pQuery = (PSTORAGE_PROPERTY_QUERY)buffer;
            PSTORAGE_DESCRIPTOR_HEADER pOutput = (PSTORAGE_DESCRIPTOR_HEADER)buffer;

            status = STATUS_INVALID_PARAMETER;
            if (pQuery->QueryType == PropertyStandardQuery)
            {
                switch (pQuery->PropertyId)
                {
                case StorageAdapterProperty:
                    {
                        if (outBufflen >= sizeof(STORAGE_ADAPTER_DESCRIPTOR))
                        {
                            PSTORAGE_ADAPTER_DESCRIPTOR pAdapDesc = (PSTORAGE_ADAPTER_DESCRIPTOR)buffer;
                            pAdapDesc->MaximumTransferLength = 0x00100000;
                            pAdapDesc->MaximumPhysicalPages = 0x101;
                            pAdapDesc->AcceleratedTransfer = TRUE;
                        }
                    } //case
                } //switch id
            } // if
        } //case
        break;
    } //switch ioctl
    IoReleaseRemoveLock(&pdx->RemoveLock, Irp);
    return STATUS_SUCCESS;
}

Thanks!
Best Regards,
Gordon

You can use Performance Monitor to trace the I/O command depth of your adapter.

Thanks
Wayne