StorPort and the 254 queue depth limit...

Does anyone out there know if StorPortSetDeviceQueueDepth really DOES
limit the queue depth to 254? I've yet to test this, but that is a
very small number these days. Can someone from Microsoft comment on
the limit? When will it change to something much larger, like 2048 or
more? With MSI-X and very fast storage devices, the 254 limit might
generate a fair amount of back pressure, which will prevent devices
with a single LUN from performing at their peak.
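
For reference, the call I'm asking about looks roughly like this from the
miniport side. (Just a sketch: the device-extension type and the hard-coded
depth of 1024 are made up for illustration; the whole question is whether a
Depth above 254 actually sticks.)

#include <storport.h>

typedef struct _HW_DEVICE_EXTENSION {
    ULONG Reserved;     /* placeholder; a real extension holds adapter state */
} HW_DEVICE_EXTENSION, *PHW_DEVICE_EXTENSION;

BOOLEAN
MySetLunQueueDepth(
    PHW_DEVICE_EXTENSION HwDeviceExtension,
    UCHAR PathId,
    UCHAR TargetId,
    UCHAR Lun
    )
{
    /* Request a per-LUN queue depth well beyond the documented 254 ceiling.
     * Returns TRUE if storport accepts the new depth.
     */
    return StorPortSetDeviceQueueDepth(HwDeviceExtension,
                                       PathId,
                                       TargetId,
                                       Lun,
                                       1024);
}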

TIA,
Robert.


Robert Randall | xxxxx@gmail.com

If your miniport sets it to a larger number, it may break existing targets. Some old FC targets are known to silently drop excess commands, which is not a good thing.

Also, storport.sys currently limits the number of pending SRBs to 1000 per HBA instance anyway.

The miniport driver will ONLY be talking to MY target and I know the
capabilities of the target and logical unit with great precision ;-).
It can easily handle a queue depth several times the documented limit
of 254. I’m just seeking some confirmation that setting a depth
larger than 254 will actually make a difference.

Based on the lack of response from the list I can assume one of two things:

  1. my question is so banal and the answer so obvious that it does not
    warrant a reply B-)…

  2. everyone assumes, as I do, that I should just test it and find out.
    Which I will of course do, but, as usual, management would like an
    answer BEFORE I have enough code written to test and verify.
    Management is so demanding ;-)

After I test it I’ll reply with an answer to my own question to add
clarity to the discussion.

Cheers,
Robert.

On Tue, Mar 1, 2011 at 3:49 PM, wrote:
> If your miniport sets it to a larger number, it may break existing targets. Some old FC targets are known to silently drop excess commands, which is not a good thing.
>
> Also, storport.sys currently limits the number of pending SRBs to 1000 per HBA instance anyway.


Robert Randall | xxxxx@gmail.com

Why do you think that sending more than 254 commands to a LUN will improve device throughput?

It might, if a LUN is composed of more than 8-16 real disk drives and you issue random I/O. But you have to test that scenario anyway to say definitively that it's worth the trouble.

Yes, the limits are there: a queue depth of 254 per LUN and 1000 per adapter.

Quote:
I’m just seeking some confirmation that setting a depth
larger than 254 will actually make a difference.

Well, I already know it is worth the trouble. I can’t describe device
specifics because they are proprietary. However, I do KNOW that the
device has NO PROBLEM servicing a queue depth that is several times
the 254 limit.

So, is my only option to use multiple LUNs to get more I/Os flowing to
the device?

Any advice would be appreciated.

How about the Hyper-V architect that chimes in from time to time??
Any advice from you? ;-)

TIA and Best Regards,
Robert.

On Wed, Mar 2, 2011 at 9:41 AM, wrote:
> Why do you think that sending more than 254 commands to a LUN will improve device throughput?
>
> It might, if a LUN is composed of more than 8-16 real disk drives and you issue random I/O. But you have to test that scenario anyway to say definitively that it's worth the trouble.


Robert Randall | xxxxx@gmail.com

“Well, I already know it is worth the trouble.”

OK, so, for example, what gain do you get while running page-sized random I/O with 160 requests vs 240 requests? (assuming it's a block storage device). If that's not a mechanical device, I don't see how you could gain much at all.

[quote]
OK, so, for example, what gain do you get while running page-sized random I/O
with 160 requests vs 240 requests? (assuming it's a block storage device). If
that's not a mechanical device, I don't see how you could gain much at all.
[/quote]

Well, given that the OP has already demonstrated he’s not dumb, I seriously doubt that he’ll describe the details of the performance of his proprietary device here in this forum.

It's all about the workload, of course. But one could imagine that latency – in terms of ISR/DPC time – could significantly affect throughput. So, if the transfer time of 160 requests is EFFECTIVELY the same as that of 240 requests, which is EFFECTIVELY the same as that of 500 requests, and the device is kept very busy, the major impact on throughput would be the latencies of (a) completing the requests and (b) keeping the device fed. If you can do a good job of coalescing interrupts, you might be able to lower that latency significantly.

I’m not saying this is what the OP’s hardware does, or what the OP is thinking. I’m just saying that there are conceivable scenarios and workloads where having a mere 250 I/Os in progress simultaneously can be a limiting factor.
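
Just to make that concrete, here's the sort of batched completion path I have in mind. It's only a sketch: the completion ring, its field names, and the device extension are all invented, but the shape is the usual one: the interrupt (or MSI-X) handler just schedules the DPC, and the DPC hands back however many requests the device has finished in one pass.

#include <storport.h>

typedef struct _HW_COMPLETION_ENTRY {
    PSCSI_REQUEST_BLOCK Srb;        /* request the device finished */
    UCHAR               SrbStatus;  /* status the device reported */
} HW_COMPLETION_ENTRY;

typedef struct _HW_DEVICE_EXTENSION {
    HW_COMPLETION_ENTRY CompletionRing[1024];   /* invented ring layout */
    ULONG               Head;                   /* next entry to consume */
    ULONG               Tail;                   /* advanced as completions post */
} HW_DEVICE_EXTENSION, *PHW_DEVICE_EXTENSION;

VOID
HwCompletionDpc(
    PSTOR_DPC Dpc,
    PVOID HwDeviceExtension,
    PVOID SystemArgument1,
    PVOID SystemArgument2
    )
{
    PHW_DEVICE_EXTENSION devExt = (PHW_DEVICE_EXTENSION)HwDeviceExtension;

    UNREFERENCED_PARAMETER(Dpc);
    UNREFERENCED_PARAMETER(SystemArgument1);
    UNREFERENCED_PARAMETER(SystemArgument2);

    /* Drain everything the device has posted since the last pass, paying the
     * interrupt/DPC overhead once for the whole batch instead of once per
     * request.
     */
    while (devExt->Head != devExt->Tail) {
        HW_COMPLETION_ENTRY *entry =
            &devExt->CompletionRing[devExt->Head % 1024];

        entry->Srb->SrbStatus = entry->SrbStatus;
        StorPortNotification(RequestComplete, HwDeviceExtension, entry->Srb);

        devExt->Head++;
    }
}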

Peter
OSR


“Well, I already know it is worth the trouble.”

OK, so, for example, what gain do you get while running page-sized random I/O
with 160 requests vs 240 requests? (assuming it's a block storage device). If
that's not a mechanical device, I don't see how you could gain much at all.

If the link to the storage device has high latency, then a deeper queue
would be more beneficial in terms of keeping the device busy (and
allowing it to intelligently reorder and prioritise requests, etc.).
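
(Back-of-envelope, with purely illustrative numbers: by Little's law,
sustained IOPS is roughly outstanding I/Os divided by round-trip latency.
With the queue capped at 254 and 100 microseconds of end-to-end latency you
top out around 254 / 0.0001 s = ~2.5 million IOPS; stretch the latency to
1 ms and the same 254-deep queue caps you at about 254,000 IOPS. The longer
the pipe, the more outstanding requests it takes just to keep the device
busy.)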

James

What do you want me to say? I don’t control storport. I do send them
feedback pretty regularly.

Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group

This post implies no warranties and confers no rights.


“Robert Randall” wrote in message news:xxxxx@ntdev…

How about the Hyper-V architect that chimes in from time to time??
Any advice from you? ;-)

True. But you could get a confirmation from the maintainers as to
whether or not the 254 limit is actually enforced by the class
driver…

Regards,
Robert.

On Thu, Mar 10, 2011 at 12:15 PM, Jake Oshins wrote:
> What do you want me to say? I don't control storport. I do send them
> feedback pretty regularly.


Robert Randall | xxxxx@gmail.com

I could research every question that passes this list. I could read the
code. I know where it is. I’ve read it before.

But the honest truth is that to keep my time spent here reasonable, I only
answer questions that I (believe I) know the answer to already. If you want
research, you’re more than welcome to open a support case with PSS developer
support. That’s their job.

Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group

This post implies no warranties and confers no rights.


“Robert Randall” wrote in message news:xxxxx@ntdev…

True. But you could get a confirmation from the maintainers as to
whether or not the 254 limit is actually enforced by the class
driver…


classpnp and disk.sys sources are included with the WDK. You can see that they don't care about queue depth. The queue is completely inside storport.sys.

You can write your own port driver. This is not trivial, and MS may not sign your driver. But then the queue depth would be entirely under your control.

Understood. Will do…

On Fri, Mar 11, 2011 at 10:41 AM, Jake Oshins wrote:
> I could research every question that passes this list. I could read the
> code. I know where it is. I've read it before.
>
> But the honest truth is that to keep my time spent here reasonable, I only
> answer questions that I (believe I) know the answer to already. If you want
> research, you're more than welcome to open a support case with PSS developer
> support. That's their job.

Robert Randall | xxxxx@gmail.com

Yes, I’m trying to avoid that messy path…

On Fri, Mar 11, 2011 at 1:29 PM, wrote:
> classpnp and disk.sys sources are included with the WDK. You can see that they don't care about queue depth. The queue is completely inside storport.sys.
>
> You can write your own port driver. This is not trivial, and MS may not sign your driver. But then the queue depth would be entirely under your control.


Robert Randall | xxxxx@gmail.com