storport timeout

I’m getting a freeze under high i/o load in my storport driver.

It freezes for 30 seconds then issues SRB_FUNCTION_RESET_LOGICAL_UNIT, which I assume is a result of a timeout, but by my count there are no requests outstanding.

After the reset (where no further action is taken other than to complete the reset SRB, because there are no outstanding requests that I can find), the i/o continues for a bit until it happens again.

Driver is a virtual storport.

How can I best debug this?

Am I right in assuming that the most likely explanation is that I’m ‘dropping’ an SRB somewhere? I figure if there was another reason for an LU reset (e.g. an SRB was returned with an error status), then there wouldn’t be the 30 second delay.

Thanks

James

> It freezes for 30 seconds then issues SRB_FUNCTION_RESET_LOGICAL_UNIT, which I assume is a result of a timeout, but by my count there are no requests outstanding.

Turns out I was wrong about this. My count took place after queue processing where the stray request got processed, so there was indeed a ‘stuck’ SRB.

The cause was a missing KeMemoryBarrier() causing a race in shared ring processing. Hard to find, but easy to fix.

James

>Driver is a virtual storport.

What about the layer below? Can it introduce a freeze?

> Am I right in assuming that the most likely explanation is that I’m ‘dropping’ an srb somewhere?

Yes. If, in storport, there is a way to complete the SRB as “please retry later”, and you abuse this path - then yes.

> I figure if there was another reason for an LU reset (e.g. an SRB was returned in error status), then there wouldn’t be the 30 second delay.

Yes, the LU reset sequence can also introduce such a thing.

Do you see any SRBs related to LU reset in such a case?


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Seriously?

To be quite honest, I don’t even know where I’d drop a KeMemoryBarrier(). Sure, I know what it does, but I’ve never known a place where I should code one in a real implementation (other than on something like ARM, of course).

I bet I’m not the only one.

Mr. Harper, maybe you’d be willing to write something up for The NT Insider on the practical use of KeMemoryBarrier as a tutorial and warning to help the clueless such as myself regarding this topic??

Peter
OSR
@OSRDrivers

> To be quite honest, I don’t even know where I’d drop a KeMemoryBarrier(). Sure, I know what it does, but I’ve never known a place where I should code one in a real implementation

Some lock-free data structure.

If you don’t use them, and only use spinlocks - you’re fine without this call.

Moreover, the InterlockedXxx calls imply KeMemoryBarrier.


Maxim S. Shatskih

> Seriously?
>
> To be quite honest, I don’t even know where I’d drop a KeMemoryBarrier(). Sure, I know what it does, but I’ve never known a place where I should code one in a real implementation (other than on something like ARM, of course).
>
> I bet I’m not the only one.
>
> Mr. Harper, maybe you’d be willing to write something up for The NT Insider on the practical use of KeMemoryBarrier as a tutorial and warning to help the clueless such as myself regarding this topic??

KeMemoryBarrier() stops both the compiler and the CPU from re-ordering memory accesses across it (you probably knew that). My driver uses a shared memory ring between the frontend (Windows) and the backend (Xen dom0). It’s lock free, so to prevent a race, at the end of the ring processing loop we do:

  1. Check if backend has subsequently written any more entries to the ring, and if so, go around the loop again
  2. If they haven’t, set the notification variable so the backend will notify us when more data is written
  3. Check again if the backend has written any more while we were doing the above

So when I exit from the ISR/DPC, I know that the backend will send another event through when there is more data.

I was missing KeMemoryBarrier() between #2 and #3, so either the compiler or the CPU was optimising things for me and not actually checking the backend again after the notification variable was set.
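
The three steps above can be sketched roughly as follows. This is only a minimal user-mode illustration, not the driver's actual code: the structure and field names (shared_ring_t, front_ring_t, rsp_prod, rsp_event, rsp_cons) are assumptions loosely modelled on Xen ring conventions, and mb() is mapped to a C11 fence so the sketch builds outside the kernel. In the driver it would be KeMemoryBarrier().

```c
#include <stdatomic.h>
#include <stdint.h>

/* User-mode stand-in (assumption); in the driver this is KeMemoryBarrier(). */
#define mb() atomic_thread_fence(memory_order_seq_cst)

/* Hypothetical shared page: the backend advances rsp_prod, the frontend
   writes rsp_event to ask for a notification. */
typedef struct {
    volatile uint32_t rsp_prod;   /* written by the backend */
    volatile uint32_t rsp_event;  /* written by the frontend */
} shared_ring_t;

typedef struct {
    shared_ring_t *sring;
    uint32_t rsp_cons;            /* private consumer index */
} front_ring_t;

static int has_unconsumed(const front_ring_t *r)
{
    return r->sring->rsp_prod != r->rsp_cons;
}

/* Returns nonzero if the caller must go around the processing loop again. */
static int final_check_for_responses(front_ring_t *r)
{
    if (has_unconsumed(r))                  /* step 1: anything new? */
        return 1;
    r->sring->rsp_event = r->rsp_cons + 1;  /* step 2: request a notification */
    mb();                                   /* the barrier that was missing */
    return has_unconsumed(r);               /* step 3: check again */
}
```

Without the barrier, the store in step 2 and the load in step 3 may be reordered, so a response written in that window can be missed with no notification to follow.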

AFAICS, the only practical use of KeMemoryBarrier() is for processing lock free data structures. I guess if I used InterlockedXxx() calls then they wouldn’t be needed, but I’m communicating with another VM which doesn’t use InterlockedXxx() calls, and I don’t think you can mix them. Also, the ring processing code is mostly macros from a Xen header file which include mb() and wmb() calls (which I macro’d to KeMemoryBarrier()), so I need to be consistent. I’m doing things a bit manually in this case to better optimise things - it’s a storport driver so I know exactly how many outstanding requests I have.
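
As an illustration of why the header macros carry wmb() calls, here is a minimal sketch of the producer side of such a ring. Everything named here (request_ring_t, push_request, RING_SIZE, the slot layout) is hypothetical, and wmb() is mapped to a C11 fence so it compiles in user mode; in the driver described above it was mapped to KeMemoryBarrier().

```c
#include <stdatomic.h>
#include <stdint.h>

/* User-mode stand-in (assumption); the driver maps this to KeMemoryBarrier(). */
#define wmb() atomic_thread_fence(memory_order_seq_cst)

#define RING_SIZE 8u  /* hypothetical; must be a power of two */

typedef struct {
    volatile uint32_t req_prod;         /* producer index read by the other side */
    volatile uint32_t slot[RING_SIZE];  /* hypothetical request payloads */
} request_ring_t;

/* Fill a slot, then publish it by advancing the shared producer index. */
static void push_request(request_ring_t *ring, uint32_t *req_prod_pvt,
                         uint32_t payload)
{
    ring->slot[*req_prod_pvt & (RING_SIZE - 1)] = payload;
    (*req_prod_pvt)++;
    wmb();  /* slot contents must be visible before the new index is */
    ring->req_prod = *req_prod_pvt;
}
```

The consumer on the other side relies on this ordering: once it sees the new index, the slot it refers to is already filled in.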

As for article writing - not something I’m really good at.

James

Thank you, Mr. Harper. Good, if narrow, use-case. It reassured me that there wasn’t something awful that I was missing ;-)

> As for article writing - not something I’m really good at.

Said he… AFTER writing five clear, concise paragraphs explaining the situation.

C’mon… an article in The NT Insider will make you rich and famous! An instant celebrity, not to mention indescribably appealing to members of the opposite sex. You’ll be greeted with flowers and sweets wherever you travel. Towns will be named in your honor. We’ve exceeded 60K subscribers world-wide, you know. :-)

Peter
OSR
@OSRDrivers

>not to mention indescribably appealing to members of the opposite sex

It worked out for me …

//Daniel