Storport and concurrent SRB's for same LBA

SUMMARY

In a Storport virtual miniport is it possible to have multiple concurrent SCSI SRB’s for the same LBA, perhaps via overlapping ranges? In particular is it possible to have multiple concurrent SCSI WRITE’s for the same LBA?

DETAILS

I have a Storport virtual miniport and I am trying to fully understand the Storport concurrency model so that I can correctly implement synchronization. My understanding is that in a virtual Storport no locks are taken for StartIo so any required synchronization has to be done in the miniport.

In my miniport I handle synchronization as it relates to my driver’s data structures. However a more subtle question is whether I need to handle synchronization for LBA ranges. Suppose that my miniport can handle 2 or more concurrent SRB’s for unrelated ranges without any synchronization (e.g. an SRB for LBA 42 and an SRB for LBA 1042). Suppose further that my miniport cannot handle 2 or more concurrent SRB’s for overlapping ranges without synchronization (e.g. an SRB for range 0-1 and an SRB for range 1-2).
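To make the overlapping-range scenario concrete, here is a minimal sketch in portable C of the kind of check a miniport could do if it wanted to serialize overlapping requests itself. The names `lba_range`, `in_flight_set`, and the fixed table size are hypothetical and are not part of Storport. A real driver would also have to protect the table with its own lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor for the blocks touched by one in-flight request. */
typedef struct {
    uint64_t start_lba;   /* first block of the request */
    uint32_t block_count; /* number of blocks, must be > 0 */
} lba_range;

/* Two half-open ranges [start, start + count) overlap if and only if each
 * one starts before the other ends. Overflow near UINT64_MAX is ignored. */
static bool lba_ranges_overlap(lba_range a, lba_range b)
{
    assert(a.block_count > 0 && b.block_count > 0);
    return a.start_lba < b.start_lba + b.block_count &&
           b.start_lba < a.start_lba + a.block_count;
}

#define MAX_IN_FLIGHT 64

/* Hypothetical table of ranges currently being processed. */
typedef struct {
    lba_range ranges[MAX_IN_FLIGHT];
    int count;
} in_flight_set;

/* Admit a request only if it overlaps nothing already in flight.
 * Returns false on conflict or when the table is full; the caller
 * would then defer the request and retry later. */
static bool in_flight_try_add(in_flight_set *set, lba_range r)
{
    for (int i = 0; i < set->count; i++) {
        if (lba_ranges_overlap(set->ranges[i], r))
            return false;
    }
    if (set->count == MAX_IN_FLIGHT)
        return false;
    set->ranges[set->count++] = r;
    return true;
}

/* Remove a completed request by swapping the last entry into its slot. */
static void in_flight_remove(in_flight_set *set, lba_range r)
{
    for (int i = 0; i < set->count; i++) {
        if (set->ranges[i].start_lba == r.start_lba &&
            set->ranges[i].block_count == r.block_count) {
            set->ranges[i] = set->ranges[--set->count];
            return;
        }
    }
}
```

With this sketch, the ranges 0-1 and 1-2 from the example above (start 0, count 2 and start 1, count 2) conflict, while single-block requests for LBA 42 and LBA 1042 are admitted together.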

My questions:

  • Does the system provide any guarantee (either explicit or implicit) that the scenario of multiple concurrent SRB’s for overlapping ranges cannot happen?
  • If it does not what is the expected behavior? (Whole range in SRB gets written atomically, single blocks in SRB get written atomically, anything goes?)

My expectation is that it would be counter-productive for file systems (or other system components) to issue such requests, but I do not know.

The system does not provide synchronization of SRBs. There is no implication that your driver should synchronize ranges; if that is needed, the upper levels should deal with it.

Don Burn
Windows Driver Consulting
Website: http://www.windrvr.com

Don, thanks for your answer.

That would be my expectation as well, thanks for confirming it.

What Don said. Multiple concurrent operations to overlapping LBAs are a programming error, because they yield undefined results. Shit, we don’t really know the order that operations are completed in once we give them to the controller… right?

As I was typing that, I recalled a dim memory of some discussion with members of the file system team about never reordering across flush operations.

But I think the point is not to worry about it, cuz it’s so not your problem in your StorPort driver. Requests come in, you serve them up to your adapter, where they get completed… and the world turns.

Peter

Peter, thanks. What you said makes perfect sense.

Regarding the reordering part of your message: My miniport completes SRB’s by posting them into a queue where they get retrieved by a user mode process. They are then processed in user mode and completed in one of the process threads.

So it is actually conceivable that if a WRITE arrives and then a SYNCHRONIZE CACHE arrives before the WRITE completes, they might get reordered. (E.g. the WRITE gets retrieved by Thread A and then the SYNCHRONIZE CACHE gets retrieved by Thread B, and for whatever reason Thread B gets to do its processing before Thread A.)

I would expect that this should not happen because the file system should wait for all WRITE’s to complete before issuing any FLUSH, but again I do not know if it does. Any idea on this?
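If one did want the user mode side to hold a SYNCHRONIZE CACHE back until every WRITE that arrived before it has finished, the bookkeeping could look roughly like the sketch below. It is portable C with hypothetical names (`write_tracker` and friends), assumes a small bounded number of outstanding WRITEs, and leaves out the locking that multiple worker threads would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_OUTSTANDING 64

/* Hypothetical tracker: every WRITE gets a sequence number on arrival and
 * stays in the table until it completes. */
typedef struct {
    uint64_t next_seq;                      /* sequence for the next WRITE */
    uint64_t outstanding[MAX_OUTSTANDING];  /* WRITEs not yet completed */
    int count;
} write_tracker;

/* Record an arriving WRITE and return its sequence number. */
static uint64_t tracker_write_arrived(write_tracker *t)
{
    assert(t->count < MAX_OUTSTANDING);
    uint64_t seq = t->next_seq++;
    t->outstanding[t->count++] = seq;
    return seq;
}

/* Record completion of a WRITE; completions may come in any order. */
static void tracker_write_completed(write_tracker *t, uint64_t seq)
{
    for (int i = 0; i < t->count; i++) {
        if (t->outstanding[i] == seq) {
            t->outstanding[i] = t->outstanding[--t->count];
            return;
        }
    }
}

/* Record an arriving flush. Every WRITE with a sequence number below the
 * returned barrier arrived before this flush. */
static uint64_t tracker_flush_arrived(const write_tracker *t)
{
    return t->next_seq;
}

/* The flush may run once no WRITE older than its barrier is outstanding.
 * WRITEs that arrived after the flush do not hold it back. */
static bool tracker_flush_may_run(const write_tracker *t, uint64_t barrier)
{
    for (int i = 0; i < t->count; i++) {
        if (t->outstanding[i] < barrier)
            return false;
    }
    return true;
}
```

Per-WRITE sequence numbers are used instead of a simple completed count because completions can arrive out of order, so a count alone cannot tell whether the WRITEs that preceded the flush are the ones that finished.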

EDIT: Apologies if I am being too paranoid with all this.

Requests need to be completed in the order they arrive, or as allowed by SRB.QueueAction. Otherwise data corruption and other problems will occur if you reorder writes or even reads. For instance, some people write code that sends a string of read commands and only wait on the last one assuming the previous ones are also done and they won’t be if you reordered them. The dangers of reordering writes (unless allowed by QueueAction) should be obvious.
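For readers unfamiliar with the fields mentioned here, the following portable C sketch shows how a driver could read the ordering hint carried by an SRB. The struct is a stand-in with only the two relevant fields, and the `EX_` constants mirror what I believe are the values in srb.h (they match the SCSI queue tag message codes); treat both as assumptions and check the real header. The sketch only classifies the hint and takes no position on what a miniport must do with it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, mirroring srb.h; verify against the real header. */
#define EX_SRB_FLAGS_QUEUE_ACTION_ENABLE  0x00000002u
#define EX_SRB_SIMPLE_TAG_REQUEST         0x20
#define EX_SRB_HEAD_OF_QUEUE_TAG_REQUEST  0x21
#define EX_SRB_ORDERED_QUEUE_TAG_REQUEST  0x22

/* Stand-in for SCSI_REQUEST_BLOCK with only the fields used here. */
typedef struct {
    uint32_t SrbFlags;
    uint8_t  QueueAction;
} example_srb;

/* Hypothetical classification of the ordering hint. */
typedef enum {
    ORDER_UNSPECIFIED,   /* QueueAction is not valid for this request */
    ORDER_SIMPLE,        /* may be reordered relative to other requests */
    ORDER_HEAD_OF_QUEUE, /* should be processed ahead of queued requests */
    ORDER_ORDERED        /* after all earlier and before all later requests */
} order_hint;

/* QueueAction is only meaningful when the enable flag is set. */
static order_hint classify_order(const example_srb *srb)
{
    assert(srb != NULL);
    if ((srb->SrbFlags & EX_SRB_FLAGS_QUEUE_ACTION_ENABLE) == 0)
        return ORDER_UNSPECIFIED;

    switch (srb->QueueAction) {
    case EX_SRB_SIMPLE_TAG_REQUEST:        return ORDER_SIMPLE;
    case EX_SRB_HEAD_OF_QUEUE_TAG_REQUEST: return ORDER_HEAD_OF_QUEUE;
    case EX_SRB_ORDERED_QUEUE_TAG_REQUEST: return ORDER_ORDERED;
    default:                               return ORDER_UNSPECIFIED;
    }
}
```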

“Requests need to be completed in the order they arrive”

That is architecturally incorrect in terms of Windows.

“some people write code that sends a string of read commands and only wait on the last one assuming the previous ones are also done”

This is a programming logic error. There is absolutely, positively no guarantee of the order of such operations on Windows.

Peter

As Peter notes this is architecturally incorrect for Windows, and for that matter most operating systems. There has been hardware out there for a long time that ignores that concept (I first programmed one 25 years ago). This is never something you should assume.

Don Burn

“I would expect that this should not happen because the file system should wait for all WRITE’s to complete before issuing any FLUSH, but again I do not know if it does. Any idea on this?”

FastFat appears to (indirectly) use the FUA bit on SCSI WRITE’s, so the issue of ordering with SYNCHRONIZE CACHE does not exist.

For example, for IRP_MJ_FLUSH_BUFFERS and flushing a file:

  • FatCommonFlushBuffers calls FatFlushFile for the file (and possibly its ancestors).
  • FatFlushFile calls CcFlushCache, which is a blocking call.
  • CcFlushCache issues recursive IRP_MJ_WRITE’s with (IRP_PAGING_IO | IRP_NOCACHE).
  • FatCommonWrite detects that this is a flush operation and sets IRP_CONTEXT_FLAG_WRITE_THROUGH, which instructs the lower layers to use write-through I/O.
  • The lower layers set SL_WRITE_THROUGH on all WRITE IRP’s.
  • Classpnp checks whether SL_WRITE_THROUGH is set and sets the FUA bit accordingly in SetupReadWriteTransferPacket.
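The last step of the chain above ends with the FUA bit in the WRITE CDB, which is what a virtual miniport actually sees. As a rough illustration, the portable C sketch below builds a WRITE(10) CDB and tests the bit. The opcode (0x2A) and the FUA position (byte 1, bit 3) come from the SCSI block command set; the function names and the `write_through` parameter are hypothetical and this is not the Classpnp code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SCSI_OP_WRITE10  0x2A
#define WRITE10_FUA_BIT  0x08  /* byte 1, bit 3: Force Unit Access */

/* Build a 10-byte WRITE(10) CDB. The LBA and transfer length are stored
 * big-endian. write_through stands in for the write-through request that
 * the upper layers pass down. */
static void build_write10_cdb(uint8_t cdb[10], uint32_t lba,
                              uint16_t blocks, bool write_through)
{
    assert(cdb != NULL);
    memset(cdb, 0, 10);
    cdb[0] = SCSI_OP_WRITE10;
    if (write_through)
        cdb[1] |= WRITE10_FUA_BIT;
    cdb[2] = (uint8_t)(lba >> 24);
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(blocks >> 8);
    cdb[8] = (uint8_t)blocks;
}

/* What a miniport could check on an incoming CDB before deciding whether
 * the data must reach stable storage before the SRB is completed. */
static bool write10_has_fua(const uint8_t cdb[10])
{
    return cdb[0] == SCSI_OP_WRITE10 && (cdb[1] & WRITE10_FUA_BIT) != 0;
}
```

A miniport that honors FUA on each WRITE does not depend on a later SYNCHRONIZE CACHE being ordered after that WRITE, which is the point of the observation above.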

There are a couple of cases where FastFat explicitly flushes the underlying device (see FatHijackIrpAndFlushDevice), but these do not appear to have ordering issues with WRITE’s either.

My conclusion from this investigation and prior comments from Don and Peter is that I do not need to care about such issues in my miniport.

Over the years a lot of smart people have had their drivers break because they didn’t follow clearly documented rules. Instead, they knew so much about the kernel that they just thought it through and made their own set of fragile assumptions with no promises backing them. These smart folks rubbed elbows with Microsoft developers, which fueled their confidence while they dangerously ignored the specifications. They then tried their theories in code and, alas, it worked on their system, so their theory was now “validated”. But then something changed and their code blew up, while other people’s drivers did not.

This is hacking, not good engineering. If you are shipping code with methodologies like “I know the kernel well enough to say this will work” or “I tried it and it worked”, or do things like toss SRB_ORDERED_QUEUE_TAG_REQUEST in the garbage because you “know better”, then kindly let us know the list of driver names so we never install those on our systems. Always follow the complete set of specifications rather than cut corners or try to outsmart them. Otherwise your code will break. It’s not if, but when.

@Rourke

My understanding is that the SCSI interface is used in 2 different ways in the Windows OS:

  • As a communications mechanism with a SCSI device.
  • As an internal communications mechanism between OS components.

What is being discussed here is the second case.

For this case I believe there is a lot of evidence that order of operations does not matter. As the best such evidence I present Microsoft’s own WDKStorPortVirtualMiniport sample. This sample is Copyright (c) Microsoft Corporation and presumably how Microsoft intends for us to write such miniports. The sample can be downloaded from https://code.msdn.microsoft.com/windowshardware/WDKStorPortVirtualMiniport-973650f6

Looking at the source code file scsi.c, line 641, we find function ScsiReadWriteSetup. This function handles READ and WRITE SRB’s by queueing work items using function IoQueueWorkItem. The actual I/O is performed in function MpWkRtn, which runs in a system thread.

It appears then that virtual miniports are allowed to reorder operations. I also note that the sample never touches the QueueTag or QueueAction SRB fields.

Mr. Rourke… I agree wholeheartedly with your sentiments. In fact, you’re repeating almost exactly what I’ve written here (and in The NT Insider, and in numerous official Microsoft white papers) many times. I couldn’t agree more.

You seem to think that Mr. Burn and I are making statements about implementation, when in fact, we are not. I didn’t say “this is how the implementation works” or “from my reading of the Windows source code, you should do this” – I said “this is how the Windows architecture is defined.” There’s a pretty big difference.

With all due respect, you seem to misunderstand the use of SRB_ORDERED_QUEUE_TAG_REQUEST. My statement about Windows architecture is an invariant dating from well before there WAS an SRB_ORDERED_QUEUE_TAG_REQUEST. It has always been true. And, as Mr. Zissimopoulos reminded us… how about all those drivers that don’t even LOOK at QueueAction? Or what are the constraints when the SRB_FLAGS_QUEUE_ACTION_ENABLE bit is not set?

There is absolutely, positively nothing, in the documentation or anywhere else, that allows an application to issue a series of reads and wait for the one it sent “last”. Given the way file systems and multi-threading work in Windows, there’s no way this can be guaranteed to work. Think for a minute about this: Where is the order defined? In other words, if I have a process that sends a series of reads from a single thread… at what point is the order established? If the process is multi-threaded, where is the order established? Who is responsible for establishing the “sequence number” of the write… as it were? And at what layer? At the file system? The cache manager? The volume manager? The disk class driver? At the Adapter driver, perhaps?

And how are I/Os from different threads and different processes ordered? If process A sends a read for LBN X, and process B sends a write for that same LBN X… what dictates which of these is “first”? Wall clock time on one processor or another?

Suppose a user does an asynchronous read from logical offset 1,000,000 in a file, and follows this with an async read from offset 0 in the same thread. Do you think there’s an implication that one of these will be completed before the other? Because the file system imposes no such ordering itself.

Regardless of what we do at the adapter, as Mr. Burn I believe noted, there’s no guarantee of write ordering WITHIN THE DEVICE itself.

Your approach isn’t harmful… it’s just needlessly constraining.

Peter

Hi,
Can anyone re-upload the sample from https://code.msdn.microsoft.com/windowshardware/WDKStorPortVirtualMiniport-973650f6 ? Nothing is found at that link…

You are posting to a thread that is more than two years old.

As you undoubtedly know from the forum guidelines, this is not allowed.

If you have a question, start a new thread.

This thread is locked.

Peter