Short reads on FSDs and disks

On Linux, a read() call on an on-disk file, or on a DASD disk, can sometimes return less data than requested.

Is this ever possible on Windows (on FSDs, EOF condition aside, and on DASD disks)?

I have never seen such a thing on Windows in years, from either user mode or kernel mode.

Has anybody seen this on FSD-driven disk files or on DASD disks?

Surely (Nt)ReadFile can return less data on pipe/FIFO-style things, TCP sockets, custom drivers of any kind… but what about FSDs and DASD disks?

Also, if this is not possible, is there any official documentation? The ReadFile MSDN page does not state this explicitly; instead, it lists the cases in which a short read can occur.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

It is possible but unlikely. The disk firmware could complete the read short without any error, but an error is much, much more likely.


From: Maxim S. Shatskih
Sent: 1/11/2015 9:57 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Short reads on FSDs and disks


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

So the Windows kernel software will not do this on its own, unless the disk firmware does?
“Peter Wieland” wrote in message news:xxxxx@ntdev…
It is possible but unlikely. The disk firmware could complete the read short without any error, but an error is much, much more likely.


If you include all the layers I’m less positive.

I suspect the FSD layer will. If you try to read 1MB out of a 1KB file, you’ll only get back 1KB. I don’t recall if you get an error in that case, but I don’t think so. The RAW file system might similarly protect you for reads past the end of a partition, or the end of the disk.
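
The end-of-file case described above is easy to check from user mode. The following is a minimal sketch, not a statement about any particular file system driver: it creates a temporary file, asks for far more bytes than the file holds, and reports how many came back. The helper name read_past_eof is hypothetical; os.read is used because it is a thin, unbuffered wrapper over the platform's read call.

```python
import os
import tempfile

def read_past_eof(file_size: int, request_size: int) -> int:
    """Create a file of file_size bytes, request request_size bytes, return the count read.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\xAA" * file_size)      # fill the file
        os.lseek(fd, 0, os.SEEK_SET)           # rewind to the start
        data = os.read(fd, request_size)       # a single unbuffered read call
        return len(data)                       # short count at EOF, no exception
    finally:
        os.close(fd)
        os.remove(path)
```

Calling read_past_eof(1024, 1024 * 1024) returns 1024: the read completes with a short count rather than an error.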

I don’t know about the partition or disk layer. I think that it will fail a read past the end of a partition, but I don’t recall for sure.

Once it gets below those drivers it’s up to the disk to report an error if the LBA being requested is bad.

-p

From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Maxim S. Shatskih
Sent: Monday, January 12, 2015 9:12 AM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Short reads on FSDs and disks


I don’t think disk+classpnp is able to handle partial completion (underrun) of DASD I/O.

For large operations split into multiple XRBs, the code in the DDK doesn’t check whether each of those partial transfers completed fully. If an XRB in the middle had an underrun, it would leave a hole in the transferred data, and the total transfer length would then not cover part of the modified buffer…
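
To make the "hole" concrete, here is a toy model of a transfer split into fixed-size fragments. It is not classpnp code, and every name in it (split_transfer, the underrun map) is hypothetical. It assumes the total length is computed by summing the per-fragment counts without checking each fragment, which is one reading of the behavior described above.

```python
def split_transfer(source: bytes, buffer: bytearray,
                   fragment_size: int, underrun: dict) -> int:
    """Copy source into buffer fragment by fragment.

    underrun maps a fragment index to the number of bytes that fragment
    actually moved. The return value is the sum of per-fragment counts,
    with no check that each fragment completed fully (toy model only).
    """
    total = 0
    for index, offset in enumerate(range(0, len(source), fragment_size)):
        wanted = min(fragment_size, len(source) - offset)
        moved = min(wanted, underrun.get(index, wanted))
        # Each fragment lands at its own offset, even if an earlier one fell short.
        buffer[offset:offset + moved] = source[offset:offset + moved]
        total += moved
    return total
```

With a 12-byte source of b"D" bytes, a buffer pre-filled with b".", a fragment size of 4, and underrun {1: 1}, the buffer ends up as b"DDDDD...DDDD" while the returned total is 9. The first 9 bytes include 3 stale bytes, and the last 3 modified bytes lie outside the reported length.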

(thanks to Max for the most interesting question in weeks)

You’d think all of us big experts would know the answer to this simple question right off the top of our heads, wouldn’t you? LOL…

Suppose you try to read beyond the disk’s capacity? I mean, who checks that? Does the request get to the controller?

I just don’t remember. Back in the day, I *seem* to remember that disk or partition checked to see if you attempted to read past the end of the current partition. But that code has definitely changed since the last time I paid any attention to it.

Peter
OSR
@OSRDrivers

I don’t know the exact answer, but do know something “not quite right”
happens if the partition table does not match the reality of the disk,
like the size of a partition is larger than the actual disk. The user
perceived behavior is the system gets “stuck” for 20-60 seconds
occasionally. Disk I/O errors are also written to the system event log. I
assume this is the file system trying to read/write blocks that don’t
exist. I believe I saw this on Win 7 and Server 2012 R2.

You would think some layer in the storage stack checks the requests
against the actual disk sizes, but my experience seemed to say this was
not the case. Perhaps like when a disk comes online and the partition
table is first read, it would be nice if it logged some system event
saying “Umm, you know, your partition tables seem to not be correct, did
you incorrectly clone a disk?”. Or perhaps when a file system is mounted
would be the appropriate time to validate things.

I saw this happening at one company, on MANY systems, and believe they
were installing OS images by making block level images of a physical
source disk, and applying those images to a target disk that was smaller
in capacity. I wrote a little PowerShell script that queried the physical
disk size, and then compared it to the partition offsets and sizes, with
WMI calls. The machines that reliably got the strange stall and system
event log messages were exactly the systems that had partitions larger
than the actual disk.
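
The comparison that script performed can be sketched as follows. This is only the arithmetic, written under the assumption that the disk size and the partition offsets and sizes (all in bytes) have already been obtained by some other means; it does not query WMI, and the Partition type and function name are hypothetical.

```python
from typing import List, NamedTuple

class Partition(NamedTuple):
    name: str
    offset: int   # starting byte offset on the disk
    size: int     # length in bytes

def partitions_past_end(disk_size: int, partitions: List[Partition]) -> List[str]:
    """Return the names of partitions whose last byte lies beyond the physical disk."""
    return [p.name for p in partitions if p.offset + p.size > disk_size]
```

For example, on a disk of 1000 bytes, a partition at offset 100 with size 900 fits exactly, while one at offset 100 with size 901 is reported as running off the end.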

I personally view this as an OS bug (and a bug in whatever process was
being used to image systems), although perhaps Microsoft views it as
important to maintain backward bug compatibility, and there is some reason
having partition tables that run off the end of a disk is appropriate (so
it is a feature, not a bug). I initially thought the event log errors meant a
failing disk, but after deeper investigation, the reality was rather
uglier.

Jan

On 1/13/15, 2:11 AM, “xxxxx@osr.com” wrote:


> I suspect the FSD layer will. If you try to read 1MB out of a 1KB file, you’ll only get back 1KB.

Certainly, but I was not asking about EOF. With EOF, everything is obvious.

> I don’t recall if you get an error in that case, but I don’t think so.

You will not. Just a short read.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> You’d think all of us big experts

No, I think the Microsoft people who know the real truth are here and can respond.

> would know the answer to this simple question

Oh yes. At work, I have a case where my code, ported to Linux, had a bug that was definitely due to a short read (my logic treated it as EOF or similar).

The Windows version of this code, which is about 10 years old now, has never had such a bug.

Perhaps it can still have this bug, and I (along with our QA/Support and the customers) have just been lucky enough not to hit it?

> Suppose you try to read beyond the disk’s capacity?

No, I don’t mean EOF conditions, which are more or less obvious.

I mean that reading in the middle of a large file, far from EOF, can suddenly return a short read on Linux.

Moreover, Linux web resources say that yes, Linux does short reads, and you must be prepared for them.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> perceived behavior is the system gets “stuck” for 20-60 seconds
> occasionally

Yes, I also saw this.

To me, such a condition is simply a ruined disk, which must be fixed.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

WHAT? That strikes me as strange.

So, if you’re sequentially reading through a file and you are returned less data than you asked for, but you don’t get an “end of file” error, you just keep reading until you get zero bytes and an end of file error??

That seems… ah… unusual. But if those are the rules, I guess it’s fine…

Peter
OSR
@OSRDrivers

> So, if you’re sequentially reading through a file and you are returned less data than you asked for,
> but you don’t get an “end of file” error, you just keep reading until you get zero bytes and an end of
> file error??

Yes. On POSIX, yes.

And there are web resources that warn developers about this.

My Linux bug was this: I have some “chunk headers” inside the file.

If EOF hits in the middle of the Nth chunk header, truncating the header, then the file is corrupt.

My code just read ChunkHeaderSize bytes and failed on any short read, reporting a corrupt file.

This seems (I’m now not sure even about this!) to be correct on Windows.

But on Linux, the OS can return a short read on my chunk header read, and then a valid file is considered broken. Moreover, this occurs only sometimes :-)

All of this is related to Linux signals in some way. A signal can cause a short read.
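
One common way to avoid this class of bug is to loop until the requested byte count arrives and to treat only a zero-byte read as end of file. The following is a minimal sketch of that pattern, not the code discussed in this thread. The names read_exact and TruncatedFile are hypothetical, and the read parameter is an assumption added so that the loop can be exercised with a stand-in for os.read.

```python
import os

class TruncatedFile(Exception):
    """EOF arrived in the middle of a record (hypothetical exception type)."""

def read_exact(fd: int, size: int, read=os.read) -> bytes:
    """Read exactly size bytes, retrying after short reads.

    Returns b"" on a clean EOF (no bytes at all before EOF) and raises
    TruncatedFile if EOF arrives after only part of the record was read.
    """
    chunks = []
    remaining = size
    while remaining > 0:
        data = read(fd, remaining)
        if not data:               # zero bytes: genuine end of file
            break
        chunks.append(data)        # a short read is not an error, keep going
        remaining -= len(data)
    got = b"".join(chunks)
    if got and len(got) < size:
        raise TruncatedFile(f"wanted {size} bytes, file ended after {len(got)}")
    return got
```

A chunk-header reader built on this would call read_exact(fd, ChunkHeaderSize) and report corruption only when TruncatedFile is raised, so a short read in the middle of a valid file no longer looks like a truncated header.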


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

IIRC I tried introducing that check at one point and it broke some set of systems which were already in market. I can’t remember the details though.

-p

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Jan Bottorff
Sent: Monday, January 12, 2015 8:33 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Short reads on FSDs and disks


On 13-Jan-2015 21:19, Peter Wieland wrote:

IIRC I tried introducing that check at one point and it broke some set of systems which were already in market. I can’t remember the details though.

Was this related to “magic sectors” beyond the end of physical disk?
IIRC disk drivers deliberately allow addressing past the disk size
because some devices used to have proprietary commands via such “magic
sectors”.

– pa

Linux block device layers are sophisticated. Windows not so much. Windows
does no real optimization in the drivers. It assumes the device will handle any
optimizations (like CSCAN). Linux block layers implement merging, sorting,
and coalescing of IO in the device queue. These sorts of optimizations can
result in short reads.

It is also a good idea to be able to handle short reads in Windows code
even if you have never seen it. It makes your code more portable and the
APIs do account for short reads.

On Tue, Jan 13, 2015 at 3:39 PM, Pavel A. wrote:



Jamey Kirby
Disrupting the establishment since 1964

This is a personal email account and as such, emails are not subject to
archiving. Nothing else really matters.

I do understand that backward compatibility is really important, but so is
the CORRECT operation of systems. Would writing a warning into the system
event log if the partition table was inconsistent with the physical disk
be viewed as breaking backward compatibility?

Jan

On 1/13/15, 7:19 PM, “Peter Wieland” wrote:


[Oh, I feel it coming]

That should have read “sophisticated”… with the quotes. I would have said “outdated and meddlesome” myself.

And I’m not trying to be cute, or take a random dump on Linux here.

But we’ve worked very closely with a *lot* of storage vendors and they all, unanimously, want the OS to do as little pre-write “optimization” as possible.

The type of I/O scheduling that I understand Linux does is based upon some really ancient assumptions. *I* did that sort of coalescing, next sector first, elevator service, nearest sector first with a fairness count… heck, back in the days of the PDP-11. I believe that, and ST-506 disks on PCs, was the last time this type of optimization made real sense.

These days, there’s darn little that you can count on in terms of disk layout. It’s better to just jam as many requests down to the disk’s control logic as possible (hundreds of simultaneous operations is great) and let the disk figure out what’s best for it based on what it knows about the media.

If you haven’t read it (it’s several years old) and didn’t see it in our pre-Christmas Tweet, anyone interested in this topic should check out the paper entitled “Why Disks Are Like Snowflakes” (http://www.pdl.cmu.edu/PDL-FTP/Storage/CMU-PDL-11-102.pdf).

I’d be curious if anybody knows why this type of optimization remains in Linux. I know they’re not reluctant to change stuff that’s outdated, and that probably means they think this type of optimization is “worth it”… But I’d like to hear what the current argument is.

Peter
OSR
@OSRDrivers

Yes Peter, sophisticated was probably the wrong word to use; more like
complex and burdensome. I was being nice :-)

On Tue, Jan 13, 2015 at 5:48 PM, wrote:



Jamey Kirby
Disrupting the establishment since 1964


Sure. But my part of this was a way-back-when discussion, so unless you have a wayback machine I can use to send 1999 me a note*, it’s all academic. I’m sorry that I didn’t figure out a way to give you a warning.

I would be interested in knowing what error your disks are giving you back when you read past the end of them. It should be something that doesn’t get retried or require any sort of on-disk delay, but they could be doing something really silly and reporting a recoverable error. That would trigger retries.

-p

(* to be clear, 1999 is a guess. Don’t target your wayback machine there without chatting with me first. Also we should include some lottery numbers in the note. 2015 me just bought a new house)

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Jan Bottorff
Sent: Tuesday, January 13, 2015 2:13 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Short reads on FSDs and disks
