Sure. But my part of this was a way-back-when discussion, so unless you have a wayback machine I can use to send 1999 me a note*, it’s all academic. I’m sorry that I didn’t figure out a way to give you a warning.
I would be interested in knowing what error your disks are giving you back when you read past the end of them. It should be something that doesn’t get retried or require any sort of on-disk delay, but they could be doing something really silly and reporting a recoverable error. That would trigger retries.
-p
(* to be clear, 1999 is a guess. Don’t target your wayback machine there without chatting with me first. Also we should include some lottery numbers in the note. 2015 me just bought a new house)
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Jan Bottorff
Sent: Tuesday, January 13, 2015 2:13 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Short reads on FSDs and disks
I do understand that backward compatibility is really important, but so is the CORRECT operation of systems. Would writing a warning into the system event log if the partition table was inconsistent with the physical disk be viewed as breaking backward compatibility?
Jan
On 1/13/15, 7:19 PM, “Peter Wieland” wrote:
>IIRC I tried introducing that check at one point and it broke some set
>of systems which were already in market. I can’t remember the details
>though.
>
>-p
>
>
>-----Original Message-----
>From: xxxxx@lists.osr.com
>[mailto:xxxxx@lists.osr.com] On Behalf Of Jan Bottorff
>Sent: Monday, January 12, 2015 8:33 PM
>To: Windows System Software Devs Interest List
>Subject: Re: [ntdev] Short reads on FSDs and disks
>
>I don’t know the exact answer, but do know something “not quite right”
>happens if the partition table does not match the reality of the disk,
>like the size of a partition is larger than the actual disk. The user
>perceived behavior is the system gets “stuck” for 20-60 seconds
>occasionally. Disk I/O errors are also written to the system event log.
>I assume this is the file system trying to read/write blocks that don’t
>exist. I believe I saw this on Win 7 and Server 2012 R2.
>
>You would think some layer in the storage stack checks the requests
>against the actual disk sizes, but my experience seemed to say this was
>not the case. Perhaps when a disk comes online and the partition
>table is first read, it would be nice if it logged some system event
>saying “Umm, you know, your partition tables seem to not be correct,
>did you incorrectly clone a disk?”. Or perhaps when a file system is
>mounted would be the appropriate time to validate things.
>
>I saw this happening at one company, on MANY systems, and believe they
>were installing OS images by making block level images of a physical
>source disk, and applying those images to a target disk that was
>smaller in capacity. I wrote a little powershell script that queried
>the physical disk size, and then compared it to the partition offsets
>and sizes, with WMI calls. The machines that reliably got the strange
>stall and system event log messages were exactly the systems that had
>partitions larger than the actual disk.
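The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the original PowerShell script (which is not shown in this thread): the `Partition` type, the `partitions_past_end` helper, and the sample numbers are all hypothetical. On Windows the inputs would presumably come from WMI, for example `Win32_DiskDrive.Size` and the `StartingOffset` and `Size` properties of `Win32_DiskPartition`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    """Hypothetical record for one partition table entry."""
    name: str
    offset: int  # starting byte offset on the disk
    size: int    # partition length in bytes


def partitions_past_end(disk_size: int, partitions: List[Partition]) -> List[Partition]:
    """Return the partitions that extend beyond the physical disk.

    A partition that ends exactly at the last byte of the disk is fine;
    only offset + size > disk_size is reported.
    """
    return [p for p in partitions if p.offset + p.size > disk_size]


if __name__ == "__main__":
    # Made-up values: an image from a larger source disk applied to a 250 GB target.
    disk_size = 250 * 10**9
    parts = [
        Partition("system", 1_048_576, 104_857_600),
        Partition("data", 105_906_176, 300 * 10**9),
    ]
    for p in partitions_past_end(disk_size, parts):
        overrun = p.offset + p.size - disk_size
        print(f"{p.name}: runs {overrun} bytes past the end of the disk")
```

Keeping the check as a pure function over numbers means the same logic applies no matter how the sizes are collected on a given machine.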
>
>I personally view this as an OS bug (and a bug in whatever process was
>being used to image systems), although perhaps Microsoft views it as
>important to maintain backward bug compatibility, and there is some
>reason having partition tables that run off the end of a disk is
>appropriate (so it is a feature, not a bug). I initially thought the event
>log errors meant a failing disk, but then after deeper investigation,
>the reality was rather uglier.
>
>Jan
>
>
>
>On 1/13/15, 2:11 AM, “xxxxx@osr.com” wrote:
>
>>(thanks to Max for the most interesting question in weeks)
>>
>>You’d think all of us big experts would know the answer to this simple
>>question right off the top of our heads, wouldn’t you? LOL…
>>
>>Suppose you try to read beyond the disk’s capacity? I mean, who checks
>>that? Does the request get to the controller?
>>
>>I just don’t remember. Back in the day, I seem to remember that disk
>>or partition checked to see if you attempted to read past the end of the
>>current partition. But that code has definitely changed since the last
>>time I paid any attention to it.
>>
>>Peter
>>OSR
>>@OSRDrivers
>
>—
>NTDEV is sponsored by OSR
>
>Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
>
>OSR is HIRING!! See http://www.osr.com/careers
>
>For our schedule of WDF, WDM, debugging and other seminars visit:
>http://www.osr.com/seminars
>
>To unsubscribe, visit the List Server section of OSR Online at
>http://www.osronline.com/page.cfm?name=ListServer