Hi,
I am hitting a problem with some compressed files on a particular machine. The compressed file size is shown as 140 KB, but the ValidDataLength field is only 64 KB. The file's attribute record header shows that an additional 29 clusters are allocated to the file beyond the 19 sparse clusters. When I read the corresponding clusters from the disk directly, they contain junk data; but when NTFS reads the file, it returns zeros, since it treats all clusters beyond the ValidDataLength as zeroed. As far as I understand, the ValidDataLength field should be the same as the FileSize for sparse/compressed files. Is that true? Can the ValidDataLength be less than the actual FileSize for compressed files? If so, could you please tell me in which scenarios this can happen?
Thanks,
Manorama
ValidDataLength and FileSize can be different for any reason, at the
filesystem's discretion. Compressed files are not immune from this
statement.
To demonstrate this, I wrote a simple app:
HANDLE hFile;

hFile = CreateFile( argv[1], GENERIC_ALL, 0, NULL, OPEN_EXISTING,
                    FILE_ATTRIBUTE_NORMAL, NULL );

// Move the file pointer past the last written byte, then set EOF
// there -- this extends FileSize without writing any data.
SetFilePointer( hFile, 0x80000, NULL, FILE_BEGIN );
SetEndOfFile( hFile );
CloseHandle( hFile );
…which results in…
+0x010 AllocationSize : _LARGE_INTEGER 0x80000
+0x018 FileSize : _LARGE_INTEGER 0x80000
+0x020 ValidDataLength : _LARGE_INTEGER 0x2
(0x2 is how many bytes were in the file when I ran my test app.)
(and yes, this file is compressed.)
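To see the behavior you describe, here is a minimal read-back sketch (not part of my test above; it assumes the 0x80000 EOF set by the app, and error handling is mostly omitted). Reading anywhere between ValidDataLength and FileSize through NTFS comes back as zeros, whatever the underlying clusters happen to contain:

#include <windows.h>
#include <stdio.h>

/* Sketch: read back the last 512 bytes of the region extended
   above. NTFS returns zeros for everything between
   ValidDataLength and FileSize, regardless of what the on-disk
   clusters actually contain. */
int main( int argc, char *argv[] )
{
    HANDLE hFile = CreateFile( argv[1], GENERIC_READ, 0, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
    BYTE   buf[512];
    DWORD  bytesRead, i;

    if ( hFile == INVALID_HANDLE_VALUE )
        return 1;

    /* 0x7FE00 = 512 bytes before the 0x80000 EOF set earlier */
    SetFilePointer( hFile, 0x7FE00, NULL, FILE_BEGIN );
    ReadFile( hFile, buf, sizeof( buf ), &bytesRead, NULL );

    for ( i = 0; i < bytesRead; i++ )
        if ( buf[i] != 0 )
            printf( "non-zero byte at offset %lu\n", i );

    CloseHandle( hFile );
    return 0;
}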
Generally, though, I would strongly discourage anyone from trying to
interpret ValidDataLength. It is a filesystem optimization. It is
deliberately not exposed via any query interface, and was left in the
common FCB header effectively by mistake. There is no guarantee that
data at offsets below ValidDataLength has actually been written to
disk, and the in-memory value is only part of the story; sparse and
compressed ranges also play a part in determining which data is valid.
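If what you actually need is to know which ranges of a file contain real data, the supported route is FSCTL_QUERY_ALLOCATED_RANGES rather than ValidDataLength. A minimal sketch follows (a fixed 16-range output buffer is an arbitrary choice here; a real caller would loop on ERROR_MORE_DATA, and a compressed-but-not-sparse file will typically come back as a single range):

#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

/* Sketch: enumerate the allocated ranges of a file via
   FSCTL_QUERY_ALLOCATED_RANGES -- the documented interface for
   learning where real data lives in a sparse file. */
int main( int argc, char *argv[] )
{
    HANDLE hFile = CreateFile( argv[1], GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING, 0, NULL );
    FILE_ALLOCATED_RANGE_BUFFER in = { 0 };
    FILE_ALLOCATED_RANGE_BUFFER out[16];
    LARGE_INTEGER fileSize;
    DWORD bytes, i;

    if ( hFile == INVALID_HANDLE_VALUE )
        return 1;

    /* Ask about the whole file, from offset 0 to EOF. */
    GetFileSizeEx( hFile, &fileSize );
    in.FileOffset.QuadPart = 0;
    in.Length.QuadPart = fileSize.QuadPart;

    if ( DeviceIoControl( hFile, FSCTL_QUERY_ALLOCATED_RANGES,
                          &in, sizeof( in ), out, sizeof( out ),
                          &bytes, NULL ) )
    {
        for ( i = 0; i < bytes / sizeof( out[0] ); i++ )
            printf( "data at offset %I64d, length %I64d\n",
                    out[i].FileOffset.QuadPart,
                    out[i].Length.QuadPart );
    }

    CloseHandle( hFile );
    return 0;
}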
–
This posting is provided “AS IS” with no warranties, and confers no rights