Object Id uniqueness

gtordai · February 23, 2019, 8:39pm

Hi,

According to the specifications, file object IDs generated by NTFS are only guaranteed to be unique on the volume. Is there any public information about how these IDs are generated? What is the chance of a collision across different computers, i.e. is there a real chance that two files on two different computers will have the same object ID?

Thank you,
Gabor

Peter_Viscarola_OSR · February 23, 2019, 9:17pm

is there a real chance that two files on two different computers will have the same object id

How much of a chance does it take for you to consider it a “real chance”?

Why not just take something unique on the machine and/or volume and use that along WITH the File ID as the guaranteed unique value?

“How much of a chance” isn’t engineering… it’s gambling.

Peter

gtordai · February 23, 2019, 9:47pm

Another question related to the first one: is it possible to look at an object ID and tell whether it was generated by NTFS (through CREATE_OR_GET_OBJECT_ID) or set explicitly by an app to a GUID produced by the OS's new-GUID call?

Peter_Viscarola_OSR · February 23, 2019, 11:55pm

gtordai · February 24, 2019, 10:25am

@“Peter_Viscarola_(OSR)” said:

is there a real chance that two files on two different computers will have the same object id

How much of a chance does it take for you to consider it a “real chance”?

Why not just take something unique on the machine and/or volume and use that along WITH the File ID as the guaranteed unique value?

“How much of a chance” isn’t engineering… it’s gambling.

Peter

Thank you, Peter, for your answer. What you said is true, and the optimal solution would be to use the object ID together with a unique volume ID. However, the external system uses GUIDs to identify files, and adding another key would mean changing a lot of code.
My thinking was that all GUID generation algorithms (other than those based on the MAC address), as well as hashing algorithms, have some chance of collision, and people still use them as unique keys because that chance is negligible.
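To put a rough number on "negligible", here is a minimal sketch of the birthday-bound approximation for randomly generated GUIDs. It assumes version 4 GUIDs with 122 random bits drawn uniformly. It says nothing about object IDs that are set explicitly or generated by some other scheme, and the function name is purely illustrative.

```python
import math


def collision_probability(n_ids: int, random_bits: int = 122) -> float:
    """Approximate chance that at least two of n_ids random IDs collide.

    Uses the birthday bound p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    Assumes every ID is drawn uniformly and independently.
    """
    space = 2.0 ** random_bits
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-n_ids * (n_ids - 1) / (2.0 * space))


if __name__ == "__main__":
    for n in (10**6, 10**9, 10**12):
        print(f"{n:>14} IDs -> p ~= {collision_probability(n):.3e}")
```

The bound only holds if the generator really is random. It does not cover deliberate reuse or copying of an ID.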
We are also in complete control of the environments, so if, for example, the underlying NTFS algorithm used volume names as part of the process, we could ensure uniqueness by using unique volume names, etc.
However, based on your answer and the lack of documentation, I think we will let go of the idea of using object IDs.
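For the case where the external system can only store a single GUID, one possible way to fold a volume identifier into the key without adding a second column is a name-based (version 5) UUID derived from both values. This is only a sketch of that idea, not something NTFS or the thread's participants provide. `APP_NAMESPACE`, `combined_file_key`, and the choice of volume identifier are all hypothetical.

```python
import uuid

# Hypothetical application namespace. Any fixed UUID works, as long as it
# never changes once keys have been stored.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "object-ids.example.invalid")


def combined_file_key(volume_id: bytes, object_id: bytes) -> uuid.UUID:
    """Derive one GUID-shaped key from a volume identifier and an object ID.

    volume_id: whatever per-volume unique value the deployment controls
               (an assumption, e.g. a volume GUID or serial number).
    object_id: the 16-byte NTFS object ID of the file.
    """
    if len(object_id) != 16:
        raise ValueError("object_id must be exactly 16 bytes")
    # Hex encoding plus a separator keeps the (volume, object) pair unambiguous.
    name = volume_id.hex() + ":" + object_id.hex()
    return uuid.uuid5(APP_NAMESPACE, name)


if __name__ == "__main__":
    oid = bytes(range(16))
    print(combined_file_key(b"\x01\x02\x03\x04", oid))
    print(combined_file_key(b"\x0a\x0b\x0c\x0d", oid))
```

The derived key is one-way and is only as unique as the (volume ID, object ID) pair that feeds it, so it does not help if that pair itself repeats.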

gtordai · February 24, 2019, 5:52pm

After some experimenting, it looks like NTFS generates type 1 (sequential) GUIDs with the MAC address in the lower 6 bytes. ExUuidCreate has the same behaviour, so NTFS possibly uses ExUuidCreate to create its object IDs.
In this case I don't see how they could fail to be unique, even on the same computer with multiple volumes.
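The kind of inspection described above can be sketched in a few lines. The example assumes the 16 raw object ID bytes use the Windows GUID memory layout (first three fields little-endian) and have already been read by some other means. `describe_object_id` is an illustrative name, not an existing API. The version field is only a heuristic, since an application can set an object ID to any 16 bytes it likes.

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUID timestamps count 100 ns intervals since 1582-10-15 (RFC 4122).
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def describe_object_id(raw: bytes) -> dict:
    """Decode a 16-byte object ID, assumed to be in Windows GUID layout."""
    u = uuid.UUID(bytes_le=raw)
    info = {"guid": str(u), "version": u.version}
    if u.version == 1:
        # Version 1: the low 6 bytes are the node (typically a MAC address)
        # and the timestamp is embedded in the first three fields.
        info["node"] = f"{u.node:012x}"
        info["timestamp"] = UUID_EPOCH + timedelta(microseconds=u.time // 10)
    return info


if __name__ == "__main__":
    # Stand-ins for real object IDs: one time-based, one random.
    print(describe_object_id(uuid.uuid1(node=0x001122334455).bytes_le))
    print(describe_object_id(uuid.uuid4().bytes_le))
```

A version of 4 would suggest a randomly generated GUID and a version of 1 a time-and-node-based one, but neither proves which component actually wrote the value.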

Peter_Viscarola_OSR · February 25, 2019, 5:58pm

Yes… OK… well…

Your post here got me thinking… so I decided to look into this a bit.

IIRC, the reason Object ID was introduced was to have a persistent and globally unique value that identifies the file – Not to be confused with the FILE ID, which is definitely not unique per system.

There’s a good post by Neal Christiansen (one of the MSFT file system architects and an all around smart guy) here.

There’s a rather alarming post here.

There’s another interesting post here by Sarosh Havewala (the MSFT file system filter lead) about why they mention file IDs and volumes.

So, in the end, as long as you’re using the Object ID and not the File ID you should be OK assuming it’s unique.

Peter

Craig_Barkhouse · June 24, 2019, 11:05pm

Just want to add that Object ID (like File ID) is only guaranteed to be unique on a given volume, not on a given machine. Correct that NTFS calls ExUuidCreate to generate an Object ID, but that’s an implementation detail that could change any time. Given that you can explicitly set a file’s Object ID (unlike File ID), it’s trivial for two files on two different volumes to have the same Object ID. It would be very common to have files on two different volumes have the same File ID considering the metadata starts off at a known state when formatting, and File IDs are handed out in a predictable way.

To clarify even further: when I say unique on a given volume, I also mean at a given point in time. Both Object IDs and File IDs can be reused over time for different files.

Peter_Viscarola_OSR · June 26, 2019, 8:33pm

Hi Craig,

Object ID (like File ID) is only guaranteed to be unique on a given volume, not on a given machine.
…
Both Object IDs and File IDs can be reused over time for different files.

Not to argue or split hairs, but I’m not sure how that squares with what Neal wrote in the thread to which I referred above (emphasis below mine):

The reason ObjectID’s were added to Windows 2000 was to support the notion of a unique ID that will never change for a given file no matter what volume/system it is moved to in the world. The existing fileID could not be used because it is related to the internal structure of the file system and changes as a file is copied.

What you write also means that ObjectIDs are no “better” at uniquely identifying a file than File IDs. And therefore… why have both?

Peter