[Req] Generate UUID V4

prashant · April 21, 2025, 3:57pm

Hello,
Request: a way to generate a UUID v4 (GUID)?

I believe JavaScript can currently do this, but isn't native always preferred?

Lord ChatGPT gave me this:

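function uuidV4() {
  // '4' marks the version; 'y' becomes 8, 9, a or b (the variant bits).
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, c => {
    const r = Math.random() * 16 | 0;          // random nibble, 0-15
    const v = c === 'x' ? r : (r & 0x3 | 0x8); // force the top two bits to 10 for 'y'
    return v.toString(16);
  });
}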
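
A note for anyone copying the snippet: Math.random() is not a cryptographically secure source, so if the IDs must be hard to guess a CSPRNG is the safer choice. A minimal sketch, assuming a Node.js runtime; the helper name secureUuidV4 is made up for illustration:

```javascript
const { randomBytes } = require('node:crypto');

// Hypothetical helper: builds a version 4 UUID from 16 CSPRNG bytes.
function secureUuidV4() {
  const b = randomBytes(16);
  b[6] = (b[6] & 0x0f) | 0x40; // version nibble = 4
  b[8] = (b[8] & 0x3f) | 0x80; // variant bits = 10
  const h = b.toString('hex');
  return `${h.slice(0, 8)}-${h.slice(8, 12)}-${h.slice(12, 16)}-${h.slice(16, 20)}-${h.slice(20)}`;
}

console.log(secureUuidV4());
```

Recent Node.js versions also ship crypto.randomUUID(), which returns a version 4 UUID directly and is probably the simpler option where it is available.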

Admin · April 21, 2025, 5:08pm

What do you need UUIDs for?

There seems to be more than one type of UUID/GUID: NCS, DCE and Microsoft. Then there are various sub-types and versions.

Admin · April 21, 2025, 5:18pm

BTW I think the JavaScript above is going to give you a different UUID every time you run it.

prashant · April 21, 2025, 5:33pm

I need to transfer files to object storage with obfuscation. What better than a UUID?

Yes!!!

Admin · April 21, 2025, 6:31pm

Do you want a row to have a different UUID every run?

Monotone94 · April 21, 2025, 11:32pm

If it is to be used as a unique reference, e.g. to anonymise data, is that not just a sufficiently large hash function of [some of] the data?

Or is it supposed to be reversible, weak encryption or just non-obvious to the casual observer, e.g. base64?

prashant · April 22, 2025, 1:22am

Yes. Technically, once in production, the file is generated once and passed on to other scripts. If you do consider this, it could be part of the SHUFFLE feature, which could generate UUIDs as well.

UUID v4, at least, is completely random and requires no input from the user. Hence no reversibility.

Monotone94 · April 22, 2025, 1:48am

Do it in a single transform by hashing a suitable column (or concatenation), or generate a random number and hash or base64 that. What is the length you desire?

prashant · April 22, 2025, 4:58am

As I understand it, UUID v4 is a standard: it’s always 32 hex digits grouped 8-4-4-4-12. Hashing is not required here.

Monotone94 · April 22, 2025, 5:15am

You asked for a unique (32 character) alphanumeric. EDT can do that directly. What is the additional structure you need?

GLS · April 22, 2025, 5:51am

Hello,
Could you please explain how to do it directly with EDT (without JavaScript)?
I also need to anonymise some columns with random unique values.
Thank you in advance

Monotone94 · April 22, 2025, 6:19am

Hash and Chop

Unique ID.transform (2.4 KB)

SHA1 produces 40 characters so I have chopped it to 32 in case that is a criterion.
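
For readers without the transform file, the same "hash and chop" idea can be sketched in JavaScript. This is an assumption about what the transform does, not its actual contents, and hashAndChop is a made-up name:

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: SHA-1 gives 40 hex characters; keep only the first 32.
function hashAndChop(value) {
  return createHash('sha1').update(String(value), 'utf8').digest('hex').slice(0, 32);
}

console.log(hashAndChop('abc')); // a9993e364706816aba3e25717850c26c
```

The same input always gives the same 32 characters, which is what makes a separate lookup table possible.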

If you want to retain a separate lookup table of the original codes against the hashes, then copy the original data, join, and delete the columns that are not required.

Admin · April 22, 2025, 7:52am

A hash can be used to generate a long alphanumeric. It is useful for anonymising things. But the same input to the hash will always give the same output. Also, you can get collisions, where 2 different values could conceivably return the same hash value, although this is unlikely with a well-designed hash.

UUIDs produce a large random number[1], based on no input. They are incredibly unlikely to produce the same UUID twice.

[1]Mostly. There are various different types of UUID.

Admin · April 22, 2025, 7:55am

I don’t think the third party library we use (Qt) has any way to change the UUID seed. If it did that would make it more likely to generate the same UUID twice, which would defeat the object of using UUIDs.

Admin · April 22, 2025, 8:01am

"I also need to anonymise some columns with random unique values."

You might consider using Random to add a column of random values, Concat Cols with, say, their name to get values like:

98430238JohnSmith
32980712JaneSmith

Then hash those.

Note that it is possible that 2 different strings could result in the same hash (a ‘hash collision’). But this is very unlikely and you can check for it with a Verify transform.
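
Outside EDT, the same steps (random prefix, concatenate, hash, check for collisions) could look roughly like the sketch below. The choice of SHA-256 and the name anonymise are assumptions for illustration, and the duplicate check stands in for the Verify step:

```javascript
const { createHash, randomInt } = require('node:crypto');

// Hypothetical helper: prefix each value with 8 random digits, hash it,
// and fail loudly if two rows ever end up with the same hash.
function anonymise(values) {
  const seen = new Set();
  return values.map((v) => {
    const prefix = String(randomInt(0, 100000000)).padStart(8, '0');
    const hash = createHash('sha256').update(prefix + v).digest('hex');
    if (seen.has(hash)) throw new Error(`hash collision for ${v}`);
    seen.add(hash);
    return hash;
  });
}

console.log(anonymise(['JohnSmith', 'JaneSmith']));
```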

Monotone94 · April 22, 2025, 8:18am

Finding a SHA1 collision by brute force takes on the order of 2^80 (about 10^24) attempts. For 100,000 records hashed to the full 160 bits, the birthday approximation gives a collision probability of about 3.4x10^-39.
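
The 3.4x10^-39 figure can be reproduced with the usual birthday approximation, p ≈ n^2 / 2^(bits + 1). A small sketch; the function name is made up, and the 128-bit and 122-bit rows are added here only for comparison:

```javascript
// Birthday approximation for n random values of the given bit length.
function collisionProbability(n, bits) {
  return (n * n) / Math.pow(2, bits + 1);
}

console.log(collisionProbability(100000, 160)); // full SHA-1: roughly 3.4e-39
console.log(collisionProbability(100000, 128)); // SHA-1 chopped to 32 hex characters: roughly 1.5e-29
console.log(collisionProbability(100000, 122)); // random bits in a version 4 UUID: roughly 9.4e-28
```

The approximation only holds while the result is very small, which it is for all three cases.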

If that is not sufficiently rare then use SHA256 or greater on padded data. The randomness of any arbitrary number of alphanumeric characters, whether or not case-sensitive, is not varied by calling it a UUID rather than a hash. Other than encoding bits, the example I gave represents UUID V5.

I noted earlier that hashing a key is for anonymisation, where you keep a separate secure file for lookups. Some census bureaux do this to preserve anonymity now while keeping the data usable by historians after 99 or so years.

If non-repeatable keys are desired then hash a random number, as Andy and I both mentioned.

The 8-4-4-4-12 structure to which Prashant referred is the addition of separators (given Prashant did not express the need for codification bits), probably achievable with a bit of Regex. I have not looked more closely at that.
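
As a rough illustration of the regex idea in JavaScript (addUuidSeparators is a hypothetical name, and the capture groups should carry over to other regex engines):

```javascript
// Hypothetical helper: insert the 8-4-4-4-12 separators into a 32-character hex string.
function addUuidSeparators(hex32) {
  if (!/^[0-9a-fA-F]{32}$/.test(hex32)) throw new Error('expected 32 hex characters');
  return hex32.replace(/^(.{8})(.{4})(.{4})(.{4})(.{12})$/, '$1-$2-$3-$4-$5');
}

console.log(addUuidSeparators('a9993e364706816aba3e25717850c26c'));
// a9993e36-4706-816a-ba3e-25717850c26c
```

This only adds the separators. It does not set the version or variant bits, so the result looks like a UUID without strictly being one.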

Admin · April 22, 2025, 8:25am

Just bear in mind that Random could generate the same value for 2 different rows. Also it is only pseudo-random (based on a seed + an algorithm).

Monotone94 · April 22, 2025, 9:08am

I guess part of the problem here is that the requirement is ill-defined. Is a random key to be generated from scratch or from existing unique data? How much data? What collision probability is acceptable? A UUID itself has slightly fewer possible values than its length theoretically allows, because strictly it reserves bits to encode its type (in V4, a version nibble plus two variant bits, leaving 122 random bits).

Math.random() in JS has a cycle length of 2^128 apparently (2^30 for older versions) so the question is the seed, which comes back to my original request for easy variation of it or even to have it as metadata.

Speculation on a change:
The default seed EDT generates for a new file could be retained in Random while a separate seed is generated from a hash of the input data (extracting digits) and made available through metadata. The suggestion of a “Generate now” button in Random would also be retained.

Anonymous · April 22, 2025, 9:23am

Here it is with 8-4-4-4-12. I used an HMAC-MD5 hash, as you can set whatever secret you like.

Transform file.
GenerateUUID4.transform (2.2 KB)
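
For comparison, a keyed hash laid out as 8-4-4-4-12 might look like this in JavaScript. It is a sketch under the assumption that the transform hashes a column value with a secret; hmacId and its parameters are made-up names:

```javascript
const { createHmac } = require('node:crypto');

// Hypothetical helper: HMAC-MD5 yields 32 hex characters, grouped here as 8-4-4-4-12.
function hmacId(value, secret) {
  const h = createHmac('md5', secret).update(String(value)).digest('hex');
  return `${h.slice(0, 8)}-${h.slice(8, 12)}-${h.slice(12, 16)}-${h.slice(16, 20)}-${h.slice(20)}`;
}

console.log(hmacId('invoice-001.pdf', 'my secret'));
```

Unlike a true UUID v4, the output is repeatable: the same value and secret always give the same ID, and the version and variant bits are not set.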

prashant · April 23, 2025, 12:13am

Thank you all for the feedback and for showing how to do the same in EDT currently. My entire request was based on the idea of using NO seed.

We recently uploaded 12 years of ERP images and financial documents from our database to S3 storage for Company 1:

Images - 245,839
Documents - 134,304

Both of the above contain many duplicate file names, kept in different locations in the ERP, so for object storage the file name cannot be used as a seed.
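
A seedless approach along these lines could be sketched as follows, assuming Node.js. The paths are invented examples and assignObjectKeys is a hypothetical helper:

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical helper: give every source path its own random key,
// so duplicate file names in different locations never clash.
function assignObjectKeys(paths) {
  const keys = new Map();
  for (const p of paths) {
    if (!keys.has(p)) keys.set(p, randomUUID());
  }
  return keys;
}

// Made-up paths sharing one file name.
console.log(assignObjectKeys(['/erp/2013/invoice.pdf', '/erp/2014/invoice.pdf']));
```

Because the keys are random rather than derived from the file, the returned map is the only link back to the original paths and would need to be stored somewhere safe.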

YES!!

Thank You as always