Currently, you can use the Hash function to pseudonymize data (replace values with a pseudonym). But:
- Hashes are long.
- There is a (very small) chance of a hash collision (where two different values produce the same hash).
- It is possible to reverse the hash with a lookup table (only practical if the values being hashed are short or come from a small set).
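To illustrate the last point, here is a minimal Python sketch of reversing a hash with a lookup table. It assumes SHA-256 and a hypothetical column of 4-digit PINs. Neither is taken from the Hash transform itself, which may use a different algorithm.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Hash a string with SHA-256 (an assumed algorithm, for illustration only)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical small value space: every 4-digit PIN from 0000 to 9999.
# With only 10,000 candidates, hashing all of them takes a fraction of a second.
lookup = {sha256_hex(f"{n:04d}"): f"{n:04d}" for n in range(10_000)}

hashed = sha256_hex("0427")   # what a pseudonymized dataset would contain
recovered = lookup[hashed]    # the original value, found by table lookup
```

The same attack is impractical when the original values are long or unpredictable, because the table of candidates becomes too large to build.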
You can also use the Row Num or UUID transforms, but these have their own issues.
So we are working on a new Pseudonymize transform:
- You can create a lookup table from each pseudonym to its original value.
- The pseudonyms are generally much shorter and more human-friendly than hashes.
- There is no chance of a collision.
- You cannot reverse a pseudonym without the lookup table.
You can control what the pseudonyms look like using Prefix and Start at.
The order in which indexes are assigned to values is pseudo-random, controlled by the Seed value.
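As a rough illustration of the idea (not the transform's actual implementation), here is a minimal Python sketch. The function name `pseudonymize`, its parameter names, and the use of a seeded shuffle to order the values are all assumptions made for this example.

```python
import random


def pseudonymize(values, prefix="ID", start_at=1, seed=0):
    """Replace each distinct value with prefix + index.

    Returns the pseudonymized list and a lookup table mapping each
    pseudonym back to its original value. Hypothetical helper, for
    illustration only.
    """
    # Sort first so the result depends only on the seed, not on input order.
    distinct = sorted(set(values))
    # A seeded shuffle gives a pseudo-random but repeatable index order.
    random.Random(seed).shuffle(distinct)
    to_pseudonym = {
        value: f"{prefix}{start_at + i}" for i, value in enumerate(distinct)
    }
    lookup = {pseudonym: value for value, pseudonym in to_pseudonym.items()}
    return [to_pseudonym[value] for value in values], lookup


names = ["alice", "bob", "alice", "carol"]
pseudonyms, lookup = pseudonymize(names, prefix="P", start_at=100, seed=42)
originals = [lookup[p] for p in pseudonyms]  # reversal needs the lookup table
```

Because each distinct value gets its own index, two different values can never share a pseudonym, and repeated values always map to the same one.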
We would be interested in feedback on this new transform from anyone who needs to pseudonymize data.