Anyone doing a lot of de-duplication of data?

Admin · February 18, 2022, 9:06am

We are looking into making some significant improvements to the Dedupe and Unique transforms, including the option of fuzzy matching. If you are interested in trying an early version and sending feedback, please email us (subject: “dedupe testing”) at the usual address:

patrick · March 2, 2022, 3:29am

We load a lot of employee data where the employer will often have duplicate “unique” IDs. “Unique” is in quotes because people are often listed twice in the data for various reasons. We have to do a lot of sorting and then de-duplicating based on other data points within the data. Not sure if that’s what you are looking for, but happy to help if we can and have the time…
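The “sort, then de-dupe on other data points” workflow described above can be sketched as follows. This is a minimal illustration, not how Easy Data Transform does it: the column names `employee_id` and `last_updated` are hypothetical, and it assumes dates are stored as ISO strings (YYYY-MM-DD) so that comparing them as text orders them correctly.

```python
from typing import Dict, List

# Hypothetical record layout: "employee_id" is supposed to be unique and
# "last_updated" (ISO date, YYYY-MM-DD) decides which duplicate to keep.
Record = Dict[str, str]


def dedupe_by_id(rows: List[Record], key: str = "employee_id",
                 tie_breaker: str = "last_updated") -> List[Record]:
    """Keep one row per `key`, preferring the row with the largest `tie_breaker`.

    ISO dates compare correctly as plain strings, so no date parsing is needed.
    On an exact tie the row seen first is kept. The result is ordered by the
    first appearance of each ID.
    """
    best: Dict[str, Record] = {}
    for row in rows:
        current = best.get(row[key])
        if current is None or row[tie_breaker] > current[tie_breaker]:
            best[row[key]] = row  # replacing a value keeps the dict's insertion order
    return list(best.values())


if __name__ == "__main__":
    rows = [
        {"employee_id": "E1", "name": "Ann Lee", "last_updated": "2021-05-01"},
        {"employee_id": "E2", "name": "Bob Ray", "last_updated": "2021-06-01"},
        {"employee_id": "E1", "name": "Ann Lee-Smith", "last_updated": "2022-01-15"},
    ]
    for row in dedupe_by_id(rows):
        print(row)
```

Keeping the “best” row per ID in a single pass avoids a separate sort; swap in whatever field (or combination of fields) actually distinguishes the correct record in your data.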

Admin · March 2, 2022, 8:39am

@patrick I will let you know when we have something to try.

Admin · March 4, 2022, 10:25am

We are making some progress with a fuzzy option for Dedupe.

DanFeliciano · March 6, 2022, 2:03am

I can’t wait to test and use this new Fuzzy option.

Admin · March 7, 2022, 5:09pm

Fuzzy matching of duplicates and the ability to view duplicates are now available.


See sections 2 and 3 here:

(you may need to refresh the page)
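The thread doesn’t say which similarity measure the fuzzy Dedupe option uses, so the following is only a rough illustration of the general idea of fuzzy duplicate matching. It uses Python’s `difflib.SequenceMatcher` ratio as a stand-in similarity score; the normalization rules and the 0.85 threshold are arbitrary assumptions, not Easy Data Transform settings.

```python
import re
from difflib import SequenceMatcher
from typing import List


def normalize(value: str) -> str:
    """Lower-case, turn runs of punctuation/whitespace into one space, trim."""
    return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] from difflib's SequenceMatcher (1.0 = identical)."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_groups(values: List[str], threshold: float = 0.85) -> List[List[str]]:
    """Greedily group values whose normalized form resembles a group's first member.

    Each value is compared with the representative (first member) of every
    existing group and joins the first one scoring >= threshold; otherwise it
    starts a new group. Results depend on input order, and similarity is not
    chained (A~B and B~C does not force A and C together).
    """
    groups: List[List[str]] = []
    representatives: List[str] = []
    for value in values:
        key = normalize(value)
        for i, rep in enumerate(representatives):
            if similarity(rep, key) >= threshold:
                groups[i].append(value)
                break
        else:  # no similar group found
            representatives.append(key)
            groups.append([value])
    return groups


if __name__ == "__main__":
    print(fuzzy_groups(["John Smith", "Jon Smith", "JOHN SMITH", "Jane Doe"]))
    # → [['John Smith', 'Jon Smith', 'JOHN SMITH'], ['Jane Doe']]
```

Comparing against one representative per group keeps the cost at roughly rows × groups, which is fine for modest datasets; a real tool would likely use a more scalable approach.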

Because of the way Easy Data Transform works, you can’t manually select duplicates, but you can set the ‘Dedupe’ mode to ‘Add duplicate information’, export the dataset, and then remove the unwanted duplicates by hand in an editor.
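For anyone scripting the same “annotate, export, review by hand” workflow outside the tool, here is a minimal sketch. The extra column names `dup_group` and `dup_count` are hypothetical and are not necessarily what the ‘Add duplicate information’ mode produces; the point is only that no rows are dropped, so the exported file can be sorted by group and reviewed manually.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, str]


def add_duplicate_info(rows: List[Row], key_columns: List[str]) -> List[Row]:
    """Return copies of `rows` with two hypothetical extra columns:
    'dup_group' (1-based number per distinct key, in order of first appearance)
    and 'dup_count' (how many rows share that key). No rows are removed."""
    def key_of(row: Row) -> Tuple[str, ...]:
        return tuple(row[c] for c in key_columns)

    counts = Counter(key_of(r) for r in rows)
    group_ids: Dict[Tuple[str, ...], int] = {}
    annotated: List[Row] = []
    for row in rows:
        k = key_of(row)
        group_ids.setdefault(k, len(group_ids) + 1)
        # Copy the row so the caller's data is left untouched.
        annotated.append({**row, "dup_group": str(group_ids[k]),
                          "dup_count": str(counts[k])})
    return annotated


def to_csv(rows: List[Row]) -> str:
    """Serialize rows to CSV text for review in a spreadsheet or text editor."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [{"id": "E1", "name": "Ann"}, {"id": "E2", "name": "Bob"},
            {"id": "E1", "name": "Ann L."}]
    print(to_csv(add_duplicate_info(rows, ["id"])))
```

Rows with `dup_count` greater than 1 are the ones needing a manual decision; sorting the exported file by `dup_group` puts them next to each other.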

Please try it and let us know what you think.