Hey EDT team!
I just recently downloaded the platform have been playing around with it. I was wondering if you guys could give me a head start.
I’m Web scraping data for a project. On multiple different sites then aggregating them into a CSV.
On every site “generally at least” the product has been called something similar but not the same.
So it would look something like this
Site A Asus ROG Crosshair VIII Hero Site URLcolumns form factor ATX Socket AM4
Site B ASUS ROG X570 Crosshair VIII Hero Site URL form factor ATX Socket AM4
Site C Asus Crosshair Hero Site URL form factor ATX Socket AM4 Chipset
all the same Motherboard but different names on different sites. Is there a way to identify all the duplicating rows then merge these duplicates into one row? So it would look like this
Asus ROG Crosshair VIII Hero Site A URL Site B URL Site C URL form factor ATX Socket AM4
I have about 5000 thousand listings and have been doing them manually. Its obviously extremely time-consuming. Several people have recommended using panda, I’m not great with python. Many have also recommended EDT!
I know the general logic if Name Matches 65% of other Name & Socket & ATX are the same merge row. I just don’t know how to do it or where to start.