Lookup with 'contains'/REGEX

I have a column which contains tonnes of Raw Material items such as

7 Ply 4X4 Carton
36" Polybag
39" Polybag
Cow leather

I had like to categorise these items using a lookup table (excel/csv) , where I have can write down which has a structure. PS : This table is more than 250+ items

Contains | Category
Carton | Packaging
Polybag | Packaging
Cow | Raw Material
Sheet | Raw Material

Expected Output

7 Ply 4X4 Carton | Packaging
36" Polybag | Packaging
39" Polybag | Packaging
Cow leather | Raw Material

Just an idea … but the IF transform can be a pain depending on the number of categories. I’m sure there are better Javascript solutions for this step (but I’m not a programmer).

lookup contains.transform (4.3 KB)

image

1 Like

Hello @olaf,

Thanks for the reply. I apologise I wasn’t clear in my post, the lookup table is more than 250+ items

found another solution without programming or complex IF. But it use CROSS which might generate big data.

lookup contains 2.transform (3.9 KB)

1 Like

This is a good solution for small dataset.

My RM list and categorisation list combo generated a 27.1 million row. I need to play with this a little to figure solutions clearly

1 Like

Olaf’s Cross solution should work for smaller datasets.

I tried using fuzzy Lookup, but that isn’t very reliable. (see attached)

Another possible approach is Cluster followed by Lookup. (see attached)

cluster and fuzzy lookup.transform (4.7 KB)

But sometimes you just need a honky great If! You could use Easy Data Transform to generate the data to paste into the If transform.

I have made a note to look into the possibility of a contains option for the Lookup transform.

Cluster is good but it still requires me to define tonnes of options manually.

YES that would be helpful !!

?? What does this mean ??

Sometime you just need an If transform with 250 terms.

I thought there was an option to paste a table of data into into an If transform (as there is with Filter), but it looks like I got that wrong.

The extention of lookup would be the best.

In case of higher volume in source data which might be too large for Cross I would build currently a solution to split the source into multiple files, maybe using the row numbers and output into multiple files. Then process the smaller portions using the batch mode and in a last step combine the results again into one file. I think this a simple approach instead of maintaining a if with too many arguments.

But if the solution with Cluster works it is the best with current functionality given.

1 Like

Hi @prashant

Can you share your data? whole 11.7K rows and 2.3k rows?

Reason I asked the full set is to know any another cases like Cow leather

Here is my solution.

Transform file.
LookupwithContains.transform (5.2 KB)

The above solution with Join change the order of the records, but with lookup the records stays in the same place.

Transform file.
LookupwithContains2.transform (6.2 KB)

2 Likes

@Anonymous you are the Master. I did thought about Extract, too, but without knowledge of Regex, I had no clue.
But I can imagine it can get more complex if the terms in column 1 get longer and the key is just somewhere in the term.

1 Like

It is not an issue, the idea is that you extract the text out of the column that is needed for the lookup and then simply use it for the lookup, that is why I was asking for the whole data set, to see different patterns.

Hello @Anonymous ,

please pm me email address

Just sent you a private message.

1 Like

@Olaf

Here a more generic solution, since EDT only goes row by row and the only transform that takes a value and goes through a list of data and return value if it matches the condition is Lookup and since at the moment Lookup does not support contains search, we can make one our own.

Transform file.
LookupwithContains3.transform (5.7 KB)

3 Likes

This is a masterclass in using EDT in a very different way , I ran this with the 11,000 row set as well and EDT using JS didn’t crawl on my M1 which I anticipated

This is good enough to get started .

@Admin , maybe this is something you can add as native option in 1x or 2x . The use case we discussed beside matching category to items was also to match blacklisted_domains containing email / domain from massive list of emails .

I think this is something that can and should be handled by standard transforms.

This can be done without using Javascript?

Not very easily in v1. But hopefully you will be able to do it more simply with a Lookup ‘contains’ function in v2.

1 Like