How can I add a condition to my Split Col?

PhilipBallerscheff · October 21, 2024, 3:15pm

Hi everyone!

I am trying to reformat some data my colleague sent me. The chief problems are that in the Vorname column, there are sometimes names in this format: “Last Name, First Name”.

I have tried to fix this with a Split Col. For the most part, this has worked, which is good. However, whenever my colleague has correctly entered the data and has put first name and last name into the right columns, the split col messes it up.

See example:
Here, Nataliya Dasklova is entered correctly.

Shared with Zight

However, it is messed up when you look at the First Name and Last Name columns. Why is this?

Shared with Zight

To be clear, Vorname and Nachname are the original data columns and First Name and Last Name are the reformatted data columns.

Whenever names are already written correctly, what is in the Vorname field appears in the Last Name field and what is in the Nachname field does not appear at all. Explain the functions Easy Data Transform has which can help me solve this. It would be best if there were a way to make the Split Col conditional on something, but the If function does not seem to do that.

I do not see any option in the Split Col function itself to introduce this condition either. What can I do to introduce a condition?

I have looked at the topics of other users revolving around conditionals, but none of them have helped me so far.

Admin · October 21, 2024, 3:54pm

There are multiple ways you could do this. Here is one approach:

pb.transform (3.1 KB)

Another approach would be to Concat cols the two columns together with a comma delimiter, then Split col them.

pb2.transform (3.0 KB)

PhilipBallerscheff · October 21, 2024, 4:28pm

Hi Admin,
and thank you for your quick reply and your help. When I try to replicate your steps, it seems my IF function is messed up.

This is what I get: Easy Data Transform v1.47.1...

Second, I want to understand your method so I can better replicate it. I will explain it as follows to check my understanding:

First, we have the Clipboard, or in my case, the data sheet.

Second, we have the Split Col. This function splits up cells in the Vorname column into TWO columns if there is a comma in the cell. I believe I understand how this works and I believe I have replicated this in my workflow.

Here is a picture of that: Easy Data Transform v1.47.1...

Third, we have an IF function. I am not sure what it does.

Fourth, we have a second IF function. As I understand from your image, your IF function checks whether Nachname is empty. If Nachname is empty, then we put in the value of Nachname.

If Nachname is not empty, then we put in the value of Vorname2.

To repeat myself for the sake of thoroughness, when I try to replicate your steps, it seems my IF function only prints the last name of the person in question.

Fifth, we have Remove Cols. This serves to remove the superfluous rows.

I am glad to hear there is a way of solving my problem and I would like to understand it better. Could you help me understand what you did?

Admin · October 21, 2024, 4:50pm

So we split Vorname column into 2 new columns.

You need 2 ifs, one to populate the clean first name column and the other to populate the clean last name column.
The first If take the first name from the first split column (if it is not empty) or from the original first name column (if it is).

The second If take the last name from the second split column (if it is not empty) or from the original last name column (if it is).

The second approach above, without if, is a bit simpler.

Olaf · October 21, 2024, 8:30pm

Hi Philip,

I think you struggle with some basics. I took the first file from Admin and used the field Comment in all transactions to add some explanations. I hope they help to understand the logic.

In addition I think the first IF is not required and I replaced in the bottom part by a Rename.

pb_comments.transform (5.8 KB)

In case it is still unclear you can send me a personal message in German if it makes it easier to discuss for you.

By the way you screenshot of the IF statement was messed up, as you selected in the then part the Vorname and in the else part the Nachname. Where as the logic from Admin had one If for the Vorname and another If for the Nachname and selected in both IF which of the Vorname fields or Nachname field should be used. In EDT it often that you generate some more fields/columns to support activities which you remove later on. With the performance of EDT is doesn’t matter in most cases.

Anonymous · October 22, 2024, 5:55am

Here is my solution to it.

Transform File.
Fix_FirstLastname_Column.transform (2.4 KB)

PhilipBallerscheff · October 22, 2024, 8:28am

Hi Admin ,
and thank you for your response. I think I am beginning to understand what you mean. I have tried to replicate your approach. Unfortunately, I cannot upload an attachment. I will use pictures instead.

This is the first IF statement

Shared with Zight

This is the second IF statement

Shared with Zight

I now see the already correctly written names as they should be.

For example, I have “Markus” under Clean Vorname and I have “Fritsch” under Clean Nachname.

However, the names that were previously written in this format: Lastname, Firstname, like “Weißig, Christoph”, are now rendered as “Weißig” in Clean Vorname and “Christoph” in Clean Nachname.

The following are the problems with my data that I am trying to fix with EDT.

First I need to solve the problem of the already correctly ordered name, so that the correctly ordered name is recognized as being correctly ordered and also shows up in the right places. I think this has been accomplished, as shown by Markus Fritsch.

Second, I need to solve the problem of names being reversed. This has not been solved. This is why I have “Gajic” under Clean Vorname and have “Milka” under Clean Nachname. It should be “Milka” under Clean Vorname and “Gajic” under Clean Nachname.

In my previous attempt at fixing this problem, I was able to change this. When I copied your steps, this problem reappeared. I am not sure why.

Third, sometimes my colleagues put in the names of leads in Vorname and put in the names of the leads’ friends and family under Nachname.

For example, I now have “Jachmann, Birgit” under Clean Vorname and I have “Jachmann, Vanessa” under Clean Nachname.

A different example: The full name should be Horst-Dieter Ignaz Heeg and his friend Dieter Bodenheim.

I now have “Horst-Dieter Ignaz” in Vorname2. I have Heeg in Clean Vorname. I have “Dieter, Bodenheim” in Clean Nachname, I am not sure why.

Lastly, I would like to thank everyone who has commented so far, including @Olaf and @Anonymous.

I will continue to try your solutions and see if I can fix this. Thank you again for your help so far.

Admin · October 22, 2024, 8:45am

So you are saying names can be stored as any of the following ways in columns ‘First name’ and ‘Last name’ :

First name=first
Last name=last

First name=first,last
Last name=

First name=last,first
Last name=

First name=first1,last1
Last name=first2,last2

?

Easy Data Transform doesn’t know what is a first name and what is a last name. So this is a much harder problem to solve. Possibly you could add an extra column and manually add a “reversed” value for each row where the names are reversed, then do a different thing for those rows. Or maybe add a lookup with all common first names. But this is getting more complicated.

I think really you need to tell your colleagues not to give you such terrible data!

Olaf · October 22, 2024, 8:50am

had the same thought as Admin, when the das is mixed and not clearly structured, you have a big issue.

Just one hint, looking to your screen shots. You ca use the Whitespace transaction to remove leading or trailing blanks, etc. in case you got data like

first, last instead of
first,last

Admin · October 22, 2024, 9:02am

Yes, we think Easy Data Transform is excellent at converting data from one structure to another. But converting unstructured to structured is a much harder problem that requires a lot of domain knowledge.

PhilipBallerscheff · October 22, 2024, 9:23am

Hi Admin,
and again I thank you for your quick response. I really do appreciate your help.

Firstly, your description of the ways first names and last names are stored in my data is correct.

Sometimes names of the primary lead are written correctly, with the first name in Vorname and the last name in Nachname.

Sometimes they are written as “Last Name, First Name”.

The solution that I tried to make for this yesterday is as follows: If there is a comma in the Vorname field, split the text of the Vorname field into two separate columns.

If there is not a comma in the Vorname field (which would usually mean the name was entered into the right fields), just copy the text of the Vorname field into First Name and copy the text of the Nachname field into Last Name.

However, it appears that this is not really how IF statements work in EDT. I am explaining my attempt because there might be something in EDT that I can use to fix this at least partly.

Then there is the problem that my colleagues sometimes entered the name of a lead’s friend or family in the Nachname section.

The solution that I tried to make for this yesterday is as follows: If there is a comma in the Nachname field, split the text of the Nachname field into two separate columns.

If there is not a comma in the Nachname field (which would usually mean the name was entered into the right fields of Weitere Personen), just copy the text of the Vorname field into First Name and copy the text of the Nachname field into Last Name.

Is there some function in EDT that would allow me to make the Col Split conditional?

I have looked through the EDT documentation on the website, but I did not find a function for that.

Lastly, you are right about the data being inconsistent and the data being in your words terrible. If this is unstructured data, what domain knowledge would I need to fix it? For context, I previously tried to set up conditional formulas in Excel and Google Sheets to fix this, but this did not work. What tools would I need to do this?

I have replicated Olaf’s solution too. I will be replying to him in this forum in a separate post so this post is not too long.

Thank you for your help and your quick responses. It is very much appreciated. I will continue trying.

Admin · October 22, 2024, 9:56am

No. But you could use 2 Filter transforms to split the dataset into 2 datasets depending on whether there is a comma in the column. Then process those 2 datasets separately. Then Stack them afterwards. That seems more complicated than the examples given though.

Possibly you could input a table of the most common first and last names and use this with a Lookup to try to sort out the name order. It isn’t going to be 100% reliable, though.

PhilipBallerscheff · October 29, 2024, 8:43am

Ok, as an update and a resolution to this thread, I solved this problem by using OpenRefine to continue working on it.