Splitting names into honorific, first name, last name etc

Splitting names using the current Split Col transform is tricky, due to the complexities of people’s names.

Each part of the name can have multiple tokens. Some parts may be missing. The tokens can be in different cases, with or without dots. The parts can be separated by 1 or more space or comma characters.
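As a rough illustration of the tokenising step described above, here is a minimal Python sketch. It is not the Split Name implementation; `tokenize` and `key` are hypothetical helpers that split on runs of whitespace and commas and build a matching key that ignores case and dots.

```python
import re


def tokenize(name):
    """Split a raw name on runs of whitespace and/or commas, dropping empty pieces."""
    return [t for t in re.split(r"[\s,]+", name.strip()) if t]


def key(token):
    """Matching key that ignores dots and case, so 'Mr', 'mr.' and 'MR' compare equal."""
    return token.replace(".", "").lower()


print(tokenize("  Smith,, John   Paul "))  # ['Smith', 'John', 'Paul']
print(key("Ph.D."))                        # phd
```

Because only whitespace and commas act as separators, a hyphenated name such as Jones-Lee stays a single token.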

So we are experimenting with a new Split Name transform that tries to handle all this. It splits name parts into a separate column for each, using various heuristics (rules of thumb):

It is clever enough to know about common honorifics (Mr, Mrs, Frau, Captain, Sir etc), suffixes (Jr, PhD etc) and particles (Da, de, Von etc).

It expects each value to be a single name in the order: honorific(s) → first name(s) → middle name(s) → last name(s) → suffix(es), separated by whitespace or commas. Some of the name parts can be missing, but they must be in this order.
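The ordering rule above suggests peeling honorifics off the front and suffixes off the end, then guessing first, middle and last from what remains. This is a minimal Python sketch under stated assumptions: the tiny word lists are placeholders rather than the transform's real lists, and `split_name` is a hypothetical function, not how Easy Data Transform actually does it.

```python
import re

# Placeholder word lists (assumed; the real transform's lists are much longer).
HONORIFICS = {"mr", "mrs", "ms", "miss", "dr", "prof", "sir", "frau", "captain"}
SUFFIXES = {"jr", "sr", "ii", "iii", "phd", "md"}
PARTICLES = {"da", "de", "del", "der", "van", "von", "la", "le"}


def key(token):
    return token.replace(".", "").lower()


def split_name(name):
    tokens = [t for t in re.split(r"[\s,]+", name.strip()) if t]
    # Honorifics come first: peel them off the front.
    i = 0
    while i < len(tokens) and key(tokens[i]) in HONORIFICS:
        i += 1
    honorific, tokens = tokens[:i], tokens[i:]
    # Suffixes come last: peel them off the end.
    j = len(tokens)
    while j > 0 and key(tokens[j - 1]) in SUFFIXES:
        j -= 1
    tokens, suffix = tokens[:j], tokens[j:]
    # The final token is the last name; particles directly before it join it,
    # but at least one token is always left for the first name.
    k = max(len(tokens) - 1, 0)
    while k > 1 and key(tokens[k - 1]) in PARTICLES:
        k -= 1
    first = tokens[:1] if k > 0 else []
    middle, last = tokens[1:k], tokens[k:]
    return {
        "honorific": " ".join(honorific),
        "first": " ".join(first),
        "middle": " ".join(middle),
        "last": " ".join(last),
        "suffix": " ".join(suffix),
    }


print(split_name("Captain Giovanni Luca Matteo De Santis"))
# {'honorific': 'Captain', 'first': 'Giovanni', 'middle': 'Luca Matteo', 'last': 'De Santis', 'suffix': ''}
```

With these rules ‘John Lee Hooker’ comes out as first John, middle Lee, last Hooker, which is exactly the kind of guess the transform has to make.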

It can’t be guaranteed to be 100% accurate. For example, it is impossible to know for sure whether the ‘Lee’ in ‘John Lee Hooker’ is a first name, middle name or last name. In such situations it will make a guess.

If values can contain multiple names (e.g. ‘Mr John Smith & Mrs Jane Smith’) you will need to split the column using Split Col, before you use Split Name.
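Outside Easy Data Transform, the same pre-step might look like this Python sketch. `split_multiple` is hypothetical, and splitting on “&” or a standalone “and” is an assumption about how such values are usually written.

```python
import re


def split_multiple(value):
    """Break a value holding several names into one name per item,
    splitting on '&' or a standalone word 'and' (case-insensitive)."""
    parts = re.split(r"\s*&\s*|\s+and\s+", value, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


print(split_multiple("Mr John Smith & Mrs Jane Smith"))
# ['Mr John Smith', 'Mrs Jane Smith']
```

Requiring whitespace around “and” keeps names such as Alexander or Andy intact.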

It would be useful to get some feedback on this new transform. You can download a snapshot release with the new transform here:

Windows installer: https://www.easydatatransform.com/downloads/EasyDataTransform_2_6_1_snapshot.exe
Windows zip: https://www.easydatatransform.com/downloads/EasyDataTransform_Windows_2_6_1_snapshot.zip
Mac DMG: https://www.easydatatransform.com/downloads/EasyDataTransform_2_6_1_snapshot.dmg

How well does it perform on your data?

Did we miss any common honorifics or suffixes?

So far I have no use for this, as my data typically doesn’t include names, and if it does, one of the first steps would be to anonymise them.

I see some challenges. For example, I personally have 3 first names: one is Olaf, the name I’m called by, and the other 2 are what you would call middle names. I think it is difficult to identify them reliably.
In Germany, most double last names are joined with a “-”, as in your Jones-Lee example. In Spain, everybody has a double last name (the first part from the father, the second from the mother), and as far as I know the two parts are just separated by a space.

I think there are a lot of different possibilities and rules. I don’t know what can be covered and what would produce incorrect results and lead to discussion.

Trying to work out what is a first, middle and last name is indeed tricky, especially across many different languages and cultures. But even if we guess wrong in some cases, I think it will be an improvement on using Split Col or doing it all manually.

Here are extra honorifics and suffixes:

Honorifics
General: Mr, Mrs, Miss, Ms, Mx, Master, Madam, Madame
Academic: Dr, Prof, Professor, Dean, Chancellor, Rector, Fellow
Professional: Rev (Reverend), Pastor, Rabbi, Father, Sister, Monsignor, Canon, Elder, Chaplain
Nobility/Royalty: Sir, Dame, Lord, Lady, Baron, Baroness, Count, Countess, Duke, Duchess, Marquess, Marchioness, Earl
Judicial: Judge, Justice, The Honourable, The Right Honourable, His/Her Honour, The Learned Judge
Military: Admiral, General, Colonel, Major, Captain, Commander, Lieutenant, Brigadier, Air Marshal, Rear Admiral
Diplomatic: Ambassador, Ambassador-at-Large, Consul, Consul General, Envoy Extraordinary, His/Her Excellency
Other Titles: Alderman, Cllr (Councillor), Senator, President, Governor, Headmaster, Headmistress, Warden, Provost

Suffixes
Academic Degrees: PhD, MD, EdD, JD, DDS, DMD, MBA, MSc, MA, BA, BSc, LLB, LLM
Generational: Jr., Sr., II, III, IV
Professional Certifications: CPA, PMP, PE, RN, CISA, CISSP, CFA, Esq. (Esquire), CMA, PA-C, L.Ac., DABFM, APRN
Honorary/Chivalric: Bt (Baronet), KBE (Knight Commander of the Order of the British Empire), OBE, CBE, MBE
Medical/Nursing: RN, APRN, DNP, PA-C, NP

There could be others, but I think above is quite enough.

Here I tried it on the following random names with honorifics and suffixes:

Names Honorific First Middle Last Suffix Missed
Mr. John Paul Alexander Smith Jr. Mr. John Paul Alexander Smith Jr.
Mrs. Ana María Isabel García López Mrs. Ana María Isabel García López
Dr. Wei Ming Jian Chen PhD Dr. Wei Ming Jian Chen PhD
Ms. Fatima Zahra El Hassan Ms. Fatima Zahra El Hassan
Prof. David Lee Andrew Kim EdD Prof. David Lee Andrew Kim EdD Academic Degrees - Suffix
Miss Chloe Grace Marie Duval Miss Chloe Grace Marie Duval
Sir James Arthur Henry St. John Bt Sir James Arthur Henry St. John Bt
Madam Nguyen Thi Thu Trang Madam Nguyen Thi Thu Trang
Rev. Samuel Kwame Boateng Mensah Rev. Samuel Kwame Boateng Mensah
Rabbi Eliyahu Moshe Ben-David Rabbi Eliyahu Moshe Ben-David
Father Michael Joseph O’Connor Father Michael Joseph O’Connor Professional - Honorific
Lady Helena Elizabeth Windsor Lady Helena Elizabeth Windsor
Lord Charles Edward Mountbatten Lord Charles Edward Mountbatten
Baroness Ingrid Astrid Johanna Svensson Baroness Ingrid Astrid Johanna Svensson
Baron Jean-Pierre Luc Moreau Baron Jean-Pierre Luc Moreau
Countess Sofia Elisabetta Lucia Rossi Countess Sofia Elisabetta Lucia Rossi
Judge Priya Lakshmi Menon Nair Judge Priya Lakshmi Menon Nair
Justice Ahmed Hassan Abdel Rahman El-Sayed Justice Ahmed Hassan Abdel Rahman El-Sayed
The Honourable Maryam Fatemeh Rezaei Azadi The Honourable Maryam Fatemeh Rezaei Azadi
General Rajiv Kumar Prasad Sharma General Rajiv Kumar Prasad Sharma
Admiral Lars Erik Johansen Berg Admiral Lars Erik Johansen Berg
Colonel Pedro Henrique da Silva Souza Colonel Pedro Henrique da Silva Souza
Major Tariq Jamil Ahmed Chaudhry Major Tariq Jamil Ahmed Chaudhry
Captain Giovanni Luca Matteo De Santis Captain Giovanni Luca Matteo De Santis
Commander Rania Mohamed Ali El-Gendy Commander Rania Mohamed Ali El-Gendy
Lt. Colonel Dmitry Sergeyevich Petrov Volkov Lt. Colonel Dmitry Sergeyevich Petrov Volkov Military - Honorific
Ambassador Maria Teresa De La Cruz Ambassador Maria Teresa De La Cruz
Consul General Carlos Eduardo José Santos Silva Consul General Carlos Eduardo José Santos Silva
His Excellency Yusuf Ibrahim Abdullahi Abubakar His Excellency Yusuf Ibrahim Abdullahi Abubakar
Her Excellency Aisha Bint Ali Al-Maktoum Her Excellency Aisha Bint Ali Al-Maktoum
Dean Elena Nikolaevna Sergeyevna Volkova Dean Elena Nikolaevna Sergeyevna Volkova
Chancellor Hans Jürgen Peter Schmidt Chancellor Hans Jürgen Peter Schmidt
President Anna-Maria Elisabeth Schneider President Anna-Maria Elisabeth Schneider
Senator Jean-Louis Pierre Dubois Senator Jean-Louis Pierre Dubois
Governor Sofia Maria Antonia Costa Governor Sofia Maria Antonia Costa Other Titles - Honorific
Headmaster Samuel Oluwafemi Adeyemi Headmaster Samuel Oluwafemi Adeyemi Other Titles - Honorific
Headmistress Clara Beatriz Silva Pereira Headmistress Clara Beatriz Silva Pereira Other Titles - Honorific
Warden Adeleke Oluwaseun Ayodele Balogun Warden Adeleke Oluwaseun Ayodele Balogun
Provost Leandro Rafael dos Santos Oliveira Provost Leandro Rafael dos Santos Oliveira
Dr. Chinonso Chinedu Obi Okafor MD Dr. Chinonso Chinedu Obi Okafor MD
Dr. Miguel Ángel José López García DDS Dr. Miguel Ángel José López García DDS Academic Degrees - Suffix
Dr. Haruto Satoshi Kenji Nakamura DMD Dr. Haruto Satoshi Kenji Nakamura DMD Academic Degrees - Suffix
Dr. Priyanka Rani Sharma Patel PharmD Dr. Priyanka Rani Sharma Patel PharmD
Dr. Sanjay Kumar Rajesh Gupta JD Dr. Sanjay Kumar Rajesh Gupta JD Academic Degrees - Suffix
Ms. Lina María Delgado Gómez MBA Ms. Lina María Delgado Gómez MBA
Mr. Paolo Andrea Giovanni Bianchi MSc Mr. Paolo Andrea Giovanni Bianchi MSc
Ms. Mei Ling Hua Zhang MA Ms. Mei Ling Hua Zhang MA
Mrs. Nandini Priya Lakshmi Reddy BA Mrs. Nandini Priya Lakshmi Reddy BA
Mr. Willem Hendrik Jan van Dijk BSc Mr. Willem Hendrik Jan van Dijk BSc
Mrs. Gabriela Lucia Elena Morales LLB Mrs. Gabriela Lucia Elena Morales LLB Academic Degrees - Suffix
Mr. Bashir Ahmed Suleiman Bello LLM Mr. Bashir Ahmed Suleiman Bello LLM Academic Degrees - Suffix
Dr. Lucas Gabriel Fernandez Torres DNP Dr. Lucas Gabriel Fernandez Torres DNP Medical/Nursing - Suffix
Dr. Ahmed Mostafa Abdel Aziz El-Masry DABFM Dr. Ahmed Mostafa Abdel Aziz El-Masry DABFM Professional Certifications - Suffix
Ms. Siti Nur Aisyah Binti Ahmad RN Ms. Siti Nur Aisyah Binti Ahmad RN
Mrs. Lillian Akosua Mensah Addo APRN Mrs. Lillian Akosua Mensah Addo APRN Professional Certifications - Suffix
Dr. Omar Farouk Abdulrahman Al-Khalifa PA-C Dr. Omar Farouk Abdulrahman Al-Khalifa PA-C Professional Certifications - Suffix
Mr. Roberto Carlos Da Silva PE Mr. Roberto Carlos Da Silva PE Professional Certifications - Suffix
Ms. Farida Zahra El-Badawi CPA Ms. Farida Zahra El-Badawi CPA Professional Certifications - Suffix
Mr. Ryoichi Takashi Masaru Sakamoto CFA Mr. Ryoichi Takashi Masaru Sakamoto CFA Professional Certifications - Suffix
Ms. Kim Eun Ji Soo PMP Ms. Kim Eun Ji Soo PMP Professional Certifications - Suffix
Mr. Kwame Nkrumah Mensah Boateng CISA Mr. Kwame Nkrumah Mensah Boateng CISA Professional Certifications - Suffix
Ms. Zainab Fatima Yusuf Bello CISSP Ms. Zainab Fatima Yusuf Bello CISSP Professional Certifications - Suffix
Mrs. Eva Katarzyna Anna Nowak CMA Mrs. Eva Katarzyna Anna Nowak CMA Professional Certifications - Suffix
Mr. Jean-Claude François Morel Esq. Mr. Jean-Claude François Morel Esq.
Dr. Chinedu Emeka Nwachukwu Okeke L.Ac. Dr. Chinedu Emeka Nwachukwu Okeke L.Ac. Professional Certifications - Suffix
Dr. Yasmin Leila Nasrallah Saad DABFM Dr. Yasmin Leila Nasrallah Saad DABFM Professional Certifications - Suffix
Miss Grace Abigail Hope Thompson Miss Grace Abigail Hope Thompson First

As you pointed out, it is hard to decide where a compound name belongs: does it go with the first name or the middle name? Similarly, does it go with the middle name or the last name? And what if there are compound names in all three: first, middle and last?

Ahem, I’d want my BEng (Hons) :man_student:

Is that normally written with brackets around the ‘Hons’?

@Anonymous
Thanks for the list. I had about 90% of them. I have added some of the missing ones.

Names like ‘Grace’ and ‘Dean’ can be either a first name or an honorific, which is tricky.
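One possible rule of thumb for the Grace/Dean problem is sketched below in Python. It is an assumption, not what Split Name does: an ambiguous word is only treated as an honorific when it is the very first token and at least two more tokens follow it. The word lists and `count_honorifics` are hypothetical.

```python
# Illustrative word lists (assumptions, not the transform's actual lists).
HONORIFICS = {"mr", "mrs", "ms", "miss", "dr", "prof", "dean", "grace"}
AMBIGUOUS = {"dean", "grace"}  # words that can also be first names


def key(token):
    return token.replace(".", "").lower()


def count_honorifics(tokens):
    """Return how many leading tokens to treat as honorifics."""
    n = 0
    while n < len(tokens) and key(tokens[n]) in HONORIFICS:
        # Hypothetical rule: an ambiguous word only counts as an honorific
        # when it is the first token and at least two more tokens follow it.
        if key(tokens[n]) in AMBIGUOUS and (n > 0 or len(tokens) - n - 1 < 2):
            break
        n += 1
    return n


print(count_honorifics("Miss Grace Abigail Hope Thompson".split()))  # 1
print(count_honorifics("Dean Elena Nikolaevna Volkova".split()))     # 1
print(count_honorifics("Dean Martin".split()))                       # 0
```

It is still only a guess: ‘Grace Abigail Hope Thompson’ without the ‘Miss’ would have ‘Grace’ taken as an honorific.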

If we can split 90%+ of the names correctly, we are probably doing quite well.

Yes. Honours degree:
"When a candidate is awarded a degree with honours, “(Hons)” may be suffixed to their designatory letters – e.g. BA (Hons), BSc (Hons), BMus (Hons), MA (Hons).[15] An MA (Hons) would generally indicate a degree award from certain Scottish universities (cf. Scottish MA) and is at the same level as a bachelor’s degree."
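Since “(Hons)” is written as a separate bracketed word, a whitespace split would make it look like an extra name part. A minimal Python sketch of one way around that follows; `tokenize_keeping_qualifiers` is hypothetical, and gluing any bracketed word onto the token before it is an assumption that a suffix lookup would then need to match on the part before the bracket.

```python
import re


def tokenize_keeping_qualifiers(name):
    """Split on whitespace/commas, but glue a bracketed qualifier such as
    '(Hons)' onto the preceding token, so 'BEng (Hons)' stays one suffix."""
    tokens = []
    for t in re.split(r"[\s,]+", name.strip()):
        if not t:
            continue
        if tokens and t.startswith("(") and t.endswith(")"):
            tokens[-1] += " " + t
        else:
            tokens.append(t)
    return tokens


print(tokenize_keeping_qualifiers("Mr Andrew Smith BEng (Hons)"))
# ['Mr', 'Andrew', 'Smith', 'BEng (Hons)']
```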

Maybe there should be an option to add your own terms somehow,
because there will be a need to localise these terms.
For example, Mr. is Hr. in Danish.

I would assume the above two are Spanish, in which case the last names are:
Santos Silva and
López García

That is what I wrote in my paragraph at the end: it is hard to split compound names, because you can’t tell whether they go with the first, middle or last name. You can see in the experiment that it mostly assigned compound names to the middle name, in some cases took 3 words as the middle name, and took the suffix as the last name.

Yeah, you may be fighting an unwinnable war when you extend this to international names, unless there’s a large degree of configurability.

Look no further than Spanish and Portuguese names: in Portugal, the main surname is the last one (or the last ones, as in my own case: whatever comes from your father), but in Spain, your main surname (your father’s) is in the middle. Would you semantically call that “middle”?

BTW, other than “General” (which would be “Geral”), almost everything about Carlos Eduardo José Santos Silva sounds more Portuguese than Spanish. :slight_smile:

We did think about that. It would be easy enough to allow a comma separated list of custom honorifics and suffixes.
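Parsing such a comma-separated list could be as simple as the Python sketch below. It assumes, hypothetically, that user terms are normalised the same way as the built-in ones (dots removed, lower case) and merged with a default list; `DEFAULT_HONORIFICS` and `parse_terms` are illustrative names, and the Danish terms are just examples.

```python
# Tiny illustrative default list (assumption).
DEFAULT_HONORIFICS = {"mr", "mrs", "ms", "dr"}


def parse_terms(text):
    """Turn a user-entered list like 'Hr., Fru, Frk.' into normalised keys
    (dots removed, lower case), ignoring blank entries."""
    return {t.strip().replace(".", "").lower() for t in text.split(",") if t.strip()}


honorifics = DEFAULT_HONORIFICS | parse_terms("Hr., Fru, Frk.")
print(sorted(honorifics))
# ['dr', 'frk', 'fru', 'hr', 'mr', 'mrs', 'ms']
```

Single-token matching like this would not handle multi-word titles such as Air Commodore; those would need a phrase match.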

Commodore
Air Commodore

What of Asian languages where the name is usually last-first except in English-speaking countries where they often adapt to our first-last? Could be Xing Ming or Ming Xing. Sometimes it is clarified by capitalising the surname and not the first, so XING Ming.
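Where the surname is marked by capitals, a prior step could detect it. The Python sketch below is a hypothetical heuristic, not a feature of the transform: if exactly one multi-letter token is all upper case, treat it as the surname wherever it appears. It assumes honorifics and suffixes have already been removed, since suffixes such as MD or II are also all capitals.

```python
def find_capitalised_surname(tokens):
    """Return the index of the single all-capitals token (e.g. 'XING' in
    'XING Ming'), or None if there isn't exactly one. Single letters are
    ignored so that initials are not mistaken for surnames."""
    caps = [i for i, t in enumerate(tokens) if len(t) > 1 and t.isupper()]
    if len(caps) == 1 and len(tokens) > 1:
        return caps[0]
    return None


print(find_capitalised_surname(["XING", "Ming"]))  # 0
print(find_capitalised_surname(["Ming", "XING"]))  # 1
print(find_capitalised_surname(["Xing", "Ming"]))  # None
```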

Perhaps some of these can be left to prior or subsequent transforms based on user knowledge of the input.

Edit to add: Academic may include Dip. and Grad. Dip.

Something like the following.

The user would know what honorifics or suffixes are present in their data and could simply add them, so there would be no need to maintain a long list behind the scenes in the program.

Now the only thing left to figure out is how to handle compound first, middle and last names, as different cultures have different naming styles.

Is the screenshot from the snapshot release? I haven’t had time to download it yet.
I ask because this seems to be just what I need for localised versions of the honorifics.

It is a screenshot of the Split Name transform with added placeholders, based on your statement “…Maybe there should be an option add your own terms somehow…”

It’s my interpretation of how it would look.

Already got those.

We are concentrating on Western names. Not out of xenophobia, but simply to keep things manageable. Also we don’t have many customers in Asian countries.

OK, thanks.

Yes, that is what we were considering. Currently, transforms don’t store any state. So these fields would start off blank each time (they wouldn’t remember the last values set).

Even people can’t do that reliably, so I don’t think we have any chance of doing it algorithmically in 100% of cases. Possibly we could include an option for whether to expect 1 or 2 last names (to cover Spanish names).
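A “1 or 2 last names” option could work roughly like the Python sketch below. `split_core` and its `last_name_count` parameter are hypothetical; the sketch assumes honorifics and suffixes have already been removed and always keeps at least one token for the first name.

```python
def split_core(tokens, last_name_count=1):
    """Split name tokens into (first, middle, last), taking up to
    `last_name_count` trailing tokens as the last name."""
    if len(tokens) <= 1:
        return "", "", " ".join(tokens)
    # Take at least 1 and at most len(tokens) - 1 tokens as the last name.
    n = max(1, min(last_name_count, len(tokens) - 1))
    first, middle, last = tokens[0], tokens[1:-n], tokens[-n:]
    return first, " ".join(middle), " ".join(last)


print(split_core("Miguel Ángel José López García".split(), last_name_count=2))
# ('Miguel', 'Ángel José', 'López García')
```

With the default of 1, the same name would give ‘García’ as the last name and ‘Ángel José López’ as the middle names, which is the problem described above.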

Well, it is no different from any other transform: when first used, the fields are blank, but once used in a solution, they retain whatever was entered. Similarly, this could start empty, the person using it would enter what is needed, and if it is left blank, it could fall back to the EDT defaults (or whatever they end up being called).
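The fallback described above could work roughly like this Python sketch. `active_terms` and `DEFAULT_SUFFIXES` are hypothetical, and whether a non-blank field should replace the defaults (as here) or be merged with them is a design choice the thread leaves open.

```python
# Tiny illustrative default list (assumption).
DEFAULT_SUFFIXES = {"jr", "sr", "phd", "md"}


def active_terms(user_text, defaults):
    """Use the user's comma-separated terms if any were entered;
    if the field is blank, fall back to the built-in defaults."""
    terms = {t.strip().replace(".", "").lower() for t in user_text.split(",") if t.strip()}
    return terms if terms else set(defaults)


print(sorted(active_terms("BEng, MEng", DEFAULT_SUFFIXES)))  # ['beng', 'meng']
print(sorted(active_terms("", DEFAULT_SUFFIXES)))            # ['jr', 'md', 'phd', 'sr']
```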