Reduce path lengths by rules?

Hi, new user here…

I have a .txt file with paths (NTFS on Windows). They are all at least 250 characters long, most of them have over 260 characters.

I want to shorten them so the all fit into a 259 char limit.

Lines (about 16.000) look like e.g.:

\\home.local\shares\01 Leadership\01 Geschäftsleitung\03 AF-Projekte\GA001 Organisationsüberprüfung 2025\10 Arbeitsdokumente\4 Organisationshandbuch\02 Stellenbeschreibungen\2012 Arbeitsdokumente Leiter Bereich Immobilien Landwirtschafts- und Forstliegenschaften.docx

I can import this file with the custom delimiter “\” and get ~24 columns out of it.

My question is: Can EDT be used to set up rules that would help to shorten paths and add a column that counts the number of characters over each row ?

Rule 1: The last column which contains data in each row should be checked if its content contains a word that was used in a higher hierarchy level and if found the word is removed in the last column with data.

E.g. the filename here contains “Arbeitsdokumente” and 3 levels higher the cell contains this word as well which should lead to a changed cell entry of this name:

2012 Leiter Bereich Immobilien Landwirtschafts- und Forstliegenschaften.docx

If this rule doesn’t lead to a row with a maximum number of chars <= 259, the filename should be truncated by removing a middle part like:

2012 Leiter Berei ... liegenschaften.docx

Just as an example…

Is such “advanced” stuff possible with EDT and if yes, can someone give me a hint how something like this can be set up?

Thanks!

Counting the number of characters is easy. Just use a Calculate transform.

You could then use that with Filter to remove rows <= 259 chars. Or use Verify to trigger a warning or error.

Implemented your shortening rules is a lot more challenging. I’m not sure if any of the existing transforms will help you. You might have to use the Javascript transform and write some Javascript.

Thanks for the hints!

A “Calculate” won’t help in this case, due to the custom delimiter “\” I get a lot of columns, not only 1 or 2.

It works by using Javascript though.

E.g.:

return String($(1)).length + String($(2)).length + ... + String($(22)).length;

I don’t know why I can’t do this within a loop but anyway…

let totalLength = 0;

for (let i = 1; i <= 22; i++) {
totalLength += String($(i)).length;
}
return totalLength;

Will lead to a:

ReferenceError: $ is not defined

error…

$(i) is substituted once at the start with a simple string substitution (with the value of the column named ‘i’). So it isn’t going to do what you want.

There are other simpler ways you can count the total non delimiters characters though. Read it in as plain text. Replace the \ characters with nothing using Replace and then use Calculate to calculate the length.

length.transform (3.0 KB)

One way you could do it in Javascript is:

-read the text into a single column as plain text.

-break it into tokens using String.split() and regex

-read the tokens into a dictionary of text tokens + count

-remove a token that occurs more than once

-repeat as necessary

I had offline communication with @highend on the challenge and a form of a prototype to solve it was created.

We were able to extract elements for potential “shortening” in own columns and than a replace with $(i) is possible:


here the text portions are just removed.

First time I used replace in this manner using entries in other columns, so learned something new.

1 Like

My understanding was that @highend needs to remove parts of strings based on other parts of strings, which is something the native transforms don’t really handle. Hence the suggestion of Javascript. It might be useful to see a simple an example with several rows of input and the corresponding output desired.

Hi,

sorry for the late reply and thanks @Olaf for showing me what can be done with EDT in the Teams call!

Regarding JavaScript and removals via regular expressions:

\\home.local\shares\01 Führung\03 Geschäfte FFKo\01 Vorträge\F+F 2017\F+F 2018\2041 - Vernehmlassungen Masterplan Veloinfrastruktur_Parkierungskonzept\Beilage 2_Vernehmlassung Parkierungskonzept.docx

\\home.local\shares\01 Führung\03 Geschäfte FFKo\01 Vorträge\F+F 2017\F+F 2021\2161 Turi, Kauf Korbgasse 99-102\Grundlagen für Vortrag\90 Kaufangebot\mdl. Nr. 90 Kaufangebot Korbgasse 99-102 Turi.pdf

\\home.local\shares\01 Führung\03 Geschäfte FFKo\01 Vorträge\F+F 2017\F+F 2024\3515 Kostenbeteiligung und Erlass Umsatzmiete Vierzetteli\3515_Kostenbeteiligung und Erlass Umsatzmiete Vierzetteli.docx
01.
Current filename: Beilage 2_Vernehmlassung Parkierungskonzept.docx
First level above: 2041 - Vernehmlassungen Masterplan Veloinfrastruktur_Parkierungskonzept
Expected output: Beilage 2.docx

This could only be done if a regex is built with singular / plural options, in this case for the german language.
Because "Vernehmlassung" != "Vernehmlassungen"
02.
Current filename: mdl. Nr. 90 Kaufangebot Korbgasse 99-102 Turi.pdf
First level above: 90 Kaufangebot
Second level above: Grundlagen für Vortrag
Third level above: 2161 Turi, Kauf Korbgasse 99-102
Expected output: mdl. Nr..pdf

"90 Kaufangebot" is present in the first level above, "Korbgasse 99-102" and "Turi" in the third level above

The final file name isn't pretty, I know, but I'm posting "easy" things here and we'll have to die a few deaths to trim down the total length on more complicated paths
03.
Current filename: 3515_Kostenbeteiligung und Erlass Umsatzmiete Vierzetteli.docx
First level above: 3515 Kostenbeteiligung und Erlass Umsatzmiete 
Expected output: Doc.docx

Everything from the file name is present in the first level above. Partially with different separators ("_" instead of only spaces).
This would lead to a completely empty file name!

The common sense for such cases is to set it to e.g. just "Doc" (for "Document")

Best regards!

This is text processing with complex rules and requiring quite a lot of domain knowledge. Which is not what Easy Data Transform is really designed for. You can probably do a lot of what you want in a Javascript transform, but it isn’t something we can really help with. Sorry.

No problem! More simple cases with no singular / plural differences can be solved (without JavaScript at all) with what Olaf showed me.

1 Like