Detecting ragged TSV files

Admin · July 9, 2021, 12:36pm

We were asked whether it is possible to detect tab-separated files that have a different number of values per row (e.g. 10 values in row 1, 9 values in row 2). This is not an error as such, but it does indicate that data values may be missing. After a bit of thought: yes, it is possible. The dataset is loaded as plain text (preserving tabs), and then a JavaScript transform is used to count the number of tabs on each line. The Scale transform is used to show how the number of tabs in each row varies from the maximum. Then a Filter transform is used to list the rows with fewer tabs than the maximum.

See attached.

check-ragged.transform (3.7 KB)
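
The attached transform is not reproduced here. As a rough standalone sketch of the same idea (count the tabs on each line, find the maximum, list the rows that fall short), something like the following plain JavaScript could be used outside the application. The function names `countTabs` and `findRaggedRows` and the sample data are hypothetical, made up for illustration; this is not the code inside the .transform file and it does not use the application's JavaScript transform syntax.

```javascript
// Hypothetical helper: count the tab characters on one line of text.
function countTabs(line) {
  return line.split("\t").length - 1;
}

// Hypothetical helper: report rows with fewer tabs than the maximum found.
function findRaggedRows(text) {
  // Split on \n or \r\n and drop the empty line left by a final newline.
  const lines = text.split(/\r?\n/);
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }

  const counts = lines.map(countTabs);
  const maxTabs = counts.reduce((max, c) => Math.max(max, c), 0);

  // Keep only the rows that fall short of the maximum (1-based row numbers).
  const ragged = [];
  counts.forEach((tabs, i) => {
    if (tabs < maxTabs) {
      ragged.push({ row: i + 1, tabs: tabs, missing: maxTabs - tabs });
    }
  });
  return { maxTabs: maxTabs, ragged: ragged };
}

// Made-up sample: row 3 has only two values instead of three.
const sample = "a\tb\tc\n1\t2\t3\n4\t5\n6\t7\t8\n";
console.log(findRaggedRows(sample));
```

Like the approach described above, this assumes the longest row is the correct one and that values do not themselves contain tabs.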

[Screenshots: an example, and the output it produces.]

Admin · August 23, 2021, 9:54am

Note that checking for ragged TSV and CSV files is supported from v1.20.1:

https://www.easydatatransform.com/easydatatransform_v1k1.html

Amontillado · August 24, 2021, 1:06pm

The earlier JavaScript solution is still worth reviewing, even if it’s no longer needed in current versions. Looking at a different solution is always good food for thought.