Detecting ragged TSV files

We were asked whether it is possible to detect tab-separated files that have a different number of values per row (e.g. 10 values in row 1, 9 values in row 2). This is not an error as such, but it does indicate that there may be missing data values. After a bit of thought: yes, it is possible. The dataset is loaded as plain text (preserving tabs), and then a JavaScript transform is used to count the number of tabs on each line. The Scale transform is used to show how each line's tab count compares with the maximum. Then Filter is used to list those rows with fewer tabs than the maximum.
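
For anyone who wants to try the tab-counting idea outside the tool, here is a minimal sketch in standalone JavaScript. It is not the contents of the attached file. The function name `findRaggedRows` and the sample data are made up for illustration, and it assumes no field contains an embedded tab or line break.

```javascript
// Hypothetical helper (not from the attached .transform file):
// finds rows whose tab count is below the maximum seen in the text.
function findRaggedRows(text) {
  // Split on LF or CRLF; drop the empty last element left by a final newline.
  const lines = text.split(/\r?\n/);
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }

  // Count tab characters on each line (values per row = tabs + 1).
  const tabCounts = lines.map((line) => line.split("\t").length - 1);
  const maxTabs = tabCounts.reduce((max, n) => Math.max(max, n), 0);

  // Collect 1-based line numbers of rows that fall short of the maximum.
  const ragged = [];
  tabCounts.forEach((tabs, i) => {
    if (tabs < maxTabs) {
      ragged.push({ line: i + 1, tabs: tabs, missing: maxTabs - tabs });
    }
  });
  return { maxTabs: maxTabs, ragged: ragged };
}

// Made-up sample: the third line has one value fewer than the others.
const sample = "a\tb\tc\n1\t2\t3\n4\t5\n6\t7\t8\n";
console.log(findRaggedRows(sample));
```

Note that a blank line in the middle of the file counts as zero tabs, so it would also be reported as ragged.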

See attached.

check-ragged.transform (3.7 KB)

For example:

image

Outputs:
