Data verification

We’ve been working hard on the new data verification feature. it is very much a work in progress, but you can see the basics here.

Here are the dataset rules:

Here are the rules for text columns:

Here are the rules for selection (categorical) columns:

Here are the rules for numerical (real) columns:

Here are the rules for numerical (integer) columns:

Here are the rules for date columns:

You check each rule you want to verify. Some rules are mutually exclusive and will disappear when you check another rule.

Each verification failure can:

  • generate an information message; or
  • generate a warning message; or
  • generate a warning message and stop further processing

You will also be able to see fails color-coded in the data table.

2 Likes

This show verification fails highlighted in the data table. The fails are color-coded by the severity of each each fail. You can hover over a data value to find out what verification rule it failed.

2 Likes

EAN13, UPC-A and non-printable characters have verification rules.

2 Likes

IBAN and ISBN are pretty easily verified afaik. I am not that sure about the new UDI-DI (medical products in CE area). But EAN/gtin would be cool to have verified, too

Best,
Jochen

Can you attach a link to a spec for these? Or, better still, some code or an algorithm of how to verify them.

IBAN:
The IBAN check digit consists of two digits in positions 3 and 4 of the IBAN.
It is calculated using the MOD97 algorithm and provides the primary integrity check for the IBAN standard.
Supported for all 116 countries.
IBAN Checker: Validate & Check IBAN Number for Errors.

ISBN:
IBAN Checker: Validate & Check IBAN Number for Errors.

EAN/GTIN:
several lengths. Perhaps you have to cut of the last digit, calculate the new full number and compare for differences…

UDI:
seem a bit more complex… Unique Device Identification (UDI) - Healthcare | GS1
I was not able to find a verification procedure

1 Like

Thanks. You have linked to the IBAN checker twice. Do you have a link for an ISBN checker?

sorry. Here is the link to an ISBN checker: https://www.instructables.com/How-to-verify-a-ISBN/
or a PDF: https://www.csuohio.edu/sites/default/files/88-2015.pdf
O’Reiley has a REGex: 4.13. Validate ISBNs - Regular Expressions Cookbook, 2nd Edition [Book].

1 Like

Thanks. Currently concentrating on releasing the beta, but will look into these.

How often do you need to verify ISBN, IBAN and GTIN codes?

GTIN pretty often, the others from time to time. Usually I have to go without validation, although data contains many errors. Those validations would be very useful for me

Which lengths of GTIN do you use?

mostly gtin-13, sometimes gtin-8. gtin-12 is also possible. Gtin-128 now gets more important but I had not yet a case where I would have wanted to validate it

1 Like

If you don’t mind, can you provide some test data? and how you would like it to be checked?, I will give it a shot to come up with the transform to check it.

You can now download the v2 beta to try the data verification:

1 Like

Here is an example of an article list containing gtin. I don’t know if they are correct or not.

AngebotEnergizer.csv (1.3 KB)

Best
Jochen

Hi,

works fine. Takes time, but that’s fine for me…

Working with the Transform is a bit tedious because the update throws me up to the beginning of the pane and reopens all the columns I closed. I can understand why this is setup hierarchical, but it is a bit difficult to work with.
I would not agree that gtin is an Integer, first because I am unsure if leading zeros are possible, second because this would be so large numbers that Excel always displays them in exponential writing.
It iritates me a bit that when I switch a column to Integer, that this starts an update because one rule is already active…

Best,
Jochen

It is potentially doing a lot of checks. I don’t think there is much room for big speed-ups.

We will look into that.

It is just digits 0-9, so that makes it an integer in my book, regardless of how Excel treats them.

This is because the rules checked change. You can avoid this by unchecking Run>Auto run.

It is just digits 0-9, so that makes it an integer in my book, regardless of how Excel treats them.

Correct, but any leading zeros will be lost by converting this to an int.
On a practical side, I always work with any article number or gtin as String. In this case, for me an integer is something I want to calculate with. I never ever do this with gtin so I keep/convert them as/to text