Everything is stored as a string (text) in Easy Data Transform. Some transforms temporarily convert strings to integers (or reals or dates) for the purposes of that transform, but the result is then stored as text again. This isn’t as fast as storing a column as integers, but it is much more flexible and less hassle (as you don’t have to go through and define the type of every column).
Setting the column type to ‘Integer’ only changes which verification rules are available for that column. It doesn’t change how that column is stored. So you shouldn’t lose any leading zeros, unless perhaps you output to Excel and explicitly set the column format to integer.
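To illustrate the “store as text, convert temporarily” idea, here is a minimal Python sketch. It is not Easy Data Transform’s actual code, and `numeric_sort` is a hypothetical name. The column stays a list of strings; integers only exist as temporary sort keys, so leading zeros survive.

```python
# Minimal sketch (not Easy Data Transform's code): cells are always strings.
# A transform converts to int only while it works, then returns the text.

def numeric_sort(column: list[str]) -> list[str]:
    """Sort a text column numerically without changing the stored text."""
    def key(cell: str):
        try:
            return (0, int(cell))   # temporary integer view, used for comparison only
        except ValueError:
            return (1, cell)        # non-integer text sorts after the numbers
    return sorted(column, key=key)  # the original strings come back unchanged

print(numeric_sort(["010", "9", "0007", "n/a"]))  # ['0007', '9', '010', 'n/a']
```

Note that "0007" and "010" keep their leading zeros, because only the sort key was ever an integer.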
If you set the column type to integer, wouldn’t you usually want to check that all the values are integers? So we turn that rule on by default for Numeric (integer) columns.
The only 3 rules that are checked by default are:
Integer except listed special values for Numeric (integer) columns
Numeric except listed special values for Numeric (real) columns
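A rough Python sketch of what the two default rules above might check. The function names, the `DEFAULT_RULE` mapping and the example special value “N/A” are assumptions for illustration, not the product’s implementation.

```python
# Hypothetical sketch of the two per-type rules listed above; the logic and
# the example special value "N/A" are assumptions, not the product's code.

def is_integer_text(cell: str) -> bool:
    """True if the text parses as a whole number (leading zeros allowed)."""
    try:
        int(cell)
        return True
    except ValueError:
        return False

def is_real_text(cell: str) -> bool:
    """True if the text parses as a real number."""
    try:
        float(cell)
        return True
    except ValueError:
        return False

def failures(column, check, special_values=()):
    """Return (row, value) pairs that fail `check` and aren't special values."""
    return [(row, cell) for row, cell in enumerate(column)
            if cell not in special_values and not check(cell)]

# Rule switched on by default for each column type (as in the list above).
DEFAULT_RULE = {
    "Numeric (integer)": is_integer_text,
    "Numeric (real)": is_real_text,
}

quantities = ["12", "N/A", "3.5", "007"]
print(failures(quantities, DEFAULT_RULE["Numeric (integer)"], {"N/A"}))
# [(2, '3.5')]
```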
I think it would be useful to have a default set of Verify checks based on column name. So, for example:
If a column is called ‘Purchase Date’, then set the Verification column type to Date and check the rules ‘Valid date in format’ and ‘No empty values’.
If a column is called ‘barcode’, then set the Verification column type to Numeric (integer) and check ‘Is valid EAN13’.
They would just be the default rules for each column when you add a new Verify, and you could change them. It would be a sort of simple data dictionary, but it won’t be in v2.0!
You would be able to define your own column names, and it would be optional.
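A minimal sketch of how such a name-based lookup could work, assuming a user-defined mapping from lower-cased column names to a column type and a list of rules. `VerifyDefaults`, `DICTIONARY`, `defaults_for` and the “Text” fallback type are hypothetical; this doesn’t describe any existing feature.

```python
# Hypothetical sketch of name-based default Verify rules (a tiny "data
# dictionary"). Not a v2.0 feature; all names here are illustrative.
from dataclasses import dataclass, field

@dataclass
class VerifyDefaults:
    column_type: str
    rules: list[str] = field(default_factory=list)

# Optional, user-defined: lower-cased column name -> default type and rules.
DICTIONARY = {
    "purchase date": VerifyDefaults("Date", ["Valid date in format", "No empty values"]),
    "barcode": VerifyDefaults("Numeric (integer)", ["Is valid EAN13"]),
}

def defaults_for(column_name: str) -> VerifyDefaults:
    """Look up defaults by case-insensitive column name."""
    entry = DICTIONARY.get(column_name.strip().lower())
    if entry is None:
        return VerifyDefaults("Text")  # no entry: assumed fallback, no rules checked
    # Return a copy so changing one Verify's rules doesn't edit the dictionary.
    return VerifyDefaults(entry.column_type, list(entry.rules))

print(defaults_for("Purchase Date").rules)  # ['Valid date in format', 'No empty values']
print(defaults_for("Comments"))             # VerifyDefaults(column_type='Text', rules=[])
```

Returning a copy matches the idea that these are only starting defaults you can change per Verify.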
If you get your files mostly from a single source, I think this feature would be quite useful. But if every dataset you get has different column names, it won’t help.
mhmm… when I test validating some EAN13 codes, I instantly get warnings from the automatic check “Integer except listed special…”. The EAN validation then works like a charm…
see… it’s (formatted) text. You don’t want to do math with it, either.
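To show why it helps to treat an EAN13 as text, here is a minimal sketch of the standard EAN-13 check-digit test (weights 1 and 3 alternating over the first 12 digits). Removing spaces and hyphens first is an assumption about how codes might be formatted, and `is_valid_ean13` is a hypothetical name.

```python
# Minimal sketch: validate an EAN-13 as *text*, not as an integer.
# Stripping spaces/hyphens is an assumption about how codes are formatted.

def is_valid_ean13(cell: str) -> bool:
    digits = cell.replace(" ", "").replace("-", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits, left to right.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    check = (10 - total % 10) % 10
    return check == int(digits[12])

print(is_valid_ean13("4006381333931"))    # True
print(is_valid_ean13("4 006381 333931"))  # True: formatted text, still a valid code
print(is_valid_ean13("0036000291452"))    # True: the leading zeros are part of the code
print(is_valid_ean13("4006381333932"))    # False: wrong check digit
print(int("0036000291452"))               # 36000291452: as an integer, the zeros are lost
```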
That’s already true… and it’s “only” the first beta… it will only get worse over time. And that many rules for every column in a dataset… massive configuration. The only help I can think of would be a text search filter like the one you came up with for column headers…
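A sketch of the kind of search filter suggested here, assuming each rule name is matched case-insensitively against every word typed. The rule list only contains names mentioned in this thread, and `filter_rules` is a hypothetical name.

```python
# Hypothetical sketch of a text search filter over Verify rule names.
# RULES holds only the few rule names mentioned in this thread.

RULES = [
    "Integer except listed special values",
    "Numeric except listed special values",
    "Valid date in format",
    "No empty values",
    "Is valid EAN13",
]

def filter_rules(rules: list[str], query: str) -> list[str]:
    """Keep rules whose name contains every word of the query (case-insensitive)."""
    words = query.lower().split()
    return [r for r in rules if all(w in r.lower() for w in words)]

print(filter_rules(RULES, "ean"))             # ['Is valid EAN13']
print(filter_rules(RULES, "except numeric"))  # ['Numeric except listed special values']
```

An empty query matches every rule, so the full list shows until you start typing.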
As it isn’t clear whether EAN13 and UPC-A codes are integers or barcodes, I am going to list them under both (as we do for telephone numbers). That seems the best compromise for various reasons.
Also, @joker persuaded me that no rules should be checked by default.