How to validate / verify that the CSV template did not change (column names)

I am wondering if Easy Data Transform could help me verify that a CSV file's column headers are what they are expected to be. I download a CSV file regularly, and the company producing it is notorious for making unannounced changes to the file, with no warning either in advance or after the fact.

I am less concerned about columns moving around and more concerned about column (field) names changing. If a column name changes, its values will likely change as well, so without any warning my import would bring in the CSV without importing the changed columns properly. I use Airtable to import the CSVs. The Airtable import works well as long as the CSVs stay the same. If a column moves around, that's not an issue, but if a column header name changes, the import does not stop: it continues, simply skips the columns that are missing, and does not import any columns whose names changed.

I don't think I can expect Airtable to add any CSV checking process, because that is really not Airtable's problem. They are a database company, so a CSV import tool is all they would provide. Making sure the CSV is correct would need to happen before that step.

So I am thinking Easy Data Transform could be a great tool for this, since I can see it already does so many things.

I looked at the JavaScript option, but I don't know whether it could do this, or whether there is something else I should take a look at.

From what I read, I might take my original CSV template (empty) file and compare it against the newly downloaded CSV (after removing the data), using e.g. the Araxis Merge app to compare the files. Or I could use EDT to compare the column names, but that seems to require quite a long JavaScript statement.

I would be happy if I could somehow create a CRC value (or similar) from all the column names in the CSV file (even counting the total number of characters would suffice) and then compare it against a control number (the value I know my original template file produces).
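For what it's worth, the CRC idea is simple to do outside EDT as well. Here is a minimal sketch in Python (the filenames are just examples, not anything from your setup); it hashes the sorted column names, so reordering columns leaves the checksum unchanged, but any rename, addition, or removal alters it:

```python
import csv
import zlib

def header_checksum(path):
    """CRC32 over the sorted, trimmed column names of a CSV file.
    Reordering columns leaves the value unchanged; any rename,
    addition, or removal changes it."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    joined = ",".join(sorted(h.strip() for h in header))
    return zlib.crc32(joined.encode("utf-8"))

# Compare a new download against the known-good template, e.g.:
# if header_checksum("new_download.csv") != header_checksum("template.csv"):
#     print("Column headers changed - do not import!")
```

Sorting before hashing matches your situation: moved columns are fine for Airtable, so only name changes should trip the check.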

I am not a programmer, so I am pretty much improvising here. Maybe the solution to what I am looking for is quite simple, which would be great. I am all ears.

I see this page has a description of a hash check, but it seems only capable of handling values, not the column names?


You can use Stack to check for columns that have name changes, spelling issues, or new columns.

For Example:

These are your original columns.

Now in the new CSV:
col 1,cola,colb,col2,col3,col4,col5,col6

Transform file
MyTransform.transform (2.1 KB)


Forgot to mention: the output of Stack will show an original column as blank if it is not in the new CSV; otherwise it will show the values under that column. Also, any new columns (or old ones with new spelling, extra spaces, or whatever) will show after the last original column. This way you will know which columns are new and can then make adjustments as necessary.

As a suggestion, you can keep just the column headers of the original columns, and for the new CSV use the Sample transform to get only one or a few records, then use the Stack transform. Whichever original columns have values are unchanged; whichever are blank are either not present or have a different name or spelling, which you will find after the original columns.
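The check the Stack approach performs can also be expressed in a few lines of script, if that helps clarify what is going on. This is only a sketch (file paths are examples): it reports which expected columns are missing from the new CSV, and which columns are new or renamed, i.e. the same information Stack surfaces as blank columns and trailing extra columns:

```python
import csv

def compare_headers(template_path, new_path):
    """Compare the header rows of two CSV files by column name.
    Returns (missing, extra): expected columns absent from the new
    file, and new/renamed columns not in the template."""
    with open(template_path, newline="") as f:
        expected = next(csv.reader(f))
    with open(new_path, newline="") as f:
        actual = next(csv.reader(f))
    missing = [c for c in expected if c not in actual]  # blank after Stack
    extra = [c for c in actual if c not in expected]    # trail after Stack
    return missing, extra
```

For the example above, comparing against original headers `col1,...,col6`, a renamed `col1` would appear in both lists (as missing `col1` and extra `col 1`).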


As @Anonymous says, you can do a Stack transform with 'Align columns by' set to 'Header name'. If the column names change, you will then get additional columns.

We have some ideas for handling this better in v2 without using Stack: storing the expected schema (column names and order) in an input, then rearranging columns or raising an error if the schema doesn't match the expected one. Perhaps we will also add an option to send some form of email or notification if things go wrong. Watch this space.

See also:

Thanks, that is good. I will check out the Stack option; that could work fine for me right now.


I took this a bit further. I separated it into two flows: a verification/control flow and the main CSV flow. I filtered the resulting stacked table, transposed it, then filtered again, so now I only have rows showing pairs of headers that do not match. I also added a conditional column that fills in a larger block of text if there is a mismatch.

Now there are two output files: one holds the checking results and the other is the normally processed file. The control checking-results file is only about 7 bytes if it is empty; if there are mismatches, the file grows to around 500 bytes or more.

It's a small visual trick: I don't need to inspect the flow itself, I can just look at the file size of the control file.

It's a crazy workaround but pretty nifty. Saves time.
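That file-size trick could even be automated with a tiny script run after the flow. A sketch, assuming the control file is written to something like `check_results.csv` (the name and the 50-byte threshold are illustrative, not from the flow above):

```python
import os

def headers_ok(control_path, threshold=50):
    """Treat a control file smaller than `threshold` bytes as 'no
    mismatches found' - a near-empty file means all headers matched."""
    return os.path.getsize(control_path) < threshold

# Example: warn before importing into Airtable if the control file is large.
# if not headers_ok("check_results.csv"):
#     print("Header mismatches detected - review before importing!")
```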


Thanks for sharing that. We will be looking into making this sort of ‘schema validation’ much easier in v2.

We will be looking into making this sort of ‘schema validation’ much easier in v2

My company would benefit greatly from this feature. Do you know the roadmap for this? Also, please make sure to add this validation feature for outputs too. For my use case, we need to produce a specified output from varying inputs.

It is high on the wishlist for v2 and we have started v2 development. We might have something you can try in a few months. But we don’t make promises on future releases, as a matter of policy. It only leads to tears!