How validate / verify the CSV template did not change (column names)

I am wondering if Easy Data Transform could help me to verify if the CSV file columns headers are what they are expected to be. I download a CSV file regularly. The company producing the CSV is notorious for making unannounced changes to the CSV file without giving any sort of warning in advance or after the matter.

I am less concerned about the columns moving around and more concerned about the names of columns (fields) change. If columns name change, the values will likely change as well. So without being warned about it my import would import the CSV without importing the changed columns properly. I use Airtable to import the CSVs. The import in Airtable works well for CSVs if they remain same. If column moves around, it’s not an issue, but if the column header name changes, the import would not stop but would continue and would simply skip the columns that are missing and will not import any changed columns names.

I don’t think I can expect Airtable to add any CSV checking process because that is really not Airtable problem and they are a database company, so having an import tool for CSV, that is what they do have is all they would provide. Making sure the CSV is correct, would need to be done before that.

So I am thinking Easy Data Transform could be a great tool to do that since I can see it already does so many things.

I looked at the Javascript option and I don’t know if this is something that could do that, or is there anything else I should take a look at.

From what I read I may be able to take my original CSV template (empty) file and then either compare that against the newly downloaded CSV (after removing data) and then use e.g. Araxis Merge app to compare the files. Or I could just use EDT to compare the column names, but it requires quite a long javascript statement.

All I would be happy if I could create somehow a CRC value or something of all column names in the CSV file (or even count the total number of characters would suffice) and then compare it with a control number (which is what I would know my original template file has).

I am not a programmer so I am pretty much improvising here. Maybe a solution to what I am looking is pretty simple, which would be great. I am all ear.

I see here this page has description about Hash check but it seems only capable to handle values but now the column names?

Hi,

You can use Stack to check for the columns that have name change or spelling issues or new columns

For Example:

These are your orignal columns
col1,col2,col3,col4,col5,col6
v1,v2,v3,v4,v5,v6

Now in the new csv
col 1,cola,colb,col2,col3,col4,col5,col6
v1,va,vb,v2,v3,v4,v5,v6

Transform file
MyTransform.transform (2.1 KB)

1 Like

Forgot to mention that, the output of stack will show the original column blank, if it is not in the new csv, otherwise it will show the values under the column. Also any new (or old with new spelling or with extra spaces or whatever) will show after the last original column, this way you will know which are the new columns and then can make adjustment as necessary.

As a suggestion, you can have only the column headers for the original columns and then for the new csv, you can use Sample Tranform to get only 1 or few records and then use Stack Transform and you will know which ever original columns have value, that is unchanged and whichever are blank will mean, either they are not present or have different name or spelling, which you will find after the original columns.

1 Like

As @Anonymous says, you can do a Stack transform with Align columns by set to Header name. If the column names change, you will then get additional columns.

We have some ideas to handle this better in v2 without using Stack by storing the expected schema (column names and order) in an input and rearranging columns or throwing an error if the schema doesn’t match the expected one. Perhaps we will add an option to send some form of email or notification if things go wrong. Watch this space.

See also: