Dealing with schema drift

‘Schema drift’ is when the columns in an input change over time. For example:

  • new columns are added
  • existing columns are deleted
  • the column order changes
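
As a rough illustration of the three kinds of drift listed above, here is a minimal Python sketch that compares two ordered lists of column names. The function name `describe_drift` and the sample column names are made up for this example, and it assumes column names are unique within an input.

```python
def describe_drift(expected, actual):
    """Compare two ordered lists of column names and report how they differ."""
    # Columns present in the new input but not in the expected list.
    added = [c for c in actual if c not in expected]
    # Columns in the expected list that have disappeared from the input.
    deleted = [c for c in expected if c not in actual]
    # The order has changed if the columns common to both lists
    # appear in a different sequence.
    common_expected = [c for c in expected if c in actual]
    common_actual = [c for c in actual if c in expected]
    return {
        "added": added,
        "deleted": deleted,
        "reordered": common_expected != common_actual,
    }


# Hypothetical headers from two runs of the same regular import.
last_month = ["id", "name", "email"]
this_month = ["name", "id", "phone"]
drift = describe_drift(last_month, this_month)
print(drift)
```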

It can be a real nuisance for transformations that you run regularly.

In Easy Data Transform you can handle this using Stack, but it can be quite tedious.

So we are adding a new Schema feature to inputs in v2. This gives you the option to store an ordered list of column names with each input and say what you want to do if the input does not match the schema:

  • add missing columns (in the schema, but not in the input) with empty values
  • rearrange input columns into the same order as the schema
  • add or ignore extra columns (in the input, but not in the schema)
  • stop with an error
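
To make the options above concrete, here is a minimal Python sketch of the general idea. It is not how Easy Data Transform implements the feature; the names `apply_schema`, `SchemaError`, `extra` and `strict` are invented for this example, and it assumes column names are unique.

```python
class SchemaError(Exception):
    """Raised when the input does not match the schema in strict mode."""


def apply_schema(schema, header, rows, extra="ignore", strict=False):
    """Align rows to an ordered list of schema column names.

    extra="ignore" drops columns that are not in the schema;
    extra="add" appends them after the schema columns.
    strict=True stops with an error on any mismatch instead.
    """
    if strict and list(header) != list(schema):
        raise SchemaError(f"input columns {header} do not match schema {schema}")

    # Output columns follow the schema order, which also rearranges the input.
    out_header = list(schema)
    if extra == "add":
        out_header += [c for c in header if c not in schema]

    # Map each input column name to its position in the input rows.
    index = {name: i for i, name in enumerate(header)}

    # Columns in the schema but missing from the input get empty values.
    out_rows = [
        [row[index[c]] if c in index else "" for c in out_header]
        for row in rows
    ]
    return out_header, out_rows


# Hypothetical schema and drifted input.
schema = ["id", "name", "email"]
header = ["name", "id", "phone"]
rows = [["Ann", "1", "555-0100"], ["Bob", "2", "555-0101"]]

ignored = apply_schema(schema, header, rows)
added = apply_schema(schema, header, rows, extra="add")
```

With `extra="ignore"` the `phone` column is dropped and an empty `email` column is added; with `extra="add"` the `phone` column is kept after the schema columns. Passing `strict=True` raises `SchemaError` for this input.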

We hope to have a beta version that customers can try before too long.

Now you're just teasing us.

My take on Rearrange:

  1. It will be very helpful, but you will also need to show the user the existing column order.
  2. Give the option to insert a ‘dummy’ column with a ‘dummy’ value, so that if a column is deleted the user can add it back and keep downstream calculations intact.

Yes, still thinking about that.

That has already been added.

image

Ooh la la! I am now looking forward to the big bad beta, which I hope is paid!

This looks fantastic. Looking forward to v2.
