Video: How to compare Excel sheets

We have created a new 4 minute video to show you how to use the powerful new Compare Data and Compare Headers features in Easy Data Transform v2:

Youtube ‘likes’ help us to be more prominent and are much appreciated.

Subscribe to our Youtube channel to be notified of future videos.

4 Likes

Holy crap,

I had no idea this feature exists !

Comparing header in itself was good enough but data is icing on cake!

2 Likes

It would be helpful if we could export this result directly.
(only the rows with differences and only the columns with their different values).

Yes, we can do all the comparison in a flow with “subtract” left from right and vice versa, but an easy-to-use compare-data-component that does this for us or an export function within this compare-data-screen would be great :blush:

We have thought about doing a Diff transform. I’m not sure how you would best represent the results as a single table though. Especially when the inputs might have different numbers of rows and different columns.

Note that you can Join 2 dataset by key or row number and then use Filter to remove rows where 2 values match in particular columns. This doesn’t really scale when you want to compare lots of columns though. Although I guess you could Concat Cols the columns you care about.

Perhaps you could give some information about a real world example where you need to do this?

That is exactly my challenge :wink:

We have a big Material-Masterdata-list with lots of attributes (50+).
When I compare it today with the previous month’s list (based on the material no. as key), I merge all columns and export them if there are differences in any of the columns.
( the number and names of the columns stays the same ! )

In my wishful thinking, the perfect differences-list will only contain the attribute-name and the old and new values per material-no.

Example:

In “Compare data” you are only highlighting the differences, so it would be nice if that info can be exported in the structure of my “diff” example :blush:

Thanks for your helpful tips.

Your ‘diff’ example might be a good way of putting the differences into 1 table and not replying on fonts/colors etc.

Possibly we could add:

  • A new transform called Diff that takes 2 inputs. The top input is ‘old’ and the bottom input is ‘new’ (or possibly we could refer to them as ‘top’ and ‘bottom’).
  • Options to compare by row number or key-based (as in Compare Data).
  • Key comparison can optionally be case sensitive (as in Compare Data).
  • Only outputs differences, something like:
Material Change Column Old Value New Value
1001 Value changed Attribut1 A B
1001 Value changed Attribut3 1,5 1,18
1002 Row deleted
1003 Value changed Attribut2 X
1004 Value changed Attribut3 4,5 6,5
1005 Row inserted
  • Would have to handle duplicate row keys for key-based.
  • Would have to handle differences in column name/order. The columns are the same in your example, but we have to handle cases when they aren’t. How do you output the differences for this?:

Perhaps something like this?

V1 Change Column Old Value New Value
A Value removed V3 4
B Value removed V3 5
C Value removed V3 6

as long as the “DIFF” transformation is not available :wink: I have a suggestion for an interim solution, which can/should be adapted to get more close to the expectation. But I think it is a step to come to an overview in a table form as wished.

N8AKTIV compare.transform (6.7 KB)

Specially if this is a monthly activity it can be given some effort.

1 Like

Wow, that would be the perfect solution.

Since I currently also concatenate the contents of all fields, there is a difference for all rows as soon as a column is removed or added.
I have to edit this manually if I don’t want to see this.
If these differences could also be seen in the difference list, that would be the icing on the cake.

In the final difference list, the users could then even use the “Change” column to filter which differences they want to keep: “Value changed”, “Value removed”, “Row deleted” or “Row inserted” :+1:

1 Like

I just played around a little more and now it creates exactly the wished out put.

for larger input tables the selection in the Gather transformations need to be adapted, but if the structure of the file is very month the same it should work with 50+ attributes, too.

N8AKTIV compare 2.transform (13.1 KB)

1 Like

Impressive! Would be nice if you could do it in one transform though. ;0)

Sure, but as long it is not available and you have to do it monthly it worse to run the 25 transformation job. For a one-time activity it is a little much. But it was a nice finger exercise.

1 Like

Danke Olaf, thanks Andy,

the interim solution was almost perfect, I only made small adjustments at the end to improve the display a little. The big solution was not necessary at all.
Now my 2 files with 100K rows and 59 columns each could be compared and only the differing columns are displayed.
It requires a whopping 12GB of memory, but it’s worth it :blush:

1 Like

That sounds a lot. Have you got Preferences>General>Optimize processing for set to Minimum memory use? If not, you probably should. In many cases this is actually faster than Maximum speed, due to the reduced memory use.

… I tried both, but it didn’t make much difference.
At the “Gather” step we are already at 8GB and as soon as the two input streams with 5.4M lines each are combined via full join, another 3GB are required.
With “Maximum Speed” even 1GB less RAM is required :wink:

That is surprising because Maximum Speed doesn’t use memory compression.

We are working on a new Difference transform. It can output a dataset of differences between 2 input datasets. The two input datasets can be compared by row number or using key columns. Columns are compared by column number (not name).

It gives the same results to the Compare Data window, but output in a tabular form.

It should be pretty fast.

You can download a snapshot release with the new transform here:

Windows: https://www.easydatatransform.com/downloads/EasyDataTransform_2_8_1_snapshot_1.exe
Mac: https://www.easydatatransform.com/downloads/EasyDataTransform_2_8_1_snapshot_1.dmg

Let us know what you think.

… tested with these small datasets:


… it work perfectly when column-structure is equal
and also when columns (e.g. Attribut3) are missing at the end :+1:

When comparing by column-no. it would be helpful if the column-names were also visible next to the column-number, so that we can see which attribute we are talking about in this line.

Is it possible to add an option to compare by column-names ?

If a column is missing in the middle (e.g. Attribut2), this is declared as a missing column and also as renaming the column and all the rows are shown as a difference …


Difference_Test1.transform (3.4 KB)

Thanks a lot in advance !

I did think about that. But:

  • there is no limit on how long a column name can be
  • the column name might be different in the 2 datasets

So that makes it tricky.

Perhaps. Until then you need to make sure the column ordering is the same in the 2 datasets. Note that you can do this by using Sample with Rows=0 to remove all the rows from one dataset (so it is only the header), then Stack the other dataset under it by column name.

I noticed the compare function is only available for files, so you can’t compare datasets if the datasets are transforms.

Then I noticed if you attach an output file to a transform, that output is a valid source for the compare feature.

There are always other ways to compare data, of course. EDT is truly cool.