Issue with having to re-assemble transformation after changing filter

Johnnycash · March 18, 2021, 10:48am

Team, I have an issue that is perhaps the most irritating part of using EDT (compared with, for example, SPSS Modeler). It is not about what I can do with the tools or the fact that they are aimed at different audiences, but about how stream integrity is managed in a data workflow. Allow me to explain:

With SPSS Modeler whenever I build a stream / flow, unless I make source or column related changes, the tool always keeps my nodes / transformations the way they are even if there are merge / joins, etc.
In EDT I have a lot of issues in this area: very often I find that I need to make some small changes in If and Filter transformations, and they end up resetting many of the downstream transformations, so I literally have to rebuild half the flow, even though I did not change ANYTHING that should affect the stream integrity.

You will probably need an example, and this is one. I am not quite sure why changing an If statement affects some of my joins, and other transformations after them, that have completely unrelated keys.

Admin · March 18, 2021, 11:36am

We are keen to fix annoyances like this.

In the early days of EDT there were quite a few issues with transitory changes upstream resetting column settings downstream. We have fixed them where we know about them.

Changing the options for an If transform shouldn’t change the column structure, so shouldn’t reset any column settings downstream.

Can you reproduce this issue in a simple example?

Johnnycash · March 18, 2021, 11:55am

Thanks Andy, I will try to come up with that today or tomorrow.

Johnnycash · March 19, 2021, 8:51am

Hi @Admin, sorry for taking my time to come back, but I really wanted to dig a little more into this before giving a detailed explanation of what is happening in the above stream.

Revisiting what I said: it’s actually not the If that is causing issues, it’s the Remove Columns transformation. Let me detail:

The BP CEID field is the key used to join with other sources along the entire stream, but that isn’t the column I’m adding or removing anywhere.
What is happening is that if I go to the Remove Cols transformation and add 2 more columns (because I just decided I needed them), it kinda breaks the joins (for some odd reason) and then subsequently all the transformations after them too.

In screenshots 1 to 5, I am adding 2 columns back in the Remove Columns transformation. When I do that, the first join right after that transformation doesn’t break, but the 2 joins after it do break and leave everything downstream disjointed. I can try to provide a sample file for you to analyze. Unfortunately, this one is production work for my company and I can’t share the data.

1- I go to the remove cols transformation

2- Adding 2 columns back here

3- Some of the joins after the Remove Columns get broken even though I didn’t do anything with the key column

4- This is how it looks after I change that Remove Columns info - Joins get undone

5- And as a result of the joins getting undone, everything else after pops out and I have to re-do most of the transformations

My expectation would be that if I don’t do anything with the columns used for the joins, nothing should get broken. It seems, however, that if I add or remove anything in that Remove Columns transformation, even if it’s not a column that is being used for a join, there’s some kind of refresh that breaks things downstream.

Johnnycash · March 19, 2021, 8:57am

I can also provide a short video showing the issue. Maybe that will help, but @Admin, let me know.

Admin · March 19, 2021, 9:05am

EDT tracks columns by their indexes. When you add or remove columns, it potentially changes all the column indexes downstream. So sometimes we reset any parameters related to columns downstream to ensure they aren’t set to the wrong column. It may be possible to be more sophisticated about this to minimize the issue. However, these cascading column changes are one of the most complex parts of EDT. I will look into it when I get a chance.
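
To illustrate the index-tracking point, here is a minimal Python sketch. It is not EDT's actual code: the functions `remove_columns` and `remap_index` and every column name other than BP CEID are hypothetical, chosen only to show why a stored column index can end up pointing at the wrong column after an upstream Remove Cols change, and how a name-based remap might avoid a reset.

```python
def remove_columns(columns, to_remove):
    """Drop the named columns, preserving the order of the rest."""
    return [c for c in columns if c not in to_remove]


def remap_index(old_columns, new_columns, index):
    """Hypothetical remap: find where the column that used to sit at
    `index` now lives. Returns None if it no longer exists, which is
    the case where a downstream setting would have to be reset."""
    name = old_columns[index]
    return new_columns.index(name) if name in new_columns else None


# Hypothetical source layout; only "BP CEID" comes from the thread.
source = ["Region", "BP CEID", "Amount", "Notes"]

# Original Remove Cols setting: drop Region and Notes.
before = remove_columns(source, {"Region", "Notes"})
key_index = before.index("BP CEID")  # a downstream join stores index 0

# The user adds Region back, so only Notes is removed now.
after = remove_columns(source, {"Notes"})

# The stored index now points at a different column.
stale = after[key_index]
print(stale)  # Region, not BP CEID

# A name-based remap recovers the intended column instead.
print(remap_index(before, after, key_index))  # 1
```

In this sketch the join key itself was never touched, yet its position shifted because a column was re-added ahead of it. That is the situation where resetting the downstream setting is the safe default, and where a remap by name could keep the join intact.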

Admin · March 24, 2021, 3:23pm

@Johnnycash I have been trying to reproduce the problem by removing and re-adding columns in Remove Cols transforms upstream of a Join, but without success so far. Can you create a simple example that illustrates the problem?