Team, I have an issue that is perhaps the most irritating part of using EDT (compared with, for example, SPSS Modeler). It is not about what I can do with the tools, or the fact that they are aimed at different audiences, but about managing the integrity of a data workflow stream. Allow me to explain:
With SPSS Modeler, whenever I build a stream / flow, the tool always keeps my nodes / transformations the way they are, even if there are merges / joins, etc., unless I make source or column related changes.
In EDT I have a lot of issues in this area. Very often I find that I need to make some small changes in If and Filter transformations, and they end up resetting many of the downstream transformations, so I have to rebuild half the flow - even though I did not change ANYTHING that should affect the stream integrity.
You will probably need an example, and this is one. I am not quite sure why changing an IF statement affects some of my joins, and other transformations after them, that have completely unrelated keys.
In the early days of EDT there were quite a few issues with transitory changes upstream resetting column settings downstream. We have fixed them where we know about them.
Changing the options for an If transform shouldn’t change the column structure, so it shouldn’t reset any column settings downstream.
Hi @Admin, sorry for taking my time to come back, but I really wanted to dig a little more into this before coming back with a detailed explanation of what is happening in the above stream.
Reiterating what I said: it’s actually not the IF that is causing issues, it’s the Remove Columns transformation. Let me detail:
The BP CEID field is the key to join with other sources along the entire stream, but that isn’t the column I’m adding or removing anywhere.
What is happening is that if I go to the Remove Cols transformation and add 2 more columns (because I just decided I needed them), it kind of breaks the joins (for some odd reason) and then subsequently all the transformations after them too.
In screenshots 1 to 5, I am adding back 2 columns in the Remove Columns transformation. When I do that, the first join right after that transformation doesn’t break, but the 2 joins after it do break and leave everything after them completely disjointed. I can try to provide a sample file for you to analyze. Unfortunately this one is production work for my company and I can’t share the data.
My expectation would be that if I don’t do anything with the columns used for the joins, nothing should get broken. It seems, however, that if I add or remove anything in that Remove Columns transformation, even if it’s not a column that is being used for a join, there’s some kind of refresh that breaks things downstream.
EDT tracks columns by their indexes. When you add or remove columns, it potentially changes all the column indexes downstream. So sometimes we reset any parameters related to columns downstream to ensure they aren’t set to the wrong column. It may be possible to be more sophisticated about this to minimize this issue. However, these cascading column changes are one of the most complex parts of EDT. I will look into it when I get a chance.
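To make the index issue concrete, here is a small Python sketch. It is only a toy model under stated assumptions, not EDT's actual code: the `remove_columns` and `remap_index` helpers and the "Region" and "Amount" column names are made up for illustration, and column names are assumed to be unique. It shows how a join key stored by index can end up pointing at a different column once a column is added back upstream, and how following the column by name is one possible way to avoid a reset.

```python
# Illustrative only: a toy model of index-based column tracking, not EDT's code.

def remove_columns(header, to_remove):
    """Return the header that is left after dropping the named columns."""
    return [name for name in header if name not in to_remove]

def remap_index(old_header, new_header, old_index):
    """Follow a column by name to its new index; None means 'reset the setting'."""
    name = old_header[old_index]
    return new_header.index(name) if name in new_header else None

source = ["Region", "BP CEID", "Amount"]  # "Region" and "Amount" are made-up names

# Original flow: Remove Cols drops "Region", and the join stores its key by index.
old_header = remove_columns(source, {"Region"})
join_key_index = old_header.index("BP CEID")

# Later edit: "Region" is added back, so every column after it shifts right.
new_header = remove_columns(source, set())

stale = new_header[join_key_index]  # column the unchanged index now points at
fixed = remap_index(old_header, new_header, join_key_index)

print(stale)              # Region: the stored index no longer points at the key
print(new_header[fixed])  # BP CEID: remapping by name recovers the join key
```

In this sketch the join key itself was never added or removed, yet its position changed, which would be consistent with downstream settings being reset even when only unrelated columns are touched.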
@Johnnycash I have been trying to reproduce the problem by removing and re-adding columns to Remove Cols transforms upstream of a Join, but without success so far. Can you create a simple example that illustrates the problem?