I would like to suggest the idea of encapsulating several (selected) sections of a transformation within one “Super Transformation”. This could be especially useful in big transformation streams where you want to consolidate sections of transforms and get a cleaner visual layout.
I have given this some thought already and I can see how it would be useful. It is on the ‘wishlist’ for v2.
What are the plans for v2, @Admin? Is this something you can share? Ideas / timeline?
Nothing is set in stone. But the big ones are things like:
- input and output with more data sources, such as:
- web APIs
- more file formats
- more transforms
- more options on existing transforms
- user interface improvements (such as grouping/folding transforms as discussed above)
- visualisation via charts and graphs
- others we are not ready to discuss at present
A subroutine transform would be neat.
Imagine a dozen input files, each representing a year’s worth of similar data. You want to apply the same chain of transforms to each file, then maybe do something unique per file. Finally, you stack them for your analysis.
If that chain of transforms in the middle could be defined once and chained into multiple data flows, that would be cool.
Just a thought.
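As a rough illustration of the idea (a hypothetical Python sketch, not EDT's actual behaviour): the shared chain is defined once, applied per file, followed by a per-file tweak, and the results are stacked.

```python
# Hypothetical sketch of a reusable "subroutine" transform chain.
# Function names and data shapes are illustrative, not part of EDT.

def shared_chain(rows):
    """The chain defined once: e.g. trim whitespace, drop blank rows."""
    rows = [[cell.strip() for cell in row] for row in rows]    # clean
    return [row for row in rows if any(cell for cell in row)]  # drop empties

def per_file_tweak(rows, year):
    """Something unique per file, e.g. tag each row with its year."""
    return [row + [str(year)] for row in rows]

# Several input files, one per year (shown here as in-memory data).
files = {
    2019: [[" a ", "1"], ["", ""]],
    2020: [[" b ", "2"]],
}

# Apply the same chain to each file, then stack the results.
stacked = []
for year, rows in files.items():
    stacked.extend(per_file_tweak(shared_chain(rows), year))

print(stacked)  # [['a', '1', '2019'], ['b', '2', '2020']]
```

The point is that `shared_chain` exists in exactly one place, so a later change to it automatically reaches every data flow that uses it.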
It would be useful to save a sequence of transforms as a custom named transform to apply in new processes instead of copying and pasting from another process.
You can do that already, assuming I understand you correctly.
If you have an input and a chain of transforms, you can create a duplicate by:
- selecting the input
- right clicking and selecting ‘duplicate branch’
- you can then change the file the input points to and any transform options in the new branch
Then you can add your Stack transform.
That point about saving the column-related options is really a big bottleneck. I am not trying to argue that there is an obvious solution, but I have to say (and it's obviously an unfair comparison) that you can do this in SPSS Modeler, for example (I have no technical details about how). That is, if you set column-related parameters in a given node, the options you set do not change even if you detach it from the upstream nodes. That's incredibly useful when you want to copy and/or fix anything. I would say this is the ONE setback for EDT in my particular workflow, even though there are ways to mitigate it.
I’m not sure how SPSS does it, but the only ways I can think of to preserve the column-related options when a transform is reattached are by column index (position) or by column name. For example, you could store the names of the columns set when a transform is disconnected and then try to match them up again when it is reconnected. It is definitely something to consider for v2.
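The name-first, index-fallback matching described above could look something like this (a speculative sketch, not how EDT or SPSS actually implements it):

```python
# Sketch of one possible reconnect-matching strategy: match each saved
# column name against the new input by name first, by position second.

def match_columns(saved, current):
    """Map each saved column name to an index in the current column list."""
    mapping = {}
    for i, name in enumerate(saved):
        if name in current:
            mapping[name] = current.index(name)  # match by name
        elif i < len(current):
            mapping[name] = i                    # fall back to position
    return mapping

# Saved options referenced "amount" and "date"; the new file renamed "date".
print(match_columns(["amount", "date"], ["amount", "timestamp"]))
# {'amount': 0, 'date': 1}
```

A mapping like this would then be shown to the user for confirmation rather than applied silently, since the positional fallback can easily guess wrong.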
Applying such a custom transform would either apply the transforms right away if the column names match OR show a popup asking which columns to match.
At any rate, it may be useful to have a popup just confirming that you really want to apply the transforms to the matching columns.
Yes, getting the user to confirm the mapping between old and new columns when they reconnect (defaulting to a name match, if possible, and an index match, if not) is probably the right way to do it. But it's not trivial to get working! Is that how SPSS does it, @Johnnycash?
Right, but if you adjust the common transform chain later you have to make the same change in multiple places.
You could also use the batch processing feature to apply the same set of transforms to multiple input files and append them all to one output file (rather than doing a Stack).
Another interesting possibility would be to specify multiple files using a wildcard in the input (e.g. “*.csv”) and have it effectively do the stack transform on all matching files before the first transform. That way you would only need one set of transforms. It is a possibility for v2.
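For what it's worth, the wildcard-input idea amounts to something like the following (a plain-Python stand-in for what the input step might do, not EDT code):

```python
# Sketch of the wildcard-input idea: stack every file matching "*.csv"
# into one table before any transforms run.
import csv
import glob
import os
import tempfile

# Create two small example files in a temporary folder.
tmp = tempfile.mkdtemp()
for year, rows in [(2019, [["a", "1"]]), (2020, [["b", "2"]])]:
    with open(os.path.join(tmp, f"{year}.csv"), "w", newline="") as f:
        csv.writer(f).writerows(rows)

# One "input" matching a wildcard, stacked into a single table.
stacked = []
for path in sorted(glob.glob(os.path.join(tmp, "*.csv"))):
    with open(path, newline="") as f:
        stacked.extend(csv.reader(f))

print(stacked)  # [['a', '1'], ['b', '2']]
```

With the files merged up front, a single transform chain covers all of them, at the cost of losing any per-file steps.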
True, and the way I’ve been handling this is either duplicate copies of a branch, if it is simple and unlikely to change, or batch processing to output files. That applies a single transform chain in parallel.
Another thing that would be nice is if you could restructure a fixed record file.
In my case, I’ve got one input file per year. Every record is 4096 characters. Position 2000-and-something is a flag indicating which of six types the record is, each of the six types having a different fixed-length layout.
I read the input, break it into three fields, make six filters, and write each filter out in text format. That way I get the 4096 character records sorted into the six types they belong in.
Those six files are where I get my input for other transform chains.
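The flag-based split described above can be sketched in a few lines of Python (illustrative only; the record length and flag position are shortened here for readability, whereas the real file uses 4096-character records with the flag near position 2000):

```python
# Sketch of the workflow above: fixed-length records carry a type flag
# at a known position, and get sorted into one bucket per type.

RECORD_LEN = 16
FLAG_POS = 8  # 0-based position of the single-character type flag

records = [
    "AAAA....1.......",
    "BBBB....2.......",
    "CCCC....1.......",
]

buckets = {}
for rec in records:
    assert len(rec) == RECORD_LEN      # every record is fixed-length
    buckets.setdefault(rec[FLAG_POS], []).append(rec)

# Each bucket can now be written out as text and re-read as a
# fixed-length file with its own column layout.
print(sorted(buckets))    # ['1', '2']
print(len(buckets["1"]))  # 2
```

Writing each bucket back out as plain text is what drops the original column boundaries, so the six files can each be re-read with their own layout.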
But please take none of that as complaint. I can slice, dice, and skewer our local tax collector with what I’m learning with EDT.
That’s not just a good thing, it’s kind of fun!
I’m not sure how we could improve on that.
I suppose, in theory, you could tell Easy Data Transform to create N filters from the N values in a column. But it would be a one-shot deal, as it would be completely impractical for Easy Data Transform to create and destroy filter transforms dynamically as the column values change. So I'm not sure how useful it would be.
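The one-shot "N filters from N values" idea is essentially a partition of the rows by the distinct values in one column, something like this hypothetical sketch (not an actual EDT feature):

```python
# Sketch of "create N filters from the N values in a column":
# partition rows into groups keyed by the value in the chosen column.

def partition_by_column(rows, col):
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row)
    return groups

rows = [["x", "red"], ["y", "blue"], ["z", "red"]]
groups = partition_by_column(rows, 1)
print(sorted(groups))  # ['blue', 'red']
print(groups["red"])   # [['x', 'red'], ['z', 'red']]
```

This captures why it would be a one-shot operation: the set of groups is fixed by whatever values happen to be in the column at the moment it runs.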
It’s not a big deal and probably very special purpose. In my case, I’m writing fixed length records as text to drop the layout information, then I’m reading them as fixed length records with new column boundaries.
Probably very rare. It’s needed in this case because the rows don’t all have the same field layout. Which is crazy.
Having multiple different fixed length record formats in the same file is a bit bizarre.