Executing a transform

pvonk · June 25, 2021, 8:18pm

Today was my first use of EDT. I think I’m missing something.

I have an input file (let’s call it input.csv). I’ve created a sequence of transforms that massages the input data and writes it to (let’s call it…) output.csv. I’ve saved the EDT file as process.transform.

It was a learning process - my testing method is to make some changes to the input file in order to test how my algorithm works with modified data. The first time I did this and returned to the EDT window I looked around the screen for a “Run” button. Couldn’t find one. I went through some attempts, e.g. entering the input filename again. But those attempts just created new nodes on the graph or made other changes.

One way I was able to re-run my algorithm after making changes to input.csv was to close EDT and double click process.transform. There I got the “Disable Outputs” window. By choosing “Write to output(s)” the algorithm was executed and a new output.csv file was created. Can I not leave the EDT window open, externally edit input.csv and run the program again? That’s what I’m missing. Any help will be appreciated.

Admin · June 25, 2021, 8:54pm

The processing is done automatically. No need to press run. If you can see a green tick, then it has already been done. It is just so fast that you probably won’t notice if you have fewer than 100,000 rows. ;0)

If you check ‘watch’ for an input, then changes to the input file will trigger it to be re-read and all the downstream processing done automatically.

As soon as you stop changing the options in the right pane for 2 seconds* it does the transformation for you.

*You can change the 2 seconds in Preferences.
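
For anyone curious, the "wait until the options have been left alone for 2 seconds" behaviour is what is usually called debouncing. The sketch below is only an illustration of that general idea in Python, not Easy Data Transform's actual code; the `Debouncer` class and its method names are made up for this example, and times are passed in by hand so the result does not depend on a real clock.

```python
class Debouncer:
    """Fire once after input has been quiet for `delay` seconds (hypothetical helper)."""

    def __init__(self, delay=2.0):
        self.delay = delay
        self.last_change = None  # time of the most recent change, None if nothing is pending

    def changed(self, now):
        # Every change restarts the quiet period.
        self.last_change = now

    def should_run(self, now):
        # True exactly once, when the quiet period has elapsed since the last change.
        if self.last_change is None:
            return False
        if now - self.last_change >= self.delay:
            self.last_change = None
            return True
        return False


d = Debouncer(delay=2.0)
d.changed(now=0.0)
d.changed(now=1.5)              # still editing, so the quiet period restarts
print(d.should_run(now=3.0))    # False: only 1.5 s of quiet so far
print(d.should_run(now=3.5))    # True: 2 s since the last change
print(d.should_run(now=4.0))    # False: nothing pending any more
```

The point of the design is that a burst of edits triggers one run at the end rather than one run per keystroke.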

pvonk · June 25, 2021, 8:58pm

Wow! How easy is that! It’s the “watch” parameter I didn’t catch.

Thanks.

Admin · June 25, 2021, 9:10pm

Just note that ‘watch’ may not work if you delete the file and then replace it with a new file with the same name.

Also you can select the input and click the refresh button in the right pane to force the file to be re-read.
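
As general background on why delete-and-replace can differ from an ordinary edit: at the operating system level a replaced file is a different file that happens to have the same name. The Python sketch below shows this with `os.stat`. It says nothing about how Easy Data Transform's 'watch' is implemented; the `file_signature` helper and the temporary file names are assumptions made up for the illustration.

```python
import os
import tempfile


def file_signature(path):
    """Return (inode, size, mtime_ns); it changes if the file is edited or swapped."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns)


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "input.csv")
    with open(target, "w") as f:
        f.write("a,b\n1,2\n")
    before = file_signature(target)

    # Simulate a download that writes a new file and then swaps it into place.
    fresh = os.path.join(folder, "input.csv.tmp")
    with open(fresh, "w") as f:
        f.write("a,b\n3,4\n")
    os.replace(fresh, target)
    after = file_signature(target)

    # Same name, but a different underlying file identity.
    replaced = before[0] != after[0]
    print("file was replaced:", replaced)
```

A check that looks the file up by name each time and compares a signature like this will notice the swap, whereas anything holding on to the old file would not.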

pvonk · June 25, 2021, 9:15pm

Yes, this is what I’ll probably have to do. I download the “input.csv” file from the web each day, so it overwrites the previous one (same name).

Admin · June 25, 2021, 9:22pm

I believe overwriting the file will still trigger a re-read if ‘watch’ is checked.

Amontillado · June 27, 2021, 3:58pm

Howdy! Newcomer here, girding up with Easy Data Transform to fight injustice, defend the defenseless, and generally kick ant hills in city hall.

It appears to me that analysis with EDT can involve iterations against a data set that don’t have to be done in a single EDT file.

For instance, my first project amounts to picking hors d’oeuvres out of a large mulch pile of confusing data.

I want to add column names to my fixed length input data files, so I’m playing with the stack transform.

Then I want to do joins and things using the new column names. One process I’m looking at is an EDT file that just adds the names and writes the data out. Other EDT files then take the transformed files as input.

I guess I can look at an EDT file as a document written to show the changes I want. Or I can look at an EDT file as a script that takes input, makes changes, and writes output for further use. The act of opening and closing the EDT file does the processing in that case.

Really interesting things happen if you create an output file and then add it as a new input in the same EDT document. Give it a try, if you dare. You can make the Kessel run in less than 12 parsecs, but hazards abound.

Admin · June 27, 2021, 7:33pm

I don’t recommend outputting to the same file you input from (and you should get a warning if you try to). For one thing there is no guarantee in which order things happen, beyond the fact that a parent is always processed before its children. If you do try it, back up your data files first!
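
To illustrate the "parent before children, nothing else guaranteed" point, here is a small Python sketch using the standard library's `graphlib` (Python 3.9+). The graph and node names are hypothetical, and this is a generic topological ordering, not a description of how Easy Data Transform schedules its work.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each node maps to the set of parents it depends on.
graph = {
    "stack": {"input.csv"},
    "join": {"stack", "lookup.csv"},
    "output.csv": {"join"},
}

order = list(TopologicalSorter(graph).static_order())
print(order)

# Every parent appears before its child.
for child, parents in graph.items():
    for parent in parents:
        assert order.index(parent) < order.index(child)

# The relative order of unrelated nodes such as "input.csv" and "lookup.csv"
# is not something to rely on, which is why reading and writing the same
# file in one graph is risky.
```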

If you really want to push your luck you can try reading and writing .transform files from within Easy Data Transform (they are XML).
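
Because they are XML, a safer first step than editing is simply looking at what is inside. The Python sketch below counts element tags with the standard library's `xml.etree.ElementTree`. The sample string is a placeholder and NOT the real .transform schema, which is not described in this thread; `tag_counts` is a made-up helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(xml_text):
    """Count how often each element tag occurs in an XML document."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


# Placeholder XML for illustration only, not the real .transform layout.
sample = "<doc><node/><node/><link/></doc>"
print(tag_counts(sample))
```

To try it on a real file, read the text of a copy of something like process.transform and pass it to `tag_counts`, leaving the original untouched.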

In future we might add the possibility to trigger one .transform file from another, which might be useful for complex jobs.

Amontillado · June 27, 2021, 9:20pm

The warning message for trying to use an output file for input is brilliant. “Inputting from xyz.xlsx (which Easy Data Transform is also outputting to) could cause a rip in the space-time continuum. Are you sure you want to do that?”