What new features would you like to see?

Yes. A little like what SPSS Modeler does, in a way. You can define the entire transformation, the input and the output, but you would only run the transformation by clicking a button.

The issue with that approach is that:

  • many transform options depend on the columns available
  • a transform can change the columns available downstream (for some transforms you have to process the entire dataset to know what columns will be in the output)

So, if you want to add A->B->C, how do you know what columns are available to C until you have processed A and B? How does SPSS Modeler get around that?

I am not sure what happens in the background, but SPSS Modeler doesn't need to process the entire dataset to show what is available in each node/transformation. If I am not mistaken, when you load a source SPSS Modeler automatically reads all the columns and the first x rows of data (not sure if it is 50 or something like that) and makes assumptions based on that (this can sometimes become a problem in itself).
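That sample-then-infer approach can be sketched roughly as follows. This is only a guess at the idea, not SPSS Modeler's actual algorithm; the sample size of 50 and the inference rules are assumptions:

```python
import itertools

def is_int(v):
    try:
        int(v)
        return True
    except ValueError:
        return False

def is_float(v):
    try:
        float(v)
        return True
    except ValueError:
        return False

def infer_column_types(rows, sample_size=50):
    """Guess a type for each column by reading only the first sample_size rows."""
    sample = list(itertools.islice(rows, sample_size))
    header, data = sample[0], sample[1:]
    types = []
    for col in range(len(header)):
        values = [row[col] for row in data if row[col] != ""]
        if all(is_int(v) for v in values):
            types.append("integer")
        elif all(is_float(v) for v in values):
            types.append("number")
        else:
            types.append("text")
    return dict(zip(header, types))
```

Because only the leading rows are sampled, a non-numeric value in row 51 would silently break the guess, which is exactly the "problem in itself" mentioned above.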

@GLS This is now fixed with an optional Open Recent window shown at start-up in v1.12.0.

@Johnnycash This is now fixed with an optional Open Recent window shown at start-up in v1.12.0.

I just tested it and it looks very nice!

Working perfectly - Thank you very much for your fast implementation!

Potential feature request if this doesn’t already exist:

Motivation: I am processing Excel files containing multiple sheets of different data sets, except that I am manually iterating through a set of filter changes for each sheet (I believe this is beyond the batch system's capabilities) and putting the output into a new workbook with multiple sheets representing the differently processed versions of each input sheet. It is like a batch process, but with a manual filter change thrown in. I am looking to speed up this semi-manual batching process.

Request/Question for accessing multiple Excel sheets easily: when loading an Excel workbook as input data, an option is given to select the sheet to use. Once selected, one would need to re-load the file to use a different sheet as input. If there were an option to reload via a pulldown populated with the list of sheets, it would greatly speed up applying a transform across a set of sheets in a workbook with the same format but different data sets. For example, once the file is loaded, the sheet names would be visible in a pulldown; selecting a different sheet would load that data.

Similarly, if possible, in the output: if the file already has sheets in it (i.e. it is not a new file), could there be a way to select an existing sheet, or a "new sheet" option with its name provided via an entry box? The semi-workaround is specifying the sheet name in "[ ]"s in the file name. Selecting rather than typing would make it much easier to keep track of which existing sheet's data is being overwritten.


Longer term feature consideration: an extended batching capability could really take EDT to the next level for complex data sets. If there were a way to enter wildcards/variable names in filter names and file names, a batch script file could auto-execute a work list with those substitutions. For example: make your design, add variables/wildcards, click something to have the program generate a boilerplate script file listing the variables/wildcards, edit in the values to process, and run the script. The program would follow the scripted actions on your laid-out workflow, or fail with an error on the step that failed.

Thank you for your consideration!

Potential experimental feature request:
The ability to push outputs live to MySQL-type databases via a simple cursor-style, row-by-row INSERT into a selected table. This would be for an existing (assumed) correctly formatted table, either appending new data or emptying the table contents first and then performing the INSERT. It would be the user's responsibility to set up and optimize the table in preparation for the insert. The idea is that you could enter your server name, user, password, port, database, and table. The implementation might be semi-automatic: it defaults to disabled, and the user flips it from disabled to enabled to begin the transfer, allowing the preview to be checked first. The transfer would complete, or fail on a SQL error or an internal/connection error, then switch back to disabled, showing the timestamp of the last operation completed without error. An option of blank or explicit NULL for empty cells might be considered, along with blank-row skipping (if relevant). It would be up to the user to inspect the database themselves to see if everything looks OK; there would be no need to poll the database to display current contents within EDT.

In addition to providing flexibility with the output, this would be a great workaround for the file size limits of many browser-based admin tools (and other third-party tools) when uploading CSVs into MySQL-type databases.
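A minimal sketch of the row-by-row cursor INSERT flow described above, using Python's built-in sqlite3 as a stand-in for a MySQL connector (with MySQL you would swap in a real connector with the server/user/password/port details and `%s` placeholders; the table and column names here are hypothetical):

```python
import sqlite3

def push_rows(conn, table, columns, rows, empty_as_null=True, clear_first=False):
    """Append rows to an existing table, cursor-style, one INSERT per row.

    The user is assumed to have already created and optimized the table.
    """
    cur = conn.cursor()
    if clear_first:
        cur.execute(f"DELETE FROM {table}")  # empty the table contents first, if requested
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    for row in rows:
        if not any(row):  # skip blank rows
            continue
        if empty_as_null:
            # convert empty cells to explicit NULLs
            row = [v if v != "" else None for v in row]
        cur.execute(sql, row)
    conn.commit()

# Hypothetical usage: append transformed output to a prepared table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (name TEXT, value TEXT)")
push_rows(conn, "readings", ["name", "value"], [["a", "1"], ["", ""], ["b", ""]])
```

On any database error the whole transfer simply raises, which matches the "complete or fail, then flip back to disabled" behaviour suggested above.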

Thank you for your consideration.

Potential request for new feature:

Some button to temporarily pause the auto-calculate feature.

Yes, the period between calculations can be adjusted in the settings (this helps!), but here is the issue: if the calculation happens while a filter is being adjusted (i.e. you click away to quickly scroll through the data to check entry values after tweaking the filter), a malformed filter phrase gets processed, no records pass, and the Joins downstream are all broken and need to be reassigned. If auto-calculation could be paused (or a temporary error tolerated) while these changes happen, this would prevent the issue.

The semi-workaround right now is using the new Undo, or reassigning the Joins, if this ever happens.

Thank you for your consideration!

In more recent versions of Easy Data Transform, filtering should not affect columns. So any Join transforms downstream should not get nuked by changing a Filter upstream. It is possible there is a bug though. Can you send me an example with instructions on how to reproduce it?

@mklopfer
File>Batch Process already handles wildcards in file and sheet names. So, for example, you can specify in batch mode that one of the inputs is:

C:\Users\andy\Documents\*.xlsx[data*]

It will then generate a set of outputs for each matching input.

You can also set the output file and sheet name based on the input file name.

See also:
https://www.easydatatransform.com/help/1c/windows/html/batch_processing.html

Does that help? If not, you might have to give me a concrete example.

@mklopfer
Quite a few people have asked for the ability to input from and output to SQL databases. It is very much on the wishlist. Hopefully we can use ODBC so we don’t have to have a different connector for each database.

Is MySQL your preferred database? Are you more interested in output to a database than input from one?

@mklopfer

Please note that a DayOfWeek operation is now available in the new Calculate transform in v1.17.0.

Greetings,

Awesome software. Love using it.
I have two possible suggestions for new features:

  1. Ability to iterate within the transform.
  2. Improved ability to reference values in other fields in the same row.

As follows:

Suggestion 1: In Transform Iterators

The single biggest feature that I currently see missing in EDT is the ability to place iterator operations inside the EDT transformations.

At the moment, EDT can only batch a set of files across a whole transform, a bit like a wrapper that has to be manually set up and started. This means you have to break big workflows up into separate steps whenever you need to iterate, and then use the output manually in the next step.

Not such a big deal, but if there were a way to insert those batch iteration operations as proper step blocks inside an EDT transform, it would become possible to build one single big automatic workflow, with no extra fiddling about or breaking into separate steps required.


Suggestion 2: Improved ability to reference values in other fields in the same row.

Not all transform operations can easily reference the value of other fields in a row. For example, at a simple level, to check whether two fields are equal in each row you actually have to write your own little JavaScript ( $(Field1) == $(Field2) ) etc.

This is because the current EDT "If" transform block will not allow you to reference $(Field2); it only allows checking equivalence against static fixed values and strings.

That is easy enough to overcome with JavaScript for the above example, but in other, harder cases it would be nicer to be able to reference a field value for each row directly, similar to "$(Field2)", within the transform.
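For illustration, the per-row check that the JavaScript snippet above performs is just a comparison of two cells in each row. Outside EDT it is equivalent to something like this sketch (the column names are hypothetical):

```python
def add_match_column(rows, col_a, col_b):
    """Append a Match column that is True where the two named fields are equal."""
    header, data = rows[0], rows[1:]
    ia, ib = header.index(col_a), header.index(col_b)
    out = [header + ["Match"]]
    for row in data:
        out.append(row + [row[ia] == row[ib]])
    return out

table = [["Field1", "Field2"], ["a", "a"], ["a", "b"]]
result = add_match_column(table, "Field1", "Field2")
```

The feature request is essentially to let the If transform's condition do this lookup by column name, the way the THEN/ELSE parts already can.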

Cheers!

One more suggestion:

The ability to copy selected parts of a transform and paste into another transform.

@Free_World_Maps

Can you give me a real world example when this would be useful?

Currently column variables ( such as $(Field1) ) can be used in the following transforms:

If
Insert
Javascript
Replace
Substitute

But currently column variables are only supported in the THEN and ELSE parts of If, not the conditional (check). I would also like to support column variables in Filter and other transforms. It is on our wishlist to add these.

It would be easier to use a Compare transform.

Hi,

Here is a general example of where an “In-transform iterator” would be helpful:

For example, a user has an inbox directory continuously filling up with thousands of small JSON log files. Currently the user first has to manually batch all those JSON files into one big CSV file by running them through an initial transform, for example using the absolutely wonderful Gather and Spread transforms.

After that, the user can manually move on to the next transform step using that new CSV file.

I am suggesting it would be great if EDT could instead run everything under one transform automatically, via an "in-transform iterator" capability.

P.S. Something with equivalent functionality to this important operator in the ArcGIS visual modeler would work well: A quick tour of using iterators—ArcMap | Documentation

@Free_World_Maps

Thanks for the clarification.

It may already be possible to do this using File>Batch process. See example 2 here:
https://www.easydatatransform.com/help/1i/windows/html/how_do_i_perform_the_same_transforms.html

So basically you:

  1. Construct a .transform to process one JSON file into an output.
  2. Set the output Write Mode to Append.
  3. Select File>Batch process.
  4. Set the input to /YourFolder/*.json.
  5. Click Process.

All your JSON files in that folder should then be processed into a single output.
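For illustration, the effect of steps 1-5 (every matching JSON file appended into one output) is roughly equivalent to this sketch; the field names and the one-record-per-file layout are assumptions:

```python
import csv
import glob
import json

def batch_json_to_csv(pattern, out_path, fields):
    """Process every JSON file matching pattern and append each to one CSV output."""
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fields)
        writer.writeheader()
        for path in sorted(glob.glob(pattern)):
            with open(path) as f:
                record = json.load(f)  # one small log record per file (assumed)
            writer.writerow({k: record.get(k, "") for k in fields})
```

In EDT itself, the output's Write Mode: Append plays the role of the single shared writer here, and File>Batch process supplies the loop over matching files.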

Hi,

Nope, at the moment I think it is not quite possible to fully automate a batch step iterating over many files within an EDT workflow (unless some JavaScript trick works?).

As already stated:

My request is for the capability to do the same kind of existing batch file iteration BUT AS AN OPERATOR WITHIN an EDT transform, so that the whole workflow can become fully automated.