CSV Splitting and Recombining

Opening CSVs for inspection in text editors or Excel becomes increasingly difficult, or downright impossible, when approaching or exceeding ~1M rows. If you are using web/HTML-based tools that have file size restrictions or time-outs when uploading CSVs into databases, breaking CSVs into parts is essential for reliable uploads.

There is a nice workflow in EDT for this. You add row numbers, split the CSV file based on the number of rows wanted in each split file, and then remove the row number column before each final file is generated. Here is the logical flow of the process.

Here is the processing for one of the split CSV files (the first cut), which is set to send the first 2 rows to a CSV file:


Parallel operations to remove the added “row number” column and write to sequential files would need to follow for each of the other “cuts”.
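
For anyone who wants to cross-check the result outside EDT, the same split can be sketched in plain Python. This is only a rough illustration, not how EDT does it internally. The names split_csv, rows_per_file and stem are made up for the example, and it assumes the first row of the input is a header that should be repeated in every part. The running row index plays the role of the temporary "row number" column, so nothing has to be removed afterwards.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_file, stem="part"):
    """Split src into sequentially named CSV files of rows_per_file data rows.

    Hypothetical helper for illustration. Assumes the first row of src is a
    header, which is copied into every part. Returns the list of paths written.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    writer = None
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return written  # empty input, nothing to split
        try:
            for i, row in enumerate(reader):
                # Start a new part every rows_per_file data rows.
                if i % rows_per_file == 0:
                    if out is not None:
                        out.close()
                    # Zero-padded numbers keep the parts in order when sorted by name.
                    path = out_dir / f"{stem}_{len(written) + 1:04d}.csv"
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    written.append(path)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return written
```

Calling split_csv("big.csv", "parts", 2) would mirror the example above: the first 2 data rows go to parts/part_0001.csv, the next 2 to parts/part_0002.csv, and so on. The file names here are placeholders.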

Remember that if you name your output files sequentially, you can use the batch processing capability of EDT to recombine your CSV files by stacking them in sequence, which re-creates the original input: https://www.easydatatransform.com/help/1b/windows/html/batch_processing.html
https://www.easydatatransform.com/help/1b/windows/html/how_do_i_perform_the_same_transforms.html
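
As a rough way to check the recombined result outside EDT, sequential restacking can also be sketched in Python. Again this is an illustration under assumptions, not EDT's batch mechanism: recombine_csv and the part_*.csv naming pattern are hypothetical, every part is assumed to start with the same header row, and the file numbers are assumed to be zero-padded so that sorting by name gives the original order.

```python
import csv
from pathlib import Path


def recombine_csv(parts_dir, dest, pattern="part_*.csv"):
    """Stack sequentially named CSV parts back into a single file.

    Hypothetical helper for illustration. Assumes every part begins with the
    same header row and that dest does not itself match pattern inside
    parts_dir. Returns the number of data rows written.
    """
    # Sorting by name restores the order only if the numbers are zero-padded.
    parts = sorted(Path(parts_dir).glob(pattern))
    rows = 0
    header = None
    with open(dest, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for part in parts:
            with open(part, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip empty files
                if header is None:
                    # Write the header once, taken from the first part.
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"header mismatch in {part.name}")
                for row in reader:
                    writer.writerow(row)
                    rows += 1
    return rows
```

Comparing the row count returned here with the row count of the original input is a quick sanity check that no rows were lost in the split.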


I’ve got some ideas about how file splitting could be made more efficient, and also how it could cope with dynamically assigning the output file name. Watch this space…
