Difficulty opening XML File

DanFeliciano · August 12, 2022, 4:16pm

Hi folks, the attached file opens fine in Excel, but I can’t get it to open in EDT.
The file is too large to attach (it’s 8 MB), but it can be downloaded here:

https://electionresults.vermont.gov/rss/5188/ResultsData.xml

I’m running:
MacBook Pro
Processor: 2.4 GHz 8-Core Intel Core i9
Graphics: Intel UHD Graphics 630 1536 MB
Memory: 32 GB 2400 MHz DDR4
macOS Ventura 13.0

Anonymous · August 12, 2022, 6:54pm

Hi,

I was not able to load the XML, so I converted it to JSON. That took more than 15 minutes to load, and the right panel then reported too many columns (I think it was more than 259K). I changed the format from Wide (more columns) to Long (more rows), the data displayed, and I saved it as CSV.

I zipped both the ResultsData.json and ResultsData.csv so that I can attach it here.

I think it takes so long because, when you input an XML or JSON file, the format defaults to Wide (more columns), and that might be the problem.

You can use ResultsData.csv for your purpose; it loads instantly. You can also try loading ResultsData.json yourself to see how long it takes.

I was not able to upload a zip file, so I added a csv extension to it. Please remove the csv extension and unzip it to get the JSON and CSV files.

ResultsData.zip.csv (1.2 MB)

Admin · August 12, 2022, 7:05pm

@Anonymous is correct. The problem is that the XML tree structure creates a vast number of columns if you load it as ‘wide’, and that takes forever to load. But it loads in less than a second if you load it as ‘long’.

Load the attached and click … to change the .xml file location to the correct one to input it as long:
resultsdata.transform (1.0 KB)

I am aware that it is a problem that you can’t change between wide and long before the initial load and hope to have a solution to this soon.

Anonymous · August 12, 2022, 7:06pm

Hi,

I found a simple solution to the problem. Create an XML file, for example:

<root>
<record>
first record
</record>
</root>

Save it and load it in EDT, then change Format from Wide (more columns) to Long (more rows). Then use the … to change the file to ResultsData.xml, and it loads instantly.

I did not notice that the admin had already replied.

DanFeliciano · August 12, 2022, 7:11pm

Thanks, everyone. I appreciate you looking into this.

yorkeman · August 13, 2022, 12:42pm

I’ve adopted a completely amateurish kluge for two types of file I process (one, for instance, being single-record files of essentially unstructured data which changes with every instance, a superset of which has common elements).

This is to read the XML file as plain text and then extract the data I need using the JavaScript function.

Naff, but very effective and fast.
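
For readers who want to try the same "treat it as text" idea outside EDT, here is a minimal sketch in Python rather than EDT's JavaScript transform. The sample text, the tag names and the helper functions are all hypothetical; they are not taken from the files discussed in this thread. The regular expressions only cope with simple, non-nested tags, which is the trade-off of skipping a real XML parser.

```python
import re

# Hypothetical sample text; the real files in this thread are not reproduced here.
SAMPLE = """<item id="3f2504e0-4f89-11d3-9a0c-0305e82c3301"><name>Alpha</name></item>
<item id="9a0c0305-e82c-4301-8f89-3f2504e011d3"><name>Beta</name></item>"""

# Matches the usual 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def extract_uuids(text):
    """Return every UUID-shaped token in the text, in order of appearance."""
    return UUID_RE.findall(text)


def extract_tag_values(text, tag):
    """Return the text between <tag> and </tag> pairs.

    Assumes the tag has no attributes and is not nested inside itself.
    """
    pattern = re.compile(r"<{0}>(.*?)</{0}>".format(re.escape(tag)), re.DOTALL)
    return [value.strip() for value in pattern.findall(text)]


uuids = extract_uuids(SAMPLE)
names = extract_tag_values(SAMPLE, "name")
print(list(zip(uuids, names)))
```

Pairing the two lists with `zip` only works here because each hypothetical item has exactly one UUID and one name; irregular files would need per-record handling.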

Admin · August 13, 2022, 1:58pm

If it works, that’s fine! But I suspect quick kludges aren’t going to work for big and complex XML files.

XML and JSON are a bit tricky, because you are flattening a tree into a table (or vice versa). And there is no one true way to do that.
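
To illustrate why the choice matters, here is a small Python sketch of two possible ways to flatten the same tree. It is not EDT's algorithm, and EDT's actual ‘wide’ and ‘long’ layouts may differ; the sample document and the path naming scheme are assumptions made up for the example. In this convention, ‘wide’ produces one column per leaf, so the column count grows with the size of the file, while ‘long’ produces one row per leaf.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the Vermont results file.
SAMPLE = """<root>
  <record><name>Ann</name><votes>10</votes></record>
  <record><name>Bob</name><votes>7</votes></record>
</root>"""


def walk(elem, path):
    """Yield (path, text) for every leaf element, indexing repeated siblings."""
    children = list(elem)
    if not children:
        yield path, (elem.text or "").strip()
        return
    seen = {}  # how many times each child tag has appeared so far
    for child in children:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        child_path = "{}/{}[{}]".format(path, child.tag, seen[child.tag])
        yield from walk(child, child_path)


def to_long(root):
    """One row per leaf: the table grows downwards."""
    return list(walk(root, root.tag))


def to_wide(root):
    """One row in total, one column per leaf path: the table grows sideways."""
    return dict(walk(root, root.tag))


root = ET.fromstring(SAMPLE)
long_rows = to_long(root)
wide_row = to_wide(root)
print(len(long_rows), "rows in long form")
print(len(wide_row), "columns in wide form")
```

With two records the difference is invisible, but every extra record adds columns in the wide form and only rows in the long form.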

yorkeman · August 13, 2022, 2:52pm

Absolutely, but the issues I had were with small but complex files.

XML (of which I have been a fan for a long time) becomes more problematic to handle when it is used for non-record-oriented data, and especially so when (a) there is no published schema and (b) elements or attributes are not present when they are deemed irrelevant for some instance, viz. no consistency in the input file.

Take this snippet from an example file (one of the simpler ones)

If you input the file as ‘wide’ it generates 1 row and 988 cols, but still requires JavaScript to unpack that neo-riemannian element above as well as the composite UUIDs.
Input it long and you get 67 not very useful cols and 700 rows containing all sorts of data.

The genius in EDT (or one of them…) is its flexibility in handling a variety of input character files.

After struggling with this stuff for a while I read the file as text and used JavaScript routines to reorganise the data and then standard EDT functions to manipulate the end result - now enhanced by multiple outputs.

If you look at the starting file at https://btcloud.bt.com/web/app/share/invite/hDufSwzZHD

you see that there are all sorts of little wrinkles in extracting all those UUIDs and mapping them to an associated data value (with, of course, those mapping tables being loaded into EDT).

So, having decided on the text way, it was surprisingly quick to restructure this and extract the required data.

DanFeliciano · August 15, 2022, 2:47pm

Thanks, everyone. I was thinking that perhaps the preferences could have a default option for XML import (wide or long). When I import large files the system hangs, and I can’t change the option to try the other setting.

Admin · August 15, 2022, 5:03pm

@DanFeliciano That would work, but it’s a bit of a kludge. Options in the preferences aren’t very easy to find. Also, lots of options in the Preferences can be rather daunting. Hopefully we can come up with a better solution.

Anonymous · August 16, 2022, 5:31am

Hi,

If it is possible, why not set the default to Long (more rows), as it always loads the data fast? Then, if the user tries to go the Wide (more columns) way, show statistics if possible (how many columns there would be and how long it might take) and let the user decide whether to continue or stay with the current Long (more rows). Or simply say it is not possible due to too many columns.

Because when I changed the XML to JSON and tried to load it with the default setting of Wide (more columns), it took time and at the end it still showed an error that there were too many columns and that I should split it to get it under 100,000. So in the end the time was wasted, and I still had to go through the Long (more rows) way to load the data.

The above could be a quick fix for the meantime, while you come up with a better solution.

Admin · August 16, 2022, 8:00am

Long format causes problems with some datasets (e.g. nested Python arrays).

Working out how many rows and columns there will be before actually creating the dataset would slow things down a lot.

The best solution would be to allow the user to turn off the automatic updating and then press ‘run’ when they have all the options set correctly. We’ve partly implemented that already and hope to have something usable soon.
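
In the meantime, anyone who wants a rough idea of how wide a file might get before loading it could count its leaf elements with a separate script. The Python sketch below is an assumption-laden estimate, not something EDT does: it assumes a ‘wide’ layout creates roughly one column per leaf element, and `count_leaves` is a hypothetical helper. It still needs a full pass over the file, which is the cost mentioned above, but it streams the file rather than building the whole tree.

```python
import io
import xml.etree.ElementTree as ET


def count_leaves(source):
    """Stream through an XML source and count elements that have no children.

    `source` may be a file name or a binary file object.
    """
    leaves = 0
    # Stack of flags, one per open element: has this element had a child yet?
    # The first entry is a sentinel standing in for "above the root".
    has_child = [False]
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            has_child[-1] = True   # the enclosing element now has a child
            has_child.append(False)
        else:
            if not has_child.pop():
                leaves += 1        # closed without ever seeing a child
            elem.clear()           # release text and children already processed
    return leaves


# Hypothetical in-memory sample; pass a path such as "ResultsData.xml" instead.
sample = io.BytesIO(
    b"<root><record><name>Ann</name><votes>10</votes></record>"
    b"<record><name>Bob</name></record></root>"
)
print(count_leaves(sample))
```

A leaf count in the hundreds of thousands would suggest trying the long format first.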

Anonymous · August 16, 2022, 9:36am

Hi,

I am eagerly waiting for this option to turn off auto updating and hope it comes as soon as possible.

Still, this will not solve the problem of not knowing whether Wide or Long is better. You only find out once the file is brought into EDT, it tries the default option, and it takes time. There should be some way to stop it, instead of force-closing the program like I did when the XML was not loading in Wide format.

Admin · August 16, 2022, 9:40am

The plan is that you will be able to stop the input/transform/output part way through if it is taking too long. Lots of details to work through though…

Anonymous · August 16, 2022, 10:15am

Hi,

That will be wonderful if one can stop and start during any of the phases. I wish you all the best in sorting through all those details.

Admin · September 8, 2022, 8:06pm

Here it is!