Transform JSON column data

Hi there,
i have large CSV datasets which contain JSON data in one column. This data column looks like this (simplified):

"{""json"":""{\\""foo\\"":\\""test1\\"",\\""bar\\"":\\""test1\\""}""}"
"{""json"":""{\\""foo\\"":\\""test2\\"",\\""bar\\"":\\""test2\\""}""}"
"{""json"":""{\\""foo\\"":\\""test3\\"",\\""bar\\"":\\""test3\\""}""}"
"{""json"":""{\\""test\\"":\\""test\\"",\\""betatest\\"":\\""test2\\""}""}"

Is there a way to expand this data into new columns (foo, bar, test, betatest)?

Thanks & kind regards
Mike

I’m not sure. One format packed inside another is not something we usually have to deal with!

I’m quite confused by the escaping. Is the above 1 column with 4 rows? Is:

"{""json"":""{\\""foo\\"":\\""test1\\"",\\""bar\\"":\\""test1\\""}""}"

Supposed to resolve to a valid piece of JSON? What does that JSOn look like?

Yes, the above is actually 4 separate rows in the CSV source file. The original data is created by a webhook which dumps the JSON data into a database. Each JSON object contains logging data from different steps of a voice bot workflow. I assume that they store this as JSON objects in a single column because the JSON structure differs from step to step in the workflow.

The CSV I have to deal with is created by an export process from that database. It is quite possible that the escaping is produced by the export process.

Why are the quotes around foo escaped with \, but the quotes arounf json not escaped?

I can use the Replace transform to get this:

{"json":"{"foo":"test1","bar":"test1"}"}

But it isn’t valid JSON, according to https://jsonlint.com/.

But this is:

{"json":{"foo":"test1","bar":"test1"}}

Yes, the escaping/quoting looks odd. Thanks for looking into this. I’ll check with the vendor who creates the original JSON log data before it is pushed through the webhook.

1 Like

If you can find a way to clean this up into valid JSON. I think the way to do this is to:

-Use Remove cols to make a dataset with just this column.
-Clean it up using Replace
-write it out as plain text to get a valid ‘JSON lines’ format file
-read it back in as JSON to form columns ‘foo’, ‘bar’ etc.
-use Row Num to add row numbers to this data and the original data
-Join the new dataset to the old dataset using the row number as a key

See also:
https://www.easydatatransform.com/help/latest/windows/html/index.html?json_format.html

Here it is,

Transform file.
TransformJasonColumnData.transform (3.3 KB)

2 Likes

Thanks, Anonymous!

That’s a great approach to re-import the cleaned up JSON data. I did not see that the Import block actually has the transformation I was looking for in the wrong places.

2 Likes