Removing trailing hex nuls from character stream files

A character stream file sometimes ends with a hex nul byte (x00) after the final x0D x0A. This (a) may cause software that reads the file to fail or report an error and (b) is difficult to remove efficiently without a hex editor.
For example, a trailing nul in an XML file will generally cause software that accesses the file to reject it.
Easy Data Transform provides a very simple way to trim this, and the same approach works on a whole directory of such files. Take the following example of the last bytes of an XML file:
image

Set up the input file as follows:

image

Note that the input type is set to ‘plaintext’ and not ‘XML’.

Then add a filter that uses a regex to exclude any row containing \x0:

image

Then finally, output the file as ‘Delimited text’

image

The whole process looks like this
image

Note that the 206 input rows have been reduced to 205 in the output, which removes the offending hex nul. Using the batch facility in Easy Data Transform, the same process can be applied to a series of files in a folder.
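
For anyone who wants to see the same logic outside Easy Data Transform, here is a minimal Python sketch of the filter step. It is not what EDT does internally; it only illustrates the idea of dropping any line that contains a nul. The function names `drop_nul_lines` and `clean_folder` and the `*.xml` pattern are made up for this illustration.

```python
from pathlib import Path


def drop_nul_lines(data: bytes) -> bytes:
    """Return data with every line that contains a nul byte removed."""
    # keepends=True preserves the original CR/LF terminators on kept lines.
    lines = data.splitlines(keepends=True)
    return b"".join(line for line in lines if b"\x00" not in line)


def clean_folder(folder: str, pattern: str = "*.xml") -> None:
    """Hypothetical batch helper: rewrites each matching file in place."""
    # Assumption: the files are small enough to read into memory,
    # and you have a backup, because the originals are overwritten.
    for path in Path(folder).glob(pattern):
        path.write_bytes(drop_nul_lines(path.read_bytes()))
```

As with the filter above, this removes the whole line, so it only suits the case where the nul sits alone on the last line after the final CR/LF.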


Useful tip, thanks. If the null is on the same line as the closing ‘>’ you could probably use Replace with a regex to remove just the null, rather than Filter to remove the whole line.
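
As a rough illustration of the replace idea (again plain Python, not EDT's Replace transform), the sketch below deletes only the nul bytes and leaves the rest of the line alone. The name `remove_nuls` is hypothetical, and it assumes nul bytes are never wanted anywhere in the file.

```python
import re


def remove_nuls(data: bytes) -> bytes:
    """Delete every nul byte but keep all other content, including the closing tag."""
    return re.sub(rb"\x00", b"", data)
```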

Yes, that would be the approach in that case. My solution was limited to the simple case where the null followed a CR/LF and hence was isolated on the last line, where deletion made sense.

What’s quite interesting is that if you Google the problem, most suggested solutions are quite involved under Windows, but with EDT I dealt with it in 5 minutes.
