XML Convert Duplicate Data Items

Hello all,

I have just started using this tool and its great! Just what Im looking for.

Now, our data structure includes (on occasion) duplicate data types. I am struggling to find a way to include these to repeat.

I have this data set:

As you can see I have: QualificationSubject.QUALSUBJECT and QualificationSubject.QUALPROPORTION doubling. Sometimes its a 100 and sometimes it’s 50 of one + 50 of another. Now when I extract the blanks are skipped which is good.

The issue is when I extract from the above sheet I get:

  <Qualification>
    <QUALCAT>H0003</QUALCAT>
    <QUALTITLE>BA (Hons) in Interiors</QUALTITLE>
    <AWARDINGBODYID>5035</AWARDINGBODYID>
    <QualificationSubject>
      <QUALSUBJECT>101316</QUALSUBJECT>
      <QUALPROPORTION>100</QUALPROPORTION>
    </QualificationSubject>
  </Qualification>
  <Qualification>
    <QUALCAT>H0003</QUALCAT>
    <QUALTITLE>BA (Hons) in Fashion Styling and Visual Merchandising</QUALTITLE>
    <AWARDINGBODYID>5035</AWARDINGBODYID>
    <QualificationSubject>
      <QUALSUBJECT>100443</QUALSUBJECT>
      <QUALPROPORTION>50</QUALPROPORTION>
      <QUALSUBJECT2>100444</QUALSUBJECT>
      <QUALPROPORTION2>50</QUALPROPORTION>
    </QualificationSubject>
  </Qualification>

IF I don’t include the number 2 in the sheet, the second QUALSUBJECT and QUALPROPORTION is ignored. Like this:

  <Qualification>
    <QUALCAT>H0003</QUALCAT>
    <QUALTITLE>BA (Hons) in Fashion Styling and Visual Merchandising</QUALTITLE>
    <AWARDINGBODYID>5035</AWARDINGBODYID>
    <QualificationSubject>
      <QUALSUBJECT>100444</QUALSUBJECT>
      <QUALPROPORTION>50</QUALPROPORTION>
    </QualificationSubject>
  </Qualification>

Now this is easy to solve, I go into an XML editor like Sublime or Notepad++ and do a Find and Replace without the 2. My question is, is there a smarter and more efficient way to do this. Also considering that I have other Data Sets that have the same Data repeating 14 times. Which is not ideal.

Hope this is clear! Let me know if there are any questions!

So what should the output XML look like for the example supplied?

The final result should look like this:

  <Qualification>
    <QUALCAT>H0003</QUALCAT>
    <QUALTITLE>BA (Hons) in Fashion Design and Marketing</QUALTITLE>
    <AWARDINGBODYID>5035</AWARDINGBODYID>
    <QualificationSubject>
      <QUALSUBJECT>100055</QUALSUBJECT>
      <QUALPROPORTION>100</QUALPROPORTION>
    </QualificationSubject>
  </Qualification>
  <Qualification>
    <QUALCAT>H0003</QUALCAT>
    <QUALTITLE>BA (Hons) in Fashion Styling and Creative Direction</QUALTITLE>
    <AWARDINGBODYID>5035</AWARDINGBODYID>
    <QualificationSubject>
      <QUALSUBJECT>100443</QUALSUBJECT>
      <QUALPROPORTION>50</QUALPROPORTION>
      <QUALSUBJECT>100444</QUALSUBJECT>
      <QUALPROPORTION>50</QUALPROPORTION>
    </QualificationSubject>
  </Qualification>

I will try to look into it today.

Hi,

Is it possible for you to provide test data in it’s original input format?

I have a fix for this and will put it out in the next release:

Before:

xml-output-duplicate-cols-before

After:

xml-output-duplicate-cols-after

The fix is here:

You may need to refresh the page to get the latest version.

Please let us know if it works ok.

Hello, that is amazing! Thank you so much for your support. I will test it and feedback any issues.

Have a great weekend.

1 Like

Hello there,

I have tested this now and it works as expected for the issue described. I get this result:

    <QualificationSubject>
      <QUALSUBJECT>100443</QUALSUBJECT>
      <QUALPROPORTION>50</QUALPROPORTION>
      <QUALSUBJECT>100444</QUALSUBJECT>
      <QUALPROPORTION>50</QUALPROPORTION>
    </QualificationSubject>
  </Qualification>

Which is great.

However I am getting this issue with another section when it comes to parent TAGS.

        <EntryQualificationAward>
          <QUALRESULT>H4</QUALRESULT>
          <QUALTYPEID>IB</QUALTYPEID>
          <QUALRESULT>H4</QUALRESULT>
          <QUALTYPEID>IB</QUALTYPEID>
          <QUALRESULT>H3</QUALRESULT>
          <QUALTYPEID>IB</QUALTYPEID>
          <QUALRESULT>H3</QUALRESULT>
          <QUALTYPEID>IB</QUALTYPEID>
          <QUALRESULT>H6</QUALRESULT>
          <QUALTYPEID>IB</QUALTYPEID>
          <EntryQualificationSubject>
            <SUBJECTID>E12</SUBJECTID>
            <SUBJECTID>H91</SUBJECTID>
            <SUBJECTID>70H</SUBJECTID>
            <SUBJECTID>P11</SUBJECTID>
            <SUBJECTID>M11</SUBJECTID>
          </EntryQualificationSubject>
        </EntryQualificationAward>

While I am expecting this output:

        <EntryQualificationAward>
          <QUALRESULT>H4</QUALRESULT>
          <QUALTYPEID>IB</QUALTYPEID>
          <EntryQualificationSubject>
            <SUBJECTID>E12</SUBJECTID>
          </EntryQualificationSubject>
        </EntryQualificationAward>
        <EntryQualificationAward>
          <QUALRESULT>H4</QUALRESULT>
          <QUALTYPEID>IB</QUALTYPEID>
          <EntryQualificationSubject>
            <SUBJECTID>H91</SUBJECTID>
          </EntryQualificationSubject>
        </EntryQualificationAward>
        <EntryQualificationAward>
          <QUALRESULT>H3</QUALRESULT>
          <QUALTYPEID>IB</QUALTYPEID>
          <EntryQualificationSubject>
            <SUBJECTID>70H</SUBJECTID>
          </EntryQualificationSubject>
        </EntryQualificationAward>
        <EntryQualificationAward>
          <QUALRESULT>H3</QUALRESULT>
          <QUALTYPEID>IB</QUALTYPEID>
          <EntryQualificationSubject>
            <SUBJECTID>P11</SUBJECTID>
          </EntryQualificationSubject>
        </EntryQualificationAward>
        <EntryQualificationAward>
          <QUALRESULT>H6</QUALRESULT>
          <QUALTYPEID>IB</QUALTYPEID>
          <EntryQualificationSubject>
            <SUBJECTID>M11</SUBJECTID>
          </EntryQualificationSubject>
        </EntryQualificationAward>

My XLS input headers are labeled the following way:

Engagement.EntryProfile.EntryQualificationAward.QUALRESULT
Engagement.EntryProfile.EntryQualificationAward.QUALTYPEID Engagement.EntryProfile.EntryQualificationAward.EntryQualificationSubject.SUBJECTID Engagement.EntryProfile.EntryQualificationAward.QUALRESULT
Engagement.EntryProfile.EntryQualificationAward.QUALTYPEID Engagement.EntryProfile.EntryQualificationAward.EntryQualificationSubject.SUBJECTID Engagement.EntryProfile.EntryQualificationAward.QUALRESULT
Engagement.EntryProfile.EntryQualificationAward.QUALTYPEID Engagement.EntryProfile.EntryQualificationAward.EntryQualificationSubject.SUBJECTID Engagement.EntryProfile.EntryQualificationAward.QUALRESULT
Engagement.EntryProfile.EntryQualificationAward.QUALTYPEID Engagement.EntryProfile.EntryQualificationAward.EntryQualificationSubject.SUBJECTID Engagement.EntryProfile.EntryQualificationAward.QUALRESULT
Engagement.EntryProfile.EntryQualificationAward.QUALTYPEID Engagement.EntryProfile.EntryQualificationAward.EntryQualificationSubject.SUBJECTID

I will investigate and get back to you.

As a workaround you can rename the headers like this:

Engagement.EntryProfile.EntryQualificationAward.QUALRESULT
Engagement.EntryProfile.EntryQualificationAward.QUALTYPEID Engagement.EntryProfile.EntryQualificationAward.EntryQualificationSubject.SUBJECTID *Engagement.EntryProfile.EntryQualificationAward.QUALRESULT
*Engagement.EntryProfile.EntryQualificationAward.QUALTYPEID *Engagement.EntryProfile.EntryQualificationAward.EntryQualificationSubject.SUBJECTID **Engagement.EntryProfile.EntryQualificationAward.QUALRESULT
**Engagement.EntryProfile.EntryQualificationAward.QUALTYPEID **Engagement.EntryProfile.EntryQualificationAward.EntryQualificationSubject.SUBJECTID

Then remove all the ‘*’ characters on the final file (e.g. by inputting it as plain text and use a Remove transform, then outputting it as plain text to a .xml file).

I will have a think about a better way to do it.

Hi - i was on holidays. Thanks for the replies!

this what I have been doing and naming repeating fields as:

Engagement.EntryProfile.EntryQualificationAward.QUALRESULT1
Engagement.EntryProfile.EntryQualificationAward.QUALTYPEID1
Engagement.EntryProfile.EntryQualificationAward.EntryQualificationSubject.SUBJECTID1

Then removing them in the xml file. It’s not a huge problem, i had a lot bigger problems before i found EDT! :smiley:

1 Like

@AlexSzabo
There is a fix for the problem in the latest snapshot:

It isn’t especially elegant, but it works!