BENderIsGr8te's avatar

Converting XML to JSON/Array

I am working on a project that gets criminal and credit reports for renters from a Credit Bureau. The Credit Bureau posts XML data to us when the process is complete.

I've noticed that I can't seem to access any data past a certain nesting point. I can clearly see in the XML the data is there, and if I just do a SimpleXML Load String the SimpleXmlElement has all the data. But if I try to convert it to an array or JSON (which is required for storage) then the I am losing some data after a certain nesting point.

json_encode is supposed to return false if too many nest levels are hit (512 is my default). But it's not. I also saw that it may be a browser limitation of the pre tag or dd() function of Laravel. But if I try to dump something deep in the next it's still empty.

Anyway I can get access to the Most Wanted Rapsheet?

Here is the Pastebin containing the XML string and my attempt to get the data.

http://laravel.io/bin/Lk4xV

0 likes
11 replies
bashy's avatar

I used simplexml_load_string() then you can read items and namespaces of XML via this

$namespaces = $job->getNameSpaces(true);
$taleo = $job->children($namespaces['taleo']);

I had to use it once for Oracle's Taleo system. It was horrid D:

BENderIsGr8te's avatar

@bashy 2 problems. When I do that I still only get an empty element for crim. Second, I have to be able to convert this to JSON for storage in a database. So I need to reliably be able to convert to json where all the children will be available. Any thoughts?

bashy's avatar

XML is probably the worst thing I've had to deal with in programming. How does anyone cope when using it... Can't tell what's a string or integer :P

Converting to JSON does remove all namespaces so you can't just auto convert it. I'm not sure why it's not returning elements in X amount of nested levels as I've done quite a few when I converted XML myself.

Is that the only format you can get it in?

BENderIsGr8te's avatar

Ugh. Yes. They have a bug with their JSON format, so they are returning only in XML for the time being. I despise XML.

jlrdw's avatar

Hey I never saw that before thanks.

BENderIsGr8te's avatar
BENderIsGr8te
OP
Best Answer
Level 6

So that XML to JSON function I found does work great, but it wouldn't work for my needs because it incorporates the name space into the key name, and I need the name space removed from the keys.

The problem is when you incorporate Namespaces in XML elements the conversion to an Array or JSON the child nodes that are namespaced are removed. I realized if I remove the namespace from the XML string before I convert the string to an XML Element, then the json_encode function works as expected and no data is removed.

So, for anyone else having this issue, here's is how I solved the problem for my needs.

  • I know that the XML being sent to me has no naming collisions. The only reason Namespaces are used are to identify the source of that portion of the XML data.
  • I know all the Namespaces being used

Having that information here is what I did

function removeNamespaceFromXML( $xml )
{
    // Because I know all of the the namespaces that will possibly appear in 
    // in the XML string I can just hard code them and check for 
    // them to remove them
    $toRemove = ['rap', 'turss', 'crim', 'cred', 'j', 'rap-code', 'evic'];
    // This is part of a regex I will use to remove the namespace declaration from string
    $nameSpaceDefRegEx = '(\S+)=["\']?((?:.(?!["\']?\s+(?:\S+)=|[>"\']))+.)["\']?';

    // Cycle through each namespace and remove it from the XML string
   foreach( $toRemove as $remove ) {
        // First remove the namespace from the opening of the tag
        $xml = str_replace('<' . $remove . ':', '<', $xml);
        // Now remove the namespace from the closing of the tag
        $xml = str_replace('</' . $remove . ':', '</', $xml);
        // This XML uses the name space with CommentText, so remove that too
        $xml = str_replace($remove . ':commentText', 'commentText', $xml);
        // Complete the pattern for RegEx to remove this namespace declaration
        $pattern = "/xmlns:{$remove}{$nameSpaceDefRegEx}/";
        // Remove the actual namespace declaration using the Pattern
        $xml = preg_replace($pattern, '', $xml, 1);
    }

    // Return sanitized and cleaned up XML with no namespaces
    return $xml;
}

function namespacedXMLToArray($xml)
{
    // One function to both clean the XML string and return an array
    return json_decode(json_encode(simplexml_load_string(removeNamespaceFromXML($xml))), true);
}

By calling the namespacedXMLToArray() function I can simply get an array that is 100% good to go in my case.

Hopefully this approach helps others. I am sure if you don't know what possible namespaces exist you can use a RegEx to find the various defined namespaces and then remove them once you know their names.

3 likes
mhopkins321's avatar

Colorado Department of Transportation's (cdot) api is (unfortunately) all namespaced xml. Your function solved ALL of the problems I was running into. I made this change to the namespacedXMLToArray() function for Laravel convenience.

private function namespacedXMLToCollection($xml){
    // One function to both clean the XML string and return an array
    $collection = new \Illuminate\Support\Collection(json_decode(json_encode(simplexml_load_string($this->removeNamespaceFromXML($xml))), true));
    return $collection;
}

Thank you so much @BENderIsGr8te !

1 like
SwissNight's avatar

A year later, @BENderIsGr8te and @mhopkins321 's code keeps doing wonders! Had to parse namespaced XML from some good ol' - err, ancient - Swiss accounting software, and had been trying all possible variants of simplexml_load_string, xml parsers and all the like. It was so painful. So glad I found this post! Thumbs up!

auroralabs's avatar

Hi,

Can you suggest how I can implement this into my workflow? In my controller I have:

public function getSupercontrolDetails()
{
    $client = new Client();
    $response = $client->request('GET', 'https://api.supercontrol.co.uk/xml/property_xml.asp?id=94298&siteID=14978', [
        'headers' => [
            'Accept'     => 'text/xml'
        ]
    ]);
    if ($response->getStatusCode() == 200) {

        $response  = json_decode(json_encode(simplexml_load_string($response->getBody()->getContents())), true);
    }
    dd($response);
}

When I put the result through XML to JSON on the following website https://codebeautify.org/xmlviewer It does a great job of converting the data to JSON yet when I try to achieve this in Laravel I cannot get it to convert. The result I get is

array:5 [▼
  "error" => []
  "property" => array:95 [▼
    "propertyname" => []
    "propertycode" => "94298"
    "flipkeyID" => []
    "clientcode" => []
    "groupID" => "11430"
    "propertyminbookingdays" => "2"
    "propertyaddress" => []
    "propertypostcode" => "KY16 8RQ"
    "longitude" => "-2.831340"
    "latitude" => "56.330508"
    "mapzoom" => "15"
    "country" => "United Kingdom"
    "countryiso" => "GB"
    "nobooking" => "0"
    "regionname" => array:1 [▶]
    "regionname1" => array:1 [▶]
    "regionname2" => array:1 [▶]
    "regionname3" => array:1 [▶]
    "regionname4" => array:1 [▶]
    "typename" => array:1 [▶]

This function looks like the missing part of my workflow but not sure how I would implement this into my function to get the XML to convert to JSON correctly.

Please or to participate in this conversation.