zahedkamal87

How to parse a 2 GB XML file with low memory usage

I have a 2 GB XML file. I tried to use https://github.com/prewk/xml-string-streamer, but I get an "Out of Memory" error. Is there any other way to parse the file quickly while using little memory?

Thank you

Snapey

The problem with XML is that it might only be parseable once it is all in memory.

It really depends how nested the data is.

For instance, if there were a <records> element containing multiple <record> elements, you might be able to process the file sequentially as text, dealing with one record at a time.

Unfortunately, it really does depend on the structure of the XML.
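
A minimal sketch of that "sequential text" idea, for illustration only. The helper name `readRecordsAsText()` is hypothetical, and the sketch assumes that <record> elements are never nested inside one another, are never self-closing, and that the closing tag text never appears inside CDATA or comments. If any of that does not hold for your file, a real XML parser is the safer choice.

```php
<?php
/**
 * Yield each <record>...</record> block from a file as a raw string,
 * reading the file in fixed-size chunks (hypothetical helper).
 * Assumes records are flat: not nested, not self-closing.
 */
function readRecordsAsText(string $path, string $tag = 'record', int $chunkSize = 8192): \Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new \RuntimeException("Unable to open {$path}");
    }

    $close = "</{$tag}>";
    // Matches "<record>" or "<record attr=...>", but not "<records>".
    $openPattern = '/<' . preg_quote($tag, '/') . '[\s>]/';
    // Longest partial opening tag that could be split across two chunks.
    $keep = strlen($tag) + 1;
    $buffer = '';

    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                break;
            }
            $buffer .= $chunk;

            // Emit every complete record currently sitting in the buffer.
            while (true) {
                if (!preg_match($openPattern, $buffer, $m, PREG_OFFSET_CAPTURE)) {
                    // No record start yet: keep only a short tail.
                    if (strlen($buffer) > $keep) {
                        $buffer = substr($buffer, -$keep);
                    }
                    break;
                }
                $start = $m[0][1];
                $end = strpos($buffer, $close, $start);
                if ($end === false) {
                    // Record is incomplete: drop what precedes it, read more.
                    $buffer = substr($buffer, $start);
                    break;
                }
                $end += strlen($close);
                yield substr($buffer, $start, $end - $start);
                $buffer = substr($buffer, $end);
            }
        }
    } finally {
        fclose($handle);
    }
}
```

Each yielded string is small enough to hand to `simplexml_load_string()`, so only one record plus one read chunk should be in memory at any time.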

Robstar

@zahedkamal87 You need to stream the file as opposed to loading it all into memory at once.

Fortunately, I don't have to work with XML too often. However, the last time a large (~400 MB) XML file was forced upon me (~6 months ago), I used https://github.com/prewk/xml-string-streamer

EDIT: lol, I can see you used the same library as I did. I really didn't have any issues with it. Have you tried the different stream providers from the package? For example:

$CHUNK_SIZE = 1024;
$provider = new Prewk\XmlStringStreamer\Stream\File("large-xml-file.xml", $CHUNK_SIZE);
zahedkamal87
OP
Best Answer

I used XMLReader together with DOMDocument. The code works great!

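        $url = "jobs.xml";

        // XMLReader walks the file as a stream instead of loading it whole.
        $z = new \XMLReader;
        if (!$z->open($url)) {
            die("Unable to open " . $url);
        }

        // Scratch document used to import one <job> at a time.
        $doc = new \DOMDocument;

        // Skip ahead to the first <job> element.
        while ($z->read() && $z->name !== 'job');

        echo '<table border="1">';
        while ($z->name === 'job')
        {
            // Expand only the current <job> and wrap it as a SimpleXMLElement.
            $node = simplexml_import_dom($doc->importNode($z->expand(), true));
            echo '<tr>';
            echo '<td>' . $node->url . PHP_EOL . '</td>';
            echo '<td>' . $node->company . PHP_EOL . '</td>';
            echo '<td>' . $node->description . PHP_EOL . '</td>';
            echo '<td>' . $node->title . PHP_EOL . '</td>';
            echo '<td>' . $node->date . '</td>';
            echo '<td>' . $node->city . ' ' . $node->state . ' ' . $node->zip . '</td>';
            echo '</tr>';
            // Jump to the next <job>, skipping the current one's subtree.
            $z->next('job');
        }
        echo '</table>';
        $z->close();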
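
The same XMLReader plus DOMDocument technique can be wrapped in a generator so the streaming logic is separate from the output. This is only a sketch: `streamElements()` is a hypothetical helper name, and it assumes the target elements are siblings, as the <job> elements are in the code above.

```php
<?php
/**
 * Yield every element named $name as a SimpleXMLElement, one at a time
 * (hypothetical helper; assumes the matching elements are siblings).
 */
function streamElements(string $path, string $name): \Generator
{
    $reader = new \XMLReader();
    if (!$reader->open($path)) {
        throw new \RuntimeException("Unable to open {$path}");
    }
    // Scratch document that each expanded node is imported into.
    $doc = new \DOMDocument();

    try {
        // Advance to the first matching element.
        while ($reader->read() && $reader->name !== $name);

        while ($reader->name === $name) {
            // Only the current element's subtree is materialized here.
            yield simplexml_import_dom($doc->importNode($reader->expand(), true));
            // Move to the next sibling with the same name.
            $reader->next($name);
        }
    } finally {
        $reader->close();
    }
}
```

With this in place, the loop becomes `foreach (streamElements('jobs.xml', 'job') as $job) { ... }`, and calling `memory_get_peak_usage()` afterwards is one way to check that memory stays roughly flat as the file grows.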
