Get the HTML code of a page and parse out the content of <div class="img-wrapper">

I want to load the HTML code of https://www.example.com/ and get the content of the <div class="img-wrapper"> element. I want the raw data: all the HTML tags and text inside it, exactly as they appear in the source.

I've tried DOMXPath, DOMDocument, etc. Each one is either too stupid and can't handle HTML5, or too clever and strips all the HTML tags inside that DIV.

sr57

5 years ago

Level 39

Use https://www.php.net/manual/en/function.file-get-contents.php

See first example

andyandy

5 years ago

Level 4

That gets content of entire page. How do I get content of a single DIV?

sr57

5 years ago

Level 39

You filter out what you want with preg_match

andyandy

5 years ago

Level 4

Really? What if the given DIV contains other DIVs?
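
To illustrate the concern, here is a minimal sketch. The sample markup and variable names are made up for the example and do not come from the real page. A non-greedy pattern stops at the first closing tag, which belongs to the nested DIV, while a greedy pattern runs on to the last closing tag in the input.

```php
<?php
// Hypothetical markup: the target DIV contains a nested DIV and is followed by another DIV.
$html = '<div class="img-wrapper"><div class="inner">A</div><p>B</p></div>'
      . '<div class="footer">C</div>';

// Non-greedy: the match ends at the FIRST </div>, which closes the inner DIV.
preg_match('~<div class="img-wrapper">(.*?)</div>~s', $html, $m);
$lazy = $m[1];

// Greedy: the match ends at the LAST </div>, which closes the unrelated footer DIV.
preg_match('~<div class="img-wrapper">(.*)</div>~s', $html, $m);
$greedy = $m[1];

echo $lazy, PHP_EOL;   // truncated content
echo $greedy, PHP_EOL; // content that spills into the next DIV
```

Neither pattern returns exactly the content of the wrapper, which is why a parser that tracks nesting tends to be the safer choice here.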

sr57

5 years ago

Level 39

Of course, ... give your example ...

andyandy

5 years ago

Best Answer

Level 4

I have downloaded this file (a single file simple_html_dom.php is enough to make this work):

https://sourceforge.net/projects/simplehtmldom/files/

A simple example is here:

https://code.tutsplus.com/tutorials/html-parsing-and-screen-scraping-with-the-simple-html-dom-library--net-11856

This code gets all DIVs with the class "wrap"; you can then, for example, loop over them and append their inner HTML to a variable (here, only the first three):

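        // Load the Simple HTML DOM library.
        require('public/simple_html_dom.php');

        // Fetch and parse the remote page.
        $html = new \simple_html_dom();
        $html->load_file('https://www.example.com/');

        // Select the DIVs with the class "wrap".
        $items = $html->find('div[class=wrap]');

        // Concatenate the inner HTML (tags included) of the first three matches.
        $result = '';
        $i = 0;
        foreach ($items as $item) {
            if ($i == 3) {
                break;
            }
            $i++;
            $result .= $item->innertext;
        }

        return $result;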
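
For comparison, a similar result may be possible with the built-in DOM extension, without an extra library. The following is only a sketch under assumptions: innerHtmlByClass() is a hypothetical helper name, the sample markup is invented, and real pages may need more careful encoding and error handling. It serializes each child node with saveHTML() so the tags inside the DIV are kept, and it silences the libxml warnings that unknown HTML5 tags would otherwise trigger.

```php
<?php
// Hypothetical helper (not from the thread): returns the inner HTML of every
// <div> whose class list contains $class.
function innerHtmlByClass(string $html, string $class): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings (e.g. for HTML5 tags) instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration prefix is a common hint so the input is read as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match the class as a whole word inside the class attribute.
    $xpath = new DOMXPath($doc);
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $results = [];
    foreach ($xpath->query($query) as $div) {
        // Serialize each child node so nested tags are preserved.
        $inner = '';
        foreach ($div->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
        $results[] = $inner;
    }

    return $results;
}

// Usage with made-up markup; a real page could be fetched with file_get_contents().
$sample = '<div class="img-wrapper"><div><img src="a.png" alt="A"></div><p>Caption</p></div>';
$found = innerHtmlByClass($sample, 'img-wrapper');
echo $found[0], PHP_EOL;
```

The serializer may normalize whitespace or attribute quoting, so the output is not guaranteed to be byte-for-byte identical to the source markup.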
