
Gabotronix's avatar

Issue scraping an HTML table with Goutte

Hi everybody, I'm trying to scrape an HTML table of cities by population with Goutte in Laravel. I want to return the table as a PHP array, then convert it to JSON and save it to disk.

For some reason, when I crawl the table I get an array full of null values. This is my code:

public function crawlAustraliaHtmlTable(Request $request)
    {
        $html='';
        $client = new Client();
        $url = 'http://www.geoba.se/population.php?cc=AU&st=city_rank_country&asde=&page=1';
        $crawler = $client->request('GET', $url);
        //$crawler->addHTMLContent($html);
        
        $table = $crawler->filter('table')->filter('tr')->each(function ($tr, $i) {
            return $tr->filter('td')->each(function ($td, $i) {
                $td->filter('a')->each(function ($a, $i) {
                    return $a->attr('href');
                });
            });
        });
        
        //print_r($table);

        $json = json_encode($table);

        $filename = 'cities_in_australia.json';

        File::put(public_path('/uploads/'.$filename),$json);

        return response()->json([
            'json' => $json,
        ]);
    }

The result (notice that every value is null):

[[null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],...]]

The html table structure is like this:

<table border=0 cellpadding=3 cellspacing=3 class="table table-condensed table-noline">

<tr style="font-size: 16px;">

<th class="bottom" valign=top width=50 align=left NOWRAP><b><a class=redglow style="color:#0000FF;" href="population.php?cc=AU&st=crcountry&asde=d&page=1" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Sort By Rank'); return false;">Rank</a></b></td>
<th class="bottom" valign=top width=200 align=left NOWRAP><b><a class=redglow style="color:#0000FF;" href="population.php?cc=AU&st=city&asde=d&page=1" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Sort By City'); return false;">City</a></b></td><th class="bottom" valign=top width=125 align=left><b><a class=redglow style="color:#0000FF;" href="population.php?cc=AU&st=state&asde=d&page=1" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Sort By State'); return false;">State</a></b></td><th class="bottom" valign=top width=100 align=left><b>Country</b></td><th class="bottom" valign=top width=75 align=right NOWRAP><b><a class=redglow style="color:#0000FF;" href="population.php?cc=AU&st=pop&asde=d&page=1" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Sort By Population'); return false;">Population</a></b></td>
<td></td>
</tr>

	<tr style="font-size:13px;" class="bb">
	<td valign=top><a name="1"></a>1.</td>
	<td valign=top><a class=redglow style="color:#0000FF;" href="/location.php?query=2158177&geoid=Y" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Melbourne'); return false;">Melbourne</a></td>
	<td valign=top width=150><a class=redglow style="color:#0000FF;" href="population.php?sc=Victoria&state=Victoria" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Victoria'); return false;">Victoria</a></td><td valign=top><a class=redglow style="color:#0000FF;" href="country.php?cc=AU&year=2020" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Australia'); return false;">Australia</a></td>
	<td valign=top align=right>3,730,206</td>
	
	<tr style="font-size:13px;" class="bb">
automica's avatar

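@gabotronix you are missing a return in the td callback. each() collects whatever each callback returns, and a PHP closure without a return statement returns null, so every cell ends up as null.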

should be:

    $table = $crawler->filter('table')->filter('tr')->each(function ($tr, $i) {
        return $tr->filter('td')->each(function ($td, $i) {
            return $td->filter('a')->each(function ($a, $i) {
                return $a->attr('href');
            });
        });
    });

rather than:

        $table = $crawler->filter('table')->filter('tr')->each(function ($tr, $i) {
            return $tr->filter('td')->each(function ($td, $i) {
                $td->filter('a')->each(function ($a, $i) {
                    return $a->attr('href');
                });
            });
        });
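
To see why the missing return produces nulls, here is a minimal sketch in plain PHP that does not use Goutte at all. The each_item() helper is a hypothetical stand-in that only imitates how each() gathers callback return values, and the $row data is made up from two hrefs in the table above.

```php
<?php
// Hypothetical stand-in for each(): collect whatever the callback
// returns for every item, in order.
function each_item(array $items, callable $callback): array
{
    $results = [];
    foreach ($items as $i => $item) {
        $results[] = $callback($item, $i);
    }
    return $results;
}

// Made-up data: one table row with two cells, each holding a list of hrefs.
$row = [
    ['/location.php?query=2158177&geoid=Y'],
    ['population.php?sc=Victoria&state=Victoria'],
];

// No return in the cell callback: the inner array is built and then
// thrown away, so the closure returns null for every cell.
$withoutReturn = each_item($row, function ($cell, $i) {
    each_item($cell, function ($href, $i) {
        return $href;
    });
});

// With return, the inner array is passed back up to the outer call.
$withReturn = each_item($row, function ($cell, $i) {
    return each_item($cell, function ($href, $i) {
        return $href;
    });
});

echo json_encode($withoutReturn), PHP_EOL; // [null,null]
echo json_encode($withReturn), PHP_EOL;
```

The same thing happens at every nesting level, so each callback in the chain needs its own return.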
