You are running your scraper too often. There is probably a limit on how many requests you can make per minute, so you need to put some sleep time between the calls.
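The advice above can be sketched in PHP as a small throttle that enforces a minimum gap between requests. This is a minimal sketch, not part of Guzzle: the `Throttle` class is hypothetical, and the interval you pass in is a guess, because the site's actual limit is unknown.

```php
<?php

// Hypothetical helper (not part of Guzzle): enforces a minimum gap between calls.
// The interval is an assumption; the site's real rate limit is not known.
final class Throttle
{
    private float $minIntervalSeconds;
    private ?float $lastCall = null;

    public function __construct(float $minIntervalSeconds)
    {
        $this->minIntervalSeconds = $minIntervalSeconds;
    }

    // Sleeps until at least $minIntervalSeconds have passed since the previous wait().
    // The very first call returns immediately.
    public function wait(): void
    {
        if ($this->lastCall !== null) {
            $elapsed = microtime(true) - $this->lastCall;
            $remaining = $this->minIntervalSeconds - $elapsed;
            if ($remaining > 0) {
                usleep((int) round($remaining * 1000000));
            }
        }
        $this->lastCall = microtime(true);
    }
}
```

To use it, create one instance (for example `new Throttle(5.0)` for at most one request every five seconds) and call `$throttle->wait()` immediately before each `$client->request(...)` in the scraping loop.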
Jan 12, 2021
cURL always gets 429 Too Many Requests
Hi All,
Can anyone help me? I'm trying to scrape a site (https://www.realestate.com.au/property/28-athol-ct-rye-vic-3941). I'm using Guzzle with this code:
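require __DIR__ . '/vendor/autoload.php'; // Composer autoloader for Guzzle

$propertyPageLink = 'https://www.realestate.com.au/property/28-athol-ct-rye-vic-3941';

$client = new \GuzzleHttp\Client();
$jar = new \GuzzleHttp\Cookie\CookieJar();

$res = $client->request('GET', $propertyPageLink, [
    'verify'  => false, // skips TLS certificate verification
    'cookies' => $jar,  // stores cookies the server sets (reuse $jar across requests)
    'debug'   => false,
    'headers' => [
        'Host' => 'www.realestate.com.au',
        'Pragma' => 'no-cache',
        'Upgrade-Insecure-Requests' => '1',
        'User-Agent' => 'your bot 0.1',
        'Connection' => 'keep-alive',
        'Cache-Control' => 'no-cache',
        'Accept-Language' => 'en-US,en;q=0.5',
        'Accept-Encoding' => 'gzip, deflate, br',
        'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    ],
]);

$pageRes = (string) $res->getBody(); // response HTML as a string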
But I always get a 429 Too Many Requests response.
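Retrying immediately after a 429 only adds to the request count. A minimal sketch of retrying with exponential backoff, assuming a hypothetical `$send` callable that performs one request and returns its status code. With Guzzle, that could be the request above with the `'http_errors' => false` option (so a 429 is returned instead of thrown) followed by `$res->getStatusCode()`. Backoff only helps if the 429 really is a time-based limit; it will not get past a block that rejects every request.

```php
<?php

// Hypothetical helper: calls $send until it returns something other than 429,
// sleeping base * 2^attempt seconds (capped at $capSeconds) between attempts.
// $send performs one request and returns its HTTP status code.
// $sleep is injectable so the delays can be observed without real waiting.
function sendWithBackoff(
    callable $send,
    int $maxAttempts = 5,
    int $baseSeconds = 1,
    int $capSeconds = 60,
    ?callable $sleep = null
): int {
    $sleep = $sleep ?? 'sleep';
    $status = 0;
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $status = $send();
        if ($status !== 429) {
            return $status;
        }
        // Do not sleep after the final attempt; there is nothing left to retry.
        if ($attempt < $maxAttempts - 1) {
            $sleep(min($capSeconds, $baseSeconds * (2 ** $attempt)));
        }
    }
    return $status; // still 429 after all attempts
}
```

Doubling the delay spreads retries out quickly, and the cap keeps a long run of 429s from producing multi-minute waits.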
Then I tried a basic cURL call in my terminal, and the response was still a 429 status code. Here is the output:
λ curl -v https://www.realestate.com.au/property/28-athol-ct-rye-vic-3941
* Trying 23.15.154.230...
* TCP_NODELAY set
* Connected to www.realestate.com.au (23.15.154.230) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: C:\laragon\bin\laragon\utils\curl-ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: C=AU; ST=Victoria; L=Richmond; O=REALESTATE.COM.AU PTY LIMITED; CN=www.realestate.com.au
* start date: Dec 13 00:00:00 2020 GMT
* expire date: Dec 16 23:59:59 2021 GMT
* subjectAltName: host "www.realestate.com.au" matched cert's "www.realestate.com.au"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x1c0f070)
> GET /property/28-athol-ct-rye-vic-3941 HTTP/2
> Host: www.realestate.com.au
> User-Agent: curl/7.63.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 429
< cache-control: no-cache, no-store, must-revalidate
< expires: 0
< pragma: no-cache
< content-length: 0
< date: Tue, 12 Jan 2021 08:46:02 GMT
< set-cookie: ak_bmscz=20W5Ha7wV03bFXKJ6bkSFw%3D%3D%3A%3AFZ0hcopjXa6IlnJoX3vC42w39dpn9YB0eo6%2BiMKpIeI7L7Ohxsk06Uws1AGd755BPaIY1n0UlZ5U5a8vEBPYYZTB8rIRcSezzr3ONpT1Jy4gsJLx69HoxaLgNl7D%2B5Z3vulvxROEnf1gftSLg27H72x2HA4th7mCS6TMNHQe%2B5dcD4A7Xp%2FqdFzHqL2CXEDUxJlS9timUbgGNefKNxlf5cXWIc8dCYexxDBR2hp6h%2FxyLNZXHfmCxxh4gfS3Plk8znMLNbYkde7eCjPbKlbB%2F7XerGNUIev5e5MZWwabGDx5uBQ7Thfw%2FW%2FIGzIRfZtqssBIrq8WivdT9JJrPK41vcvhPD%2Fk%2BCvCKYiHWDtWi2U%2BOBcAkd4xLePSslUZaanjJt8O%2BeltoluBDP2rs2OgI6%2BMJEcfXDCGze7c%2Fma4kOU%3D; Path=/; Expires=Wed, 13 Jan 2021 08:45:02 GMT; HttpOnly
< set-cookie: reauid=92e83217bb4d0000ca61fd5f2701000022930100; expires=Mon, 31-Dec-2038 23:59:59 GMT; path=/; domain=.realestate.com.au
< set-cookie: Country=SG; path=/; domain=.realestate.com.au
< content-security-policy-report-only: default-src https: data: ; script-src https: data: 'unsafe-inline' 'unsafe-eval' ; style-src https: data: 'unsafe-inline' 'unsafe-eval' ; object-src 'none' ; plugin-types 'none' ; report-uri /csp-report/http-v1/a0de66688d0f66fc8ddfe327c2248a30
< content-security-policy: upgrade-insecure-requests;
<
* Connection #0 to host www.realestate.com.au left intact
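RFC 6585 says a 429 response MAY include a `Retry-After` header telling the client how long to wait. The response above has none, so a client has to choose its own delay. A minimal sketch of a hypothetical helper that honours `Retry-After` when present (either the delay-seconds form or the HTTP-date form defined in RFC 7231) and otherwise returns a caller-supplied fallback. With Guzzle, the raw value can be read with `$res->getHeaderLine('Retry-After')`, which returns an empty string when the header is absent.

```php
<?php

// Hypothetical helper: seconds to wait before retrying.
// $retryAfter is the raw Retry-After header value; null or '' means "absent".
// $now can be injected for testing; it defaults to the current Unix time.
function retryAfterSeconds(?string $retryAfter, int $fallbackSeconds, ?int $now = null): int
{
    $now = $now ?? time();
    if ($retryAfter === null || trim($retryAfter) === '') {
        return $fallbackSeconds; // header absent, as in the 429 above
    }
    $value = trim($retryAfter);
    if (preg_match('/^\d+$/', $value) === 1) {
        return (int) $value; // delay-seconds form, e.g. "120"
    }
    $timestamp = strtotime($value); // HTTP-date form, e.g. "Tue, 12 Jan 2021 08:50:00 GMT"
    if ($timestamp === false) {
        return $fallbackSeconds; // unparseable value
    }
    return max(0, $timestamp - $now); // a date in the past means "retry now"
}
```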
I've tried to find a solution but I'm still not sure how. Can anyone give me a clue how to get the HTML so I can scrape the data?
Thanks :)