Puppeteer inside a loop: can I reuse browser instances to improve performance?
Hi everybody, I'm building a scraping app with headless Puppeteer. I have an array of URLs, all from a single domain.
For each URL I launch a browser and check whether I'm logged in. If not, I log in and write the cookies to disk for the next iteration of the loop (this is so I don't have to log in each time).
After that I do some classic scraping, close the page and close the browser, then move on to the next iteration of the loop and do the same again.
I was wondering whether I can improve the performance and overall quality of this. I think opening and closing a Chrome instance on every iteration is pretty CPU intensive, and I don't want memory leaks either. Can I alter my code to reuse the browser or page instances across iterations?
async scrapeForResults()
{
    let urls = [ ... ];
    for (let i = 0; i < urls.length; i++)
    {
        // Each call launches and closes its own browser instance.
        let response = await this.scrapWebForData(urls[i]);
        console.log('Contact info scraped - - - - - - - -');
    }
}
This is the Puppeteer routine I run for each URL in the array (all URLs are from the same domain):
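async scrapWebForData(url)
{
    // `cookies` (loaded from ./cookies.json) and `facebook` (the target URL)
    // are defined elsewhere in my script.
    let browser = await puppeteer.launch({ headless: true });
    const context = browser.defaultBrowserContext();
    // overridePermissions returns a promise, so wait for it to finish.
    await context.overridePermissions("https://www.facebook.com", []);
    let page = await browser.newPage();
    page.setDefaultNavigationTimeout(100000);
    await page.setViewport({ width: 1365, height: 623 });
    if (!Object.keys(cookies).length)
    {
        // Not logged in yet: log in, then save the cookies for the next run.
        // Awaited so the cookies are read only after the login has finished
        // (handleLoginCookies is async).
        await this.handleLoginCookies(page);
        let currentCookies = await page.cookies();
        fs.writeFileSync('./cookies.json', JSON.stringify(currentCookies));
    }
    else
    {
        // Already have cookies: restore the session and scrape.
        await page.setCookie(...cookies);
        await page.goto(facebook, { waitUntil: "networkidle2" });
        //DO ACTION HERE
    }
    await page.close();
    await browser.close();
}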
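To make the question concrete, here is a rough sketch of the structure I have in mind, not something I have verified against my real pages. It launches the browser once, reuses a single page for every URL, and closes the browser in a finally block so a failing URL cannot leave a Chrome process behind. The names scrapeAll, launch and scrapePage are hypothetical: launch stands in for a call such as () => puppeteer.launch({ headless: true }), and scrapePage for my per-URL logic. My assumption is that, because the same browser session is kept open, the login cookies would stay in place between navigations and I would only need to log in once.

```javascript
// Sketch: one browser and one page shared by all URLs.
// `launch` and `scrapePage` are hypothetical parameters supplied by the caller.
async function scrapeAll(urls, launch, scrapePage) {
  const browser = await launch(); // launched once, outside the loop
  const results = [];
  try {
    const page = await browser.newPage(); // single page reused for every URL
    for (const url of urls) {
      try {
        results.push({ url, data: await scrapePage(page, url) });
      } catch (err) {
        // Record the failure and carry on with the same page.
        results.push({ url, error: err.message });
      }
    }
  } finally {
    // Runs even if something above throws, so the browser is always closed.
    await browser.close();
  }
  return results;
}
```

I would then call it as scrapeAll(urls, () => puppeteer.launch({ headless: true }), myScrapeFunction), where myScrapeFunction(page, url) does the page.goto and the scraping. Is this a reasonable direction, or is there a better pattern?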