Puppeteer inside a loop: performance question, can I reuse browser instances?

Posted 4 weeks ago by Gabotronix

Hi everybody, I'm making a scraping app with headless Puppeteer, and I have an array of URLs that all come from a single domain.

For each URL I launch a browser and check whether I'm logged in. If not, I log in and write the cookies to disk for the next iteration of the loop (this is so I don't have to log in each time).
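A minimal sketch of the cookie-caching idea described above, assuming the cookies live in a JSON file such as `./cookies.json`. The helper names `loadCookies` and `saveCookies` are hypothetical and do not appear in the original code; only Node's built-in `fs` module is used.

```javascript
const fs = require('fs');

// Hypothetical helper: return previously saved cookies,
// or an empty array if nothing usable has been saved yet.
function loadCookies(file) {
    try {
        const parsed = JSON.parse(fs.readFileSync(file, 'utf8'));
        return Array.isArray(parsed) ? parsed : [];
    } catch (err) {
        // Missing or unreadable file: treat it as "not logged in yet".
        return [];
    }
}

// Hypothetical helper: persist cookies (e.g. the array returned by
// page.cookies()) so that the next iteration can skip the login.
function saveCookies(file, cookies) {
    fs.writeFileSync(file, JSON.stringify(cookies, null, 2));
}
```

With helpers like these, an empty array from `loadCookies` would mean "log in and then call `saveCookies`", while a non-empty array could be passed to `page.setCookie(...cookies)`.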

After that I do some classic scraping, close the page, and close the browser. Then the loop moves on to the next URL and does the same thing again.

I was wondering whether I can improve the performance and overall quality of this. I think opening and closing a Chrome instance on every iteration is pretty CPU intensive, and I don't want memory leaks either. Can I alter my code to reuse the browser or page instances across the loop?

async scrapeForResults() {
    let urls = [ ... ];

    for (let i = 0; i < urls.length; i++) {
        // Each call launches and closes its own browser
        let response = await this.scrapWebForData(urls[i]);
        console.log('Contact info scraped - - - - - - - -');
    }
}

This is the Puppeteer part. I run it for each URL in my array, and all URLs are from the same domain:

async scrapWebForData(url) {
    // A new Chrome instance is launched for every URL
    let browser = await puppeteer.launch({ headless: true });

    const context = browser.defaultBrowserContext();
    await context.overridePermissions("", []);
    let page = await browser.newPage();
    await page.setDefaultNavigationTimeout(100000);
    await page.setViewport({ width: 1365, height: 623 });
    if (!Object.keys(cookies).length) {
        // No saved cookies yet: log in here (steps not shown), then save the session cookies
        let currentCookies = await page.cookies();
        fs.writeFileSync('./cookies.json', JSON.stringify(currentCookies));
    } else {
        // Saved cookies exist: restore the session and scrape
        await page.setCookie(...cookies);
        await page.goto(facebook, { waitUntil: "networkidle2" });
        //DO ACTION HERE
    }
    await page.close();
    await browser.close();
}
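One possible shape for the reuse asked about above is to launch the browser once, open a fresh page per URL, and close the browser only after the loop ends. The following is a minimal sketch under assumptions, not a tested drop-in replacement: `scrapeAll`, `launchBrowser`, and `scrapeOne` are hypothetical names. `launchBrowser` is expected to be something like `() => puppeteer.launch({ headless: true })`, and `scrapeOne(page, url)` would hold the per-URL scraping logic.

```javascript
// Sketch: one browser for the whole run, one short-lived page per URL.
// `launchBrowser` and `scrapeOne` are supplied by the caller (hypothetical names).
async function scrapeAll(urls, launchBrowser, scrapeOne) {
    const browser = await launchBrowser();
    const results = [];
    try {
        for (const url of urls) {
            const page = await browser.newPage();
            try {
                results.push(await scrapeOne(page, url));
            } finally {
                // Close the page even if scraping throws, so pages don't pile up.
                await page.close();
            }
        }
    } finally {
        // The browser is closed exactly once, after all URLs are processed.
        await browser.close();
    }
    return results;
}
```

Because pages opened in the same browser context share cookies, a login performed on the first page should carry over to later pages in the same run. The `try`/`finally` blocks are there so that a failure on one URL does not leave a page or the Chrome process open.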
