
Python Pyppeteer ScraperAPI Integration

In this guide, I will show you how to use ScraperAPI with Python’s Pyppeteer library for headless browser automation. I will walk you through the integration step by step and point out common pitfalls so you can get scraping as quickly as possible.

To use Pyppeteer with ScraperAPI, configure the browser to route its traffic through our proxy servers directly, just as you would with any other proxy.

First, install Pyppeteer:

pip install pyppeteer

Here’s how to configure your Pyppeteer browser to route all requests through ScraperAPI:

import asyncio
from pyppeteer import launch

API_KEY = 'YOUR_API_KEY'

async def main():
    # Launch Chromium with all traffic routed through the ScraperAPI proxy
    browser = await launch({
        'args': [
            '--proxy-server=http://proxy-server.scraperapi.com:8001'
        ]
    })

    page = await browser.newPage()

    # Authenticate against the proxy: the username is always 'scraperapi'
    # and the password is your API key
    await page.authenticate({
        'username': 'scraperapi',
        'password': API_KEY
    })

    await page.goto('http://quotes.toscrape.com/')

    # Extract the text and author from each .quote element on the page
    quotes = await page.evaluate('''() => {
        return Array.from(document.querySelectorAll('.quote')).map(quote => ({
            text: quote.querySelector('.text').innerText,
            author: quote.querySelector('.author').innerText
        }));
    }''')

    print(quotes)
    await browser.close()

asyncio.run(main())

And that’s it. Every request Pyppeteer makes will now go through our proxy servers, and you can use Pyppeteer as you normally would.
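
For instance, clicks and navigation work unchanged through the proxy. Here is a minimal sketch that could go inside main() above, following the “Next” pagination link (the li.next a selector is an assumption based on quotes.toscrape.com’s markup; adjust it for your target site):

    # Click the "Next" link and wait for the resulting navigation;
    # gathering both avoids a race between the click and the page load
    await asyncio.gather(
        page.waitForNavigation(),
        page.click('li.next a'),
    )
    print(page.url)  # e.g. http://quotes.toscrape.com/page/2/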

Here is an example result:

[
  {
    "text": "“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”",
    "author": "Albert Einstein"
  },
  {
    "text": "“It is our choices, Harry, that show what we truly are, far more than our abilities.”",
    "author": "J.K. Rowling"
  },
  {
    "text": "“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”",
    "author": "Albert Einstein"
  },
  {
    "text": "“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”",
    "author": "Jane Austen"
  },
  {
    "text": "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”",
    "author": "Marilyn Monroe"
  },
  {
    "text": "“Try not to become a man of success. Rather become a man of value.”",
    "author": "Albert Einstein"
  },
  {
    "text": "“It is better to be hated for what you are than to be loved for what you are not.”",
    "author": "André Gide"
  },
  {
    "text": "“I have not failed. I've just found 10,000 ways that won't work.”",
    "author": "Thomas A. Edison"
  },
  {
    "text": "“A woman is like a tea bag; you never know how strong it is until it's in hot water.”",
    "author": "Eleanor Roosevelt"
  },
  {
    "text": "“A day without sunshine is like, you know, night.”",
    "author": "Steve Martin"
  }
]
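
Since page.evaluate hands the data back as plain Python lists and dicts, persisting it is straightforward. As a small example, here is a sketch that writes the quotes to a JSON file (the quotes.json filename is just an example):

import json

# ensure_ascii=False keeps the curly quotes and accented names readable
with open('quotes.json', 'w', encoding='utf-8') as f:
    json.dump(quotes, f, ensure_ascii=False, indent=2)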

Alternative Method: Using the SDK

You can also use our Python SDK for simpler integration:

First, install the ScraperAPI SDK:

pip install scraperapi-sdk

Example

import asyncio
from pyppeteer import launch
from scraperapi_sdk import ScraperAPIClient

API_KEY = 'YOUR_API_KEY'
client = ScraperAPIClient(API_KEY)

async def main():
    browser = await launch(headless=True)
    page = await browser.newPage()

    # Fetch the rendered HTML through ScraperAPI, then load it into the page
    html = client.get('http://quotes.toscrape.com/')
    await page.setContent(html)

    # Extract the text and author from each .quote element
    quotes = await page.evaluate('''() => {
        return Array.from(document.querySelectorAll('.quote')).map(quote => ({
            text: quote.querySelector('.text').innerText,
            author: quote.querySelector('.author').innerText
        }));
    }''')

    print(quotes)
    await browser.close()

asyncio.run(main())
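
One thing to keep in mind: client.get is a synchronous call, so it blocks the event loop while the request is in flight. If you combine the SDK with concurrent coroutines, you can offload the call to a worker thread instead. A minimal sketch using asyncio.to_thread (available in Python 3.9+):

    # Run the blocking SDK call in a thread so other coroutines keep running
    html = await asyncio.to_thread(client.get, 'http://quotes.toscrape.com/')
    await page.setContent(html)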

Using Additional ScraperAPI Functionality

ScraperAPI lets you customize the API’s functionality by adding additional parameters to your requests. For example, you can use premium proxies by setting custom headers:

await page.setExtraHTTPHeaders({
    'X-ScraperAPI-Premium': 'true',
    'X-ScraperAPI-Country': 'us',
    'X-ScraperAPI-Session': '123'
})

The API will accept the following parameters via headers:

Parameter        Header                  Description
premium          X-ScraperAPI-Premium    Activate premium residential and mobile IPs by setting it to `true`
country_code     X-ScraperAPI-Country    Activate country geotargeting by setting it to `us`
session_number   X-ScraperAPI-Session    Reuse the same proxy by setting it to `123`
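
Putting this together with the proxy-mode setup from earlier, the headers should be set before the page navigates. A minimal sketch (the parameter values here are just examples):

    page = await browser.newPage()
    await page.authenticate({'username': 'scraperapi', 'password': API_KEY})

    # Apply ScraperAPI parameters to this page's subsequent requests
    await page.setExtraHTTPHeaders({
        'X-ScraperAPI-Premium': 'true',
        'X-ScraperAPI-Country': 'us'
    })

    await page.goto('http://quotes.toscrape.com/')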

Configuring Concurrency and Retries

To get the most out of your ScraperAPI plan, configure your concurrent browser limit based on your plan’s thread allowance:

import asyncio
from pyppeteer import launch

API_KEY = 'YOUR_API_KEY'
CONCURRENT_BROWSERS = 5  # Free Plan has 5 concurrent threads

async def scrape_page(url):
    browser = await launch({
        'args': ['--proxy-server=http://proxy-server.scraperapi.com:8001']
    })

    page = await browser.newPage()
    await page.authenticate({
        'username': 'scraperapi',
        'password': API_KEY
    })

    await page.goto(url)

    quotes = await page.evaluate('''() => {
        return Array.from(document.querySelectorAll('.quote')).map(quote => ({
            text: quote.querySelector('.text').innerText,
            author: quote.querySelector('.author').innerText
        }));
    }''')

    await browser.close()
    return {'url': url, 'quotes': quotes}

async def scrape_multiple_pages(urls):
    # The semaphore caps how many browsers run at once, keeping you
    # within your plan's concurrency limit
    semaphore = asyncio.Semaphore(CONCURRENT_BROWSERS)

    async def scrape_with_semaphore(url):
        async with semaphore:
            return await scrape_page(url)

    tasks = [scrape_with_semaphore(url) for url in urls]
    results = await asyncio.gather(*tasks)
    print(results)

# Usage
urls = [
    'http://quotes.toscrape.com/page/1/',
    'http://quotes.toscrape.com/page/2/',
    'http://quotes.toscrape.com/page/3/',
]

asyncio.run(scrape_multiple_pages(urls))
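
Note that this launches a full Chromium instance per URL, which is heavy. As a lighter variation, you can share one browser and open a tab per URL instead. A sketch of that approach (scrape_multiple_pages_shared is a hypothetical helper of mine, with a simplified extraction; each tab still authenticates against the proxy individually):

async def scrape_multiple_pages_shared(urls):
    # One browser, many tabs: cheaper than launching Chromium per URL
    browser = await launch({
        'args': ['--proxy-server=http://proxy-server.scraperapi.com:8001']
    })
    semaphore = asyncio.Semaphore(CONCURRENT_BROWSERS)

    async def scrape_tab(url):
        async with semaphore:
            page = await browser.newPage()
            await page.authenticate({
                'username': 'scraperapi',
                'password': API_KEY
            })
            await page.goto(url)
            # Simplified extraction: raw text of each quote block
            quotes = await page.evaluate(
                "() => Array.from(document.querySelectorAll('.quote')).map(q => q.innerText)"
            )
            await page.close()
            return {'url': url, 'quotes': quotes}

    results = await asyncio.gather(*(scrape_tab(u) for u in urls))
    await browser.close()
    return results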

Retry Failed Requests

For most sites, over 97% of your requests will be successful on the first try. However, some requests may fail. Implement retry logic to handle these cases:

import asyncio
from pyppeteer import launch

API_KEY = 'YOUR_API_KEY'
MAX_RETRIES = 3

async def scrape_with_retry(url, retries=MAX_RETRIES):
    for i in range(retries):
        browser = None
        try:
            browser = await launch({
                'args': ['--proxy-server=http://proxy-server.scraperapi.com:8001']
            })

            page = await browser.newPage()
            await page.authenticate({
                'username': 'scraperapi',
                'password': API_KEY
            })

            await page.goto(url, {'timeout': 60000})  # 60 second timeout

            quotes = await page.evaluate('''() => {
                return Array.from(document.querySelectorAll('.quote')).map(quote => ({
                    text: quote.querySelector('.text').innerText,
                    author: quote.querySelector('.author').innerText
                }));
            }''')

            return quotes

        except Exception as e:
            print(f"Attempt {i + 1} failed: {e}")
            if i == retries - 1:
                raise
        finally:
            # Always close the browser, even when the attempt failed
            if browser is not None:
                await browser.close()

# Usage
async def main():
    quotes = await scrape_with_retry('http://quotes.toscrape.com/')
    print(quotes)

asyncio.run(main())
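
If failures are transient (timeouts, rate limits), retrying immediately may just fail again. A small addition you could make inside the except branch above is exponential backoff with jitter:

import random

# Inside the except branch, before the next attempt:
if i < retries - 1:
    delay = (2 ** i) + random.random()  # ~1s, ~2s, ~4s, plus jitter
    print(f"Retrying in {delay:.1f}s...")
    await asyncio.sleep(delay)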

Final Notes

  • Pyppeteer + ScraperAPI = scalable, JS-capable scraping with proxy rotation
  • Method 1 (proxy mode) is best for interactive/page-script scraping
  • Method 2 (SDK) is great for HTML snapshots + fast DOM parsing
  • Configure concurrency and retry based on your plan
  • Works with JavaScript-heavy sites thanks to Pyppeteer

More Resources

ScraperAPI Docs  

Pyppeteer Docs

How to Use Pyppeteer in Python for Web Scraping
