In this guide, I’ll show you how to use ScraperAPI with Python’s Pyppeteer library for headless browser automation. I’ll walk you through the integration step by step and point out the common pitfalls, so you can get scraping as quickly as possible.
Recommended Method: Use Proxy Configuration
To use Pyppeteer with ScraperAPI correctly, configure the browser to route its traffic through our proxy servers, just as you would with any other proxy.
First, install Pyppeteer:
pip install pyppeteer
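Note that Pyppeteer downloads its own Chromium build the first time you call launch(). If you’d rather fetch it ahead of time, the package ships a helper command for that:

pyppeteer-install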
Here’s how to configure your Pyppeteer browser to route all requests through ScraperAPI:
import asyncio
from pyppeteer import launch

API_KEY = 'YOUR_API_KEY'

async def main():
    # Route all browser traffic through the ScraperAPI proxy port
    browser = await launch({
        'args': [
            '--proxy-server=http://proxy-server.scraperapi.com:8001'
        ]
    })
    page = await browser.newPage()
    # Authenticate against the proxy with your API key as the password
    await page.authenticate({
        'username': 'scraperapi',
        'password': API_KEY
    })
    await page.goto('http://quotes.toscrape.com/')
    # Extract each quote's text and author from the rendered DOM
    quotes = await page.evaluate('''() => {
        return Array.from(document.querySelectorAll('.quote')).map(quote => ({
            text: quote.querySelector('.text').innerText,
            author: quote.querySelector('.author').innerText
        }));
    }''')
    print(quotes)
    await browser.close()

asyncio.run(main())
And that’s it. Every request Pyppeteer makes now goes through our proxy servers, and you can use Pyppeteer exactly as you normally would.
Here is an example result:
[
{
"text": "“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”",
"author": "Albert Einstein"
},
{
"text": "“It is our choices, Harry, that show what we truly are, far more than our abilities.”",
"author": "J.K. Rowling"
},
{
"text": "“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”",
"author": "Albert Einstein"
},
{
"text": "“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”",
"author": "Jane Austen"
},
{
"text": "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”",
"author": "Marilyn Monroe"
},
{
"text": "“Try not to become a man of success. Rather become a man of value.”",
"author": "Albert Einstein"
},
{
"text": "“It is better to be hated for what you are than to be loved for what you are not.”",
"author": "André Gide"
},
{
"text": "“I have not failed. I've just found 10,000 ways that won't work.”",
"author": "Thomas A. Edison"
},
{
"text": "“A woman is like a tea bag; you never know how strong it is until it's in hot water.”",
"author": "Eleanor Roosevelt"
},
{
"text": "“A day without sunshine is like, you know, night.”",
"author": "Steve Martin"
}
]
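If you want to confirm that traffic really is routed through ScraperAPI, a quick sanity check is to load an IP-echo service and print the address the target site sees. I’m using http://httpbin.org/ip here purely as an example; any similar service works:

import asyncio
from pyppeteer import launch

API_KEY = 'YOUR_API_KEY'

async def check_ip():
    # Same proxy configuration as the example above
    browser = await launch({
        'args': ['--proxy-server=http://proxy-server.scraperapi.com:8001']
    })
    page = await browser.newPage()
    await page.authenticate({'username': 'scraperapi', 'password': API_KEY})
    # httpbin echoes the caller's IP; it should differ from your own
    await page.goto('http://httpbin.org/ip')
    print(await page.evaluate('() => document.body.innerText'))
    await browser.close()

asyncio.run(check_ip())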
Alternative Method: Using The SDK
You can also use our Python SDK for simpler integration:
First, install the ScraperAPI SDK:
pip install scraperapi-sdk
Here’s an example that fetches the rendered HTML through the API and loads it into Pyppeteer for parsing:
import asyncio
from pyppeteer import launch
from scraperapi_sdk import ScraperAPIClient

API_KEY = 'YOUR_API_KEY'
client = ScraperAPIClient(API_KEY)

async def main():
    browser = await launch(headless=True)
    page = await browser.newPage()
    # Fetch the page through the ScraperAPI endpoint, then load the
    # returned HTML into the headless browser for DOM parsing
    html = client.get('http://quotes.toscrape.com/')
    await page.setContent(html)
    quotes = await page.evaluate('''() => {
        return Array.from(document.querySelectorAll('.quote')).map(quote => ({
            text: quote.querySelector('.text').innerText,
            author: quote.querySelector('.author').innerText
        }));
    }''')
    print(quotes)
    await browser.close()

asyncio.run(main())
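The SDK can also forward ScraperAPI parameters with each request. Here’s a minimal sketch, assuming the params keyword from the scraperapi-sdk documentation (check your SDK version), that would replace the client.get() line above:

# Assumption: the SDK forwards API parameters via a `params` dict
html = client.get(
    'http://quotes.toscrape.com/',
    params={'country_code': 'us', 'premium': 'true'}
)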
Using Additional ScraperAPI Functionality
ScraperAPI lets you customize its functionality by adding extra parameters to your requests. In proxy mode these are passed as custom headers; for example, you can enable premium proxies, geotargeting, and sticky sessions like this:
await page.setExtraHTTPHeaders({
    'X-ScraperAPI-Premium': 'true',
    'X-ScraperAPI-Country': 'us',
    'X-ScraperAPI-Session': '123'
})
The API accepts the following parameters via headers:

| Parameter | Header | Description |
| --- | --- | --- |
| `premium` | `X-ScraperAPI-Premium` | Set to `true` to activate premium residential and mobile IPs |
| `country_code` | `X-ScraperAPI-Country` | Set to a country code such as `us` to activate geotargeting |
| `session_number` | `X-ScraperAPI-Session` | Set to a session ID such as `123` to reuse the same proxy across requests |
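Headers only apply to navigations made after they are set, so call setExtraHTTPHeaders() before page.goto(). Here’s a minimal sketch of where the call fits in the proxy setup from earlier:

page = await browser.newPage()
await page.authenticate({'username': 'scraperapi', 'password': API_KEY})
# Set the ScraperAPI headers before navigating so they apply to the request
await page.setExtraHTTPHeaders({
    'X-ScraperAPI-Country': 'us',
    'X-ScraperAPI-Session': '123'
})
await page.goto('http://quotes.toscrape.com/')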
Configuring Concurrency and Retries
To get the most out of your ScraperAPI plan, configure your concurrent browser limit based on your plan’s thread allowance:
import asyncio
from pyppeteer import launch

API_KEY = 'YOUR_API_KEY'
CONCURRENT_BROWSERS = 5  # the Free Plan allows 5 concurrent threads

async def scrape_page(url):
    browser = await launch({
        'args': ['--proxy-server=http://proxy-server.scraperapi.com:8001']
    })
    page = await browser.newPage()
    await page.authenticate({
        'username': 'scraperapi',
        'password': API_KEY
    })
    await page.goto(url)
    quotes = await page.evaluate('''() => {
        return Array.from(document.querySelectorAll('.quote')).map(quote => ({
            text: quote.querySelector('.text').innerText,
            author: quote.querySelector('.author').innerText
        }));
    }''')
    await browser.close()
    return {'url': url, 'quotes': quotes}

async def scrape_multiple_pages(urls):
    # Cap the number of simultaneous browsers at your plan's thread limit
    semaphore = asyncio.Semaphore(CONCURRENT_BROWSERS)

    async def scrape_with_semaphore(url):
        async with semaphore:
            return await scrape_page(url)

    tasks = [scrape_with_semaphore(url) for url in urls]
    results = await asyncio.gather(*tasks)
    print(results)

# Usage
urls = [
    'http://quotes.toscrape.com/page/1/',
    'http://quotes.toscrape.com/page/2/',
    'http://quotes.toscrape.com/page/3/',
]
asyncio.run(scrape_multiple_pages(urls))
Retry Failed Requests
For most sites, over 97% of your requests will be successful on the first try. However, some requests may fail. Implement retry logic to handle these cases:
import asyncio
from pyppeteer import launch

API_KEY = 'YOUR_API_KEY'
MAX_RETRIES = 3

async def scrape_with_retry(url, retries=MAX_RETRIES):
    for i in range(retries):
        browser = await launch({
            'args': ['--proxy-server=http://proxy-server.scraperapi.com:8001']
        })
        try:
            page = await browser.newPage()
            await page.authenticate({
                'username': 'scraperapi',
                'password': API_KEY
            })
            await page.goto(url, {'timeout': 60000})  # 60-second timeout
            quotes = await page.evaluate('''() => {
                return Array.from(document.querySelectorAll('.quote')).map(quote => ({
                    text: quote.querySelector('.text').innerText,
                    author: quote.querySelector('.author').innerText
                }));
            }''')
            return quotes
        except Exception as e:
            print(f"Attempt {i + 1} failed: {e}")
            if i == retries - 1:
                raise
        finally:
            # Always close the browser, even when an attempt fails,
            # so failed retries don't leak Chromium processes
            await browser.close()

# Usage
async def main():
    quotes = await scrape_with_retry('http://quotes.toscrape.com/')
    print(quotes)

asyncio.run(main())
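If failures come in bursts (for example, when a site is rate limiting), it can help to wait between attempts. Exponential backoff is my suggestion here rather than anything ScraperAPI requires; a minimal sketch, added inside the except branch before the next attempt:

# Back off 1s, 2s, 4s, ... between retries (suggestion, not required)
await asyncio.sleep(2 ** i)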
Final Notes
- Pyppeteer + ScraperAPI = scalable, JS-capable scraping with proxy rotation
- Method 1 (proxy mode) is best for interactive/page-script scraping
- Method 2 (SDK) is great for HTML snapshots + fast DOM parsing
- Configure concurrency and retry based on your plan
- Works with JavaScript-heavy sites thanks to Pyppeteer
More Resources: