In this guide, we’ll show you how to use Scraper API with a NodeJS Puppeteer scraper. We’ll walk through exactly how to integrate Scraper API, and the common mistakes users make, so you can get scraping as quickly as possible.

Full code examples can be found on GitHub here.

Not Recommended Method: Using The API Endpoint

A common issue we see users run into when trying to integrate Scraper API into a Puppeteer scraper is that they try to send their requests to our API endpoint. This isn’t the best integration method as you will run into two issues:

Issue #1 – Because you are sending a GET request to the Scraper API endpoint, if there are any other assets/endpoints that need to be requested by Puppeteer on the page, Puppeteer will look for these on the Scraper API endpoint, not the website’s server. 

For example, when making a request to quotes.toscrape.com, the page needs to request the relative URL /static/main.css. Under normal circumstances the browser would resolve this to http://quotes.toscrape.com/static/main.css, but when using the API endpoint it instead tries to request http://api.scraperapi.com/static/main.css, which returns an error. As a result, Puppeteer only gets access to the first HTML response (which largely defeats the purpose of using a headless browser).

Issue #2 – Similarly, if you want to navigate pages or emulate real user behaviour with Puppeteer, you will often have to click relative links, which won’t work when sending requests via the API endpoint for the same reasons as above.

As a result, we highly recommend that you don’t use the API endpoint if you want to integrate a NodeJS Puppeteer scraper with the API.

Recommended Method: Use Proxy Port

To correctly use Puppeteer with Scraper API you should use our proxy mode, like you would any other proxy. The Scraper API proxy port details are as follows:

javascript

// ScraperAPI proxy configuration
const PROXY_USERNAME = 'scraperapi';
const PROXY_PASSWORD = 'API_KEY'; // <-- enter your API key here
const PROXY_SERVER = 'proxy-server.scraperapi.com';
const PROXY_SERVER_PORT = '8001';

First, you need to set the --proxy-server flag to ScraperAPI’s proxy port in the browser options when launching the browser. You should also set the browser to ignore HTTPS errors.

javascript

const browser = await puppeteer.launch({
  ignoreHTTPSErrors: true,
  args: [
    `--proxy-server=http://${PROXY_SERVER}:${PROXY_SERVER_PORT}`
  ]
});

Then when a new page has been created, you need to authenticate the proxy with your username and password.

javascript

await page.authenticate({
  username: PROXY_USERNAME,
  password: PROXY_PASSWORD,
});

Once this is complete then you can use your Puppeteer scraper as you would normally. Whenever Puppeteer makes a request it will now send the request via our proxy port so your IP address will be hidden from the website you are scraping.

Note: Occasionally, conflicts can occur when using Puppeteer with our proxy port. If you experience any issues, such as certain resources/assets not being loaded, contact our support team and we can look into it for you.

Here is a full code example of a NodeJS Puppeteer scraper that scrapes quotes.toscrape.com using Scraper API as the proxy.
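Putting the steps above together, here is a minimal sketch of such a scraper. It assumes the standard quotes.toscrape.com markup (`.quote`, `.text`, and `.author` CSS classes) and uses an illustrative three-minute navigation timeout, since proxied requests can be slower than direct ones; the `buildProxyArg` helper is our own naming, not part of any library.

```javascript
const puppeteer = require('puppeteer');

// ScraperAPI proxy configuration
const PROXY_USERNAME = 'scraperapi';
const PROXY_PASSWORD = 'API_KEY'; // <-- enter your API key here
const PROXY_SERVER = 'proxy-server.scraperapi.com';
const PROXY_SERVER_PORT = '8001';

// Build the --proxy-server launch argument from the config above
function buildProxyArg(server, port) {
  return `--proxy-server=http://${server}:${port}`;
}

async function scrapeQuotes() {
  // Launch Chromium routing all traffic through the ScraperAPI proxy port
  const browser = await puppeteer.launch({
    ignoreHTTPSErrors: true,
    args: [buildProxyArg(PROXY_SERVER, PROXY_SERVER_PORT)],
  });

  const page = await browser.newPage();

  // Authenticate against the proxy with your API key
  await page.authenticate({
    username: PROXY_USERNAME,
    password: PROXY_PASSWORD,
  });

  // Generous timeout, as proxied requests can take longer than direct ones
  await page.goto('http://quotes.toscrape.com/', { timeout: 180000 });

  // Extract the text and author of every quote on the page
  const quotes = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.quote')).map((el) => ({
      text: el.querySelector('.text').innerText,
      author: el.querySelector('.author').innerText,
    }))
  );

  await browser.close();
  return quotes;
}

// Only run the scraper when this file is executed directly
if (require.main === module) {
  scrapeQuotes()
    .then((quotes) => console.log(quotes))
    .catch((err) => console.error(err));
}

module.exports = { buildProxyArg, scrapeQuotes };
```

Keeping the proxy details in one place like this makes it easy to swap in different ScraperAPI functionality (e.g. geotargeting) later by changing only the username string.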