Use ScraperAPI with Cypress to scrape JavaScript-heavy sites and run end-to-end tests. It’s perfect for dynamic pages that regular scraping tools can’t handle.
Getting started
This basic Cypress test works fine for static sites, but it breaks on pages that load content with JavaScript:
// Basic Cypress Test Without ScraperAPI
describe('Plain Cypress scraping', () => {
  it('visits a page', () => {
    cy.visit('https://example.com')
    cy.get('h1').should('contain.text', 'Example Domain')
  })
})
To scrape JavaScript-heavy pages, use ScraperAPI with cy.request() and DOM parsing instead.
Recommended Method: Custom Command + DOM Injection + ScraperAPI
ScraperAPI handles rendering, proxies, CAPTCHAs, and retries for you. Cypress fetches the HTML, injects it into a DOM node, and lets you query it easily.
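At a glance, the flow looks like this (a minimal sketch; the full, reusable command is built in Step 2, and the SCRAPER_API_KEY environment variable is set up right after it):

// Sketch: fetch rendered HTML through ScraperAPI, then parse it with plain DOM APIs
const apiUrl = `http://api.scraperapi.com?api_key=${Cypress.env('SCRAPER_API_KEY')}&url=${encodeURIComponent('https://example.com')}`;
cy.request(apiUrl).then((response) => {
  cy.document().then((doc) => {
    const container = doc.createElement('div'); // detached node, never rendered
    container.innerHTML = response.body;        // inject the fetched HTML
    // ...query `container` with querySelector/querySelectorAll as needed
  });
});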
Requirements
- Cypress for running scraping tests
- npm, the package manager, to install Cypress and dependencies
- nodejs/node to run Cypress and npm
- cypress-dotenv (and dotenv) to keep your credentials secure
- A ScraperAPI account and its API key for scraping
Step 1: Set Up Your Node.js Project
Begin by navigating to your project folder and installing Node.js and npm.
# For Ubuntu
sudo apt update
sudo apt install nodejs npm
# For macOS (includes npm)
brew install node
# For Windows
# Download and install Node.js (which includes npm) from the official website (https://nodejs.org/en/download/) and follow the installer steps.
Initialize your Node.js project and download Cypress by running:
npm init -y
npm install cypress --save-dev
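Optionally, add npm scripts to package.json so you don't have to prefix every command with npx (the script names here are just a convention):

// package.json — "scripts" section only
"scripts": {
  "cy:open": "cypress open",
  "cy:run": "cypress run"
}

With that in place, npm run cy:run behaves the same as npx cypress run later on.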
Step 2: Add a Custom Command
First off, generate a Cypress folder structure by running this in your terminal from the root of your project:
npx cypress open
If this is your first time running it, Cypress will create its default folder structure.
Now open cypress/support/commands.js and create a reusable Cypress command that integrates with ScraperAPI to fetch and parse HTML from JavaScript-heavy websites.
// cypress/support/commands.js
Cypress.Commands.add('scrapeViaScraperAPI', (targetUrl) => {
  // Build the ScraperAPI request URL for the page we want to scrape
  const scraperUrl = `http://api.scraperapi.com?api_key=${Cypress.env('SCRAPER_API_KEY')}&url=${encodeURIComponent(targetUrl)}&timeout=60000`;

  return cy.request(scraperUrl).then((response) => {
    return cy.document().then((document) => {
      // Parse the returned HTML inside a detached element
      const container = document.createElement('div');
      container.innerHTML = response.body;

      // Collect the title attribute of every product link
      const titles = Array.from(container.querySelectorAll('.product_pod h3 a')).map(el =>
        el.getAttribute('title')
      );
      return titles;
    });
  });
});
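Rendering a heavy page through ScraperAPI's proxies can take a while, so it can also help to raise Cypress's own response timeout for the request. A minimal sketch (the 90-second value is just an example): inside the command, the plain cy.request(scraperUrl) call could be swapped for the options-object form:

// Optional: options-object form of cy.request with an explicit timeout
cy.request({
  url: scraperUrl,
  timeout: 90000,          // give ScraperAPI up to 90 seconds to respond
  failOnStatusCode: false, // inspect non-2xx responses yourself instead of failing immediately
}).then((response) => {
  expect(response.status).to.eq(200); // assert success explicitly
  // ...same DOM parsing as above
});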
Store your ScraperAPI key in an environment variable rather than hard-coding it. You can get your API key from your ScraperAPI dashboard.
Install cypress-dotenv (plus dotenv, which the config below loads directly), then create a .env file in your project root:
npm install -D cypress-dotenv dotenv
touch .env
nano .env
# .env
SCRAPER_API_KEY=your_scraper_api_key
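If the project is in Git, make sure the .env file never gets committed, for example by adding it to .gitignore:

# .gitignore
.env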
Update your cypress.config.js as follows:
// cypress.config.js
const { defineConfig } = require("cypress");
require('dotenv').config();

module.exports = defineConfig({
  e2e: {
    setupNodeEvents(on, config) {
      // Make the .env value available to tests via Cypress.env('SCRAPER_API_KEY')
      config.env.SCRAPER_API_KEY = process.env.SCRAPER_API_KEY;
      return config;
    },
    supportFile: "cypress/support/commands.js"
  }
});
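To confirm the key actually reaches Cypress, a quick sanity-check spec (the file name here is just a suggestion) can assert that the environment variable is defined:

// cypress/e2e/env-check.cy.js — optional sanity check
describe('Environment', () => {
  it('exposes the ScraperAPI key', () => {
    expect(Cypress.env('SCRAPER_API_KEY'), 'SCRAPER_API_KEY should be set').to.be.a('string').and.not.be.empty;
  });
});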
Step 3: Use the Command in Your Test
Inside your cypress/ folder, create an e2e folder containing a file named scraperapi.cy.js (run these commands from the project root):
mkdir -p cypress/e2e
touch cypress/e2e/scraperapi.cy.js
In the file, add a Cypress test that uses the custom command and displays the scraped data inside the browser DOM.
// cypress/e2e/scraperapi.cy.js
describe('Scrape Books to Scrape with ScraperAPI + Cypress', () => {
  it('gets product titles and displays them', () => {
    cy.visit('cypress/fixtures/blank.html'); // Load static HTML file

    cy.scrapeViaScraperAPI('http://books.toscrape.com/catalogue/page-1.html').then((titles) => {
      cy.document().then((doc) => {
        const container = doc.getElementById('results');
        const list = doc.createElement('ul');
        titles.forEach(title => {
          const item = doc.createElement('li');
          item.innerText = title;
          list.appendChild(item);
        });
        container.appendChild(list);
      });

      cy.screenshot('scraped-book-titles'); // Take screenshot after injecting
    });
  });
});
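As written, the test only injects and screenshots the data, so it would still pass if ScraperAPI returned an empty page. You may want to add a few assertions before the screenshot, for example:

// Inside the .then((titles) => { ... }) block, before cy.screenshot():
expect(titles).to.have.length.greaterThan(0);                // we actually scraped something
expect(titles[0]).to.be.a('string').and.not.be.empty;        // titles look like real text
cy.get('#results li').should('have.length', titles.length);  // the list made it into the DOM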
Step 4: Create the blank.html file and run your Cypress test
In your project folder, create the cypress/fixtures folder if it doesn’t exist yet:
mkdir -p cypress/fixtures
Inside it, create blank.html with the following minimal markup (or something similar):
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Blank Page</title>
  </head>
  <body>
    <h1>Ready for Scraped Data</h1>
    <div id="results"></div>
  </body>
</html>
You can now run your tests from the project root folder (the one where your package.json lives).
npx cypress run
This method works because:
- ScraperAPI handles the proxying and geo-routing
- Cypress injects the content into the browser DOM
- You get full control using native DOM APIs
Alternative: cy.request without ScraperAPI
You can call cy.request() directly, but it won’t render JS or rotate IPs:
describe('Simple cy.request test', () => {
  it('should load example.com and check the response', () => {
    cy.request('https://example.com').then((response) => {
      expect(response.status).to.eq(200);
      expect(response.body).to.include('Example Domain');
    });
  });
});
This method is not ideal because:
- It exposes your IP to bot protection.
- It doesn’t bypass CAPTCHAs or rotate proxies.
- It fails on sites that require geolocation or JavaScript rendering.
Prefer ScraperAPI for anything beyond basic scraping.
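For comparison, routing the same check through ScraperAPI only changes the URL you hand to cy.request(); here is a sketch reusing the endpoint format from Step 2:

// Same test, but the request goes out through ScraperAPI's proxies
describe('cy.request via ScraperAPI', () => {
  it('loads example.com through the proxy', () => {
    const url = `http://api.scraperapi.com?api_key=${Cypress.env('SCRAPER_API_KEY')}&url=${encodeURIComponent('https://example.com')}`;
    cy.request(url).then((response) => {
      expect(response.status).to.eq(200);
      expect(response.body).to.include('Example Domain');
    });
  });
});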
ScraperAPI Parameters That Matter
ScraperAPI supports options via query parameters:
const scraperUrl = `http://api.scraperapi.com?api_key=YOUR_KEY&url=https://target.com&render=true&country_code=us&session_number=555`
Parameter | What It Does | When to Use It
---|---|---
render=true | Tells ScraperAPI to load JavaScript | Use this for dynamic pages or SPAs
country_code=us | Uses a U.S. IP address | Great for geo-blocked content
premium=true | Solves CAPTCHAs and retries failed requests | Needed for hard-to-scrape sites
session_number=555 | Keeps the same proxy IP across multiple requests | Use it when you need to maintain a session
These cover most cases. For anything else, check the ScraperAPI docs.
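If you combine several of these parameters, building the URL with URLSearchParams keeps the encoding correct. A small helper sketch (the function name and parameter values are just examples):

// Helper: build a ScraperAPI URL from an options object
function buildScraperUrl(targetUrl, options = {}) {
  const params = new URLSearchParams({
    api_key: Cypress.env('SCRAPER_API_KEY'),
    url: targetUrl, // URLSearchParams handles the encoding for you
    ...options,
  });
  return `http://api.scraperapi.com?${params.toString()}`;
}

// Example: render JavaScript and pin a U.S. IP with a sticky session
const scraperUrl = buildScraperUrl('https://target.com', {
  render: 'true',
  country_code: 'us',
  session_number: '555',
});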
Test Retries
Improve stability with test retries:
// cypress.config.js — add retries to the e2e block you created earlier
module.exports = defineConfig({
  e2e: {
    retries: {
      runMode: 2,  // retry failed tests twice in `cypress run`
      openMode: 0, // no automatic retries in the interactive runner
    },
    // ...existing setupNodeEvents and supportFile settings stay as they are
  },
});
This helps when pages load slowly or requests hit rate limits.
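You can also override retries for a single flaky test instead of the whole suite, using Cypress's per-test configuration object:

// Per-test override: retry only this scraping test up to 3 times in run mode
it('gets product titles and displays them', { retries: { runMode: 3, openMode: 0 } }, () => {
  // ...test body as in Step 3
});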
Visualize the Scraped Data in the DOM
To see the data you’re scraping, run your test using:
npx cypress open
Then select scraperapi.cy.js in the Cypress UI. You should see:
- The static HTML page load (the "Ready for Scraped Data" heading from blank.html)
- The scraped book titles dynamically injected into the DOM
- A screenshot saved as scraped-book-titles.png