ScraperAPI is a powerful scraping tool that handles proxies, browsers, and CAPTCHAs automatically. In this guide, you’ll learn how to integrate ScraperAPI with HtmlUnit, a fast and lightweight headless browser for Java.
Getting Started
Before we integrate ScraperAPI, here’s a basic HtmlUnit scraping example:
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class BasicHtmlUnit {
    public static void main(String[] args) throws Exception {
        WebClient client = new WebClient(BrowserVersion.CHROME);
        client.getOptions().setCssEnabled(false);
        client.getOptions().setJavaScriptEnabled(false);

        HtmlPage page = client.getPage("https://httpbin.org/ip");
        System.out.println(page.asNormalizedText());

        client.close();
    }
}
This works for basic scraping, but it doesn't solve problems like IP bans or CAPTCHAs.
Integration Methods
Recommended: API Endpoint Method
The best way to use ScraperAPI with HtmlUnit is to call the API endpoint directly and pass the target URL as a query parameter. This ensures your request routes through ScraperAPI’s proxy network with built-in CAPTCHA handling.
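In code, the pattern is just string construction: build the `api.scraperapi.com` URL with your key, and URL-encode the target so its `://` and `/` survive as a query value. A minimal sketch (the `api_key` and `url` parameter names are ScraperAPI's documented query parameters; the key and target URL here are placeholders):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EndpointUrl {
    public static void main(String[] args) {
        String apiKey = "your_api_key_here";               // replace with your real key
        String targetUrl = "https://quotes.toscrape.com";  // any page you want to scrape

        // Encode the target URL so it can be passed safely as a query parameter
        String encoded = URLEncoder.encode(targetUrl, StandardCharsets.UTF_8);
        String scraperApiUrl = "http://api.scraperapi.com?api_key=" + apiKey + "&url=" + encoded;

        System.out.println(scraperApiUrl);
    }
}
```

Any HTTP client, including HtmlUnit, can then fetch `scraperApiUrl` as if it were an ordinary page.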
Required Setup
1. Install Java (if not already installed)
# Ubuntu
sudo apt-get update
sudo apt-get install default-jdk
# MacOS
brew install openjdk@21
Then add the following to your shell config (e.g. .zshrc or .bash_profile), adjusting JAVA_HOME to point at the JDK you installed:
export JAVA_HOME="/Library/Java/JavaVirtualMachines/temurin-21.jdk/Contents/Home"
export PATH="$JAVA_HOME/bin:$PATH"
Reload your shell:
source ~/.zshrc
# or
source ~/.bash_profile
Confirm Java is installed:
java -version
2. Install Maven
# Ubuntu
sudo apt update
sudo apt install maven
# MacOS
brew install maven
Check:
mvn -v
3. Set Up Project Structure
Create a folder and initialize the Maven project:
mkdir htmlunit-scraperapi && cd htmlunit-scraperapi
Inside, create the structure:
src/
  main/
    java/
      MarketPrice.java
4. Add Dependencies in pom.xml
At the root of your project folder, create a file named pom.xml and paste the following:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.scraperapi</groupId>
  <artifactId>htmlunit-scraperapi</artifactId>
  <version>1.0-SNAPSHOT</version>

  <dependencies>
    <!-- HtmlUnit for headless browsing -->
    <dependency>
      <groupId>net.sourceforge.htmlunit</groupId>
      <artifactId>htmlunit</artifactId>
      <version>2.70.0</version>
    </dependency>

    <!-- Java dotenv to read .env variables -->
    <dependency>
      <groupId>io.github.cdimascio</groupId>
      <artifactId>java-dotenv</artifactId>
      <version>5.2.2</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Plugin to run Java classes with a main method -->
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>3.1.0</version>
        <configuration>
          <mainClass>MarketPrice</mainClass>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
5. Add .env File in Root
In the same folder, create a .env file:
SCRAPERAPI_KEY=your_api_key_here
You can get your API key from your ScraperAPI dashboard.
Full Working Code
Paste this inside src/main/java/MarketPrice.java:
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.DomNode;
import com.gargoylesoftware.htmlunit.html.DomNodeList;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import io.github.cdimascio.dotenv.Dotenv;

import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MarketPrice {
    public static void main(String[] args) throws IOException {
        // Load ScraperAPI key from .env
        Dotenv dotenv = Dotenv.load();
        String apiKey = dotenv.get("SCRAPERAPI_KEY");
        if (apiKey == null || apiKey.isEmpty()) {
            System.err.println("SCRAPERAPI_KEY is missing in your .env file.");
            return;
        }

        // Target a real HTML site; URL-encode it so it survives as a query parameter
        String targetUrl = "https://quotes.toscrape.com";
        String scraperApiUrl = String.format("http://api.scraperapi.com?api_key=%s&url=%s",
                apiKey, URLEncoder.encode(targetUrl, StandardCharsets.UTF_8));

        // Initialize headless browser
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setUseInsecureSSL(true);
        webClient.getOptions().setCssEnabled(false);
        webClient.getOptions().setJavaScriptEnabled(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnScriptError(false);

        // Fetch and parse page
        HtmlPage page = webClient.getPage(scraperApiUrl);
        DomNodeList<DomNode> quoteBlocks = page.querySelectorAll(".quote");

        System.out.println("\n📌 Scraped Quotes from https://quotes.toscrape.com:\n");
        for (DomNode quote : quoteBlocks) {
            String text = quote.querySelector(".text").asNormalizedText();
            String author = quote.querySelector(".author").asNormalizedText();
            DomNodeList<DomNode> tags = quote.querySelectorAll(".tags .tag");

            System.out.println("📝 Quote: " + text);
            System.out.println("👤 Author: " + author);
            System.out.print("🏷️ Tags: ");
            for (DomNode tag : tags) {
                System.out.print(tag.asNormalizedText() + " ");
            }
            System.out.println("\n------------------------------------------\n");
        }

        webClient.close();
    }
}
Make sure your API key is set in the .env file under SCRAPERAPI_KEY before running.
Not Recommended: Proxy Mode
HtmlUnit allows proxy configuration, but ScraperAPI uses query string authentication, which doesn’t work with HtmlUnit’s proxy model.
Why It Fails
- ScraperAPI needs the API key in the URL query.
- HtmlUnit proxy setup expects a static IP or basic authentication.
Use the API Endpoint method instead.
Optional Parameters
ScraperAPI supports various options via query parameters:
http://api.scraperapi.com?api_key=YOUR_KEY&url=TARGET_URL&render=true&country_code=us&premium=true&session_number=123
Parameter | What It Does | When to Use It
--- | --- | ---
render=true | Tells ScraperAPI to execute JavaScript | Use for SPAs and dynamic content
country_code=us | Routes requests through US proxies | Great for geo-blocked content
premium=true | Enables CAPTCHA solving and advanced anti-bot measures | Essential for heavily protected sites
session_number=123 | Maintains the same proxy IP across requests | Use when you need to maintain login sessions
These parameters cover most scraping scenarios. Check the ScraperAPI documentation for additional options.
Example:
String scraperApiUrl = String.format("http://api.scraperapi.com?api_key=%s&url=%s&render=true&country_code=us", apiKey, java.net.URLEncoder.encode(targetUrl, "UTF-8"));
Best Practices
- Always store your ScraperAPI key in an environment variable
- Use render=true when targeting JavaScript-heavy sites
- Avoid using proxy settings in HtmlUnit
- Implement retry logic when scraping large datasets
- Disable JavaScript/CSS for better performance on static pages
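The retry advice above can be sketched as a small generic helper. This is an illustrative pattern, not part of ScraperAPI or HtmlUnit; the attempt counts, delays, and the simulated flaky task are placeholders you would replace with a real page fetch:

```java
import java.util.concurrent.Callable;

public class Retry {
    // Runs a task up to maxAttempts times, doubling the wait between failures.
    static <T> T withRetries(Callable<T> task, int maxAttempts, long initialDelayMs) throws Exception {
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                System.out.println("Attempt " + attempt + " failed: " + e.getMessage());
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;  // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated flaky fetch: fails twice, then succeeds.
        int[] calls = {0};
        String html = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("temporary ban");
            return "<html>ok</html>";
        }, 5, 10);
        System.out.println("Got: " + html + " after " + calls[0] + " attempts");
    }
}
```

In a real scraper, the lambda would wrap the `webClient.getPage(scraperApiUrl)` call, so a transient block or timeout triggers another attempt through a fresh proxy.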
Run the Scraper
Run your MarketPrice.java file using:
mvn compile exec:java -Dexec.mainClass=MarketPrice
Expected Output:
Your terminal should display structured quote data: each quote's text, author, and tags, separated by divider lines.
This confirms ScraperAPI handled the request and routed it through its network.