In this guide, we’ll see how you can easily use ScraperAPI with a Python Selenium scraper. We will walk you through exactly how to integrate ScraperAPI and the common mistakes users make so you can get scraping as quickly as possible.

Full code examples can be found on GitHub here.

Not Recommended Method: Using The API Endpoint

A common issue we see users run into when integrating ScraperAPI into a Selenium scraper is that they send their requests to our API endpoint. Although this method is slightly easier to implement and does partly work, it isn’t the best integration method, because you can’t make full use of the functionality that Selenium offers.

To use this integration method, you would simply point your requests at the ScraperAPI endpoint instead of the target URL, using a function like this:

python

from urllib.parse import urlencode

API_KEY = 'YOUR_API_KEY'

def get_scraperapi_url(url):
    """
    Converts url into an API request for ScraperAPI.
    """
    # The target URL is percent-encoded into the 'url' query parameter.
    payload = {'api_key': API_KEY, 'url': url}
    proxy_url = 'http://api.scraperapi.com/?' + urlencode(payload)
    return proxy_url

And then use this function when making a GET request with the Selenium web driver:

python

driver.get(get_scraperapi_url(url))

With this method you will run into two issues:

Issue #1 – Because you are sending a GET request to the ScraperAPI endpoint, any other assets or endpoints that the page needs will be requested from the ScraperAPI endpoint rather than from the website’s server.

For example, when loading quotes.toscrape.com, the browser needs to request the relative URL /static/main.css. Under normal circumstances, that request would go to http://quotes.toscrape.com/static/main.css, but when using the API endpoint it goes to http://api.scraperapi.com/static/main.css, which returns an error. As a result, Selenium only gets access to the first HTML response, which largely defeats the purpose of using a headless browser.
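The resolution behaviour described above can be reproduced with Python’s standard urllib.parse.urljoin, which applies the same relative-reference rules a browser uses. This is a minimal sketch rather than part of the original example code: YOUR_API_KEY is a placeholder, the variable names are illustrative, and it assumes the returned HTML does not set its own base URL.

```python
from urllib.parse import urlencode, urljoin

API_KEY = 'YOUR_API_KEY'  # placeholder, not a real key
target_url = 'http://quotes.toscrape.com/'
relative_asset = '/static/main.css'

# Page loaded directly: the asset resolves against the website's own host.
direct = urljoin(target_url, relative_asset)

# Page loaded through the API endpoint: the browser's page URL is the
# endpoint, so the same relative path resolves against api.scraperapi.com.
endpoint_url = 'http://api.scraperapi.com/?' + urlencode(
    {'api_key': API_KEY, 'url': target_url}
)
via_endpoint = urljoin(endpoint_url, relative_asset)

print(direct)        # http://quotes.toscrape.com/static/main.css
print(via_endpoint)  # http://api.scraperapi.com/static/main.css
```

Note that the query string carrying the target URL is dropped entirely during resolution, which is why the second request no longer has any connection to quotes.toscrape.com.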

Issue #2 – Similarly to the first issue, if you want to navigate or emulate real user behaviour using Selenium, you will often have to click on relative links, which won’t work when sending requests via the API endpoint for the same reason as above.

As a result, we highly recommend that you don’t use the API endpoint if you want to integrate a Python Selenium scraper with the API. If you really want to use it anyway, here is a code example.

Recommended Method: Use Proxy Port

To correctly use Selenium with ScraperAPI you should use our proxy mode, like you would any other proxy. 

Although there are a number of ways to configure your Selenium webdriver to use a proxy, one of the easiest is to use Selenium Wire instead of vanilla Selenium.

To get started, install Selenium Wire with:

bash

pip install selenium-wire

Then we need to configure our proxy options to use ScraperAPI’s proxy port and include them in our webdriver:

python

from seleniumwire import webdriver
from webdriver_manager.chrome import ChromeDriverManager

API_KEY = 'YOUR_API_KEY'

proxy_options = {
    'proxy': {
        'http': f'http://scraperapi:{API_KEY}@proxy-server.scraperapi.com:8001',
        'no_proxy': 'localhost,127.0.0.1'
    }
}

....
....

driver = webdriver.Chrome(ChromeDriverManager().install(), 
                            options=option, 
                            seleniumwire_options=proxy_options)


And that’s it. Whenever Selenium makes a request, it will now send it via our proxy port, and you can use Selenium as you normally would.
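To make the configuration above easier to reuse, it can be wrapped in two small helpers. This is only a sketch under stated assumptions: build_proxy_options and create_driver are hypothetical names, the 'https' entry is an assumption (the example above only lists 'http', although Selenium Wire accepts both keys), and create_driver assumes a Selenium version that can locate chromedriver itself or a chromedriver already on your PATH.

```python
def build_proxy_options(api_key):
    """Return a Selenium Wire options dict that routes traffic via the proxy port."""
    proxy_url = f'http://scraperapi:{api_key}@proxy-server.scraperapi.com:8001'
    return {
        'proxy': {
            'http': proxy_url,
            # Assumption: HTTPS traffic goes through the same proxy URL.
            # The original example only configures the 'http' key.
            'https': proxy_url,
            'no_proxy': 'localhost,127.0.0.1',
        }
    }


def create_driver(api_key):
    """Create a headless Chrome driver whose requests go through the proxy."""
    # Imported lazily so build_proxy_options works without Selenium Wire installed.
    from seleniumwire import webdriver
    from selenium.webdriver.chrome.options import Options

    chrome_options = Options()
    chrome_options.add_argument('--headless')
    return webdriver.Chrome(
        options=chrome_options,
        seleniumwire_options=build_proxy_options(api_key),
    )
```

Usage would then look like driver = create_driver('YOUR_API_KEY') followed by driver.get('http://quotes.toscrape.com/'), with driver.quit() once you are done.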

Note: Conflicts can occasionally occur when using Python Selenium with our proxy port. If you experience issues such as certain resources/assets not loading, contact our support team and we can give you access to a different proxy port that sends your requests directly to our proxy pools without going through our API’s proxy infrastructure.

Here is a full code example of a Selenium scraper that scrapes quotes.toscrape.com using ScraperAPI as the proxy.