Python Selenium Integration

May, 2021

In this guide, we’ll see how you can easily use ScraperAPI with a Python Selenium scraper. We will walk you through exactly how to integrate ScraperAPI and the common mistakes users make so you can get scraping as quickly as possible.

Full code examples can be found on GitHub here.

Recommended Method: Use Proxy Port

To use Selenium with ScraperAPI correctly, you should use our proxy mode, just as you would any other proxy.

Although there are a number of ways to configure your Selenium webdriver to send requests through a proxy, one of the easiest is to use Selenium Wire instead of vanilla Selenium.

To get started, install Selenium Wire with:

pip install selenium-wire

Then we need to configure our proxy options to use ScraperAPI’s proxy port and include them in our webdriver:


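from seleniumwire import webdriver
from webdriver_manager.chrome import ChromeDriverManager

API_KEY = 'YOUR_API_KEY'

# Both HTTP and HTTPS traffic go through the same ScraperAPI proxy port
proxy_options = {
    'proxy': {
        'http': f'http://scraperapi:{API_KEY}@proxy-server.scraperapi.com:8001',
        'https': f'http://scraperapi:{API_KEY}@proxy-server.scraperapi.com:8001',
        'no_proxy': 'localhost,127.0.0.1'
    }
}

# ....
# .... (setup elided here, including the Chrome `option` object used below)

# Pass the proxy settings to Selenium Wire via seleniumwire_options
driver = webdriver.Chrome(ChromeDriverManager().install(),
                            options=option,
                            seleniumwire_options=proxy_options)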

And that’s it. Selenium will now send every request via our proxy port, and you can use Selenium as you normally would.

Here is a full code example of a Selenium scraper that scrapes quotes.toscrape.com using ScraperAPI as the proxy.
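As a rough illustration of how the pieces above fit together, here is a minimal sketch of such a scraper. It is not the linked example. The helper names `build_proxy_options` and `scrape_quotes` are hypothetical, the CSS selectors reflect the markup of quotes.toscrape.com at the time of writing, and the sketch assumes a Chrome driver is already available on your PATH (or is resolved automatically by your Selenium version).

```python
# Sketch only: assumes `selenium-wire` is installed and a Chrome driver is available.
API_KEY = "YOUR_API_KEY"  # placeholder, replace with your own key


def build_proxy_options(api_key):
    """Return Selenium Wire options that route traffic through ScraperAPI's proxy port."""
    proxy = f"http://scraperapi:{api_key}@proxy-server.scraperapi.com:8001"
    return {
        "proxy": {
            "http": proxy,
            "https": proxy,
            "no_proxy": "localhost,127.0.0.1",
        }
    }


def scrape_quotes(api_key, url="http://quotes.toscrape.com/"):
    """Load the page through the proxy and return a list of (text, author) pairs."""
    # Imported lazily so build_proxy_options() works without a browser stack installed.
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from seleniumwire import webdriver

    chrome_options = Options()
    chrome_options.add_argument("--headless")

    driver = webdriver.Chrome(
        options=chrome_options,
        seleniumwire_options=build_proxy_options(api_key),
    )
    try:
        driver.get(url)
        quotes = []
        for quote in driver.find_elements(By.CSS_SELECTOR, "div.quote"):
            text = quote.find_element(By.CSS_SELECTOR, "span.text").text
            author = quote.find_element(By.CSS_SELECTOR, "small.author").text
            quotes.append((text, author))
        return quotes
    finally:
        driver.quit()  # always close the browser, even if scraping fails
```

Calling `scrape_quotes(API_KEY)` would launch a headless Chrome session, so it is left out of the block itself. Keeping the proxy settings in a small function makes it easy to reuse them across several drivers.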

Not Recommended Method: Using The API Endpoint

A common issue we see users run into when trying to integrate ScraperAPI into a Selenium scraper is that they try to send their requests to our API endpoint. Although this method is slightly easier to implement and does work to a degree, it isn’t the best integration method because you can’t make full use of the functionality that Selenium offers.

To use this integration method, you would wrap the URL you want to scrape in a request to the ScraperAPI endpoint, using a function like this:


from urllib.parse import urlencode

API_KEY = 'YOUR_API_KEY'

def get_scraperapi_url(url):
    """
        Converts url into API request for ScraperAPI.
    """
    # The target URL is passed as a URL-encoded query parameter
    payload = {'api_key': API_KEY, 'url': url}
    proxy_url = 'http://api.scraperapi.com/?' + urlencode(payload)
    return proxy_url

And then use this function when making a GET request with the Selenium web driver:

driver.get(get_scraperapi_url(url))
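To make it concrete what the browser actually loads with this method, the short sketch below prints the wrapped URL for quotes.toscrape.com. It only uses the standard library and the placeholder key, and the variable name `wrapped` is purely illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_KEY = "YOUR_API_KEY"  # placeholder, replace with your own key


def get_scraperapi_url(url):
    """Wrap the target URL in a request to the ScraperAPI endpoint."""
    payload = {"api_key": API_KEY, "url": url}
    return "http://api.scraperapi.com/?" + urlencode(payload)


wrapped = get_scraperapi_url("http://quotes.toscrape.com/")
print(wrapped)
# http://api.scraperapi.com/?api_key=YOUR_API_KEY&url=http%3A%2F%2Fquotes.toscrape.com%2F

# The host the browser talks to is the API endpoint, not the target site;
# the target site only appears as an encoded query parameter.
print(urlsplit(wrapped).netloc)                      # api.scraperapi.com
print(parse_qs(urlsplit(wrapped).query)["url"][0])   # http://quotes.toscrape.com/
```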

With this method you will run into two issues:

Issue #1 – Because you are sending a GET request to the ScraperAPI endpoint, if the page needs Selenium to request any other assets or endpoints, Selenium will look for them on the ScraperAPI endpoint, not on the website’s server.

For example, when loading quotes.toscrape.com, the browser needs to request the relative URL /static/main.css. Under normal circumstances, that request goes to http://quotes.toscrape.com/static/main.css, but when using the API endpoint it goes to http://api.scraperapi.com/static/main.css, which returns an error. As a result, Selenium only gets access to the first HTML response, which largely defeats the purpose of using a headless browser.

Issue #2 – As with the first issue, if you want to navigate or emulate real user behaviour with Selenium, you will often have to click on relative links, which won’t work when sending requests via the API endpoint for the same reason.
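The resolution step described above can be reproduced without a browser. The sketch below uses `urljoin` from Python’s standard library, which follows the same general rules a browser applies to a root-relative path; treat it as an approximation of the browser’s behaviour rather than a trace of what Selenium does internally. The variable names are illustrative.

```python
from urllib.parse import urlencode, urljoin

# The page URL as the browser sees it when going through the API endpoint
payload = {"api_key": "YOUR_API_KEY", "url": "http://quotes.toscrape.com/"}
api_page = "http://api.scraperapi.com/?" + urlencode(payload)

# Resolve the same relative asset path against each base URL
direct = urljoin("http://quotes.toscrape.com/", "/static/main.css")
via_api = urljoin(api_page, "/static/main.css")

print(direct)   # http://quotes.toscrape.com/static/main.css
print(via_api)  # http://api.scraperapi.com/static/main.css
```

With the proxy port method, the page URL stays on the target site’s own domain, so relative paths resolve against the website’s server as usual.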

As a result, we highly recommend that you don’t use the API endpoint if you want to integrate a Python Selenium scraper with the API. If you really want to use it anyway, here is a code example.
