Getting Started

Scraper API is designed to simplify web scraping. A few things to consider before we get started:
  • Each request will be retried until it can be successfully completed, for up to 60 seconds. Remember to set your own timeout to 60 seconds so these retries have time to finish. If every attempt fails within 60 seconds, we will return a 500 error; you may retry the request, and you will not be charged for the unsuccessful request (you are only charged for successful requests, i.e. 200 and 404 status codes). Make sure to catch these errors! They occur on roughly 1-2% of requests for hard-to-scrape websites (see the sketch after this list). You can scrape images, PDFs, or other files just as you would any other URL; just remember that there is a 2MB limit per request.
  • If you exceed your plan's concurrent connection limit, the API will respond with a 429 status code. This can be solved by slowing down your request rate.
  • There is no overage allowed on the free plan; if you exceed 1,000 requests per month on the free plan, you will receive a 403 error.
  • Each request will return a string containing the raw HTML from the requested page, along with any headers and cookies.
  • We offer SDKs for NodeJS, Python, Ruby, and PHP.
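
To make the retry and timeout guidance above concrete, here is a minimal sketch in Node (18+, using the built-in fetch). The helper name and backoff policy are our own illustration; the endpoint, parameters, and status codes are the ones documented above.

      // Minimal sketch: 60-second timeout, retry on 500, back off on 429
      const API_KEY = 'YOURAPIKEY'

      async function scrape(url, maxRetries = 3) {
        const endpoint = 'http://api.scraperapi.com/?' +
          new URLSearchParams({ api_key: API_KEY, url })

        for (let attempt = 1; attempt <= maxRetries; attempt++) {
          // 60-second timeout: the API retries internally for up to 60 seconds,
          // so a shorter client timeout would cut off requests that would succeed
          const res = await fetch(endpoint, { signal: AbortSignal.timeout(60000) })

          if (res.status === 200 || res.status === 404) return res.text() // charged
          if (res.status === 429) { // over the concurrency limit: slow down
            await new Promise((r) => setTimeout(r, 1000 * attempt))
            continue
          }
          if (res.status === 500) continue // every upstream retry failed; not charged, safe to retry
          throw new Error('Unexpected status ' + res.status)
        }
        throw new Error('Failed after ' + maxRetries + ' attempts: ' + url)
      }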

Basic Usage

Scraper API exposes a single API endpoint. Simply send a GET request to http://api.scraperapi.com with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.

Sample Code

      curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip"
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
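
One practical note: the target url is itself a query string parameter, so if it contains its own query string it must be URL-encoded. A minimal Node sketch (the target URL here is purely illustrative):

      // URLSearchParams percent-encodes the target URL so that its own query
      // string (?page=2 in this made-up example) survives as the url parameter
      const params = new URLSearchParams({
        api_key: 'YOURAPIKEY',
        url: 'http://example.com/search?page=2',
      })
      console.log('http://api.scraperapi.com/?' + params)
      // http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http%3A%2F%2Fexample.com%2Fsearch%3Fpage%3D2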
      

Rendering JavaScript

If you are crawling a page that requires JavaScript to render, we can fetch it using a headless browser. This feature is only available on the Business and Enterprise plans. To render JavaScript, simply set render=true and we will use a headless Google Chrome instance to fetch the page:

Sample Code

Bash

      curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip&render=true"
      

Node

      // remember to install the library: npm install scraperapi-sdk --save
      var scraperapiClient = require('scraperapi-sdk')('YOURAPIKEY')  
      var response = await scraperapiClient.get('http://httpbin.org/ip', {render: true})
      console.log(response)
      

Java

      // remember to install the library: https://search.maven.org/artifact/com.scraperapi/sdk/1.0
      ScraperApiClient client = new ScraperApiClient("YOURAPIKEY");
      client.get("http://httpbin.org/ip")
          .render(true)
          .result();
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"192.15.81.132"}
            </pre>
          </body>
        </html>
      

Custom Headers

If you would like to keep the original request headers in order to pass through custom headers (user agents, cookies, etc.), simply set keep_headers=true. Only use this feature to get customized results; do not use it to avoid blocks, which we handle internally.

Sample Code

Bash

      curl --header "X-MyHeader: 123" \
      "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/anything&keep_headers=true"
      

Node

      // remember to install the library: npm install scraperapi-sdk --save
      var scraperapiClient = require('scraperapi-sdk')('YOURAPIKEY')
      var response = await scraperapiClient.get('http://httpbin.org/anything', {headers: {'X-MyHeader': '123'}})
      console.log(response)
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
            {
              "args":{},
              "data":"",
              "files":{},
              "form":{},
              "headers": {
                "Accept":"*/*",
                "Accept-Encoding":"gzip, deflate",
                "Cache-Control":"max-age=259200",
                "Connection":"close",
                "Host":"httpbin.org",
                "Referer":"http://httpbin.org",
                "Timeout":"10000",
                "User-Agent":"curl/7.54.0",
                "X-Myheader":"123"
              },
              "json":null,
              "method":"GET",
              "origin":"45.72.0.249",
              "url":"http://httpbin.org/anything"
            }
            </pre>
          </body>
        </html>
      

Sessions

To reuse the same proxy for multiple requests, simply use the &session_number= flag (e.g. session_number=123). The value of session_number can be any integer; simply send a new integer to create a new session (this allows you to continue using the same proxy for each request with that session number). Sessions expire 15 minutes after their last use.

Sample Code

      curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip&session_number=123"
      curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip&session_number=123"
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
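
As a sketch of the behavior shown above (Node 18+, built-in fetch, run inside an async context): two requests sent with the same session_number should report the same origin IP, as in the result above.

      // Reuse one proxy across requests via session_number; both
      // responses should report the same {"origin": ...} IP
      const params = new URLSearchParams({
        api_key: 'YOURAPIKEY',
        url: 'http://httpbin.org/ip',
        session_number: '123',
      })
      for (let i = 0; i < 2; i++) {
        const res = await fetch('http://api.scraperapi.com/?' + params)
        console.log(await res.text())
      }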
      

Geographic Location

To ensure your requests come from a particular country, please use the country_code= flag (e.g. country_code=us). United States (us) geotargeting is available on the Startup plan and higher. Business plan customers also have access to Canada (ca), United Kingdom (uk), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Japan (jp), China (cn), and Australia (au). Other countries are available to Enterprise customers upon request.

Sample Code

      curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip&country_code=us"
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
      

Premium Residential/Mobile Proxy Pools

Our standard proxy pools include millions of proxies from over a dozen ISPs and should be sufficient for the vast majority of scraping jobs. However, for a few particularly difficult-to-scrape sites, we also maintain a private internal pool of residential and mobile IPs. This pool is only available to users on the Business plan or higher. Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request counts as 10 API calls against your monthly limit); a request that uses both JavaScript rendering and the premium pool is charged at 25 times the normal rate (every successful request counts as 25 API calls against your monthly limit). To send a request through our premium proxy pool, please use the premium=true flag.

Sample Code

      curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip&premium=true"
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
      

POST/PUT Requests

(BETA) Some advanced users will want to issue POST/PUT requests in order to scrape forms and API endpoints directly. You can do this by sending a POST/PUT request through Scraper API. The return value will be stringified; if you want to use it as JSON, you will need to parse it into a JSON object (see the sketch after the result below).

Sample Code

Bash

      # Replace POST with PUT to send a PUT request instead
      curl -d 'foo=bar' \
      -X POST \
      "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/anything"
      
      # For multipart form data (-F builds a multipart/form-data body and
      # sets the matching Content-Type header automatically)
      curl -F 'foo=bar' \
      -X POST \
      "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/anything"
      

Node

      // remember to install the library: npm install scraperapi-sdk --save
      var scraperapiClient = require('scraperapi-sdk')('YOURAPIKEY')  
      await scraperapiClient.post('http://httpbin.org/anything', {body: {"foo": "bar"}})
      await scraperapiClient.put('http://httpbin.org/anything', {body: {"foo": "bar"}})
      

Result


      {
        "args": {},
        "data": "{\"foo\":\"bar\"}",
        "files": {},
        "form": {},
        "headers": {
          "Accept": "application/json",
          "Accept-Encoding": "gzip, deflate",
          "Content-Length": "13",
          "Content-Type": "application/json; charset=utf-8",
          "Host": "httpbin.org",
          "Upgrade-Insecure-Requests": "1",
          "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
        },
        "json": {
          "foo": "bar"
        },
        "method": "POST",
        "origin": "191.101.82.154, 191.101.82.154",
        "url": "https://httpbin.org/anything"
      }
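
Since the body comes back as a string, parse it before use. A minimal sketch using the SDK call shown above (assuming, as in the earlier examples, that the client resolves to the raw body string):

      // The API stringifies the response body; JSON.parse it before use
      var scraperapiClient = require('scraperapi-sdk')('YOURAPIKEY')
      var response = await scraperapiClient.post('http://httpbin.org/anything', {body: {'foo': 'bar'}})
      var data = JSON.parse(response) // stringified JSON -> object
      console.log(data.json.foo)      // "bar"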
      

Scraper API Proxy Mode

To simplify implementation, we offer a proxy front-end to the API. The proxy will take your requests and pass them through to the API, which will take care of proxy rotation, CAPTCHAs, and retries.
  • Each request will be sent through to the API and the response returned just like the API normally would. Successful requests will return with a status code of 200, 404 or 410.
  • Unsuccessful requests will return with a status code of 500. Your requests will receive a 429 error if you go over your concurrency limit, and a 403 if you go over your plan's maximum requests. Make sure to catch these errors!
  • You can use the proxy to scrape images, documents, PDFs or other files just as you would any other URL. Please remember that there is a 2MB limit per request.
  • The username for the proxy is "scraperapi" and the password is your API key. You can pass parameters to the API by adding them to the username, separated by periods. For example, if you want to render a request, the username would be "scraperapi.render=true". If you want to geotarget a request, the username would be "scraperapi.country_code=us". Multiple parameters can be included by separating them with periods; for example: "scraperapi.country_code=us.session_number=1234".
  • The proxy accepts the API parameters described above (for example render, country_code, session_number, and keep_headers) as part of the username.
  • Any headers you set for your proxy requests are automatically sent to the site you are scraping. You can override this using "keep_headers=false".
  • So that we can properly direct your requests through the API, your code must be configured to not verify SSL certificates.

Sample Code

          curl -x "http://scraperapi:YOURAPIKEY@proxy-server.scraperapi.com:8001" -k "http://httpbin.org/ip"
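
For reference, here is a minimal Node sketch of the same request without a proxy library, using only the built-in http module. It relies on standard HTTP proxy semantics (the full target URL as the request path, plus a Proxy-Authorization header), so it only covers plain-http targets; https targets need CONNECT tunneling, which proxy-agent libraries handle for you:

      // Minimal sketch: proxy mode over plain HTTP with Node's http module
      const http = require('http')

      const req = http.get({
        host: 'proxy-server.scraperapi.com',
        port: 8001,
        path: 'http://httpbin.org/ip', // the full target URL, per HTTP proxy semantics
        headers: {
          Host: 'httpbin.org',
          // username "scraperapi" (plus any dot-separated parameters), password = your API key
          'Proxy-Authorization': 'Basic ' +
            Buffer.from('scraperapi:YOURAPIKEY').toString('base64'),
        },
      }, (res) => {
        let body = ''
        res.on('data', (chunk) => (body += chunk))
        res.on('end', () => console.log(res.statusCode, body))
      })
      req.on('error', console.error)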
        
      

Account Information

If you would like to monitor your account usage and limits programmatically (how many concurrent requests you're using, how many requests you've made, etc.), you may use the /account endpoint, which returns JSON. Note that the requestCount and failedRequestCount numbers only refresh once every 15 seconds, while the concurrentRequests number is available in real time.

Sample Code

      curl "http://api.scraperapi.com/account?api_key=YOURAPIKEY"
      

Result


      {
        "concurrentRequests":553,
        "requestCount":6655888,
        "failedRequestCount":1118,
        "requestLimit":10000000,
        "concurrencyLimit":1000
      }
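
A small sketch (Node 18+, built-in fetch, run inside an async context) that reads these fields, for example to compute how many requests remain in the current billing period:

      // Fetch account stats and derive remaining monthly requests
      const res = await fetch('http://api.scraperapi.com/account?api_key=YOURAPIKEY')
      const stats = await res.json()
      console.log('requests remaining:', stats.requestLimit - stats.requestCount)
      console.log('concurrency in use:', stats.concurrentRequests + '/' + stats.concurrencyLimit)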
      
If you have any questions, you can contact support or email us at support@scraperapi.com.

Our Story

Having built many web scrapers ourselves, we repeatedly went through the tiresome process of finding proxies, setting up headless browsers, and handling CAPTCHAs. That's why we started Scraper API: it handles all of this for you, so you can scrape any page with a simple API call!