Getting Started

Scraper API is designed to simplify web scraping. This guide covers everything you need to get started.

Basic Usage

Scraper API exposes a single API endpoint. Simply send a GET request to http://api.scraperapi.com with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape. If the target URL contains query parameters of its own, make sure to URL-encode it.

Sample Code

Bash:

      curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip"
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
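
The same request from Node, as a minimal sketch using the request package (the same library as the Node samples later in this guide); encodeURIComponent is used so a target URL containing its own query string survives intact:

      // Basic GET request through Scraper API
      var request = require('request');

      // URL-encode the target so its query string (if any) is not
      // mistaken for Scraper API's own parameters
      var url = encodeURIComponent('http://httpbin.org/ip');

      request(
        {
          method: 'GET',
          url: 'http://api.scraperapi.com/?api_key=YOURAPIKEY&url=' + url,
        },
        function(error, response, body) {
          if (error) return console.error(error);
          console.log(body); // the scraped page, e.g. {"origin":"..."}
        }
      );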
      

Rendering JavaScript

If you are crawling a page that requires JavaScript to render its content, we can fetch it for you using a headless browser. This feature is only available on the Business and Enterprise plans. To render JavaScript, simply set render=true and we will use a headless Google Chrome instance to fetch the page:

Sample Code

Bash:

      curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip&render=true"

Node:

      var request = require('request');
      var url = 'http://httpbin.org/ip';

      request(
        {
          method: 'GET',
          url: 'http://api.scraperapi.com/?api_key=YOURAPIKEY&url=' + url + '&render=true',
          headers: {
            Accept: 'application/json',
          },
        },
        function(error, response, body) {
          console.log(body);
        }
      );
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"192.15.81.132"}
            </pre>
          </body>
        </html>
      

Custom Headers

If you would like to keep the original request headers in order to pass through custom headers (user agents, cookies, etc.), simply set keep_headers=true. Only use this feature to get customized results; do not use it to try to avoid blocks, as we handle that internally.

Sample Code

Bash:

      curl --header "X-MyHeader: 123" \
      "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/anything&keep_headers=true"

Node:

      var request = require('request');
      // Use the same target as the curl example above (and the Result below)
      var url = 'http://httpbin.org/anything';

      request(
        {
          method: 'GET',
          url: 'http://api.scraperapi.com/?api_key=YOURAPIKEY&url=' + url + '&keep_headers=true',
          headers: {
            Accept: 'application/json',
            'X-MyHeader': '123',
          },
        },
        function(error, response, body) {
          console.log(body);
        }
      );
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
            {
              "args":{},
              "data":"",
              "files":{},
              "form":{},
              "headers": {
                "Accept":"*/*",
                "Accept-Encoding":"gzip, deflate",
                "Cache-Control":"max-age=259200",
                "Connection":"close",
                "Host":"httpbin.org",
                "Referer":"http://httpbin.org",
                "Timeout":"10000",
                "User-Agent":"curl/7.54.0",
                "X-Myheader":"123"
              },
              "json":null,
              "method":"GET",
              "origin":"45.72.0.249",
              "url":"http://httpbin.org/anything"
            }
            </pre>
          </body>
        </html>
      

Sessions

To reuse the same proxy for multiple requests, simply use the session_number= parameter (e.g. session_number=123). The value of session_number can be any integer; sending a new integer creates a new session. This allows you to keep using the same proxy for each request made with that session number. Sessions expire 60 seconds after their last use.

Sample Code

Bash:

      curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip&session_number=123"
      curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip&session_number=123"
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
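
The same two calls as a minimal Node sketch (again assuming the request package); because both share session_number=123, both responses should report the same origin IP:

      // Two requests sharing session_number=123 are routed through the
      // same proxy, so both responses should show the same origin IP
      var request = require('request');
      var apiUrl =
        'http://api.scraperapi.com/?api_key=YOURAPIKEY' +
        '&url=' + encodeURIComponent('http://httpbin.org/ip') +
        '&session_number=123';

      request(apiUrl, function(error, response, body) {
        console.log(body); // e.g. {"origin":"176.12.80.34"}
        request(apiUrl, function(error, response, body) {
          console.log(body); // same origin: the session reuses the proxy
        });
      });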
      

Geographic Location

To ensure your requests come from a particular country, use the country_code= parameter (e.g. country_code=us for the United States). United States (us) and European Union (eu) geotargeting are available on the Startup plan and higher. Business plan customers also have access to Canada (ca), United Kingdom (uk), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Japan (jp), China (cn), and Australia (au). Other countries are available to Enterprise customers upon request.

Sample Code

Bash:

      curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip&country_code=us"
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
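
A minimal Node sketch of the same geotargeted request (assuming the request package, as in the earlier samples):

      // country_code=us routes the request through a US proxy
      var request = require('request');

      request(
        'http://api.scraperapi.com/?api_key=YOURAPIKEY' +
          '&url=' + encodeURIComponent('http://httpbin.org/ip') +
          '&country_code=us',
        function(error, response, body) {
          console.log(body); // origin should be a US IP address
        }
      );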
      

Premium Proxy Pools

Our standard proxy pools include millions of proxies from over a dozen ISPs and should be sufficient for the vast majority of scraping jobs. However, for a few particularly difficult-to-scrape sites, we also maintain a private internal pool of residential and mobile IPs. This pool is only available to users on the Business plan or higher. Requests through our premium pool are charged at 10 times the normal rate: every successful request counts as 10 API calls against your monthly limit. Requests that use both JavaScript rendering and the premium pool are charged at 25 times the normal rate: every successful request counts as 25 API calls against your monthly limit. For example, 1,000 successful premium requests consume 10,000 API calls, and 1,000 premium requests with render=true consume 25,000. To send a request through our premium proxy pool, please use the premium=true flag.

Sample Code

Bash:

      curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip&premium=true"
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
      

POST/PUT Requests

(BETA) Some advanced users will want to issue POST/PUT requests in order to scrape forms and API endpoints directly. You can do this by sending a POST/PUT request through Scraper API. The return value is stringified; if you want to use it as JSON, you will need to parse it into a JSON object.

Sample Code

Bash:

      # Replace POST with PUT to send a PUT request instead
      curl -d '{"foo":"bar"}' \
      -H 'Content-Type: application/json' \
      -X POST \
      "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/anything"

      # For form data
      curl -d 'foo=bar' \
      -H 'Content-Type: application/x-www-form-urlencoded' \
      -X POST \
      "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/anything"

Node:

      // Replace POST with PUT to send a PUT request instead
      var request = require('request');
      var url = 'http://httpbin.org/anything';

      request(
        {
          method: 'POST',
          url: 'http://api.scraperapi.com/?api_key=YOURAPIKEY&url=' + url,
          headers: {
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({foo: 'bar'}),
        },
        function(error, response, body) {
          console.log(body);
        }
      );

      // For form data
      request(
        {
          method: 'POST',
          url: 'http://api.scraperapi.com/?api_key=YOURAPIKEY&url=' + url,
          headers: {
            'Content-Type': 'application/x-www-form-urlencoded',
          },
          form: {foo: 'bar'},
        },
        function(error, response, body) {
          console.log(body);
        }
      );
      

Result


      {
        "args": {},
        "data": "{\"foo\":\"bar\"}",
        "files": {},
        "form": {},
        "headers": {
          "Accept": "application/json",
          "Accept-Encoding": "gzip, deflate",
          "Content-Length": "13",
          "Content-Type": "application/json; charset=utf-8",
          "Host": "httpbin.org",
          "Upgrade-Insecure-Requests": "1",
          "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
        },
        "json": {
          "foo": "bar"
        },
        "method": "POST",
        "origin": "191.101.82.154, 191.101.82.154",
        "url": "https://httpbin.org/anything"
      }
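
Since the body comes back as a string, here is a minimal sketch of parsing it in Node (assuming the target returns JSON, as http://httpbin.org/anything does):

      // Parse the stringified response before using it as JSON
      var request = require('request');
      var url = encodeURIComponent('http://httpbin.org/anything');

      request(
        {
          method: 'POST',
          url: 'http://api.scraperapi.com/?api_key=YOURAPIKEY&url=' + url,
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({foo: 'bar'}),
        },
        function(error, response, body) {
          var data = JSON.parse(body); // body arrives as a string
          console.log(data.json);      // { foo: 'bar' }
        }
      );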
      

Account Information

If you would like to monitor your account usage and limits programmatically (how many concurrent requests you're using, how many requests you've made, etc.), you may use the /account endpoint, which returns JSON. Note that the requestCount and failedRequestCount numbers only refresh once every 15 seconds, while the concurrentRequests number is available in real time.

Sample Code

Bash:

      curl "http://api.scraperapi.com/account?api_key=YOURAPIKEY"
      

Result


      {
        "concurrentRequests":553,
        "requestCount":6655888,
        "failedRequestCount":1118,
        "requestLimit":10000000,
        "concurrencyLimit":1000
      }
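
A short Node sketch (assuming the request package) that reads these fields, for example to compute how many API calls remain in the current billing period:

      // Fetch account usage and compute remaining API calls
      var request = require('request');

      request(
        'http://api.scraperapi.com/account?api_key=YOURAPIKEY',
        function(error, response, body) {
          var account = JSON.parse(body);
          console.log('Remaining requests:',
            account.requestLimit - account.requestCount);
        }
      );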
      
If you have any questions, you can contact support or email us at support@scraperapi.com.
Our Story
Having built many web scrapers ourselves, we repeatedly went through the tiresome process of finding proxies, setting up headless browsers, and handling CAPTCHAs. That's why we started Scraper API: it handles all of this for you, so you can scrape any page with a simple API call!