Getting Started

Scraper API is designed to simplify web scraping. Below are a few things to know before you get started.

Basic Usage

Scraper API exposes a single API endpoint. Simply send a GET request to http://api.scraperapi.com with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.

Sample Code

Bash

      curl "http://api.scraperapi.com?api_key=YOURAPIKEY&url=http://httpbin.org/ip"
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
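
Because the target url is passed as a query string parameter, it should be URL-encoded whenever it contains a query string of its own. Below is a minimal sketch in Python using the requests library, which URL-encodes the params dict for you:

Python

      import requests

      # requests URL-encodes the params dict automatically, so target URLs
      # that carry their own query strings pass through safely.
      payload = {'api_key': 'YOURAPIKEY', 'url': 'http://httpbin.org/ip'}
      r = requests.get('http://api.scraperapi.com', params=payload)
      print(r.text)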
      

Rendering JavaScript

If you are crawling a page that requires JavaScript to render, we can fetch it using a headless browser. This feature is only available on the Business and Enterprise plans. Note that rendered requests are charged at 10 times the normal rate (every successful request counts as 10 API calls against your monthly limit), and each request that uses both JavaScript rendering and our premium proxy pool is charged at 50 times the normal rate (every successful request counts as 50 API calls against your monthly limit). To render JavaScript, simply set render=true and we will use a headless Google Chrome instance to fetch the page:

Sample Code

Bash

      curl "http://api.scraperapi.com?api_key=YOURAPIKEY&url=http://httpbin.org/ip&render=true"
      

Node

      var request = require('request');
      var url = 'http://httpbin.org/ip';

      request(
        {
          method: 'GET',
          // Encode the target URL so it survives as a single query string value
          url: 'http://api.scraperapi.com/?api_key=YOURAPIKEY&url=' + encodeURIComponent(url) + '&render=true',
          headers: {
            Accept: 'application/json',
          },
        },
        function(error, response, body) {
          if (error) return console.error(error);
          console.log(body);
        }
      );
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"192.15.81.132"}
            </pre>
          </body>
        </html>
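
The equivalent request in Python is a one-parameter change; a minimal sketch using the requests library:

Python

      import requests

      # render=true fetches the page with a headless browser. Remember that
      # each successful rendered request counts as 10 API calls.
      payload = {
          'api_key': 'YOURAPIKEY',
          'url': 'http://httpbin.org/ip',
          'render': 'true',
      }
      r = requests.get('http://api.scraperapi.com', params=payload)
      print(r.text)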
      

Custom Headers

If you would like to keep the original request headers in order to pass through custom headers (user agents, cookies, etc.), simply set keep_headers=true. Note that this feature does not work in conjunction with JavaScript rendering. Only use this feature to get customized results; do not use it to avoid blocks, which we handle internally.

Sample Code

Bash

      curl --header "X-MyHeader: 123" \
      "http://api.scraperapi.com?api_key=YOURAPIKEY&url=http://httpbin.org/anything&keep_headers=true"
      

Node

      var request = require('request');
      var url = 'http://httpbin.org/anything';

      request(
        {
          method: 'GET',
          // Encode the target URL so it survives as a single query string value
          url: 'http://api.scraperapi.com/?api_key=YOURAPIKEY&url=' + encodeURIComponent(url) + '&keep_headers=true',
          headers: {
            Accept: 'application/json',
            'X-MyHeader': '123',
          },
        },
        function(error, response, body) {
          if (error) return console.error(error);
          console.log(body);
        }
      );
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
            {
              "args":{},
              "data":"",
              "files":{},
              "form":{},
              "headers": {
                "Accept":"*/*",
                "Accept-Encoding":"gzip, deflate",
                "Cache-Control":"max-age=259200",
                "Connection":"close",
                "Host":"httpbin.org",
                "Referer":"http://httpbin.org",
                "Timeout":"10000",
                "User-Agent":"curl/7.54.0",
                "X-Myheader":"123"
              },
              "json":null,
              "method":"GET",
              "origin":"45.72.0.249",
              "url":"http://httpbin.org/anything"
            }
            </pre>
          </body>
        </html>
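
In Python, the same pass-through works by sending your custom headers alongside keep_headers=true; a minimal sketch (the cookie value here is purely illustrative):

Python

      import requests

      payload = {
          'api_key': 'YOURAPIKEY',
          'url': 'http://httpbin.org/anything',
          'keep_headers': 'true',
      }
      # With keep_headers=true these headers are forwarded to the target site.
      headers = {
          'X-MyHeader': '123',
          'Cookie': 'session=abc123',  # hypothetical cookie, for illustration
      }
      r = requests.get('http://api.scraperapi.com', params=payload, headers=headers)
      print(r.text)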
      

POST Requests

(BETA) Some advanced users will want to issue POST requests in order to scrape forms and API endpoints directly. You can do this by sending a POST request through Scraper API. The return value is stringified; if you want to use it as JSON, parse it into a JSON object.

Sample Code

Bash

      curl -d 'foo=bar' \
      -X POST \
      "http://api.scraperapi.com?api_key=YOURAPIKEY&url=http://httpbin.org/post"
      

Result


      "{
        \"args\": {},
        \"data\": \"\",
        \"files\": {},
        \"form\": {
          \"{\\\"{\\\\\\\"foo\\\\\\\": \\\\\\\"\":\"\"},
          \"headers\": {
            \"Accept\":\"*/*\",
            \"Accept-Encoding\": \"gzip, deflate\",
            \"Accept-Language\":\"en-US,en;\",
            \"Cache-Control\":\"max-age=259200\",
            \"Connection\":\"close\",
            \"Content-Length\":\"14\",
            \"Content-Type\":\"application/x-www-form-urlencoded\",
            \"Host\":\"httpbin.org\",
            \"Timeout\":\"10000\",
            \"User-Agent\":\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36\"
          },
          \"json\":null,
          \"origin\":\"107.175.110.211\",
          \"url\":\"http://httpbin.org/post\"
        }\n"
      

PUT Requests

(BETA) Some advanced users will want to issue PUT requests in order to scrape forms and API endpoints directly. You can do this by sending a PUT request through Scraper API. The return value is stringified; if you want to use it as JSON, parse it into a JSON object.

Sample Code

Bash

      curl -d 'foo=bar' \
      -X PUT \
      "http://api.scraperapi.com?api_key=YOURAPIKEY&url=http://httpbin.org/anything"
      

Result


      "{
        \"args\":{},
        \"data\":\"{\\\"foo\\\":\\\"bar\\\"}\",
        \"files\":{},
        \"form\":{},
        \"headers\":{
          \"Accept\":\"application/json\",
          \"Accept-Encoding\":\"gzip, deflate\",
          \"Accept-Language\":\"en-US,en;q=0.9,es;q=0.8\",
          \"Cache-Control\":\"max-age=259200\",
          \"Connection\":\"close\",
          \"Content-Length\":\"13\",
          \"Content-Type\":\"application/json\",
          \"Host\":\"httpbin.org\",
          \"Upgrade-Insecure-Requests\":\"1\",
          \"User-Agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1 Safari/605.1.15\"
        },
        \"json\":{\"foo\":\"bar\"},
        \"method\":\"PUT\",
        \"origin\":\"104.168.95.190\",
        \"url\":\"http://httpbin.org/anything\"
      }\n"
      

Sessions

To reuse the same proxy for multiple requests, simply use the session_number= flag (e.g. session_number=123). The value of session_number can be any integer; sending a new integer creates a new session, while reusing the same integer routes each request with that session number through the same proxy. Sessions expire 60 seconds after their last use.

Sample Code

Bash

      curl "http://api.scraperapi.com?api_key=YOURAPIKEY&url=http://httpbin.org/ip&session_number=123"
      curl "http://api.scraperapi.com?api_key=YOURAPIKEY&url=http://httpbin.org/ip&session_number=123"
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
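
A minimal Python sketch of session reuse; while the session is alive, both calls should report the same origin IP:

Python

      import requests

      def scrape(target, session_number):
          # Requests that share a session_number are routed through the same proxy.
          payload = {
              'api_key': 'YOURAPIKEY',
              'url': target,
              'session_number': session_number,
          }
          return requests.get('http://api.scraperapi.com', params=payload).text

      print(scrape('http://httpbin.org/ip', 123))
      print(scrape('http://httpbin.org/ip', 123))  # same proxy, same origin IP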
      

Geographic Location

To ensure your requests come from the United States, please use the country_code= flag (e.g. country_code=us). United States (us) geotargeting is available on the Startup plan and higher. Business plan customers also have access to Canada (ca), United Kingdom (uk), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Japan (jp), China (cn), and Australia (au). Other countries are available to Enterprise customers upon request.

Sample Code

Bash

      curl "http://api.scraperapi.com?api_key=YOURAPIKEY&url=http://httpbin.org/ip&country_code=us"
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
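
The same pattern in Python, with the country_code flag added; a minimal sketch:

Python

      import requests

      # country_code=us routes the request through a United States proxy.
      r = requests.get('http://api.scraperapi.com', params={
          'api_key': 'YOURAPIKEY',
          'url': 'http://httpbin.org/ip',
          'country_code': 'us',
      })
      print(r.text)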
      

Premium Proxy Pools

Our standard proxy pools include millions of proxies from over a dozen ISPs and should be sufficient for the vast majority of scraping jobs. However, for a few particularly difficult-to-scrape sites, we also maintain a private internal pool of residential and mobile IPs. This pool is only available to users on the Business plan or higher. Requests through our premium pool are charged at 10 times the normal rate (every successful request counts as 10 API calls against your monthly limit), and each request that uses both JavaScript rendering and our premium pool is charged at 50 times the normal rate (every successful request counts as 50 API calls against your monthly limit). To send a request through our premium proxy pool, please use the premium=true flag.

Sample Code

Bash

      curl "http://api.scraperapi.com?api_key=YOURAPIKEY&url=http://httpbin.org/ip&premium=true"
      

Result


        <html>
          <head>
          </head>
          <body>
            <pre style="word-wrap: break-word; white-space: pre-wrap;">
              {"origin":"176.12.80.34"}
            </pre>
          </body>
        </html>
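
And the premium pool in Python, again a one-flag change; a minimal sketch:

Python

      import requests

      # premium=true uses the residential/mobile pool; each successful request
      # counts as 10 API calls (50 when combined with render=true).
      r = requests.get('http://api.scraperapi.com', params={
          'api_key': 'YOURAPIKEY',
          'url': 'http://httpbin.org/ip',
          'premium': 'true',
      })
      print(r.text)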
      
If you have any questions, you can contact support or email us at support@scraperapi.com.

Our Story

Having built many web scrapers, we repeatedly went through the tiresome process of finding proxies, setting up headless browsers, and solving CAPTCHAs. That's why we started Scraper API: it handles all of this for you, so you can scrape any page with a simple API call!