Web Scraping Proxies Made Simple
Get started scraping the web in as little as 2 minutes

API Documentation
Detailed API documentation explaining all of the API's key functionality

Scraping & Integration Tutorials
Guided examples on how to integrate and use ScraperAPI

Frequently Asked Questions
Quickly find answers to the most common issues

Don't have an API key? Sign up here and get 5,000 free API credits
Getting Started
- API Key & Authentication
- Making Requests
- Making POST/PUT Requests
- API Status Codes
- Credits and Requests
- Dashboard
Advanced API Functionality
Getting Started
Using ScraperAPI is easy. Just send the URL you would like to scrape to the API along with your API key and the API will return the HTML response from the URL you want to scrape.
API Key & Authentication
ScraperAPI uses API keys to authenticate requests. To use the API you need to sign up for an account and include your unique API key in every request.
If you haven’t signed up for an account yet, then sign up for a free trial here with 5,000 free API credits.
Making Requests
You can use the API to scrape web pages, API endpoints, images, documents, PDFs, or other files just as you would any other URL. Note: there is a 2MB limit per request.
There are four ways in which you can send GET requests to ScraperAPI:
- Via our Async Scraper service
http://async.scraperapi.com
- Via our API endpoint
http://api.scraperapi.com?
- Via one of our SDKs (only available for some programming languages)
- Via our proxy port
http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001
Choose whichever option best suits your scraping requirements.
Important note: regardless of how you invoke the service, we highly recommend you set a 60-second timeout in your application to get the best possible success rates, especially for some hard-to-scrape domains.
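For example, with Python's requests library you can pass the timeout explicitly (a minimal sketch; APIKEY is a placeholder for your own key):

import requests

payload = {'api_key': 'APIKEY', 'url': 'http://httpbin.org/ip'}
# Allow up to 60 seconds so the API has time to retry difficult pages
r = requests.get('http://api.scraperapi.com', params=payload, timeout=60)
print(r.text)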
Async requests Method #1
To ensure a higher level of successful requests when using our scraper, we’ve built a new product, Async Scraper. Rather than making a request to our endpoint and waiting for the response, this endpoint lets you submit a scraping job whose results you can later collect from our status endpoint.
Scraping websites can be a difficult process; it takes numerous steps and significant effort to get through some sites’ protection, which can be hard to do within the timeout constraints of synchronous APIs. The Async Scraper will keep working on your requested URLs until we have achieved a 100% success rate (when applicable), then return the data to you.
Async Scraping is the recommended way to scrape pages when success rate on difficult sites is more important to you than response time (e.g. you need a set of data periodically).
How to use
The async scraper endpoint is available at https://async.scraperapi.com
and it exposes a few useful APIs.
Note: In the examples below we use cURL to invoke the endpoints, but you can of course use any other compatible clients, such as Postman or something programmatic, e.g. Node’s axios or Python’s http.client
Submit an async job
A simple example showing how to submit a job for scraping and receive a status endpoint URL through which you can poll for the status (and later the result) of your scraping job:
curl -X POST -H "Content-Type: application/json" -d '{"apiKey": "xxxxxx", "url": "https://example.com"}' "https://async.scraperapi.com/jobs"
Response:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}
Note the statusUrl
field in the response. That is a personal URL to retrieve the status and results of your scraping job. Invoking that endpoint provides you with the status first:
curl "https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953"
Response:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}
Once your job is finished, the response will change and will contain the results of your scraping job:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"finished",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com",
"response":{
"headers":{
"date":"Thu, 14 Apr 2022 11:10:44 GMT",
"content-type":"text/html; charset=utf-8",
"content-length":"1256",
"connection":"close",
"x-powered-by":"Express",
"access-control-allow-origin":"undefined","access-control-allow-headers":"Origin, X-Requested-With, Content-Type, Accept",
"access-control-allow-methods":"HEAD,GET,POST,DELETE,OPTIONS,PUT",
"access-control-allow-credentials":"true",
"x-robots-tag":"none",
"sa-final-url":"https://example.com/",
"sa-statuscode":"200",
"etag":"W/\"4e8-Sjzo7hHgkd15I/TYxuW15B7HwEc\"",
"vary":"Accept-Encoding"
},
"body":"<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>\n\n <meta charset=\"utf-8\" />\n <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n <style type=\"text/css\">\n body {\n background-color: #f0f0f2;\n margin: 0;\n padding: 0;\n font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n \n }\n div {\n width: 600px;\n margin: 5em auto;\n padding: 2em;\n background-color: #fdfdff;\n border-radius: 0.5em;\n box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n }\n a:link, a:visited {\n color: #38488f;\n text-decoration: none;\n }\n @media (max-width: 700px) {\n div {\n margin: 0 auto;\n width: auto;\n }\n }\n </style> \n</head>\n\n<body>\n<div>\n <h1>Example Domain</h1>\n <p>This domain is for use in illustrative examples in documents. You may use this\n domain in literature without prior coordination or asking for permission.</p>\n <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
"statusCode":200
}
}
Callbacks
Using a status URL is a great way to test the API or get started quickly, but some customer environments may require more robust solutions, so we implemented callbacks. Currently only webhook callbacks are supported, but we are planning to introduce more over time (e.g. direct database callbacks, AWS S3, etc.).
An example of using a webhook callback:
curl -X POST -H "Content-Type: application/json" -d '{"apiKey": "xxxxxx", "url": "https://example.com", "callback": {"type": "webhook", "url": "https://yourcompany.com/scraperapi"}}' "https://async.scraperapi.com/jobs"
Using a callback you don’t need to use the status URL (although you still can) to fetch the status and results of the job. Once the job is finished the provided webhook URL will be invoked by our system with the same content as the status URL provides.
Just replace the https://yourcompany.com/scraperapi
URL with your preferred endpoint. You can even add basic auth to the URL in the following format: https://user:pass@yourcompany.com/scraperapi
Note: The system will try to invoke the webhook URL 3 times, then it cancels the job. So please make sure that the webhook URL is available through the public internet and will be capable of handling the traffic that you need.
Hint: Webhook.site is a free online service to test webhooks without requiring you to build a complex infrastructure.
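For local testing you can also run a tiny receiver yourself. The sketch below uses Python's standard http.server module and simply prints whatever job payload it receives; the port is an arbitrary choice for illustration and the handler accepts any path, this is not something the API requires:

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and decode the JSON body posted when the job finishes
        length = int(self.headers.get('Content-Length', 0))
        job = json.loads(self.rfile.read(length))
        print(job.get('status'), job.get('url'))
        self.send_response(200)
        self.end_headers()

HTTPServer(('0.0.0.0', 8080), CallbackHandler).serve_forever()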
API parameters
You can use the usual API parameters the same way you’d use them with our synchronous API. These parameters should go into an apiParams object inside the POST data, e.g.:
{
"apiKey": "xxxxxx",
"apiParams": {
"autoparse": false, // boolean
"binary_target": false, // boolean
"country_code": "us", // string, see: https://api.scraperapi.com/geo
"device_type": "desktop", // desktop | mobile
"follow_redirect": false, // boolean
"premium": true, // boolean
"render": false, // boolean
"retry_404": false // boolean
},
"url": "https://example.com"
}
Requests to the API Endpoint Method #2
ScraperAPI exposes a single API endpoint for you to send GET requests. Simply send a GET request to http://api.scraperapi.com
with two query string parameters and the API will return the HTML response for that URL:
- api_key, which contains your API key, and
- url, which contains the URL you would like to scrape
You should format your requests to the API endpoint as follows:
curl "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip"
To enable other API functionality when sending a request to the API endpoint simply add the appropriate query parameters to the end of the ScraperAPI URL.
For example, if you want to enable Javascript rendering with a request, then add render=true
to the request:
curl "http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&render=true"
To use two or more parameters, simply separate them with the “&” sign.
curl "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true&country_code=us"
Send requests to the proxy port Method #3
To simplify implementation for users with existing proxy pools, we offer a proxy front-end to the API. The proxy will take your requests and pass them through to the API which will take care of proxy rotation, captchas, and retries.
The proxy mode is a light front-end for the API and has all the same functionality and performance as sending requests to the API endpoint.
The username
for the proxy is scraperapi and the password
is your API key.
curl -x "http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001" -k "http://httpbin.org/ip"
Note: So that we can properly direct your requests through the API, your code must be configured to not verify SSL certificates.
To enable extra functionality whilst using the API in proxy mode, you can pass parameters to the API by adding them to username, separated by periods.
For example, if you want to enable Javascript rendering with a request, the username would be scraperapi.render=true
curl -x "http://scraperapi.render=true:APIKEY@proxy-server.scraperapi.com:8001" -k "http://httpbin.org/ip"
Multiple parameters can be included by separating them with periods; for example:
curl -x "http://scraperapi.render=true.country_code=us:APIKEY@proxy-server.scraperapi.com:8001" -k "http://httpbin.org/ip"
Making POST/PUT Requests
Some advanced users may want to send POST/PUT requests in order to scrape forms and API endpoints directly. You can do this by sending a POST/PUT request through ScraperAPI. The return value will be stringified. So if you want to use it as JSON, you will need to parse it into a JSON object.
# Replace POST with PUT to send a PUT request instead
curl -d 'foo=bar' \
-X POST \
"http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/anything"
# For form data
curl -H 'Content-Type: application/x-www-form-urlencoded' \
-d 'foo=bar' \
-X POST \
"http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/anything"
API Status Codes
The API will return a specific status code after every request depending on whether the request was successful, failed or some other error occurred. ScraperAPI will retry failed requests for up to 60 seconds to try and get a successful response from the target URL before responding with a 500
error indicating a failed request.
Note: To avoid timing out your request before the API has had a chance to complete all retries, remember to set your timeout to 60 seconds.
In cases where a request fails after 60 seconds of retrying, you will not be charged for the unsuccessful request (you are only charged for successful requests, 200
and 404
status codes).
Errors will inevitably occur, so make sure you catch them on your end. You can configure your code to immediately retry the request, and in most cases it will be successful. If a request is consistently failing, check your request to make sure that it is configured correctly. Or, if you are consistently getting ban messages from an anti-bot, then create a ticket with our support team and we will try to bypass this anti-bot for you.
If you receive a successful 200
status code response from the API but the response contains a CAPTCHA, please contact our support team and they will add it to our CAPTCHA detection database. Once included in our CAPTCHA database the API will treat it as a ban in the future and automatically retry the request.
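If you want to retry automatically, a minimal sketch with Python requests might look like this (the retry count and pause are arbitrary choices for illustration, not recommendations from the API):

import time
import requests

def fetch_with_retries(target_url, api_key='APIKEY', max_retries=3):
    payload = {'api_key': api_key, 'url': target_url}
    for _ in range(max_retries):
        r = requests.get('http://api.scraperapi.com', params=payload, timeout=60)
        # 200 and 404 are the outcomes you are charged for; anything else is retried
        if r.status_code in (200, 404):
            return r
        time.sleep(1)  # brief pause before the next attempt
    return r  # hand back the last response so the caller can inspect it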
Below are the possible status codes you will receive:
| Status Code | Details |
| --- | --- |
| 200 | Successful response |
| 404 | Page requested does not exist. |
| 410 | Page requested is no longer available. |
| 500 | After retrying for 60 seconds, the API was unable to receive a successful response. |
| 429 | You are sending requests too fast, and exceeding your concurrency limit. |
| 403 | You have used up all your API credits. |
Credits and Requests
Every time you make a request, you consume credits. The credits you have available are defined by the plan you are currently using. The amount of credits you consume depends on the domain and parameters you add to your request.
Domains
We have created custom scrapers to target these sites. If you scrape one of these domains, that scraper is activated and the credit cost changes.
| Category | Domain examples | Credit Cost |
| --- | --- | --- |
| SERP | Google, Bing | 25 |
| E-Commerce | Amazon, Allegro | 5 |
| Protected Domains | LinkedIn, G2, Social Media Sites | 30 |
Geotargeting is included in these credit costs.
If your domain does not fit in any category in the above list, it will cost 1 credit if no additional parameters are added.
Parameters
According to your needs, you may want to access different features on our platform.
- premium=true – requests cost 10 credits
- render=true – requests cost 10 credits
- premium=true + render=true – requests cost 25 credits
- ultra_premium=true – requests cost 30 credits
- ultra_premium=true + render=true – requests cost 75 credits

In any request, with or without these parameters, we will only charge for successful requests (200 and 404 status codes). If you run out of credits sooner than planned, you can renew your subscription early as explained in the next section – Dashboard.
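As a quick illustration of how these rules combine, a tiny helper could estimate the credit cost of a request to a non-custom-scraper domain. This is only a sketch of the numbers listed above, not an official calculator:

def estimate_credits(premium=False, render=False, ultra_premium=False):
    # Costs follow the list above; custom-scraper domains (SERP, e-commerce, protected) override these.
    if ultra_premium:
        return 75 if render else 30
    if premium:
        return 25 if render else 10
    return 10 if render else 1

print(estimate_credits(premium=True, render=True))  # 25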
Dashboard
The Dashboard shows you all the information so you can keep track of your usage. It is divided into Usage, Billing, Documentation, and links to contact us.
Usage
Here you will find your API key, sample codes, monitoring tools, and usage statistics such as credits used, concurrency, failed requests, your current plan, and the end date of your billing cycle.
Billing
In this section, you are able to see your current plan, the end date of the current billing cycle, as well as subscribe to another plan. This is also the right place to set or update your billing details, payment method, view your invoices, manage your subscription, or cancel it.
Here you can also renew your subscription early, in case you run out of API credits sooner than planned. This option will let you renew your subscription at an earlier date than planned, charging you for the subscription price and resetting your credits. If you find yourself using this option regularly, please reach out to us so we can find a plan that can accommodate the number of credits you need.
Documentation
This will direct you to our documentation – exactly where you are right now.
Contact Support
Direct link to reach out to our technical support team.
Contact Sales
Direct link to our sales team, for any subscription or billing-related inquiries.
Customize API Functionality
ScraperAPI enables you to customize the API’s functionality by adding additional parameters to your requests. The API will accept the following parameters:
| Parameter | Description |
| --- | --- |
| render | Activate JavaScript rendering by setting render=true in your request. The API will automatically render the JavaScript on the page and return the HTML response after the JavaScript has been rendered. Requests using this parameter cost 10 API credits, or 75 if used in combination with ultra_premium. |
| country_code | Activate country geotargeting by setting country_code=us to use US proxies, for example. This parameter does not increase the cost of the API request. |
| premium | Activate premium residential and mobile IPs by setting premium=true. Using premium proxies costs 10 API credits, or 25 API credits if used in combination with JavaScript rendering (render=true). |
| session_number | Reuse the same proxy by setting session_number=123, for example. This parameter does not increase the cost of the API request. |
| keep_headers | Use your own custom headers by setting keep_headers=true along with sending your own headers to the API. This parameter does not increase the cost of the API request. |
| device_type | Set your requests to use mobile or desktop user agents by setting device_type=desktop or device_type=mobile. This parameter does not increase the cost of the API request. |
| autoparse | Activate auto parsing for select websites by setting autoparse=true. The API will parse the data on the page and return it in JSON format. This parameter does not increase the cost of the API request. |
| ultra_premium | Activate our advanced bypass mechanisms by setting ultra_premium=true. Requests using this parameter cost 30 API credits, or 75 if used in combination with JavaScript rendering. |
Rendering Javascript
If you are crawling a page that requires you to render the javascript on the page to scrape the data you need, then we can fetch these pages using a headless browser. To render javascript, simply set render=true
and we will use a headless Google Chrome instance to fetch the page. This feature is only available on the Business and Enterprise plans.
curl "http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&render=true"
Custom Headers
If you would like to use your own custom headers (user agents, cookies, etc.) when making a request to the website, simply set keep_headers=true
and send the API the headers you want to use. The API will then use these headers when sending requests to the website.
Note: Only use this feature if you need to send custom headers to retrieve specific results from the website. Within the API we have a sophisticated header management system designed to increase success rates and performance on difficult sites. When you send your own custom headers you override our header system, which oftentimes lowers your success rates. Unless you absolutely need to send custom headers to get the data you need, we advise that you don’t use this functionality.
If you need to get results for mobile devices, use the device_type
parameter to set the user-agent header for your request, instead of setting your own.
curl --header "X-MyHeader: 123" "http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/anything&keep_headers=true"
Sessions
To reuse the same proxy for multiple requests, simply use the session_number
parameter by setting it equal to a unique integer for every session you want to maintain (e.g. session_number=123
). This will allow you to continue using the same proxy for each request with that session number. To create a new session simply set the session_number
parameter with a new integer to the API. The session value can be any integer. Sessions expire 15 minutes after the last usage.
curl "http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&session_number=123"
Geographic Location
Certain websites (typically e-commerce stores and search engines) display different data to different users based on the geolocation of the IP used to make the request to the website. In cases like these, you can use the API’s geotargeting functionality to easily use proxies from the required country to retrieve the correct data from the website.
To control the geolocation of the IP used to make the request, simply set the country_code parameter to the country you want the proxy to be from and the API will automatically use the correct IP for that request.
For example: to ensure your requests come from the United States, set the country_code parameter to country_code=us.
Business and Enterprise Plan users can geotarget their requests to the following 13 regions (Hobby and Startup Plans can only use US and EU geotargeting) by using the country_code
in their request:
| Country Code | Region | Plans |
| --- | --- | --- |
| us | United States | Hobby Plan and higher. |
| eu | Europe | Hobby Plan and higher. |
| ca | Canada | Business Plan and higher. |
| uk | United Kingdom | Business Plan and higher. |
| de | Germany | Business Plan and higher. |
| fr | France | Business Plan and higher. |
| es | Spain | Business Plan and higher. |
| br | Brazil | Business Plan and higher. |
| mx | Mexico | Business Plan and higher. |
| in | India | Business Plan and higher. |
| jp | Japan | Business Plan and higher. |
| cn | China | Business Plan and higher. |
| au | Australia | Business Plan and higher. |
Geotargeting “eu” will use IPs from any European country.
Other countries are available to Enterprise customers upon request.
curl "http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&country_code=us"
Premium Residential/Mobile Proxy Pools
Our standard proxy pools include millions of proxies from over a dozen ISPs and should be sufficient for the vast majority of scraping jobs. However, for a few particularly difficult to scrape sites, we also maintain a private internal pool of residential and mobile IPs. This pool is only available to users on the Business plan or higher.
Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request will count as 10 API credits against your monthly limit). Each request that uses both javascript rendering and our premium proxy pools will be charged at 25 times the normal rate (every successful request will count as 25 API credits against your monthly limit). To send a request through our premium proxy pool, please set the premium query parameter to premium=true.
We also have a higher premium level that you can use for really tough targets, such as LinkedIn. You can access these pools by adding the ultra_premium=true query parameter. These requests will use 30 API credits against your monthly limit, or 75 if used together with rendering. Please note, this is only available on our paid plans.
curl "http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&country_code=us"
Device Type
If your use case requires you to exclusively use either desktop or mobile user agents in the headers it sends to the website, then you can use the device_type parameter.
- Set device_type=desktop to have the API use a desktop (e.g. Windows, macOS, or Linux) user agent. Note: This is the default behavior, so not setting the parameter has the same effect.
- Set device_type=mobile to have the API use a mobile (e.g. iPhone or Android) user agent.
Note: The device type you set will be overridden if you use keep_headers=true and send your own user agent in the request headers.
curl "http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&device_type=mobile"
Autoparse
For select websites the API will parse all the valuable data in the HTML response and return it in JSON format. To enable this feature, simply add autoparse=true
to your request and the API will parse the data for you. Currently, this feature works with Amazon, Google Search, and Google Shopping.
curl "http://api.scraperapi.com/?api_key=APIKEY&url=https://www.amazon.com/dp/B07V1PHM66&autoparse=true"
Wait for Element when rendering
If a rendered request is a bit slow and the page appears to stabilise before all of the content you need has loaded, it can fool the API into thinking the page has finished rendering.
To cope with this, you can tell the API to wait for a DOM element (selector) to appear on the page when rendering. Just send the wait_for_selector= parameter, passing a URL-encoded jQuery-style selector. The API will then wait for this element to appear on the page before returning results.
curl "http://api.scraperapi.com/?api_key=APIKEY&url=https://www.snackmagic.com/&wait_for_selector=footer"
Account Information
If you would like to monitor your account usage and limits programmatically (how many concurrent requests you’re using, how many requests you’ve made, etc.) you may use the /account endpoint, which returns JSON.
Note: the requestCount
and failedRequestCount
numbers only refresh once every 15 seconds, while the concurrentRequests
number is available in real-time.
curl "http://api.scraperapi.com/account?api_key=APIKEY"
Getting Started
Using ScraperAPI is easy. Just send the URL you would like to scrape to the API along with your API key and the API will return the HTML response from the URL you want to scrape.
API Key & Authentication
ScraperAPI uses API keys to authenticate requests. To use the API you need to sign up for an account and include your unique API key in every request.
If you haven’t signed up for an account yet, then sign up for a free trial here with 5,000 free API credits.
Making Requests
You can use the API to scrape web pages, API endpoints, images, documents, PDFs, or other files just as you would any other URL. Note: there is a 2MB limit per request.
There are four ways in which you can send GET requests to ScraperAPI:
- Via our Async Scraper service
http://async.scraperapi.com
- Via our API endpoint
http://api.scraperapi.com?
- Via our Python SDK
- Via our proxy port
http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001
Choose whichever option best suits your scraping requirements.
Important note: regardless of how you invoke the service, we highly recommend you set a 60 seconds timeout in your application to get the best possible success rates, especially for some hard-to-scrape domains.
Async requests Method #1
To ensure a higher level of successful requests when using our scraper, we’ve built a new product, Async Scraper. Rather than making requests to our endpoint waiting for the response, this endpoint submits a job of scraping, in which you can later collect the data from using our status endpoint.
Scraping websites can be a difficult process; it takes numerous steps and significant effort to get through some sites’ protection which sometimes proves to be difficult with the timeout constraints of synchronous APIs. The Async Scraper will work on your requested URLs until we have achieved a 100% success rate (when applicable), returning the data to you.
Async Scraping is the recommended way to scrape pages when success rate on difficult sites is more important to you than response time (e.g. you need a set of data periodically).
How to use
The async scraper endpoint is available at https://async.scraperapi.com
and it exposes a few useful APIs.
Submit an async job
A simple example showing how to submit a job for scraping and receive a status endpoint URL through which you can poll for the status (and later the result) of your scraping job:
import requests
r = requests.post(url = 'https://async.scraperapi.com/jobs', json={ 'apiKey': 'xxxxxx', 'url': 'https://example.com' })
print(r.text)
Response:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}
Note the statusUrl
field in the response. That is a personal URL to retrieve the status and results of your scraping job. Invoking that endpoint provides you with the status first:
import requests
r = requests.get(url = 'https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953')
print(r.text)
Response:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}
Once your job is finished, the response will change and will contain the results of your scraping job:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"finished",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com",
"response":{
"headers":{
"date":"Thu, 14 Apr 2022 11:10:44 GMT",
"content-type":"text/html; charset=utf-8",
"content-length":"1256",
"connection":"close",
"x-powered-by":"Express",
"access-control-allow-origin":"undefined","access-control-allow-headers":"Origin, X-Requested-With, Content-Type, Accept",
"access-control-allow-methods":"HEAD,GET,POST,DELETE,OPTIONS,PUT",
"access-control-allow-credentials":"true",
"x-robots-tag":"none",
"sa-final-url":"https://example.com/",
"sa-statuscode":"200",
"etag":"W/\"4e8-Sjzo7hHgkd15I/TYxuW15B7HwEc\"",
"vary":"Accept-Encoding"
},
"body":"<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>\n\n <meta charset=\"utf-8\" />\n <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n <style type=\"text/css\">\n body {\n background-color: #f0f0f2;\n margin: 0;\n padding: 0;\n font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n \n }\n div {\n width: 600px;\n margin: 5em auto;\n padding: 2em;\n background-color: #fdfdff;\n border-radius: 0.5em;\n box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n }\n a:link, a:visited {\n color: #38488f;\n text-decoration: none;\n }\n @media (max-width: 700px) {\n div {\n margin: 0 auto;\n width: auto;\n }\n }\n </style> \n</head>\n\n<body>\n<div>\n <h1>Example Domain</h1>\n <p>This domain is for use in illustrative examples in documents. You may use this\n domain in literature without prior coordination or asking for permission.</p>\n <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
"statusCode":200
}
}
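Putting the two calls together, a simple polling loop can submit a job and wait until its status flips from running to finished (a sketch only; the 5-second poll interval is an arbitrary choice):

import time
import requests

job = requests.post('https://async.scraperapi.com/jobs',
                    json={'apiKey': 'xxxxxx', 'url': 'https://example.com'}).json()

# Poll the personal status URL until the job is no longer running
while job['status'] == 'running':
    time.sleep(5)
    job = requests.get(job['statusUrl']).json()

if job['status'] == 'finished':
    print(job['response']['body'][:200])  # first part of the scraped HTML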
Callbacks
Using a status URL is a great way to test the API or get started quickly, but some customer environments may require more robust solutions, so we implemented callbacks. Currently only webhook callbacks are supported, but we are planning to introduce more over time (e.g. direct database callbacks, AWS S3, etc.).
An example of using a webhook callback:
import requests
r = requests.post(url = 'https://async.scraperapi.com/jobs', json={ 'apiKey': 'xxxxxx', 'url': 'https://example.com', 'callback': {'type': 'webhook', 'url': 'https://yourcompany.com/scraperapi' }})
print(r.text)
Using a callback you don’t need to use the status URL (although you still can) to fetch the status and results of the job. Once the job is finished the provided webhook URL will be invoked by our system with the same content as the status URL provides.
Just replace the https://yourcompany.com/scraperapi
URL with your preferred endpoint. You can even add basic auth to the URL in the following format: https://user:pass@yourcompany.com/scraperapi
Note: The system will try to invoke the webhook URL 3 times, then it cancels the job. So please make sure that the webhook URL is available through the public internet and will be capable of handling the traffic that you need.
Hint: Webhook.site is a free online service to test webhooks without requiring you to build a complex infrastructure.
API parameters
You can use the usual API parameters just the same way you’d use it with our synchronous API. These parameters should go into an apiParams
object inside the POST data, e.g:
{
"apiKey": "xxxxxx",
"apiParams": {
"autoparse": false, // boolean
"binary_target": false, // boolean
"country_code": "us", // string, see: https://api.scraperapi.com/geo
"device_type": "desktop", // desktop | mobile
"follow_redirect": false, // boolean
"premium": true, // boolean
"render": false, // boolean
"retry_404": false // boolean
},
"url": "https://example.com"
}
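A short sketch of submitting an async job with apiParams from Python, mirroring the JSON shape above:

import requests

job = {
    'apiKey': 'xxxxxx',
    'apiParams': {
        'country_code': 'us',
        'render': False,
        'retry_404': False
    },
    'url': 'https://example.com'
}
r = requests.post('https://async.scraperapi.com/jobs', json=job)
print(r.text)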
Requests to the API Endpoint Method #2
ScraperAPI exposes a single API endpoint for you to send GET requests. Simply send a GET request to http://api.scraperapi.com
with two query string parameters and the API will return the HTML response for that URL:
- api_key, which contains your API key, and
- url, which contains the URL you would like to scrape
You should format your requests to the API endpoint as follows:
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip"
Sample Code
import requests
payload = {'api_key': 'APIKEY', 'url': 'https://httpbin.org/ip'}
r = requests.get('http://api.scraperapi.com', params=payload)
print(r.text)
# Scrapy users can simply replace the urls in their start_urls and parse function
# ...other scrapy setup code
start_urls = ['http://api.scraperapi.com?api_key=APIKEY&url=' + url]
def parse(self, response):
# ...your parsing logic here
yield scrapy.Request('http://api.scraperapi.com/?api_key=APIKEY&url=' + url, self.parse)
To enable other API functionality when sending a request to the API endpoint simply add the appropriate query parameters to the end of the ScraperAPI URL.
For example, if you want to enable Javascript rendering with a request, then add render=true
to the request.
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true"
Sample Code
import requests
payload = {'api_key': 'APIKEY', 'url':'https://httpbin.org/ip', 'render': 'true'}
r = requests.get('http://api.scraperapi.com', params=payload)
print(r.text)
# Scrapy users can simply replace the urls in their start_urls and parse function
# ...other scrapy setup code
start_urls = ['http://api.scraperapi.com?api_key=APIKEY&url=' + url +'&render=true']
def parse(self, response):
# ...your parsing logic here
yield scrapy.Request('http://api.scraperapi.com/?api_key=APIKEY&url=' + url +'&render=true', self.parse)
To use two or more parameters, simply separate them with the “&” sign.
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true&country_code=us"
Sample Code
import requests
payload = {'api_key': 'APIKEY', 'url':'https://httpbin.org/ip', 'render': 'true', 'country_code': 'us'}
r = requests.get('http://api.scraperapi.com', params=payload)
print(r.text)
# Scrapy users can simply replace the urls in their start_urls and parse function
# ...other scrapy setup code
start_urls = ['http://api.scraperapi.com?api_key=APIKEY&url=' + url + '&render=true' + '&country_code=us']
def parse(self, response):
# ...your parsing logic here
yield scrapy.Request('http://api.scraperapi.com/?api_key=APIKEY&url=' + url + '&render=true' + '&country_code=us', self.parse)
Requests via ScraperAPI SDK Method #3
To make it easier to get up and running, you can also use our Python SDK. First, you will need to install the SDK:
pip install scraperapi-sdk
Then integrate it into your code:
Sample Code
from scraper_api import ScraperAPIClient
client = ScraperAPIClient('APIKEY')
result = client.get(url = 'http://httpbin.org/ip').text
print(result)
# Scrapy users can simply replace the urls in their start_urls and parse function
# Note for Scrapy, you should not use DOWNLOAD_DELAY and
# RANDOMIZE_DOWNLOAD_DELAY, these will lower your concurrency and are not
# needed with our API
# ...other scrapy setup code
start_urls =[client.scrapyGet(url = 'http://httpbin.org/ip')]
def parse(self, response):
# ...your parsing logic here
yield scrapy.Request(client.scrapyGet(url = 'http://httpbin.org/ip'), self.parse)
To enable extra functionality while using our SDK, add an additional parameter to the GET request:
from scraper_api import ScraperAPIClient
client = ScraperAPIClient('APIKEY')
#Requests Users
result = client.get(url = 'http://httpbin.org/ip', render=True).text
# Scrapy users
yield scrapy.Request(client.scrapyGet(url = 'http://httpbin.org/ip', render=True), self.parse)
To use two or more parameters, simply add them to the GET request:
from scraper_api import ScraperAPIClient
client = ScraperAPIClient('APIKEY')
#Requests Users
result = client.get(url = 'http://httpbin.org/ip', render=True, country_code='us').text
# Scrapy users
yield scrapy.Request(client.scrapyGet(url = 'http://httpbin.org/ip', render=True, country_code='us'), self.parse)
Send Requests to the Proxy Port Method #4
To simplify implementation for users with existing proxy pools, we offer a proxy front-end to the API. The proxy will take your requests and pass them through to the API which will take care of proxy rotation, captchas, and retries.
The proxy mode is a light front-end for the API and has all the same functionality and performance as sending requests to the API endpoint.
The username
for the proxy is scraperapi and the password
is your API key.
"http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001"
You can use the ScraperAPI proxy the same way as you would use any other proxy:
import requests
proxies = {
"http": "http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001"
}
r = requests.get('http://httpbin.org/ip', proxies=proxies, verify=False)
print(r.text)
# Scrapy users can likewise simply pass their API key in headers.
# NB: Scrapy skips SSL verification by default.
# ...other scrapy setup code
start_urls = ['http://httpbin.org/ip']
meta = {
"proxy": "http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001"
}
def parse(self, response):
# ...your parsing logic here
yield scrapy.Request(url, callback=self.parse, headers=headers, meta=meta)
Note: So that we can properly direct your requests through the API, your code must be configured to not verify SSL certificates.
To enable extra functionality whilst using the API in proxy mode, you can pass parameters to the API by adding them to the username, separated by periods.
For example, if you want to enable Javascript rendering with a request, the username would be scraperapi.render=true
"http://scraperapi.render=true:APIKEY@proxy-server.scraperapi.com:8001"
Multiple parameters can be included by separating them with periods; for example:
"http://scraperapi.render=true.country_code=us:APIKEY@proxy-server.scraperapi.com:8001"
Making POST/PUT Requests
Some advanced users may want to send POST/PUT requests in order to scrape forms and API endpoints directly. You can do this by sending a POST/PUT request through ScraperAPI. The return value will be stringified. So if you want to use it as JSON, you will need to parse it into a JSON object.
import requests
payload = {'api_key': 'APIKEY', 'url': 'https://httpbin.org/anything'}
r = requests.post('http://api.scraperapi.com', params=payload, data={'foo': 'bar'})
print(r.text)
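Since the return value comes back stringified, you can parse it yourself when you need a JSON object. A small sketch continuing the example above (httpbin's /anything endpoint echoes posted form fields under "form"):

import json

data = json.loads(r.text)  # turn the stringified response into a dict
print(data.get('form'))    # e.g. {'foo': 'bar'} for the request above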
API Status Codes
The API will return a specific status code after every request depending on whether the request was successful, failed, or some other error occurred. ScraperAPI will retry failed requests for up to 60 seconds to try and get a successful response from the target URL before responding with a 500
error indicating a failed request.
Note: To avoid timing out your request before the API has had a chance to complete all retries, remember to set your timeout to 60 seconds.
In cases where a request fails after 60 seconds of retrying, you will not be charged for the unsuccessful request (you are only charged for successful requests, 200
and 404
status codes).
Errors will inevitably occur, so make sure you catch them on your end. You can configure your code to immediately retry the request, and in most cases it will be successful. If a request is consistently failing, check your request to make sure that it is configured correctly. Or, if you are consistently getting ban messages from an anti-bot, then create a ticket with our support team and we will try to bypass this anti-bot for you.
If you receive a successful 200
status code response from the API but the response contains a CAPTCHA, please contact our support team and we will add it to our CAPTCHA detection database. Once included in our CAPTCHA database the API will treat it as a ban in the future and automatically retry the request.
Below are the possible status codes you will receive:
| Status Code | Details |
| --- | --- |
| 200 | Successful response |
| 404 | Page requested does not exist. |
| 410 | Page requested is no longer available. |
| 500 | After retrying for 60 seconds, the API was unable to receive a successful response. |
| 429 | You are sending requests too fast, and exceeding your concurrency limit. |
| 403 | You have used up all your API credits. |
Credits and Requests
Every time you make a request, you consume credits. The credits you have available are defined by the plan you are currently using. The amount of credits you consume depends on the domain and parameters you add to your request.
Domains
We have created custom scrapers to target these sites. If you scrape one of these domains, that scraper is activated and the credit cost changes.
| Category | Domain examples | Credit Cost |
| --- | --- | --- |
| SERP | Google, Bing | 25 |
| E-Commerce | Amazon, Allegro | 5 |
| Protected Domains | LinkedIn, G2, Social Media Sites | 30 |
Geotargeting is included in these credit costs.
If your domain does not fit in any category in the above list, it will cost 1 credit if no additional parameters are added.
Parameters
According to your needs, you may want to access different features on our platform.
- premium=true – requests cost 10 credits
- render=true – requests cost 10 credits
- premium=true + render=true – requests cost 25 credits
- ultra_premium=true – requests cost 30 credits
- ultra_premium=true + render=true – requests cost 75 credits

In any request, with or without these parameters, we will only charge for successful requests (200 and 404 status codes). If you run out of credits sooner than planned, you can renew your subscription early as explained in the next section – Dashboard.
Dashboard
The Dashboard shows you all the information so you can keep track of your usage. It is divided into Usage, Billing, Documentation, and links to contact us.
Usage
Here you will find your API key, sample codes, monitoring tools, and usage statistics such as credits used, concurrency, failed requests, your current plan, and the end date of your billing cycle.
Billing
In this section, you are able to see your current plan, the end date of the current billing cycle, as well as subscribe to another plan. This is also the right place to set or update your billing details, payment method, view your invoices, manage your subscription, or cancel it.
Here you can also renew your subscription early, in case you run out of API credits sooner than planned. This option will let you renew your subscription at an earlier date than planned, charging you for the subscription price and resetting your credits. If you find yourself using this option regularly, please reach out to us so we can find a plan that can accommodate the number of credits you need.
Documentation
This will direct you to our documentation – exactly where you are right now.
Contact Support
Direct link to reach out to our technical support team.
Contact Sales
Direct link to our sales team, for any subscription or billing-related inquiries.
Customize API Functionality
ScraperAPI enables you to customize the API’s functionality by adding additional parameters to your requests. The API will accept the following parameters:
| Parameter | Description |
| --- | --- |
| render | Activate JavaScript rendering by setting render=true in your request. The API will automatically render the JavaScript on the page and return the HTML response after the JavaScript has been rendered. Requests using this parameter cost 10 API credits, or 75 if used in combination with ultra_premium. |
| country_code | Activate country geotargeting by setting country_code=us to use US proxies, for example. This parameter does not increase the cost of the API request. |
| premium | Activate premium residential and mobile IPs by setting premium=true. Using premium proxies costs 10 API credits, or 25 API credits if used in combination with JavaScript rendering. |
| session_number | Reuse the same proxy by setting session_number=123, for example. This parameter does not increase the cost of the API request. |
| keep_headers | Use your own custom headers by setting keep_headers=true along with sending your own headers to the API. This parameter does not increase the cost of the API request. |
| device_type | Set your requests to use mobile or desktop user agents by setting device_type=desktop or device_type=mobile. This parameter does not increase the cost of the API request. |
| autoparse | Activate auto parsing for select websites by setting autoparse=true. The API will parse the data on the page and return it in JSON format. This parameter does not increase the cost of the API request. |
| ultra_premium | Activate our advanced bypass mechanisms by setting ultra_premium=true. Requests using this parameter cost 30 API credits, or 75 if used in combination with JavaScript rendering. |
Rendering Javascript
If you are crawling a page that requires you to render the javascript on the page to scrape the data you need, then we can fetch these pages using a headless browser. To render javascript, simply set render=true
and we will use a headless Google Chrome instance to fetch the page. This feature is only available on the Business and Enterprise plans.
import requests
payload = {'api_key': 'APIKEY', 'url':'https://httpbin.org/ip', 'render': 'true'}
r = requests.get('http://api.scraperapi.com', params=payload)
print(r.text)
# Scrapy users can simply replace the urls in their start_urls and parse function
# ...other scrapy setup code
start_urls = ['http://api.scraperapi.com?api_key=APIKEY&url=' + url + '&render=true']
def parse(self, response):
# ...your parsing logic here
yield scrapy.Request('http://api.scraperapi.com/?api_key=APIKEY&url=' + url + '&render=true', self.parse)
Custom Headers
If you would like to use your own custom headers (user agents, cookies, etc.) when making a request to the website, simply set keep_headers=true
and send the API the headers you want to use. The API will then use these headers when sending requests to the website.
Note: Only use this feature if you need to send custom headers to retrieve specific results from the website. Within the API we have a sophisticated header management system designed to increase success rates and performance on difficult sites. When you send your own custom headers you override our header system, which oftentimes lowers your success rates. Unless you absolutely need to send custom headers to get the data you need, we advise that you don’t use this functionality.
If you need to get results for mobile devices, use the device_type
parameter to set the user agent header for your request, instead of setting your own.
import requests
url = 'http://httpbin.org/anything'
headers = {
'Accept': 'application/json',
'X-MyHeader': '123',
}
payload = {'api_key': 'APIKEY', 'url': 'http://httpbin.org/anything', 'keep_headers': 'true'}
r = requests.get('http://api.scraperapi.com', params=payload, headers=headers)
print(r.text)
# Scrapy users can simply replace the urls in their start_urls and parse function
# ...other scrapy setup code
headers = {
'Accept': 'application/json',
'X-MyHeader': '123',
}
start_urls = ['http://api.scraperapi.com?api_key=APIKEY&url=' + url + '&keep_headers=true']
def parse(self, response):
# ...your parsing logic here
scraper_url = 'http://api.scraperapi.com/?api_key=APIKEY&url=' + url + '&keep_headers=true'
yield scrapy.Request(scraper_url, self.parse, headers=headers)
Sessions
To reuse the same proxy for multiple requests, simply use the session_number
parameter by setting it equal to a unique integer for every session you want to maintain (e.g. session_number=123
). This will allow you to continue using the same proxy for each request with that session number. To create a new session simply set the session_number
parameter with a new integer to the API. The session value can be any integer. Sessions expire 15 minutes after the last usage.
import requests
payload = {'api_key': 'APIKEY', 'url':'https://httpbin.org/ip', 'session_number': '123'}
r = requests.get('http://api.scraperapi.com', params=payload)
print(r.text)
# Scrapy users can simply replace the urls in their start_urls and parse function
# ...other scrapy setup code
start_urls = ['http://api.scraperapi.com?api_key=APIKEY&url=' + url + '&session_number=123']
Geographic Location
Certain websites (typically e-commerce stores and search engines) display different data to different users based on the geolocation of the IP used to make the request to the website. In cases like these, you can use the API’s geotargeting functionality to easily use proxies from the required country to retrieve the correct data from the website.
To control the geolocation of the IP used to make the request, simply set the country_code parameter to the country you want the proxy to be from and the API will automatically use the correct IP for that request.
For example: to ensure your requests come from the United States, set the country_code parameter to country_code=us.
Business and Enterprise Plan users can geotarget their requests to the following 13 regions (Hobby and Startup Plans can only use US and EU geotargeting) by using the country_code
in their request:
| Country Code | Region | Plans |
| --- | --- | --- |
| us | United States | Hobby Plan and higher. |
| eu | Europe | Hobby Plan and higher. |
| ca | Canada | Business Plan and higher. |
| uk | United Kingdom | Business Plan and higher. |
| de | Germany | Business Plan and higher. |
| fr | France | Business Plan and higher. |
| es | Spain | Business Plan and higher. |
| br | Brazil | Business Plan and higher. |
| mx | Mexico | Business Plan and higher. |
| in | India | Business Plan and higher. |
| jp | Japan | Business Plan and higher. |
| cn | China | Business Plan and higher. |
| au | Australia | Business Plan and higher. |
Geotargeting “eu” will use IPs from any European country.
Other countries are available to Enterprise customers upon request.
import requests
payload = {'api_key': 'APIKEY', 'url':'https://httpbin.org/ip', 'country_code': 'us'}
r = requests.get('http://api.scraperapi.com', params=payload)
print(r.text)
# Scrapy users can simply replace the urls in their start_urls and parse function
# ...other scrapy setup code
start_urls = ['http://api.scraperapi.com?api_key=APIKEY&url=' + url + '&country_code=us']
def parse(self, response):
# ...your parsing logic here
yield scrapy.Request('http://api.scraperapi.com/?api_key=APIKEY&url=' + url + '&country_code=us', self.parse)
Premium Residential/Mobile Proxy Pools
Our standard proxy pools include millions of proxies from over a dozen ISPs, and should be sufficient for the vast majority of scraping jobs. However, for a few particularly difficult to scrape sites, we also maintain a private internal pool of residential and mobile IPs. This pool is only available to users on the Business plan or higher.
Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request will count as 10 API credits against your monthly limit). Each request that uses both javascript rendering and our premium proxy pools will be charged at 25 times the normal rate (every successful request will count as 25 API credits against your monthly limit). To send a request through our premium proxy pool, please set the premium query parameter to premium=true.
We also have a higher premium level that you can use for really tough targets, such as LinkedIn. You can access these pools by adding the ultra_premium=true query parameter. These requests will use 30 API credits against your monthly limit, or 75 if used together with rendering. Please note, this is only available on our paid plans.
import requests
payload = {'api_key': 'APIKEY', 'url':'https://httpbin.org/ip', 'premium': 'true'}
r = requests.get('http://api.scraperapi.com', params=payload)
print(r.text)
# Scrapy users can simply replace the urls in their start_urls and parse function
# ...other scrapy setup code
start_urls = ['http://api.scraperapi.com?api_key=APIKEY&url=' + url + '&premium=true']
def parse(self, response):
# ...your parsing logic here
yield scrapy.Request('http://api.scraperapi.com/?api_key=APIKEY&url=' + url + '&premium=true', self.parse)
Device Type
If your use case requires you to exclusively use either desktop or mobile user agents in the headers it sends to the website, then you can use the device_type parameter.
- Set device_type=desktop to have the API use a desktop (e.g. Windows, macOS, or Linux) user agent. Note: This is the default behavior, so not setting the parameter has the same effect.
- Set device_type=mobile to have the API use a mobile (e.g. iPhone or Android) user agent.
Note: The device type you set will be overridden if you use keep_headers=true and send your own user agent in the request headers.
import requests
payload = {'api_key': 'APIKEY', 'url':'https://httpbin.org/ip', 'device_type': 'mobile'}
r = requests.get('http://api.scraperapi.com', params=payload)
print(r.text)
# Scrapy users can simply replace the urls in their start_urls and parse function
# ...other scrapy setup code
start_urls = ['http://api.scraperapi.com?api_key=APIKEY&url=' + url + '&device_type=mobile']
def parse(self, response):
# ...your parsing logic here
yield scrapy.Request('http://api.scraperapi.com/?api_key=APIKEY&url=' + url + '&device_type=mobile', self.parse)
Autoparse
For select websites the API will parse all the valuable data in the HTML response and return it in JSON format. To enable this feature, simply add autoparse=true
to your request and the API will parse the data for you. Currently, this feature works with Amazon, Google Search, and Google Shopping.
import requests
payload = {'api_key': 'APIKEY', 'url':'https://www.amazon.com/dp/B07V1PHM66', 'autoparse': 'true'}
r = requests.get('http://api.scraperapi.com', params=payload)
print(r.text)
# Scrapy users can simply replace the urls in their start_urls and parse function
# ...other scrapy setup code
start_urls = ['http://api.scraperapi.com?api_key=APIKEY&url=' + url + '&autoparse=true']
def parse(self, response):
# ...your parsing logic here
yield scrapy.Request('http://api.scraperapi.com/?api_key=APIKEY&url=' + url + '&autoparse=true', self.parse)
Wait for Element when rendering
If a rendered request is a bit slow and the page appears to stabilise before all of the content you need has loaded, it can fool the API into thinking the page has finished rendering.
To cope with this, you can tell the API to wait for a DOM element (selector) to appear on the page when rendering. Just send the wait_for_selector= parameter, passing a URL-encoded jQuery-style selector. The API will then wait for this element to appear on the page before returning results.
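A sketch in Python; when the selector is passed through the params dict, requests URL-encodes it for you (the target URL and selector mirror the example used elsewhere in this section and are illustrative only):

import requests

payload = {
    'api_key': 'APIKEY',
    'url': 'https://www.snackmagic.com/',
    'render': 'true',                 # waiting for a selector applies when rendering
    'wait_for_selector': 'footer'     # requests URL-encodes this value automatically
}
r = requests.get('http://api.scraperapi.com', params=payload)
print(r.status_code)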
Account Information
If you would like to monitor your account usage and limits programmatically (how many concurrent requests you’re using, how many requests you’ve made, etc.) you may use the /account endpoint, which returns JSON.
Note: the requestCount
and failedRequestCount
numbers only refresh once every 15 seconds, while the concurrentRequests
number is available in real time.
import requests
payload = {'api_key': 'APIKEY'}
r = requests.get('http://api.scraperapi.com/account', params=payload)
print(r.text)
Dashboard
The Dashboard shows you all the information so you can keep track of your usage. It is divided into Usage, Billing, Documentation, and links to contact us.
Usage
Here you will find your API key, sample codes, monitoring tools, and usage statistics such as credits used, concurrency, failed requests, your current plan, and the end date of your billing cycle.
Billing
In this section you are able to see your current plan, the end date of the current billing cycle, as well as subscribe to another plan. This is also the right place to set or update your billing details, payment method, view your invoices, manage your subscription, or cancel it.
Here you can also renew your subscription early, in case you run out of API credits sooner than planned. This option will let you renew your subscription at an earlier date than planned, charging you for the subscription price and resetting your credits. If you find yourself using this option regularly, please reach out to us so we can find a plan that can accommodate the number of credits you need.
Documentation
This will direct you to our documentation – exactly where you are right now.
Contact Support
Direct link to reach out to our technical support team.
Contact Sales
Direct link to our sales team, for any subscription or billing-related inquiries.
Getting Started
Using ScraperAPI is easy. Just send the URL you would like to scrape to the API along with your API key and the API will return the HTML response from the URL you want to scrape.
API Key & Authentication
ScraperAPI uses API keys to authenticate requests. To use the API you need to sign up for an account and include your unique API key in every request.
If you haven’t signed up for an account yet, then sign up for a free trial here with 5,000 free API credits.
Making Requests
You can use the API to scrape web pages, API endpoints, images, documents, PDFs, or other files just as you would any other URL. Note: there is a 2MB limit per request.
There are four ways in which you can send GET requests to ScraperAPI:
- Via our Async Scraper service
http://async.scraperapi.com
- Via our API endpoint
http://api.scraperapi.com?
- Via our NodeJS SDK
- Via our proxy port
http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001
Important note: regardless of how you invoke the service, we highly recommend you set a 60 seconds timeout in your application to get the best possible success rates, especially for some hard-to-scrape domains.
Async requests Method #1
To ensure a higher level of successful requests when using our scraper, we’ve built a new product, Async Scraper. Rather than making a request to our endpoint and waiting for the response, this endpoint lets you submit a scraping job, from which you can later collect the data using our status endpoint.
Scraping websites can be a difficult process; it takes numerous steps and significant effort to get through some sites’ protection, which can be hard to manage within the timeout constraints of synchronous APIs. The Async Scraper will work on your requested URLs until we have achieved a 100% success rate (when applicable), returning the data to you.
Async Scraping is the recommended way to scrape pages when success rate on difficult sites is more important to you than response time (e.g. you need a set of data periodically).
How to use
The async scraper endpoint is available at https://async.scraperapi.com
and it exposes a few useful APIs.
Submit an async job
A simple example showing how to submit a job for scraping and receive a status endpoint URL through which you can poll for the status (and later the result) of your scraping job:
const axios = require('axios');
(async() => {
const { data } = await axios({
data: {
apiKey: 'xxxxxx',
url: 'https://example.com'
},
headers: { 'Content-Type': 'application/json' },
method: 'POST',
url: 'https://async.scraperapi.com/jobs'
});
console.log(data);
})();
Response:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}
Note the statusUrl
field in the response. That is a personal URL to retrieve the status and results of your scraping job. Invoking that endpoint provides you with the status first:
const axios = require('axios');
(async() => {
const { data } = await axios.get('https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953');
console.log(data);
})();
Response:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}
Once your job is finished, the response will change and will contain the results of your scraping job:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"finished",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com",
"response":{
"headers":{
"date":"Thu, 14 Apr 2022 11:10:44 GMT",
"content-type":"text/html; charset=utf-8",
"content-length":"1256",
"connection":"close",
"x-powered-by":"Express",
"access-control-allow-origin":"undefined","access-control-allow-headers":"Origin, X-Requested-With, Content-Type, Accept",
"access-control-allow-methods":"HEAD,GET,POST,DELETE,OPTIONS,PUT",
"access-control-allow-credentials":"true",
"x-robots-tag":"none",
"sa-final-url":"https://example.com/",
"sa-statuscode":"200",
"etag":"W/\"4e8-Sjzo7hHgkd15I/TYxuW15B7HwEc\"",
"vary":"Accept-Encoding"
},
"body":"<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>\n\n <meta charset=\"utf-8\" />\n <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n <style type=\"text/css\">\n body {\n background-color: #f0f0f2;\n margin: 0;\n padding: 0;\n font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n \n }\n div {\n width: 600px;\n margin: 5em auto;\n padding: 2em;\n background-color: #fdfdff;\n border-radius: 0.5em;\n box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n }\n a:link, a:visited {\n color: #38488f;\n text-decoration: none;\n }\n @media (max-width: 700px) {\n div {\n margin: 0 auto;\n width: auto;\n }\n }\n </style> \n</head>\n\n<body>\n<div>\n <h1>Example Domain</h1>\n <p>This domain is for use in illustrative examples in documents. You may use this\n domain in literature without prior coordination or asking for permission.</p>\n <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
"statusCode":200
}
}
Callbacks
Using a status URL is a great way to test the API or get started quickly, but some customer environments may require some more robust solutions, so we implemented callbacks. Currently only webhook
callbacks are supported, but we are planning to introduce more over time (e.g. direct database callbacks, AWS S3, etc.).
An example of using a webhook callback:
const axios = require('axios');
(async() => {
const { data } = await axios({
data: {
apiKey: 'xxxxxx',
callback: {
type: 'webhook',
url: 'https://yourcompany.com/scraperapi'
},
url: 'https://example.com'
},
headers: { 'Content-Type': 'application/json' },
method: 'POST',
url: 'https://async.scraperapi.com/jobs'
});
console.log(data);
})();
Using a callback you don’t need to use the status URL (although you still can) to fetch the status and results of the job. Once the job is finished the provided webhook URL will be invoked by our system with the same content as the status URL provides.
Just replace the https://yourcompany.com/scraperapi
URL with your preferred endpoint. You can even add basic auth to the URL in the following format: https://user:pass@yourcompany.com/scraperapi
Note: The system will try to invoke the webhook URL 3 times, then it cancels the job. So please make sure that the webhook URL is available through the public internet and will be capable of handling the traffic that you need.
Hint: Webhook.site is a free online service to test webhooks without requiring you to build a complex infrastructure.
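If you would rather prototype a receiver of your own, the sketch below shows what a minimal endpoint could look like. It assumes Node with the Express package installed (npm install express); the /scraperapi path and port 3000 are placeholders, not part of the ScraperAPI service.
const express = require('express');
const app = express();
// job results include the full scraped body, so allow a generous JSON payload
app.use(express.json({ limit: '10mb' }));
app.post('/scraperapi', (req, res) => {
  const job = req.body; // same JSON shape as the status URL returns
  console.log(job.id, job.status);
  res.sendStatus(200); // acknowledge promptly so the callback is not retried
});
app.listen(3000);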
API parameters
You can use the usual API parameters the same way you’d use them with our synchronous API. These parameters should go into an apiParams
object inside the POST data, e.g.:
{
"apiKey": "xxxxxx",
"apiParams": {
"autoparse": false, // boolean
"binary_target": false, // boolean
"country_code": "us", // string, see: https://api.scraperapi.com/geo
"device_type": "desktop", // desktop | mobile
"follow_redirect": false, // boolean
"premium": true, // boolean
"render": false, // boolean
"retry_404": false // boolean
},
"url": "https://example.com"
}
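As an illustration, here is the earlier submission example again with an apiParams object included, requesting a rendered, US-geotargeted page. This is only a sketch; adjust the parameters to your own needs.
const axios = require('axios');
(async() => {
  const { data } = await axios({
    data: {
      apiKey: 'xxxxxx',
      apiParams: {
        render: true,       // JavaScript rendering, billed as with the synchronous API
        country_code: 'us'  // see https://api.scraperapi.com/geo for valid codes
      },
      url: 'https://example.com'
    },
    headers: { 'Content-Type': 'application/json' },
    method: 'POST',
    url: 'https://async.scraperapi.com/jobs'
  });
  console.log(data.statusUrl); // poll this URL for the status and result
})();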
Requests to the API Endpoint Method #2
You should format your requests to the API endpoint as follows:
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip"
Sample Code
const request = require('request-promise');
request(`http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip`)
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
To enable other API functionality when sending a request to the API endpoint simply add the appropriate query parameters to the end of the ScraperAPI URL.
For example, if you want to enable Javascript rendering with a request, then add render=true
to the request:
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true"
Sample Code
const request = require('request-promise');
request(`http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&render=true`)
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
To use two or more parameters, simply separate them with the “&” sign.
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true&country_code=us"
Sample Code
const request = require('request-promise');
request(`http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&render=true&country_code=us`)
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
Requests via ScraperAPI SDK Method #3
To make it easier to get up and running, you can also use our NodeJS SDK. First, you will need to install the SDK:
npm install scraperapi-sdk --save
Then integrate it into your code:
Sample Code
// remember to install the library: npm install scraperapi-sdk --save
const scraperapiClient = require('scraperapi-sdk')('APIKEY')
scraperapiClient.get('http://httpbin.org/ip')
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
To enable extra functionality while using our SDK, simply add an additional parameter to the GET request:
// remember to install the library: npm install scraperapi-sdk --save
const scraperapiClient = require('scraperapi-sdk')('APIKEY')
scraperapiClient.get('http://httpbin.org/ip', {render: true})
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
To use two or more parameters, simply add them to the GET request:
// remember to install the library: npm install scraperapi-sdk --save
const scraperapiClient = require('scraperapi-sdk')('APIKEY')
const params = {
render: true,
country_code: 'us'
}
scraperapiClient.get('http://httpbin.org/ip', params)
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
Send Requests to the Proxy Port Method #4
To simplify implementation for users with existing proxy pools, we offer a proxy front-end to the API. The proxy will take your requests and pass them through to the API which will take care of proxy rotation, captchas, and retries.
The proxy mode is a light frontend for the API and has all the same functionality and performance as sending requests to the API endpoint.
The username
for the proxy is scraperapi and the password
is your API key.
"http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001"
You can use the ScraperAPI proxy the same way as you would use any other proxy:
const request = require('request-promise');
options = {
method: 'GET',
url: 'http://httpbin.org/ip',
proxy:`http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001`
}
request(options)
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
Note: So that we can properly direct your requests through the API, your code must be configured to not verify SSL certificates.
To enable extra functionality whilst using the API in proxy mode, you can pass parameters to the API by adding them to the username, separated by periods.
For example, if you want to enable Javascript rendering with a request, the username would be scraperapi.render=true
"http://scraperapi.render=true:APIKEY@proxy-server.scraperapi.com:8001"
Multiple parameters can be included by separating them with periods; for example:
"http://scraperapi.render=true.country_code=us:APIKEY@proxy-server.scraperapi.com:8001"
Making POST/PUT Requests
Some advanced users may want to send POST/PUT requests in order to scrape forms and API endpoints directly. You can do this by sending a POST/PUT request through ScraperAPI. The return value will be stringified. So if you want to use it as JSON, you will need to parse it into a JSON object.
const request = require('request-promise');
// Replace POST with PUT to send a PUT request instead
options = {
method: 'POST',
url: 'http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/post',
body: JSON.stringify({foo: 'bar'}),
headers: {
'Content-Type': 'application/json',
},
}
request(options)
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
//For form data
options = {
method: 'POST',
url: 'http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/post',
form: {foo: 'bar'},
headers: {
'Content-Type': 'application/x-www-form-urlencoded',
},
}
request(options)
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
API Status Codes
The API will return a specific status code after every request depending on whether the request was successful, failed, or encountered some other error. ScraperAPI will retry failed requests for up to 60 seconds to try and get a successful response from the target URL before responding with a 500
error indicating a failed request.
Note: To avoid timing out your request before the API has had a chance to complete all retries, remember to set your timeout to 60 seconds.
In cases where a request fails after 60 seconds of retrying, you will not be charged for the unsuccessful request (you are only charged for successful requests, 200
and 404
status codes).
Errors will inevitably occur, so make sure you catch these errors on your end. You can configure your code to immediately retry the request and in most cases, it will be successful. If a request is consistently failing, check your request to make sure that it is configured correctly. Or, if you are consistently getting ban messages from an anti-bot, then create a ticket with our support team and we will try to bypass this anti-bot for you.
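As an illustration, a minimal retry wrapper around the request-promise examples used elsewhere in this section might look like the sketch below; the three-attempt limit and 60-second timeout are just sensible defaults, not part of the API.
const request = require('request-promise');
async function scrapeWithRetry(url, attempts = 3) {
  for (let i = 1; i <= attempts; i++) {
    try {
      // 60 second timeout so the API has time to complete its own retries
      return await request({ url, timeout: 60000 });
    } catch (error) {
      if (i === attempts) throw error; // give up after the last attempt
    }
  }
}
scrapeWithRetry('http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip')
  .then(response => console.log(response))
  .catch(error => console.log(error));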
If you receive a successful 200
status code response from the API but the response contains a CAPTCHA, please contact our support team and we will add it to our CAPTCHA detection database. Once included in our CAPTCHA database the API will treat it as a ban in the future and automatically retry the request.
Below are the possible status codes you will receive:
Status Code | Details |
---|---|
200 |
Successful response |
404 |
Page requested does not exist. |
410 |
Page requested is no longer available. |
500 |
After retrying for 60 seconds, the API was unable to receive a successful response. |
429 |
You are sending requests too fast, and exceeding your concurrency limit. |
403 |
You have used up all your API credits. |
Credits and Requests
Every time you make a request, you consume credits. The credits you have available are defined by the plan you are currently using. The amount of credits you consume depends on the domain and parameters you add to your request.
Domains
We have created custom scrapers to target these sites. If you scrape one of these domains, you will activate the custom scraper and the credit cost will change.
Category | Domain examples | Credit Cost |
---|---|---|
SERP | Google, Bing | 25 |
E-Commerce | Amazon, Allegro | 5 |
Protected Domains | Linkedin, G2, Social Media Sites | 30 |
Geotargeting is included in these credit costs.
If your domain does not fit in any category in the above list, it will cost 1
credit if no additional parameters are added.
Parameters
According to your needs, you may want to access different features on our platform.
premium=true
– requests cost 10 credits
render=true
– requests cost 10 credits
premium=true
+ render=true
– requests cost 25 credits
ultra_premium=true
– requests cost 30 credits
ultra_premium=true
+ render=true
– requests cost 75 credits
In any requests, with or without these parameters, we will only charge for successful requests (200
and 404
status codes). If you run out of credits sooner than planned, you can renew your subscription early as explained in the next section – Dashboard.
Dashboard
The Dashboard shows you all the information so you can keep track of your usage. It is divided into Usage, Billing, Documentation, and links to contact us.
Usage
Here you will find your API key, sample codes, monitoring tools, and usage statistics such as credits used, concurrency, failed requests, your current plan, and the end date of your billing cycle.
Billing
In this section, you are able to see your current plan, the end date of the current billing cycle, as well as subscribe to another plan. This is also the right place to set or update your billing details, payment method, view your invoices, manage your subscription, or cancel it.
Here you can also renew your subscription early, in case you run out of API credits sooner than planned. Renewing early charges you the subscription price immediately and resets your credits. If you find yourself using this option regularly, please reach out to us so we can find a plan that accommodates the number of credits you need.
Documentation
This will direct you to our documentation – exactly where you are right now.
Contact Support
Direct link to reach out to our technical support team.
Contact Sales
Direct link to our sales team, for any subscription or billing-related inquiries.
Customize API Functionality
ScraperAPI enables you to customize the API’s functionality by adding additional parameters to your requests. The API will accept the following parameters:
Parameter | Description |
---|---|
render |
Activate javascript rendering by setting render=true in your request. The API will automatically render the javascript on the page and return the HTML response after the javascript has been rendered.
Requests using this parameter cost 10 API credits, or 75 if used in combination with ultra-premium |
country_code |
Activate country geotargeting by setting country_code=us to use US proxies for example.
This parameter does not increase the cost of the API request. |
premium |
Activate premium residential and mobile IPs by setting premium=true . Using premium proxies costs 10 API credits, or 25 API credits if used in combination with Javascript rendering render=true . |
session_number |
Reuse the same proxy by setting session_number=123 for example.
This parameter does not increase the cost of the API request. |
keep_headers |
Use your own custom headers by setting keep_headers=true along with sending your own headers to the API.
This parameter does not increase the cost of the API request. |
device_type |
Set your requests to use mobile or desktop user agents by setting device_type=desktop or device_type=mobile .
This parameter does not increase the cost of the API request. |
autoparse |
Activate auto parsing for select websites by setting autoparse=true . The API will parse the data on the page and return it in JSON format.
This parameter does not increase the cost of the API request. |
ultra_premium |
Activate our advanced bypass mechanisms by setting ultra_premium=true .
Requests using this parameter cost 30 API credits, or 75 if used in combination with javascript rendering. |
Rendering Javascript
If you are crawling a page that requires you to render the javascript on the page to scrape the data you need, then we can fetch these pages using a headless browser. To render javascript, simply set render=true
and we will use a headless Google Chrome instance to fetch the page. This feature is only available on the Business and Enterprise plans.
const request = require('request-promise');
request('http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&render=true')
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
Custom Headers
If you would like to use your own custom headers (user agents, cookies, etc.) when making a request to the website, simply set keep_headers=true
and send the API the headers you want to use. The API will then use these headers when sending requests to the website.
Note: Only use this feature if you need to send custom headers to retrieve specific results from the website. Within the API we have a sophisticated header management system designed to increase success rates and performance on difficult sites. When you send your own custom headers you override our header system, which oftentimes lowers your success rates. Unless you absolutely need to send custom headers to get the data you need, we advise that you don’t use this functionality.
If you need to get results for mobile devices, use the device_type
parameter to set the user-agent header for your request, instead of setting your own.
const request = require('request-promise');
options = {
method: 'GET',
url: 'http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/anything&keep_headers=true',
headers: {
Accept: 'application/json',
'X-MyHeader': '123',
},
}
request(options)
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
Sessions
To reuse the same proxy for multiple requests, simply use the session_number
parameter by setting it equal to a unique integer for every session you want to maintain (e.g. session_number=123
). This will allow you to continue using the same proxy for each request with that session number. To create a new session simply set the session_number
parameter with a new integer to the API. The session value can be any integer. Sessions expire 15 minutes after the last usage.
const request = require('request-promise');
request('http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&session_number=123')
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
Geographic Location
Certain websites (typically e-commerce stores and search engines) display different data to different users based on the geolocation of the IP used to make the request to the website. In cases like these, you can use the API’s geotargeting functionality to easily use proxies from the required country to retrieve the correct data from the website.
To control the geolocation of the IP used to make the request, simply set the country_code
parameter to the country you want the proxy to be from and the API will automatically use the correct IP for that request.
For example: to ensure your requests come from the United States, set the country_code
parameter to country_code=us
.
Business and Enterprise Plan users can geotarget their requests to the following 13 regions (Hobby and Startup Plans can only use US and EU geotargeting) by using the country_code
in their request:
Country Code | Region | Plans |
---|---|---|
us |
United States | Hobby Plan and higher. |
eu |
Europe | Hobby Plan and higher. |
ca |
Canada | Business Plan and higher. |
uk |
United Kingdom | Business Plan and higher. |
de |
Germany | Business Plan and higher. |
fr |
France | Business Plan and higher. |
es |
Spain | Business Plan and higher. |
br |
Brazil | Business Plan and higher. |
mx |
Mexico | Business Plan and higher. |
in |
India | Business Plan and higher. |
jp |
Japan | Business Plan and higher. |
cn |
China | Business Plan and higher. |
au |
Australia | Business Plan and higher. |
Geotargeting “eu” will use IPs from any European country.
Other countries are available to Enterprise customers upon request.
const request = require('request-promise');
request('http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&country_code=us')
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
Premium Residential/Mobile Proxy Pools
Our standard proxy pools include millions of proxies from over a dozen ISPs and should be sufficient for the vast majority of scraping jobs. However, for a few particularly difficult to scrape sites, we also maintain a private internal pool of residential and mobile IPs. This pool is only available to users on the Business plan or higher.
Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request will count as 10 API credits against your monthly limit). Each request that uses both javascript rendering and our premium proxy pools will be charged at 25 times the normal rate (every successful request will count as 25 API credits against your monthly limit). To send a request through our premium proxy pool, please set the premium
query parameter to premium=true
.
We also have a higher premium level that you can use for really tough targets, such as LinkedIn. You can access these pools by adding the ultra_premium=true
query parameter. These requests will use 30 API credits against your monthly limit, or 75 if used together with rendering. Please note, this is only available on our paid plans.
const request = require('request-promise');
request('http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&premium=true')
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
Device Type
If your use case requires you to exclusively use either desktop or mobile user agents in the headers it sends to the website then you can use the device_type
parameter.
- Set device_type=desktop to have the API set a desktop (e.g. macOS, Windows, or Linux) user agent. Note: This is the default behavior. Not setting the parameter will have the same effect.
- Set device_type=mobile to have the API set a mobile (e.g. iPhone or Android) user agent.
Note: The device type you set will be overridden if you use keep_headers=true
and send your own user agent in the requests header.
const request = require('request-promise');
request('http://api.scraperapi.com/?api_key=APIKEY&url=http://httpbin.org/ip&device_type=mobile')
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
Autoparse
For select websites, the API will parse all the valuable data in the HTML response and return it in JSON format. To enable this feature, simply add autoparse=true
to your request and the API will parse the data for you. Currently, this feature works with Amazon, Google Search, and Google Shopping.
const request = require('request-promise');
request('http://api.scraperapi.com/?api_key=APIKEY&url=https://www.amazon.com/dp/B07V1PHM66&autoparse=true')
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
Wait for Element when rendering
If a rendered page is slow to load and the DOM stabilises before the content you need has appeared, it can fool the API into thinking the page has finished rendering.
To cope with this, you can tell the API to wait for a DOM element (selector) to appear on the page when rendering. You just need to send the wait_for_selector=
parameter, passing a URL-encoded jQuery selector. The API will then wait for this element to appear on the page before returning results.
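For example, a rendered request that waits for a purely illustrative h1 selector could look like the following sketch; encodeURIComponent takes care of the URL encoding:
const request = require('request-promise');
const selector = encodeURIComponent('h1'); // illustrative selector; use one that exists on your target page
request(`http://api.scraperapi.com/?api_key=APIKEY&url=https://example.com&render=true&wait_for_selector=${selector}`)
  .then(response => {
    console.log(response)
  })
  .catch(error => {
    console.log(error)
  })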
Account Information
If you would like to monitor your account usage and limits programmatically (how many concurrent requests you’re using, how many requests you’ve made, etc.) you may use the /account endpoint, which returns JSON.
Note: the requestCount
and failedRequestCount
numbers only refresh once every 15 seconds, while the concurrentRequests
number is available in real time.
const request = require('request-promise');
request('http://api.scraperapi.com/account?api_key=APIKEY')
.then(response => {
console.log(response)
})
.catch(error => {
console.log(error)
})
Getting Started
Using ScraperAPI is easy. Just send the URL you would like to scrape to the API along with your API key and the API will return the HTML response from the URL you want to scrape.
API Key & Authentication
ScraperAPI uses API keys to authenticate requests. To use the API you need to sign up for an account and include your unique API key in every request.
If you haven’t signed up for an account yet then
sign up for a free trial here with 5,000 free API credits
Making Requests
You can use the API to scrape web pages, API endpoints, images, documents, PDFs, or other files just as you would any other URL. Note: there is a 2MB limit per request.
There are four ways in which you can send GET requests to ScraperAPI:
- Via our Async Scraper service
http://async.scraperapi.com
- Via our API endpoint
http://api.scraperapi.com?
- Via our PHP SDK
- Via our proxy port
http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001
Important note: regardless of how you invoke the service, we highly recommend you set a 60 seconds timeout in your application to get the best possible success rates, especially for some hard-to-scrape domains.
Async requests Method #1
To ensure a higher level of successful requests when using our scraper, we’ve built a new product, Async Scraper. Rather than making a request to our endpoint and waiting for the response, this endpoint lets you submit a scraping job, from which you can later collect the data using our status endpoint.
Scraping websites can be a difficult process; it takes numerous steps and significant effort to get through some sites’ protection, which can be hard to manage within the timeout constraints of synchronous APIs. The Async Scraper will work on your requested URLs until we have achieved a 100% success rate (when applicable), returning the data to you.
Async Scraping is the recommended way to scrape pages when success rate on difficult sites is more important to you than response time (e.g. you need a set of data periodically).
How to use
The async scraper endpoint is available at https://async.scraperapi.com
and it exposes a few useful APIs.
Submit an async job
A simple example showing how to submit a job for scraping and receive a status endpoint URL through which you can poll for the status (and later the result) of your scraping job:
<?php
$payload = json_encode( array( "apiKey" => "xxxxxx", "url" => "https://example.com" ) );
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, "https://async.scraperapi.com/jobs" );
curl_setopt( $ch, CURLOPT_POST, 1 );
curl_setopt( $ch, CURLOPT_POSTFIELDS, $payload );
curl_setopt( $ch, CURLOPT_HTTPHEADER, array( "Content-Type:application/json" ) );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, TRUE );
$response = curl_exec( $ch );
curl_close( $ch );
print_r( $response );
Response:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}
Note the statusUrl
field in the response. That is a personal URL to retrieve the status and results of your scraping job. Invoking that endpoint provides you with the status first:
<?php
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, "https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953" );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, TRUE );
$response = curl_exec( $ch );
curl_close( $ch );
print_r( $response );
Response:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}
Once your job is finished, the response will change and will contain the results of your scraping job:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"finished",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com",
"response":{
"headers":{
"date":"Thu, 14 Apr 2022 11:10:44 GMT",
"content-type":"text/html; charset=utf-8",
"content-length":"1256",
"connection":"close",
"x-powered-by":"Express",
"access-control-allow-origin":"undefined","access-control-allow-headers":"Origin, X-Requested-With, Content-Type, Accept",
"access-control-allow-methods":"HEAD,GET,POST,DELETE,OPTIONS,PUT",
"access-control-allow-credentials":"true",
"x-robots-tag":"none",
"sa-final-url":"https://example.com/",
"sa-statuscode":"200",
"etag":"W/\"4e8-Sjzo7hHgkd15I/TYxuW15B7HwEc\"",
"vary":"Accept-Encoding"
},
"body":"<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>\n\n <meta charset=\"utf-8\" />\n <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n <style type=\"text/css\">\n body {\n background-color: #f0f0f2;\n margin: 0;\n padding: 0;\n font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n \n }\n div {\n width: 600px;\n margin: 5em auto;\n padding: 2em;\n background-color: #fdfdff;\n border-radius: 0.5em;\n box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n }\n a:link, a:visited {\n color: #38488f;\n text-decoration: none;\n }\n @media (max-width: 700px) {\n div {\n margin: 0 auto;\n width: auto;\n }\n }\n </style> \n</head>\n\n<body>\n<div>\n <h1>Example Domain</h1>\n <p>This domain is for use in illustrative examples in documents. You may use this\n domain in literature without prior coordination or asking for permission.</p>\n <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
"statusCode":200
}
}
Callbacks
Using a status URL is a great way to test the API or get started quickly, but some customer environments may require some more robust solutions, so we implemented callbacks. Currently only webhook
callbacks are supported, but we are planning to introduce more over time (e.g. direct database callbacks, AWS S3, etc.).
An example of using a webhook callback:
<?php
$payload = json_encode( array( "apiKey" => "xxxxxx", "callback" => array( "type" => "webhook", "url" => "https://yourcompany.com/scraperapi" ), "url" => "https://example.com" ) );
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, "https://async.scraperapi.com/jobs" );
curl_setopt( $ch, CURLOPT_POST, 1 );
curl_setopt( $ch, CURLOPT_POSTFIELDS, $payload );
curl_setopt( $ch, CURLOPT_HTTPHEADER, array( "Content-Type:application/json" ) );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, TRUE );
$response = curl_exec( $ch );
curl_close( $ch );
print_r( $response );
Using a callback you don’t need to use the status URL (although you still can) to fetch the status and results of the job. Once the job is finished the provided webhook URL will be invoked by our system with the same content as the status URL provides.
Just replace the https://yourcompany.com/scraperapi
URL with your preferred endpoint. You can even add basic auth to the URL in the following format: https://user:pass@yourcompany.com/scraperapi
Note: The system will try to invoke the webhook URL 3 times, then it cancels the job. So please make sure that the webhook URL is available through the public internet and will be capable of handling the traffic that you need.
Hint: Webhook.site is a free online service to test webhooks without requiring you to build a complex infrastructure.
API parameters
You can use the usual API parameters the same way you’d use them with our synchronous API. These parameters should go into an apiParams
object inside the POST data, e.g.:
{
"apiKey": "xxxxxx",
"apiParams": {
"autoparse": false, // boolean
"binary_target": false, // boolean
"country_code": "us", // string, see: https://api.scraperapi.com/geo
"device_type": "desktop", // desktop | mobile
"follow_redirect": false, // boolean
"premium": true, // boolean
"render": false, // boolean
"retry_404": false // boolean
},
"url": "https://example.com"
}
Requests to the API Endpoint Method #2
ScraperAPI exposes a single API endpoint for you to send GET requests. Simply send a GET request to http://api.scraperapi.com
with two query string parameters and the API will return the HTML response for that URL:
- api_key which contains your API key, and
- url which contains the URL you would like to scrape
You should format your requests to the API endpoint as follows:
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip"
Sample Code
<?php
$url = "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
To enable other API functionality when sending a request to the API endpoint simply add the appropriate query parameters to the end of the ScraperAPI URL.
For example, if you want to enable Javascript rendering with a request, then add render=true
to the request:
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true"
Sample Code
<?php
$url = "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
To use two or more parameters, simply separate them with the “&” sign.
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true&country_code=us"
Sample Code
<?php
$url = "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true&country_code=us";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
Requests via ScraperAPI SDK Method #3
To make it easier to get up and running, you can also use our PHP SDK. First, you will need to install the SDK:
composer require scraperapi/sdk
Then integrate it into your code:
Sample Code
<?php
// remember to install the library: composer require scraperapi/sdk
require __DIR__ . "/vendor/autoload.php";
$client = new ScraperAPI\Client("APIKEY");
$result = $client->get("http://httpbin.org/ip")->raw_body;
print($result);
To enable extra functionality while using our SDK, simply add an additional parameter to the GET request:
<?php
// remember to install the library: composer require scraperapi/sdk
require __DIR__ . "/vendor/autoload.php";
$client = new ScraperAPI\Client("APIKEY");
$result = $client->get("http://httpbin.org/ip", ["render" => true])->raw_body;
print($result);
To use two or more parameters, simply add them to the GET request:
<?php
// remember to install the library: composer require scraperapi/sdk
require __DIR__ . "/vendor/autoload.php";
$client = new ScraperAPI\Client("APIKEY");
$result = $client->get("http://httpbin.org/ip", ["render" => true, "country_code" => "us"])->raw_body;
print($result);
Send Requests to the Proxy Port Method #4
To simplify implementation for users with existing proxy pools, we offer a proxy front-end to the API. The proxy will take your requests and pass them through to the API which will take care of proxy rotation, captchas, and retries.
The proxy mode is a light frontend for the API and has all the same functionality and performance as sending requests to the API endpoint.
The username
for the proxy is scraperapi and the password
is your API key.
"http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001"
You can use the ScraperAPI proxy the same way as you would use any other proxy:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://httpbin.org/ip");
curl_setopt($ch, CURLOPT_PROXY, "http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
var_dump($response);
Note: So that we can properly direct your requests through the API, your code must be configured to not verify SSL certificates.
To enable extra functionality whilst using the API in proxy mode, you can pass parameters to the API by adding them to the username, separated by periods.
For example, if you want to enable Javascript rendering with a request, the username would be scraperapi.render=true
"http://scraperapi.render=true:APIKEY@proxy-server.scraperapi.com:8001"
Multiple parameters can be included by separating them with periods; for example:
"http://scraperapi.render=true.country_code=us:APIKEY@proxy-server.scraperapi.com:8001"
Making POST/PUT Requests
Some advanced users may want to send POST/PUT requests in order to scrape forms and API endpoints directly. You can do this by sending a POST/PUT request through ScraperAPI. The return value will be stringified. So if you want to use it as JSON, you will need to parse it into a JSON object.
<?php
$url = "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/anything";
# POST/PUT Requests
$postData = ["foo" => "bar"];
$postData = json_encode($postData);
$headers = [
"Content-Type: application/json"
];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData); //Post Fields
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
#Form POST Request
$postData = ["foo" => "bar"];
$postData = http_build_query($postData); // encode as form data rather than JSON
$headers = [
'Content-Type: application/x-www-form-urlencoded; charset=utf-8',
];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
API Status Codes
The API will return a specific status code after every request depending on whether the request was successful, failed, or encountered some other error. ScraperAPI will retry failed requests for up to 60 seconds to try and get a successful response from the target URL before responding with a 500
error indicating a failed request.
Note: To avoid timing out your request before the API has had a chance to complete all retries, remember to set your timeout to 60 seconds.
In cases where a request fails after 60 seconds of retrying, you will not be charged for the unsuccessful request (you are only charged for successful requests, 200
and 404
status codes).
Errors will inevitably occur, so make sure you catch these errors on your end. You can configure your code to immediately retry the request and in most cases, it will be successful. If a request is consistently failing, check your request to make sure that it is configured correctly. Or, if you are consistently getting ban messages from an anti-bot, then create a ticket with our support team and we will try to bypass this anti-bot for you.
If you receive a successful 200
status code response from the API but the response contains a CAPTCHA, please contact our support team and we will add it to our CAPTCHA detection database. Once included in our CAPTCHA database, the API will treat it as a ban in the future and automatically retry the request.
Below are the possible status codes you will receive:
Status Code | Details |
---|---|
200 |
Successful response |
404 |
Page requested does not exist. |
410 |
Page requested is no longer available. |
500 |
After retrying for 60 seconds, the API was unable to receive a successful response. |
429 |
You are sending requests too fast, and exceeding your concurrency limit. |
403 |
You have used up all your API credits. |
Credits and Requests
Every time you make a request, you consume credits. The credits you have available are defined by the plan you are currently using. The amount of credits you consume depends on the domain and parameters you add to your request.
Domains
We have created custom scrapers to target these sites. If you scrape one of these domains, you will activate the custom scraper and the credit cost will change.
Category | Domain examples | Credit Cost |
---|---|---|
SERP | Google, Bing | 25 |
E-Commerce | Amazon, Allegro | 5 |
Protected Domains | Linkedin, G2, Social Media Sites | 30 |
Geotargeting is included in these credit costs.
If your domain does not fit in any category in the above list, it will cost 1
credit if no additional parameters are added.
Parameters
According to your needs, you may want to access different features on our platform.
premium=true
– requests cost 10 credits
render=true
– requests cost 10 credits
premium=true
+ render=true
– requests cost 25 credits
ultra_premium=true
– requests cost 30 credits
ultra_premium=true
+ render=true
– requests cost 75 credits
In any requests, with or without these parameters, we will only charge for successful requests (200
and 404
status codes). If you run out of credits sooner than planned, you can renew your subscription early as explained in the next section – Dashboard.
Dashboard
The Dashboard shows you all the information so you can keep track of your usage. It is divided into Usage, Billing, Documentation, and links to contact us.
Usage
Here you will find your API key, sample codes, monitoring tools, and usage statistics such as credits used, concurrency, failed requests, your current plan, and the end date of your billing cycle.
Billing
In this section, you are able to see your current plan, the end date of the current billing cycle, as well as subscribe to another plan. This is also the right place to set or update your billing details, payment method, view your invoices, manage your subscription, or cancel it.
Here you can also renew your subscription early, in case you run out of API credits sooner than planned. Renewing early charges you the subscription price immediately and resets your credits. If you find yourself using this option regularly, please reach out to us so we can find a plan that accommodates the number of credits you need.
Documentation
This will direct you to our documentation – exactly where you are right now.
Contact Support
Direct link to reach out to our technical support team.
Contact Sales
Direct link to our sales team, for any subscription or billing-related inquiries.
Customize API Functionality
ScraperAPI enables you to customize the API’s functionality by adding additional parameters to your requests. The API will accept the following parameters:
Parameter | Description |
---|---|
render |
Activate javascript rendering by setting render=true in your request. The API will automatically render the javascript on the page and return the HTML response after the javascript has been rendered.
Requests using this parameter cost 10 API credits, or 75 if used in combination with ultra-premium |
country_code |
Activate country geotargeting by setting country_code=us to use US proxies for example.
This parameter does not increase the cost of the API request. |
premium |
Activate premium residential and mobile IPs by setting premium=true . Using premium proxies costs 10 API credits, or 25 API credits if used in combination with Javascript rendering render=true . |
session_number |
Reuse the same proxy by setting session_number=123 for example.
This parameter does not increase the cost of the API request. |
keep_headers |
Use your own custom headers by setting keep_headers=true along with sending your own headers to the API.
This parameter does not increase the cost of the API request. |
device_type |
Set your requests to use mobile or desktop user agents by setting device_type=desktop or device_type=mobile .
This parameter does not increase the cost of the API request. |
autoparse |
Activate auto parsing for select websites by setting autoparse=true . The API will parse the data on the page and return it in JSON format.
This parameter does not increase the cost of the API request. |
ultra_premium |
Activate our advanced bypass mechanisms by setting ultra_premium=true .
Requests using this parameter cost 30 API credits, or 75 if used in combination with javascript rendering. |
Rendering Javascript
If you are crawling a page that requires you to render the javascript on the page to scrape the data you need, then we can fetch these pages using a headless browser. To render javascript, simply set render=true
and we will use a headless Google Chrome instance to fetch the page. This feature is only available on the Business and Enterprise plans.
<?php
$url = "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
Custom Headers
If you would like to use your own custom headers (user agents, cookies, etc.) when making a request to the website, simply set keep_headers=true
and send the API the headers you want to use. The API will then use these headers when sending requests to the website.
Note: Only use this feature if you need to send custom headers to retrieve specific results from the website. Within the API we have a sophisticated header management system designed to increase success rates and performance on difficult sites. When you send your own custom headers you override our header system, which oftentimes lowers your success rates. Unless you absolutely need to send custom headers to get the data you need, we advise that you don’t use this functionality.
If you need to get results for mobile devices, use the device_type
parameter to set the user-agent header for your request, instead of setting your own.
<?php
$url = "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&keep_headers=true";
$headerArray = array(
"Content-Type: application/json",
"X-MyHeader: 123"
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headerArray);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
Sessions
To reuse the same proxy for multiple requests, simply use the session_number
parameter by setting it equal to a unique integer for every session you want to maintain (e.g. session_number=123
). This will allow you to continue using the same proxy for each request with that session number. To create a new session simply set the session_number
parameter with a new integer to the API. The session value can be any integer. Sessions expire 15 minutes after the last usage.
<?php
$url = "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&session_number=123";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
Geographic Location
Certain websites (typically e-commerce stores and search engines) display different data to different users based on the geolocation of the IP used to make the request to the website. In cases like these, you can use the API’s geotargeting functionality to easily use proxies from the required country to retrieve the correct data from the website.
To control the geolocation of the IP used to make the request, simply set the country_code
parameter to the country you want the proxy to be from and the API will automatically use the correct IP for that request.
For example: to ensure your requests come from the United States, set the country_code
parameter to country_code=us
.
Business and Enterprise Plan users can geotarget their requests to the following 13 regions (Hobby and Startup Plans can only use US and EU geotargeting) by using the country_code
in their request:
Country Code | Region | Plans |
---|---|---|
us |
United States | Hobby Plan and higher. |
eu |
Europe | Hobby Plan and higher. |
ca |
Canada | Business Plan and higher. |
uk |
United Kingdom | Business Plan and higher. |
de |
Germany | Business Plan and higher. |
fr |
France | Business Plan and higher. |
es |
Spain | Business Plan and higher. |
br |
Brazil | Business Plan and higher. |
mx |
Mexico | Business Plan and higher. |
in |
India | Business Plan and higher. |
jp |
Japan | Business Plan and higher. |
cn |
China | Business Plan and higher. |
au |
Australia | Business Plan and higher. |
Geotargeting “eu” will use IPs from any European country.
Other countries are available to Enterprise customers upon request.
<?php
$url = "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&country_code=us";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
Premium Residential/Mobile Proxy Pools
Our standard proxy pools include millions of proxies from over a dozen ISPs and should be sufficient for the vast majority of scraping jobs. However, for a few particularly difficult to scrape sites, we also maintain a private internal pool of residential and mobile IPs. This pool is only available to users on the Business plan or higher.
Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request will count as 10 API credits against your monthly limit). Each request that uses both javascript rendering and our premium proxy pools will be charged at 25 times the normal rate (every successful request will count as 25 API credits against your monthly limit). To send a request through our premium proxy pool, please set the premium
query parameter to premium=true
.
We also have a higher premium level that you can use for really tough targets, such as LinkedIn. You can access these pools by adding the ultra_premium=true
query parameter. These requests will use 30 API credits against your monthly limit, or 75 if used together with rendering. Please note this is only available on our paid plans.
<?php
$url = "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&premium=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
Device Type
If your use case requires you to exclusively use either desktop or mobile user agents in the headers it sends to the website then you can use the device_type
parameter.
- Set device_type=desktop to have the API set a desktop (e.g. macOS, Windows, or Linux) user agent. Note: This is the default behavior. Not setting the parameter will have the same effect.
- Set device_type=mobile to have the API set a mobile (e.g. iPhone or Android) user agent.
Note: The device type you set will be overridden if you use keep_headers=true
and send your own user agent in the requests header.
<?php
$url = "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&device_type=mobile";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
Autoparse
For select websites, the API will parse all the valuable data in the HTML response and return it in JSON format. To enable this feature, simply add autoparse=true
to your request and the API will parse the data for you. Currently, this feature works with Amazon, Google Search, and Google Shopping.
<?php
$url = "http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&autoparse=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
Wait for Element when rendering
If a rendered page is slow to load and the DOM stabilises before the content you need has appeared, it can fool the API into thinking the page has finished rendering.
To cope with this, you can tell the API to wait for a DOM element (selector) to appear on the page when rendering. You just need to send the wait_for_selector=
parameter, passing a URL-encoded jQuery selector. The API will then wait for this element to appear on the page before returning results.
Account Information
If you would like to monitor your account usage and limits programmatically (how many concurrent requests you’re using, how many requests you’ve made, etc.) you may use the /account endpoint, which returns JSON.
Note: the requestCount
and failedRequestCount
numbers only refresh once every 15 seconds, while the concurrentRequests
number is available in real time.
<?php
$url = "http://api.scraperapi.com/account?api_key=APIKEY";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
Getting Started
Using ScraperAPI is easy. Just send the URL you would like to scrape to the API along with your API key and the API will return the HTML response from the URL you want to scrape.
API Key & Authentication
ScraperAPI uses API keys to authenticate requests. To use the API you need to sign up for an account and include your unique API key in every request.
If you haven’t signed up for an account yet then
sign up for a free trial here with 5,000 free API credits
Making Requests
You can use the API to scrape web pages, API endpoints, images, documents, PDFs or other files just as you would any other URL. Note: there is a 2MB limit per request.
There are four ways in which you can send GET requests to ScraperAPI:
- Via our Async Scraper service
http://async.scraperapi.com
- Via our API endpoint
http://api.scraperapi.com?
- Via our Ruby SDK
- Via our proxy port
http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001
Important note: regardless of how you invoke the service, we highly recommend you set a 60 seconds timeout in your application to get the best possible success rates, especially for some hard-to-scrape domains.
Async requests Method #1
To ensure a higher level of successful requests when using our scraper, we’ve built a new product, Async Scraper. Rather than making requests to our endpoint waiting for the response, this endpoint submits a job of scraping, in which you can later collect the data from using our status endpoint.
Scraping websites can be a difficult process; it takes numerous steps and significant effort to get through some sites’ protection which sometimes proves to be difficult with the timeout constraints of synchronous APIs. The Async Scraper will work on your requested URLs until we have achieved a 100% success rate (when applicable), returning the data to you.
Async Scraping is the recommended way to scrape pages when success rate on difficult sites is more important to you than response time (e.g. you need a set of data periodically).
How to use
The async scraper endpoint is available at https://async.scraperapi.com
and it exposes a few useful APIs.
Submit an async job
A simple example showing how to submit a job for scraping and receive a status endpoint URL through which you can poll for the status (and later the result) of your scraping job:
require 'net/http'
require 'json'
uri = URI('https://async.scraperapi.com/jobs')
website_content = Net::HTTP.post(uri, { "apiKey" => "xxxxxx", "url" => "https://example.com"}.to_json, "Content-Type" => "application/json")
print(website_content.body)
Response:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}
Note the statusUrl
field in the response. That is a personal URL to retrieve the status and results of your scraping job. Invoking that endpoint provides you with the status first:
require 'net/http'
require 'json'
uri = URI('https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953')
website_content = Net::HTTP.get(uri)
print(website_content)
Response:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}
Once your job is finished, the response will change and will contain the results of your scraping job:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"finished",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com",
"response":{
"headers":{
"date":"Thu, 14 Apr 2022 11:10:44 GMT",
"content-type":"text/html; charset=utf-8",
"content-length":"1256",
"connection":"close",
"x-powered-by":"Express",
"access-control-allow-origin":"undefined","access-control-allow-headers":"Origin, X-Requested-With, Content-Type, Accept",
"access-control-allow-methods":"HEAD,GET,POST,DELETE,OPTIONS,PUT",
"access-control-allow-credentials":"true",
"x-robots-tag":"none",
"sa-final-url":"https://example.com/",
"sa-statuscode":"200",
"etag":"W/\"4e8-Sjzo7hHgkd15I/TYxuW15B7HwEc\"",
"vary":"Accept-Encoding"
},
"body":"<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>\n\n <meta charset=\"utf-8\" />\n <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n <style type=\"text/css\">\n body {\n background-color: #f0f0f2;\n margin: 0;\n padding: 0;\n font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n \n }\n div {\n width: 600px;\n margin: 5em auto;\n padding: 2em;\n background-color: #fdfdff;\n border-radius: 0.5em;\n box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n }\n a:link, a:visited {\n color: #38488f;\n text-decoration: none;\n }\n @media (max-width: 700px) {\n div {\n margin: 0 auto;\n width: auto;\n }\n }\n </style> \n</head>\n\n<body>\n<div>\n <h1>Example Domain</h1>\n <p>This domain is for use in illustrative examples in documents. You may use this\n domain in literature without prior coordination or asking for permission.</p>\n <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
"statusCode":200
}
}
Callbacks
Using a status URL is a great way to test the API or get started quickly, but some customer environments require more robust solutions, so we implemented callbacks. Currently only webhook
callbacks are supported, but we are planning to introduce more over time (e.g. direct database callbacks, AWS S3, etc.).
An example of using a webhook callback:
require 'net/http'
require 'json'
uri = URI('https://async.scraperapi.com/jobs')
website_content = Net::HTTP.post(uri, {
"apiKey" => "xxxxxx",
"callback" => {
"type" => "webhook",
"url" => "https://yourcompany.com/scraperapi"
},
"url" => "https://example.com"
}.to_json, "Content-Type" => "application/json")
print(website_content.body)
Using a callback you don’t need to use the status URL (although you still can) to fetch the status and results of the job. Once the job is finished the provided webhook URL will be invoked by our system with the same content as the status URL provides.
Just replace the https://yourcompany.com/scraperapi
URL with your preferred endpoint. You can even add basic auth to the URL in the following format: https://user:pass@yourcompany.com/scraperapi
Note #1: The system will try to invoke the webhook URL 3 times, then it cancels the job. So please make sure that the webhook URL is available through the public internet and will be capable of handling the traffic that you need.
Note #2: If you set a webhook, the status endpoint of a job will only be available while the job is being processed.
Hint: Webhook.site is a free online service to test webhooks without requiring you to build a complex infrastructure.
API parameters
You can use the usual API parameters just the same way you'd use them with our synchronous API. These parameters should go into an apiParams
object inside the POST data, e.g.:
{
"apiKey": "xxxxxx",
"apiParams": {
"autoparse": false, // boolean
"binary_target": false, // boolean
"country_code": "us", // string, see: https://api.scraperapi.com/geo
"device_type": "desktop", // desktop | mobile
"follow_redirect": false, // boolean
"premium": true, // boolean
"render": false, // boolean
"retry_404": false // boolean
},
"url": "https://example.com"
}
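For example, a minimal Ruby sketch (following the same Net::HTTP pattern as the examples above; the parameter values are purely illustrative) that submits an async job with rendering and US geotargeting enabled:
require 'net/http'
require 'json'
uri = URI('https://async.scraperapi.com/jobs')
# The apiParams object carries the usual synchronous API parameters for this job
body = {
"apiKey" => "xxxxxx",
"apiParams" => { "render" => true, "country_code" => "us" },
"url" => "https://example.com"
}
website_content = Net::HTTP.post(uri, body.to_json, "Content-Type" => "application/json")
print(website_content.body)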
Requests to the API Endpoint Method #2
ScraperAPI exposes a single API endpoint for you to send GET requests. Simply send a GET request to http://api.scraperapi.com
with two query string parameters and the API will return the HTML response for that URL:
api_key
which contains your API key, andurl
which contains the url you would like to scrape
You should format your requests to the API endpoint as follows:
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip"
Sample Code
require 'net/http'
require 'json'
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/ip"
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
To enable other API functionality when sending a request to the API endpoint simply add the appropriate query parameters to the end of the ScraperAPI URL.
For example, if you want to enable Javascript rendering with a request, then add render=true
to the request:
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true"
Sample Code
require 'net/http'
require 'json'
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/ip",
:render => true
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
To use two or more parameters, simply separate them with the “&” sign.
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true&country_code=us"
Sample Code
require 'net/http'
require 'json'
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/ip",
:render => true,
:country_code => "us"
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
Requests via ScraperAPI SDK Method #3
To make it easier to get up and running, you can also use our Ruby SDK. First, you will need to install the SDK:
gem install scraperapi
Then integrate it into your code:
Sample Code
# remember to install the library: gem install scraperapi
require "scraper_api"
client = ScraperAPI::Client.new("APIKEY")
result = client.get("http://httpbin.org/ip").raw_body
puts result
To enable extra functionality while using our SDK, simply add an additional parameter to the GET request:
# remember to install the library: gem install scraperapi
require "scraper_api"
client = ScraperAPI::Client.new("APIKEY")
result = client.get("http://httpbin.org/ip", render: true).raw_body
puts result
To use two or more parameters, simply add them to the GET request:
# remember to install the library: gem install scraperapi
require "scraper_api"
client = ScraperAPI::Client.new("APIKEY")
result = client.get("http://httpbin.org/ip", render: true, country_code: "us").raw_body
puts result
Send Requests to the Proxy Port Method #4
To simplify implementation for users with existing proxy pools, we offer a proxy front-end to the API. The proxy will take your requests and pass them through to the API which will take care of proxy rotation, captchas, and retries.
The proxy mode is a light frontend for the API and has all the same functionality and performance as sending requests to the API endpoint.
The username
for the proxy is scraperapi and the password
is your API key.
"http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001"
You can use the ScraperAPI proxy the same way as you would use any other proxy:
require 'httparty'
HTTParty::Basement.default_options.update(verify: false)
response = HTTParty.get('http://httpbin.org/ip', {
http_proxyaddr: "proxy-server.scraperapi.com",
http_proxyport: "8001",
http_proxyuser: "scraperapi",
http_proxypass: "APIKEY"
})
results = response.body
puts results
Note: So that we can properly direct your requests through the API, your code must be configured to not verify SSL certificates.
To enable extra functionality whilst using the API in proxy mode, you can pass parameters to the API by adding them to username, separated by periods.
For example, if you want to enable Javascript rendering with a request, the username would be scraperapi.render=true
"http://scraperapi.render=true:APIKEY@proxy-server.scraperapi.com:8001"
Multiple parameters can be included by separating them with periods; for example:
"http://scraperapi.render=true.country_code=us:APIKEY@proxy-server.scraperapi.com:8001"
Making POST/PUT Requests
Some advanced users may want to send POST/PUT requests in order to scrape forms and API endpoints directly. You can do this by sending a POST/PUT request through ScraperAPI. The return value will be stringified. So if you want to use it as JSON, you will need to parse it into a JSON object.
require 'net/http'
require 'json'
## Replace POST with PUT to send a PUT request instead
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/anything"
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.post(uri, { "foo" => "bar"}.to_json, "Content-Type" => "application/json")
print(website_content.body)
## For form data
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/anything"
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
req = Net::HTTP::Post.new(uri)
req.set_form_data('foo' => 'bar')
website_content = Net::HTTP.start(uri.hostname, uri.port) {|http|
http.request(req)
}
print(website_content.body)
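Since the response body is returned as a string, you can parse it when the target returns JSON (as httpbin.org/anything does). A minimal sketch continuing the examples above:
parsed = JSON.parse(website_content.body)
# httpbin.org/anything echoes the request back; e.g. the posted form fields appear under "form"
puts parsed["form"]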
API Status Codes
The API will return a specific status code after every request depending on whether the request was successful, failed or some other error occurred. ScraperAPI will retry failed requests for up to 60 seconds to try and get a successful response from the target URL before responding with a 500
error indicating a failed request.
Note: To avoid timing out your request before the API has had a chance to complete all retries, remember to set your timeout to 60 seconds.
In cases where a request fails after 60 seconds of retrying, you will not be charged for the unsuccessful request (you are only charged for successful requests, 200
and 404
status codes).
Errors will inevitably occur, so make sure you catch them on your end. You can configure your code to immediately retry the request (see the sketch after the status code table below) and in most cases it will be successful. If a request is consistently failing, check your request to make sure that it is configured correctly. Or, if you are consistently getting ban messages from an anti-bot, create a ticket with our support team and we will try to bypass this anti-bot for you.
If you receive a successful 200
status code response from the API but the response contains a CAPTCHA, please contact our support team and we will add it to our CAPTCHA detection database. Once included in our CAPTCHA database the API will treat it as a ban in the future and automatically retry the request.
Below are the possible status codes you will receive:
Status Code | Details |
---|---|
200 |
Successful response |
404 |
Page requested does not exist. |
410 |
Page requested is no longer available. |
500 |
After retrying for 60 seconds, the API was unable to receive a successful response. |
429 |
You are sending requests too fast, and exceeding your concurrency limit. |
403 |
You have used up all your API credits. |
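As a rough illustration of the retry advice above (this is only a sketch, not an official client, and the retry policy is just an example), you can wrap the request in a small loop that retries when the API responds with a 500:
require 'net/http'
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/ip"
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
response = nil
# Retry up to 3 times if the API reports a failed request (500); remember to also
# set a 60-second timeout on the request itself, as recommended above.
3.times do
response = Net::HTTP.get_response(uri)
break unless response.code == "500"
end
puts response.body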
Credits and Requests
Every time you make a request, you consume credits. The credits you have available are defined by the plan you are currently using. The amount of credits you consume depends on the domain and parameters you add to your request.
Domains
We have created custom scrapers to target the sites below; if you scrape one of these domains, you will activate the custom scraper and the credit cost will change.
Category | Domain examples | Credit Cost |
---|---|---|
SERP | Google, Bing | 25 |
E-Commerce | Amazon, Allegro | 5 |
Protected Domains | Linkedin, G2, Social Media Sites | 30 |
Geotargeting is included in these credit costs.
If your domain does not fit in any category in the above list, it will cost 1
credit if no additional parameters are added.
Parameters
Depending on your needs, you may want to access different features on our platform.
premium=true
– requests cost 10 credits
render=true
– requests cost 10 credits
premium=true
+ render=true
– requests cost 25 credits
ultra_premium=true
– requests cost 30 credits
ultra_premium=true
+ render=true
– requests cost 75 credits
In any requests, with or without these parameters, we will only charge for successful requests (200
and 404
status codes). If you run out of credits sooner than planned, you can renew your subscription early as explained in the next section – Dashboard.
Dashboard
The Dashboard shows you all the information you need to keep track of your usage. It is divided into Usage, Billing, Documentation, and links to contact us.
Usage
Here you will find your API key, sample codes, monitoring tools, and usage statistics such as credits used, concurrency, failed requests, your current plan, and the end date of your billing cycle.
Billing
In this section, you are able to see your current plan, the end date of the current billing cycle, as well as subscribe to another plan. This is also the right place to set or update your billing details, payment method, view your invoices, manage your subscription, or cancel it.
Here you can also renew your subscription early, in case you run out of API credits sooner than planned. This option will let you renew your subscription at an earlier date than planned, charging you for the subscription price and resetting your credits. If you find yourself using this option regularly, please reach out to us so we can find a plan that can accommodate the number of credits you need.
Documentation
This will direct you to our documentation – exactly where you are right now.
Contact Support
Direct link to reach out to our technical support team.
Contact Sales
Direct link to our sales team, for any subscription or billing-related inquiries.
Customize API Functionality
ScraperAPI enables you to customize the API’s functionality by adding additional parameters to your requests. The API will accept the following parameters:
Parameter | Description |
---|---|
render |
Activate javascript rendering by setting render=true in your request. The API will automatically render the javascript on the page and return the HTML response after the javascript has been rendered.
Requests using this parameter cost 10 API credits, or 75 if used in combination with ultra-premium |
country_code |
Activate country geotargeting by setting country_code=us to use US proxies for example.
This parameter does not increase the cost of the API request. |
premium |
Activate premium residential and mobile IPs by setting premium=true . Using premium proxies costs 10 API credits, or 25 API credits if used in combination with Javascript rendering render=true . |
session_number |
Reuse the same proxy by setting session_number=123 for example.
This parameter does not increase the cost of the API request. |
keep_headers |
Use your own custom headers by setting keep_headers=true along with sending your own headers to the API.
This parameter does not increase the cost of the API request. |
device_type |
Set your requests to use mobile or desktop user agents by setting device_type=desktop or device_type=mobile .
This parameter does not increase the cost of the API request. |
autoparse |
Activate auto parsing for select websites by setting autoparse=true . The API will parse the data on the page and return it in JSON format.
This parameter does not increase the cost of the API request. |
ultra_premium |
Activate our advanced bypass mechanisms by setting ultra_premium=true .
Requests using this parameter cost 30 API credits, or 75 if used in combination with javascript rendering. |
Rendering Javascript
If you are crawling a page that requires you to render the javascript on the page to scrape the data you need, then we can fetch these pages using a headless browser. To render javascript, simply set render=true
and we will use a headless Google Chrome instance to fetch the page. This feature is only available on the Business and Enterprise plans.
require 'net/http'
require 'json'
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/ip",
:render => true
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
Custom Headers
If you would like to use your own custom headers (user agents, cookies, etc.) when making a request to the website, simply set keep_headers=true
and send the API the headers you want to use. The API will then use these headers when sending requests to the website.
Note: Only use this feature if you need to send custom headers to retrieve specific results from the website. Within the API we have a sophisticated header management system designed to increase success rates and performance on difficult sites. When you send your own custom headers you override our header system, which oftentimes lowers your success rates. Unless you absolutely need to send custom headers to get the data you need, we advise that you don’t use this functionality.
If you need to get results for mobile devices, use the device_type
parameter to set the user-agent header for your request, instead of setting your own.
require 'net/http'
require 'json'
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/anything",
:keep_headers => true
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
req = Net::HTTP::Get.new(uri)
req['Accept'] = 'application/json'
req['X-MyHeader'] = '123'
website_content = Net::HTTP.start(uri.hostname, uri.port) {|http|
http.request(req)
}
print(website_content.body)
Sessions
To reuse the same proxy for multiple requests, simply use the session_number
parameter by setting it equal to a unique integer for every session you want to maintain (e.g. session_number=123
). This will allow you to continue using the same proxy for each request with that session number. To create a new session simply set the session_number
parameter with a new integer to the API. The session value can be any integer. Sessions expire 15 minutes after the last usage.
require 'net/http'
require 'json'
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/ip",
:session_number => 123
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
Geographic Location
Certain websites (typically e-commerce stores and search engines) display different data to different users based on the geolocation of the IP used to make the request to the website. In cases like these, you can use the API’s geotargeting functionality to easily use proxies from the required country to retrieve the correct data from the website.
To control the geolocation of the IP used to make the request, simply set the country_code
parameter to the country you want the proxy to be from and the API will automatically use the correct IP for that request.
For example: to ensure your requests come from the United States, set the country_code
parameter to country_code=us
.
Business and Enterprise Plan users can geotarget their requests to the following 13 regions (Hobby and Startup Plans can only use US and EU geotargeting) by using the country_code
in their request:
Country Code | Region | Plans |
---|---|---|
us |
United States | Hobby Plan and higher. |
eu |
Europe | Hobby Plan and higher. |
ca |
Canada | Business Plan and higher. |
uk |
United Kingdom | Business Plan and higher. |
de |
Germany | Business Plan and higher. |
fr |
France | Business Plan and higher. |
es |
Spain | Business Plan and higher. |
br |
Brazil | Business Plan and higher. |
mx |
Mexico | Business Plan and higher. |
in |
India | Business Plan and higher. |
jp |
Japan | Business Plan and higher. |
cn |
China | Business Plan and higher. |
au |
Australia | Business Plan and higher. |
Geotargeting “eu” will use IPs from any European country.
Other countries are available to Enterprise customers upon request.
require 'net/http'
require 'json'
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/ip",
:country_code => "us"
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
Premium Residential/Mobile Proxy Pools
Our standard proxy pools include millions of proxies from over a dozen ISPs and should be sufficient for the vast majority of scraping jobs. However, for a few particularly difficult to scrape sites, we also maintain a private internal pool of residential and mobile IPs. This pool is only available to users on the Business plan or higher.
Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request will count as 10 API credits against your monthly limit). Each request that uses both javascript rendering and our premium proxy pools will be charged at 25 times the normal rate (every successful request will count as 25 API credits against your monthly limit). To send a request through our premium proxy pool, please set the premium
query parameter to premium=true
.
We also have a higher premium level that you can use for really tough targets, such as LinkedIn. You can access these pools by adding the ultra_premium=true
query parameter. These requests will use 30 API credits against your monthly limit, or 75 if used together with rendering. Please note, this is only available on our paid plans.
require 'net/http'
require 'json'
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/ip",
:premium => true
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
Device Type
If your use case requires you to exclusively use either desktop or mobile user agents in the headers it sends to the website then you can use the device_type
parameter.
- Set device_type=desktop to have the API use a desktop (e.g. macOS, Windows, or Linux) user agent. Note: This is the default behavior; not setting the parameter has the same effect.
- Set device_type=mobile to have the API use a mobile (e.g. iPhone or Android) user agent.
Note: The device type you set will be overridden if you use keep_headers=true
and send your own user agent in the requests header.
require 'net/http'
require 'json'
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/ip",
:device_type => "mobile"
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
Autoparse
For select websites, the API will parse all the valuable data in the HTML response and return it in JSON format. To enable this feature, simply add autoparse=true
to your request and the API will parse the data for you. Currently, this feature works with Amazon, Google Search, and Google Shopping.
require 'net/http'
require 'json'
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/ip",
:autoparse => true
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
Wait for Element when rendering
If a rendered request is a bit slow and the page stabilises before the content you need has loaded, it can fool the API into thinking the page has finished rendering.
To cope with this, you can tell the API to wait for a DOM element (selector) to appear on the page when rendering. You just need to send the wait_for_selector=
parameter, passing a URL-encoded jQuery selector. The API will then wait for this element to appear on the page before returning results.
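For example, a minimal Ruby sketch assuming the rendered page has an element with the id "content" (a hypothetical selector used purely for illustration); encode_www_form URL-encodes the selector for you:
require 'net/http'
require 'json'
params = {
:api_key => "APIKEY",
:url => "https://example.com",
:render => true,
:wait_for_selector => "#content"
}
uri = URI('http://api.scraperapi.com/')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)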
Account Information
If you would like to monitor your account usage and limits programmatically (how many concurrent requests you’re using, how many requests you’ve made, etc.) you may use the /account endpoint, which returns JSON.
Note: the requestCount
and failedRequestCount
numbers only refresh once every 15 seconds, while the concurrentRequests
number is available in real time.
require 'net/http'
require 'json'
params = {
:api_key => "APIKEY"
}
uri = URI('http://api.scraperapi.com/account')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
Getting Started
Using ScraperAPI is easy. Just send the URL you would like to scrape to the API along with your API key and the API will return the HTML response from the URL you want to scrape.
API Key & Authentication
ScraperAPI uses API keys to authenticate requests. To use the API you need to sign up for an account and include your unique API key in every request.
If you haven’t signed up for an account yet then
sign up for a free trial here with 5,000 free API credits
Making Requests
You can use the API to scrape web pages, API endpoints, images, documents, PDFs, or other files just as you would any other URL. Note: there is a 2MB limit per request.
There are four ways in which you can send GET requests to ScraperAPI:
- Via our Async Scraper service
http://async.scraperapi.com
- Via our API endpoint
http://api.scraperapi.com?
- Via one of our SDKs (only available for some programming languages)
- Via our proxy port
http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001
Important note: regardless of how you invoke the service, we highly recommend you set a 60 seconds timeout in your application to get the best possible success rates, especially for some hard-to-scrape domains.
Async requests Method #1
To ensure a higher level of successful requests when using our scraper, we’ve built a new product, Async Scraper. Rather than making requests to our endpoint waiting for the response, this endpoint submits a job of scraping, in which you can later collect the data from using our status endpoint.
Scraping websites can be a difficult process; it takes numerous steps and significant effort to get through some sites’ protection which sometimes proves to be difficult with the timeout constraints of synchronous APIs. The Async Scraper will work on your requested URLs until we have achieved a 100% success rate (when applicable), returning the data to you.
Async Scraping is the recommended way to scrape pages when success rate on difficult sites is more important to you than response time (e.g. you need a set of data periodically).
How to use
The async scraper endpoint is available at https://async.scraperapi.com
and it exposes a few useful APIs.
Submit an async job
A simple example showing how to submit a job for scraping and receive a status endpoint URL through which you can poll for the status (and later the result) of your scraping job:
try {
URL asyncApiUrl = new URL("https://async.scraperapi.com/jobs");
HttpURLConnection connection = (HttpURLConnection) asyncApiUrl.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type", "application/json");
connection.setDoOutput(true);
try (OutputStream outputStream = connection.getOutputStream()) {
outputStream.write("{\"apiKey\": \"xxxxxx\", \"url\": \"https://example.com\"}".getBytes(StandardCharsets.UTF_8));
}
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
String readLine = null;
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
System.out.println(response);
}
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
Response:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}
Note the statusUrl
field in the response. That is a personal URL to retrieve the status and results of your scraping job. Invoking that endpoint provides you with the status first:
try {
URL asyncApiUrl = new URL("https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953");
HttpURLConnection connection = (HttpURLConnection) asyncApiUrl.openConnection();
connection.setRequestMethod("GET");
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
String readLine = null;
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
System.out.println(response);
}
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
Response:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}
Once your job is finished, the response will change and will contain the results of your scraping job:
{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"finished",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com",
"response":{
"headers":{
"date":"Thu, 14 Apr 2022 11:10:44 GMT",
"content-type":"text/html; charset=utf-8",
"content-length":"1256",
"connection":"close",
"x-powered-by":"Express",
"access-control-allow-origin":"undefined","access-control-allow-headers":"Origin, X-Requested-With, Content-Type, Accept",
"access-control-allow-methods":"HEAD,GET,POST,DELETE,OPTIONS,PUT",
"access-control-allow-credentials":"true",
"x-robots-tag":"none",
"sa-final-url":"https://example.com/",
"sa-statuscode":"200",
"etag":"W/\"4e8-Sjzo7hHgkd15I/TYxuW15B7HwEc\"",
"vary":"Accept-Encoding"
},
"body":"<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>\n\n <meta charset=\"utf-8\" />\n <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n <style type=\"text/css\">\n body {\n background-color: #f0f0f2;\n margin: 0;\n padding: 0;\n font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n \n }\n div {\n width: 600px;\n margin: 5em auto;\n padding: 2em;\n background-color: #fdfdff;\n border-radius: 0.5em;\n box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n }\n a:link, a:visited {\n color: #38488f;\n text-decoration: none;\n }\n @media (max-width: 700px) {\n div {\n margin: 0 auto;\n width: auto;\n }\n }\n </style> \n</head>\n\n<body>\n<div>\n <h1>Example Domain</h1>\n <p>This domain is for use in illustrative examples in documents. You may use this\n domain in literature without prior coordination or asking for permission.</p>\n <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
"statusCode":200
}
}
Callbacks
Using a status URL is a great way to test the API or get started quickly, but some customer environments require more robust solutions, so we implemented callbacks. Currently only webhook
callbacks are supported, but we are planning to introduce more over time (e.g. direct database callbacks, AWS S3, etc.).
An example of using a webhook callback:
try {
URL asyncApiUrl = new URL("https://async.scraperapi.com/jobs");
HttpURLConnection connection = (HttpURLConnection) asyncApiUrl.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type", "application/json");
connection.setDoOutput(true);
try (OutputStream outputStream = connection.getOutputStream()) {
outputStream.write(("{\"apiKey\": \"xxxxxx\", " +
"\"callback\": {" +
"\"type\": \"webhook\"," +
"\"url\": \"https://yourcompany.com/scraperapi\"" +
"}," +
"\"url\": \"https://example.com\"" +
"}").getBytes(StandardCharsets.UTF_8));
}
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
String readLine = null;
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
System.out.println(response);
}
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
Using a callback you don’t need to use the status URL (although you still can) to fetch the status and results of the job. Once the job is finished the provided webhook URL will be invoked by our system with the same content as the status URL provides.
Just replace the https://yourcompany.com/scraperapi
URL with your preferred endpoint. You can even add basic auth to the URL in the following format: https://user:pass@yourcompany.com/scraperapi
Note #1: The system will try to invoke the webhook URL 3 times, then it cancels the job. So please make sure that the webhook URL is available through the public internet and will be capable of handling the traffic that you need.
Note #2: If you set a webhook, the status endpoint of a job will only be available while the job is being processed.
Hint: Webhook.site is a free online service to test webhooks without requiring you to build a complex infrastructure.
API parameters
You can use the usual API parameters just the same way you'd use them with our synchronous API. These parameters should go into an apiParams
object inside the POST data, e.g.:
{
"apiKey": "xxxxxx",
"apiParams": {
"autoparse": false, // boolean
"binary_target": false, // boolean
"country_code": "us", // string, see: https://api.scraperapi.com/geo
"device_type": "desktop", // desktop | mobile
"follow_redirect": false, // boolean
"premium": true, // boolean
"render": false, // boolean
"retry_404": false // boolean
},
"url": "https://example.com"
}
Requests to the API Endpoint Method #2
ScraperAPI exposes a single API endpoint for you to send GET requests. Simply send a GET request to http://api.scraperapi.com
with two query string parameters and the API will return the HTML response for that URL:
api_key
which contains your API key, andurl
which contains the url you would like to scrape
You should format your requests to the API endpoint as follows:
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip"
Sample Code
try {
String apiKey = "APIKEY";
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=http://httpbin.org/ip";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
To enable other API functionality when sending a request to the API endpoint simply add the appropriate query parameters to the end of the ScraperAPI URL.
For example, if you want to enable Javascript rendering with a request, then add render=true
to the request:
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true"
Sample Code
try {
String apiKey = "APIKEY";
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=http://httpbin.org/ip&render=true";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
To use two or more parameters, simply separate them with the “&” sign.
"http://api.scraperapi.com?api_key=APIKEY&url=http://httpbin.org/ip&render=true&country_code=us"
Sample Code
try {
String apiKey = "APIKEY";
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=http://httpbin.org/ip&render=true&country_code=us";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
Requests via ScraperAPI SDK Method #3
To make it easier to get up and running, you can also use our Java SDK. First, you will need to install the SDK:
install the library: https://search.maven.org/artifact/com.scraperapi/sdk/1.0
Then integrate it into your code:
Sample Code
// remember to install the library: https://search.maven.org/artifact/com.scraperapi/sdk/1.0
import com.scraperapi.*;
ScraperApiClient client = new ScraperApiClient("APIKEY");
client.get("http://httpbin.org/ip")
.result();
To enable extra functionality while using our SDK, simply add an additional parameter to the GET request:
ScraperApiClient client = new ScraperApiClient("APIKEY");
client.get("http://httpbin.org/ip")
.render(true)
.result();
To use two or more parameters, simply add them to the GET request:
ScraperApiClient client = new ScraperApiClient("APIKEY");
client.get("http://httpbin.org/ip")
.render(true)
.premium(true)
.result();
Send Requests to the Proxy Port Method #4
To simplify implementation for users with existing proxy pools, we offer a proxy front-end to the API. The proxy will take your requests and pass them through to the API which will take care of proxy rotation, captchas, and retries.
The proxy mode is a light frontend for the API and has all the same functionality and performance as sending requests to the API endpoint.
The username
for the proxy is scraperapi and the password
is your API key.
"http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001"
You can use the ScraperAPI proxy the same way as you would use any other proxy:
try {
String apiKey = "APIKEY";
String proxy = "http://scraperapi:" + apiKey + "@proxy-server.scraperapi.com";
URL server = new URL("https://httpbin.org/ip");
Properties systemProperties = System.getProperties();
systemProperties.setProperty("http.proxyHost", proxy);
systemProperties.setProperty("http.proxyPort", "8001");
HttpURLConnection httpURLConnection = (HttpURLConnection) server.openConnection();
httpURLConnection.connect();
String readLine = null;
int responseCode = httpURLConnection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(httpURLConnection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
Note: So that we can properly direct your requests through the API, your code must be configured to not verify SSL certificates.
To enable extra functionality whilst using the API in proxy mode, you can pass parameters to the API by adding them to username, separated by periods.
For example, if you want to enable Javascript rendering with a request, the username would be scraperapi.render=true
"http://scraperapi.render=true:APIKEY@proxy-server.scraperapi.com:8001"
Multiple parameters can be included by separating them with periods; for example:
"http://scraperapi.render=true.country_code=us:APIKEY@proxy-server.scraperapi.com:8001"
Making POST/PUT Requests
Some advanced users may want to send POST/PUT requests in order to scrape forms and API endpoints directly. You can do this by sending a POST/PUT request through ScraperAPI. The return value will be stringified. So if you want to use it as JSON, you will need to parse it into a JSON object.
try {
String apiKey = "YOURAPIKEY";
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=http://httpbin.org/ip";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
API Status Codes
The API will return a specific status code after every request depending on whether the request was successful, failed or some other error occurred. ScraperAPI will retry failed requests for up to 60 seconds to try and get a successful response from the target URL before responding with a 500
error indicating a failed request.
Note: To avoid timing out your request before the API has had a chance to complete all retries, remember to set your timeout to 60 seconds.
In cases where a request fails after 60 seconds of retrying, you will not be charged for the unsuccessful request (you are only charged for successful requests, 200
and 404
status codes).
Errors will inevitably occur, so make sure you catch them on your end. You can configure your code to immediately retry the request and in most cases it will be successful. If a request is consistently failing, check your request to make sure that it is configured correctly. Or, if you are consistently getting ban messages from an anti-bot, then create a ticket with our support team and we will try to bypass this anti-bot for you.
If you receive a successful 200
status code response from the API but the response contains a CAPTCHA, please contact our support team and we will add it to our CAPTCHA detection database. Once included in our CAPTCHA database the API will treat it as a ban in the future and automatically retry the request.
Below are the possible status codes you will receive:
Status Code | Details |
---|---|
200 |
Successful response |
404 |
Page requested does not exist. |
410 |
Page requested is no longer available. |
500 |
After retrying for 60 seconds, the API was unable to receive a successful response. |
429 |
You are sending requests too fast, and exceeding your concurrency limit. |
403 |
You have used up all your API credits. |
Credits and Requests
Every time you make a request, you consume credits. The credits you have available are defined by the plan you are currently using. The amount of credits you consume depends on the domain and parameters you add to your request.
Domains
We have created custom scrapers to target the sites below; if you scrape one of these domains, you will activate the custom scraper and the credit cost will change.
Category | Domain examples | Credit Cost |
---|---|---|
SERP | Google, Bing | 25 |
E-Commerce | Amazon, Allegro | 5 |
Protected Domains | Linkedin, G2, Social Media Sites | 30 |
Geotargeting is included in these credit costs.
If your domain does not fit in any category in the above list, it will cost 1
credit if no additional parameters are added.
Parameters
Depending on your needs, you may want to access different features on our platform.
premium=true
– requests cost 10 credits
render=true
– requests cost 10 credits
premium=true
+ render=true
– requests cost 25 credits
ultra_premium=true
– requests cost 30 credits
ultra_premium=true
+ render=true
– requests cost 75 credits
In any requests, with or without these parameters, we will only charge for successful requests (200
and 404
status codes). If you run out of credits sooner than planned, you can renew your subscription early as explained in the next section – Dashboard.
Dashboard
The Dashboard shows you all the information you need to keep track of your usage. It is divided into Usage, Billing, Documentation, and links to contact us.
Usage
Here you will find your API key, sample codes, monitoring tools, and usage statistics such as credits used, concurrency, failed requests, your current plan, and the end date of your billing cycle.
Billing
In this section, you are able to see your current plan, the end date of the current billing cycle, as well as subscribe to another plan. This is also the right place to set or update your billing details, payment method, view your invoices, manage your subscription, or cancel it.
Here you can also renew your subscription early, in case you run out of API credits sooner than planned. This option will let you renew your subscription at an earlier date than planned, charging you for the subscription price and resetting your credits. If you find yourself using this option regularly, please reach out to us so we can find a plan that can accommodate the number of credits you need.
Documentation
This will direct you to our documentation – exactly where you are right now.
Contact Support
Direct link to reach out to our technical support team.
Contact Sales
Direct link to our sales team, for any subscription or billing-related inquiries.
Customize API Functionality
ScraperAPI enables you to customize the API’s functionality by adding additional parameters to your requests. The API will accept the following parameters:
Parameter | Description |
---|---|
render |
Activate javascript rendering by setting render=true in your request. The API will automatically render the javascript on the page and return the HTML response after the javascript has been rendered.
Requests using this parameter cost 10 API credits, or 75 if used in combination with ultra-premium |
country_code |
Activate country geotargeting by setting country_code=us to use US proxies for example.
This parameter does not increase the cost of the API request. |
premium |
Activate premium residential and mobile IPs by setting premium=true . Using premium proxies costs 10 API credits, or 25 API credits if used in combination with Javascript rendering render=true . |
session_number |
Reuse the same proxy by setting session_number=123 for example.
This parameter does not increase the cost of the API request. |
keep_headers |
Use your own custom headers by setting keep_headers=true along with sending your own headers to the API.
This parameter does not increase the cost of the API request. |
device_type |
Set your requests to use mobile or desktop user agents by setting device_type=desktop or device_type=mobile .
This parameter does not increase the cost of the API request. |
autoparse |
Activate auto parsing for select websites by setting autoparse=true . The API will parse the data on the page and return it in JSON format.
This parameter does not increase the cost of the API request. |
ultra_premium |
Activate our advanced bypass mechanisms by setting ultra_premium=true .
Requests using this parameter cost 30 API credits, or 75 if used in combination with javascript rendering. |
Rendering Javascript
If you are crawling a page that requires you to render the javascript on the page to scrape the data you need, then we can fetch these pages using a headless browser. To render javascript, simply set render=true
and we will use a headless Google Chrome instance to fetch the page. This feature is only available on the Business and Enterprise plans.
try {
String apiKey = "APIKEY";
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=http://httpbin.org/ip&render=true";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
Custom Headers
If you would like to use your own custom headers (user agents, cookies, etc.) when making a request to the website, simply set keep_headers=true
and send the API the headers you want to use. The API will then use these headers when sending requests to the website.
Note: Only use this feature if you need to send custom headers to retrieve specific results from the website. Within the API we have a sophisticated header management system designed to increase success rates and performance on difficult sites. When you send your own custom headers you override our header system, which oftentimes lowers your success rates. Unless you absolutely need to send custom headers to get the data you need, we advise that you don’t use this functionality.
If you need to get results for mobile devices, use the device_type
parameter to set the user-agent header for your request, instead of setting your own.
try {
String apiKey = "APIKEY";
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=http://httpbin.org/anything&keep_headers=true";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection httpURLConnection = (HttpURLConnection) urlForGetRequest.openConnection();
httpURLConnection.setRequestProperty("Content-Type", "application/json");
httpURLConnection.setRequestProperty("X-MyHeader", "123");
httpURLConnection.setRequestMethod("GET");
int responseCode = httpURLConnection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(httpURLConnection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
Sessions
To reuse the same proxy for multiple requests, simply use the session_number
parameter by setting it equal to a unique integer for every session you want to maintain (e.g. session_number=123
). This will allow you to continue using the same proxy for each request with that session number. To create a new session simply set the session_number
parameter with a new integer to the API. The session value can be any integer. Sessions expire 15 minutes after the last usage.
try {
String apiKey = "APIKEY";
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=http://httpbin.org/ip&session_number=123";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
Geographic Location
Certain websites (typically e-commerce stores and search engines) display different data to different users based on the geolocation of the IP used to make the request to the website. In cases like these, you can use the API’s geotargeting functionality to easily use proxies from the required country to retrieve the correct data from the website.
To control the geolocation of the IP used to make the request, simply set the country_code
parameter to the country you want the proxy to be from and the API will automatically use the correct IP for that request.
For example: to ensure your requests come from the United States, set the country_code
parameter to country_code=us
.
Business and Enterprise Plan users can geotarget their requests to the following 13 regions (Hobby and Startup Plans can only use US and EU geotargeting) by using the country_code
in their request:
| Country Code | Region | Plans |
|---|---|---|
| us | United States | Hobby Plan and higher. |
| eu | Europe | Hobby Plan and higher. |
| ca | Canada | Business Plan and higher. |
| uk | United Kingdom | Business Plan and higher. |
| de | Germany | Business Plan and higher. |
| fr | France | Business Plan and higher. |
| es | Spain | Business Plan and higher. |
| br | Brazil | Business Plan and higher. |
| mx | Mexico | Business Plan and higher. |
| in | India | Business Plan and higher. |
| jp | Japan | Business Plan and higher. |
| cn | China | Business Plan and higher. |
| au | Australia | Business Plan and higher. |
Geotargeting “eu” will use IPs from any European country.
Other countries are available to Enterprise customers upon request.
try {
String apiKey = "APIKEY";
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=http://httpbin.org/ip&country_code=us";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection connection = (HttpURLConnection) urlForGetRequest.openConnection();
connection.setRequestMethod("GET");
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
Premium Residential/Mobile Proxy Pools
Our standard proxy pools include millions of proxies from over a dozen ISPs and should be sufficient for the vast majority of scraping jobs. However, for a few particularly difficult-to-scrape sites, we also maintain a private internal pool of residential and mobile IPs. This pool is only available to users on the Business plan or higher.
Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request will count as 10 API credits against your monthly limit). Each request that uses both JavaScript rendering and our premium proxy pools will be charged at 25 times the normal rate (every successful request will count as 25 API credits against your monthly limit). To send a request through our premium proxy pool, please set the premium query parameter to premium=true.
We also have a higher premium level that you can use for really tough targets, such as LinkedIn. You can access these pools by adding the ultra_premium=true query parameter. These requests will use 30 API credits against your monthly limit, or 75 if used together with rendering. Please note, this is only available on our paid plans.
try {
String apiKey = "APIKEY";
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=http://httpbin.org/ip&premium=true";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection connection = (HttpURLConnection) urlForGetRequest.openConnection();
connection.setRequestMethod("GET");
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
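To route the same request through the higher ultra premium level instead, the only change is the query parameter; the rest of the request is built exactly as in the premium example above:
// Illustrative: 30 API credits per successful request, or 75 when combined with rendering
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=http://httpbin.org/ip&ultra_premium=true";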
Device Type
If your use case requires you to exclusively use either desktop or mobile user agents in the headers it sends to the website, then you can use the device_type parameter.
- Set device_type=desktop to have the API set a desktop (e.g. macOS, Windows, or Linux) user agent. Note: This is the default behavior. Not setting the parameter will have the same effect.
- Set device_type=mobile to have the API set a mobile (e.g. iPhone or Android) user agent.
Note: The device type you set will be overridden if you use keep_headers=true and send your own user agent in the request headers.
try {
String apiKey = "APIKEY";
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=http://httpbin.org/ip&device_type=mobile";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection connection = (HttpURLConnection) urlForGetRequest.openConnection();
connection.setRequestMethod("GET");
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
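Tip: to confirm which user-agent was actually sent to the target site, point the url parameter at http://httpbin.org/headers instead of http://httpbin.org/ip; the response should echo back the headers (including the User-Agent) that the target received.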
Autoparse
For select websites the API will parse all the valuable data in the HTML response and return it in JSON format. To enable this feature, simply add autoparse=true to your request and the API will parse the data for you. Currently, this feature works with Amazon, Google Search, and Google Shopping.
try {
String apiKey = "APIKEY";
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=http://httpbin.org/ip&autoparse=true";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection connection = (HttpURLConnection) urlForGetRequest.openConnection();
connection.setRequestMethod("GET");
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
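Note that the snippet above points at httpbin.org purely for consistency with the other examples; to actually receive parsed JSON, the url parameter needs to point at a supported page type, for example an Amazon product page or a Google Search results page.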
Wait for Element when rendering
If a rendered page loads slowly and the DOM briefly stabilises before the content you need has appeared, the API can be fooled into thinking the page has finished rendering.
To cope with this, you can tell the API to wait for a DOM element (selector) to appear on the page when rendering. Just send the wait_for_selector= parameter, passing a URL-encoded jQuery selector. The API will then wait for this element to appear on the page before returning results.
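There is no dedicated snippet for this above, so here is a minimal sketch following the same pattern as the other Java examples on this page. The target URL, the h1 selector, and the use of render=true to enable JavaScript rendering are illustrative assumptions; wait_for_selector only has an effect on rendered requests, and the selector should be URL-encoded (handled here with URLEncoder):
try {
String apiKey = "APIKEY";
// Illustrative: wait for an <h1> element to appear on the rendered page before returning the HTML
String selector = java.net.URLEncoder.encode("h1", "UTF-8");
String url = "http://api.scraperapi.com?api_key=" + apiKey + "&url=https://example.com&render=true&wait_for_selector=" + selector;
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection connection = (HttpURLConnection) urlForGetRequest.openConnection();
connection.setRequestMethod("GET");
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}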
Account Information
If you would like to monitor your account usage and limits programmatically (how many concurrent requests you’re using, how many requests you’ve made, etc.) you may use the /account endpoint, which returns JSON.
Note: the requestCount and failedRequestCount numbers only refresh once every 15 seconds, while the concurrentRequests number is available in real time.
try {
String apiKey = "APIKEY";
String url = "http://api.scraperapi.com/account?api_key=" + apiKey;
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection connection = (HttpURLConnection) urlForGetRequest.openConnection();
connection.setRequestMethod("GET");
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
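An abridged, illustrative response is shown below; the values are placeholders and the real response may contain additional fields describing your plan and limits, but the three counters discussed above look like this:
{
"concurrentRequests":2,
"requestCount":12345,
"failedRequestCount":10
}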