Scrape Amazon ASIN: How to Do It at Scale

Picture this: in just two steps, you can save countless hours of development and effortlessly scale your Amazon scrapers to millions of products. Sounds good?

In this tutorial, we’re going to show you how to collect Amazon product data using its ASIN number and how to scale the process by automatically extracting hundreds of ASIN codes in just a couple of seconds.

What’s an Amazon ASIN Number?

The Amazon Standard Identification Number (ASIN) is a unique alphanumeric code used to ensure accurate product identification, tracking, and cataloging on the Amazon platform, enhancing searchability and facilitating efficient e-commerce operations.

How do I find the ASIN of an Amazon URL?

You can find a product’s ASIN in its URL and on the product page itself.

As an example, let’s search for “dynamic microphone” and click on the first product in the list.

Microphones on Amazon

If you scroll down, you’ll find a product information table that lists the ASIN code along with other details about the product.

Amazon Product Information ASIN

Also, as mentioned before, if you check the product’s URL, you’ll quickly discover the same ASIN code within it.

https://www.amazon.com/Professional-Unidirectional-Microphone-Connection-PDMIC58/dp/B003GEBGA0/ref=sr_1_1?keywords=dynamic+microphone&qid=1687965666&sprefix=dynamic+micr%2Caps%2C218&sr=8-1

The ASIN usually starts with B0, and in the URL it comes right after the /dp/ segment.
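If you already have a list of product URLs, the ASIN can be pulled out with a regular expression. The snippet below is a minimal sketch, not part of ScraperAPI: it assumes the ASIN is a 10-character string of uppercase letters and digits that directly follows /dp/, and extract_asin is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: an ASIN is 10 uppercase letters/digits placed right after "/dp/".
ASIN_IN_URL = re.compile(r"/dp/([A-Z0-9]{10})(?=[/?]|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN found in an Amazon product URL, or None if there is none."""
    match = ASIN_IN_URL.search(url)
    return match.group(1) if match else None


url = (
    "https://www.amazon.com/Professional-Unidirectional-Microphone-"
    "Connection-PDMIC58/dp/B003GEBGA0/ref=sr_1_1?keywords=dynamic+microphone"
)
print(extract_asin(url))
```

Search URLs such as https://www.amazon.com/s?k=dynamic+microphone contain no /dp/ segment, so the helper returns None for them.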

Why Scrape Amazon ASIN Data with ScraperAPI

You could go the traditional route and spend hours building your own parsers and handling every web scraping complexity yourself. Or you can use ScraperAPI’s Amazon API to quickly turn any Amazon ASIN (product page) into structured JSON data with a simple API call.

After sending your request through ScraperAPI, it will:

  • Automatically bypass any anti-scraping mechanism thrown your way.
  • Use machine learning and years of statistical analysis to choose the right IP + headers combination to ensure a successful request.
  • Return all product information in easy-to-use JSON format.

Let’s use the example above to show how ScraperAPI’s Amazon API can, in two steps, save you hours of development and help you scale your Amazon scrapers to millions of products without getting blocked.

1. Set Up Your Project

First things first, let’s create a new directory for our project and a file inside named amazon_scraper.py.

For this project, the only thing you need to import will be Requests, which we’ll use to send our HTTP request through.

import requests #pip install requests

Now, before you can use the Amazon API, create a free ScraperAPI account to get access to your API key. You’ll find it on your dashboard under getting started:

Getting Started with ScraperAPI

With the API key on hand, let’s create our payload to pass the parameters of our scraping jobs to the endpoint.

payload = {
    'api_key': 'YOUR_API_KEY',
    'asin': 'B003GEBGA0',
    'country': 'us',
    'tld': 'us'
}

A couple of things to consider:

  • The country parameter tells ScraperAPI where to send the request from (in this example, ScraperAPI will only use U.S.-based proxies).
  • The tld parameter specifies the Amazon domain you want to scrape. For this example, we’re setting it to the U.S. as a demonstration, but when it isn’t defined, it defaults to the U.S. Amazon domain (amazon.com).

2. Send Your Request Through the Amazon Product Endpoint

With the payload in place, it’s time to send your request through the https://api.scraperapi.com/structured/amazon/product endpoint:

response = requests.get(
    'https://api.scraperapi.com/structured/amazon/product', params=payload)
print(response.text)

And that’s it! After running your code, it’ll turn the entire product page into a JSON response, returning all the product details and relevant information from the ASIN.

{
   "name":"Pyle-Pro Includes 15ft XLR Cable to 1/4'' Audio Connection, Connector, Black, 10.10in. x 5.00in. x 3.30in. (PDMIC58)",
   "product_information":{
      "item_weight":"1 pounds",
      "product_dimensions":"10.25 x 5.25 x 3.5 inches",
      "domestic_shipping":"Item can be shipped within U.S.",
      "international_shipping":"This item can be shipped to select countries outside of the U.S.  Learn More (https://www.amazon.com/gp/help/customer/display.html/ref=help_search_1-1?nodeId=201117930)",
      "country_of_origin":"China",
      "asin":"B003GEBGA0",
      "item_model_number":"PDMIC58",
      "customer_reviews":{
         "ratings_count":6722,
         "stars":"4.4    4.4 out of 5 stars"
      },...
}

Check the entire Amazon Product API sample response

This response will include data points like:

  • “name”
  • “product_dimensions”
  • “country_of_origin”
  • “customer_reviews”
  • “best_sellers_rank”
  • “date_first_available”
  • “pricing”
  • “images”
  • “feature_bullets”

And much more.
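In a longer script, it can help to wrap the request in a small function that sets a timeout and fails loudly on an HTTP error instead of printing whatever comes back. The sketch below is one way to do that. fetch_product is a hypothetical helper name, the 70-second timeout is an assumed value you should tune, and session stands for any object with a Requests-style .get() method, such as requests.Session().

```python
PRODUCT_ENDPOINT = "https://api.scraperapi.com/structured/amazon/product"


def fetch_product(session, api_key, asin, country="us", timeout=70):
    """Fetch one product through the structured endpoint and return it as a dict.

    `session` is anything exposing a Requests-style .get(), e.g. requests.Session().
    """
    payload = {"api_key": api_key, "asin": asin, "country": country}
    response = session.get(
        PRODUCT_ENDPOINT,
        params=payload,
        timeout=timeout,  # assumed value; adjust to your workload
    )
    # Raise on 4xx/5xx responses rather than trying to parse an error body.
    response.raise_for_status()
    return response.json()
```

With Requests installed, you would call it as fetch_product(requests.Session(), 'YOUR_API_KEY', 'B003GEBGA0'). Reusing one session across many ASINs also lets Requests reuse the underlying connection.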

3. Picking Specific Data Points (Bonus)

Based on your needs, you’ll often want to export only specific data points from your target Amazon ASIN. The good news is that because the response is structured as predictable key-value pairs, picking the data you need only takes a few seconds.

Let’s extract the name, description, price, and average rating from the product above:

import requests

product_data = []

payload = {
    'api_key': 'YOUR_API_KEY',
    'asin': 'B003GEBGA0',
    'country': 'us',
}

response = requests.get(
    'https://api.scraperapi.com/structured/amazon/product', params=payload)
product = response.json()

product_data.append({
    'Product Name': product["name"],
    'Product Description': product["full_description"],
    'Product Price': product["pricing"],
    'Avg Rating': product["average_rating"]
})

print(product_data)

Note: We appended everything to an empty list to give it a better structure. This also makes it easier to export the data into a CSV or JSON file using Pandas.

Here’s what the response looks like now:

[
{
"Product Name":"Pyle-Pro Includes 15ft XLR Cable to 1/4'' Audio Connection, Connector, Black, 10.10in. x 5.00in. x 3.30in. (PDMIC58)",
"Product Description":"Features:- Built-in Acoustic Pop Filter- Ultra-Wide Frequency Response- High Signal Output for Vocals & Singing- Rugged Construction & Steel Mesh Grill- Integrated Low Noise Circuitry- Includes: 15' ft. XLR to 1/4'' Audio Connection Cable- Perfect for Stage Performances or In-Studio UseTechnical Specs:- Mic Element/Type: Dynamic- Pickup/Polar Pattern: Uni-Directional- Mic Body Material: Zinc Alloy Metal- Frequency Response: 50Hz-15KHz- 600 Ohm Output Impedance (+/-)30%- Microphone Sensitivity: -54dB (+/-)3db(0dB=1V/Pa @ 1KHz)- Dimensions: Φ1.96'' x 6.41''- Sold as: Unit- Weight: 1.48 lbs.—&mdash. —Your satisfaction is our #1 priority.If this item fails to meet your expectations we will accept it back a full refund within the first 30 days.Let us know if the item should show any defect within the first year we will help exchange it for a new one.",
"Product Price":"$13.94",
"Avg Rating":4.4
}
]
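The note above mentions Pandas for exporting. If you’d rather avoid the extra dependency, the standard library’s csv module can do the same job. The sketch below is an illustration under two assumptions: not every product returns every key, so it reads fields with .get() (a missing key becomes an empty CSV cell), and sample_product is a shortened stand-in for the real response. pick_fields and rows_to_csv are hypothetical helper names.

```python
import csv
import io


def pick_fields(product: dict) -> dict:
    """Keep four fields; .get() yields None for any key missing from the response."""
    return {
        "Product Name": product.get("name"),
        "Product Description": product.get("full_description"),
        "Product Price": product.get("pricing"),
        "Avg Rating": product.get("average_rating"),
    }


def rows_to_csv(rows: list) -> str:
    """Serialize a list of same-shaped dicts to CSV text (header row included)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Shortened stand-in for response.json(); a real response has many more keys.
sample_product = {"name": "Pyle-Pro PDMIC58", "pricing": "$13.94", "average_rating": 4.4}
product_data = [pick_fields(sample_product)]
print(rows_to_csv(product_data))
```

To save the result to disk, write the returned string to a file opened with newline='' so the csv module’s line endings are preserved.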

Congratulations, you just scraped your first Amazon ASIN!

How to Collect ASIN Numbers from Amazon Search

As you can see, our Amazon Product endpoint makes it super easy to scale your data collection efforts. All you need is the ASIN code for the products you want details from.

Of course, we don’t expect you to make a list of ASIN codes manually. That would be a waste of time.

A better way to collect these codes is by scraping Amazon search and extracting the ASIN code from all the products listed.

Note: You can combine the following method with our Amazon Product endpoint: collect the ASIN numbers first, then pass them to the endpoint to get each product’s details.

1. Understanding Amazon’s Search Page Results

Navigate to Amazon and search for dynamic microphone xlr. You’ll notice that the resulting URL is quite long.

Amazon URL

However, the actual part you’ll need is at the beginning of the URL. In this section, you’ll find the query you’re searching for: https://www.amazon.com/s?k=dynamic+microphone+xlr – and it’s the URL you’ll use to send the request.

On the page, you can see many products, but there’s no ASIN number to be found. At least not at first glance.

By inspecting the page, we can see that each product card is wrapped in a <div>:

Amazon Product

But most importantly, each <div> element has the product’s ASIN code inside a data-asin attribute, and that’s exactly what we’re going to target.

2. Sending Your Initial Request

For this example, we’re just sending one request to Amazon. However, in a real-world project, you’re probably looking to collect all the ASIN codes you can by navigating through multiple result pages – in other words, by sending hundreds, thousands, or even millions of requests.

Because of this, without using the proper tool, your IP will most likely be blocked after a couple of requests.

To avoid any risk, let’s use ScraperAPI’s standard API to send our request and let it take care of the technical challenges like IP rotation, CAPTCHA handling, and retries.

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

asins = []

payload = {
    'api_key': 'YOUR_API_KEY',
    'render': 'true',
    'url': 'https://www.amazon.com/s?k=dynamic+microphone+xlr'
}

response = requests.get('https://api.scraperapi.com', params=payload)

We’re adding the render parameter to avoid any JavaScript limitations. By setting it to true, we’re telling ScraperAPI to render the page before sending us the HTML response. Also, we added an empty list at the top (asins) where we’ll store all the ASIN numbers.

However, because we’re now working with an HTML response, we’ll need to parse it with BeautifulSoup:

s = BeautifulSoup(response.content, 'html.parser')

3. Extracting the Products’ ASIN Codes

As we mentioned before, every product is listed like a product card represented by a <div> element, and the code we need is inside the data-asin attribute:

Amazon Single Product

To get hold of it, we’ll first store all these elements inside a single variable we can then iterate through:

products = s.select('div[data-asin]')

Next, we’ll create a for loop and store each ASIN code inside asins using the .append() method:

for product in products:
    asins.append(product.attrs['data-asin'])

print(asins)

There’s one issue, though. If you run the code as is, it’ll also return empty strings. That’s because even though all of these divs have the data-asin attribute, some of them don’t contain any value, resulting in something like this:

['', 'B082YHPC3Z', 'B09KKGPGQR', 'B09T95TRMH', '', 'B09BZZCGC8', 'B09BFPNW2J', 'B0BMFQP2ZZ', '', '', 'B000CZ0R42', 'B00TTQM8Z6', 'B001R747SG', 'B0BC3XB26X', 'B0B8SNVK5K', 'B0002D0HY4', '', 'B0002D0HY4', 'B0BC3XB26X', 'B07L6C5VRW', 'B09BFPNW2J', 'B003GEBGA0', 'B00TTQM8Z6', 'B0BSFG4SCW', 'B09BFPNW2J', 'B09MRSQDJ7', '', 'B0002E4Z8M', 'B0006H92QK', 'B07MSCRCVK', 'B0BS3QSDW9', 'B003GEBGA0', 'B0BZP61K18', 'B07L6C5VRW', 'B08XXGSLPK', 'B01HZAGWEA', 'B005BSOVRY', 'B08F8LQ3MG', 'B0B8GRCXB6', 'B08PM59YXV', 'B0BW87WZSV', 'B08ML13KSB', 'B0BRKFP94K', 'B09ZQPXFPV', 'B09KLTM8V2', 'B01DP0HHVQ', 'B0BY5PXG8Z', 'B0B3LW3FNC', 'B09625N64B', 'B094C7NKXR', 'B089JLLDQB', 'B08JPJVJ6V', 'B085LPMPBT', 'B0BVYZQCCH', 'B0C2JNK3ZZ', 'B0BVYF5MM3', 'B007JX8O0Y', 'B09F9MGWXC', 'B074HZFG3P', 'B0C2J7F4Y3', 'B08F28FDK4', 'B07RPT2P86', 'B00PL09UKK', 'B0BLCV1FV2', 'B0BB13VY4X', 'B08G7RG9ML', '', '', '', '', '', '']

To fix this, we can add an if statement so we only append ASIN numbers and not empty items:

for product in products:
    if product.attrs['data-asin'] != '':
        asins.append(product.attrs['data-asin'])

Running it this time will return a list of only ASIN numbers:

['B082YHPC3Z', 'B09KKGPGQR', 'B09T95TRMH', 'B0002D0HY4', 'B00TTQM8Z6', 'B000CZ0R42', 'B0002D0HY4', 'B0BC3XB26X', 'B07L6C5VRW', 'B09BFPNW2J', 'B003GEBGA0', 'B00TTQM8Z6', 'B0BSFG4SCW', 'B09BFPNW2J', 'B09MRSQDJ7', 'B0BS3QSDW9', 'B07L6C5VRW', 'B001R747SG', 'B0BC3XB26X', 'B09BZZCGC8', 'B0002E4Z8M', 'B09BFPNW2J', 'B0BMFQP2ZZ', 'B0B8SNVK5K', 'B0BW87WZSV', 'B003GEBGA0', 'B08XXGSLPK', 'B08ML13KSB', 'B0BZP61K18', 'B07MSCRCVK', 'B0006H92QK', 'B08F8LQ3MG', 'B01HZAGWEA', 'B005BSOVRY', 'B0B8GRCXB6', 'B09ZQPXFPV', 'B089JLLDQB', 'B0BRKFP94K', 'B09KLTM8V2', 'B0B3LW3FNC', 'B007JX8O0Y', 'B094C7NKXR', 'B0BB13VY4X', 'B0000AQRST', 'B074HZFG3P', 'B088FH47ZS', 'B0BBVY9L7C', 'B0BY5PXG8Z', 'B002T45X1G', 'B08PM59YXV', 'B01DP0HHVQ', 'B07ZPBFVKK', 'B09F9MGWXC', 'B0BVYF5MM3', 'B08JPJVJ6V', 'B00TTQM9A0', 'B085LPMPBT', 'B098SBZK2K', 'B00PL09UKK', 'B09625N64B']

Printing the length of the list – print(len(asins)) – reveals you just scraped 60 ASIN numbers in seconds (note that a few of them appear more than once).
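Some ASINs appear more than once in the list above (B0002D0HY4, for example), so it’s worth deduplicating before sending them to the product endpoint and spending a request on each. Here’s a minimal sketch of that cleanup step. unique_asins and product_payloads are hypothetical helper names, and the payload shape simply mirrors the one used earlier in this tutorial.

```python
def unique_asins(asins):
    """Drop empty strings and duplicates while keeping first-seen order."""
    # dict.fromkeys preserves insertion order, so the first occurrence wins.
    return list(dict.fromkeys(asin for asin in asins if asin))


def product_payloads(asins, api_key, country="us"):
    """Build one params dict per ASIN for the structured product endpoint."""
    return [{"api_key": api_key, "asin": asin, "country": country} for asin in asins]


scraped = ["", "B082YHPC3Z", "B0002D0HY4", "B003GEBGA0", "B0002D0HY4", ""]
cleaned = unique_asins(scraped)
print(cleaned)

for payload in product_payloads(cleaned, "YOUR_API_KEY"):
    print(payload["asin"])
```

Each payload can then be passed as params to a request against https://api.scraperapi.com/structured/amazon/product, exactly as in the single-product example.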

Wrapping Up

To navigate through the pagination and collect all ASIN numbers from a search, you can use a for loop to increase the page parameter within the target URL:

https://www.amazon.com/s?k=dynamic+microphone+xlr&page=2
https://www.amazon.com/s?k=dynamic+microphone+xlr&page=3
https://www.amazon.com/s?k=dynamic+microphone+xlr&page=4
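The loop described above could be sketched like this. The snippet only builds the page URLs and the matching ScraperAPI payloads. It sends no requests, and the page range is an assumption, since how many result pages exist depends on the search. search_page_urls and scraperapi_payloads are hypothetical helper names.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.amazon.com/s"


def search_page_urls(query, first_page, last_page):
    """Return one search URL per results page, first_page..last_page inclusive."""
    urls = []
    for page in range(first_page, last_page + 1):
        # urlencode turns spaces into "+" and joins the parameters with "&".
        urls.append(SEARCH_URL + "?" + urlencode({"k": query, "page": page}))
    return urls


def scraperapi_payloads(urls, api_key):
    """Wrap each URL in the payload shape used for the search request earlier."""
    return [{"api_key": api_key, "render": "true", "url": url} for url in urls]


urls = search_page_urls("dynamic microphone xlr", 2, 4)
for url in urls:
    print(url)
```

From there, each payload goes through the same request, parse, and extract steps shown earlier, with the ASINs from every page appended to a single list.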

With ScraperAPI handling IP rotation, CAPTCHAs, and retries, you’re far less likely to be blocked by Amazon’s anti-scraping mechanisms, so you can collect thousands of ASIN numbers from Amazon search and then pull detailed product information through the Amazon Product API.

Try ScraperAPI with 5,000 free API calls, or contact our sales team to create a custom enterprise plan to enjoy all ScraperAPI premium features, premium support, an account manager, and over 10M API credits.

FAQs about Amazon ASIN scraping

What is the difference between SKU and ASIN on Amazon?

SKU (Stock Keeping Unit) is a unique code assigned by the seller to track inventory internally. ASIN (Amazon Standard Identification Number) is a unique identifier assigned by Amazon to each product, enabling easy searching and listing.

Do All Amazon Items Have an ASIN?

Nearly every item listed on Amazon has an ASIN (Amazon Standard Identification Number). Books are the main special case: they’re identified by their International Standard Book Number (ISBN), and for books with a 10-digit ISBN, that ISBN doubles as the ASIN.

You can use either the ASIN or the ISBN to scrape product information through ScraperAPI’s Amazon Product API.

Can 2 Products Have the Same ASIN on Amazon?

No, two different products cannot have the same ASIN on Amazon. ASINs are unique identifiers assigned by Amazon to each product, ensuring that each item has a distinct identity within the platform – duplicate ASINs would defeat the purpose of the identifier.

Is It Legal to Scrape Amazon?

Scraping publicly available Amazon data is generally considered legal. For this reason, you should never scrape data behind login walls, and you should keep sensitive data and copyrighted content out of your datasets.

About the author

Leonardo Rodriguez

Leo is a technical content writer based in Italy with experience in Python and Node.js. He’s currently working at saas.group as a content marketing manager and is the lead writer for ScraperAPI. Contact him on LinkedIn.
