How to Scrape Etsy Product Data with Python


Why scrape Etsy? Since its founding in 2005, Etsy has grown into a $30 billion e-commerce company with over 4.4 million active sellers and 81.9 million active buyers, according to Statista, making it one of the largest marketplaces for non-conventional and creative goods.

For us, it also means that there’s a lot of information we can use to make informed decisions.

Whether you’re trying to break into a specific industry by collecting pricing information or analyzing product trends to find profitable niches, building an Etsy web scraper can save you a lot of time and potentially make you good money.


In this tutorial, we’ll use Python’s Requests and BeautifulSoup to collect Etsy product data with a simple script.

TL;DR: Full Code

For those in a hurry, here’s the complete code snippet:


	import requests
	from bs4 import BeautifulSoup
	import pandas as pd
	
	etsy_products = []
	
	# Loop through all 250 pages of the category
	for x in range(1, 251):
	    response = requests.get(
	        f'https://api.scraperapi.com?api_key=Your_API_Key&url=https://www.etsy.com/c/clothing/boys-clothing/costumes?ref=pagination&explicit=1&page={x}')
	    soup = BeautifulSoup(response.content, 'html.parser')
	
	    # Grab every product card on the page
	    products = soup.select('li.wt-show-md.wt-show-lg')
	    for product in products:
	        product_name = product.select_one('h3.v2-listing-card__title').text.strip()
	        product_price = product.select_one('span.currency-value').text
	        product_url = product.select_one(
	            'li.wt-list-unstyled div.js-merch-stash-check-listing a.listing-link')['href']
	        etsy_products.append({
	            'name': product_name,
	            'price': product_price,
	            'URL': product_url
	        })
	
	# Export everything to a CSV file
	df = pd.DataFrame(etsy_products)
	df.to_csv('etsy-products.csv', index=False)

Note: You’ll need to add your API key to the api_key= parameter in the request for it to work. Create a free ScraperAPI account to get 5,000 API credits and start scraping.

Follow along to see how we wrote this script from scratch.

Step-by-Step Guide to Building an Etsy Scraper

For this tutorial, we will scrape Etsy’s costumes section for boys using Requests and BeautifulSoup, and get the name, price, and URL of every product.

Let’s install our dependencies using the following command:


	pip install requests beautifulsoup4 pandas

Once those are installed, create a new directory for your project and, within it, an etsy-scraper.py file.

Step 1: Download Etsy’s HTML for Parsing

Without HTML, there is no parsing. With that in mind, let’s send an HTTP request to the server and store the result in a response variable.


	import requests
	from bs4 import BeautifulSoup
	
	
	response = requests.get(
	    'https://www.etsy.com/c/clothing/boys-clothing/costumes?ref=pagination&explicit=1&page=1')

Notice that we’re using a slightly different version of the category page’s URL.

If you navigate to the website manually, you’ll see that this is the original version of the URL:


	https://www.etsy.com/c/clothing/boys-clothing/costumes?ref=catnav-10923

However, to navigate through the pagination – which we’ll definitely want to do to scrape every page in this category – our script needs a URL it can request for each page.

To make it more visual, here’s what the second URL of the pagination looks like:

Costumes for boys section from Etsy


	https://www.etsy.com/c/clothing/boys-clothing/costumes?ref=pagination&page=2

Therefore, we can access any page simply by changing the ‘page’ parameter within the URL.
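To make that concrete, here’s a minimal sketch of how those URLs can be generated (the page_url helper is ours, not part of the final script):

```python
# A minimal sketch: generate the URL for any pagination page by
# substituting the 'page' parameter with an f-string.
BASE_URL = 'https://www.etsy.com/c/clothing/boys-clothing/costumes?ref=pagination&explicit=1&page={}'

def page_url(page_number):
    """Return the category URL for the given pagination page."""
    return BASE_URL.format(page_number)

print(page_url(2))
# https://www.etsy.com/c/clothing/boys-clothing/costumes?ref=pagination&explicit=1&page=2
```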

We’ll go deeper into that later in this tutorial, but for now, let’s try to get some data out of this first page.

Step 2: Finding CSS Properties to Target

There are two common ways to define the CSS selectors we’ll need to extract the product’s name, price, and URL.

The first way is to manually go to the page itself and use the Inspector Tool. You can open it with CTRL + Shift + C on Windows or CMD + Shift + C on Mac.

Inspecting the costumes for boys section from Etsy

Or we can use a tool like SelectorGadget to select the correct CSS class for the element we want to scrape.

Showing how to use SelectorGadget on Etsy

After some testing, we’ve determined that there are 64 product listings per page, and these are the CSS classes for each element:

  • Product name: .v2-listing-card__title
  • Product price: .lc-price .currency-value
  • Product URL: li.wt-list-unstyled div.js-merch-stash-check-listing a.listing-link

Note: Technically, the page above contains 65 matching elements. That’s because, while we were testing the scraper, the page added a “recently viewed” section that also matches.

If you’re following along, you might wonder how we got the properties for the product URL. That’s where things get tricky.

Using SelectorGadget for Logic Testing

Not every website is built the same way, so sometimes we can’t get the CSS selector directly from SelectorGadget – in this case, because the product’s URL isn’t inside the same element every time.

For example, the product’s URL tends to be inside the same element as the product’s name on many websites. That’s not the case for Etsy, though.

However, there’s another way we can use SelectorGadget to find CSS elements like URLs: testing the properties.

With SelectorGadget open, we first found the highest-level element for the product cards themselves.

Identify the product cards using SelectorGadget

Inside the li element, we want to find the main div where the product’s URL is contained.

Selecting the product URL using SelectorGadget

Next, we target the a tag within that div to find the URL. We used SelectorGadget to confirm that the selector picks exactly 64 elements, verifying that our logic was correct.

Testing our product listing selector

That’s why it’s so important to understand how web pages are structured when web scraping.

Tip: You can use our CSS selectors cheat sheet to learn more about CSS selectors and to help you speed up scraper development time.

Now that we know how to select each element to get the data we want, let’s test our script.

Step 3: Testing Our Parsing Script

For this test, we want to run the whole script to extract the names of the 64 products on the page.

Here’s where BeautifulSoup will come in handy, allowing us to parse the raw HTML and pick specific elements based on CSS selectors.

To get started, pass response through BS4:


	soup = BeautifulSoup(response.content, 'html.parser')

In order to get the name of every product, we first need to get all the listings on the page (so we can loop through them) into a single variable:


	products = soup.select('li.wt-show-md.wt-show-lg')

products contains all the li elements and the information within them.

HTML product listings from Etsy
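As a quick sanity check at this point, you can print len(products) and confirm it matches the 64 listings we counted earlier. Here’s the same selector and check on a tiny, hypothetical HTML fragment (the markup is a stand-in, not Etsy’s real page):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking Etsy's listing markup, just to show
# how the selector and the length check behave.
html = '''
<ul>
  <li class="wt-show-md wt-show-lg"><h3 class="v2-listing-card__title"> Costume A </h3></li>
  <li class="wt-show-md wt-show-lg"><h3 class="v2-listing-card__title"> Costume B </h3></li>
  <li class="wt-other"><h3>Not a listing</h3></li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')
products = soup.select('li.wt-show-md.wt-show-lg')
print(len(products))  # 2 -- only the listing cards match
```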

By creating a loop, we can extract all the information we want from them.

Now, let’s use the selector we chose earlier to extract each product’s name and print() them to the console.


	for product in products:
	    product_name = product.select_one('h3.v2-listing-card__title').text
	    print(product_name)

Here’s what we get:

List of Etsy product names printed to the console

There’s a problem: the script is picking up a lot of whitespace, which will add noise to our dataset. A quick fix is to add .strip().


	product_name = product.select_one('h3.v2-listing-card__title').text.strip()

Now all unnecessary whitespace is stripped from the strings:

List of cleaned Etsy product names

Sweet, 64 clean product names extracted in just a couple of seconds!

Step 4: Get the Rest of the Elements Using the Same Logic

Alright, now that we know our script is working, let’s scrape the price and URL of each of these elements on the page.


	product_price = product.select_one('span.currency-value').text

List of Etsy product prices printed to the console


	product_url = product.select_one('li.wt-list-unstyled div.js-merch-stash-check-listing a.listing-link')['href']

List of Etsy product URLs printed to the console

In the case of attributes, instead of reading the .text property, we tell BS4 to extract the value of the selected attribute – in this case, href.
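The contrast is easy to see on a hypothetical anchor tag (the markup below is a stand-in for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical listing link, just to contrast text vs. attribute access.
soup = BeautifulSoup(
    '<a class="listing-link" href="https://www.etsy.com/listing/123">Mushroom headpiece</a>',
    'html.parser')
link = soup.select_one('a.listing-link')

print(link.text)         # Mushroom headpiece -- the text content
print(link['href'])      # https://www.etsy.com/listing/123 -- the attribute value
print(link.get('href'))  # same, but returns None instead of raising if the attribute is missing
```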

Step 5: Sending Our Data to a CSV

Alright, so far, our scraper is working perfectly. However, logging the data into our console might not be the best way to handle it. Let’s create a data frame and export it as a CSV file to organize our data for later analysis.

To do this, we’ll need to make a couple of changes.

1. Import Pandas into your project and create an empty list where we’ll store all the data we’re scraping.


	import pandas as pd

	etsy_products = []

2. Append the extracted data to the list using the .append() method within your for loop.


	etsy_products.append({
	    'name': product_name,
	    'price': product_price,
	    'URL': product_url
	})

If you print() etsy_products now, this is what you get:


	{
		"name":"Mushroom headpiece  with dots for toddlers, kids and adults",
		"price":"23.10",
		"URL":"https://www.etsy.com/listing/741760095/mushroom-headpiece-with-dots-for?click_key=4a9ed56076feafe16426315e47a781c619b7f987%3A741760095&click_sum=22169a2c&ga_order=most_relevant&ga_search_type=all&ga_view_type=gallery&ga_search_query=&ref=search_grid-380536-1-1&etp=1&sts=1&referrer_page_guid=f7038e73a27.53aac11d85fd18db9710.00"
	 },
	 {
		"name":"Personalized Safari Jumpsuit One Piece Long-short*Safari Adventure Costume for Kids* 12-24-36 Months Toddler Safari Halloween Kids Costume",
		"price":"84.95",
		"URL":"https://www.etsy.com/listing/1185173222/personalized-safari-jumpsuit-one-piece?click_key=249b111b036ffc9d24e0f5495d443c0ae919e00a%3A1185173222&click_sum=4191b104&ga_order=most_relevant&ga_search_type=all&ga_view_type=gallery&ga_search_query=&ref=search_grid-380536-1-2&etp=1&sts=1&referrer_page_guid=f7038e73a27.53aac11d85fd18db9710.00"
	 }
	 # ... truncated ...  

You can also print the length of the list:

Printing the length of our array
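That check is just a one-liner; on a stand-in list it looks like this (in the real script, etsy_products is filled inside the loop):

```python
# Stand-in items; the real list is built by the scraping loop.
etsy_products = [
    {'name': 'Costume A', 'price': '23.10', 'URL': 'https://www.etsy.com/listing/1'},
    {'name': 'Costume B', 'price': '84.95', 'URL': 'https://www.etsy.com/listing/2'},
]
print(len(etsy_products))  # 2
```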

3. Create a data frame using pandas and export it as CSV.


	df = pd.DataFrame(etsy_products)
	df.to_csv('etsy-products.csv', index=False)

If you run your code, a new CSV file will be generated inside your project’s directory.

Showing Etsy product data exported to a CSV file

Congratulations, you just scraped Etsy!

Collecting Etsy Product Data at Scale

Now that we’ve tested our scraper on one page, we’re ready to scale things by letting it move to other pages and extract the same data.

For this to work, we’ll have to:

  1. Create a for loop to navigate through the pagination
  2. Send our requests using ScraperAPI to avoid getting blocked

1. Looping Through the Pagination

A few things to consider when writing a for loop:

  • Analyze the URL of the paginated pages to determine how many of them there are – in our example category, there are 250 pages in total.
  • Take a look at how numbers are increasing – in our case, the pagination increases by one in the URL with each page.

With these two things in mind, we can build the loop.


	for x in range(1, 251):
	    response = requests.get(f'https://www.etsy.com/c/clothing/boys-clothing/costumes?ref=pagination&explicit=1&page={x}')

We’re setting the loop to start at 1 and run until it hits (but doesn’t include) 251. In other words, the x variable within the URL will be replaced by 1 through 250 – the script stops before sending a request for page 251.
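A quick check of the range semantics (start inclusive, stop exclusive) confirms this:

```python
# range(1, 251) yields 1 through 250; 251 itself is excluded.
pages = list(range(1, 251))
print(pages[0], pages[-1], len(pages))  # 1 250 250
```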

However, it won’t take too long for Etsy to block our scraper. For it to work, let’s integrate ScraperAPI into the script.

2. Using ScraperAPI for Scalability

Etsy uses several methods to keep scripts from accessing and extracting too many pages in a short amount of time. To avoid these measures, we would have to write code that:

  • Changes our IP address
  • Rotates through a pool of IP addresses
  • Deals with CAPTCHAs
  • Sets the correct headers

Just to name a few.

The good news is that we can automate all of this using ScraperAPI.

All we need to do is create a free ScraperAPI account to get access to our API key.

API key from your ScraperAPI dashboard

Then, we’ll make a little adjustment to our code:


	for x in range(1, 251):
	    response = requests.get(f'https://api.scraperapi.com?api_key=Your_API_Key&url=https://www.etsy.com/c/clothing/boys-clothing/costumes?ref=pagination&explicit=1&page={x}')

Our request will now go through https://api.scraperapi.com, passing our API key and target URL as parameters.
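One caveat: the target URL contains its own &-separated parameters, which can be misread as parameters meant for api.scraperapi.com itself. A more robust sketch (the scraperapi_url helper is ours, and 'Your_API_Key' is a placeholder) percent-encodes the target URL first:

```python
from urllib.parse import urlencode

def scraperapi_url(api_key, target_url):
    """Build a ScraperAPI request URL with the target URL safely percent-encoded."""
    return 'https://api.scraperapi.com/?' + urlencode({'api_key': api_key, 'url': target_url})

page_1 = 'https://www.etsy.com/c/clothing/boys-clothing/costumes?ref=pagination&explicit=1&page=1'
# The encoded '&' (%26) and '=' (%3D) keep the target URL's
# parameters from leaking into ScraperAPI's own query string.
print(scraperapi_url('Your_API_Key', page_1))
```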

Note: Remember to substitute Your_API_Key with the actual key.

Put the rest of the code inside the loop and run it. You’re now scraping the entire category!

Wrapping Up

To summarize:

  • The script will send a request through ScraperAPI to Etsy
  • ScraperAPI will use machine learning and statistical analysis to bypass anti-bot mechanisms and return the raw HTML
  • The script will then download the HTML and parse it using BeautifulSoup
  • From the soup object, it’ll create a list with all the listings on the page
  • Then, it’ll loop through the listings to extract name, price, and product URL
  • Next, the script will append all this data together (as an item) to the etsy_products array
  • Once this is done for all the products on the page, the script will move on to the next page until it has scraped all 250

The result is a CSV file filled with Etsy product data.

Of course, you can select more elements from the page (like the seller’s name and stars), and change the category URL to collect data from other categories.
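If you do add more fields, it’s worth guarding against cards that lack them. The class names below are hypothetical placeholders – find the real selectors with the Inspector or SelectorGadget, as in Step 2:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; 'seller-name' and 'stars' are placeholder class names.
html = '<div><p class="seller-name">ExampleShop</p><span class="stars" data-rating="4.5"></span></div>'
card = BeautifulSoup(html, 'html.parser')

seller_el = card.select_one('p.seller-name')
stars_el = card.select_one('span.stars')

# Guard against missing elements so one unusual card doesn't crash the loop.
seller = seller_el.text.strip() if seller_el else None
stars = stars_el.get('data-rating') if stars_el else None
print(seller, stars)  # ExampleShop 4.5
```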

If you want to learn how to scrape other websites and build scrapers in different languages, check out our other resources to keep improving your craft.

Until next time, happy scraping!

About the author

Leonardo Rodriguez

Leo is a technical content writer based in Italy with experience in Python and Node.js. He’s currently working at saas.group as a content marketing manager and is the lead writer for ScraperAPI. Contact him on LinkedIn.
