How to Scrape Target.com Product Data [Complete Guide]

Tutorial on how to scrape Target products with JavaScript

Target lists millions of products that analysts, marketing teams, and businesses can scrape to drive decisions that help them gain a competitive advantage in their industry.

However, collecting this data can be difficult due to Target’s advanced anti-scraping mechanisms and dynamically loaded content.

Collect Data from All Major Marketplaces

ScraperAPI’s advanced bot-detection bypassing makes it easy to scrape millions of product listings without getting blocked.

In this tutorial, we’ll show you how to build a web scraper using Puppeteer and ScraperAPI’s proxy mode so you can scale the scraper without getting blocked.

TL;DR: Full Target.com Scraper

For those in a hurry, here’s the complete script we’ll build in this tutorial:

<pre class="wp-block-syntaxhighlighter-code">
	const puppeteer = require("puppeteer");
	const cheerio = require("cheerio");
	const Excel = require("exceljs");
	const path = require("path");
	
	const TARGET_DOT_COM_PAGE_URL = 'https://www.target.com/s?searchTerm=headphones&tref=typeahead%7Cterm%7Cheadphones%7C%7C%7Chistory';
	const EXPORT_FILENAME = 'products.xlsx';
	
	// ScraperAPI proxy configuration
	const PROXY_USERNAME = 'scraperapi';
	const PROXY_PASSWORD = '<API_KEY>'; // <-- enter your API key here
	const PROXY_SERVER = 'proxy-server.scraperapi.com';
	const PROXY_SERVER_PORT = '8001';
	const waitFor = (timeInMs) => new Promise(r => setTimeout(r, timeInMs));
	
	const exportProductsInExcelFile = async (productsList) => {
		const workbook = new Excel.Workbook();
		const worksheet = workbook.addWorksheet('Headphones');
	
		worksheet.columns = [
			{ key: 'title', header: 'Title' },
			{ key: 'brand', header: 'Brand' },
			{ key: 'currentPrice', header: 'Current Price' },
			{ key: 'regularPrice', header: 'Regular Price' },
			{ key: 'averageRating', header: 'Average Rating' },
			{ key: 'totalReviews', header: 'Total Reviews' },
			{ key: 'link', header: 'Link' },
		];
	
		worksheet.getRow(1).font = {
			bold: true,
			size: 13,
		};
	
		productsList.forEach((product) => {
			worksheet.addRow(product);
		});
	
		const exportFilePath = path.resolve(__dirname, EXPORT_FILENAME);
	
		await workbook.xlsx.writeFile(exportFilePath);
	};
	
	const parseProductReview = (inputString) => {
		const ratingRegex = /(\d+(\.\d+)?)/;
		const reviewCountRegex = /(\d+) ratings/;
	
		const ratingMatch = inputString.match(ratingRegex);
		const reviewCountMatch = inputString.match(reviewCountRegex);
	
		const rating = ratingMatch?.length > 0 ? parseFloat(ratingMatch[0]) : null;
		const reviewCount = reviewCountMatch?.length > 1 ? parseInt(reviewCountMatch[1]) : null;
	
		return { rating, reviewCount };
	};
	
	const webScraper = async () => {
		const browser = await puppeteer.launch({
				ignoreHTTPSErrors: true,
			args: [
				`--proxy-server=http://${PROXY_SERVER}:${PROXY_SERVER_PORT}`
			]
		});
	
		const page = await browser.newPage();
	
		await page.authenticate({
			  username: PROXY_USERNAME,
			  password: PROXY_PASSWORD,
		});
	
		await page.setExtraHTTPHeaders({
			"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13.5; rv:109.0) Gecko/20100101 Firefox/117.0",
		});
	
		await page.goto(TARGET_DOT_COM_PAGE_URL, { timeout: 60000});
	
		const buttonConsentReject = await page.$('.VfPpkd-LgbsSe[aria-label="Reject all"]');
		await buttonConsentReject?.click();
	
		await waitFor(3000);
	
		await page.evaluate(async () => {
			await new Promise((resolve) => {
				let totalHeight = 0;
				const distance = 100;
	
				const timer = setInterval(() => {
					const scrollHeight = document.body.scrollHeight;
					window.scrollBy(0, distance);
					totalHeight += distance;
	
					if (totalHeight >= scrollHeight) {
						clearInterval(timer);
						resolve()
					}
				}, 100);
			});
		});
	
		const html = await page.content();
		await browser.close();
	
		const $ = cheerio.load(html);
	
		const productList = [];
	
		$(".dOpyUp").each((_, el) => {
			const link = $(el).find("a.csOImU").attr('href');
			const title = $(el).find('a.csOImU').text();
			const brand = $(el).find('a.cnZxgy').text();
			const currentPrice = $(el).find("span[data-test='current-price'] span").text();
			const regularPrice = $(el).find("span[data-test='comparison-price'] span").text();
			const reviews = $(el).find(".hMtWwx").text();
	
			const { rating, reviewCount } = parseProductReview(reviews);
	
			productList.push({
				title,
				brand,
				currentPrice,
				regularPrice,
				averageRating: rating,
				totalReviews: reviewCount,
				link,
			});
		});
	
		console.log(`Number of items extracted:`, productList.length);
	
		await exportProductsInExcelFile(productList);
	
	console.log('The products have been successfully exported to the Excel file!');
	};
	
	void webScraper();
</pre>

Want to understand each line of code of this web scraper? Let’s build it from scratch!

Scraping Target.com Product Data

As a use case for scraping Target.com products, we will write a web scraper in Node.js that searches for headphones and extracts the following information for each product:

  1. Title
  2. Brand
  3. Current and regular price
  4. Average rating
  5. Total reviews
  6. Product link

The web scraper will export the extracted data to an Excel file for further usage or analysis.
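
Each extracted product will be represented as a plain JavaScript object with one property per field. The values below are illustrative placeholders only, not real Target.com data:

	// Shape of one extracted product record (illustrative placeholder values):
	const exampleProduct = {
		title: 'Example Wireless Bluetooth Headphones',
		brand: 'ExampleBrand',
		currentPrice: '$49.99',
		regularPrice: '$79.99',
		averageRating: 4.5,
		totalReviews: 1234,
		link: '/p/example-wireless-bluetooth-headphones/-/A-00000000', // relative product URL
	};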

Prerequisites

You must have these tools installed on your computer to follow this tutorial:

  • Node.js (a recent LTS version) and npm, which comes bundled with Node.js

Step 1: Set Up the Project

Let’s create a folder to hold the source code of the Target.com scraper:


	mkdir target-dot-com-scraper

Enter the folder and initialize a new Node.js project:


	cd target-dot-com-scraper

	npm init -y	

The second command above will create a package.json file in the folder.
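
The generated package.json will look roughly like this (the exact contents depend on your npm version and defaults):

	{
	  "name": "target-dot-com-scraper",
	  "version": "1.0.0",
	  "description": "",
	  "main": "index.js",
	  "scripts": {
	    "test": "echo \"Error: no test specified\" && exit 1"
	  },
	  "keywords": [],
	  "author": "",
	  "license": "ISC"
	}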

Next, create a file index.js in which we will write our scraper; keep it empty for now.


	touch index.js

Step 2: Install the Dependencies

To build the Target.com web scraper, we need these three Node.js packages:

  • Puppeteer – to load the website, perform the product search, scroll the page to load more results, and download the HTML content.
  • Cheerio – to extract the information from the HTML downloaded by Puppeteer.
  • ExcelJS – to write the extracted data into an Excel workbook.

Run the command below to install these packages:


	npm install puppeteer cheerio exceljs

Step 3: Identify the DOM Selectors to Target

Navigate to https://www.target.com, type “headphones” in the search bar, and press Enter.

When the search results appear, inspect the page to view the HTML structure and identify the DOM selectors of the elements wrapping the information we want to extract.

Showing Target's DOM selections with headphone's product example

From the above picture, here are all the DOM selectors the web scraper will target to extract the information:

Information          DOM selector
Product title        .dOpyUp a.csOImU
Brand                .dOpyUp a.cnZxgy
Current price        .dOpyUp span[data-test='current-price'] span
Regular price        .dOpyUp span[data-test='comparison-price'] span
Rating and reviews   .dOpyUp .hMtWwx
Product link         .dOpyUp a.csOImU

Be careful when writing the selector because a misspelling will prevent the script from retrieving the correct value.
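
A quick way to validate a selector before writing any code is to run it in your browser’s DevTools console on the search results page. Keep in mind that Target.com’s generated class names (such as .dOpyUp) can change over time, so re-check them if the scraper suddenly returns zero products:

	// Paste into the DevTools console on the search results page:
	document.querySelectorAll('.dOpyUp').length;              // number of product cards found
	document.querySelector('.dOpyUp a.csOImU')?.textContent;  // title of the first product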

Step 4: Scrape Target.com’s Product Page

To get the URL to scrape, copy the URL from your browser’s address bar; this is the URL for the “headphones” search.

Open the index.js file and add the code below:


	const puppeteer = require("puppeteer");

	const waitFor = (timeInMs) => new Promise(r => setTimeout(r, timeInMs));
	
	const webScraper = async () => {
		const browser = await puppeteer.launch({
			headless: false,
			args: ["--disabled-setuid-sandbox", "--no-sandbox"],
		});
		const page = await browser.newPage();
	
		await page.setExtraHTTPHeaders({
			"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13.5; rv:109.0) Gecko/20100101 Firefox/117.0",
		});
	
		await page.goto("https://www.google.com/maps/search/bookshop/@36.6092093,-129.3569836,6z/data=!3m1!4b1" , {
			waitUntil: 'domcontentloaded',
			timeout: 60000
		});
		
		const buttonConsentReject = await page.$('.VfPpkd-LgbsSe[aria-label="Reject all"]');
		await buttonConsentReject?.click();
	
		await waitFor(5000);
	};
	
	void webScraper();

The code above does the following:

  • Creates a browser instance with headless mode deactivated; this is helpful while building the scraper because you can watch the interactions on the page and get a quick feedback loop. We will switch back to headless mode once the scraper is ready for production.
  • Defines the request’s User-Agent so Target.com treats us as a real browser rather than a bot, then navigates to the web page.
  • In most European countries, a consent page is displayed before you are redirected to the URL; we click the button to reject everything and wait a few seconds for the web page to load completely (an optional, more robust wait is sketched below).
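
Instead of relying only on the fixed delay, you can also wait for the product cards themselves to appear. Here is a minimal, optional sketch that assumes the .dOpyUp card class identified in Step 3 is still in use:

	// Optional: wait for the product cards instead of a fixed delay only.
	// Assumes the ".dOpyUp" card class from Step 3 is still valid.
	await page.waitForSelector('.dOpyUp', { timeout: 30000 }).catch(() => {
		console.warn('No product cards found within 30 seconds; continuing anyway.');
	});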

Step 5: Implement Page Scroll to Load All the Products

When the page loads, only four products are displayed; you must scroll to the bottom of the page to load more products. We must implement this scrolling to retrieve all the products on the page instead of only the first four.

In the index.js file, add the code below right after the snippet that waits for the page to load completely:


	await page.evaluate(async () => {
		await new Promise((resolve) => {
			let totalHeight = 0;
			const distance = 100;

			const timer = setInterval(() => {
				const scrollHeight = document.body.scrollHeight;
				window.scrollBy(0, distance);
				totalHeight += distance;

				if (totalHeight >= scrollHeight) {
					clearInterval(timer);
					resolve();
				}
			}, 100);
		});
	});

	const html = await page.content();

	await browser.close();

The code above scrolls down until the bottom of the page is reached; once there, it downloads the page’s HTML content and closes the browser instance.

Step 6: Extract Information from the HTML

Now that we have the HTML of the page, we must parse it with Cheerio to navigate it and extract all the information we want.

Cheerio provides functions to load HTML text and then navigate through the structure to extract information using the DOM selectors.
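
If you have never used Cheerio before, here is a tiny standalone example of the pattern we will follow: load an HTML string, select elements with CSS selectors, and read their text or attributes (the HTML below is a made-up snippet, not Target.com markup):

	const cheerio = require("cheerio");
	
	// Load a small HTML snippet and query it with CSS selectors.
	const $ = cheerio.load('<div class="card"><a class="title" href="/p/1">Item</a></div>');
	
	console.log($('.card a.title').text());        // "Item"
	console.log($('.card a.title').attr('href'));  // "/p/1"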

The code below goes through each product element, extracts the information (using a small parseProductReview helper defined right after the snippet), and returns an array containing all the products.


	const cheerio = require("cheerio");

	// Puppeteer scraping code here
	
	const $ = cheerio.load(html);
	
	const productList = [];
	
	$(".dOpyUp").each((_, el) => {
		const link = $(el).find("a.csOImU").attr('href');
		const title = $(el).find('a.csOImU').text();
		const brand = $(el).find('a.cnZxgy').text();
		const currentPrice = $(el).find("span[data-test='current-price'] span").text();
		const regularPrice = $(el).find("span[data-test='comparison-price'] span").text();
		const reviews = $(el).find(".hMtWwx").text();
	
		const { rating, reviewCount } = parseProductReview(reviews);
	
		productList.push({
		   title,
		   brand,
		   currentPrice,
		   regularPrice,
		   averageRating: rating,
		   totalReviews: reviewCount,
		   link,
		});
	});
	
	console.log(`Number of items extracted:`, productList.length);
	
	console.log(productList);
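
The snippet above relies on a small helper, parseProductReview, which splits the combined rating and reviews text (a string containing the average rating and a number followed by the word “ratings”) into a numeric rating and a review count. Add it to index.js above the scraping code:

	const parseProductReview = (inputString) => {
		// The first decimal number in the string is the average rating.
		const ratingRegex = /(\d+(\.\d+)?)/;
		// The number right before the word "ratings" is the total review count.
		const reviewCountRegex = /(\d+) ratings/;
	
		const ratingMatch = inputString.match(ratingRegex);
		const reviewCountMatch = inputString.match(reviewCountRegex);
	
		const rating = ratingMatch?.length > 0 ? parseFloat(ratingMatch[0]) : null;
		const reviewCount = reviewCountMatch?.length > 1 ? parseInt(reviewCountMatch[1], 10) : null;
	
		return { rating, reviewCount };
	};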

Step 7: Store the Scraped Data in an Excel File

Storing the extracted data in an Excel file makes it easy to reuse it for other purposes, such as data analysis.

We will use the ExcelJS package we installed earlier to create a workbook, map each product property to a column of a worksheet, and finally write the file to disk.

Let’s update the index.js file to add the code below:


	// Existing imports here…
	const Excel = require("exceljs");
	const path = require("path");
	
	const EXPORT_FILENAME = 'products.xlsx';
	
	const exportProductsInExcelFile = async (productsList) => {
		const workbook = new Excel.Workbook();
		const worksheet = workbook.addWorksheet('Headphones');
	
		worksheet.columns = [
			  { key: 'title', header: 'Title' },
			  { key: 'brand', header: 'Brand' },
			  { key: 'currentPrice', header: 'Current Price' },
			  { key: 'regularPrice', header: 'Regular Price' },
			  { key: 'averageRating', header: 'Average Rating' },
			  { key: 'totalReviews', header: 'Total Reviews' },
			  { key: 'link', header: 'Link' },
		];
	
		worksheet.getRow(1).font = {
			  bold: true,
			  size: 13,
		};
	
		productsList.forEach((product) => {
			  worksheet.addRow(product);
		});
	
		const exportFilePath = path.resolve(__dirname, EXPORT_FILENAME);
	
		await workbook.xlsx.writeFile(exportFilePath);
	};
	
	// Existing code here	
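
Then, call the export helper at the end of the webScraper function, right after the products have been extracted (this call also appears in the complete script in the next step):

	await exportProductsInExcelFile(productList);
	
	console.log('The products have been successfully exported to the Excel file!');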

Step 8: Test the Implementation

At this point, we have everything in place to run our web scraper; here is the complete code of the file:


	const puppeteer = require("puppeteer");
	const cheerio = require("cheerio");
	const Excel = require("exceljs");
	const path = require("path");
	
	const TARGET_DOT_COM_PAGE_URL = 'https://www.target.com/s?searchTerm=headphones&tref=typeahead%7Cterm%7Cheadphones%7C%7C%7Chistory';
	const EXPORT_FILENAME = 'products.xlsx';
	
	const waitFor = (timeInMs) => new Promise(r => setTimeout(r, timeInMs));
	
	const parseProductReview = (inputString) => {
		const ratingRegex = /(\d+(\.\d+)?)/;
		const reviewCountRegex = /(\d+) ratings/;
	
		const ratingMatch = inputString.match(ratingRegex);
		const reviewCountMatch = inputString.match(reviewCountRegex);
	
		const rating = ratingMatch?.length > 0 ? parseFloat(ratingMatch[0]) : null;
		const reviewCount = reviewCountMatch?.length > 1 ? parseInt(reviewCountMatch[1], 10) : null;
	
		return { rating, reviewCount };
	};
	
	const exportProductsInExcelFile = async (productsList) => {
		const workbook = new Excel.Workbook();
		const worksheet = workbook.addWorksheet('Headphones');
	
		worksheet.columns = [
			{ key: 'title', header: 'Title' },
			{ key: 'brand', header: 'Brand' },
			{ key: 'currentPrice', header: 'Current Price' },
			{ key: 'regularPrice', header: 'Regular Price' },
			{ key: 'averageRating', header: 'Average Rating' },
			{ key: 'totalReviews', header: 'Total Reviews' },
			{ key: 'link', header: 'Link' },
		];
	
		worksheet.getRow(1).font = {
			bold: true,
			size: 13,
		};
	
		productsList.forEach((product) => {
			worksheet.addRow(product);
		});
	
		const exportFilePath = path.resolve(__dirname, EXPORT_FILENAME);
	
		await workbook.xlsx.writeFile(exportFilePath);
	};
	
	const webScraper = async () => {
		const browser = await puppeteer.launch({
			headless: false
		});
		const page = await browser.newPage();
	
		await page.setExtraHTTPHeaders({
			"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13.5; rv:109.0) Gecko/20100101 Firefox/117.0",
		});
	
		await page.goto(TARGET_DOT_COM_PAGE_URL);
	
		const buttonConsentReject = await page.$('.VfPpkd-LgbsSe[aria-label="Reject all"]');
		await buttonConsentReject?.click();
	
		await waitFor(3000);
	
		await page.evaluate(async () => {
			await new Promise((resolve) => {
				let totalHeight = 0;
				const distance = 100;
	
				const timer = setInterval(() => {
					const scrollHeight = document.body.scrollHeight;
					window.scrollBy(0, distance);
					totalHeight += distance;
	
					if (totalHeight >= scrollHeight) {
						clearInterval(timer);
						resolve()
					}
				}, 100);
			});
		});
	
		const html = await page.content();
		await browser.close();
	
		const $ = cheerio.load(html);
	
		const productList = [];
	
		$(".dOpyUp").each((_, el) => {
			const link = $(el).find("a.csOImU").attr('href');
			const title = $(el).find('a.csOImU').text();
			const brand = $(el).find('a.cnZxgy').text();
			const currentPrice = $(el).find("span[data-test='current-price'] span").text();
			const regularPrice = $(el).find("span[data-test='comparison-price'] span").text();
			const reviews = $(el).find(".hMtWwx").text();
	
			const { rating, reviewCount } = parseProductReview(reviews);
	
			productList.push({
				title,
				brand,
				currentPrice,
				regularPrice,
				averageRating: rating,
				totalReviews: reviewCount,
				link,
			});
		});
	
		console.log(`Number of items extracted:`, productList.length);
	
		await exportProductsInExcelFile(productList);
	
	console.log('The products have been successfully exported to the Excel file!');
	};
	
	void webScraper();

Run the code with the command node index.js and check out the result:

Results from running node index.js code

Enhancing the Web Scraper

The web scraper works for a single URL and a single search term; to build a relevant dataset, you will need to:

  • Scrape more products to have a large variety
  • Periodically scrape the data to keep them up to date with the market

You must update the web scraper to handle many URLs at a time, but this comes with many challenges, because e-commerce websites such as Target.com use anti-scraping methods to prevent intensive web scraping.

For example, sending multiple requests per second from the same IP address can get you banned by the server.

ScraperAPI’s proxy mode is the right tool to overcome these challenges: it gives you access to a pool of high-quality proxies, prunes burned proxies periodically, rotates proxies automatically, handles CAPTCHAs, sets the proper headers, and deals with many other roadblocks for you.

Need More than 3M API Credits?

Talk to our experts and let’s build the perfect solution for your project together.

Using ScraperAPI’s Proxy Mode with Puppeteer

Our web scraper is built with Puppeteer and integrates with ScraperAPI’s proxy mode with very little effort.

To connect to ScraperAPI’s servers through Puppeteer, we need the proxy server, port, username, and password. You can find this information on your dashboard.

Showing where to find proxy information in ScraperAPI dashboard

Update the index.js file by adding the code below:

<pre class="wp-block-syntaxhighlighter-code">
	const TARGET_DOT_COM_PAGE_URL = 'https://www.target.com/s?searchTerm=headphones&tref=typeahead%7Cterm%7Cheadphones%7C%7C%7Chistory';
	const PROXY_USERNAME = 'scraperapi';
	const PROXY_PASSWORD = '<API_KEY>'; // <-- enter your API key here
	const PROXY_SERVER = 'proxy-server.scraperapi.com';
	const PROXY_SERVER_PORT = '8001';
	
	const webScraper = async () => {
		const browser = await puppeteer.launch({
				ignoreHTTPSErrors: true,
				args: [
				`--proxy-server=http://${PROXY_SERVER}:${PROXY_SERVER_PORT}`
				]
		});
		const page = await browser.newPage();
	
		await page.authenticate({
				username: PROXY_USERNAME,
				password: PROXY_PASSWORD,
		});
	
		await page.setExtraHTTPHeaders({
				"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13.5; rv:109.0) Gecko/20100101 Firefox/117.0",
		});
	
		await page.goto(TARGET_DOT_COM_PAGE_URL, { timeout: 60000 });
	
		// The rest of the code here
	};
</pre>

Run the command node index.js.

You will still get the same result as before, but now, each instance you create will be seen as a new user navigating to the page.

Here’s a full guide on why and how to use proxy mode with Puppeteer.

Wrapping Up

To summarize, here are the steps to build a Target.com product scraper:

  • Use Puppeteer to launch a browser instance, interact with the page, and download the rendered HTML content.
  • Integrate ScraperAPI Proxy Mode into our scraper to avoid getting blocked
  • Parse the HTML with Cheerio to extract the data based on DOM selectors
  • Store the data extracted in an Excel file for further usage, such as data analysis

To go further with this web scraper and enhance its capabilities, here are some ideas (a starting sketch for the first two follows the list):

  • Scrape all the pages for a product result search
  • Create different versions of this script to target different products
  • Store scraped data for each product in a different Excel worksheet
  • Use DataPipeline to monitor over 10,000 URLs (per project) with no code
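
As a starting point for the first two ideas, here is a minimal sketch that loops over several search terms and runs the scraper once per term. It assumes webScraper has been refactored to accept a search URL and an output file name (a hypothetical signature, not part of the script above) and pauses between runs to stay polite:

	// Minimal sketch: scrape several search terms one after another.
	// Assumes a refactored webScraper(pageUrl, exportFilename) -- hypothetical parameters.
	const SEARCH_TERMS = ['headphones', 'bluetooth speakers', 'soundbars'];
	
	const runAll = async () => {
		for (const term of SEARCH_TERMS) {
			const pageUrl = `https://www.target.com/s?searchTerm=${encodeURIComponent(term)}`;
			const exportFilename = `${term.replace(/\s+/g, '-')}.xlsx`;
	
			await webScraper(pageUrl, exportFilename);
			await waitFor(5000); // small pause between searches
		}
	};
	
	void runAll();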

Check out ScraperAPI’s documentation for Node.js to learn the ins and outs of the tool. For easy access, here’s this project’s GitHub repository.

Until next time, happy scraping!

Frequently Asked Questions

Why scrape Target.com product data?

Target.com has a huge catalog of products and millions of users visit the website, so the data collected from its product listings can be valuable for market research purposes.

Analyzing product prices, trends, and customer reviews can offer insights into consumer behavior, product popularity, and competitive positioning.

Does Target.com allow web scraping?

Like many other e-commerce websites, Target.com implements measures to deter or block web scraping activities. Intensive web scraping can overload the website’s servers, leading to a degraded user experience and lost revenue for the company.

Some of the anti-scraping methods in place are rate limiting, CAPTCHAs, IP address bans, and dynamic content loading.

What can you do with scraped Target.com data?

Once you have scraped Target.com at scale, you will have collected data on thousands of products; here are a few things you can do with it:

  • Monitor competitors’ prices to adjust your product prices appropriately.
  • Retrieve customers’ reviews about a product to make a business decision.
  • Compare competitors’ product features to build a winning product offer.
  • Analyze the market data to find trending products.
  • Evaluate competitors’ listings to adjust your marketing strategy.

About the author

Eric Cabrel Tiogo

Eric Cabrel Tiogo is a software developer and tech mentor specializing in Java, Node.js, Typescript, GraphQL, AWS, Docker, and React. He runs a popular backend-focused blog, a developer newsletter, and actively contributes to OSS Cameroon, an open-source community. He's known for his CI/CD expertise with GitHub Actions and GitLab CI. Connect with him on Twitter and explore his work at blog.tericcabrel.com.
