How To Scrape Walmart Product Reviews at Scale

Tutorial on scraping Walmart product reviews

Product reviews help shoppers make purchase decisions, but they are also valuable to businesses that want to measure customer satisfaction and improve product quality.

With over $80B in sales, Walmart is one of the major ecommerce marketplaces in the world, making it a great source of product reviews.


In this article, we’ll show you how to scrape Walmart product reviews and build a historical dataset to help you make better marketing and business decisions.

To make our scraper scalable enough to handle millions of requests a month, we’ll use ScraperAPI’s Async Scraper to automate:

  • Concurrency management
  • Retry management
  • Anti-bot detection bypassing

Scraping Walmart Product Reviews in Node.js

Building a web scraper for Walmart product reviews at scale requires two scripts:

  1. The first builds the list of product review page URLs and sends them to the Async Scraper service.
  2. The second is a webhook server that receives the results from the Async Scraper, extracts the review data from the raw HTML, and saves it in a JSON file.

Important Update

Now you can scrape Walmart product reviews using ScraperAPI’s Walmart Products endpoint to turn product pages into easy-to-navigate JSON data.


https://api.scraperapi.com/structured/walmart/product

Just send your requests to the endpoint alongside your API key and product ID within a payload. Then, you can target the "reviews" key to access the top reviews for your target product.

	"reviews": [
		{
			"title": "Walmart pickup order",
			"text": "Make sure they scan your item right I received my order thru the delivery but when I finished activating my phone it still says SOS mode only so I'll be making a trip to Walmart sucks too have a locked phone after spending the money for it",
			"author": "TrustedCusto",
			"date_published": "2/8/2024",
			"rating": 5
		},
		{
			"title": "Very satisfied.",
			"text": "I'm very satisfied with my purchase and product. Thank you. Definitely will recommend when I get the chance. Keep up the good work. I appreciate you.",
			"author": "BLVCKSNVCK",
			"date_published": "10/7/2023",
			"rating": 5
		},

[More Data]
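For reference, here is a minimal sketch of calling this endpoint from Node.js 18 or later, which ships a global fetch. It assumes the API key and product ID are passed as the api_key and product_id query parameters; confirm the exact parameter names in ScraperAPI’s documentation. The helper names (buildProductRequestUrl, pickReviews, fetchTopReviews) are hypothetical.

```javascript
// Minimal sketch (Node.js 18+ for the global fetch).
// Assumption: the endpoint reads `api_key` and `product_id` query parameters.
const STRUCTURED_WALMART_PRODUCT_URL = 'https://api.scraperapi.com/structured/walmart/product';

// Hypothetical helper: builds the request URL with encoded query parameters
function buildProductRequestUrl(apiKey, productId) {
  const url = new URL(STRUCTURED_WALMART_PRODUCT_URL);
  url.searchParams.set('api_key', apiKey);
  url.searchParams.set('product_id', String(productId));
  return url.toString();
}

// Hypothetical helper: returns the "reviews" array, or [] when it is missing
function pickReviews(productJson) {
  return Array.isArray(productJson?.reviews) ? productJson.reviews : [];
}

// Hypothetical helper: fetches the product and keeps only its top reviews
async function fetchTopReviews(apiKey, productId) {
  const response = await fetch(buildProductRequestUrl(apiKey, productId));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return pickReviews(await response.json());
}

module.exports = { buildProductRequestUrl, pickReviews, fetchTopReviews };
```

For example, fetchTopReviews(process.env.SCRAPERAPI_KEY, '1277532195') (the environment variable name is made up) would resolve to the product’s "reviews" array, or to an empty array if the response has no such key.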



To get started, let’s take a look at how product reviews are organized on the site.

Understanding Walmart Product Reviews Structure

By examining Walmart’s product reviews page, we can identify the following elements to extract:

Structure of a Walmart product review

The picture below describes the system architecture of the scraper we’ll need to build to get this data efficiently and at scale.

System architecture diagram of an asynchronous scraper

Note: To make the system more efficient, the webhook server should not write reviews to the JSON file synchronously. Instead, it should pass them to a message broker so that storage happens separately, which prevents the webhook server from being overwhelmed when many requests arrive at once.

For the sake of simplicity, we will write directly to a JSON file, but feel free to upgrade the implementation to address this caveat.

Prerequisites

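To follow this tutorial, you need Node.js 14.17 or later (the webhook server uses crypto.randomUUID()) and npm installed on your computer, as well as a ScraperAPI account to get an API key.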


Step 1: Set Up the Project

Let’s create the folder that will hold our source code and initialize a new Node.js project:

	mkdir walmart-async-scraper
	cd walmart-async-scraper
	npm init -y

The last command creates a package.json file in the folder.

Step 2: Build the Asynchronous Scraper

The picture below shows a Walmart product review page:

Product review page from Walmart

The interesting part is the pagination section, where the reviews go from page 1 to page 48, and we are currently on page 3.

The URL looks like this: https://www.walmart.com/reviews/product/1277532195?page=3

Following this pattern, we need to generate 48 URLs that differ only by their page parameter.

We’ll send this list of URLs to the Async Scraper, which scrapes them asynchronously.

Behind the scenes, it will also handle most of the challenges related to web scraping at scale, such as IP rotation, CAPTCHA solving, rate limiting, etc.

We will send 48 URLs, but the Async Scraper service can scrape millions of URLs asynchronously.
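When you move from one product to many, you may want to generate the page URLs for every product and split them across several batchjobs requests rather than one huge payload. The sketch below does that with Node’s built-in URL class. The helper names and MAX_URLS_PER_BATCH are hypothetical, and the batch size is a placeholder, not a documented ScraperAPI limit.

```javascript
// Hypothetical helpers for larger runs (names are illustrative).
// MAX_URLS_PER_BATCH is a placeholder, not a documented ScraperAPI limit.
const MAX_URLS_PER_BATCH = 1000;

// Builds one review page URL per page: ...?page=1 to ...?page=pageCount
function buildReviewPageUrls(productId, pageCount) {
  const urls = [];
  for (let page = 1; page <= pageCount; page++) {
    const url = new URL(`https://www.walmart.com/reviews/product/${productId}`);
    url.searchParams.set('page', String(page));
    urls.push(url.toString());
  }
  return urls;
}

// Splits a list into consecutive chunks of at most `size` items
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Usage: each product's page count would come from its pagination section
const products = [{ id: '1277532195', pages: 48 }];
const allUrls = products.flatMap((product) => buildReviewPageUrls(product.id, product.pages));
// Each element of `batches` would become the `urls` field of one batchjobs request
const batches = chunk(allUrls, MAX_URLS_PER_BATCH);

module.exports = { MAX_URLS_PER_BATCH, buildReviewPageUrls, chunk };
```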

To send the request to ScraperAPI’s servers, we’ll use Axios, an HTTP client for Node.js. Let’s install it:

	npm install axios

Create the file run-scraper.js and add the code below:

	const axios = require('axios');

	const apiKey = '<api_key>'; // <-- Enter your API key here
	const apiUrl = 'https://async.scraperapi.com/batchjobs';
	const callbackUrl = '<webhook_url>'; // <-- Enter your webhook URL here

	const runScraper = () => {
		const PAGE_URL = 'https://www.walmart.com/reviews/product/1277532195';
		const PAGE_NUMBER = 5;
		const pageURLs = [];

		// One URL per review page: ?page=1 to ?page=PAGE_NUMBER
		for (let i = 1; i <= PAGE_NUMBER; i++) {
			pageURLs.push(`${PAGE_URL}?page=${i}`);
		}

		// A single batch job for all URLs; each result is sent to callbackUrl
		const requestData = {
			apiKey: apiKey,
			urls: pageURLs,
			callback: {
				type: 'webhook',
				url: callbackUrl,
			},
		};

		axios.post(apiUrl, requestData)
			.then(response => {
				// The list of jobs created for this batch
				console.log(response.data);
			})
			.catch(error => {
				console.error(error);
			});
	};

	void runScraper();

The variable callbackUrl stores the URL of the webhook that receives the results. We limited the number of pages to 5 for now; set PAGE_NUMBER to 48 to scrape every review page.

I used the online webhook service webhook.site to generate the callback URL.

Note: Remember to add your API key. You can find it in your ScraperAPI dashboard.

Run the command node run-scraper.js to launch the service. You will get the following response.

Walmart response printed to the terminal

The API returns an array of five jobs, one job per product review page URL.
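Since the results arrive later and one by one, it can help to record which job belongs to which URL, for instance to spot pages whose webhook call never arrived. This sketch assumes each job object in the response has id and url properties (the webhook server in Step 5 reads the same two fields from req.body); check the response printed in your terminal before relying on it. indexJobsById and findPendingUrls are hypothetical helpers.

```javascript
// Hypothetical helpers; assumes each job object has `id` and `url` properties.
// Maps each job id to the page URL it was created for.
function indexJobsById(jobs) {
  const byId = new Map();
  for (const job of jobs) {
    if (job && job.id && job.url) {
      byId.set(job.id, job.url);
    }
  }
  return byId;
}

// Lists the submitted URLs whose job id has not reached the webhook yet
function findPendingUrls(byId, receivedIds) {
  const received = new Set(receivedIds);
  return [...byId.entries()]
    .filter(([id]) => !received.has(id))
    .map(([, url]) => url);
}

module.exports = { indexJobsById, findPendingUrls };
```

In run-scraper.js, indexJobsById(response.data) could be called inside the then() callback, and each webhook call’s req.body.id added to the list of received ids.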

After a few seconds, open your webhook.site page; you should see five incoming API calls.

API calls received by the webhook server

Now that we’ve confirmed the webhook is triggered, let’s build our own webhook server to receive the job results, extract the review data, and store it in a JSON file.

Step 3: Write Utility Functions to Manipulate the JSON File

We need three utility functions:

  • Create the product reviews JSON file if it doesn’t exist
  • Update the JSON file with new reviews
  • Extract the product ID from the product review URL

Create a file utils.js and add the code below:

	const fs = require("fs");
	const path = require("path");

	// Returns the last non-empty path segment, e.g. "1277532195" for
	// https://www.walmart.com/reviews/product/1277532195?page=3
	const extractProductIdFromURL = (url) => {
		const parsedUrl = new URL(url);

		// Drop empty segments so a trailing slash doesn't return ""
		const pathnameParts = parsedUrl.pathname.split("/").filter(Boolean);

		if (pathnameParts.length === 0) {
			return url;
		}

		return pathnameParts[pathnameParts.length - 1];
	};

	// Creates the storage file with an empty JSON array if it doesn't exist yet
	const createStorageFile = (filename) => {
		const filePath = path.resolve(__dirname, filename);

		if (fs.existsSync(filePath)) {
			return;
		}

		fs.writeFileSync(filePath, JSON.stringify([], null, 2), { encoding: "utf-8" });
	};

	// Appends new items to the JSON array stored in the file
	const saveDataInFile = (filename, items) => {
		// TODO perform fields validation in data

		const filePath = path.resolve(__dirname, filename);
		const fileContent = fs.readFileSync(filePath, { encoding: "utf-8" });
		const dataParsed = JSON.parse(fileContent);

		const dataUpdated = dataParsed.concat(items);

		fs.writeFileSync(filePath, JSON.stringify(dataUpdated, null, 2), { encoding: "utf-8" });
	};

	module.exports = {
		createStorageFile,
		extractProductIdFromURL,
		saveDataInFile,
	};
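The TODO in saveDataInFile() leaves field validation open. One possible approach, sketched below with hypothetical helper names, is to drop reviews whose rating could not be parsed before saving them. The field names match the review objects built by the webhook server in Step 5, and the 1 to 5 range reflects Walmart’s "out of 5 stars" ratings.

```javascript
// Hypothetical validation for the TODO in saveDataInFile().
// Field names follow the review objects built in Step 5.
function isValidReview(review) {
  return (
    review !== null &&
    typeof review === 'object' &&
    typeof review.id === 'string' &&
    typeof review.productId === 'string' &&
    typeof review.description === 'string' &&
    // Ratings are "out of 5 stars"; 0 or NaN means parsing failed
    Number.isFinite(review.rating) &&
    review.rating >= 1 &&
    review.rating <= 5
  );
}

// Keeps the valid reviews and reports how many were dropped
function filterValidReviews(items) {
  const valid = items.filter(isValidReview);
  return { valid, rejected: items.length - valid.length };
}

module.exports = { isValidReview, filterValidReviews };
```

In saveDataInFile(), you could then concatenate filterValidReviews(items).valid instead of items.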

Step 4: Identify the Information to Retrieve on the Walmart Product Review Page

To extract the review information, we must identify the DOM selector that targets each piece of data. The picture below shows where each piece of information is located in the DOM.

CSS selectors targeting Walmart product review elements

Here are the DOM selectors for each piece of review information:

  • Review’s title: ul .w_HmLO div:nth-child(2) h3
  • Review’s description: ul .w_HmLO div:nth-child(2) div + span
  • Name of the reviewer: ul .w_HmLO div:nth-child(3) > div > div:first-child
  • Date of creation: ul .w_HmLO div:first-child div:nth-child(2) > div.f7
  • Rating: ul .w_HmLO span.w_iUH7
  • Upvote count: ul .w_HmLO div:last-child button:first-child span
  • Downvote count: ul .w_HmLO div:last-child button:last-child span
  • Incentivized review: ul .w_HmLO div:nth-child(3) > div > div:first-child + div

To extract the information above, we’ll use Cheerio, which allows us to parse the raw HTML and traverse the DOM using CSS selectors. Let’s install it:

	npm install cheerio
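Class names such as w_HmLO look auto-generated and are likely to change over time. One way to limit the impact, sketched below, is to keep the selectors listed above in a single object and apply them through a small function. REVIEW_SELECTORS and extractFields are hypothetical names. The text lookup is passed in as a function so the logic can be tried without Cheerio; with Cheerio it would be (selector) => $(el).find(selector).text(). Unlike the Step 5 code, this version trims whitespace.

```javascript
// Hypothetical refactoring: the review selectors gathered in one place.
// Each selector is relative to a "ul .w_HmLO" review element.
const REVIEW_SELECTORS = {
  title: 'div:nth-child(2) h3',
  description: 'div:nth-child(2) div + span',
  reviewer: 'div:nth-child(3) > div > div:first-child',
  creationDate: 'div:first-child div:nth-child(2) > div.f7',
  rating: 'span.w_iUH7',
  upVoteCount: 'div:last-child button:first-child span',
  downVoteCount: 'div:last-child button:last-child span',
  incentivizedReview: 'div:nth-child(3) > div > div:first-child + div',
};

// getText(selector) must return the text under the current review element.
// With Cheerio: (selector) => $(el).find(selector).text()
function extractFields(getText, selectors = REVIEW_SELECTORS) {
  const fields = {};
  for (const [name, selector] of Object.entries(selectors)) {
    // Trim surrounding whitespace (the Step 5 code keeps it as-is)
    fields[name] = getText(selector).trim();
  }
  return fields;
}

module.exports = { REVIEW_SELECTORS, extractFields };
```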

Step 5: Build the Webhook Server

This application runs a web server, exposing the endpoint that the Async Scraper will trigger. To set up the Node.js Web server, we will use Express, so let’s install it:

	npm install express

Create a file webhook-server.js and add the code below:

	const crypto = require('crypto');
	const cheerio = require('cheerio');
	const express = require('express');
	const { createStorageFile, extractProductIdFromURL, saveDataInFile } = require('./utils');

	const PORT = 5001;
	const STORAGE_FILENAME = 'products-reviews.json';

	const app = express();

	app.use(express.urlencoded({ extended: true }));
	// Scraped pages are large, so raise the default JSON body size limit
	app.use(express.json({ limit: "10mb" }));

	// Route called by the Async Scraper each time a job finishes
	app.post('/product-review', async (req, res) => {
		console.log('New request received!', req.body.id);

		// Only process callbacks that contain the scraped HTML
		if (req.body.response?.body) {
			console.log("Extract review information!");

			const $ = cheerio.load(req.body.response.body);

			const productId = extractProductIdFromURL(req.body.url);
			const currentDate = new Date();
			const reviewsList = [];

			// Each "ul .w_HmLO" element holds one review
			$("ul .w_HmLO").each((_, el) => {
				const rating = $(el).find('span.w_iUH7').text();
				const creationDate = $(el).find('div:first-child div:nth-child(2) > div.f7').text();
				const title = $(el).find('div:nth-child(2) h3').text();
				const description = $(el).find('div:nth-child(2) div + span').text();
				const reviewer = $(el).find('div:nth-child(3) > div > div:first-child').text();
				const incentivizedReview = $(el).find('div:nth-child(3) > div > div:first-child + div').text();
				const upVoteCount = $(el).find('div:last-child button:first-child span').text();
				const downVoteCount = $(el).find('div:last-child button:last-child span').text();

				const review = {
					id: crypto.randomUUID(),
					productId,
					title: title.length > 0 ? title : null,
					description,
					// "5 out of 5 stars review" -> 5
					rating: +rating.replace(' out of 5 stars review', ''),
					reviewer,
					upVoteCount: upVoteCount.length > 0 ? parseInt(upVoteCount, 10) : 0,
					downVoteCount: downVoteCount.length > 0 ? parseInt(downVoteCount, 10) : 0,
					isIncentivized: incentivizedReview.toLowerCase() === "incentivized review",
					creationDate,
					// Scrape date in M/D/YYYY format
					date: `${currentDate.getMonth() + 1}/${currentDate.getDate()}/${currentDate.getFullYear()}`
				};

				reviewsList.push(review);
			});

			saveDataInFile(STORAGE_FILENAME, reviewsList);

			console.log(`${reviewsList.length} review(s) added in the database successfully!`);

			return res.json({ data: reviewsList });
		}

		return res.json({ data: {} });
	});

	app.listen(PORT, async () => {
		// Make sure the storage file exists before any callback arrives
		createStorageFile(STORAGE_FILENAME);

		console.log(`Application started on URL http://localhost:${PORT} 🎉`);
	});

When the application starts, it creates an empty JSON file if one doesn’t already exist.

When the server receives a POST request on the route /product-review and the request body contains the scraped page, this is what happens:

  1. Cheerio loads the HTML content of the scraped page.
  2. For each product review, the information is extracted from the HTML content and added to an array.
  3. Once all the reviews are extracted, the array is saved in the JSON file by calling the function saveDataInFile().
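The conversions in the handler fail silently: if the rating label is empty, +rating.replace(...) yields 0, and if Walmart changes its wording, it yields NaN. The hypothetical helpers below parse the same values more defensively and return null when the rating cannot be read. The input formats are inferred from the Step 5 code (for example, "5 out of 5 stars review"). toIsoDay is an optional alternative to the M/D/YYYY date string that sorts correctly as text; note that it uses UTC rather than local time.

```javascript
// Hypothetical parsing helpers; formats inferred from the Step 5 code.
// "5 out of 5 stars review" -> 5, unreadable text -> null
function parseRating(text) {
  const match = /(\d+(?:\.\d+)?)\s+out of\s+5/i.exec(String(text));
  return match ? Number(match[1]) : null;
}

// Keeps digits only ("1,234" -> 1234); an empty label counts as 0 votes
function parseCount(text) {
  const digits = String(text).replace(/\D/g, '');
  return digits.length > 0 ? parseInt(digits, 10) : 0;
}

// Scrape day as YYYY-MM-DD in UTC, which sorts correctly as a string
function toIsoDay(date) {
  return date.toISOString().slice(0, 10);
}

module.exports = { parseRating, parseCount, toIsoDay };
```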

Step 6: Test the Implementation

Launch the Webhook server with the command below:

	node webhook-server.js

The application will start on port 5001.
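Before exposing the server, you can check the handler locally by posting a hand-made payload to it. The sketch below only mimics the three fields the /product-review route reads (id, url, and response.body); it is not the full Async Scraper callback format. It needs Node.js 18 or later for the global fetch, buildFakeCallback and sendFakeCallback are hypothetical names, and any reviews it extracts are appended to products-reviews.json.

```javascript
// Hypothetical local smoke test; requires Node.js 18+ for the global fetch.
// Mimics only the fields the /product-review handler reads, not the full
// Async Scraper callback format.
function buildFakeCallback(id, url, html) {
  return { id, url, response: { body: html } };
}

// POSTs the payload as JSON and returns the parsed JSON reply
async function sendFakeCallback(endpoint, payload) {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    throw new Error(`Webhook server answered with status ${res.status}`);
  }
  return res.json();
}

module.exports = { buildFakeCallback, sendFakeCallback };

// Example, with webhook-server.js running and a review page saved from your
// browser as review-page.html (a hypothetical file name):
// const html = require('fs').readFileSync('review-page.html', 'utf-8');
// sendFakeCallback('http://localhost:5001/product-review',
//   buildFakeCallback('local-test', 'https://www.walmart.com/reviews/product/1277532195?page=1', html))
//   .then((result) => console.log(`${result.data.length} review(s) extracted`));
```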

To make it accessible through the internet so that the Async Scraper can call it, we will use a tunneling service like Ngrok.

Run the commands below to install Ngrok and create a tunnel to port 5001:

	npm install -g ngrok
	ngrok http 5001

  1. Copy the Ngrok forwarding URL
  2. Open the file run-scraper.js
  3. Set the variable callbackUrl to the Ngrok URL
  4. Lastly, append the route /product-review to it
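If your Ngrok URL changes between sessions, editing run-scraper.js each time gets tedious. A hypothetical alternative is to build callbackUrl from an environment variable. CALLBACK_BASE_URL, buildCallbackUrl, and callbackUrlFromEnv are made-up names, and the URL in the comment is a placeholder.

```javascript
// Hypothetical helpers; CALLBACK_BASE_URL is a made-up variable name.
// Joins the tunnel URL and the webhook route, with or without a trailing slash.
function buildCallbackUrl(tunnelUrl, route = '/product-review') {
  return new URL(route, tunnelUrl).toString();
}

// Reads the tunnel URL from the environment, e.g. (placeholder URL):
// CALLBACK_BASE_URL=https://abc123.ngrok-free.app node run-scraper.js
function callbackUrlFromEnv(env = process.env) {
  if (!env.CALLBACK_BASE_URL) {
    throw new Error('Set CALLBACK_BASE_URL to your Ngrok forwarding URL');
  }
  return buildCallbackUrl(env.CALLBACK_BASE_URL);
}

module.exports = { buildCallbackUrl, callbackUrlFromEnv };
```

In run-scraper.js, const callbackUrl = callbackUrlFromEnv(); would then replace the hard-coded value.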

Once that’s done, run the command below to start the product review scraper:

	node run-scraper.js

Wait a few seconds and open the generated products-reviews.json file; you will see one hundred entries, representing the first hundred reviews scraped for the product.

JSON file generated after running the Walmart review scraper

Note: The function saveDataInFile() is not built for concurrent use at scale: it rereads and rewrites the entire file on every webhook call, and several server instances writing to the same file can produce inconsistent data. Handling this wasn’t relevant to this tutorial, but remember to rewrite this function before scaling up. As suggested earlier, using a message broker is a good way to improve this implementation.
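As written, saveDataInFile() uses synchronous fs calls, so within a single process its read and write never interleave, but they block the event loop. If you switch to the non-blocking fs/promises API, concurrent webhook calls could interleave their read-modify-write cycles and overwrite each other’s reviews. The sketch below chains every write onto a single promise so they run one at a time. It is an in-process stand-in, not a message broker, and it does not protect a file shared by several server instances; createSerializedSaver is a hypothetical name.

```javascript
const fs = require('fs/promises');

// Hypothetical helper: returns a save function whose calls run one at a time
// (in this process only). Each call appends `items` to the JSON array in filePath.
function createSerializedSaver(filePath) {
  let queue = Promise.resolve();

  return function save(items) {
    const task = queue.then(async () => {
      let existing = [];
      try {
        existing = JSON.parse(await fs.readFile(filePath, 'utf-8'));
      } catch (err) {
        if (err.code !== 'ENOENT') throw err; // missing file: start from []
      }
      const updated = existing.concat(items);
      await fs.writeFile(filePath, JSON.stringify(updated, null, 2), 'utf-8');
      return updated.length;
    });
    // Keep the chain alive even if this task fails, so later saves still run
    queue = task.catch(() => {});
    return task;
  };
}

module.exports = { createSerializedSaver };
```

In the webhook server, you would create the saver once at startup and call it with reviewsList in place of saveDataInFile().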

🎉 Congratulations, you just built your first review scraper!

Wrapping Up

To summarize, here are the steps to build a Walmart product reviews scraper:

  • Prepare a list of URLs to scrape and store them in an array
  • Send the request to the Async Scraper service
  • Create a Webhook server exposing a route to be triggered by the async service
  • Store all the product reviews scraped in a JSON file

The information stored in the JSON file can feed machine learning models for sentiment analysis and other NLP tasks, helping you understand what your target audience likes or dislikes about your competitors’ (and your own) products.

To learn more about the Async Scraper, check out ScraperAPI’s documentation for Node.js. For easy access, here’s this project’s GitHub repository.

Until next time, happy scraping!




Frequently Asked Questions

Analyzing reviews and ratings can help businesses make better decisions. Thanks to Walmart’s large customer base, businesses can use this data for:

  • Competitive Analysis – reviews and ratings on a product can help you understand its strengths and weaknesses, allowing you to adapt your strategy to serve your customers better.
  • Quality Improvements – reviews can reveal where a product needs improvement, suggest new feature ideas, etc.
  • Customer Satisfaction – measuring customer satisfaction is critical to prevent customer churn. Keeping an eye on negative trends will help you act proactively and improve word of mouth.

Like many other ecommerce websites, Walmart implements measures to deter or block web scraping. Intensive web scraping can overload the website’s servers, which degrades the user experience and costs the company money.

If you scrape with good intentions, ScraperAPI can help you bypass these anti-scraping mechanisms without harming the servers or hurting your scrapers’ efficiency.


After scraping thousands to millions of reviews, here are some methods to analyze this data:

  • Sentiment Analysis – you can use sentiment analysis tools to assess the overall sentiment expressed in the reviews about the product and categorize feedback as positive, negative, or neutral.
  • Text Mining and Natural Language Processing (NLP) – text mining and NLP techniques extract meaningful information from reviews, such as key phrases, sentiments, and topics discussed in customer feedback.
  • Word Clouds – word clouds visualize the most frequently used words in reviews. This can highlight common themes or issues and provide a quick overview of what to focus on.
  • Rating Distribution Analysis – examining the distribution of product ratings helps businesses understand the overall satisfaction level of customers. Counting how many reviews fall into each rating category provides a quantitative view.
  • Time-Series Analysis – historical data on a product’s reviews allows you to identify trends and changes in customer sentiment. It is helpful for monitoring the impact of product updates, marketing campaigns, or changes in market conditions.
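As a starting point for rating distribution analysis, the sketch below counts how many saved reviews fall into each star rating and computes the average. It assumes the records have the numeric rating field produced by the tutorial’s webhook server and skips anything that is not a whole number from 1 to 5; ratingDistribution is a hypothetical name.

```javascript
// Minimal sketch: rating distribution of the reviews in products-reviews.json.
// Assumes each record has a numeric `rating`; invalid ratings are skipped.
function ratingDistribution(reviews) {
  const counts = { 1: 0, 2: 0, 3: 0, 4: 0, 5: 0 };
  let sum = 0;
  let rated = 0;
  for (const { rating } of reviews) {
    if (Number.isInteger(rating) && rating >= 1 && rating <= 5) {
      counts[rating] += 1;
      sum += rating;
      rated += 1;
    }
  }
  return { counts, total: rated, average: rated > 0 ? sum / rated : null };
}

module.exports = { ratingDistribution };

// Example:
// const reviews = JSON.parse(require('fs').readFileSync('products-reviews.json', 'utf-8'));
// console.log(ratingDistribution(reviews));
```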


About the author

Eric Cabrel Tiogo


Eric Cabrel Tiogo is a software developer and tech mentor specializing in Java, Node.js, Typescript, GraphQL, AWS, Docker, and React. He runs a popular backend-focused blog, a developer newsletter, and actively contributes to OSS Cameroon, an open-source community. He's known for his CI/CD expertise with GitHub Actions and GitLab CI. Connect with him on Twitter and explore his work at blog.tericcabrel.com.
