How to Scrape Yelp with Python in 2025

Want to learn how to scrape Yelp for business data and reviews? Scraping Yelp allows you to research market trends, identify top-rated businesses, and uncover what customers really think. However, web scraping Yelp comes with challenges, including CAPTCHAs, IP bans, and dynamically loaded content, all of which make extracting information difficult.

In this web scraping tutorial, I’ll guide you through scraping https://www.yelp.com step by step using Python, ScraperAPI, and BeautifulSoup4. You’ll learn how to:

  • Scrape Yelp search results, extracting business names and links.
  • Extract Yelp business details, including business names, star ratings, review counts, and addresses.
  • Scrape Yelp reviews while handling pagination and dynamically loaded content using CSS selectors.
  • Avoid detection by mimicking actual user behavior and working around bot protection.
  • Save Yelp data to a CSV file or JSON dataset for easy analysis.

By the end, you’ll have a fully functional Yelp data scraper capable of gathering insights at scale!

How to Scrape Yelp Using Python: Step-by-Step Tutorial

Scraping Yelp requires the right tools and strategies to bypass anti-scraping protections while extracting valuable business data. 

In this section, we’ll go step by step, starting with scraping search results to gather business names and links. Then, we’ll extract business details like ratings, review counts, and addresses from individual pages. Finally, we’ll tackle scraping customer reviews, handling pagination and dynamically loaded content to collect honest customer feedback.

Ready? Let’s get started!

Project Requirements for Scraping Yelp Data

Before we begin, make sure you have Python installed. Then, install the required Python libraries:

pip install requests beautifulsoup4 pandas

What Do These Libraries Do?

  • requests: Handles HTTP requests to fetch Yelp search results, Yelp reviews, and business pages.
  • BeautifulSoup4: Parses HTML content and extracts relevant business information.
  • pandas: Saves scraped data into a CSV file for structured analysis.
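
If you want to confirm the installation before moving on, a quick sanity check like this works (a minimal sketch; your version numbers will differ):

import requests
import bs4
import pandas as pd

print(requests.__version__)  # e.g., 2.31.0
print(bs4.__version__)       # e.g., 4.12.3
print(pd.__version__)        # e.g., 2.2.2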

Step 1: Set Up ScraperAPI

Yelp blocks scrapers that send too many requests too quickly. Instead of making direct requests, we’ll use ScraperAPI, which rotates proxies, avoids rate limits, and handles JavaScript rendering for dynamic content.

Sign up at ScraperAPI and get your API key. Then, store it in a variable:

API_KEY = "YOUR_API_KEY"
BASE_URL = f"https://api.scraperapi.com/?api_key={API_KEY}&url="

Any request we send to Yelp.com will be routed through ScraperAPI, helping us avoid IP bans and detection. One caveat: because the target URL is appended to ScraperAPI’s own query string, it should be URL-encoded (we’ll use urllib.parse.quote_plus) so Yelp’s query parameters aren’t misread as ScraperAPI parameters.
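
To confirm your key works before building anything on top of it, you can send a quick test request (a minimal sketch; a 200 status code means ScraperAPI fetched the page):

from urllib.parse import quote_plus
import requests

test_url = "https://www.yelp.com/search?find_desc=pizza&find_loc=New+York"
response = requests.get(BASE_URL + quote_plus(test_url))
print(response.status_code)  # 200 means the request went through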

Step 2: Scrape Yelp Search Results

To scrape Yelp’s search results, we must first understand how the search page is structured. Open Yelp in your browser, type in “pizza” as the search term and “New York, NY” as the location, and then hit enter. Right-click to inspect the page and open Developer Tools (F12 → Elements tab).

We can see that each business name appears inside an <h3> tag with the class "y-css-14xwok2". If we expand that <h3>, an <a> tag inside holds the business name and a link to its page. This is exactly what we need to extract. Note that Yelp’s auto-generated class names change periodically, so verify the current values in your own browser before running the code.

Yelp search results page showing pizza restaurants in New York.

So our plan is simple:

  1. Load Yelp’s search page for a given business type and location.
  2. Find all <h3> tags that store the top 10 business names.
  3. Extract the business name and link from the <a> tag inside.
  4. Return everything in a structured format.

Since we already set up ScraperAPI, we can request Yelp’s search page using the search term and location.

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote_plus

def scrape_yelp_search(query, location):
    """Fetch Yelp search results page."""
    url = f"https://www.yelp.com/search?find_desc={query}&find_loc={location}"
    # URL-encode the target URL so its query string isn't misread as ScraperAPI parameters
    response = requests.get(BASE_URL + quote_plus(url), params={"premium": "true"})

Here’s what’s happening:

  • We format the search URL by inserting the query (e.g., “pizza”) and location (e.g., “New York”).
  • Instead of making a direct request to Yelp, we URL-encode the target and send it through ScraperAPI.
  • We add params={"premium": "true"} to enable JavaScript rendering (necessary for some parts of Yelp’s content).

Once we get the response, we must parse the HTML and locate the business names.

    soup = BeautifulSoup(response.text, "html.parser")
    businesses = soup.find_all("h3", class_="y-css-14xwok2")

This does two things:

  • Loads the HTML into BeautifulSoup, which helps us navigate and extract data easily.
  • Finds all <h3> tags with the class "y-css-14xwok2", which store business names.

Now, let’s extract that information:

    search_results = []
    for biz in businesses:
        name = biz.text.strip()  # Extract business name
        link = biz.find("a")["href"] if biz.find("a") else None  # Extract business link
        full_link = f"https://www.yelp.com{link}" if link else "N/A"  # Convert to full URL
        search_results.append({"name": name, "link": full_link})

Here’s how this works:

  • biz.text.strip() retrieves the text inside the <h3> tag and removes any extra spaces.
  • biz.find("a") locates the first anchor tag, and ["href"] extracts its href attribute, which contains the relative link.
  • Yelp stores links as relative paths (e.g., /biz/pizza-hut-new-york). We concatenate it with https://www.yelp.com to create a full URL.
  • We store each business as a dictionary with "name" and "link", then add them to search_results for structured output (a urljoin-based alternative is sketched below).
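
If you’d rather not hardcode the domain, urljoin from Python’s standard library builds the absolute URL for you, a slightly more robust alternative to string concatenation:

from urllib.parse import urljoin

link = "/biz/pizza-hut-new-york"
print(urljoin("https://www.yelp.com", link))
# https://www.yelp.com/biz/pizza-hut-new-york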

Now, we simply return the extracted data:

    return search_results

And just like that, we’ve successfully scraped business names and links from Yelp’s search page!

Let’s see our scraper in action by searching for pizza places in New York.

results = scrape_yelp_search("pizza", "New York")
print(results)

If everything works correctly, the output should look something like this:

[
    {"name": "Prince Street Pizza", "link": "https://www.yelp.com/biz/prince-street-pizza-new-york-2"},
    {"name": "Joe's Pizza", "link": "https://www.yelp.com/biz/joes-pizza-new-york"},
]

Now we have a list of businesses, and their links are pulled directly from Yelp!

Step 3: Scrape Business Details from Yelp

Now that we have the business names and links from Step 2, we need to extract more details from each business page.

To do this, we first need to understand how Yelp structures its business pages. Clicking on a business link from our search results takes us to a page like this:

Joe's Pizza Yelp page with restaurant details and pizza images.

If we inspect the page using Developer Tools (F12 → Elements tab), we find that:

  • The business name is inside an <h1> tag.
  • The rating is inside a <span> tag with the class "y-css-1jz061g".
  • The review count is inside an <a> tag with the class "y-css-1x1e1r2".
  • The address is inside an <address> tag.

These elements contain the structured data we need to extract.

Fetching the Business Page

Since we know Yelp blocks scrapers, we’ll send our requests through ScraperAPI again to avoid detection.

import time

def scrape_yelp_business(biz_url, max_retries=5):
    """Scrape details from a Yelp business page with retries."""
    for attempt in range(max_retries):
        try:
            response = requests.get(BASE_URL + quote_plus(biz_url), timeout=30, params={"premium": "true"})  # Increased timeout
  • We send a request to Yelp using ScraperAPI.
  • We set a 30-second timeout to handle slow-loading pages.
  • If the request fails, we retry up to five times before giving up.

If Yelp responds successfully, we move on to extracting the details.

Extracting Business Information

Once we have the page content, we use BeautifulSoup to parse the HTML and extract the business details.

            if response.status_code == 200:
                soup = BeautifulSoup(response.text, "html.parser")

                business_data = {
                    "name": soup.find("h1").text.strip() if soup.find("h1") else "N/A",
                    "rating": soup.find("span", class_="y-css-1jz061g").text.strip() if soup.find("span", class_="y-css-1jz061g") else "N/A",
                    "review_count": soup.find("a", class_="y-css-1x1e1r2").text.strip() if soup.find("a", class_="y-css-1x1e1r2") else "N/A",
                    "address": soup.find("address").text.strip() if soup.find("address") else "N/A",
                }
                return business_data
  • We find each element by searching for its corresponding tag and class using soup.find().
  • We strip any extra spaces from the extracted text.
  • To prevent errors, we return "N/A" if an element is missing.
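
Since this find-then-check pattern repeats for every field, you could factor it into a small helper. Here’s a sketch of a hypothetical safe_text function (not part of the code above) that keeps the dictionary readable:

def safe_text(soup, tag, class_=None, default="N/A"):
    """Return the stripped text of the first matching element, or a default."""
    element = soup.find(tag, class_=class_) if class_ else soup.find(tag)
    return element.text.strip() if element else default

# Usage inside scrape_yelp_business:
# business_data = {
#     "name": safe_text(soup, "h1"),
#     "rating": safe_text(soup, "span", "y-css-1jz061g"),
#     "review_count": safe_text(soup, "a", "y-css-1x1e1r2"),
#     "address": safe_text(soup, "address"),
# }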

Web scraping involves sending multiple requests, and sometimes, they fail due to slow responses, network issues, or website restrictions. We need to make sure our scraper doesn’t stop when this happens.

        except requests.exceptions.ReadTimeout:
            print(f"Timeout on attempt {attempt + 1}. Retrying…")
            time.sleep(2 ** attempt)  # Exponential backoff (1s, 2s, 4s, etc.)

        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}")
            return None
  • If the request times out, we retry with an exponentially increasing delay (1s → 2s → 4s).
  • If another request error occurs, we log the error and move on.

Even after multiple retries, some pages will still fail due to bans or broken links. Instead of stopping the process, we log the failure and move on to the next business.

   print("Failed to fetch business page after retries.")
    return None

This ensures that a few failed requests don’t stop us from collecting as much data as possible.

Before scraping multiple businesses, we should test our function on a single page to confirm it works.

biz_info = scrape_yelp_business("https://www.yelp.com/biz/prince-street-pizza-new-york-2")
print(biz_info)

If everything is working correctly, we should see structured data like this:

{
    "name": "Prince Street Pizza",
    "rating": "4.5",
    "review_count": "3,456 reviews",
    "address": "27 Prince St, New York, NY 10012"
}

Now that we know our function works, we can apply it to multiple businesses from our search results.

import random
import time

results = scrape_yelp_search("pizza", "New York")

detailed_results = []
for biz in results[2:]:
    print(f"Scraping: {biz['name']}")  
    business_info = scrape_yelp_business(biz["link"])

    if business_info:
        detailed_results.append(business_info)

    sleep_time = random.uniform(3, 7)  # Random delay between requests
    print(f"Sleeping for {sleep_time:.2f} seconds before next request...")
    time.sleep(sleep_time)

Code Breakdown:

  • We loop through the businesses from our search results, skipping the first two entries, which are usually sponsored results.
  • We call scrape_yelp_business() for each one.
  • We store the results in detailed_results.
  • We add a random delay (3-7 seconds) between requests to avoid detection.

After scraping, we need to store the data in a structured format:

import pandas as pd

def save_to_csv(data, filename="yelp_business_data.csv"):
    """Save scraped data to a CSV file."""
    if not data:
        print("No data to save.")
        return

    df = pd.DataFrame(data)
    df.to_csv(filename, index=False)
    print(f"Data saved to {filename}")

Now, we save the results by running:

save_to_csv(detailed_results)

This generates a CSV file containing business names, ratings, review counts, and addresses.
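
If you prefer a JSON dataset instead of (or alongside) the CSV, a near-identical helper works (a minimal sketch using Python’s built-in json module):

import json

def save_to_json(data, filename="yelp_business_data.json"):
    """Save scraped data to a JSON file."""
    if not data:
        print("No data to save.")
        return

    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    print(f"Data saved to {filename}")

save_to_json(detailed_results)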

Step 4: Scrape Yelp Reviews

Now that we can extract business details, let’s move on to scraping Yelp reviews. Yelp reviews contain direct customer feedback, making them valuable for sentiment analysis, competitor research, and understanding customer expectations.

However, scraping reviews is more complex than extracting business details because:

  1. Reviews are dynamically loaded – Yelp uses JavaScript to load more reviews as the user scrolls.
  2. Pagination is required – Yelp displays reviews across multiple pages, accessible through the start parameter (&start=10, &start=20 for subsequent pages).
  3. Anti-scraping protections exist – Making too many requests quickly can trigger blocks.

To work around these challenges, we’ll use ScraperAPI’s Render Instruction Set to ensure JavaScript executes properly and add scrolling and pagination logic to our function.

Understanding Yelp’s Review Structure

Each review on a Yelp business page is stored inside a <li> tag with the class y-css-1sqelp2:

Yelp review page for Joe's Pizza with customer feedback.

Within this element, the actual review text is found inside a <p> tag with the class comment__09f24__D0cxf y-css-1wfz87z:

Detailed Yelp review of Joe's Pizza with customer comment.

Now, we know where all our data is located. Let’s get to scraping!

Fetching the Business Reviews

Since reviews are loaded dynamically, we need to wait for them to appear and then scroll to the bottom of the page to trigger additional reviews. ScraperAPI allows us to define custom instructions for this.

Let’s start by defining our function:

import json
import time
import requests
from bs4 import BeautifulSoup

def scrape_yelp_reviews(biz_url, max_retries=5):
    """Scrape the first few customer reviews from a Yelp business page with retry on error 500."""
    reviews = []
    page_number = 0

Here, we:

  • Initialize an empty list (reviews) to store extracted review texts
  • Set page_number to 0, which we’ll increment as we navigate through review pages
  • Allow up to max_retries attempts in case of failures

Handling Dynamic Review Loading

Since Yelp requires scrolling to load more reviews, we use ScraperAPI’s instruction set to simulate user behavior:

    for attempt in range(max_retries):
        try:
            for i in range(5):  # Scroll multiple times to load reviews
                instruction_set = json.dumps([
                    {"type": "wait_for_selector", "selector": {"type": "css", "value": "li.y-css-1sqelp2"}},
                    {"type": "scroll", "direction": "y", "value": "bottom"},
                    {"type": "wait", "value": 10}  # Wait for new reviews to load
                ])

Here’s what this does:

  • Waits for the first review element to appear (li.y-css-1sqelp2)
  • Scrolls down to trigger more reviews
  • Waits 10 seconds to give Yelp time to load them before scraping

Once we’ve defined how the scraper should behave, we send a request to ScraperAPI:

                headers = {
                    "x-sapi-api_key": API_KEY,
                    "x-sapi-render": "true",
                    "x-sapi-instruction_set": instruction_set
                }

                # Ensure proper pagination formatting ("?" for the first parameter, "&" otherwise)
                separator = "&" if "?" in biz_url else "?"
                paginated_url = f"{biz_url}{separator}start={page_number}" if "start=" not in biz_url else biz_url

                params = {"url": paginated_url, "render": "true"}

                response = requests.get("https://api.scraperapi.com/", params=params, headers=headers)

This does the following:

  • Passes ScraperAPI headers, including:
    • Our API key for authentication.
    • Rendering mode to execute JavaScript
    • The instructions set to scroll and wait for the content to load
  • Formats the URL to handle pagination (start=10, start=20, etc.), using ? or & depending on whether the URL already has a query string.
  • Sends a request to ScraperAPI to fetch the page

Once the page loads, we parse the HTML using BeautifulSoup and locate the reviews:

                if response.status_code == 200:
                    soup = BeautifulSoup(response.text, "html.parser")

                    # Find all review elements
                    review_elements = soup.find_all("li", class_="y-css-1sqelp2")

                    # Extract text from each review
                    for review in review_elements:
                        review_text = review.find("p", class_="comment__09f24__D0cxf y-css-1wfz87z")
                        reviews.append(review_text.text.strip() if review_text else "N/A")

Here’s what this does:

  • Finds all review containers (li.y-css-1sqelp2)
  • Extracts review text from the <p> tag inside each review
  • Adds the extracted text to reviews, handling missing content gracefully

Since Yelp paginates reviews, we need to increment the page number and continue scraping:

                    page_number += 10
                    time.sleep(2)  # Prevent rate limiting

Each page contains 10 reviews, so we increase page_number by 10 before moving to the next request.
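
To make the pagination concrete, here’s what the generated URLs look like across iterations (using the separator logic from above and one of our earlier example businesses):

biz_url = "https://www.yelp.com/biz/prince-street-pizza-new-york-2"
for page_number in range(0, 30, 10):
    separator = "&" if "?" in biz_url else "?"
    print(f"{biz_url}{separator}start={page_number}")
# https://www.yelp.com/biz/prince-street-pizza-new-york-2?start=0
# https://www.yelp.com/biz/prince-street-pizza-new-york-2?start=10
# https://www.yelp.com/biz/prince-street-pizza-new-york-2?start=20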

Handling Errors and Retries

Yelp may return errors or block requests. We handle timeouts, 500 errors, and other request failures like this:

                elif response.status_code == 500:
                    print(f"Error 500 on attempt {attempt + 1}. Retrying in {2 ** attempt} seconds...")
                    time.sleep(2 ** attempt)  # Exponential backoff
                    continue

                else:
                    print(f"HTTP {response.status_code}: {response.text[:500]}")  # Print first 500 characters of error
                    return None

            # All pages fetched successfully, so return the collected reviews
            return reviews

        except requests.exceptions.ReadTimeout:
            print(f"Timeout on attempt {attempt + 1}. Retrying in {2 ** attempt} seconds...")
            time.sleep(2 ** attempt)

        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}")
            return None

This ensures:

  • If the server returns a 500 error, we retry with exponential backoff (1s → 2s → 4s).
  • If the request completely fails, we print the error and move on.
  • If every page loads successfully, we return the collected reviews.

If all attempts fail:

print("Failed to fetch reviews after multiple retries.")
return None

Now, let’s scrape reviews for multiple businesses from Step 2:

detailed_reviews = {}

for biz in results[:10]:  # Limit to top 10 businesses
    print(f"Scraping reviews for: {biz['name']}")
    business_reviews = scrape_yelp_reviews(biz["link"])

    if business_reviews:
        detailed_reviews[biz["name"]] = business_reviews

    time.sleep(3)  # Add delay to avoid being blocked

# Print the extracted reviews
for biz, reviews in detailed_reviews.items():
    print(f"\n{biz} Reviews:")
    for review in reviews:
        print(f"- {review}")

Here, we:

  • Loop through the top 10 businesses.
  • Scrape their reviews using our function.
  • Store results in a dictionary, with business names as keys and their reviews as values.
  • Add a delay between requests to prevent blocking
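
Since detailed_reviews is already a plain dictionary keyed by business name, dumping it to a JSON dataset takes only a couple of lines (a minimal sketch with the standard json module, which we imported earlier):

with open("yelp_reviews.json", "w", encoding="utf-8") as f:
    json.dump(detailed_reviews, f, indent=2, ensure_ascii=False)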

Step 5: Run Your Yelp Scraper

Now that you’ve built functions to scrape search results, business details, and reviews, it’s time to bring everything together. This is where you run the complete scraper, automatically collecting and saving Yelp data.

try:
    # Step 1: Get search results and save to CSV
    print("Scraping search results...")
    search_results = scrape_yelp_search("pizza", "New York")
    save_to_csv(search_results, "search_results.csv")
    print("Search results saved.")

    # Step 2: Get business details for each result and save to CSV
    print("Scraping business details...")
    business_details = []
    for result in search_results:
        biz_data = scrape_yelp_business(result["link"])
        if biz_data:
            business_details.append(biz_data)

    save_to_csv(business_details, "business_details.csv")
    print("Business details saved.")

    # Step 3: Get reviews for each business and save to CSV
    print("Scraping reviews...")
    all_reviews = []
    for result in search_results:
        reviews = scrape_yelp_reviews(result["link"])
        if reviews:
            for review in reviews:
                all_reviews.append({"business": result["name"], "review": review})

    save_to_csv(all_reviews, "reviews.csv")
    print("Reviews saved.")

except Exception as e:
    print(f"An error occurred: {e}")

print("Scraping process completed.")

Code Breakdown:

Now, let’s go through what this part of the script is doing step by step:

  • Scraping Search Results:
    • Calls scrape_yelp_search() to fetch businesses from Yelp’s search page
    • Extracts business names and links
    • Saves everything into search_results.csv, creating a structured dataset for the next steps
  • Extracting Business Details:
    • Loops through each business link in search_results.csv
    • Calls scrape_yelp_business() to visit the business page and extract:
      • Star ratings
      • Review counts
      • Other business details
    • Stores this structured data in business_details.csv for later analysis
  • Scraping Customer Reviews:
    • Iterates through each business and calls scrape_yelp_reviews()
    • Collects multiple reviews per business, handling pagination if needed
    • Stores the extracted reviews in reviews.csv, with each review linked to the business name for easy reference
  • Error Handling:
    • Catches unexpected errors so the script doesn’t crash
    • If a request fails or a business page is unavailable, the script logs the issue and continues running instead of stopping entirely

This script automates the entire scraping process, giving you a complete dataset of business details, customer reviews, and ratings in just a few runs!

Why Scrape Yelp and What You Can Do With the Data

Now that we can extract Yelp business details, reviews, and ratings, the next question is: why does this data matter? Yelp is more than just a review site—it’s a constantly updated database of customer experiences, industry trends, and competitive insights.

Scraping Yelp lets you analyze this information at scale. Instead of reading individual reviews manually, you can track patterns, measure customer sentiment, and make data-driven decisions.

Why Scrape Yelp?

  • Understand the competition: Track top businesses, analyze their strengths, and see what customers like or dislike.
  • Improve business strategy: Reviews highlight common complaints and praises, giving you direct insight into customer expectations.
  • Find new leads: Yelp acts as a directory for businesses across industries. Scraping business listings and reviews allows for targeted outreach.
  • Optimize for SEO and local search: High-ranking Yelp businesses offer clues about what customers search for. Extracting their keywords and descriptions helps refine your strategy.
  • Identify market trends: Analyze review trends over time to see emerging demands, pricing shifts, or changes in customer sentiment.

What Can You Do With This Data?

  • Track the best-performing businesses in a specific industry or location.
  • Analyze customer sentiment by aggregating and categorizing review content.
  • Monitor competitor performance over time and identify areas where they struggle.
  • Discover untapped markets by comparing business density, ratings, and demand in different areas.
  • Improve marketing efforts using genuine customer feedback to craft better messaging.
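
As a taste of what that analysis can look like, here’s a hedged sketch that scores the reviews collected in Step 5 with TextBlob’s polarity metric (TextBlob is just one option; install it with pip install textblob first):

import pandas as pd
from textblob import TextBlob

df = pd.read_csv("reviews.csv")  # produced by the Step 5 script
df["polarity"] = df["review"].astype(str).apply(lambda text: TextBlob(text).sentiment.polarity)

# Average sentiment per business, from -1 (negative) to 1 (positive)
print(df.groupby("business")["polarity"].mean().sort_values(ascending=False))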

Yelp data provides a direct line to customer opinions and industry trends. Whether you’re improving a business, running a market analysis, or optimizing a strategy, scraping Yelp gives you the information you need to make smarter decisions.

Wrapping Up: Best Practices for Scraping Yelp

Scraping Yelp can be challenging, but with the right approach you can extract valuable business data while avoiding blocks and CAPTCHAs. This guide showed you how to fetch Yelp pages, extract key details, handle request failures, and scale the process efficiently.

By using ScraperAPI, you don’t have to worry about IP bans, CAPTCHAs, or dynamically loaded content—those challenges are handled for you. That means you can focus on getting the data you need rather than fighting against Yelp’s restrictions.

To make the most of web scraping Yelp, keep these best practices in mind:

  • Respect website policies: Always review Yelp’s Terms of Service and be aware of any legal considerations.
  • Monitor request success rates: Track failures and adjust your approach if you notice increased blocks.
  • Optimize your parsing logic: Extract only the data you need to improve efficiency and reduce processing time.
  • Control request volume: Space out requests and avoid making too many quickly to stay under the radar.
  • Store data in a structured format: Use CSV, JSON, or a database to keep your scraped data organized and easy to analyze.

By following these strategies, you can build a scalable, reliable Yelp scraper that runs smoothly without interruptions. ScraperAPI takes care of the most complex parts—now it’s up to you to put the data to work.

Happy scraping!

How to Scrape Yelp: Frequently Asked Questions

Is web scraping Yelp legal?

Yes, but scraping Yelp without permission may violate Yelp’s Terms of Service. While web scraping is legal in some cases, bypassing security measures, ignoring robots.txt, or using the data for commercial purposes could lead to legal consequences. To avoid potential risks, review Yelp’s policies and scrape responsibly.

How do I scrape Yelp without getting blocked?

You need to take a strategic approach to scrape Yelp effectively without getting blocked. Using ScraperAPI can help you avoid IP bans. Introducing random delays between requests makes your scraper behave more like a human user. Limiting request volume prevents detection, while rotating user agents and headers makes each request appear unique. Monitoring request failures and adjusting your scraping behavior can also help reduce the risk of being blocked, as sketched below.
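
For instance, the delay-plus-rotating-headers idea can be sketched like this (the user-agent strings are illustrative placeholders; when you route requests through ScraperAPI, header rotation is handled for you):

import random
import time
import requests

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",  # placeholder
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",  # placeholder
]

def polite_get(url):
    """Fetch a URL with a random human-like delay and a rotated user agent."""
    time.sleep(random.uniform(3, 7))
    return requests.get(url, headers={"User-Agent": random.choice(USER_AGENTS)})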

What are the best tools for scraping Yelp?

The best tools for scraping Yelp depend on the data you need. ScraperAPI handles CAPTCHAs, rate limits, and IP bans; requests fetches the page content; and BeautifulSoup parses the HTML for business details and ratings. For JavaScript-heavy pages like reviews, Selenium or Playwright can simulate user interactions, but ScraperAPI’s Render Instruction Set is often the more efficient option, as it automates scrolling, waiting, and pagination without requiring a full browser.

Does Yelp block scrapers?

Yes, Yelp has strong anti-scraping measures in place. It uses rate limiting to block frequent requests from the same IP, serves CAPTCHAs to detect automation, and blacklists IPs that show bot-like behavior. Some content is also loaded dynamically with JavaScript, making it difficult to scrape using simple HTTP requests. To avoid being blocked, using tools like ScraperAPI, limiting request frequency, and randomizing delays between requests can help keep your scraper running.

Are there alternatives to scraping Yelp?

Instead of scraping Yelp, you can use the Yelp API, which provides structured business data with strict usage limits. Other APIs also offer business details and location information without scraping.

About the author

Ize Majebi

Ize Majebi is a Python developer and data enthusiast who delights in unraveling code intricacies and exploring the depths of the data world. She transforms technical challenges into creative solutions, combining a passion for problem-solving with a talent for making the complex feel like a friendly chat, and brings a touch of simplicity to the realms of Python and data.