As websites deploy smarter detection systems and more aggressive blocking, scraping is becoming increasingly difficult. While developers chase complex solutions like residential proxies and CAPTCHA solvers, they often overlook one of the most powerful tools: the `user agent` string.
In this guide, you’ll learn:
- What user agents are
- How sites use them to detect bots
- How to build and rotate a clean user agent list
- How tools like ScraperAPI make the entire process effortless
Ready to stop getting blocked and start scraping successfully? Let’s dive in!
What Is a User Agent String?
Think of a user agent string as your browser’s business card. Every time you visit a website, your browser introduces itself with a line of text that says “Hi, I’m Chrome running on Windows” or “I’m Safari on an iPhone.” This introduction happens behind the scenes in every single web request.
What it is
A User Agent (UA) string is a line of text included in HTTP headers that identifies the software making the request. It tells websites what browser you’re using, what version it is, what operating system you’re running, and sometimes even what device you’re on. Here’s what a typical Chrome user agent looks like:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Breaking this down:
- Mozilla/5.0: Legacy identifier (all modern browsers include this)
- Windows NT 10.0; Win64; x64: Operating system and architecture
- AppleWebKit/537.36: Browser engine version
- Chrome/120.0.0.0: Browser name and version
- Safari/537.36: Additional engine compatibility info
Where it’s found
The user agent string lives in the HTTP headers of every request you make, specifically under the `User-Agent` header. Here’s what a basic HTTP request looks like:
GET / HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
Accept: text/html, application/xhtml+xml
Would you like to view your user agent string? Check it out at whatismybrowser.com or test HTTP requests at httpbin.org/user-agent.
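You can also check what your Python client sends by default. Here’s a quick sketch using the httpbin.org/user-agent endpoint mentioned above; with no custom headers, `requests` reveals its default `python-requests/<version>` identity:

import requests

# httpbin echoes back the User-Agent header it received.
# With no custom headers, requests identifies itself as python-requests/<version>.
response = requests.get("https://httpbin.org/user-agent")
print(response.json()["user-agent"])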
What it’s used for
Websites use user agent strings for several purposes:
- Content optimization: When you visit YouTube on your phone, it automatically serves you the mobile version because it reads your user agent and sees you’re on a mobile device. Desktop users get the full desktop layout.
- Analytics and tracking: Sites track which browsers and devices their visitors use. This helps them decide whether to support older browser versions or focus on mobile optimization.
- Bot detection and security: This is where web scraping gets interesting. Websites analyze user agent patterns to spot automated traffic. A request from `python-requests/2.28.1` screams “I’m a bot!”, while a Chrome user agent blends in with normal traffic.
- Feature compatibility: Sites might serve different JavaScript or CSS based on browser capabilities. Internet Explorer gets simpler code, while modern browsers get advanced features.
Why User Agents Matter for Web Scraping
When you’re scraping, your user agent string is often the first thing that gives you away. Servers use these strings as a primary method to distinguish between real human visitors and automated bots, and getting this wrong can shut down your scraping operation before it even starts.
Here’s why user agents matter for web scraping:
The Bot Detection Problem
Most scraping libraries use obvious user agents by default. Python’s requests library, for example, sends `python-requests/2.28.1` with every request. To a server, this is like walking into a store wearing a sign that says: “I’m a robot.” This results in instant blocks, empty responses, or redirects to CAPTCHA pages.
Here’s what commonly happens when servers detect suspicious user agents:
- Immediate blocking: Your IP gets banned after just a few requests.
- Empty responses: The server returns blank pages or error messages.
- Fake content: You receive dummy data instead of real information.
- CAPTCHA challenges: Every request gets redirected to human verification.
- Rate limiting: Your requests get severely throttled or queued.
Detection Systems Are Getting Smarter
Modern anti-bot systems don’t just look at your user agent, they analyze it alongside other request patterns. They check for:
- Consistency: Does your user agent match your Accept headers and other browser fingerprints?
- Frequency: Are you making requests too fast for a human?
- Behavior: Do you visit robots.txt, load images, or follow redirects like a real browser?
The Client Hints Evolution
Here’s where things get more complex. Modern browsers are transitioning from a single User-Agent string to a set of headers called Client Hints. These provide more detailed information about the client while improving privacy by reducing fingerprinting opportunities.
Instead of cramming everything into one user agent string, browsers now send separate headers:
Sec-CH-UA: "Chromium";v="120", "Google Chrome";v="120", "Not A Brand";v="24"
Sec-CH-UA-Mobile: ?0
Sec-CH-UA-Platform: "Windows"
Why This Matters for Scrapers
Detection systems now validate whether your complete browser profile makes sense. A major red flag is a mismatch between your main User-Agent string and these Client Hints headers. For example, claiming to be Chrome 120 on Windows in your user agent while sending Client Hints that say you’re on mobile Safari will get you blocked instantly.
This evolution means successful scraping in 2025 requires not just a good user agent string, but a complete, consistent browser fingerprint that includes all the modern headers real browsers send.
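To make this concrete, here’s a rough sketch of a request whose Client Hints agree with its user agent. The header values are examples of a desktop Chrome 120 profile, not an exhaustive fingerprint:

import requests

# A desktop Chrome 120 profile: the Client Hints must agree with the User-Agent.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Sec-CH-UA": '"Chromium";v="120", "Google Chrome";v="120", "Not A Brand";v="24"',
    "Sec-CH-UA-Mobile": "?0",           # desktop, matching the Windows UA above
    "Sec-CH-UA-Platform": '"Windows"',  # platform must match the OS in the UA
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

# Echo the headers back to confirm they form a consistent profile
response = requests.get("https://httpbin.org/headers", headers=headers)
print(response.json()["headers"])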
The Best User Agent List for Web Scraping (Updated for 2025)
Here’s a curated selection of proven user agent strings that work effectively for web scraping in 2025.
1. Chrome User Agents
Chrome dominates the browser market with over 65% market share, making its user agents your safest bet for blending in with normal traffic. They’re updated frequently and widely accepted across all websites.
Chrome on Windows:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 11.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Chrome on macOS:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 14_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Chrome on Linux:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36
2. Firefox User Agents
Firefox represents about 8-10% of browser traffic and offers a good alternative when you want to diversify your user agent rotation. These strings are particularly useful for sites that might be flagging too many Chrome requests.
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:121.0) Gecko/20100101 Firefox/121.0
Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:122.0) Gecko/20100101 Firefox/122.0
3. Safari User Agents
Safari user agents are essential for scraping sites that cater heavily to Mac and iOS users. They’re particularly effective for e-commerce and design-focused websites with many Apple users.
Safari on macOS:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 14_2_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15
Safari on iOS:
Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1
Mozilla/5.0 (iPad; CPU OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1
4. Edge (Chromium-based) User Agents
Modern Edge uses the same engine as Chrome but has a smaller user base, making these user agents perfect when you need something that looks legitimate but isn’t as common as Chrome.
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0
5. Mobile User Agents
With mobile traffic now exceeding desktop, these user agents are crucial for accessing mobile-optimized content and APIs. Many sites serve different data to mobile users, which makes these strings invaluable for comprehensive scraping.
Android Chrome:
Mozilla/5.0 (Linux; Android 14; SM-G998B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36
Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36
Mozilla/5.0 (Linux; Android 14; SM-S918B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Mobile Safari/537.36
iPhone:
Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1
Mozilla/5.0 (iPhone; CPU iPhone OS 17_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Mobile/15E148 Safari/604.1
User Agents to Avoid
These user agent strings will immediately flag you as a bot:
1. Default Scraping Library User Agents:
python-requests/2.28.1
curl/7.68.0
Python-urllib/3.9
Scrapy/2.5.1
2. Headless Browser Identifiers:
HeadlessChrome/120.0.0.0
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36
PhantomJS/2.1.1
3. Malformed or Generic User Agents:
Mozilla/5.0
Browser 1.0
*
(empty user agent)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
For a ready-to-use collection of tested user agent strings, you can find regularly updated lists at user-agents.net or useragentstring.com. For maximum effectiveness, always verify that your chosen user agents match current browser versions and include the appropriate Client Hints headers.
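Before adding strings to your rotation pool, a quick sanity check like the one below can catch the most obvious giveaways listed above. The marker list is an illustrative assumption, not a complete blocklist:

# Substrings that mark a user agent as an obvious bot or automation tool.
# This blocklist is illustrative, not exhaustive.
SUSPICIOUS_MARKERS = ("python-requests", "curl/", "urllib", "scrapy",
                      "headlesschrome", "phantomjs")

def looks_safe(user_agent: str) -> bool:
    """Reject empty, generic, or obviously automated user agent strings."""
    ua = user_agent.strip().lower()
    if not ua or ua in ("mozilla/5.0", "*"):
        return False
    return not any(marker in ua for marker in SUSPICIOUS_MARKERS)

candidates = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "python-requests/2.28.1",
]
print([ua for ua in candidates if looks_safe(ua)])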
How to Set a Custom User Agent in Python
Setting a custom user agent in Python is straightforward, but it’s important to verify that your target server receives the user agent you’re sending. Here are practical examples for the most popular Python libraries, complete with validation.
1. Using Requests Library
Prerequisites:
- Python
- `requests`
Requests is Python’s most popular HTTP library. It is simple, reliable, and perfect for basic web scraping. This example shows how to set a custom user agent and verify it’s working correctly.
import requests

# Define a realistic user agent
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

# Set user agent in headers
headers = {
    'User-Agent': user_agent
}

try:
    # Test with httpbin to see what headers the server receives
    response = requests.get('https://httpbin.org/headers', headers=headers)

    if response.status_code == 200:
        data = response.json()
        received_ua = data['headers'].get('User-Agent', 'Not found')

        print(f"Sent User-Agent: {user_agent}")
        print(f"Received User-Agent: {received_ua}")
        print(f"Match: {user_agent == received_ua}")
    else:
        print(f"Request failed with status code: {response.status_code}")

except requests.RequestException as e:
    print(f"Request error: {e}")
If everything is set up correctly, you should see output like this:
Sent User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Received User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Match: True
2. Using Selenium WebDriver
Prerequisites:
- Python
- `selenium`

Selenium lets you control a real browser programmatically while scraping. Here, the user agent is set through the browser options:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import WebDriverException

# Define user agent
user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

# Configure Chrome options
chrome_options = Options()
chrome_options.add_argument(f"--user-agent={user_agent}")

try:
    # Initialize driver with custom user agent
    driver = webdriver.Chrome(options=chrome_options)

    # Navigate to test endpoint
    driver.get('https://httpbin.org/headers')

    # Get the page source and check if our user agent appears
    page_source = driver.page_source

    # You can also use JavaScript to get the actual user agent
    actual_ua = driver.execute_script("return navigator.userAgent;")

    print(f"Set User-Agent: {user_agent}")
    print(f"Browser User-Agent: {actual_ua}")
    print(f"Match: {user_agent == actual_ua}")

except WebDriverException as e:
    print(f"WebDriver error: {e}")

finally:
    if 'driver' in locals():
        driver.quit()
3. Using HTTPX (Modern Alternative)
Prerequisites:
- Python
- `httpx`

HTTPX is a modern HTTP client with both synchronous and asynchronous support that’s becoming increasingly popular:
import httpx
import asyncio

# Define user agent
user_agent = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1"

# Synchronous version
def test_sync_ua():
    headers = {'User-Agent': user_agent}

    try:
        with httpx.Client() as client:
            response = client.get('https://httpbin.org/headers', headers=headers)

            if response.status_code == 200:
                data = response.json()
                received_ua = data['headers'].get('User-Agent', 'Not found')

                print(f"Sync - Sent: {user_agent}")
                print(f"Sync - Received: {received_ua}")
                print(f"Sync - Match: {user_agent == received_ua}")
            else:
                print(f"Sync request failed: {response.status_code}")

    except httpx.RequestError as e:
        print(f"Sync request error: {e}")

# Asynchronous version
async def test_async_ua():
    headers = {'User-Agent': user_agent}

    try:
        async with httpx.AsyncClient() as client:
            response = await client.get('https://httpbin.org/headers', headers=headers)

            if response.status_code == 200:
                data = response.json()
                received_ua = data['headers'].get('User-Agent', 'Not found')

                print(f"Async - Sent: {user_agent}")
                print(f"Async - Received: {received_ua}")
                print(f"Async - Match: {user_agent == received_ua}")
            else:
                print(f"Async request failed: {response.status_code}")

    except httpx.RequestError as e:
        print(f"Async request error: {e}")

# Run both tests
if __name__ == "__main__":
    test_sync_ua()
    asyncio.run(test_async_ua())
How to Rotate User Agents at Scale
When scraping at scale, using the same user agent for thousands of requests is like wearing the same disguise to rob every bank in town: you’ll get caught. User agent rotation helps distribute your requests across different “browser profiles,” making your traffic appear more natural and harder to detect.
The Simple Approach: Random Selection
The basic method involves maintaining a list of user agents and randomly selecting one for each request:
import random
import requests

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0"
]

def scrape_with_rotation(url):
    # Pick a different user agent for each request
    headers = {'User-Agent': random.choice(user_agents)}
    return requests.get(url, headers=headers)

if __name__ == "__main__":
    # Replace with the page you want to scrape (the URL needs a scheme)
    scrape_with_rotation("https://your-page-to-scrape.com")
While this approach works for basic scraping, it becomes inadequate for larger operations or protected sites.
The Challenge of Manual Rotation
Maintaining user agent rotation manually presents several complex challenges:
- Keeping lists current: Browser versions change monthly, and outdated user agents become red flags
- Matching complementary headers: Each user agent should pair with realistic Accept, Accept-Language, and Accept-Encoding headers (see the profile sketch after this list)
- Avoiding detectable patterns: Over time, random selection can create suspicious patterns that websites can recognize
- Scale management: Large operations need thousands of unique, validated user agent combinations
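One way to tackle the header-matching challenge is to rotate complete browser profiles rather than bare user agent strings, so every user agent travels with headers that plausibly belong to it. A minimal sketch (the two profiles below are illustrative, not exhaustive):

import random
import requests

# Each profile bundles a user agent with headers that plausibly belong to it,
# so a Firefox UA never ships with Chrome-style Client Hints, for example.
BROWSER_PROFILES = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Sec-CH-UA-Platform": '"Windows"',
        "Sec-CH-UA-Mobile": "?0",
    },
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
                      "Gecko/20100101 Firefox/121.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        # Firefox does not send Sec-CH-UA headers, so none are included here.
    },
]

def fetch(url):
    # Pick a complete profile per request instead of a lone user agent string
    return requests.get(url, headers=random.choice(BROWSER_PROFILES))

if __name__ == "__main__":
    print(fetch("https://httpbin.org/headers").json()["headers"])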
The Browser Fingerprinting Problem
In addition to checking your user agent, modern anti-bot systems also perform thorough browser fingerprinting to verify that every feature of your “browser” matches what the user agent claims. This creates several layers of detection:
- Headers layer: Your headers must be internally consistent. Does your Accept-Language header match the location implied by your IP address? For example, if your user agent claims to be Chrome on Windows but your Accept-Language is “zh-CN” while your IP is from New York, that’s suspicious.
- Client Hints layer: Modern browsers send Client Hints headers alongside the traditional user agent. Does your Sec-CH-UA-Mobile header match the OS claimed in your user agent? Claiming to be desktop Chrome while sending mobile client hints will get you blocked instantly.

User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X)...
Sec-CH-UA-Mobile: ?0 ← This mismatch will get you caught

- JavaScript Detection layer: Sites can compare the `navigator.userAgent` value in JavaScript with the User-Agent header sent in your HTTP request. If you’re using Selenium or another browser automation tool, these values must match perfectly (a Selenium sketch of keeping them aligned follows this list).
- Behavioral Analysis layer: Even with perfect headers, sites analyze behavior patterns. Does this “browser” load pages in 50 milliseconds? Does it skip loading images and CSS? Does it never scroll or move the mouse? These unnatural behaviors flag automated traffic regardless of user agent authenticity.
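If you drive Chrome with Selenium, one way to keep the HTTP `User-Agent` header and `navigator.userAgent` aligned is to set the user agent through the browser options and, optionally, reinforce it via the Chrome DevTools Protocol’s `Network.setUserAgentOverride` command (exposed by Selenium’s `execute_cdp_cmd` on Chromium drivers). A minimal sketch, assuming a local Chrome installation:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

user_agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

options = Options()
options.add_argument(f"--user-agent={user_agent}")

driver = webdriver.Chrome(options=options)
try:
    # Optionally reinforce the override via CDP so navigator.platform also agrees
    driver.execute_cdp_cmd("Network.setUserAgentOverride",
                           {"userAgent": user_agent, "platform": "Win32"})
    driver.get("https://httpbin.org/headers")
    # The HTTP header and the JavaScript value should now report the same string
    print(driver.execute_script("return navigator.userAgent;"))
finally:
    driver.quit()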
The ScraperAPI Solution
This complexity is exactly why tools like ScraperAPI exist. Instead of managing user agent rotation manually, ScraperAPI handles the entire browser fingerprinting challenge automatically:
- Automatic user agent rotation: Thousands of verified user agents rotated intelligently
- Complete header consistency: All headers match and make sense together
- Premium proxy rotation: Residential and datacenter IPs that match geographic headers
- JavaScript rendering: Real browser rendering when needed
- Smart retry logic: Automatic blocking detection and retry with different fingerprints
Let’s look at a Python example of how a simple ScraperAPI request can scrape a heavily protected site (you’ll need `requests` for this):
import requests

# ScraperAPI endpoint with your API key
api_key = "your_scraperapi_key"  # Only for testing purposes
target_url = "https://protected-site-example.com"

# ScraperAPI handles all the complexity automatically
scraperapi_url = f"http://api.scraperapi.com?api_key={api_key}&url={target_url}&render=true"

try:
    response = requests.get(scraperapi_url)

    if response.status_code == 200:
        print("Successfully scraped protected site!")
        print(f"Content length: {len(response.text)}")

        # ScraperAPI automatically handled:
        # - User agent rotation
        # - Proxy rotation
        # - Header consistency
        # - JavaScript rendering
        # - CAPTCHA solving (if needed)
    else:
        print(f"Request failed: {response.status_code}")

except requests.RequestException as e:
    print(f"Error: {e}")
If the request succeeds, you should see output like this:
Successfully scraped protected site!
Content length: 10968
If you’re dealing with large-scale scraping or tough-to-crack sites, ScraperAPI takes care of user agent management for you and offers rock-solid reliability. That way, you can spend less time worrying about setup and more time digging into the data that actually matters.
Note: As this demonstration is only for testing purposes, feel free to paste your ScraperAPI key directly into your code. However, be mindful never to hardcode it if you are pushing your code to a public repository! Save it instead in a `.env` file and import it.
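For example, assuming you use the `python-dotenv` package and a `.env` file containing a line such as `SCRAPERAPI_KEY=your_key` (the variable name here is just an example), loading it could look like this:

import os

import requests
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # read key/value pairs from a local .env file into the environment
api_key = os.getenv("SCRAPERAPI_KEY")  # example variable name from your .env file

target_url = "https://protected-site-example.com"
response = requests.get(
    f"http://api.scraperapi.com?api_key={api_key}&url={target_url}&render=true"
)
print(response.status_code)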
FAQ
What is a user agent in web scraping?
A user agent is a string that your browser sends to websites to identify what type of browser, operating system, and device you’re using. In web scraping, user agents are crucial because many websites block requests that don’t have proper user agent headers or have suspicious ones that look like bots. When scraping, you should always include a realistic user agent header (like “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36”) to make your requests appear as if they’re coming from a real browser, which helps avoid getting blocked and ensures you receive the same content that regular users see.
How to use a fake user agent for web scraping?
To use a fake user agent in web scraping, simply add a User-Agent header to your HTTP requests with a realistic browser string. In Python with requests, use `headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}` and pass it to your request like `requests.get(url, headers=headers)`. For better results, rotate between multiple user agents by creating a list of different browser strings and randomly selecting one for each request using `random.choice(user_agents_list)`. You can also use libraries like `fake-useragent`, which automatically provides random, up-to-date user agent strings with `UserAgent().random`.
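For instance, with the `fake-useragent` package installed (`pip install fake-useragent`), pulling a random user agent and attaching it to a request might look like this:

import requests
from fake_useragent import UserAgent  # pip install fake-useragent

ua = UserAgent()
headers = {"User-Agent": ua.random}  # a different, current browser string each time
response = requests.get("https://httpbin.org/user-agent", headers=headers)
print(response.json()["user-agent"])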
Is web scraping for personal use illegal?
Web scraping for personal use is generally legal when you’re accessing publicly available information, respecting robots.txt files, and not overwhelming servers with excessive requests. However, legality depends on factors like the website’s terms of service, whether you’re bypassing authentication, the jurisdiction you’re in, and what you do with the scraped data. It becomes problematic when you ignore robots.txt, violate terms of service, scrape copyrighted content for commercial use, or cause harm to the website’s performance. Always check the website’s robots.txt file, read their terms of service, use reasonable delays between requests, and when in doubt, consult with a legal professional.
Where can I find a list of user agents for web scraping?
You can find user agent lists from several sources: use the Python library `fake-useragent`, which provides current user agents with `UserAgent().random`; visit websites like WhatIsMyBrowser.com or UserAgentString.com for comprehensive databases; or check your own browser’s user agent by opening developer tools and looking at the Network tab headers.
Conclusion: Putting It All Together – Your User-Agent Strategy in 2025
In this guide, we explored the critical role that user agents play in successful web scraping, from understanding what they are to implementing them effectively in your scrapers. We covered the best practices for selecting realistic user agents, rotating them properly, and avoiding common mistakes that can get your scraper blocked.
While using realistic user agents will help your scraper blend in, sophisticated websites in 2025 employ multiple detection methods that require a holistic approach to overcome.
Effective scrapers in 2025 will implement all of the following techniques (a minimal sketch combining a few of them follows the list):
- Rotate recent, realistic user agents: Use up-to-date browser strings from popular browsers
- Match headers to the claimed user agent: Include Client Hints and other headers that correspond to your chosen browser
- Rotate IP addresses using proxies: Distribute requests across multiple IP addresses to avoid rate limiting
- Delay requests to avoid rate limits: Implement delays between requests to mimic human browsing patterns
- Handle sessions and cookies: Maintain proper session state and cookie management
- Render JavaScript on modern web apps: Use headless browsers for sites that rely heavily on JavaScript
- Respect header consistency and browser fingerprinting logic: Ensure all request headers work together to create a believable browser fingerprint
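As a starting point, here is a minimal sketch that combines three of these techniques: user agent rotation, session handling, and randomized delays. Proxies, Client Hints, and JavaScript rendering would still need to be layered on top:

import random
import time
import requests

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

def polite_scrape(urls):
    # A session keeps cookies across requests, like a real browser would
    with requests.Session() as session:
        for url in urls:
            session.headers.update({"User-Agent": random.choice(USER_AGENTS)})
            response = session.get(url)
            print(url, response.status_code)
            # Randomized delay to mimic human pacing and respect rate limits
            time.sleep(random.uniform(2, 5))

if __name__ == "__main__":
    polite_scrape(["https://httpbin.org/headers", "https://httpbin.org/user-agent"])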
Managing all these components manually is certainly possible, but it’s time-consuming and requires constant maintenance as websites update their detection methods. You’ll need to monitor for blocked requests, update user agent lists, manage proxy pools, handle CAPTCHAs, and debug fingerprinting issues.
This is where ScraperAPI simplifies the entire process. Instead of building and maintaining complex anti-detection infrastructure, ScraperAPI handles user agent rotation, proxy management, header optimization, JavaScript rendering, and CAPTCHA solving with a single API call. Your scraper can focus on extracting data while ScraperAPI manages the technical complexity of staying undetected.
Ready to scale your web scraping with a production-ready solution? Try ScraperAPI today and experience hassle-free scraping with built-in anti-detection technology. Get started with a free trial and see how easy professional web scraping can be.
Happy Scraping!