The LangChain ScraperAPI package allows AI agents to browse the web and access real-time information without getting blocked by bot detectors. With just two lines of code, you can transform a basic chatbot into an intelligent agent that scrapes websites, runs Google searches, and extracts Amazon product data.
This integration also handles all of the complex scraping infrastructure for you, including IP rotation, CAPTCHA solving, rate limiting, and other anti-bot countermeasures, in a legally compliant manner.
To use this package, here’s what you’ll need:
- Python 3.7 or higher installed
- A ScraperAPI account and API key (sign up at scraperapi.com)
- An OpenAI API key (get yours) or a key from another LLM provider (note: if you are using OpenAI, you may have to enter billing details even to use the free trial)
- The langchain package
- The langchain_scraperapi package
- The langchain_openai package
- openai, the OpenAI Python package
- Basic familiarity with LangChain agents
Installation
Install the LangChain ScraperAPI package using pip:
pip install -U langchain_scraperapi openai langchain_openai langchain
Once installed, set your API credentials as environment variables in your terminal:
export SCRAPERAPI_API_KEY="YOUR_SCRAPERAPI_API_KEY"
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY" # If using OpenAI
# export ANTHROPIC_API_KEY="YOUR_ANTHROPIC_API_KEY" # If using Anthropic Claude
# export COHERE_API_KEY="YOUR_COHERE_API_KEY" # If using Cohere
Replace each YOUR_..._API_KEY with your actual keys from the respective provider dashboards. Your agent needs these to authenticate requests to both ScraperAPI and your chosen LLM provider.
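If you prefer to configure credentials inside a script or notebook rather than the shell, you can also set them from Python before creating any tools. This is just a convenience sketch; keep real keys out of version control.
import os

# Alternative to the shell exports above (e.g. in a notebook).
# Set these before instantiating any LangChain or ScraperAPI tools.
os.environ["SCRAPERAPI_API_KEY"] = "YOUR_SCRAPERAPI_API_KEY"
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"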
You’ll find your ScraperAPI keys on your dashboard after logging in:
Available Tools
The integration provides three tools:
1. ScraperAPITool: /
Scrapes any public website and returns the content in your preferred format (markdown, HTML, or text).
from langchain_scraperapi import ScraperAPITool
tool = ScraperAPITool()
result = tool.invoke({
    "url": "https://openai.com",
    "output_format": "markdown"
})
2. ScraperAPIGoogleSearchTool: /structured/google/search
Scrapes Google search results and returns them in JSON or CSV format.
from langchain_scraperapi.tools import ScraperAPIGoogleSearchTool
tool = ScraperAPIGoogleSearchTool()
result = tool.invoke({
    "query": "Python web scraping",
    "output_format": "json"
})
3. ScraperAPIAmazonSearchTool: /structured/amazon/search
Scrapes Amazon product search results and returns them in JSON or CSV format.
from langchain_scraperapi.tools import ScraperAPIAmazonSearchTool
tool = ScraperAPIAmazonSearchTool()
result = tool.invoke({
    "query": "wireless headphones",
    "output_format": "json"
})
Basic Agent Setup
Here’s how to set up a basic LangChain agent with web scraping capabilities:
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI
# Import the scraper tool
from langchain_scraperapi import ScraperAPITool
# Initialize the OpenAI LLM
llm = ChatOpenAI(temperature=0)
# Create the scraper tool
scraper_tool = ScraperAPITool()
# Initialize the agent with the scraper tool
agent = initialize_agent(
    tools=[scraper_tool],
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True
)
# Test the agent
response = agent.invoke("Can you browse https://news.ycombinator.com and tell me the top 3 articles?")
print(response)
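Note that initialize_agent is deprecated in recent LangChain releases. If you are on LangChain 0.1 or newer, a roughly equivalent setup uses create_tool_calling_agent with AgentExecutor; the sketch below assumes those APIs exist in your installed version.
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_scraperapi import ScraperAPITool

llm = ChatOpenAI(temperature=0)
tools = [ScraperAPITool()]

# The agent_scratchpad placeholder is where intermediate tool calls are injected
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that can browse the web."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

response = executor.invoke({"input": "Can you browse https://news.ycombinator.com and tell me the top 3 articles?"})
print(response["output"])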
Multi-Tool Agent Setup
For more comprehensive web browsing capabilities, you can add all three tools to your agent:
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI
from langchain_scraperapi import ScraperAPITool
from langchain_scraperapi.tools import ScraperAPIGoogleSearchTool, ScraperAPIAmazonSearchTool
# Initialize the LLM
llm = ChatOpenAI(temperature=0)
# Create all three tools
scraper_tool = ScraperAPITool()
google_tool = ScraperAPIGoogleSearchTool()
amazon_tool = ScraperAPIAmazonSearchTool()
# Initialize the agent with all tools
agent = initialize_agent(
    tools=[scraper_tool, google_tool, amazon_tool],
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True
)
# Test with different queries
response1 = agent.invoke("Search Google for 'Python web scraping' and give me the top 3 results")
response2 = agent.invoke("Find wireless headphones on Amazon under $100")
response3 = agent.invoke("Browse the GitHub homepage and summarize what's trending")
Custom Parameters
You can pass additional ScraperAPI parameters to customize your scraping:
from langchain_scraperapi import ScraperAPITool
tool = ScraperAPITool()
result = tool.invoke({
    "url": "https://example.com",
    "output_format": "markdown",
    "country_code": "US",
    "render": True,  # Enable JavaScript rendering
    "wait": 5000  # Wait 5 seconds for the page to load
})
Error Handling
Always implement proper error handling when using the tools:
from langchain_scraperapi import ScraperAPITool
tool = ScraperAPITool()
try:
    result = tool.invoke({
        "url": "https://example.com",
        "output_format": "markdown"
    })
    print("Scraping successful:", result)
except Exception as e:
    print(f"Scraping failed: {e}")
    # Implement fallback logic here
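For transient failures (network hiccups, slow target sites), a simple retry with exponential backoff is often enough. The scrape_with_retries helper below is illustrative and not part of the package:
import time
from langchain_scraperapi import ScraperAPITool

tool = ScraperAPITool()

def scrape_with_retries(url, retries=3, backoff=2.0):
    """Retry a scrape a few times with exponential backoff before giving up."""
    for attempt in range(1, retries + 1):
        try:
            return tool.invoke({"url": url, "output_format": "markdown"})
        except Exception as e:
            if attempt == retries:
                raise
            print(f"Attempt {attempt} failed ({e}), retrying in {backoff:.0f}s...")
            time.sleep(backoff)
            backoff *= 2

content = scrape_with_retries("https://example.com")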
Performance Considerations
ScraperAPI handles all the complex infrastructure, including IP rotation, CAPTCHA solving, and rate limiting. However, keep these points in mind:
- Concurrent Requests: Make sure your ScraperAPI plan supports the number of concurrent requests your agent might make
- Timeouts: Set appropriate timeouts in your LangChain configuration to allow ScraperAPI time to process requests (a sketch follows this list)
- Rate Limits: While ScraperAPI manages rate limiting, avoid making excessive requests in rapid succession
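To illustrate the timeout point above, both the LLM client and the agent executor accept limits. Exact keyword names can vary between LangChain versions, so treat this as a sketch rather than the definitive configuration:
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI
from langchain_scraperapi import ScraperAPITool

# Give the LLM client a generous request timeout and a small retry budget
llm = ChatOpenAI(temperature=0, timeout=120, max_retries=2)

# Cap how long (and how many steps) the agent may run overall
agent = initialize_agent(
    tools=[ScraperAPITool()],
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    max_iterations=5,
    max_execution_time=180,  # seconds
)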
Troubleshooting
API Key Issues
If you get authentication errors, verify:
- Your API key is correctly set in the environment variable (a quick check follows this list)
- The API key has sufficient credits
- The API key is active and not expired
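A quick way to confirm the first point is to check what your Python process can actually see:
import os

# Print which keys are visible to the current process (values are not printed)
for name in ("SCRAPERAPI_API_KEY", "OPENAI_API_KEY"):
    print(f"{name}: {'set' if os.environ.get(name) else 'MISSING'}")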
Timeout Errors
If requests timeout frequently:
- Increase timeout settings in your LangChain configuration
- Check if the target website is accessible
- Verify your ScraperAPI plan supports the request volume
Tool Not Found Errors
If the agent can’t find the tools:
- Ensure the tools are properly imported (a quick check follows this list)
- Check that the tools are included in the agent’s tool list
- Verify the package is installed correctly
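To rule out import and registration problems, you can instantiate the tools directly and print what the agent would see. This assumes the import paths used earlier in this guide:
from langchain_scraperapi import ScraperAPITool
from langchain_scraperapi.tools import ScraperAPIGoogleSearchTool, ScraperAPIAmazonSearchTool

# If any import fails, reinstall with: pip install -U langchain_scraperapi
tools = [ScraperAPITool(), ScraperAPIGoogleSearchTool(), ScraperAPIAmazonSearchTool()]
for tool in tools:
    print(tool.name)  # Each registered tool exposes a name the agent can call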
For more advanced configuration options and parameters, refer to the ScraperAPI LangChain documentation.