The LangChain ScraperAPI package allows AI agents to browse the web and access real-time information without getting blocked by bot detectors. With just two lines of code, you can transform a basic chatbot into an intelligent agent that scrapes websites, runs Google searches, and extracts Amazon product data.
This integration also handles all of the complex scraping infrastructure for you, including IP rotation, CAPTCHA solving, rate limiting, and other anti-bot countermeasures, in a legally compliant manner.
To use this package, here’s what you’ll need:
- Python 3.7 or higher installed
- A ScraperAPI account and API key (sign up at scraperapi.com)
- An OpenAI API key (get yours) or a key from another LLM provider (note: if you are using OpenAI, you may have to enter billing details even to use the free trial)
- The langchain package
- The langchain_scraperapi package
- The langchain_openai package
- openai, the OpenAI Python package
- Basic familiarity with LangChain agents
Installation
Install the LangChain ScraperAPI package using pip:
pip install -U langchain_scraperapi openai langchain_openai langchain
Once installed, set your API credentials as environment variables in your terminal:
export SCRAPERAPI_API_KEY="YOUR_SCRAPERAPI_API_KEY"
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY" # If using OpenAI
# export ANTHROPIC_API_KEY="YOUR_ANTHROPIC_API_KEY" # If using Anthropic Claude
# export COHERE_API_KEY="YOUR_COHERE_API_KEY" # If using Cohere
Replace each YOUR_..._API_KEY with your actual keys from the respective provider dashboards. Your agent needs these to authenticate requests to both ScraperAPI and your chosen LLM provider.
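If you prefer to configure credentials inside a script or notebook rather than the shell, you can also set them from Python before creating any tools. This is just a convenience sketch; keep real keys out of version control.
import os

# Alternative to the shell exports above (e.g. in a notebook).
# Set these before instantiating any LangChain or ScraperAPI tools.
os.environ["SCRAPERAPI_API_KEY"] = "YOUR_SCRAPERAPI_API_KEY"
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"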
You’ll find your ScraperAPI keys on your dashboard after logging in:
Available Tools
The integration provides three tools:
1. ScraperAPITool: /
Scrapes any public website and returns the content in your preferred format (markdown, HTML, or text).
from langchain_scraperapi import ScraperAPITool
tool = ScraperAPITool()
result = tool.invoke({
    "url": "https://openai.com",
    "output_format": "markdown"
})
2. ScraperAPIGoogleSearchTool: /structured/google/search
Scrapes Google search results and returns them in JSON or CSV format.
from langchain_scraperapi.tools import ScraperAPIGoogleSearchTool
tool = ScraperAPIGoogleSearchTool()
result = tool.invoke({
    "query": "Python web scraping",
    "output_format": "json"
})
3. ScraperAPIAmazonSearchTool: /structured/amazon/search
Scrapes Amazon product search results and returns them in JSON or CSV format.
from langchain_scraperapi.tools import ScraperAPIAmazonSearchTool
tool = ScraperAPIAmazonSearchTool()
result = tool.invoke({
    "query": "wireless headphones",
    "output_format": "json"
})
Basic Agent Setup
Here’s how to set up a basic LangChain agent with web scraping capabilities:
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI
# Import the scraper tool
from langchain_scraperapi import ScraperAPITool
# Initialize the OpenAI LLM
llm = ChatOpenAI(temperature=0)
# Create the scraper tool
scraper_tool = ScraperAPITool()
# Initialize the agent with the scraper tool
agent = initialize_agent(
    tools=[scraper_tool],
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True
)
# Test the agent
response = agent.invoke("Can you browse https://news.ycombinator.com and tell me the top 3 articles?")
print(response)
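Note that initialize_agent is deprecated in recent LangChain releases. If you are on LangChain 0.1 or newer, a roughly equivalent setup uses create_tool_calling_agent with AgentExecutor; the sketch below assumes those APIs exist in your installed version.
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_scraperapi import ScraperAPITool

llm = ChatOpenAI(temperature=0)
tools = [ScraperAPITool()]

# The agent_scratchpad placeholder is where intermediate tool calls are injected
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that can browse the web."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

response = executor.invoke({"input": "Can you browse https://news.ycombinator.com and tell me the top 3 articles?"})
print(response["output"])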
Multi-Tool Agent Setup
For more comprehensive web browsing capabilities, you can add all three tools to your agent:
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI
from langchain_scraperapi import ScraperAPITool
from langchain_scraperapi.tools import ScraperAPIGoogleSearchTool, ScraperAPIAmazonSearchTool
# Initialize the LLM
llm = ChatOpenAI(temperature=0)
# Create all three tools
scraper_tool = ScraperAPITool()
google_tool = ScraperAPIGoogleSearchTool()
amazon_tool = ScraperAPIAmazonSearchTool()
# Initialize the agent with all tools
agent = initialize_agent(
    tools=[scraper_tool, google_tool, amazon_tool],
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True
)
# Test with different queries
response1 = agent.invoke("Search Google for 'Python web scraping' and give me the top 3 results")
response2 = agent.invoke("Find wireless headphones on Amazon under $100")
response3 = agent.invoke("Browse the GitHub homepage and summarize what's trending")
Custom Parameters
You can pass additional ScraperAPI parameters to customize your scraping:
from langchain_scraperapi import ScraperAPITool
tool = ScraperAPITool()
result = tool.invoke({
    "url": "https://example.com",
    "output_format": "markdown",
    "country_code": "US",
    "render": True,  # Enable JavaScript rendering
    "wait": 5000  # Wait 5 seconds for the page to load
})
Error Handling
Always implement proper error handling when using the tools:
from langchain_scraperapi import ScraperAPITool
tool = ScraperAPITool()
try:
    result = tool.invoke({
        "url": "https://example.com",
        "output_format": "markdown"
    })
    print("Scraping successful:", result)
except Exception as e:
    print(f"Scraping failed: {e}")
    # Implement fallback logic here
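For transient failures (network hiccups, slow target sites), a simple retry with exponential backoff is often enough. The scrape_with_retries helper below is illustrative and not part of the package:
import time
from langchain_scraperapi import ScraperAPITool

tool = ScraperAPITool()

def scrape_with_retries(url, retries=3, backoff=2.0):
    """Retry a scrape a few times with exponential backoff before giving up."""
    for attempt in range(1, retries + 1):
        try:
            return tool.invoke({"url": url, "output_format": "markdown"})
        except Exception as e:
            if attempt == retries:
                raise
            print(f"Attempt {attempt} failed ({e}), retrying in {backoff:.0f}s...")
            time.sleep(backoff)
            backoff *= 2

content = scrape_with_retries("https://example.com")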
Performance Considerations
ScraperAPI handles all the complex infrastructure, including IP rotation, CAPTCHA solving, and rate limiting. However, keep these points in mind:
- Concurrent Requests: Make sure your ScraperAPI plan supports the number of concurrent requests your agent might make
- Timeouts: Set appropriate timeouts in your LangChain configuration to allow ScraperAPI time to process requests (a sketch follows this list)
- Rate Limits: While ScraperAPI manages rate limiting, avoid making excessive requests in rapid succession
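To illustrate the timeout point above, both the LLM client and the agent executor accept limits. Exact keyword names can vary between LangChain versions, so treat this as a sketch rather than the definitive configuration:
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI
from langchain_scraperapi import ScraperAPITool

# Give the LLM client a generous request timeout and a small retry budget
llm = ChatOpenAI(temperature=0, timeout=120, max_retries=2)

# Cap how long (and how many steps) the agent may run overall
agent = initialize_agent(
    tools=[ScraperAPITool()],
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    max_iterations=5,
    max_execution_time=180,  # seconds
)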
Troubleshooting
API Key Issues
If you get authentication errors, verify:
- Your API key is correctly set in the environment variable (a quick check follows this list)
- The API key has sufficient credits
- The API key is active and not expired
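A quick way to confirm the first point is to check what your Python process can actually see:
import os

# Print which keys are visible to the current process (values are not printed)
for name in ("SCRAPERAPI_API_KEY", "OPENAI_API_KEY"):
    print(f"{name}: {'set' if os.environ.get(name) else 'MISSING'}")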
Timeout Errors
If requests timeout frequently:
- Increase timeout settings in your LangChain configuration
- Check if the target website is accessible
- Verify your ScraperAPI plan supports the request volume
Tool Not Found Errors
If the agent can’t find the tools:
- Ensure the tools are properly imported (a quick check follows this list)
- Check that the tools are included in the agent’s tool list
- Verify the package is installed correctly
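To rule out import and registration problems, you can instantiate the tools directly and print what the agent would see. This assumes the import paths used earlier in this guide:
from langchain_scraperapi import ScraperAPITool
from langchain_scraperapi.tools import ScraperAPIGoogleSearchTool, ScraperAPIAmazonSearchTool

# If any import fails, reinstall with: pip install -U langchain_scraperapi
tools = [ScraperAPITool(), ScraperAPIGoogleSearchTool(), ScraperAPIAmazonSearchTool()]
for tool in tools:
    print(tool.name)  # Each registered tool exposes a name the agent can call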
For more advanced configuration options and parameters, refer to the ScraperAPI LangChain documentation.