How to Scrape LinkedIn with Python: Step-by-Step Guide

Scrape LinkedIn jobs using Python with this simple tutorial

Scraping LinkedIn is one of the most scalable ways to understand the job market, find job opportunities, or run competitor analysis by finding what roles your competitors are hiring for.

Although LinkedIn is a huge source of publicly available data, it’s quite tricky to scrape because of its advanced anti-scraping mechanisms and dynamic content.

In this article, we’ll show you how to build a web scraper using Python that doesn’t violate any privacy policies or require a headless browser to extract the following:

  • Job title
  • Company hiring
  • Job location
  • Job URL

Then, we’ll export the data to a CSV file for later analysis or use.

Note: If you’re more proficient in JavaScript, we have a tutorial on building a LinkedIn scraper using Node.js and Cheerio that you can check out.


1. Setting Up Our Project

Let’s start by installing all the dependencies we’ll be using for this project.

Assuming you already have Python 3 installed, open VS Code – or your favorite text editor – and open a new terminal window.

From there, use the following commands to install the libraries:

  • Requests: pip3 install requests
  • Beautiful Soup: pip3 install beautifulsoup4
  • CSV: Python comes with a CSV module ready to use

With our dependencies installed, let’s create a new file and name it linkedin_python.py and import the libraries at the top:

import csv
import requests
from bs4 import BeautifulSoup

2. Using Chrome DevTools to Understand LinkedIn’s Site Structure

Before we can scrape LinkedIn job listings, we first need to understand how to interact with its data.

To get started, navigate to the homepage at https://www.linkedin.com/ from an InPrivate browser window (Incognito in Chrome) and click on jobs at the top of the page.

[Image: LinkedIn job search]

It will send us directly to the job search result page, where we can create a new search. For the sake of this example, let’s say that we’re trying to build a list of product management jobs in San Francisco.

[Image: LinkedIn search results for "product management" jobs in San Francisco]

At a glance, it seems like every job’s data is inside a card-like container, and sure enough, after inspecting the page (right-click > Inspect), we can see that every job result is wrapped in a <li> tag inside a <ul> element.

[Image: HTML view of LinkedIn job search results]

So, a first approach would be to grab the <ul> element and iterate through every <li> tag inside to extract the data we’re looking for.

[Image: Base search card element]

But there’s a problem: to access new jobs, LinkedIn uses infinite scrolling pagination, which means there is no “next page” button to grab the next page URL link from, nor does the URL itself change.

In cases like this, we can use a headless browser like Selenium to access the site, extract the data, and then scroll down to reveal the new data.

Of course, as we previously stated, we’re not doing that. Instead, let’s outsmart the website by using the Network Tab inside DevTools.

3. Using the DevTool’s Network Tab

With DevTools open, we can navigate to the Network Tab from the dropdown menu at the top of the window.

[Image: The Network tab inside DevTools]

To populate the report, just reload the page, and you’ll be able to see all the fetch requests the browser is running to render the data on the page. After scrolling to the bottom, the browser sends a new request to a URL of the form https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?...&start=25 (the exact start value depends on how far you’ve scrolled).

Let’s try this new URL in our browser to see where it takes us:

Perfect, this page has all the information we want right there for the grabbing!

An additional finding is that this URL has a structure we can actually manipulate by changing the value in the start parameter.

To put this to the test, let’s change the start value in that URL to 0 – which matches the first page of results from the organic search URL:

https://www.linkedin.com/jobs/search?keywords=Product%20Management&location=San%20Francisco%20Bay%20Area&geoId=90000084&trk=public_jobs_jobs-search-bar_search-submit&position=1&pageNum=0

And yes, that did the trick – the first job listed on this API page is the same as the first job on the organic search results page.

Experimenting is crucial for web scraping, so here are a few more things we tried before settling for this solution:

  • Changing the pageNum parameter doesn’t change anything on the page.
  • The start parameter increases by 25 for every new URL. We found this out by scrolling down the page and comparing the fetch requests sent by the site itself.
  • Increasing the start parameter by just 1 (start=1, start=2, and so on) shifts the results window, hiding the earlier job listings from the page – which is not what we want.
  • At the time of writing, the last page is start=975; requesting start=1000 returns a 404 page. (The sketch below shows the full sequence of offsets this produces.)
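To make this pagination pattern concrete, here’s a minimal sketch of the sequence of offsets it produces, using the same guest API URL we’ll build the scraper around later:

# Build every paginated URL by stepping the start parameter
# in increments of 25, up to the last known page (start=975).
base_url = (
    'https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search'
    '?keywords=Product%20Management&location=San%20Francisco%20Bay%20Area'
    '&geoId=90000084&trk=public_jobs_jobs-search-bar_search-submit'
    '&position=1&pageNum=0&start='
)

paginated_urls = [base_url + str(offset) for offset in range(0, 1000, 25)]
print(len(paginated_urls))  # 40 URLs, covering start=0 through start=975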

With our initial URL figured out, we can start writing our LinkedIn scraper.

4. Parsing LinkedIn Using Requests and Beautiful Soup

Sending a request and parsing the returning response is super simple in Python.

  1. First, let’s create a variable containing our initial URL and pass it to the requests.get() method.
  2. Then, we’ll store the returned Response object in a variable called response. For testing, let’s print it:

url = 'https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Product%20Management&location=San%20Francisco%20Bay%20Area&geoId=90000084&trk=public_jobs_jobs-search-bar_search-submit&position=1&pageNum=0&start=0'

response = requests.get(url)

print(response)

Awesome – printing the response shows <Response [200]>, and a 200 status code indicates a successful HTTP request.

However, before we can start extracting any data, we’ll need to parse the raw HTML data to make it easier to navigate using CSS selectors.

To do so, create a new Beautiful Soup object by passing response.content as the first argument, and our parser method as the second argument:

soup = BeautifulSoup(response.content, 'html.parser')

Because testing should be a part of our development process, let’s do a simple experiment using our new soup object by selecting and printing the first job title on the page, which we already know is wrapped inside <h3> tags with the class base-search-card__title.

job_title = soup.find('h3', class_='base-search-card__title').text
print(job_title)

soup.find() will do exactly what it says: it finds the first element inside our Beautiful Soup object that matches the parameters we passed.

By adding the .text attribute at the end, it returns only the text inside the element, without the surrounding HTML.

To remove the whitespace around the text, all we need to do is chain the .strip() method at the end, as in the example below.
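For example, chaining all three together returns a clean job title string – the same pattern we’ll reuse later inside the scraping loop:

job_title = soup.find('h3', class_='base-search-card__title').text.strip()
print(job_title)  # e.g. 'Product Manager' – no leading or trailing whitespace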


5. Scraping Multiple LinkedIn Pages Using Conditions

Here’s where things get a little tricky, but we’ve already done the hardest part: figuring out how to move through the pages.

In simple words, all we need to do is create some logic to change the start parameter in our URL.

In an earlier article, we talked about scraping paginated pages in Scrapy, but with Beautiful Soup, we’ll do something different.

For starters, we’ll define a new function that will contain the entirety of our code and pass webpage and page_number as arguments – we’ll use these two arguments to build the URL we’ll use to send the HTTP request.

def linkedin_scraper(webpage, page_number):
    next_page = webpage + str(page_number)
    print(str(next_page))
    response = requests.get(str(next_page))
    soup = BeautifulSoup(response.content, 'html.parser')

In the next_page variable, we combine both arguments – webpage is already a string, while page_number, being a number, needs to be turned into one – before passing the resulting URL to Requests.

For the next step to make sense, we need to understand that our scraper will:

  • Create the new URL
  • Send the HTTP request
  • Parse the response
  • Extract the data
  • Send it to a CSV file
  • Increase the start parameter
  • Repeat until it breaks

To increase the start parameter on every run, we’ll add an if condition to the end of the function:

    if page_number < 25:
        page_number = page_number + 25
        linkedin_scraper(webpage, page_number)

What we’re saying here is that as long as page_number is less than 25, it’ll increase by 25 and the function will call itself with the new value; once it reaches 25 or more, the recursion stops.

Why 25, you ask? Because before going all in, we want to make sure our logic works with a simple two-page test.

    print(response)
    print(page_number)

    if page_number < 25:
        page_number = page_number + 25
        linkedin_scraper(webpage, page_number)

linkedin_scraper('https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Product%20Management&location=San%20Francisco%20Bay%20Area&geoId=90000084&trk=public_jobs_jobs-search-bar_search-submit&position=1&pageNum=0&start=', 0)

We’re going to print the response status code and page_number to verify that we’re accessing both pages.

To run our code, we’ll need to call our function with the relevant parameters.

Note: Notice that in the function call, we separated the start parameter from its value. We pass the value separately as a number so we can increase it in the if statement.

We’ve also added a print() statement for the new URL created, just to verify that everything is working correctly.

6. Testing Our Selectors

We already found the elements and classes we’re going to be using for our parser. Nevertheless, it’s always essential to test them out to avoid unnecessary requests to the server.

Right inside the DevTools’s console, we can use the document.querySelectorAll() method to test each CSS selector from the browser.

Here’s an example for extracting the job title:

[Image: Getting all the titles on the page within DevTools]

It returns a NodeList of 25, which matches the number of jobs on the page. We can do the same thing for the rest of our targets:

  • Job title: 'h3', class_='base-search-card__title'
  • Company: 'h4', class_='base-search-card__subtitle'
  • Location: 'span', class_='job-search-card__location'
  • URL: 'a', class_='base-card__full-link'

Notice that we changed the syntax to match the .find() method. If you want to keep using jQuery-like CSS selectors, you can use the .select() method instead, as in the sketch below.
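If you’d like to run the same sanity check from Python rather than the browser console, here’s a quick sketch using soup.select() with CSS-style selectors; each list should contain 25 elements, matching the number of jobs on the page:

# Quick selector sanity check with soup.select() instead of .find()
titles = soup.select('h3.base-search-card__title')
companies = soup.select('h4.base-search-card__subtitle')
locations = soup.select('span.job-search-card__location')
links = soup.select('a.base-card__full-link')

print(len(titles), len(companies), len(locations), len(links))  # expecting 25 for each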

7. Scraping LinkedIn Job Data

Extracting the data is as simple as selecting all the parent elements that are wrapping our data and then looping through them to extract the information we want.

Inside each <li> element, there’s a <div> with a long class string we can target:

To access the data inside, let’s create a new variable to pick all these <div>s.

jobs = soup.find_all('div', class_='base-card relative w-full hover:no-underline focus:no-underline base-card--link base-search-card base-search-card--link job-search-card')

Now that we have a list of <div>s, we can create our for loop using the CSS selectors we chose:

for job in jobs:
    job_title = job.find('h3', class_='base-search-card__title').text.strip()
    job_company = job.find('h4', class_='base-search-card__subtitle').text.strip()
    job_location = job.find('span', class_='job-search-card__location').text.strip()
    job_link = job.find('a', class_='base-card__full-link')['href']

Note: If you feel like we’re moving too fast, we recommend you read our Python web scraping tutorial for beginners. It goes into more detail on this process.

8. Importing LinkedIn Jobs to a CSV File

Outside our main function, we’ll open a new file, create a new writer, and tell it to create our heading row using the .writerow() method:

file = open('linkedin-jobs.csv', 'a', newline='', encoding='utf-8')
writer = csv.writer(file)
writer.writerow(['Title', 'Company', 'Location', 'Apply'])

When opening a file you’ll want to continuously add new rows to, you need to open it in append mode, hence the 'a' as the second argument of open(). The newline='' and encoding='utf-8' arguments keep the csv module from inserting blank rows on Windows and make sure any non-ASCII characters in the job data are written correctly.

To create a new row with the data extracted from the parser, add this snippet of code at the end of the for loop:

    writer.writerow([
        job_title,
        job_company,
        job_location,
        job_link
    ])

On each iteration through the list of jobs, our scraper will append the extracted data as a new row.

Note: It’s important to make sure that the order in which we add the new data matches the order of our headings.

To finish this step, let’s add an else branch to the if condition to close the file once the scraper stops requesting new pages.

    else:
        file.close()
        print('File closed')

9. Bypassing LinkedIn Anti-Bot Detection with ScraperAPI

Our last step is optional but can save you hours of work in the long run. After all, we’re not trying to scrape just one or two pages.

To scale your project, you’ll need to:

  • Handle IP rotations
  • Manage a pool of proxies
  • Handle CAPTCHAs
  • Send proper headers

and more to avoid getting blocked or even banned for life.

ScraperAPI can handle these and more challenges with just a few changes to the base URL we’re using right now.

First, create a free ScraperAPI account to get access to your API key.

From there, the only thing you need to do is add this string at the beginning of the URL:

http://api.scraperapi.com?api_key={YOUR_API_KEY}&url=

Resulting in the following function call:

linkedin_scraper('http://api.scraperapi.com?api_key=51e43be283e4db2a5afb62660xxxxxxx&url=https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Product%20Management&location=San%20Francisco%20Bay%20Area&geoId=90000084&trk=public_jobs_jobs-search-bar_search-submit&position=1&pageNum=0&start=', 0)

This way, the HTTP request will be processed by ScraperAPI’s server. It’ll rotate IPs after every request (or as needed) and choose the right Headers based on years of statistical analysis and machine learning.

In addition, ScraperAPI has ultra-premium proxies chosen specifically to handle really hard websites like LinkedIn. To use them, add the ultra_premium=true parameter to the request, as shown below.
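Following the same prefix pattern as before, enabling it is just a matter of adding the parameter to the query string before the url parameter:

http://api.scraperapi.com?api_key={YOUR_API_KEY}&ultra_premium=true&url=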

Wrapping Up: Full Code

Congratulations, you just created your first LinkedIn scraper!

If you’ve followed along, here’s what your code should look like:

import csv
import requests
from bs4 import BeautifulSoup

# Open the CSV in append mode and write the header row
file = open('linkedin-jobs.csv', 'a', newline='', encoding='utf-8')
writer = csv.writer(file)
writer.writerow(['Title', 'Company', 'Location', 'Apply'])

def linkedin_scraper(webpage, page_number):
    # Build the paginated URL, request it, and parse the HTML
    next_page = webpage + str(page_number)
    print(str(next_page))
    response = requests.get(str(next_page))
    soup = BeautifulSoup(response.content, 'html.parser')

    # Grab every job card and extract the fields we want
    jobs = soup.find_all('div', class_='base-card relative w-full hover:no-underline focus:no-underline base-card--link base-search-card base-search-card--link job-search-card')
    for job in jobs:
        job_title = job.find('h3', class_='base-search-card__title').text.strip()
        job_company = job.find('h4', class_='base-search-card__subtitle').text.strip()
        job_location = job.find('span', class_='job-search-card__location').text.strip()
        job_link = job.find('a', class_='base-card__full-link')['href']

        writer.writerow([
            job_title,
            job_company,
            job_location,
            job_link
        ])

    print('Data updated')

    # Move to the next page (start increases by 25) until the limit is reached
    if page_number < 25:
        page_number = page_number + 25
        linkedin_scraper(webpage, page_number)
    else:
        file.close()
        print('File closed')

linkedin_scraper('https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Product%20Management&location=San%20Francisco%20Bay%20Area&geoId=90000084&trk=public_jobs_jobs-search-bar_search-submit&position=1&pageNum=0&start=', 0)

We added a few print() statements as visual feedback.

After running the code, the script will create a CSV file with the scraped data.

To get even more data, you can increase the limit in the if condition to scrape more pages, loop through different queries with the keywords parameter, or change the location parameter to scrape the same job title across different cities and countries – see the sketch below.
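Here’s a rough sketch of what looping through multiple queries could look like. It’s only a starting point: the keyword and location pairs are made-up examples, the URL drops LinkedIn’s geoId and tracking parameters (you may need to grab the right geoId from the Network tab for each location), and you’d need to move the CSV file handling so the file isn’t closed after the first search finishes.

from urllib.parse import quote

# Hypothetical example queries – swap in whatever roles and cities you need
searches = [
    ('Product Management', 'San Francisco Bay Area'),
    ('Data Analyst', 'New York City Metropolitan Area'),
]

base = 'https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search'

for keywords, location in searches:
    # URL-encode each value and reuse the scraper we already wrote
    webpage = f'{base}?keywords={quote(keywords)}&location={quote(location)}&start='
    linkedin_scraper(webpage, 0)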

Remember that if the project is larger than just a couple of pages, using ScraperAPI’s integration will help you avoid roadblocks and keep your IP safe from anti-scraping systems.

Don’t know where to start? Contact our sales team and let our experts build the perfect plan for your use case!

Until next time, happy scraping!

About the author

Leonardo Rodriguez

Leo is a technical content writer based in Italy with experience in Python and Node.js. He’s currently working at saas.group as a content marketing manager and is the lead writer for ScraperAPI. Contact him on LinkedIn.
