How to Scrape LinkedIn using Python

LinkedIn is one of the largest professional social media websites in the world and a good source of social media and job data. Using web scraping, you can gather these data fields for analysis. We are glad that you came here to learn how to scrape LinkedIn, and we won’t disappoint you. In this tutorial we will show you how to scrape the data on a LinkedIn company page.

Here are the steps to scrape LinkedIn:

  1. Download and install the latest version of Python
  2. Copy and run the code provided

Scrape job listings from LinkedIn

ScrapeHero Cloud has a pre-built crawler that lets you scrape LinkedIn jobs for as low as $5.

No coding and no setup required – just provide URLs to start scraping!

Get started with scraping LinkedIn for job listings

Here are the fields we will be scraping LinkedIn for:

  1. Company Name
  2. Website
  3. Description
  4. Date founded
  5. Address – Street, city, zip, country
  6. Specialties
  7. Number of followers

Why Scrape LinkedIn?

  1. Job search automation – you want to work for a company that meets some specific criteria, and they are not the usual suspects. You do have a shortlist, but it isn’t really short – it is more like a long list. You wish there were a tool like Google Finance that could help you filter companies based on the criteria they have published on LinkedIn. You can take your “long list”, scrape this information into a structured format, and then, like every programmer before you, build an amazing analysis tool.
    Heck, you could probably even build an app for that and not need that job after all!
  2. Curiosity – not the one that killed the cat, but you are curious about companies on LinkedIn and want to gather a good, clean set of data to satiate your curiosity.
  3. Tinkering – you just like to tinker, found out that you would love to learn Python, and needed something useful to get started.

Well, whatever your reason, you have come to the right place.

In this tutorial, we will show you the basic steps to scrape publicly available LinkedIn company pages, such as LinkedIn’s own page or the ScrapeHero page.

Prerequisites for LinkedIn Scraping

For this tutorial, just like we did for the Amazon Scraper, we will stick to using basic Python and a couple of Python packages – requests and lxml. We will not use more complicated packages like Scrapy in this tutorial.

You will need to install the following:

  • Python 2.7 – available here: https://www.python.org/downloads/
  • Python Requests – installation instructions here: http://docs.python-requests.org/en/master/user/install/ (you might need Python pip to install it, available here: https://pip.pypa.io/en/stable/installing/)
  • Python LXML – learn how to install it here: http://lxml.de/installation.html
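If the installation worked, a quick sanity check (not part of the original tutorial) is to import both packages and print their versions:

# Verify that requests and lxml are importable after installation.
import requests
import lxml.etree

print(requests.__version__)       # installed requests version
print(lxml.etree.LXML_VERSION)    # installed lxml version as a tuple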

Python LinkedIn Scraper

Below is the code to create your own Python LinkedIn scraper. If you are unable to view the Python code below, it can be downloaded from the GIST here
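In case the embedded GIST does not load, here is a minimal sketch of what such a scraper can look like using requests and lxml. This is not the exact code from the GIST – the XPath selectors, header values, and field handling below are illustrative assumptions, and LinkedIn’s markup (and login wall) changes often, so expect to adjust them.

# A minimal LinkedIn company-page scraper sketch using requests and lxml.
# The XPath selectors below are placeholders; inspect the live page and adjust.
import json
import requests
from lxml import html

companyurls = ['https://www.linkedin.com/company/scrapehero']

# Browser-like headers reduce the chance of being served a login wall immediately.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept-Language': 'en-US,en;q=0.9',
}


def readurl(url):
    # Fetch the company page and pull out a few basic fields.
    response = requests.get(url, headers=headers, timeout=30)
    if response.status_code != 200:
        print('Failed to fetch %s (HTTP %s)' % (url, response.status_code))
        return None
    parser = html.fromstring(response.text)
    company_name = parser.xpath('//h1//text()')
    description = parser.xpath('//meta[@name="description"]/@content')
    return {
        'url': url,
        'company_name': company_name[0].strip() if company_name else None,
        'description': description[0].strip() if description else None,
    }


if __name__ == '__main__':
    data = [record for record in (readurl(u) for u in companyurls) if record]
    with open('data.json', 'w') as fp:
        json.dump(data, fp, indent=4)
    print('Wrote %d record(s) to data.json' % len(data))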

All you need to do is change the URL in this line:

companyurls = ['https://www.linkedin.com/company/scrapehero']

or add more URLs, separated by commas, to this list.

You can save the file and run it using Python – python filename.py

The output will be in a file called data.json in the same directory and will look something like this:

{
        "website": "http://www.scrapehero.com", 
        "description": "ScrapeHero is one of the top web scraping companies in the world for a reason.\r\nWe don't leave you with a \"self service\" screen to build your own scrapers.\r\nWe have real humans that will talk to you within hours of your request and help you with your need.\r\nEven though we are premier provider in this space, our investments in automation have allowed us to provide a completely \"full service\" to you at an affordable cost.\r\nGet in touch with us at https://scrapehero.com and experience our awesome customer service first hand", 
        "founded": 2014, 
        "street": null, 
        "specialities": [
            "Web Scraping Service", 
            "Website Scraping", 
            "Screen scraping", 
            "Data scraping", 
            "Web crawling", 
            "Data as a Service", 
            "Data extraction API", 
            "Scrapy", 
            "DaaS"
        ], 
        "size": "11-50 employees", 
        "city": null, 
        "zip": null, 
        "url": "https://www.linkedin.com/company/scrapehero", 
        "country": null, 
        "industry": "Computer Software", 
        "state": null, 
        "company_name": "ScrapeHero", 
        "follower_count": 41, 
        "type": "Privately Held"
    }


Or, if you run it for Cisco:

companyurls = ['https://www.linkedin.com/company/cisco']

The output will look like this:

{
        "website": "http://www.cisco.com", 
        "description": "Cisco (NASDAQ: CSCO) enables people to make powerful connections--whether in business, education, philanthropy, or creativity. Cisco hardware, software, and service offerings are used to create the Internet solutions that make networks possible--providing easy access to information anywhere, at any time. \r\n\r\nCisco was founded in 1984 by a small group of computer scientists from Stanford University. Since the company's inception, Cisco engineers have been leaders in the development of Internet Protocol (IP)-based networking technologies. Today, with more than 71,000 employees worldwide, this tradition of innovation continues with industry-leading products and solutions in the company's core development areas of routing and switching, as well as in advanced technologies such as home networking, IP telephony, optical networking, security, storage area networking, and wireless technology. In addition to its products, Cisco provides a broad range of service offerings, including technical support and advanced services. \r\n\r\nCisco sells its products and services, both directly through its own sales force as well as through its channel partners, to large enterprises, commercial businesses, service providers, and consumers.", 
        "founded": 1984, 
        "street": "Tasman Way, ", 
        "specialities": [
            "Networking", 
            "Wireless", 
            "Security", 
            "Unified Communication", 
            "Telepresence", 
            "Collaboration", 
            "Data Center", 
            "Virtualization", 
            "Unified Computing Systems"
        ], 
        "size": "10,001+ employees", 
        "city": "San Jose", 
        "zip": "95134", 
        "url": "https://www.linkedin.com/company/cisco", 
        "country": "United States", 
        "industry": "Computer Networking", 
        "state": "CA", 
        "company_name": "Cisco", 
        "follower_count": 1201541, 
        "type": "Public Company"
    }

Things to keep in mind before scraping LinkedIn using Python

  1. Since LinkedIn requires you to log in every time you open the website, this code may not work for you.
  2. Use request headers, proxies, and IP rotation to prevent getting CAPTCHAs – see How to prevent getting blacklisted while scraping. You can also use Python to solve some basic CAPTCHAs using an OCR engine called Tesseract. A rough sketch of sending headers and rotating proxies follows this list.
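As an illustration of point 2 – this is not code from the original tutorial – here is a minimal sketch of sending browser-like request headers and rotating each request across a small proxy list with the requests library. The proxy addresses are placeholders and would need to be replaced with proxies you actually have access to.

# Sketch: custom request headers plus simple proxy rotation with requests.
import random
import requests

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept-Language': 'en-US,en;q=0.9',
}

# Placeholder proxies; swap in proxies you actually control.
PROXIES = [
    'http://203.0.113.10:8080',
    'http://203.0.113.11:8080',
]


def fetch(url):
    # Pick a proxy at random for each request so traffic is spread across IPs.
    proxy = random.choice(PROXIES)
    return requests.get(
        url,
        headers=HEADERS,
        proxies={'http': proxy, 'https': proxy},
        timeout=30,
    )

Adding delays between requests and using a larger proxy pool further reduces the chance of hitting CAPTCHAs or getting blocked.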


Feel free to change the URLs or the fields you want to scrape, and happy scraping!


Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

Responses

Jubin Sanghvi July 17, 2017

Hey, I tried using the scraper and it works brilliantly. A few questions though: does LinkedIn block IPs if I try to scrape a lot of pages? Is there a limit to it? Is there any workaround?

John Winger August 15, 2017

An important development on LinkedIn scraping – a federal judge orders LinkedIn to unblock access for scraping of public data.

A judge has ruled that Microsoft’s LinkedIn network must allow a third-party company to scrape data publicly posted by LinkedIn users.
A US District Judge has granted hiQ Labs a preliminary injunction that provides access to LinkedIn data. LinkedIn tried to argue that hiQ Labs violated the 1986 Computer Fraud and Abuse Act by scraping data. The judge raised concerns about LinkedIn “unfairly leveraging its power in the professional networking market for an anticompetitive purpose,” and compared LinkedIn’s argument to allowing website owners to “block access by individuals or groups on the basis of race or gender discrimination.”

https://www.theverge.com/2017/8/15/16148250/microsoft-linkedin-third-party-data-access-judge-ruling

    Dwight July 12, 2018

    Is this still the current ruling, “must allow a third-party company to scrape data publicly posted by LinkedIn users”? And does this include individuals?

    Gabriel September 7, 2017

    Hi, I’m getting this error too. I’m hoping you can help me, please. Thank you.

My3 January 16, 2018

I am getting an InsecureRequestWarning and am not able to scrape anything.

Le Contemplateur May 23, 2018

After adapting the syntax to 3.x and solving the certificate warning, I get a null JSON file. What am I doing wrong? I’ll test it with 2.7 just to be sure it’s not a compatibility problem.

    João Silva May 30, 2018

    Hi! I’m getting exactly the same problem – a null JSON file. Does anyone already have a solution for that? Thanks.

Hugo Bernardes June 3, 2018

Hello, I am getting the following error:

Traceback (most recent call last):
File "scraper.py", line 4, in
from exceptions import ValueError
ModuleNotFoundError: No module named 'exceptions'

Can you help? Many thanks!

Karen Phillips October 4, 2018

I have the same error, ‘No module named exceptions’.

    ScrapeHero October 5, 2018

    ‘No module named …’ errors are resolved by installing the missing module using pip.

    Umesh S G December 18, 2018

    The “exceptions” module is no longer supported in Python 3.x. Alternatively, install the “builtins” module (pip install builtins) and then use the line “from builtins import ValueError” – it will work.

D January 6, 2019

Adding a cookie to the code, so that it accesses the site as a web browser would, solves the problem.

NN August 19, 2019

Does this still work in 2019?

    ScrapeHero August 19, 2019

    The code is workable, but LinkedIn blocks almost everything.

M October 18, 2019

Hi, I am getting an invalid syntax error for def in def readurl(). Any help would be great! Thanks!
