How to fake and rotate User Agents using Python 3

A user agent is a string that a browser or app sends to each website you visit. A typical user agent string contains details such as the application type, operating system, software vendor, and software version of the requesting software. Web servers use this data to assess the capabilities of your computer and to optimize a page’s performance and display. User agents are sent as a request header called “User-Agent”.

Common format of a user agent string:

User-Agent: Mozilla/<version> (<system-information>) <platform> (<platform-details>) <extensions>

Every request made from a web browser contains a User-Agent header. When you scrape many pages from a website, using the same user agent consistently makes it easy to detect your scraper. One way to bypass that detection is to fake your user agent and change it with every request you make to a website. In this tutorial, we will show you how to fake user agents and randomize them to prevent getting blocked while scraping websites.

Before we look into rotating user agents, let’s see how to fake or spoof a user agent in a request.

How to fake user agents in Python 3

You can set your own user agent by passing it as a User-Agent request header when you make requests.

Using urllib

Let’s send a request to https://httpbin.org/user-agent to check that this works. HTTPBin’s user-agent endpoint echoes back the user agent it received in your request.

import urllib.request

url = 'https://httpbin.org/user-agent'
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'

# Build a request with a custom User-Agent header and fetch the page
request = urllib.request.Request(url, headers={'User-Agent': user_agent})
response = urllib.request.urlopen(request)
html = response.read()

If we print the response body, we should see the same user agent we sent in the request. The response from HTTPBin is JSON-encoded. We’ll skip decoding the JSON for now and just print the raw bytes.

b'{\n  "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"\n}\n'
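If you do want the decoded value, the standard json module can parse the bytes directly. A minimal sketch, assuming html holds the response body shown above:

import json

# Decode the JSON body and pull out the echoed user agent
print(json.loads(html)['user-agent'])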

Using Python Requests

Python Requests, “HTTP for Humans”, is another popular library used to handle HTTP in Python. You can set user agents as custom headers here as well.

import requests

url = 'https://httpbin.org/user-agent'
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'

# Pass the custom user agent as a request header
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
html = response.content
print(html)

The output looks like this.

b'{\n  "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"\n}\n'

Rotating User Agents

If you are blocked or banned while scraping a website, you can often get past those bans by sending random user agents instead of sticking to one. This makes it appear to the web server as if the requests are coming from different browsers. Note that we are assuming you already make requests through a set of IP addresses; randomizing user agents without also spreading requests across multiple IP addresses is pretty much useless.

Let’s first gather a list of the latest user agent strings for some popular browsers from https://developers.whatismybrowser.com/useragents/explore/

You can build this list manually by copy-pasting, or automate it with a scraper (if you don’t want to copy-paste every few weeks after a browser update). You can write a script to grab all the user agents you need from whatismybrowser.com and construct the list dynamically every time you initialize your web scraper. Once you have the list of user agents to rotate, the rest is easy: just use Python’s random.choice to pick one and put it in your request header.
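Here is a minimal sketch of such a script, assuming requests and BeautifulSoup are installed. The CSS selector is an assumption about the page’s markup, so inspect the live page and adjust it before relying on this:

import requests
from bs4 import BeautifulSoup

def fetch_user_agents(url):
    # Fetch the listing page and parse it
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Assumed selector -- verify against the live page's actual markup
    return [td.get_text(strip=True) for td in soup.select('td.useragent')]

user_agent_list = fetch_user_agents('https://developers.whatismybrowser.com/useragents/explore/')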

For now, we’ve gathered some user agents for Chrome and Firefox manually by copy-paste.

Rotating User-Agents in Scrapy

To rotate user agents in Scrapy, you need an additional middleware that is not bundled with Scrapy. There are a few of them available, but we will use Scrapy-UserAgents.

Install Scrapy-UserAgents using

pip install scrapy-useragents

Then add the following lines to your Scrapy project’s settings.py file:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_useragents.downloadermiddlewares.useragents.UserAgentsMiddleware': 500,
}

USER_AGENTS = [
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/57.0.2987.110 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/61.0.3163.79 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) '
     'Gecko/20100101 '
     'Firefox/55.0'),  # firefox
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/61.0.3163.91 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/62.0.3202.89 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/63.0.3239.108 '
     'Safari/537.36'),  # chrome
]

When you start the scraper, it will now rotate through the user agents you have in USER_AGENTS.
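To confirm the rotation works, here is a minimal sketch of a throwaway spider (its name is our own choice) that hits HTTPBin’s user-agent endpoint a few times and logs the header that was actually sent. dont_filter=True stops Scrapy’s duplicate filter from collapsing the repeated requests:

import scrapy

class UAVerifySpider(scrapy.Spider):
    name = 'ua_verify'

    def start_requests(self):
        # Repeat the same URL; dont_filter=True bypasses the dupe filter
        for _ in range(5):
            yield scrapy.Request('https://httpbin.org/user-agent', dont_filter=True)

    def parse(self, response):
        # Each response should echo a different user agent
        self.logger.info(response.text)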

Using urllib

import urllib.request
import random
user_agent_list = [
    #Chrome
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
    'Mozilla/5.0 (Windows NT 5.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    #Internet Explorer
    'Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)',
    'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (Windows NT 6.2; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)',
    'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)'
]
url = 'https://httpbin.org/user-agent'
# Let's make 5 requests and see which user agents are used
for i in range(1, 6):
    # Pick a random user agent
    user_agent = random.choice(user_agent_list)
    # Set the headers
    headers = {'User-Agent': user_agent}
    # Make the request
    request = urllib.request.Request(url, headers=headers)
    response = urllib.request.urlopen(request)
    html = response.read()

    print("Request #%d\nUser-Agent Sent:%s\nUser Agent Received by HTTPBin:" % (i, user_agent))
    print(html)
    print("-------------------\n\n")

The output looks like this:

    Request #1
    User-Agent Sent:Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko
    User Agent Received by HTTPBin:
    b'{\n  "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"\n}\n'
    -------------------
    
    
    Request #2
    User-Agent Sent:Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36
    User Agent Received by HTTPBin:
    b'{\n  "user-agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"\n}\n'
    -------------------
    
    
    Request #3
    User-Agent Sent:Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
    User Agent Received by HTTPBin:
    b'{\n  "user-agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"\n}\n'
    -------------------
    
    
    Request #4
    User-Agent Sent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
    User Agent Received by HTTPBin:
    b'{\n  "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"\n}\n'
    -------------------
    
    
    Request #5
    User-Agent Sent:Mozilla/5.0 (Windows NT 6.1; Win64; x64; Trident/7.0; rv:11.0) like Gecko
    User Agent Received by HTTPBin:
    b'{\n  "user-agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; Trident/7.0; rv:11.0) like Gecko"\n}\n'
    -------------------

Using Python Requests

import requests
import random
user_agent_list = [
    #Chrome
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
    'Mozilla/5.0 (Windows NT 5.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    #Internet Explorer
    'Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)',
    'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (Windows NT 6.2; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)',
    'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)'
]
url = 'https://httpbin.org/user-agent'
# Let's make 5 requests and see which user agents are used

#Using Requests 
for i in range(1, 6):
    # Pick a random user agent
    user_agent = random.choice(user_agent_list)
    # Set the headers
    headers = {'User-Agent': user_agent}
    # Make the request
    response = requests.get(url, headers=headers)

    print("Request #%d\nUser-Agent Sent:%s\nUser Agent Received by HTTPBin:" % (i, user_agent))
    print(response.content)
    print("-------------------\n\n")

The output should look similar to the output from the urllib example.

Learn more methods to prevent getting blocked: How to prevent getting blacklisted while scraping

Things to keep in mind while rotating user agents

1. Use user agent strings from recent versions of popular browsers and keep them updated

New versions of browsers are released almost every other week, making many user agent strings outdated. Outdated user agent strings can get you blocked quickly, as it is easy for servers and anti-scraping measures to identify your script as a web scraper. It is wise to have a script extract fresh user agent strings and populate user_agent_list before your scraper starts making requests, so that you never use outdated user agents.
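A minimal sketch of that startup step, reusing the hypothetical fetch_user_agents helper sketched earlier and falling back to a static list if the fetch fails:

# Assumes the fetch_user_agents helper sketched earlier in this post
FALLBACK_USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
]

def build_user_agent_pool():
    try:
        agents = fetch_user_agents('https://developers.whatismybrowser.com/useragents/explore/')
    except Exception:
        agents = []
    # Fall back to the static list if the fetch failed or returned nothing
    return agents or FALLBACK_USER_AGENTS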

2. Rotating User-Agents without rotating IP addresses is a bad idea

If you are making requests from a single IP address, rotating user agents will not keep you from getting blocked. It will increase your chances of getting blocked instead: anti-scraping measures will flag requests as unusual when they see the same IP address sending many requests with different user agents. So, before rotating user agents, use a rotating proxy, or at least a pool of IP addresses, to make your requests. User agent rotation is a lot more effective when you combine it with IP rotation, as sketched below.
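A minimal sketch of combining the two with Python Requests. The proxy addresses are placeholders, so substitute your own pool; user_agent_list is the list built earlier:

import random
import requests

# Placeholder proxy addresses -- replace with your own pool
proxy_pool = [
    'http://203.0.113.10:8080',
    'http://203.0.113.11:8080',
]

proxy = random.choice(proxy_pool)
user_agent = random.choice(user_agent_list)

# Route the request through a random proxy with a random user agent
response = requests.get(
    'https://httpbin.org/user-agent',
    headers={'User-Agent': user_agent},
    proxies={'http': proxy, 'https': proxy},
    timeout=10,
)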

3. Rotating User-Agents doesn’t guarantee that websites won’t block you

Rotating user agents can help you avoid getting blocked, but advanced anti-scraping services can see past your user agents and IP addresses. They use a variety of techniques, such as identifying browser fingerprints and scanning the IP addresses you send requests from, to flag a scraper. They can detect many of these patterns and block your scrapers even if you have rotated your user agents.


Responses

simabn March 17, 2018

There is a python lib called “fake-useragent” which helps getting a list of common UA.
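For reference, a minimal sketch of how fake-useragent is typically used (assuming it is installed with pip install fake-useragent):

from fake_useragent import UserAgent

ua = UserAgent()
# Pick a random real-world user agent string from the library's data
print(ua.random)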


    ScrapeHero March 18, 2018

    Great find. We had used fake user agent before, but at times we feel like the user agent lists are outdated.


MargaritaL May 23, 2018

I have to import “urllib.request” instead of “requests”, otherwise it does not work.


    Mikie June 18, 2018

    agreed, same for me. I think that was a typo.


      Hyyudu April 24, 2019

      requests is a different package; it should be installed separately, with “pip install requests”. But urllib.request is a standard library module that is always included in your Python installation.


    Javier July 2, 2019

    requests uses the urllib3 package; you need to install requests with pip.


Nick July 15, 2019

Hi there, thanks for the great tutorials!

Just wondering; if I’m randomly rotating both ips and user agents is there a danger in trying to visit the same URL or website multiple times from the same ip address but with a different user agent and that looking suspicious?

Cheers,
Nick


    ScrapeHero July 19, 2019

    Nick,
    There is no definite answer to these things – they all vary from site to site and time to time.


