Web Scraping MercadoLibre Using Python


MercadoLibre is an Argentine e-commerce company that serves several Latin American countries. Web scraping MercadoLibre can show you how your products perform on the site and give you insight into your competitors.

This tutorial shows you how to scrape MercadoLibre product data using Python.

The Setup for Web Scraping MercadoLibre Using Python

Ensure you have the required Python packages before executing the code shown in this tutorial. This tutorial uses three external packages; install these Python packages using pip.

pip install requests bs4 unicodecsv

Here,

  • The requests library helps you manage HTTP requests.
  • bs4 stands for BeautifulSoup4 library, which lets you parse and extract data from the HTTP response.
  • unicodecsv enables you to write the extracted data to a CSV file.

BeautifulSoup can work with several parsers. This tutorial uses lxml, a fourth external package, which you can also install using pip.

pip install lxml

The code also uses argparse, which enables the script to accept arguments from the command line. You don’t have to install it, since it is part of Python’s standard library.

Data Scraped from MercadoLibre

This code for web scraping MercadoLibre scrapes three data points using BeautifulSoup:

  • Price
  • URL
  • Name

These are available on MercadoLibre’s search results page itself. You only need to find the attributes of these data points.

Screenshot showing the data points scraped while web scraping MercadoLibre

You can find the attributes using your browser’s inspect tool. Right-click on the product name and select Inspect; an inspect panel will appear with the HTML tag containing the name highlighted. The panel also shows which attributes the element has.

Screenshot showing the URL to be scraped while web scraping MercadoLibre

Check the HTML just above or below the name element in the inspect panel. There will be an anchor tag with an href attribute; its value is the product URL you need.

How to Scrape MercadoLibre: The Code

This section tells you how to write the code for web scraping MercadoLibre using Python. MercadoLibre data extraction with Python has three parts:

  • Import Packages
  • Define Functions
  • Integrate Everything

Import Packages

Import all the packages mentioned above.

import requests
from bs4 import BeautifulSoup
import unicodecsv as csv
import argparse

Define Functions

Define a separate function parse() to extract the data; this makes the code for MercadoLibre data extraction more readable.

The function accepts a URL and sends an HTTP request. At the time of writing, MercadoLibre does not block requests from this scraper, so the code does not set fake user-agent headers.

response = requests.get(url)

Next, parse the response using BeautifulSoup.

soup = BeautifulSoup(response.text,'lxml')

Each product is inside a div element with the class ‘ui-search-result__content-wrapper’. Select all of them using find_all().

products = soup.find_all('div', {'class': 'ui-search-result__content-wrapper'})

Iterate through the div elements and find:

  • the anchor tag containing the URL
  • the h2 tag containing the name
  • the span tag with the class ‘andes-money-amount__fraction’ containing the price

Then, store the extracted values in a dict, append it to a list, and return the list.

data = []
for product in products:
    href = product.find('a')['href']
    name = product.find('h2').text
    price = product.find('span', {'class': 'andes-money-amount__fraction'}).text
    data.append({
        "Name": name,
        "Price": price,
        "URL": href
    })
return data
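The loop above assumes every product block contains an anchor, an h2, and a price span; if one is missing, find() returns None and the lookup raises an error. The following is a minimal sketch of a more defensive version, not the tutorial's own code. SAMPLE_HTML and extract_products() are hypothetical names, the markup is invented to mimic the structure described above (the live page may differ), and the built-in "html.parser" is used only to keep the sample light; 'lxml' should work the same way.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in the tutorial.
SAMPLE_HTML = """
<div class="ui-search-result__content-wrapper">
  <a href="https://example.com/item-1"><h2>Televisor 32 pulgadas</h2></a>
  <span class="andes-money-amount__fraction">199.999</span>
</div>
<div class="ui-search-result__content-wrapper">
  <h2>Listing without link or price</h2>
</div>
"""


def extract_products(html):
    """Return one dict per product block, skipping blocks with a missing field."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = soup.find_all("div", {"class": "ui-search-result__content-wrapper"})
    data = []
    for block in blocks:
        link = block.find("a")
        title = block.find("h2")
        price = block.find("span", {"class": "andes-money-amount__fraction"})
        # find() returns None when nothing matches, so check before using it.
        if link is None or title is None or price is None:
            continue
        href = link.get("href")
        if not href:
            continue
        data.append({
            "Name": title.get_text(strip=True),
            "Price": price.get_text(strip=True),
            "URL": href,
        })
    return data


for row in extract_products(SAMPLE_HTML):
    print(row)
```

Skipping incomplete blocks is one possible design choice; you could instead store an empty string for the missing field so that every listing still appears in the output.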

The above function parses data from a search results URL, which includes the search term supplied at runtime. To build this URL, define a function linkMaker() that accepts the search term and returns the search results URL.

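def linkMaker(string):
    words = string.split()
    start = ""
    end = "#D[A:"

    # Build the hyphen-separated path and the %20-separated fragment
    for word in words:
        start = start + word + "-"
        end = end + word + "%20"
    # Drop the trailing "-" and "%20" before closing the bracket
    url = "https://listado.mercadolibre.com.ar/" + start[:-1] + end[:-3] + "]"
    return url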
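To see what linkMaker() produces, here is a rough equivalent written with str.join() and urllib.parse.quote() from the standard library. build_search_url() and BASE_URL are hypothetical names used for illustration. This sketch yields the same URL as linkMaker() for plain ASCII words; note that quote() also percent-encodes characters such as accented letters, which the original function leaves unchanged.

```python
from urllib.parse import quote

BASE_URL = "https://listado.mercadolibre.com.ar/"


def build_search_url(term):
    """Build a search results URL in the same shape linkMaker() returns."""
    words = term.split()
    slug = "-".join(words)           # path part: words joined by hyphens
    label = quote(" ".join(words))   # fragment part: spaces become %20
    return f"{BASE_URL}{slug}#D[A:{label}]"


print(build_search_url("smart tv"))
# https://listado.mercadolibre.com.ar/smart-tv#D[A:smart%20tv]
```

The part after # is a URL fragment, which HTTP clients do not send to the server, so the hyphenated path is the part of the URL that actually carries the search term in the request.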

Integrate Everything

The argparse module is responsible for accepting the search term when you run the script from the command line. Use its add_argument() method to specify each argument.

Then, parse_args() parses the arguments the script receives during execution; store the result in a variable. Here, the only argument is the search term.

argparser = argparse.ArgumentParser()
argparser.add_argument('search',help = 'Search Term')
args = argparser.parse_args()
search_term = args.search
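If you want to try the argument handling without running the script from a terminal, parse_args() also accepts an explicit list of strings. The sketch below assumes a helper named build_parser(), which is a hypothetical wrapper and not part of the tutorial's code.

```python
import argparse


def build_parser():
    """Create a parser with the single positional argument used in this tutorial."""
    parser = argparse.ArgumentParser()
    parser.add_argument("search", help="Search Term")
    return parser


# Passing a list simulates: python script.py "smart tv"
args = build_parser().parse_args(["smart tv"])
print(args.search)  # smart tv
```

On the command line, a multi-word search term needs quotes so that the shell passes it as one argument.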

Call linkMaker() to build a URL and then call parse() with that.

url = linkMaker(search_term)
scraped_data =  parse(url)

Finally, write the data returned by parse() to a CSV file using unicodecsv’s DictWriter class.

if scraped_data:
    print("Writing scraped data to mercadolibre_%s_data.csv" % (search_term))
    with open('mercadolibre_%s_data.csv' % (search_term), 'wb') as csvfile:
        fieldnames = ["Name", "Price", "URL"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
        writer.writeheader()
        for data in scraped_data:
            writer.writerow(data)
else:
    print("No data scraped")
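On Python 3, the standard-library csv module already handles Unicode text, so the same file can be written without unicodecsv. The following is a minimal sketch of that alternative, not the tutorial's method; write_rows() and FIELDNAMES are hypothetical names, and an in-memory buffer stands in for the file so the output is easy to inspect.

```python
import csv
import io

FIELDNAMES = ["Name", "Price", "URL"]


def write_rows(rows, fileobj):
    """Write a header line plus one fully quoted line per scraped product."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)


sample = [{"Name": "Televisor", "Price": "199.999", "URL": "https://example.com/item-1"}]
buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```

For a real file, open it in text mode, for example open(path, "w", newline="", encoding="utf-8"), instead of the binary 'wb' mode that unicodecsv expects.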

Here are the results of web scraping MercadoLibre.

Screenshots showing the results of web scraping MercadoLibre

And here is the complete code:

import requests
import argparse
import unicodecsv as csv
from bs4 import BeautifulSoup


def parse(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    products = soup.find_all('div', {'class': 'ui-search-result__content-wrapper'})
    data = []
    for product in products:
        href = product.find('a')['href']
        name = product.find('h2').text
        price = product.find('span', {'class': 'andes-money-amount__fraction'}).text
        data.append({
            "Name": name,
            "Price": price,
            "URL": href
        })
    return data


def linkMaker(string):
    words = string.split()
    start = ""
    end = "#D[A:"

    for word in words:
        start = start + word + "-"
        end = end + word + "%20"
    url = "https://listado.mercadolibre.com.ar/" + start[:-1] + end[:-3] + "]"
    return url


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument('search', help='Search Term')
    args = argparser.parse_args()
    search_term = args.search
    url = linkMaker(search_term)
    scraped_data = parse(url)

    if scraped_data:
        print("Writing scraped data to mercadolibre_%s_data.csv" % (search_term))
        with open('mercadolibre_%s_data.csv' % (search_term), 'wb') as csvfile:
            fieldnames = ["Name", "Price", "URL"]
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
            writer.writeheader()
            for data in scraped_data:
                writer.writerow(data)
    else:
        print("No data scraped")

Code Limitation

The above code will scrape MercadoLibre product data only as long as the site does not apply anti-scraping measures to your requests. If it starts blocking them, you will need techniques like user-agent headers and proxy rotation to keep scraping.

Moreover, you may have to reanalyze their HTML code whenever MercadoLibre changes its website structure. That is because changes in website structure can affect the attributes and tags of the data points, making it necessary to update the code to reflect the changes.
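The user-agent technique mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not a guaranteed way past any particular block: fetch_with_rotation() and USER_AGENTS are hypothetical names, the user-agent strings are placeholders, and the get parameter is expected to be a callable with the same interface as requests.get (accepting headers and timeout, and returning an object with status_code and text).

```python
import itertools

# Placeholder values; substitute real browser user-agent strings.
USER_AGENTS = ("ExampleBrowser/1.0", "ExampleBrowser/2.0")


def fetch_with_rotation(url, get, user_agents=USER_AGENTS, max_attempts=3, timeout=10):
    """Try the request with a different User-Agent header on each attempt.

    Returns the response body on the first HTTP 200, or None if every
    attempt fails.
    """
    agents = itertools.cycle(user_agents)
    for _ in range(max_attempts):
        headers = {"User-Agent": next(agents)}
        response = get(url, headers=headers, timeout=timeout)
        if response.status_code == 200:
            return response.text
    return None
```

In the scraper, you would call it as fetch_with_rotation(url, requests.get) and pass the returned text to BeautifulSoup. Taking get as a parameter keeps the function easy to try out with a stand-in callable before pointing it at the live site.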

Wrapping Up

You can use Python requests and BeautifulSoup to scrape MercadoLibre.com. This code shows how to do that, but you may need to change it when the website structure changes. Moreover, large-scale data extraction may not be possible, as this code does not use any techniques to bypass anti-scraping measures.

Therefore, for more robust code, contact ScrapeHero Services. We are a full-service web scraping service provider capable of enterprise-grade web scraping and crawling. We also have ready-made web scrapers on ScrapeHero Cloud, which you can try for free.
