How to scrape Yahoo Finance and extract stock market data using Python & LXML

Yahoo Finance is a good source of financial data, whether you need stock market data, trading prices, or business-related news.

In this tutorial, we will extract the trading summary for a public company from Yahoo Finance. We will extract the following fields:

  1. Previous Close
  2. Open
  3. Bid
  4. Ask
  5. Day’s Range
  6. 52 Week Range
  7. Volume
  8. Average Volume
  9. Market Cap
  10. Beta
  11. PE Ratio
  12. EPS
  13. Earnings Date
  14. Dividend & Yield
  15. Ex-Dividend Date
  16. 1y Target Est

Below is a screenshot of what data we’ll be extracting from Yahoo Finance.

Scraping Logic

  1. Construct the URL of the search results page from Yahoo Finance. For example, here is the one for Apple-
  2. Download HTML of the search result page using Python Requests
  3. Parse the page using LXML. LXML lets you navigate the HTML tree structure using XPath expressions. We have predefined the XPaths for the details we need in the code.
  4. Save the data to a JSON file.
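
The four steps above can be sketched roughly as follows. This is a minimal illustration written for Python 3, not the tutorial's actual script: the URL template, the sample HTML, and the function names are assumptions made up for the example, and the real Yahoo Finance markup will differ. The row XPath is the one quoted in the comments further down. To keep the sketch runnable offline, the download step is defined but not called, and the parsing is exercised on an inline HTML string.

```python
import json

import requests
from lxml import html

# Hypothetical URL template; substitute the real quote page URL.
URL_TEMPLATE = "https://example.com/quote/{ticker}"

# Made-up markup standing in for a downloaded summary table.
SAMPLE_HTML = """
<div data-test="summary-table">
  <table>
    <tr><td>Previous Close</td><td>139.52</td></tr>
    <tr><td>Open</td><td>138.92</td></tr>
  </table>
</div>
"""


def build_url(ticker):
    """Step 1: construct the page URL for a ticker."""
    return URL_TEMPLATE.format(ticker=ticker)


def download(url):
    """Step 2: download the page HTML with Requests."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def parse_summary(page_html):
    """Step 3: walk the table rows with XPath and collect label/value pairs."""
    tree = html.fromstring(page_html)
    rows = tree.xpath('//div[contains(@data-test, "summary-table")]//tr')
    summary = {}
    for row in rows:
        cells = [cell.text_content().strip() for cell in row.xpath("./td")]
        if len(cells) == 2:
            summary[cells[0]] = cells[1]
    return summary


def save_json(data, path):
    """Step 4: write the extracted fields to a JSON file."""
    with open(path, "w") as handle:
        json.dump(data, handle, indent=4)


# Parse the inline sample instead of hitting the network.
summary = parse_summary(SAMPLE_HTML)
print(summary)
```

In a full run the pieces would chain together as `save_json(parse_summary(download(build_url("aapl"))), "aapl-summary.json")`, with the hypothetical URL template replaced by the real one.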


For this web scraping tutorial using Python, we will need some packages for downloading and parsing the HTML. Below are the package requirements.

  • Python 2.7
  • PIP, to install the following packages in Python
  • Python Requests, to make requests and download the HTML content of the pages
  • Python LXML, for parsing the HTML tree structure using XPath

The Code

The code is self-explanatory.

You can download the code from the link here, if the embed above does not work.

Running the Scraper

If you run the script from a command prompt or terminal with the -h flag, it prints its usage:

python -h
usage: [-h] ticker

positional arguments:

optional arguments:
  -h, --help  show this help message and exit

The ticker argument is the ticker symbol (stock symbol) that identifies a company.
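
A command-line interface of this shape can be built with the standard-library argparse module, which is also what the code quoted in the comments uses. The sketch below is an illustration only: the description and help strings are assumptions, not the script's actual text.

```python
import argparse


def build_parser():
    """Create a parser with a single required positional argument."""
    parser = argparse.ArgumentParser(
        description="Fetch a trading summary for one ticker."  # illustrative text
    )
    # The help string is an assumption made for this example.
    parser.add_argument("ticker", help="ticker symbol, for example aapl")
    return parser


# An explicit list keeps the sketch runnable without real command-line input.
# A script would call parse_args() with no arguments to read sys.argv instead.
args = build_parser().parse_args(["aapl"])
print(args.ticker)
```

Because the argument is positional, argparse exits with a usage message when it is missing, and the -h flag is added automatically.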

To find the stock data for Apple Inc., pass its ticker as the argument like this:

 python aapl

This should create a JSON file called aapl-summary.json that will be in the same folder as the script.
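
As a rough sketch of how such a file could be written and read back, the example below builds the file name from the ticker and uses the standard json module. The function names and the sample values are assumptions for illustration, and a temporary folder is used here instead of the script's own folder. The `to_float` helper is also hypothetical: scraped values are often display strings such as "16,641,812", so some conversion is usually needed before doing arithmetic on them.

```python
import json
import os
import tempfile
from collections import OrderedDict


def write_summary(summary, ticker, folder):
    """Write the summary to <ticker>-summary.json inside folder."""
    path = os.path.join(folder, "%s-summary.json" % ticker)
    with open(path, "w") as handle:
        json.dump(summary, handle, indent=4)
    return path


def read_summary(path):
    """Load the file back, keeping the fields in their original order."""
    with open(path) as handle:
        return json.load(handle, object_pairs_hook=OrderedDict)


def to_float(value):
    """Convert a number or a string like "16,641,812" to float; None if not numeric."""
    if isinstance(value, (int, float)):
        return float(value)
    try:
        return float(value.replace(",", ""))
    except ValueError:
        return None


# Sample data for the sketch only.
sample = OrderedDict([
    ("Volume", "16,641,812"),
    ("EPS (TTM)", 8.33),
    ("Ex-Dividend Date", "N/A"),
    ("ticker", "aapl"),
])

folder = tempfile.mkdtemp()
path = write_summary(sample, "aapl", folder)
loaded = read_summary(path)
print(os.path.basename(path))
print(to_float(loaded["Volume"]))
```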

The output file would look similar to this:

    {
        "Previous Close": "139.52",
        "Open": "138.92",
        "Bid": "138.69 x 100",
        "Ask": "139.01 x 4600",
        "Day's Range": "138.82 - 139.80",
        "52 Week Range": "89.47 - 140.28",
        "Volume": "16,641,812",
        "Avg. Volume": "28,451,631",
        "Market Cap": "729.58B",
        "Beta": "1.36",
        "PE Ratio (TTM)": "16.69",
        "EPS (TTM)": 8.33,
        "Earnings Date": "2017-04-24 to 2017-04-28",
        "Dividend & Yield": "2.28 (1.63%)",
        "Ex-Dividend Date": "N/A",
        "1y Target Est": 142.48,
        "url": "",
        "ticker": "aapl"
    }


Let us know in the comments how this scraper worked for you.

Known Limitations

This code should work for grabbing stock market data of most companies. However, if you want to scrape thousands of pages and do it frequently (say, multiple times per hour), there are some important things you should be aware of. You can read about them in How to build and run scrapers on a large scale and How to prevent getting blacklisted while scraping.
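
One basic precaution when fetching many tickers is to pause between requests and to keep going when a single ticker fails. The sketch below shows that idea only. The function names, the five-second delay, and the stand-in fetch function are all assumptions for illustration, and a fixed delay is no guarantee against being blocked.

```python
import time


def fetch_all(tickers, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch(ticker) for each ticker, pausing between consecutive requests."""
    results = {}
    for index, ticker in enumerate(tickers):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        try:
            results[ticker] = fetch(ticker)
        except Exception as error:
            # Record the failure and continue with the remaining tickers.
            results[ticker] = {"error": str(error)}
    return results


def fake_fetch(ticker):
    """Stand-in for a real scraping function, so the sketch runs offline."""
    if ticker == "bad":
        raise ValueError("Failed to parse json response")
    return {"ticker": ticker}


# Record the pauses in a list instead of actually sleeping.
pauses = []
results = fetch_all(["aapl", "bad", "msft"], fake_fetch, sleep=pauses.append)
print(results)
print(pauses)
```

Passing the sleep function in as a parameter keeps the loop easy to test; in real use the default `time.sleep` applies and `fake_fetch` would be replaced by the actual scraper function.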


Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

10 thoughts on “How to scrape Yahoo Finance and extract stock market data using Python & LXML”

  1. I tried running your code, but I keep getting hassled by a

    SyntaxError: 'return' outside function

    I think I have your indentation right. Is there any way to post the code somewhere so I can get it straight from the horse’s mouth?

      1. Thanks! No more indentation errors. I adapted it for Py3 and it works. Now I’m trying to scrape other elements from the new Yahoo Finance site.

  2. If you don’t feel like writing a scraper, I suggest looking at Python’s yahoo-finance package. You can get all of the same data with a few lines of code.

  3. This code (raw_table_key = table_data.xpath('.//td[@class="C(black)"]//text()')) is not working in Python 2.7, so I changed it to this: raw_table_key = table_data.xpath('.//td[contains(@class,"C(black)")]//text()'). Anyway, very nice code and helpful to me. Thanks.

  4. I tried your code and I get:

    Fetching data for aapl
    Writing data to output file

    When I open the output file, it is very limited:
    "": "172.29",
    "url": "",
    "ticker": "aapl",
    "1y Target Est": 172.29,
    "EPS (TTM)": 8.808,
    "Earnings Date": "2017-10-23 to 2017-10-27"

    Am I doing something wrong? I use Atom as my editor and am running on a MacBook.

  5. I am trying to use your code just to get a current price for securities. The slimmed down version looks like this:

    from lxml import html
    import requests
    from exceptions import ValueError
    from time import sleep
    import json
    import argparse
    from collections import OrderedDict

    def parse(ticker):
        url = ""%(ticker,ticker)
        response = requests.get(url)
        parser = html.fromstring(response.text)
        summary_table = parser.xpath('//div[contains(@data-test,"summary-table")]//tr')
        summary_data = OrderedDict()
        other_details_json_link = "{0}?formatted=true&lang=en-US&region=US&modules=financialData".format(ticker)
        summary_json_response = requests.get(other_details_json_link)
        # json.loads raises ValueError on a malformed response
        try:
            json_loaded_summary = json.loads(summary_json_response.text)
            return json_loaded_summary["quoteSummary"]["result"][0]["financialData"]["currentPrice"]['raw']
        except ValueError:
            print "Failed to parse json response"
            return {"error":"Failed to parse json response"}

    if __name__=="__main__":
        argparser = argparse.ArgumentParser()
        argparser.add_argument('ticker',help = '')
        args = argparser.parse_args()
        ticker = args.ticker
        print parse(ticker)

    This works fine except for getting prices of ETFs (e.g. EFA, VWO).

    Yahoo provides these prices but the data is slightly different. My question is how to modify your code to work with ETFs. Also, in writing this script how do you view the XML that needs to be parsed since it is dynamically created?
