How to scrape Yahoo Finance and extract stock market data using Python & LXML

Yahoo Finance is a good source for extracting financial data, whether it is stock market data, trading prices, or business-related news.

In this tutorial, we will extract the trading summary for a public company from Yahoo Finance (like http://finance.yahoo.com/quote/AAPL?p=AAPL). We’ll be extracting the following fields for this tutorial.

  1. Previous Close
  2. Open
  3. Bid
  4. Ask
  5. Day’s Range
  6. 52 Week Range
  7. Volume
  8. Average Volume
  9. Market Cap
  10. Beta
  11. PE Ratio
  12. EPS
  13. Earnings Date
  14. Dividend & Yield
  15. Ex-Dividend Date
  16. 1y Target Est

Below is a screenshot of what data we’ll be extracting from Yahoo Finance.

Scraping Logic

  1. Construct the URL of the search results page from Yahoo Finance. For example, here is the one for Apple: http://finance.yahoo.com/quote/AAPL?p=AAPL
  2. Download the HTML of the search result page using Python Requests.
  3. Parse the page using LXML – LXML lets you navigate the HTML tree structure using XPaths. We have predefined the XPaths for the details we need in the code.
  4. Save the data to a JSON file.
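The four steps above can be sketched as follows. The XPath selectors here are illustrative placeholders rather than the tutorial's exact ones (Yahoo's markup changes often), and the function names are ours:

```python
import json
import requests
from lxml import html

def build_url(ticker):
    # Step 1: construct the quote-page URL for a ticker symbol
    return "http://finance.yahoo.com/quote/%s?p=%s" % (ticker, ticker)

def parse_summary(page_html):
    # Step 3: walk the summary-table rows with XPath (placeholder selectors)
    parser = html.fromstring(page_html)
    data = {}
    for row in parser.xpath('//tr'):
        cells = row.xpath('.//td//text()')
        if len(cells) >= 2:
            data[cells[0]] = "".join(cells[1:])
    return data

def scrape(ticker):
    # Step 2: download the page, then Step 4: save the parsed data as JSON
    response = requests.get(build_url(ticker))
    data = parse_summary(response.text)
    with open("%s-summary.json" % ticker, "w") as fp:
        json.dump(data, fp, indent=4)
    return data
```

The real script downloadable below follows this same shape, with the XPaths tuned to Yahoo's actual summary table.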

Requirements

Install Python 3 and Pip

Here is a guide to install Python 3 in Linux – http://docs.python-guide.org/en/latest/starting/install3/linux/

Mac Users can follow this guide – http://docs.python-guide.org/en/latest/starting/install3/osx/

Windows Users go here – https://www.scrapehero.com/how-to-install-python3-in-windows-10/

Packages

For this web scraping tutorial using Python 3, we will need two packages for downloading and parsing the HTML: Python Requests, to download the page, and Python LXML, to parse it with XPaths. Both can be installed with pip, for example pip3 install requests lxml.


The Code

You can download the code from https://gist.github.com/scrapehero/516fc801a210433602fe9fd41a69b496

If you would like the code in Python 2, it is available at https://gist.github.com/scrapehero/b0c7426f85aeaba441d603bb81e1d0e2

Running the Scraper

Assume the script is named yahoo_finance.py. If you run the script from a command prompt or terminal with the -h flag, you will see the usage information:

python yahoo_finance.py -h

usage: yahoo_finance.py [-h] ticker

positional arguments:
  ticker

optional arguments:
  -h, --help  show this help message and exit

The ticker argument is the ticker symbol or stock symbol to identify a company.

To find the stock data for Apple Inc we would put the argument like this:

 python3 yahoo_finance.py aapl

This should create a JSON file called aapl-summary.json that will be in the same folder as the script.
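The -h output shown earlier comes from argparse. A minimal sketch of this command-line interface (the help text is our own, since only the usage output appears above) looks like:

```python
import argparse

def build_argparser():
    # Single positional argument: the ticker symbol, e.g. "aapl"
    argparser = argparse.ArgumentParser()
    argparser.add_argument("ticker", help="ticker symbol of the company")
    return argparser

# Passing the argument list explicitly, instead of reading sys.argv
args = build_argparser().parse_args(["aapl"])
print(args.ticker)  # aapl
```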

The output file would look similar to this:

{
    "Previous Close": "139.52", 
    "Open": "138.92", 
    "Bid": "138.69 x 100", 
    "Ask": "139.01 x 4600", 
    "Day's Range": "138.82 - 139.80", 
    "52 Week Range": "89.47 - 140.28", 
    "Volume": "16,641,812", 
    "Avg. Volume": "28,451,631", 
    "Market Cap": "729.58B", 
    "Beta": "1.36", 
    "PE Ratio (TTM)": "16.69", 
    "EPS (TTM)": 8.33, 
    "Earnings Date": "2017-04-24 to 2017-04-28", 
    "Dividend & Yield": "2.28 (1.63%)", 
    "Ex-Dividend Date": "N/A", 
    "1y Target Est": 142.48, 
    "url": "http://finance.yahoo.com/quote/aapl?p=aapl", 
    "ticker": "aapl"
}
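Because the scraper writes standard JSON, downstream code can load the summary back with the json module. A small round-trip sketch, using a fragment of the sample output above inlined as a string:

```python
import json

# A fragment of the sample output above, inlined for illustration
sample = '{"Previous Close": "139.52", "Market Cap": "729.58B", "ticker": "aapl"}'
summary = json.loads(sample)
print(summary["Market Cap"])  # 729.58B
```

In practice you would open aapl-summary.json and use json.load on the file object instead.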


Let us know in the comments how this scraper worked for you.

Known Limitations

This code should work for grabbing stock market data of most companies. However, if you want to scrape thousands of pages frequently (say, multiple times per hour), there are some important things you should be aware of; you can read about them in How to build and run scrapers on a large scale and How to prevent getting blacklisted while scraping.

If you need professional help with scraping complex websites, contact us.


Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.


Responses

soulsoldseparately July 16, 2018

Is the ‘print’ function the main difference between using Python 2.7 vs 3.6?


Jose Fernandes (@joseferpt) September 15, 2018

I would like to scrape the Statistics and Analysis pages. Can you please share the code, or indicate the changes to make to the summary code shared above? Thanks in advance.


Jeff December 23, 2018

Thanks for the great work you do. I’ve been wanting to do something like this for quite some time and you provided me the right motivation. I hope you don’t mind, but I’ve modified your code a bit to add some flexibility. You use the actual webpage people get at Yahoo Finance just for a few pieces of data. For the rest you use an address that returns a nice JSON blob that you use to fill in the rest of the information. It works great, but the same custom address doesn’t return much for mutual funds or ETFs. I was able to find a similar address that could be used for mutual funds and ETFs, but think a better approach is to just use the publicly known webpage. I was able to manipulate that and produce summary information for stocks (same output as your script), mutual funds and ETFs.

The other advantage of doing it this way is that there’s a vast amount of other information available in the JSON blobs that I grab. To find out what is available I suggest using http://beautifytools.com/html-beautifier.php and loading the Yahoo Financial summary page url. Once you click on “Beautify html” you’re presented with a nice tree format of what’s in there. This view will also show where the paths came from for the data I do store.

Here’s the modified code:

from lxml import html
import requests
from time import sleep
import json
import argparse
from collections import OrderedDict

def matching(string, begTok, endTok):
    # Find the location of the beginning token
    start = string.find(begTok)
    stack = []
    # Append it to the stack
    stack.append(start)
    end = -1
    # Loop through the rest of the string until we find the matching ending token
    for i in range(start + 1, len(string)):
        if begTok in string[i]:
            stack.append(i)
        elif endTok in string[i]:
            stack.pop()
            if len(stack) == 0:
                # Removed the last begTok, so we're done
                end = i + 1
                break
    return end

def parse(ticker):
    # Yahoo Finance summary for a stock, mutual fund or ETF
    url = "http://finance.yahoo.com/quote/%s?p=%s" % (ticker, ticker)
    response = requests.get(url, verify=False)
    print("Parsing %s" % (url))
    sleep(4)
    summary_data = OrderedDict()

    # Locate the _context JSON blob, which tells us whether this is an equity, a mutual fund or an ETF
    contextStart = response.text.find('"_context"')
    contextEnd = contextStart + matching(response.text[contextStart:], '{', '}')

    # Locate the QuoteSummaryStore JSON blob
    summaryStart = response.text.find('"QuoteSummaryStore"')
    summaryEnd = summaryStart + matching(response.text[summaryStart:], '{', '}')

    # Locate the ticker quote JSON blob
    streamStart = response.text.find('"StreamDataStore"')
    quoteStart = streamStart + response.text[streamStart:].find("%s" % ticker.upper()) - 1
    quoteEnd = quoteStart + matching(response.text[quoteStart:], '{', '}')

    try:
        json_loaded_context = json.loads('{' + response.text[contextStart:contextEnd] + '}')
        json_loaded_summary = json.loads('{' + response.text[summaryStart:summaryEnd] + '}')
        # Didn't end up needing this for the summary details, but there's lots of good data there
        json_loaded_quote = json.loads('{' + response.text[quoteStart:quoteEnd] + '}')
        store = json_loaded_summary["QuoteSummaryStore"]
        if "EQUITY" in json_loaded_context["_context"]["quoteType"]:
            # Define all the data that appears on the Yahoo Finance summary page for a stock
            # Use http://beautifytools.com/html-beautifier.php to understand where each path came from or to add any additional data
            prev_close = store["summaryDetail"]["previousClose"]['fmt']
            mark_open = store["summaryDetail"]["open"]['fmt']
            bid = store["summaryDetail"]["bid"]['fmt'] + " x " + str(store["summaryDetail"]["bidSize"]['raw'])
            ask = store["summaryDetail"]["ask"]['fmt'] + " x " + str(store["summaryDetail"]["askSize"]['raw'])
            day_range = store["summaryDetail"]["regularMarketDayLow"]['fmt'] + " - " + store["summaryDetail"]["regularMarketDayHigh"]['fmt']
            year_range = store["summaryDetail"]["fiftyTwoWeekLow"]['fmt'] + " - " + store["summaryDetail"]["fiftyTwoWeekHigh"]['fmt']
            volume = store["summaryDetail"]["volume"]['longFmt']
            avg_volume = store["summaryDetail"]["averageVolume"]['longFmt']
            market_cap = store["summaryDetail"]["marketCap"]['fmt']
            beta = store["summaryDetail"]["beta"]['fmt']
            PE = store["summaryDetail"]["trailingPE"]['fmt']
            eps = store["defaultKeyStatistics"]["trailingEps"]['fmt']
            earnings_list = store["calendarEvents"]['earnings']
            datelist = []
            for i in earnings_list['earningsDate']:
                datelist.append(i['fmt'])
            earnings_date = ' to '.join(datelist)
            div = store["summaryDetail"]["dividendRate"]['fmt'] + " (" + store["summaryDetail"]["dividendYield"]['fmt'] + ")"
            ex_div_date = store["summaryDetail"]["exDividendDate"]['fmt']
            y_Target_Est = store["financialData"]["targetMeanPrice"]['raw']

            # Store ordered pairs to be written to a file
            summary_data.update({'Previous Close': prev_close, 'Open': mark_open, 'Bid': bid, 'Ask': ask,
                "Day's Range": day_range, '52 Week Range': year_range, 'Volume': volume,
                'Avg. Volume': avg_volume, 'Market Cap': market_cap, 'Beta (3Y Monthly)': beta,
                'PE Ratio (TTM)': PE, 'EPS (TTM)': eps, 'Earnings Date': earnings_date,
                'Forward Dividend & Yield': div, 'Ex-Dividend Date': ex_div_date,
                '1y Target Est': y_Target_Est, 'ticker': ticker, 'url': url})
            return summary_data
        elif "MUTUALFUND" in json_loaded_context["_context"]["quoteType"]:
            # Define all the data that appears on the Yahoo Finance summary page for a mutual fund
            prev_close = store["summaryDetail"]["previousClose"]['fmt']
            ytd_return = store["summaryDetail"]["ytdReturn"]['fmt']
            exp_rat = store["defaultKeyStatistics"]["annualReportExpenseRatio"]['fmt']
            category = store["fundProfile"]["categoryName"]
            last_cap_gain = store["defaultKeyStatistics"]["lastCapGain"]['fmt']
            morningstar_rating = store["defaultKeyStatistics"]["morningStarOverallRating"]['raw']
            morningstar_risk_rating = store["defaultKeyStatistics"]["morningStarRiskRating"]['raw']
            sustainability_rating = store["esgScores"]["sustainScore"]['raw']
            net_assets = store["summaryDetail"]["totalAssets"]['fmt']
            beta = store["defaultKeyStatistics"]["beta3Year"]['fmt']
            yld = store["summaryDetail"]["yield"]['fmt']
            five_year_avg_ret = store["fundPerformance"]["performanceOverview"]["fiveYrAvgReturnPct"]['fmt']
            holdings_turnover = store["defaultKeyStatistics"]["annualHoldingsTurnover"]['fmt']
            div = store["defaultKeyStatistics"]["lastDividendValue"]['fmt']
            inception_date = store["defaultKeyStatistics"]["fundInceptionDate"]['fmt']

            # Store ordered pairs to be written to a file
            summary_data.update({'Previous Close': prev_close, 'YTD Return': ytd_return,
                'Expense Ratio (net)': exp_rat, 'Category': category, 'Last Cap Gain': last_cap_gain,
                'Morningstar Rating': morningstar_rating, 'Morningstar Risk Rating': morningstar_risk_rating,
                'Sustainability Rating': sustainability_rating, 'Net Assets': net_assets,
                'Beta (3Y Monthly)': beta, 'Yield': yld, '5y Average Return': five_year_avg_ret,
                'Holdings Turnover': holdings_turnover, 'Last Dividend': div,
                'Average for Category': 'N/A', 'Inception Date': inception_date,
                'ticker': ticker, 'url': url})
            return summary_data
        elif "ETF" in json_loaded_context["_context"]["quoteType"]:
            # Define all the data that appears on the Yahoo Finance summary page for an ETF
            prev_close = store["summaryDetail"]["previousClose"]['fmt']
            mark_open = store["summaryDetail"]["open"]['fmt']
            bid = store["summaryDetail"]["bid"]['fmt'] + " x " + str(store["summaryDetail"]["bidSize"]['raw'])
            ask = store["summaryDetail"]["ask"]['fmt'] + " x " + str(store["summaryDetail"]["askSize"]['raw'])
            day_range = store["summaryDetail"]["regularMarketDayLow"]['fmt'] + " - " + store["summaryDetail"]["regularMarketDayHigh"]['fmt']
            year_range = store["summaryDetail"]["fiftyTwoWeekLow"]['fmt'] + " - " + store["summaryDetail"]["fiftyTwoWeekHigh"]['fmt']
            volume = store["summaryDetail"]["volume"]['longFmt']
            avg_volume = store["summaryDetail"]["averageVolume"]['longFmt']
            net_assets = store["summaryDetail"]["totalAssets"]['fmt']
            nav = store["summaryDetail"]["navPrice"]['fmt']
            yld = store["summaryDetail"]["yield"]['fmt']
            ytd_return = store["defaultKeyStatistics"]["ytdReturn"]['fmt']
            beta = store["defaultKeyStatistics"]['beta3Year']['fmt']
            exp_rat = store["fundProfile"]["feesExpensesInvestment"]["annualReportExpenseRatio"]['fmt']
            inception_date = store["defaultKeyStatistics"]["fundInceptionDate"]['fmt']

            # Store ordered pairs to be written to a file
            summary_data.update({'Previous Close': prev_close, 'Open': mark_open, 'Bid': bid, 'Ask': ask,
                "Day's Range": day_range, '52 Week Range': year_range, 'Volume': volume,
                'Avg. Volume': avg_volume, 'Net Assets': net_assets, 'NAV': nav,
                'PE Ratio (TTM)': 'N/A', 'Yield': yld, 'YTD Return': ytd_return,
                'Beta (3Y Monthly)': beta, 'Expense Ratio (net)': exp_rat,
                'Inception Date': inception_date, 'ticker': ticker, 'url': url})
            return summary_data
    except Exception:
        print("Failed to parse json response")
        return {"error": "Failed to parse json response"}

if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument('ticker', help='')
    args = argparser.parse_args()
    ticker = args.ticker
    print("Fetching data for %s" % (ticker))
    scraped_data = parse(ticker)
    print("Writing data to output file")
    with open('%s-summary.json' % (ticker), 'w') as fp:
        json.dump(scraped_data, fp, indent=4)
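A quick way to sanity-check the brace-matching helper in the comment above is to run it on a small string. Here is a standalone copy (same logic, with an added end = -1 guard for the case where no matching token is found), extracting a nested JSON object:

```python
def matching(string, begTok, endTok):
    # Return the index just past the token that closes the first begTok
    start = string.find(begTok)
    stack = [start]
    end = -1
    for i in range(start + 1, len(string)):
        if begTok in string[i]:
            stack.append(i)
        elif endTok in string[i]:
            stack.pop()
            if not stack:
                end = i + 1
                break
    return end

text = 'prefix {"a": {"b": 1}} suffix'
print(text[text.find("{"):matching(text, "{", "}")])  # {"a": {"b": 1}}
```

Note that because the comparison is done one character at a time, this only works for single-character tokens such as braces.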

