Alternative Data – The best-kept secret on Wall Street

Alternative Data is hiding in plain sight

Data drives the world, and stock markets around the globe execute billions of trades and move trillions of dollars based on it. The financial industry has been at the forefront of using technology and data to propel itself forward, and that process continues to this day. The industry already recognizes the value of data and technology, so the value proposition does not need to be argued again. One such resource, termed Alternative Data, is extremely valuable and is used by analysts covering publicly traded companies to form a view of earnings and future growth.

However, in this article we will be talking about something new – access to a secret world of alternative data that is hiding in plain sight.

I have too much data already (all from the same sources – unfortunately)

Most traders and analysts get their market data from the same old sources. This data is available to almost everyone, so no firm gains a competitive advantage over another by accessing it. One edge may be how quickly a firm receives the data, but speed matters far more to high-frequency trading (HFT) than to market analysis.

There is another kind of data that is being used very effectively, though not yet widely: publicly available data gathered to build market intelligence about businesses.

The problem with this data is that you can’t get it from one place or on one terminal, and while it is publicly available, it isn’t obvious.

Uncovering this data requires a high degree of insight into a business (which an analyst should already have) and into the key parameters that sway earnings or other metrics that drive stock movement.

The analyst will know whether the metrics that matter this quarter are pure sales, rising inventory, the pace of user acquisition, the chatter about the company’s product and its “cool” factor on social media, the number of bestsellers among the company’s products, or the quality of its reviews online.

There are many metrics that can serve as excellent proxies for the metrics that matter, even when the actual metric itself is out of reach.

As more businesses rely on e-commerce, social media, and online supply chain management, a great deal of data shows up online that provides a remarkably accurate proxy for the metrics that matter.

Even if you already use alternative data from sources such as Quandl (now part of Nasdaq), other firms have access to exactly the same data; there is nothing unique about it. Your analysis of that data may differ, but the core data is the same and holds no competitive advantage for anyone.

How does one get there?

There is no single solution or single answer to this, and the answer is never easy. If it were easy, everyone would have access to this information and it would have no value. Getting there relies on two capable partners: analysts with in-depth knowledge of the stock or industry they cover, and a data gathering company such as ScrapeHero.

Together, we can discuss what kind of data is needed, what the potential sources of that data could be, and whether the data in those sources is an accurate proxy for the metrics that will eventually predict the trends of the stock or the industry. The following sections cover the general steps in this process.


The analyst brings the subject matter expertise about the company or industry. For example, an analyst covering a hospitality business would have intimate knowledge of the company, the industry, and the current coverage and consensus of the analysts covering the stock.

With that information, they can identify a few primary drivers of the consensus, and from those primary drivers create a “wish list” of data points they would love to have in order to build a better model. For example, an analyst covering Apple Inc. would love to know the latest quarter’s iPhone sales. That may be the primary driver, and it ends up on the “wish list” because it really is just that: a wish. This number is almost impossible to get directly.

Once the primary drivers have been identified, the next step is to create a set of secondary data points that could be good proxies for the primary metric, whether individually or in combination. Secondary data points for Apple iPhone sales could include sales figures from large e-commerce sites, online inventory (or the lack thereof), and the “buzz” around the new iPhone. These and many other secondary data points can be good proxies for anticipated iPhone sales.
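To make the idea concrete, here is a minimal sketch of how secondary signals could be blended into a single composite score. Every metric name, number, and weight below is a hypothetical illustration for this article, not an actual model or deliverable.

```python
# Combine several normalized proxy signals into one composite score.
# All signal names, values and weights are hypothetical examples.

def normalize(value, baseline):
    """Express a metric as a ratio to its prior-quarter baseline."""
    return value / baseline

def composite_proxy(signals, weights):
    """Weighted average of normalized proxy signals."""
    total = sum(weights.values())
    return sum(normalize(*signals[name]) * w for name, w in weights.items()) / total

# Hypothetical proxies for quarterly iPhone sales:
# each entry is (current observation, prior-quarter baseline).
signals = {
    "ecommerce_units": (120_000, 100_000),  # units sold on large e-commerce sites
    "in_stock_rate":   (0.82, 0.90),        # share of retailers showing stock
    "social_buzz":     (54_000, 40_000),    # mentions of the new model
}
weights = {"ecommerce_units": 0.5, "in_stock_rate": 0.2, "social_buzz": 0.3}

score = composite_proxy(signals, weights)
print(round(score, 3))  # a score above 1.0 suggests the proxies point to growth
```

A real model would of course be fit against historical reported figures rather than hand-picked weights; the sketch only shows the shape of the proxy-combination step.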

While the examples above are somewhat simplistic and may sound “obvious,” the analyst who is truly driven to excel will spend a lot of time and energy identifying these metrics.

Data Crawling Expertise

Once the metrics have been identified in fairly fine detail, it is time for us to come in. We work collaboratively, in consultation with the analyst, to identify the alternative data sources, assess the likelihood of the data being of good quality, and run through many of the same steps outlined in this Essential checklist. We have considerable experience partnering with Wall Street firms and analysts, and you will need the right partner for this to be a moneymaker.

We are not just another web scraping company with a funny name – we mean serious business and have the expertise to prove it.

Some metrics may be discarded as low quality, unreliable, or simply unavailable. The discussion also surfaces many new metrics, based on our experience working with a huge number of datasets over the years.

The final result will be a list of “must have” alternative data points and an optional set of “nice to have” alternative data points.

Pilot project

The next step is a Pilot phase to validate all the assumptions made so far. This phase is tightly bound by cost, timelines, and specific outcomes, and provides a great opportunity to validate the idea with the least risk, within a finite amount of time (usually weeks) and at a finite cost.

This is an essential exercise before people break out the champagne and celebrate. It helps validate the feasibility of extracting the data, the quality of the data and whether the data is a good proxy for the primary metric.

Full set of Data

Once the Pilot project validates the process, the data, and the eventual model, it is time to deploy the full solution. It is best to time the first full run of the data gathering and analysis phases to measure how long the overall process takes. That tells us the latest point at which we can kick off the whole process ahead of the earnings deadline, so that we work with the freshest data and still complete the analysis in time.

ScrapeHero has expertise in various Big Data tools and cloud-based scalable technologies for complex, intensive analysis, and that expertise comes in handy when deciding how to process terabytes of data using thousands of processors in the cloud.
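As a toy illustration of the fan-out idea, the sketch below splits a dataset into chunks and summarizes them across a pool of workers. It is deliberately simplified: a real terabyte-scale job would use a cluster framework (Spark, Dask, or similar) rather than one machine’s thread pool, and the record shape here is invented for the example.

```python
# Fan per-chunk analysis out over a pool of workers, then merge the results.
# The record shape ({"value": ...}) and the summary are illustrative only.
from concurrent.futures import ThreadPoolExecutor

def summarize_chunk(chunk):
    """Stand-in for per-chunk analysis: count records and total one field."""
    return len(chunk), sum(rec["value"] for rec in chunk)

def process_in_parallel(chunks, workers=4):
    """Map summarize_chunk over all chunks concurrently and merge."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts, totals = zip(*pool.map(summarize_chunk, chunks))
    return sum(counts), sum(totals)

chunks = [[{"value": i} for i in range(100)] for _ in range(8)]
print(process_in_parallel(chunks))  # (800, 39600)
```

The same map-then-merge shape scales up naturally: swap the thread pool for a distributed executor and the chunks for partitions of the full dataset.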

Gather data on a periodic basis automatically and refine the model

Data gathering can be set to run on a schedule, and while simple alternative data models can run like clockwork, they still require maintenance and quality checks.
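A minimal sketch of what one scheduled run with a quality gate might look like is shown below. The fetch function, field names, and thresholds are all hypothetical placeholders for whatever a real pipeline collects and enforces.

```python
# One scheduled gather-and-validate run: fetch a batch, reject it if it
# fails basic quality checks, otherwise hand it on to the model.
# All field names and thresholds are illustrative assumptions.

def quality_check(records, required_fields, min_rows):
    """Reject a batch that is too small or has records missing fields."""
    if len(records) < min_rows:
        return False
    return all(field in rec for rec in records for field in required_fields)

def run_once(fetch):
    """A single scheduled run; a scheduler would call this periodically."""
    records = fetch()
    if not quality_check(records, required_fields={"sku", "price"}, min_rows=2):
        raise ValueError("batch failed quality checks; not feeding the model")
    return records

# Stand-in for a real scraper run:
sample_fetch = lambda: [{"sku": "A1", "price": 999}, {"sku": "A2", "price": 1099}]
print(len(run_once(sample_fetch)))  # 2
```

In production the same `run_once` would sit behind a cron job or workflow scheduler, with the failure branch alerting a human instead of silently passing stale or broken data downstream.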

In addition, to stay ahead of the competition, you always need to be thinking ahead and keep refining the model with new inputs or changes in the type of data. The process has to get better, more efficient, and more accurate with each iteration.

Why ScrapeHero?

One simple reason – we have done this before. We have worked in and with the financial industry and have a great deal of context around the data and what you are trying to achieve. You won’t need to explain the basics to us. In some cases, a simple email is all we need to get started and then we can discuss the game-plan with you and help you execute with minimal direction.

We are one of the top companies for custom alternative data (just Google it!).

We are US-based and sign strict NDAs with our potential customers to protect their intellectual property (IP). We hold customer privacy at the highest level of secrecy, and that is one of the reasons we do not go into many specifics in this article, lest we reveal something.

We work with Compliance and Legal groups to ensure we meet strict regulatory compliance checks and internal controls and risk requirements.

We also gather only publicly available data and have no access to Material Nonpublic Information (MNPI).

Reach out to us to get started.

Now is a great time to get started

Turn the Internet into meaningful, structured and usable data
