Alternative Data – The best kept secret on Wall Street

Alternative Data is hiding in plain sight

Access to extremely valuable data for the Financial industry

Data drives the world. Stock markets across the globe execute billions of trades worth trillions of dollars, all based on data. The financial industry has been at the forefront of using technology and data for hundreds of years, and that continues to this day. The industry already recognizes the value of data and technology, so that argument does not need to be made again.

However, in this article we will be talking about something new – access to a secret world of alternative data that is hiding in plain sight.


I have too much data already (all from the same sources – unfortunately)

Most traders get most of their market data from the same old sources. This data is available to almost everyone, so no firm has a competitive advantage over another if they all access the same data. The advantage may lie in how quickly a firm gets the data, but speed matters far less for market analysis than it does for high-frequency trading (HFT).

There is another kind of data that is being used very effectively, though not yet widely: publicly available data that yields market intelligence about businesses. This data, termed Alternative Data, is extremely valuable and is used by analysts covering publicly traded companies to get an idea of earnings and future growth.

The problem with this data is that you cannot get it from one place or on one terminal, and while it is publicly available, it isn’t obvious where to look.

Uncovering this data requires a high degree of insight into a business (which an analyst should already have) and into the key parameters that sway earnings or other metrics that drive stock movement.

The analyst will know whether the metrics that matter this quarter are pure sales, rising inventory, the pace of user acquisition, the chatter about the company’s product and its “cool” factor on social media, the number of bestsellers among the company’s products, or the quality of its online reviews.

Even when you cannot get the actual metric, many other metrics may be available that serve as a good proxy for it.

As more businesses rely on online channels, e-commerce, social media, and supply chain management systems, a lot of data shows up online that can be a remarkably accurate proxy for the metrics that matter.

Even if you already use alternative data from sources such as Quandl, other firms have access to exactly the same data, so there is nothing unique about it. Your analysis of that data may be different, but the core data is the same and holds no competitive advantage for anyone.

How does one get there?

There is no single solution or answer to this, and the answer is never easy. If it were easy, everyone would have access to this information and it would have no value. Getting there relies on two capable partners: analysts with in-depth knowledge of the stock or industry they cover, and a data gathering company such as ScrapeHero.

Together, we can discuss what kind of data is needed, what the potential sources of that data are, and whether the data in those sources is an accurate proxy for the metrics that will eventually predict the trends of the stock or the industry.


The analyst brings the subject matter expertise about the company or industry. For example, an analyst covering a hospitality-based business would have deep knowledge of the company, the industry, and the current coverage and consensus among the analysts covering the stock.

With that information they can identify a few primary drivers of the consensus, and from those drivers create a “wish list” of data points they would love to have to develop a better model. For example, an analyst covering Apple Inc. would love to know the latest iPhone sales for the quarter. That may be the primary driver, and it ends up on the “wish list” because it really is a wish: the number is almost impossible to get directly.

Once the primary drivers have been identified, the next step is to create a set of secondary data points that could be good proxies for the primary metric, whether individually or in combination. For Apple iPhone sales, the secondary data points could be sales on large e-commerce sites, online inventory or the lack of it, and the “buzz” around the new iPhone. These and many other secondary data points can be good proxies for the anticipated sales of the iPhone.
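To make "in combination" concrete, here is a minimal sketch of one way to blend several secondary data points into a single index. It is only an illustration, not a description of any particular model: the proxy names, the weights and every number below are made-up placeholders, and standardizing each series before averaging is just one reasonable choice among many.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a series so proxies on different scales become comparable."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no signal; treat it as neutral.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_proxy(proxies, weights):
    """Return a weighted average of standardized proxy series, one value per period."""
    if set(proxies) != set(weights):
        raise ValueError("every proxy needs exactly one weight")
    lengths = {len(series) for series in proxies.values()}
    if len(lengths) != 1:
        raise ValueError("all proxy series must cover the same periods")
    total_weight = sum(weights.values())
    standardized = {name: zscores(series) for name, series in proxies.items()}
    n_periods = lengths.pop()
    return [
        sum(weights[name] * standardized[name][i] for name in proxies) / total_weight
        for i in range(n_periods)
    ]


# Hypothetical weekly observations (placeholder values, not real data).
proxies = {
    "ecommerce_units_estimate": [10, 12, 15, 20],
    "out_of_stock_rate": [0.05, 0.06, 0.10, 0.15],
    "social_mentions": [1000, 1100, 1500, 2100],
}
# Hypothetical weights reflecting how much the analyst trusts each proxy.
weights = {
    "ecommerce_units_estimate": 0.5,
    "out_of_stock_rate": 0.2,
    "social_mentions": 0.3,
}

index = composite_proxy(proxies, weights)
print(index)
```

In practice the weights are where the analyst's judgment comes in, and they would be revisited once the proxies have been checked against reported numbers.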

While the examples above are somewhat simplistic and may sound “obvious”, the analyst who is really driven to excel will spend a lot of time and energy identifying these metrics.

Time for the Hero

Once the metrics have been identified in reasonable detail, it is time for ScrapeHero to come in. We will work collaboratively with the analyst to identify the alternative data sources, assess the likelihood of the data being good quality, and go through a lot of the same steps outlined in this Essential checklist. We have quite a bit of experience partnering with Wall Street firms and analysts, and you will need the right partner for this to be a moneymaker.

We are not just another web scraping company with a funny name – we mean serious business and have the expertise to prove it.

Some of the metrics may be discarded because they are low quality, unreliable, or not available at all. The discussion also produces many new metrics, based on ScrapeHero’s experience working with a large number of data sets over the years.

The final result will be a list of “must have” alternative data points and an optional set of “nice to have” alternative data points.

Prove it to me

The next step is to go through a Proof of Concept (PoC) phase and validate all the assumptions made so far. This is an essential exercise before anyone breaks out the champagne. It validates the feasibility of extracting the data, the quality of the data, and whether the data is a good proxy for the primary metric.
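One simple check a PoC could include is to compare the history of a candidate proxy with the figures the company eventually reported. The sketch below assumes you have both series for the same past quarters; the function names, the sample values and the 0.7 cut-off are illustrative assumptions, not a recommended standard.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def is_usable_proxy(proxy_history, reported_history, min_abs_corr=0.7):
    """Flag a proxy as worth keeping if it tracks the reported metric closely.

    min_abs_corr is an arbitrary threshold chosen for this sketch; a negative
    correlation can be just as informative as a positive one, hence abs().
    """
    return abs(pearson(proxy_history, reported_history)) >= min_abs_corr


# Placeholder values for five past quarters (not real data).
proxy_history = [1.0, 2.0, 3.0, 4.0, 5.0]
reported_history = [2.1, 3.9, 6.2, 8.0, 9.8]

print(is_usable_proxy(proxy_history, reported_history))
```

With only a handful of quarters, a high correlation is weak evidence on its own, so a check like this is a screen for obviously poor proxies rather than proof that a proxy works.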

The first run

Once the PoC validates the process, the data, and the eventual model, it is time to deploy the full solution. It is best to time the first full run of the data gathering and analysis phase to learn how long the overall process takes end to end. That tells us when to start the whole process before the earnings deadline, so that we have the latest data and still complete the analysis in time.

ScrapeHero has expertise in various Big Data tools and cloud-based scalable technologies for complex and intensive analysis. That expertise comes in handy when deciding how to process terabytes of data using thousands of processors in the cloud.
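The scheduling arithmetic is simple enough to sketch. Assuming the first full run has been timed, the latest safe start is the deadline minus the measured runtime, padded by a safety margin. The function name, the 1.5x margin and the dates below are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def latest_start(deadline, measured_runtime, safety_factor=1.5):
    """Latest moment the pipeline can start and still finish before the deadline.

    safety_factor pads the measured runtime to absorb retries and slow sources;
    1.5 is an assumption for this sketch, not a measured value.
    """
    if safety_factor < 1:
        raise ValueError("safety_factor must be at least 1")
    return deadline - measured_runtime * safety_factor


# Hypothetical deadline: the analysis must be ready by 09:00 on this date.
analysis_deadline = datetime(2030, 1, 10, 9, 0)
# Duration observed during the first full run (placeholder).
first_run_duration = timedelta(hours=20)

start_by = latest_start(analysis_deadline, first_run_duration)
print(start_by)
```

Starting as close to this point as possible keeps the data fresh, while the margin protects against the run taking longer than it did the first time.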

Set it and forget it

Unfortunately, we all know that life is never easy and the work never stops. Simple alternative data models can be set to run like clockwork, and they do operate flawlessly (for the most part). But to stay ahead of the competition, you always need to think ahead and keep refining the model and the process so that they become better, more efficient, and more accurate.

Why ScrapeHero?

One simple reason – we have done this before. We have worked in and with the financial industry and have a great deal of context around the data and what you are trying to achieve. You won’t need to explain the basics to us. In some cases, a simple email is all we need to get started and then we can discuss the game-plan with you and help you execute with minimal direction.


We are US-based and sign strict NDAs with our potential customers to protect their intellectual property (IP). We treat customer privacy with the highest level of secrecy, which is one of the reasons we do not go into many specifics in this article, lest we reveal something.


We also work with compliance groups to ensure we meet strict regulatory compliance checks, internal controls, and risk requirements.

We are the heroes you have been looking for and ScrapeHero will be a reliable partner through this whole process, so get in touch with us to get started.

Now is a great time to get started

Turn websites into meaningful and structured data through our web data extraction service
