The Importance of Good Data Sets When Backtesting (Garbage In Equals Garbage Out)

Last Updated on August 26, 2021 by Oddmund Groette

After ten years of day trading, I have learned the importance of good data the expensive way. The famous saying “garbage in, garbage out” is indeed true.

Your backtest is only as good as the data you are testing on. Make sure you are backtesting on reliable and “clean” data. In the long run, it pays off to spend money on a good data source for backtesting.

I have probably lost tens of thousands of dollars on trading strategies that are based on “garbage”. Sad but true.

Yahoo!finance is often wrong

The problem is that there is not much you can do about it. Or is there? Since I started writing this blog, several people have contacted me. Yesterday one of them sent me his own SPY data, which he downloaded from Interactive Brokers (IB) himself. I’ll do some testing on this dataset to see the differences between it and EOD data from Yahoo!finance. No doubt the dataset downloaded from IB is better than what you get from many providers.

First, I’ll show you two errors in SPY on Yahoo!finance that I can still remember:

This example is from the 30th of November 2011.

It’s correct that the market opened with a big gap up, but the low is completely wrong. Even a paid data feed such as IQFeed.net includes this low price (in its EOD data, not its intraday data). In many strategies, if you rely on the low of the day to set profit targets, this day will turn out to be a huge winner. But in reality, this low trade never happened. In fact, the low that day was only some 20 cents below the open, not 2 dollars as shown here.

Here is the second example:

This one is from the 9th of April 2012. The chart shows the gap being filled, but that is fake. The high of the day was 75 cents above the open, not close to 2 dollars as shown in the chart! If you fade the gap, this day turns out to be a huge, but fake, winner.
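To make the effect concrete, here is a minimal sketch of how one wrong high turns a small trade into a large paper profit. It assumes a simple hypothetical rule (buy the open after a gap down, exit at the previous close if the high reaches it, otherwise exit at the close). The function name and all prices are made up for illustration. They are not actual SPY quotes and this is not the exact strategy tested on this blog.

```python
def fade_gap_down(prev_close, open_, high, close):
    """P&L per share for a hypothetical gap-down fade.

    Buy at the open; if the day's high reaches the previous close,
    the gap counts as filled and we exit there, otherwise we exit
    at the close.
    """
    if open_ >= prev_close:
        return 0.0  # no gap down, so no trade
    if high >= prev_close:
        return prev_close - open_  # gap filled: full target reached
    return close - open_  # gap not filled: exit at the close


# Made-up prices: the "reported" high is almost 2 dollars above the
# open, while the "true" high is only 75 cents above it.
prev_close, open_, close = 100.00, 98.50, 98.90
reported_high = 100.40
true_high = 99.25

pnl_reported = fade_gap_down(prev_close, open_, reported_high, close)
pnl_true = fade_gap_down(prev_close, open_, true_high, close)
print(f"P&L with reported high: {pnl_reported:.2f}")
print(f"P&L with true high:     {pnl_true:.2f}")
```

With the bad high the backtest books the full gap fill, while the same day with the true high is only a small gain at the close.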

Worth noting is that the CLOSE is basically 100% right. OPEN is also reasonably correct. It is the HIGH and LOW prices of the day which are sometimes (very) wrong.
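If you have intraday bars for the same days, one way to catch these errors is to rebuild the high and low from the intraday data and compare them with the EOD figures. The sketch below is only an illustration under that assumption: `flag_suspect_days`, the 25-cent tolerance and the sample prices are all hypothetical, not real quotes from any provider.

```python
from typing import Dict, List, Tuple

# One bar as (open, high, low, close)
Bar = Tuple[float, float, float, float]


def intraday_range(bars: List[Bar]) -> Tuple[float, float]:
    """Return (high, low) aggregated over a day's intraday bars."""
    return max(b[1] for b in bars), min(b[2] for b in bars)


def flag_suspect_days(
    eod: Dict[str, Bar],
    intraday: Dict[str, List[Bar]],
    tolerance: float = 0.25,  # assumed threshold in dollars
) -> List[Tuple[str, float, float]]:
    """List days where the EOD high or low lies outside the intraday range."""
    suspects = []
    for day in sorted(eod):
        if day not in intraday:
            continue  # nothing to compare against
        _, eod_high, eod_low, _ = eod[day]
        intra_high, intra_low = intraday_range(intraday[day])
        high_gap = eod_high - intra_high  # EOD high above intraday high
        low_gap = intra_low - eod_low     # EOD low below intraday low
        if high_gap > tolerance or low_gap > tolerance:
            suspects.append((day, round(high_gap, 2), round(low_gap, 2)))
    return suspects


# Made-up sample data: day1 has an EOD low far below anything traded intraday.
eod = {
    "day1": (100.0, 100.9, 98.0, 100.5),
    "day2": (50.0, 50.6, 49.7, 50.2),
}
intraday = {
    "day1": [(100.0, 100.5, 99.8, 100.3), (100.3, 100.9, 100.1, 100.5)],
    "day2": [(50.0, 50.6, 49.7, 50.2)],
}
suspects = flag_suspect_days(eod, intraday)
print(suspects)
```

A flagged day is not proof of an error, only a day worth checking by hand before you trust a backtest result that depends on it.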

A comparison between two data providers: Yahoo!finance and Interactive Brokers

Below is a comparison of the manually downloaded quotes from IB and the EOD quotes from Yahoo!finance. It shows the percentage difference in the OPEN to HIGH and OPEN to LOW moves, computed as the difference between the OPEN to HIGH from IB and the OPEN to HIGH from Yahoo!finance (and likewise for OPEN to LOW).

The first chart shows that Yahoo!finance has many high quotes that are a lot higher than IB’s. The second chart shows the same pattern for the lows: the low in Yahoo!finance is often a lot lower than IB’s.
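For readers who want to run the same kind of comparison on their own data, the calculation can be sketched roughly as follows. The sign convention (first provider minus second), the function names and the sample prices are my assumptions for illustration. The charts above may use the opposite sign.

```python
def open_to_extreme_pct(open_, extreme):
    """Move from the open to the high (or low), in percent of the open."""
    return (extreme - open_) / open_ * 100.0


def provider_difference(a, b):
    """Compare two providers' bars for the same day.

    a and b are (open, high, low) tuples. Returns the difference in
    OPEN to HIGH and OPEN to LOW moves in percentage points, a minus b.
    """
    high_diff = open_to_extreme_pct(a[0], a[1]) - open_to_extreme_pct(b[0], b[1])
    low_diff = open_to_extreme_pct(a[0], a[2]) - open_to_extreme_pct(b[0], b[2])
    return high_diff, low_diff


# Made-up bars for one day from two hypothetical providers
provider_a = (100.0, 102.0, 98.0)   # wider range
provider_b = (100.0, 101.0, 99.5)   # narrower range

high_diff, low_diff = provider_difference(provider_a, provider_b)
print(f"OPEN to HIGH difference: {high_diff:+.2f} percentage points")
print(f"OPEN to LOW difference:  {low_diff:+.2f} percentage points")
```

A positive high difference together with a negative low difference means the first provider reports a wider daily range than the second.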

The question is: are these differences so large that they make a theoretically good strategy useless?

Yesterday I wrote about opening gaps in SPY. And yes, the results are a lot worse. This morning I tested all three options: EOD data from Yahoo!finance, intraday data collected from IB, and intraday data from IQFeed.net. All three yield significantly different numbers! When using EOD data from IQFeed, I basically get the same result as with Yahoo!finance.

Conclusion about good data sets when backtesting:

So the conclusion must be: if you’re testing on only the CLOSE and OPEN data, you’re (mostly) on solid ground no matter your data provider.

If you’re using the HIGH and LOW from EOD data, you must be careful. Always test your strategies by paper trading on the quotes you actually see, or trade as small as you can for a period.

Disclaimer: We are not financial advisors. Please do your own due diligence and investment research or consult a financial professional. All articles are our opinion – they are not suggestions to buy or sell any securities.