The Importance of Good Data Sets

Last Updated on June 11, 2021 by Oddmund Groette

After ten years of day trading, I have learned the expensive way how important good data is. The famous saying “garbage in, garbage out” is indeed true. I have probably lost tens of thousands of dollars on trading strategies that were based on “garbage”. Sad but true. The problem is that there is not much you can do about it. Or is there? Since I started writing this blog, several people have contacted me. Yesterday one of them sent me his own SPY data, which he downloaded from Interactive Brokers (IB) himself. I’ll do some testing on this dataset to see how it differs from the EOD data from Yahoo!Finance. No doubt the dataset downloaded from IB is better than what you get from many providers.

First, I’ll show you two errors in SPY that I can still remember (in Yahoo!Finance):

This example is from the 30th of November 2011. It’s correct that there was a big gap up at the open, but the low is completely wrong. Even a paid data feed includes this low price (in its EOD data, not its intraday data). In many strategies, this day will turn out to be a huge winner, but that is far from real trading. The fact is that the low this day was only some 20 cents below the open, not 2 dollars as shown here.







Here is the second example:

This one is from the 9th of April 2012. The chart shows the gap being filled, but it’s fake. The high of the day was 75 cents above the open, not close to 2 dollars as shown in the chart! If you fade the gap in a backtest, this day turns out to be a huge, but fake, winner.
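To see how a single bad high can flip a backtest result, here is a minimal sketch of a gap-fill check in Python. It assumes a gap-down open (which is what the role of the high in this example suggests). The prices are made-up round numbers chosen only to mirror the 75-cent versus roughly 2-dollar difference described above. They are not actual SPY quotes, and the function name `gap_down_filled` is just for illustration.

```python
def gap_down_filled(prev_close: float, open_: float, high: float) -> bool:
    """Return True if a gap-down open traded back up to the previous close."""
    if open_ >= prev_close:
        raise ValueError("not a gap down: open is at or above the previous close")
    return high >= prev_close


# Hypothetical numbers for illustration only, not real SPY quotes.
prev_close = 101.50
open_ = 100.00
real_high = open_ + 0.75  # the high as it actually traded
bad_high = open_ + 2.00   # the high as reported by the faulty EOD bar

filled_real = gap_down_filled(prev_close, open_, real_high)
filled_bad = gap_down_filled(prev_close, open_, bad_high)

print("gap filled with the real high:", filled_real)
print("gap filled with the bad high: ", filled_bad)
```

With the faulty high the backtest books a filled gap and a winning trade. With the real high the gap was never filled, so the same trade would have had a very different outcome.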








Worth noting is that the CLOSE is basically 100% right. The OPEN is also reasonably correct. It is the HIGH and LOW prices of the day that are sometimes wrong.

Below is a comparison between the quotes manually downloaded from IB and the EOD quotes from Yahoo!Finance. It shows the difference, in percentage terms, between the two sources for the OPEN to HIGH move and the OPEN to LOW move (the OPEN to HIGH from one source is deducted from the OPEN to HIGH of the other, and likewise for the lows). The first chart shows that Yahoo!Finance has many highs that are a lot higher than IB’s. The second chart shows the same for the lows: the low in Yahoo!Finance is often a lot lower than IB’s.
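For readers who want to run the same kind of comparison on their own data, the calculation can be sketched roughly like this in Python. This is not the code behind the charts. The bar values are invented for illustration, the names `open_to_extreme_pct` and `compare_sources` are hypothetical, and the sign convention (Yahoo!Finance minus IB) is an assumption chosen so that a positive high difference means Yahoo!Finance shows the higher high.

```python
def open_to_extreme_pct(open_: float, extreme: float) -> float:
    """Percentage move from the open to the day's high or low."""
    return (extreme - open_) / open_ * 100.0


def compare_sources(ib_bars: dict, yahoo_bars: dict) -> list:
    """Compare two sources on every date they share.

    Each input maps a date string to an (open, high, low) tuple.
    Returns a list of (date, high_diff, low_diff) tuples, where each
    diff is the Yahoo!Finance percentage minus the IB percentage.
    """
    rows = []
    for date in sorted(set(ib_bars) & set(yahoo_bars)):
        ib_open, ib_high, ib_low = ib_bars[date]
        y_open, y_high, y_low = yahoo_bars[date]
        high_diff = (open_to_extreme_pct(y_open, y_high)
                     - open_to_extreme_pct(ib_open, ib_high))
        low_diff = (open_to_extreme_pct(y_open, y_low)
                    - open_to_extreme_pct(ib_open, ib_low))
        rows.append((date, high_diff, low_diff))
    return rows


# Invented bars for illustration only: (open, high, low).
ib = {"day1": (100.0, 100.75, 99.50), "day2": (100.0, 101.00, 99.75)}
yahoo = {"day1": (100.0, 102.00, 99.50), "day2": (100.0, 101.00, 98.00)}

result = compare_sources(ib, yahoo)
for date, high_diff, low_diff in result:
    print(f"{date}: high diff {high_diff:+.2f}%, low diff {low_diff:+.2f}%")
```

A large positive high difference or a large negative low difference flags the days where the EOD bar has a wider range than what was seen intraday, which are the days worth checking by hand.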

The question is: are these differences so brutal that they make a theoretically good strategy useless? Yesterday I wrote about opening gaps in SPY. And yes, the results are a lot worse. This morning I tested all three options: EOD data from Yahoo!Finance, intraday data collected from IB, and intraday data from a third provider. All three yield significantly different numbers! When using EOD data from IQFeed.net, I get basically the same result as with Yahoo!Finance.

So the conclusion must be: if you’re testing only on CLOSE data, you’re on solid ground. If you’re using the HIGH and LOW from EOD data, you must be careful. Always test your strategies by paper trading on the quotes you actually see, or by trading as small as you can for a period.