Historical data for backtesting stock pairs


Our article today will deal with the issue of historical data. It is a rather extensive topic that concerns not only stock pair traders, but all traders in general – from stock to futures, options, and forex. Poor data can have a significant impact on back-test results and ultimately result in unpleasant losses.

Sources of historical stock data

Historical stock data is provided by a number of websites, brokers, and specialised companies. Free sources include Google Finance, Yahoo Finance, MSN, and the stock exchanges' own websites. Paid sources include IQFeed, eSignal, and TradeStation. Historical data is usually also provided by the broker who maintains your trading account.

Specifics of historical stock data

Historical stock data is specific in that it may be (and usually has been) influenced by splits and dividends. Let us see what they are:
A split is the division of existing shares into a larger number of shares; the opposite operation, a reverse split, merges them into a smaller number. In both cases the original value is distributed among the new number of shares. For example, a 2:1 split means that the number of shares is doubled and the price of one share is halved. The aggregate value of the company remains constant. Companies usually use a split to reduce the market price per share, because a high price makes the stock less accessible to small shareholders. With a reverse split, on the other hand, companies try to raise the price per share back into a normally traded range. A split may also result from a merger or acquisition of the company.
A dividend is a share of profit paid out by a joint-stock company to those who hold its shares as at the record date. Dividends are paid to shareholders on the basis of a general meeting decision and are usually expressed as a fixed amount per share.
Price chart with split and dividend
The dividend payment and split dates can be found in public sources. Splits and dividends are usually marked in a price graph.
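To make the 2:1 example concrete, here is a minimal sketch of how raw closing prices can be back-adjusted for splits. The function name `back_adjust_for_splits`, the data layout, and the sample prices are assumptions made for illustration only; they do not come from any particular data provider or from the StockPairTrading package.

```python
from datetime import date


def back_adjust_for_splits(prices, splits):
    """Return split-adjusted closes.

    prices: list of (day, close), sorted by day ascending
    splits: list of (ex_date, ratio), where ratio is the number of new
            shares per old share (2.0 for a 2:1 split, 0.5 for a 1:2 merger)
    """
    adjusted = []
    for day, close in prices:
        factor = 1.0
        # Every split that happens AFTER this bar scales the old price down.
        for ex_date, ratio in splits:
            if day < ex_date:
                factor *= ratio
        adjusted.append((day, close / factor))
    return adjusted


# Hypothetical sample data: a 2:1 split takes effect on 2020-01-06.
prices = [
    (date(2020, 1, 2), 100.0),
    (date(2020, 1, 3), 102.0),
    (date(2020, 1, 6), 51.5),
    (date(2020, 1, 7), 52.0),
]
splits = [(date(2020, 1, 6), 2.0)]

adjusted = back_adjust_for_splits(prices, splits)
for day, close in adjusted:
    print(day, close)
```

In the adjusted series, the bars before the split are divided by two, so the artificial drop from 102 to 51.5 no longer shows up as a price move.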

Impact of dividend on price

Dividend payments influence stock prices in two ways:
  1. The price of the stock grows before the record date (the day as at which a shareholder is entitled to dividend payment) because investors want to hold a long position in order to receive the dividend. Traders in a short position, on the other hand, try to close it so that they do not have to pay out any dividend. The result is a short-term excess of demand over supply. For us, as stock pair traders, that is an excellent opportunity to enter a position.
  2. Some data providers adjust historical stock prices so as to reflect income from the dividend paid. In that case, the dividend amount is linearly distributed throughout the period between two dividend payments. This adjusted price is designated as “Adjusted” and it does not correspond to the actual price at which trading took place in the past. It is a “virtual” price that is good for back-testing 'BuyAndHold' type strategies.

Data for backtests

For back-testing any trading strategy, we need data that best corresponds to reality. That is, data that best simulates the reality of the subsequent live trading. For us as stock pair traders, both of the phenomena referred to above are important.
  1. A price split appears in the calculation model as a gap (a jump, i.e. an extreme move), which generates an extreme RelStDev value and hence a false entry signal. A split renders the calculation model useless for the following Period days. After Period days, the split “disappears” from the processed data (from the moving averages) and the back-test can continue.
    StockPairBuilder features a built-in mechanism that watches for splits in data and automatically filters out transactions influenced by splits. That means that even uncleansed data containing splits can be used for back-testing, as the Builder knows how to deal with it.
    Still, it is better to work with split-adjusted data, because correlation calculations are significantly distorted by any split and rendered useless!
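The idea of watching for splits and skipping the affected bars can be sketched as follows. This is not StockPairBuilder's actual mechanism, which is not described here. It is only an illustration under stated assumptions: a split is suspected when the ratio of two consecutive closes is near a common split ratio, and `period` bars starting at the suspect bar are then marked as unusable. The function names, the list of ratios, and the tolerance are all hypothetical.

```python
def find_split_suspects(closes, ratios=(2.0, 3.0, 4.0, 0.5), tolerance=0.05):
    """Return indices i where closes[i-1] / closes[i] is near a split ratio.

    ratios and tolerance are illustrative assumptions, not provider rules.
    """
    suspects = []
    for i in range(1, len(closes)):
        move = closes[i - 1] / closes[i]
        if any(abs(move - r) <= tolerance * r for r in ratios):
            suspects.append(i)
    return suspects


def usable_mask(n, suspects, period):
    """Mark `period` bars starting at each suspect bar as unusable (False)."""
    mask = [True] * n
    for s in suspects:
        for i in range(s, min(s + period, n)):
            mask[i] = False
    return mask


# Hypothetical closes with an unadjusted 2:1 split at index 3.
closes = [100.0, 101.0, 99.0, 50.0, 50.5, 51.0, 50.0, 49.5]

suspects = find_split_suspects(closes)
mask = usable_mask(len(closes), suspects, period=3)
print(suspects)
print(mask)
```

A ratio-based check like this can also be triggered by a genuine large price move, so in practice a suspect should be confirmed against a published list of split dates before the data is corrected or discarded.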

Types of data from providers

Different historical data providers take a different approach to the phenomenon of splits and dividends. In practice, all combinations of data treatment can be seen:
  • Adjusted splits, adjusted dividends
  • Adjusted splits, unadjusted dividends
  • Unadjusted splits, adjusted dividends
  • Unadjusted splits, unadjusted dividends
Trustworthy data providers publish on their website a description of the method by which they cleanse and adjust their data. Unfortunately, they are not always 100% thorough, and sometimes not all splits and dividends are adjusted. It can happen that one split is adjusted whereas another is not. Surprisingly, these problems occur even in data from paid services. Generally, frequently traded stocks with a normal or higher price and a high volume have better cleansed and more reliable data than “penny stocks”, that is, inexpensive stocks with a low volume.
Our experience shows that the best-quality data is provided by Google, Yahoo, and TradeStation. Of paid providers, we can also recommend IQFeed. InteractiveBrokers provide good-quality, well cleansed data, but only for one year back.

Data Editor

The StockPairTrading package includes the StockPairTrading_DataEditor program. It is a tool that allows for fast and clear comparisons of up to three independent sources of data. With it, you can create your own, tested database of historical data. We will pay more attention to the program in one of our future articles.
DataEditor main window
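The kind of cross-check that comparing independent sources makes possible can be sketched in a few lines. The code below is not the DataEditor program; it is a hypothetical illustration that compares closing prices from two sources and reports the dates on which they disagree or on which one source has no data. The function name `compare_sources`, the 1% tolerance, and the sample data are assumptions.

```python
def compare_sources(source_a, source_b, tolerance=0.01):
    """Compare two {date_string: close} dicts.

    Returns (mismatches, missing):
      mismatches: list of (day, close_a, close_b) where the relative
                  difference exceeds `tolerance`
      missing:    sorted list of days present in only one source
    """
    mismatches = []
    for day in sorted(set(source_a) & set(source_b)):
        a, b = source_a[day], source_b[day]
        if abs(a - b) > tolerance * max(abs(a), abs(b)):
            mismatches.append((day, a, b))
    missing = sorted(set(source_a) ^ set(source_b))
    return mismatches, missing


# Hypothetical data: source A is not split-adjusted, source B is.
source_a = {"2020-01-02": 100.0, "2020-01-03": 102.0, "2020-01-06": 51.5}
source_b = {
    "2020-01-02": 50.0,
    "2020-01-03": 51.0,
    "2020-01-06": 51.5,
    "2020-01-07": 52.0,
}

mismatches, missing = compare_sources(source_a, source_b)
print(mismatches)
print(missing)
```

A run of mismatches that all differ by the same factor and end on a single date is a typical sign that one source has adjusted a split and the other has not.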


In today’s article, we discussed in detail the issue of historical stock prices. It is a very (!) important aspect of back-testing. Due attention must be paid to source selection and to subsequent validation of data. A back-test carried out with poor-quality data is of no value. It can even be misleading. A stock whose data is significantly influenced by splits or is incomplete is not trustworthy, and it is therefore better to eliminate it from the database altogether, thereby preventing a distortion of the back-test results.
The StockPairTrading installation includes a database of the historical data of 6000+ stock titles traded on the NYSE, NASDAQ, and AMEX, starting at the beginning of 2012. With the package, you are therefore getting a reliable foundation on which you can perform your back-test and put together your trading portfolio.