Tuesday, June 15, 2010

Quantitative Trading: Part 5 - Software and Data

Shortly I will get to the topic of backtesting. When I do, I'll remind everyone that testing and system development is a HUGE topic. It is a monstrously time consuming process that, for me, has taken many years. It requires far more than a measly blog post to get it right, so I don't want to fool anyone into thinking this is easy. Anyway... I'll get there.

Testing Software:

For now, it all begins with the platform. Backtesting can be done without specialized software, but trust me, it's painful. And it can't be done without data, preferably lots of it. In this post I thought it might be useful to talk about the software platform and data.

There are many platforms you can use, and much of it depends on how sophisticated you want to get, and whether you have specialized trading requirements. I've found that AmiBroker does a great job for me. It's extremely flexible, powerful, and (important) fast! The time varies depending on how many symbols I am testing, but for just a few it's lightning quick. For example, I can run a fairly sophisticated trading system on SPY daily data and it will test 10 years in about 3 seconds. It's nice to have it that quick so I can work through many variations.

TradeStation, NinjaTrader, and StockFetcher are some that I've used... and many others are available. There are lots of trade-offs, so you may have to do some research. The main point is that it's kind of important (if you're serious about quantitative trading analysis) to have a platform for developing trading ideas. This has enabled me to test literally thousands of variations, on millions of trades, in a fairly compressed period of time.

By the way, since AmiBroker IS my platform of choice, and since this IS my blog, all of the examples and thoughts I share are from that platform.

Data:

As mentioned above, data is crucial. The very label quantitative implies that something is being measured and analysed. There must be data. There is lots to say about data, and this won't be everything, but let me break it down into a few key points:

What kind? It's important to have enough data to get a statistically significant picture of your system. The amount and type of data, then, requires that you first have an idea of your trading approach. For example, I use a swing trading approach. I don't day trade. All of my trades are entered after hours, so I am perfectly fine getting end-of-day delayed data. I don't need one-minute charts updated real time.

You will need to know whether you are trading stocks, options, forex, futures, etc. in order to determine what kind of data you'll need. If you can get by with end-of-day data you'll save some money. Real time data is more expensive. Further, some data is just hard to get. For example, getting good, clean historical data on options is very difficult. There are a couple of sources, but they are expensive and there are still many errors.

How much? As mentioned above, we need enough data to get a reliable test. There are many theories about how much that is, so I will speak in practical terms. I don't believe that tradeable markets follow a normal distribution. Their returns tend to have long tails, primarily as a result of longer term trends. Therefore, I believe you need enough data to demonstrate your approach in the face of several long and intermediate term trends, as well as sideways and choppy markets.
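One rough way to check the long-tails claim on your own data is to measure the excess kurtosis of daily returns. The sketch below is plain Python (not AmiBroker code), the function names are made up for illustration, and the closing prices are hypothetical numbers chosen to include one sharp drop. A value near zero is what a normal distribution would give; a clearly positive value suggests heavier tails.

```python
def daily_returns(closes):
    """Simple day-over-day percentage returns from a list of closing prices."""
    return [b / a - 1.0 for a, b in zip(closes, closes[1:])]

def excess_kurtosis(returns):
    """Excess kurtosis: about 0 for normal data, positive for heavy tails."""
    n = len(returns)
    mean = sum(returns) / n
    # Population moments (divide by n) keep the formula simple.
    variance = sum((r - mean) ** 2 for r in returns) / n
    fourth_moment = sum((r - mean) ** 4 for r in returns) / n
    return fourth_moment / variance ** 2 - 3.0

# Hypothetical closes with one sharp drop, for illustration only.
closes = [100, 101, 100.5, 101.2, 100.8, 101.5, 93.0, 93.5, 94.0, 93.8, 94.4]
tail_measure = excess_kurtosis(daily_returns(closes))
print(tail_measure > 0)  # heavy-tailed samples come out positive
```

With only a handful of bars this number is very noisy, so treat it as a sanity check on years of data rather than a formal test.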

My approach is as follows. First, I use about 5 years of daily data to do my in-sample testing. Second, I reserve another 5 years for out-of-sample testing. Third, after some final tweaking, I use another set of about two years of data to do a secondary out-of-sample test. Finally, I begin to trade in real time with real money. You could probably get by with less data, but I figure more is better, especially if I see consistency. (By the way, if you don't know what in-sample or out-of-sample testing is, don't worry. I will explain it in a future post.) In other words - to answer the "how much?" question - I like about 8 to 10 years for a swing trading approach. This enables me to test bear markets, bull markets and everything in between several times. I'm sure you could get by with much less, but usually data comes in 10 or 20 year packs, so it's no big deal. And with good software it's just as easy to test 10 years as it is to test 1 year.
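To make the idea of carving history into separate test windows concrete, here is a minimal Python sketch (again, not AmiBroker code). The function name, the cutoff dates, and the one-bar-per-year sample data are all assumptions for illustration; the only thing taken from the description above is the rough 5 / 5 / 2 year layout.

```python
from datetime import date

def split_by_date(bars, in_sample_end, out_sample_end):
    """Split (date, close) bars into in-sample, out-of-sample, and final holdout."""
    in_sample = [b for b in bars if b[0] <= in_sample_end]
    out_sample = [b for b in bars if in_sample_end < b[0] <= out_sample_end]
    holdout = [b for b in bars if b[0] > out_sample_end]
    return in_sample, out_sample, holdout

# Hypothetical data: one bar per year from 1998 through 2009.
bars = [(date(year, 1, 4), 100.0 + (year - 1998)) for year in range(1998, 2010)]

# Hypothetical cutoffs giving roughly 5, 5, and 2 years.
in_s, out_s, hold = split_by_date(bars, date(2002, 12, 31), date(2007, 12, 31))
print(len(in_s), len(out_s), len(hold))
```

The point of splitting by date rather than at random is that each window stays a continuous stretch of market history, and nothing from a later window leaks into the one used for development.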

What about quality? Data quality is extremely important, but I also think you can overdo it. On the one hand, if the data isn't accurate, the tests are invalid. It's also crucial to get split and dividend adjusted data, as unadjusted data can skew the results substantially. On the other hand, "survivor bias" might be overstated for short term trading systems. "Survivor bias" is the idea that current data will show better results because all of the companies that went out of business are no longer there. In other words, all the crappy stocks that eventually went to zero have been taken out.
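To see why unadjusted data skews results, consider a 2-for-1 split. The Python sketch below uses made-up prices and a hypothetical helper, back_adjust, that divides every close before the split by the split ratio. It handles splits only; dividend adjustment works on the same principle but is left out to keep it short. Real data vendors do this for you, so this is just to show the effect.

```python
def back_adjust(closes, splits):
    """Return closes with pre-split prices divided by the split ratio.

    closes: list of raw closing prices, oldest first.
    splits: dict mapping the index of the first post-split bar to the
            split ratio (2.0 for a 2-for-1 split).
    """
    adjusted = list(closes)
    for ex_index, ratio in splits.items():
        for i in range(ex_index):
            adjusted[i] /= ratio
    return adjusted

# Hypothetical raw closes: a 2-for-1 split takes effect on the third bar.
raw = [100.0, 102.0, 51.5, 52.0]
adj = back_adjust(raw, {2: 2.0})

raw_move = raw[2] / raw[1] - 1.0  # looks like a crash of roughly 50%
adj_move = adj[2] / adj[1] - 1.0  # actually a gain of roughly 1%
print(adj, round(raw_move, 3), round(adj_move, 3))
```

On the raw series a system would see a one-day collapse that never happened, which is exactly the kind of phantom signal that can wreck a backtest.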

I have tested with and without delisted stock data and have found the difference to be negligible. I suspect this is because of my extremely short holding period. I'm sure it would be a bigger deal if I had a longer term approach, but for me it's insignificant... so I saved the money and don't use the delisted data. The second reason this is probably not a big deal is that I trade both long and short. These delisted companies would have actually helped me if I were in a short trade, so to some degree it balances out. Regardless of why, what I know for sure is that with short holding periods I have not experienced a noticeable difference in performance using delisted data.

Who to use? There are lots of data providers, and probably the first that comes to mind is your broker. Many brokerage platforms let you download data which can then be imported. Other easy and free sources are the online providers like Yahoo!, MSN, and Google. High end data providers like CSI sit at the other extreme. They provide lots of data beyond just stock data, including intraday data, futures, options, international stocks, and more.

I've chosen Quotes Plus as my data provider. The main reason is that they give me clean data at a decent price and it's tightly integrated with AmiBroker. It's also very fast. Each day I have to download about 8000 stock updates to scan for trades, and all that takes about one minute.

To summarize, let me simply reiterate how I applied this for my own trading:

  • I am clearly a short term trader, trading after-hours, and my systems are based on daily data.
  • I needed end-of-day daily data, and I wanted it to work well with AmiBroker.
  • I wanted clean data, but didn't care so much about delisted data.
  • AmiBroker has a good list of data providers, so I used their site to do some research... and there you go...
Good Trading...
