Factor models are heavily used in finance to create variance matrices. Here’s why.
- Provide non-degenerate estimates
- Save space
- Quantify sources of risk
First off, what does “non-degenerate” mean?
The technical term is that you want your estimate of the variance matrix to be positive definite. In practical terms that means every portfolio has positive volatility according to the estimate. Optimizers would be very pleased to find a portfolio with zero or negative volatility, and we don’t want to give them the opportunity.
Suppose you have a universe of 1000 stocks. If you want to estimate the variance matrix of their returns (note: returns not prices), then you need more than 1000 observations to get a positive definite estimate using the sample variance. Probably 2000 observations would be the minimum you’d want to use. That is about 8 years of daily data at roughly 250 trading days a year (and if your universe is global, then daily data will have asynchrony problems).
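To see what degeneracy looks like, here is a minimal Python sketch with a hypothetical universe of 3 assets and only 3 days of made-up returns, which is too few observations for the sample variance. The function names and the numbers are illustrative assumptions, not real data. The long-short portfolio in the sketch has exactly zero volatility according to the sample variance, which is just the sort of thing an optimizer would pounce on.

```python
from fractions import Fraction

# Hypothetical daily returns in percent: 3 days (rows) by 3 assets (columns).
returns = [
    [1, 2, -1],
    [0, -1, 3],
    [2, 2, 1],
]

def sample_variance_matrix(rows):
    """Sample variance matrix using the usual (observations - 1) divisor."""
    n_obs, n_assets = len(rows), len(rows[0])
    means = [Fraction(sum(r[j] for r in rows), n_obs) for j in range(n_assets)]
    demeaned = [[r[j] - means[j] for j in range(n_assets)] for r in rows]
    return [
        [sum(d[i] * d[j] for d in demeaned) / (n_obs - 1) for j in range(n_assets)]
        for i in range(n_assets)
    ]

def portfolio_variance(weights, variance):
    """Quadratic form w' V w for a list of weights and a square matrix."""
    n = len(weights)
    return sum(weights[i] * variance[i][j] * weights[j]
               for i in range(n) for j in range(n))

variance = sample_variance_matrix(returns)

# This portfolio earns the same return on every day in the sample,
# so the sample variance says it has no risk at all.
riskless_looking = [-2, 2, 1]
print(portfolio_variance(riskless_looking, variance))  # zero

# An ordinary portfolio still gets a positive variance.
print(portfolio_variance([1, 1, 1], variance))
```

Exact fractions are used so that the zero is a true zero rather than a rounding accident.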
Factor models always produce positive definite estimates (well, possibly they’ll require a nudge here and there).
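The reason can be sketched in a couple of lines of algebra. The notation is an assumption of this sketch, not something from the text above: $B$ holds the sensitivities of the assets to the factors, $F$ is the variance matrix of the factors, $D$ is the diagonal matrix of asset-specific variances $d_i$, and $w$ is any vector of portfolio weights.

```latex
\Sigma = B F B^{\top} + D,
\qquad
w^{\top} \Sigma w
  = (B^{\top} w)^{\top} F \, (B^{\top} w) + \sum_{i} d_i \, w_i^{2}.
```

The first term cannot be negative as long as $F$ is a valid variance matrix, and the second term is strictly positive for any non-zero $w$ provided every $d_i$ is positive. One plausible reading of the "nudge" is making sure no specific variance is estimated as zero or less.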
A big universe means that the variance matrix is really big. If the universe has 1000 assets, then the variance matrix is going to take roughly 8 megabytes (a million numbers at 8 bytes each). Years ago that was a problem, as in “no way Jose”.
If the universe has 20,000 assets, then the variance matrix takes about 3 gigabytes. That need not be at all problematic now.
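The arithmetic behind those sizes is easy to check. The sketch below assumes the full square matrix is stored as 8-byte floating point numbers, and that a factor model is stored as its sensitivities, its factor variance matrix and one specific variance per asset. The function names and the choice of 50 factors are hypothetical, picked only for illustration.

```python
BYTES_PER_DOUBLE = 8  # assumption: double precision storage

def dense_bytes(n_assets):
    """Bytes needed to store the full n_assets by n_assets variance matrix."""
    return n_assets * n_assets * BYTES_PER_DOUBLE

def factor_model_bytes(n_assets, n_factors):
    """Bytes for sensitivities, factor variance matrix and specific variances."""
    n_values = n_assets * n_factors + n_factors * n_factors + n_assets
    return n_values * BYTES_PER_DOUBLE

for n_assets in (1000, 20000):
    dense_mb = dense_bytes(n_assets) / 1e6
    factor_mb = factor_model_bytes(n_assets, n_factors=50) / 1e6
    print(f"{n_assets} assets: {dense_mb:.1f} MB dense, "
          f"{factor_mb:.2f} MB as a 50-factor model")
```

Under these assumptions the 20,000-asset matrix shrinks from about 3.2 gigabytes to about 8 megabytes when held in factor form.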
The issue of space was a key driver in factor models being adopted. We should reconsider the merits of factor models now that computers are massively bigger and there are alternative estimators.
Sources of risk
Factor models assume the fiction that there is a set of ghosts that drive the relationship of the returns in the universe. Fiction is sometimes enlightening, sometimes not.
There are three major classes of factor models: fundamental, macroeconomic and statistical.
Statistical factor models are mute about the sources of risk because both the factors and the sensitivities to the factors are estimated. The ghosts are anonymous.
In the other two classes, the ghosts have names — names like: “Energy sector”, “Value”, “Momentum”, “Short-term interest rates”.
This is the unique proposition that factor models hold. If what you want is names on partitions of your risk, then you want a (non-statistical) factor model.
Conversely, if you don’t care about named partitions, then you should at least consider alternatives. I hypothesize that it is suboptimal to use factor models for optimization.
The leading alternative to factor models is shrinkage. Start with the sample variance matrix and then shrink it towards some target. The version that makes the most sense to me is to shrink towards equal correlation, where every pair of assets is given the same correlation. That is commonly known as the Ledoit-Wolf shrinkage estimate.
There is an R implementation of the Ledoit-Wolf estimate (as well as a statistical factor model) in the BurStFin package.
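For a feel of the mechanics, here is a minimal Python sketch of shrinking towards equal correlation. It is not the BurStFin code and not the full Ledoit-Wolf estimator: the real method estimates the shrinkage weight from the data, while this sketch takes the weight as a given number. The function names and the 3-asset sample matrix are hypothetical.

```python
import math

def shrink_to_equal_correlation(sample, weight):
    """Blend a sample variance matrix with an equal-correlation target.

    weight is the fraction put on the target (an assumed, fixed value here).
    The target keeps the sample variances and gives every pair of assets
    the average sample correlation.
    """
    n = len(sample)
    sd = [math.sqrt(sample[i][i]) for i in range(n)]
    pair_cors = [sample[i][j] / (sd[i] * sd[j])
                 for i in range(n) for j in range(i + 1, n)]
    mean_cor = sum(pair_cors) / len(pair_cors)
    shrunk = []
    for i in range(n):
        row = []
        for j in range(n):
            target = sample[i][j] if i == j else mean_cor * sd[i] * sd[j]
            row.append(weight * target + (1 - weight) * sample[i][j])
        shrunk.append(row)
    return shrunk

def portfolio_variance(weights, variance):
    """Quadratic form w' V w."""
    n = len(weights)
    return sum(weights[i] * variance[i][j] * weights[j]
               for i in range(n) for j in range(n))

# Hypothetical degenerate sample variance: the portfolio (-2, 2, 1)
# has zero variance according to it.
sample = [
    [1.0, 1.5, -1.0],
    [1.5, 3.0, -3.0],
    [-1.0, -3.0, 4.0],
]
shrunk = shrink_to_equal_correlation(sample, weight=0.5)

print(portfolio_variance([-2, 2, 1], sample))  # zero under the sample estimate
print(portfolio_variance([-2, 2, 1], shrunk))  # positive after shrinking
```

In this example the equal-correlation target is positive definite, so any positive weight on it removes the zero-volatility portfolio.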
Today we have naming of parts. Japonica
Glistens like coral in all of the neighbouring gardens,
And today we have naming of parts.
from “Lessons of the War: Naming of Parts” by Henry Reed