Factor models are heavily used in finance to create variance matrices. Here’s why.

Factor models:

- Provide non-degenerate estimates
- Save space
- Quantify sources of risk

## Non-degenerate estimates

First off, what does this mean?

The technical term is that you want your estimate of the variance matrix to be *positive definite*. In practical terms, that means all portfolios have positive volatility according to the estimate. Optimizers would be very pleased to find a portfolio with zero or negative volatility, and we don’t want to give them the opportunity.

Suppose you have a universe of 1000 stocks. If you want to estimate the variance of their returns (note: returns, not prices), then you need more than 1000 observations to get a positive definite estimate using the sample variance. With fewer, some portfolios are bound to have zero estimated volatility. Probably 2000 observations would be the minimum you’d want to use. That is about 8 years of daily data (and if your universe is global, then daily data will have asynchrony problems).
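Here is a toy illustration of the degenerate case, as a sketch in Python using only the standard library. The returns are made up, and the universe is shrunk to 3 assets with 3 observations so that the offending portfolio can be found with a cross product. The helper names are hypothetical.

```python
from statistics import variance

# Hypothetical returns: 3 observations (rows) of 3 assets (columns).
returns = [
    [0.010, -0.020, 0.005],
    [-0.004, 0.012, 0.007],
    [0.006, 0.003, -0.009],
]


def column_means(rows):
    """Mean return of each asset."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cross(a, b):
    """Cross product of two 3-vectors: a vector orthogonal to both."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


means = column_means(returns)
centered = [[x - m for x, m in zip(row, means)] for row in returns]

# The centered rows sum to zero, so they span at most 2 dimensions.
# Any weight vector orthogonal to two of them is orthogonal to all three.
weights = cross(centered[0], centered[1])

# This portfolio has the same return in every observation,
# so its sample variance is zero (up to floating point error).
portfolio_returns = [
    sum(w * x for w, x in zip(weights, row)) for row in returns
]
portfolio_variance = variance(portfolio_returns)
print(weights, portfolio_variance)
```

The weights are not all zero, yet the sample variance says the portfolio is riskless. With 1000 assets and 1000 or fewer observations the same thing happens, just in more dimensions.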

Factor models always produce positive definite estimates (well, possibly they’ll require a nudge here and there).

## Space

A big universe means that the variance matrix is really big. If the universe has 1000 assets, then the variance matrix is going to take roughly 8 megabytes when stored as 8-byte double-precision numbers. Years ago that was a problem, as in “no way Jose”.

If the universe has 20,000 assets, then the variance matrix takes about 3 gigabytes. That need not be at all problematic now.
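The arithmetic behind those numbers is easy to check. The sketch below assumes 8-byte doubles and a full square matrix with no use of symmetry. The factor model side assumes the usual structure of a loadings matrix, a factor variance matrix and one specific variance per asset. The choice of 50 factors is purely hypothetical.

```python
BYTES_PER_DOUBLE = 8


def full_matrix_bytes(n_assets):
    """Storage for a full n-by-n variance matrix of doubles."""
    return n_assets * n_assets * BYTES_PER_DOUBLE


def factor_model_bytes(n_assets, n_factors):
    """Storage for loadings, factor variance matrix and specific variances."""
    loadings = n_assets * n_factors
    factor_variance = n_factors * n_factors
    specific_variances = n_assets
    return (loadings + factor_variance + specific_variances) * BYTES_PER_DOUBLE


for n_assets in (1000, 20000):
    print(
        n_assets,
        full_matrix_bytes(n_assets),
        factor_model_bytes(n_assets, n_factors=50),  # 50 is an assumption
    )
```

Under those assumptions, the 20,000-asset universe needs about 8 megabytes in factor form versus about 3.2 gigabytes as a full matrix.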

The issue of space was a key driver in factor models being adopted. We should reconsider the merits of factor models now that computers have vastly more memory and there are alternative estimators.

## Sources of risk

Factor models assume the fiction that there is a set of ghosts that drive the relationship of the returns in the universe. Fiction is sometimes enlightening, sometimes not.

There are three major classes of factor models:

- fundamental
- macro
- statistical

Statistical factor models are mute about the sources of risk because both the factors and the sensitivities to the factors are estimated. The ghosts are anonymous.

In the other two classes, the ghosts have names — names like: “Energy sector”, “Value”, “Momentum”, “Short-term interest rates”.

This is the unique proposition that factor models hold. If what you want is names on partitions of your risk, then you want a (non-statistical) factor model.
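To make “names on partitions of your risk” concrete, here is a minimal sketch in Python. Every number is made up, the factor names are borrowed from the list above, and the factors are assumed to be uncorrelated so that the partition has no cross terms. A real model would have to deal with correlated factors.

```python
# Hypothetical inputs: 3 assets, 2 named factors assumed uncorrelated.
factor_names = ["Energy sector", "Value"]
factor_variances = [0.04, 0.01]
loadings = [  # sensitivities: one row per asset, one column per factor
    [1.0, 0.2],
    [0.1, 0.8],
    [0.5, 0.5],
]
specific_variances = [0.02, 0.03, 0.01]
weights = [0.5, 0.3, 0.2]

n_assets = len(weights)
n_factors = len(factor_names)

# Portfolio exposure to each factor: weighted sum of asset sensitivities.
exposures = [
    sum(weights[i] * loadings[i][k] for i in range(n_assets))
    for k in range(n_factors)
]

# Variance attributed to each named factor, plus the specific remainder.
risk = {
    name: exposure ** 2 * var
    for name, exposure, var in zip(factor_names, exposures, factor_variances)
}
risk["Specific"] = sum(
    w ** 2 * s for w, s in zip(weights, specific_variances)
)
total_variance = sum(risk.values())

# Check: the parts add up to the quadratic form with the full matrix
# implied by the model (loadings * factor variance * loadings' + specific).
variance_matrix = [
    [
        sum(
            loadings[i][k] * factor_variances[k] * loadings[j][k]
            for k in range(n_factors)
        )
        + (specific_variances[i] if i == j else 0.0)
        for j in range(n_assets)
    ]
    for i in range(n_assets)
]
quadratic_form = sum(
    weights[i] * variance_matrix[i][j] * weights[j]
    for i in range(n_assets)
    for j in range(n_assets)
)

for name, value in risk.items():
    print(f"{name}: {value / total_variance:.1%} of portfolio variance")
```

A statistical factor model gives the same kind of decomposition, but the labels would be “Factor 1” and “Factor 2”, which tells you nothing.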

Conversely, if you don’t care about named partitions, then you should at least consider alternatives. I hypothesize that it is suboptimal to use factor models for optimization.

## Alternatives

The leading alternative to factor models is shrinkage models. Start with the sample variance matrix and then shrink towards some target. The model I know of that makes the most sense to me is to shrink towards equal correlation. That is commonly known as the Ledoit-Wolf shrinkage estimate.
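The mechanics of shrinking towards equal correlation can be sketched in a few lines of Python. This is not the Ledoit-Wolf estimator itself: that method estimates the shrinkage intensity from the data, while the sketch takes the intensity as a hand-picked input. The function names and the returns are hypothetical.

```python
from math import sqrt


def sample_covariance(rows):
    """Sample variance matrix; rows are observations, columns are assets."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def equal_correlation_target(cov):
    """Keep the sample variances; set every correlation to the average one."""
    p = len(cov)
    sd = [sqrt(cov[i][i]) for i in range(p)]
    corrs = [
        cov[i][j] / (sd[i] * sd[j])
        for i in range(p)
        for j in range(i + 1, p)
    ]
    mean_corr = sum(corrs) / len(corrs)
    return [
        [cov[i][i] if i == j else mean_corr * sd[i] * sd[j] for j in range(p)]
        for i in range(p)
    ]


def shrink(cov, target, intensity):
    """Weighted average: intensity 0 is the sample, 1 is the target."""
    p = len(cov)
    return [
        [
            intensity * target[i][j] + (1 - intensity) * cov[i][j]
            for j in range(p)
        ]
        for i in range(p)
    ]


# Hypothetical returns: 4 observations of 3 assets.
returns = [
    [0.010, -0.020, 0.005],
    [-0.004, 0.012, 0.007],
    [0.006, 0.003, -0.009],
    [0.002, -0.001, 0.004],
]
cov = sample_covariance(returns)
target = equal_correlation_target(cov)
shrunk = shrink(cov, target, intensity=0.5)  # 0.5 is an arbitrary choice
```

The variances on the diagonal are untouched. Only the correlations get pulled towards their average.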

There is an R implementation of the Ledoit-Wolf estimate (as well as a statistical factor model) in the BurStFin package.

## Epilogue

Today we have naming of parts. Japonica

Glistens like coral in all of the neighbouring gardens,

And today we have naming of parts.

from “Lessons of the War: Naming of Parts” by Henry Reed

Thanks for a very informative post. Given the low-latency market data feeds of implied volatility and Greeks issued by ise.com at 4M/sec, I wonder what impact time (at the milli, micro and nano level, leading to big data clouds) might have on space and quant analysis in times to come, as trades might touch 8M/sec in about a year’s time.
