In “What the hell is a variance matrix?” I talked about the basics of variance matrices and highlighted challenges for estimating them in finance. Here we look more deeply at the most popular estimation technique.
Models for variance matrices
The types of variance estimates that are used in finance can be classified as:
- Sample estimate
- Factor model
- Shrinkage estimate
The Ledoit-Wolf estimate is the leading example of a shrinkage estimate.
Dynamic estimates include multivariate GARCH estimates and relatives such as DCC.
Note that these are variance matrices of returns, not prices.
What are factor models?
A factor model says that there is some number of drivers of the returns of the assets plus some idiosyncratic risk not associated with the drivers. Instead of being called “drivers”, they are called “factors”. Each asset may have its own set of sensitivities to the factors.
In matrix notation a factor model is:
V = B’FB + D
This notation hides a lot of details:
- V is the variance matrix of the asset returns: a square of numbers (of size number-of-assets by number-of-assets).
- B is a rectangle of sensitivities of size number-of-factors by number-of-assets.
- F is the variance matrix of the factors (of size number-of-factors by number-of-factors).
- D is a diagonal matrix (all off-diagonal elements are zero) of the idiosyncratic variance of each asset. The total size of D is number-of-assets by number-of-assets, but there are only number-of-assets values that are not zero.
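To make the dimensions concrete, here is a minimal sketch in R with made-up numbers (3 factors and 5 assets). The object names and all the values are illustrative assumptions, not taken from any real model.

```r
set.seed(1)
n.factors <- 3
n.assets <- 5

# B: sensitivities, number-of-factors by number-of-assets
B <- matrix(rnorm(n.factors * n.assets), nrow = n.factors, ncol = n.assets)

# Fmat: variance matrix of the factors
# (called Fmat because F is shorthand for FALSE in R)
Fmat <- diag(c(0.04, 0.02, 0.01))

# D: diagonal matrix holding one idiosyncratic variance per asset
D <- diag(runif(n.assets, min = 0.01, max = 0.05))

# V = B'FB + D
V <- t(B) %*% Fmat %*% B + D

dim(V)                                       # number-of-assets by number-of-assets
min(eigen(V, symmetric = TRUE)$values) > 0   # positive definite
```

Because B'FB is positive semi-definite and D has a strictly positive diagonal, the sum is positive definite no matter how few factors there are.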
Why factor models?
Factor models serve two key uses:
- Ensure the estimate is positive definite
- Reduce noise
Positive definiteness is a technical condition. But it is highly practical. If the estimate were not positive definite, then there would be one or more portfolios that have zero (or even negative) estimated variance. Of course there are no such portfolios. A non-positive definite variance matrix can be very misleading, especially if it is given to an optimizer.
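A small simulated illustration of the problem, assuming nothing more than random numbers in place of real returns: with more assets than observations the sample variance matrix is singular, so some sets of weights get an estimated variance of zero.

```r
set.seed(42)
n.obs <- 10
n.assets <- 20

# simulated returns: fewer observations than assets
x <- matrix(rnorm(n.obs * n.assets, sd = 0.01), nrow = n.obs)
S <- var(x)   # sample variance matrix, 20 by 20

# the rank is at most n.obs - 1, so the rest of the
# eigenvalues are zero up to rounding error
eig <- eigen(S, symmetric = TRUE)
n.zero <- sum(eig$values < 1e-12)

# the last eigenvector is a (long-short) set of weights
# whose estimated variance is essentially zero
w <- eig$vectors[, n.assets]
est.var <- drop(t(w) %*% S %*% w)

n.zero
est.var
```

An optimizer handed S would happily load up on weights like w, since the matrix claims they are riskless.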
The second issue is noise. Suppose that your universe has 2000 assets. Then the variance matrix will contain approximately 2 million unique numbers (2000 variances plus one covariance for each pair of assets). If you have two years of daily data on your universe, then you only have about 1 million numbers in your dataset. Obviously you need a bit of finesse when estimating those 2 million numbers. Factor models are one form of finesse.
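The arithmetic behind those two counts can be checked directly. The figure of 250 trading days per year is an assumption used only to get a round number.

```r
n.assets <- 2000
n.days <- 2 * 250   # assumption: roughly 250 trading days per year

# unique elements of a symmetric matrix:
# the diagonal plus one triangle of off-diagonal elements
n.unique <- n.assets * (n.assets + 1) / 2

# one return per asset per day
n.data <- n.assets * n.days

n.unique            # 2,001,000 numbers to estimate
n.data              # 1,000,000 data points
n.unique / n.data   # about two parameters per data point
```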
Factor models do these two things, but we should still wonder if they do them well.
A very simple example of a factor model is the Capital Asset Pricing Model. Here there is just one factor — the market. The B matrix is just the betas of the assets, the F matrix is really just a number — the market variance, and D is filled with the residual variances of the assets.
I rarely have anything nice to say about CAPM, but this is one of those times. What happens when the CAPM factor model hits a volatile period? The market variance increases, which increases the size of the off-diagonal elements of the variance matrix. That is, the correlations will tend to increase.
So this very simple factor model exhibits one of the key features that actually happens.
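Here is a minimal sketch of that effect. The betas and residual variances are hypothetical, and the residual variances are assumed to stay fixed while the market variance changes.

```r
# hypothetical betas and residual variances for four assets
beta <- c(0.8, 1.0, 1.2, 1.5)
resid.var <- c(0.03, 0.02, 0.04, 0.05)

capm.variance <- function(market.var) {
  # V = B'FB + D with a single factor: F is just the market variance
  market.var * outer(beta, beta) + diag(resid.var)
}

avg.cor <- function(V) {
  # average of the off-diagonal correlations
  cmat <- cov2cor(V)
  mean(cmat[lower.tri(cmat)])
}

calm <- avg.cor(capm.variance(0.02))       # quiet market
volatile <- avg.cor(capm.variance(0.08))   # market variance four times larger

c(calm = calm, volatile = volatile)
```

The average correlation is higher in the volatile case because the common market term grows relative to the idiosyncratic terms.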
Types of factor models
Factor models are often divided into:
- fundamental
- macroeconomic
- statistical
A more operational classification is:
- factors estimated, sensitivities known
- factors known, sensitivities estimated
- factors estimated, sensitivities estimated
It is possible to have hybrids. There exist models that have all three types.
Both fundamental and macro models use linear regressions. The difference is the type of regression. Fundamental models use cross-sectional regressions. That is, pick one point in time and the observations are the collection of assets. Macro models use time series regressions: pick one asset and the observations are the history of returns for the asset (and the factors).
The time series regressions have an advantage given the way that factor models are used. We often care about the risk of portfolios. When aggregating up to a portfolio, the errors from the time series regressions get a chance to average out. That is not true of the errors from the cross-sectional regressions.
So why use cross-sectional regressions at all? Necessity.
When implementing a regression there are mainly two things to consider:
- What do you know?
- What is (relatively) constant?
If you want to use interest rates or oil price, then you know the history of the factors. You can also assume that the sensitivity of an asset to those factors is basically inherent and stable. So a time series regression makes sense.
Now suppose you are interested in momentum or book-to-price. What you know is the value (sensitivity) that each asset currently has for the factors. But these sensitivities are not static — they are market-driven. In the case of book-to-price there would be no point in looking at it if it were static. For factors like these the constancy that we assume is the market’s (current) pricing of the factor. That is, a cross-sectional regression.
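The two kinds of regression can be put side by side in a small sketch. All of the data are simulated, and the variable names (oil.change, book.to.price and so on) are hypothetical stand-ins.

```r
set.seed(7)
n.times <- 250
n.assets <- 50

# time series regression (macro style):
# one asset, the observations are points in time
oil.change <- rnorm(n.times, sd = 0.02)   # known factor history
asset.ret <- 0.5 * oil.change + rnorm(n.times, sd = 0.01)
ts.fit <- lm(asset.ret ~ oil.change)
coef(ts.fit)["oil.change"]      # estimated sensitivity of this asset

# cross-sectional regression (fundamental style):
# one point in time, the observations are assets
book.to.price <- rnorm(n.assets)          # known sensitivities at this date
ret.today <- 0.003 * book.to.price + rnorm(n.assets, sd = 0.01)
cs.fit <- lm(ret.today ~ book.to.price)
coef(cs.fit)["book.to.price"]   # estimated factor return for this date
```

In the first regression the slope is a sensitivity and the factor is data. In the second the roles are swapped: the sensitivities are data and the slope is the factor return.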
Estimating macro models
Building a macro factor model is little more than performing a series of regressions. An R command to do the regressions (in an overly simple case) might look like:
> sens <- lm(asset.return.matrix ~ factor.changes.history)
This assumes that you want to use least squares regressions as opposed to robust regressions. Returns have long tails, so robust regression is a reasonable idea. In the models that I built, least squares was almost as good as the best robust regression tried. The best regression was very lightly robust (a Huber M-estimate). More profoundly robust regressions were substantially worse (out-of-sample) than least squares.
You can get Huber M-estimates of regression in R with the rlm function in the MASS package.
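Unlike lm, rlm fits one response at a time, so one way to proceed is to loop over the assets. This is a sketch on simulated data with long-tailed noise; the object names mirror the lm command above, but the contents are invented.

```r
library(MASS)   # provides rlm

set.seed(11)
n.obs <- 200

# simulated stand-ins for the factor changes and the asset returns
factor.changes.history <- matrix(rnorm(n.obs * 2), ncol = 2,
                                 dimnames = list(NULL, c("f1", "f2")))
true.sens <- rbind(c(0.5, -0.2), c(1.0, 0.3), c(0.1, 0.8))
noise <- matrix(rt(n.obs * 3, df = 4), ncol = 3)   # long-tailed residuals
asset.return.matrix <- factor.changes.history %*% t(true.sens) + noise
colnames(asset.return.matrix) <- c("A", "B", "C")

# one Huber M-estimate regression per asset
robust.sens <- sapply(colnames(asset.return.matrix), function(a) {
  fit <- rlm(asset.return.matrix[, a] ~ factor.changes.history,
             psi = psi.huber)
  coef(fit)
})

robust.sens   # rows: intercept and factor sensitivities; columns: assets
```

The tuning constant of psi.huber is left at its default here. How lightly robust the fit should be is something to settle by out-of-sample testing.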
We are assuming that the sensitivities are constant. That assumption is unlikely to be completely true. So it might be useful to give more weight in the regressions to more recent data. Some people use exponentially decreasing weights — I think that puts too much weight on the most recent data. In my experiments the best weighting scheme was linear decreasing weights. An R command to get those sorts of weights would be:
> weights <- seq(.5, 1.5, length=n.observations)
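A sketch of how such weights might be passed to a regression, assuming the rows are in time order with the oldest observation first. The data are simulated.

```r
set.seed(3)
n.observations <- 500

# simulated factor changes and returns for a single asset
factor.changes.history <- rnorm(n.observations)
asset.return <- 0.7 * factor.changes.history +
  rnorm(n.observations, sd = 0.5)

# oldest observation gets weight 0.5, most recent gets 1.5
weights <- seq(.5, 1.5, length.out = n.observations)

wfit <- lm(asset.return ~ factor.changes.history, weights = weights)
coef(wfit)
```

With these end points the most recent observation counts three times as much as the oldest, and the weights average to 1.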
One more issue is how to treat categorizations of assets such as industry (and country if it is a multi-country model). One approach is to allow sensitivity to only one industry. A more ambitious approach is to allow sensitivity to more than one industry if those sensitivities are statistically significant (at some pre-chosen level).
Estimating statistical models
Some people use the term “implicit factor” rather than statistical factor.
The most common approach to building a statistical factor model is conceptually equivalent to the R command:
> facmod <- eigen(cor(return.matrix))
This is an eigen decomposition of the correlation matrix of the returns. Some number of eigenvectors are selected as the factor sensitivities. Choosing the number of factors is really the key decision in a statistical factor model. Too few factors means you are missing out on systematic risk. Too many factors means you are adding noise.
By the way: these factors are uncorrelated and have variance 1; hence their variance matrix F is the identity and so drops out of the calculations.
The rest of the construction process is really just some book-keeping to see how much idiosyncratic risk there is, and then scaling by the estimated volatilities.
For practical purposes, you want to ensure that the resulting variance matrix is substantially positive definite.
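The steps above can be sketched in a few lines. This is a bare-bones illustration on simulated data, not the code of any particular package, and the number of factors is picked arbitrarily. Note that sens here is laid out as assets by factors, the transpose of B in the notation used earlier.

```r
set.seed(5)
n.obs <- 300
n.assets <- 8

# simulated returns with one common driver, standing in for real data
common <- rnorm(n.obs, sd = 0.01)
return.matrix <- outer(common, runif(n.assets, 0.5, 1.5)) +
  matrix(rnorm(n.obs * n.assets, sd = 0.01), nrow = n.obs)

n.factors <- 2   # the key decision; chosen arbitrarily here
facmod <- eigen(cor(return.matrix), symmetric = TRUE)

# sensitivities on the correlation scale:
# eigenvectors scaled by the square roots of their eigenvalues
sens <- facmod$vectors[, 1:n.factors, drop = FALSE] %*%
  diag(sqrt(facmod$values[1:n.factors]), nrow = n.factors)

# book-keeping: whatever the factors leave unexplained is idiosyncratic
uniqueness <- 1 - rowSums(sens^2)

# correlation-scale model (F is the identity, so it drops out),
# then scale by the estimated volatilities
cor.model <- sens %*% t(sens) + diag(uniqueness)
vol <- apply(return.matrix, 2, sd)
V <- cor.model * outer(vol, vol)

min(eigen(V, symmetric = TRUE)$values)   # should be comfortably above zero
```

The diagonal of V reproduces the sample variances exactly. Only the off-diagonal elements are smoothed by the factor structure.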
If you are using R, you don’t need to write your own function: you can use factor.model.stat, which is in the BurStFin package (which still hasn’t migrated to CRAN).
> install.packages("BurStFin", repos="http://www.burns-stat.com/R")
Note that this package also contains an implementation of the Ledoit-Wolf shrinkage estimate.
Dealing with missing values
If you look at the definition of factor.model.stat, it is substantially more complicated than the explanation above suggests it should be. That is mostly because it handles the possibility of missing values.
In a statistics class you are never going to see a variable included in a variance matrix when there are no data for it. In finance that is a common demand. We want to have new assets in our risk models, often before the assets even trade at all. Stocks for Facebook and Twitter are currently on the horizon.
Of course the estimates for such assets will be less than perfect. We’re hoping for reasonable rather than perfect. The use that will be made of the variance should determine the assumptions used in making the estimates.
Since markets are always full of surprises, we can’t expect that the predicted volatility for a portfolio will always be a good approximation of the realized volatility. But better models will do better at putting different portfolios into the right order in terms of volatility.
Here is one way to test models:
- Pick a time point in the past.
- Generate a set of random portfolios that have constraints you care about.
- For each model and each random portfolio get the predicted volatility.
- For each random portfolio find the subsequent realized volatility.
- For each model calculate the correlation between predicted and realized volatility.
It is best to do this at several time points rather than just one. Higher correlations are better.
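A toy version of the procedure for a single time point might look like the following. Everything in it is an assumption made for illustration: the returns are simulated, the two "models" are just the sample variance and its diagonal, and the random portfolios are simply long-only with weights summing to 1 rather than generated under realistic constraints.

```r
set.seed(9)
n.assets <- 20
n.est <- 250    # observations before the chosen time point
n.real <- 60    # observations after it

# simulated one-factor returns standing in for real history
beta <- runif(n.assets, 0.5, 1.5)
resid.sd <- runif(n.assets, 0.005, 0.02)
sim.returns <- function(n) {
  outer(rnorm(n, sd = 0.01), beta) +
    matrix(rnorm(n * n.assets), nrow = n) %*% diag(resid.sd)
}
past <- sim.returns(n.est)
future <- sim.returns(n.real)

# two candidate variance models estimated from the past only
models <- list(sample = var(past),
               diagonal = diag(diag(var(past))))

# random portfolios: one row of weights per portfolio
n.port <- 200
w <- matrix(rexp(n.port * n.assets), nrow = n.port)
w <- w / rowSums(w)

# predicted volatility sqrt(w'Vw) for each model and portfolio
predicted <- sapply(models, function(V) sqrt(rowSums((w %*% V) * w)))

# realized volatility of each portfolio over the subsequent period
realized <- apply(future %*% t(w), 2, sd)

# one correlation per model; higher is better
cor(predicted, realized)
```

In a real test the portfolios would obey the constraints you actually use, and the whole exercise would be repeated at several time points.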
For evaluating risk models, it is common in practice to pay close attention to the ‘bias statistic,’ as explained in detail here:
Thanks for the comment and the link. That ‘bias statistic’ test is looking in a time series direction and really testing the responsiveness of the model to changes in volatility over time. That is very important for risk reporting and making sure that funds don’t break mandate requirements. However, it is mostly immaterial if the risk model is being used in an optimization.
In an optimization you are at one point in time but looking across portfolios that obey your constraints. If the risk model orders those portfolios in terms of volatility perfectly, then that is the most you can ask for. The test I talk about is testing for that use.
You should generally do a range of tests — it isn’t one versus another. Also your tests should be aimed at the use that the model will be put to.
I like the way you propose to test the models. But because in this case the focus is on predicting the covariance matrix, I suggest using a deterministic portfolio, such as the equally weighted portfolio, instead of a random one. This way you can avoid the variability of the portfolio weights.
On the other hand, have you tried your proposed backtesting with the mentioned models (shrinkage, factor, DCC, etc.)? If yes, which is the best?
My guess is that shrinkage estimators are the best ones…
I explain the rationale of the random portfolio test a little better in my comment immediately above this one. The problem with deterministic portfolios is that you are limiting yourself to those specifics and you may be blinding yourself to problems that happen not to appear with your choice.
I haven’t done any serious testing. The Portfolio Probe User’s Manual (chapter 2) has some pictures comparing the default statistical factor model from ‘factor.model.stat’ with the default from ‘var.shrink.eqcov’ (the Ledoit-Wolf estimate). That universe was of US megacap stocks — so fairly homogenous. The two predictions are amazingly similar within the constraints used.
I’m with you that I’m guessing that Ledoit-Wolf is in general better, but I don’t know of convincing evidence.
Hi, quick question about factor models. I have seen that when using just a single index model, say the market, the covariance matrix in the end is just (variance of market)*B’*B, but with the diagonal elements replaced by the variances of the assets. So the off-diagonal elements carry the correlations with the market or factor. Does this seem reasonable? Thanks.