Last week was the news analytics workshop at Birkbeck College.
There is room in news analytics for a large range of approaches. The leading model runs along the lines of:
- something happens
- a journalist (possibly a machine) creates a news item
- the news item is captured, time-stamped and given an id
- the news item is decoded (understood) using a linguistics algorithm
- the decoding results in values for several data fields for one or more observations
- the data fields are used in some sort of analysis
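The capture step in that model can be sketched in code. Everything here (class names, field ranges, the 0-to-1 relevance scale) is my own illustration, not any vendor's actual schema:

```python
# A minimal sketch of the capture step: time-stamp the item and give it
# an id. Class names and field ranges are illustrative only, not any
# vendor's schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid

@dataclass
class Observation:
    entity: str        # the company the observation is about
    relevance: float   # 0.0 (incidental mention) to 1.0 (main subject)
    sentiment: float   # -1.0 (very negative) to 1.0 (very positive)

@dataclass
class NewsItem:
    text: str
    item_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    observations: list = field(default_factory=list)

def capture(text: str) -> NewsItem:
    """Capture a raw news item: assign an id and a UTC time stamp."""
    return NewsItem(text=text)

item = capture("Goldman initiates coverage of x with a sell")
print(item.item_id, item.timestamp.isoformat())
```

The decoding step would then fill in the `observations` list; that is where the hard linguistics lives.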
Suppose there is an article with the headline “Goldman initiates coverage of x with a sell”. This is going to result in at least two observations:
- one about x, with high relevance and negative sentiment
- one about Goldman Sachs, with low relevance and neutral sentiment
One headline, two quite different observations. And thus, interesting.
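A toy decoder that handles just this one headline pattern might look like the following. The relevance and sentiment numbers are invented for illustration, and a real linguistics algorithm would be vastly more involved:

```python
# Toy decoder for the example headline pattern only -- a real
# linguistics algorithm needs entity resolution, grammar and much more.
import re

def decode(headline):
    """Return (entity, relevance, sentiment) tuples for one pattern."""
    m = re.match(r"(\w+) initiates coverage of (\w+) with a (buy|hold|sell)",
                 headline, re.IGNORECASE)
    if m is None:
        return []
    broker, target, rating = m.groups()
    sentiment = {"buy": 1.0, "hold": 0.0, "sell": -1.0}[rating.lower()]
    return [
        (target, 0.9, sentiment),   # main focus: high relevance
        (broker, 0.2, 0.0),         # the broker: low relevance, neutral
    ]

print(decode("Goldman initiates coverage of x with a sell"))
# [('x', 0.9, -1.0), ('Goldman', 0.2, 0.0)]
```

Note that the rule has to know that x, not the grammatical subject Goldman, is the main focus; that one hand-written regex covers one headline template is exactly why this is hard at scale.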
To catch even a small fraction of news is a big task. There are cases where some small universe is useful, but usually news analytics is thought of as involving a large universe of news.
One reason that news analytics is coming to the fore is that capturing lots of news is now feasible. At the workshop we heard a presentation by students at University College London who are capturing a 10% random sample of tweets, a large dictionary of Twitter hashtags, and data from selected blogs, Facebook and LinkedIn as well.
There needs to be a program that sucks in some text and makes sense of it. In our example headline it would have to understand that the main focus of the story is x (even though Goldman is the subject of the sentence), that x is a company and which one, and that the action is bad for x. That’s hard.
It can get worse. The phrase “yeah, right” can be said literally or sarcastically. The Sheldon Coopers of the world don’t understand sarcasm, and it is going to be a Sheldon Cooper or two who write the code to parse the text.
Assuming we get the literal meaning of an article, what we really want is its meaning in the larger sense. The example news item will have less significance if it is the 23rd report of the fact rather than the first or second. There can be a data field that estimates the “novelty” of the piece of news.
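One crude way to get such a novelty field is to down-weight items that look like recently seen ones. This is my own sketch using word overlap (Jaccard similarity), not how any vendor actually computes novelty:

```python
# A crude novelty score: 1 minus the maximum word-overlap (Jaccard
# similarity) with recent stories. Vendors' novelty fields are far more
# sophisticated; this only fixes the idea.
def jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def novelty(headline, recent):
    if not recent:
        return 1.0
    return 1.0 - max(jaccard(headline, r) for r in recent)

recent = ["Goldman initiates coverage of x with a sell"]
print(novelty("Goldman initiates coverage of x with a sell", recent))  # 0.0
print(novelty("y announces record quarterly earnings", recent))        # 1.0
```

The 23rd report of a fact would score near zero here, while a genuinely fresh story scores near one.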
High frequency use
The immediately obvious use for news analytics data is in high frequency trading: go short x before most everyone hears the news and starts selling. To the extent that the data is informative, it becomes mandatory for high frequency traders to have it.
A characteristic that seems to be quite robust is that there is more gain from knowing about bad news than good news.
Low frequency use
What I find much more intriguing is non-obvious uses for lower frequency trading.
One example is looking at the news flow of companies (how many news ids are about the company in some time frame) adjusted by their market cap. There seems to be circumstantial evidence that increasing news flow in a company or sector is an indicator of a bubble.
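Such a cap-adjusted news-flow measure is simple to compute once you have counts of news ids per company. The tickers and numbers below are made up for illustration:

```python
# Sketch of cap-adjusted news flow: news ids about each company in some
# window, divided by market cap. All tickers and figures are made up.
news_counts = {"AAA": 120, "BBB": 45, "CCC": 200}     # news ids in the window
mcap_bn     = {"AAA": 10.0, "BBB": 50.0, "CCC": 5.0}  # market cap, $bn

flow = {tic: news_counts[tic] / mcap_bn[tic] for tic in news_counts}

# rank highest cap-adjusted flow first: candidate "bubbly" names
for tic in sorted(flow, key=flow.get, reverse=True):
    print(tic, round(flow[tic], 2))
```

The small-cap name with heavy coverage floats to the top, which is the sort of signal the bubble conjecture is about; one would of course want to look at the trend in this measure, not just its level.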
I see news analytics as an emerging new data source for quant modeling. I think it can alleviate quant overcrowding — for several years at least — because there are so many different ways the data can be used.
There are two big players: RavenPack and Thomson Reuters. These two companies sponsored the workshop, and provided the stars of the show: Peter Hafez from RavenPack and Jacob Sisk from Thomson Reuters.
There are smaller players as well. One that I know of is Kulshan Capital, who have a market sentiment indicator based on scraping data from selected sources.
What suppliers have I missed?
What uses have I missed?
And all I ever learned from love
Was how to shoot someone who outdrew ya
from “Hallelujah” by Leonard Cohen