Last week was the news analytics workshop at Birkbeck College.
There is room in news analytics for a large range of approaches. The leading model runs along the lines of:
- something happens
- a journalist (possibly a machine) creates a news item
- the news item is captured, time-stamped and given an id
- the news item is decoded (understood) using a linguistics algorithm
- the decoding results in values for several data fields for one or more observations
- the data fields are used in some sort of analysis
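The capture step in that model can be sketched in code. Everything here (class names, field ranges, the 0-to-1 relevance scale) is my own illustration, not any vendor's actual schema:

```python
# A minimal sketch of the capture step: time-stamp the item and give it
# an id. Class names and field ranges are illustrative only, not any
# vendor's schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid

@dataclass
class Observation:
    entity: str        # the company the observation is about
    relevance: float   # 0.0 (incidental mention) to 1.0 (main subject)
    sentiment: float   # -1.0 (very negative) to 1.0 (very positive)

@dataclass
class NewsItem:
    text: str
    item_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    observations: list = field(default_factory=list)

def capture(text: str) -> NewsItem:
    """Capture a raw news item: assign an id and a UTC time stamp."""
    return NewsItem(text=text)

item = capture("Goldman initiates coverage of x with a sell")
print(item.item_id, item.timestamp.isoformat())
```

The decoding step would then fill in the `observations` list; that is where the hard linguistics lives.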
Suppose there is an article with the headline “Goldman initiates coverage of x with a sell”. This is going to result in at least two observations:
- one about x, with high relevance and negative sentiment
- one about Goldman Sachs, with low relevance and neutral sentiment
One headline, two quite different observations. And thus, interesting.
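A toy decoder that handles just this one headline pattern might look like the following. The relevance and sentiment numbers are invented for illustration, and a real linguistics algorithm would be vastly more involved:

```python
# Toy decoder for the example headline pattern only -- a real
# linguistics algorithm needs entity resolution, grammar and much more.
import re

def decode(headline):
    """Return (entity, relevance, sentiment) tuples for one pattern."""
    m = re.match(r"(\w+) initiates coverage of (\w+) with a (buy|hold|sell)",
                 headline, re.IGNORECASE)
    if m is None:
        return []
    broker, target, rating = m.groups()
    sentiment = {"buy": 1.0, "hold": 0.0, "sell": -1.0}[rating.lower()]
    return [
        (target, 0.9, sentiment),   # main focus: high relevance
        (broker, 0.2, 0.0),         # the broker: low relevance, neutral
    ]

print(decode("Goldman initiates coverage of x with a sell"))
# [('x', 0.9, -1.0), ('Goldman', 0.2, 0.0)]
```

Note that the rule has to know that x, not the grammatical subject Goldman, is the main focus; that one hand-written regex covers one headline template is exactly why this is hard at scale.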
To catch even a small fraction of news is a big task. There are cases where some small universe is useful, but usually news analytics is thought of as involving a large universe of news.
One reason that news analytics is coming to the fore is that capturing lots of news is now feasible. At the workshop we heard a presentation by students at University College London who are capturing a 10% random sample of tweets, a large dictionary of Twitter hashtags, and data from selected blogs, Facebook and LinkedIn as well.
There needs to be a program that sucks in some text and makes sense of it. In our example headline it would have to understand that the main focus of the story is x (even though Goldman is the subject of the sentence), that x is a company and which one, and that the action is bad for x. That’s hard.
It can get worse. The phrase “yeah, right” can be said literally or sarcastically. The Sheldon Coopers of the world don’t understand sarcasm, and it is going to be a Sheldon Cooper or two who write the code to parse the text.
Assuming we get the literal meaning of an article, what we really want is its meaning in the larger sense. The example news item will have less significance if it is the 23rd report of the fact rather than the first or second. There can be a data field that estimates the “novelty” of the piece of news.
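One crude way to get such a novelty field is to down-weight items that look like recently seen ones. This is my own sketch using word overlap (Jaccard similarity), not how any vendor actually computes novelty:

```python
# A crude novelty score: 1 minus the maximum word-overlap (Jaccard
# similarity) with recent stories. Vendors' novelty fields are far more
# sophisticated; this only fixes the idea.
def jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def novelty(headline, recent):
    if not recent:
        return 1.0
    return 1.0 - max(jaccard(headline, r) for r in recent)

recent = ["Goldman initiates coverage of x with a sell"]
print(novelty("Goldman initiates coverage of x with a sell", recent))  # 0.0
print(novelty("y announces record quarterly earnings", recent))        # 1.0
```

The 23rd report of a fact would score near zero here, while a genuinely fresh story scores near one.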
High frequency use
The immediately obvious use for news analytics data is in high frequency trading: go short x before most everyone hears the news and starts selling. To the extent that the data is informative, it becomes mandatory for high frequency traders to have it.
A characteristic that seems to be quite robust is that there is more gain from knowing about bad news than good news.
Low frequency use
What I find much more intriguing is non-obvious uses for lower frequency trading.
One example is looking at the news flow of companies (how many news ids are about the company in some time frame) adjusted by their market cap. There seems to be circumstantial evidence that increasing news flow in a company or sector is an indicator of a bubble.
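Such a cap-adjusted news-flow measure is simple to compute once you have counts of news ids per company. The tickers and numbers below are made up for illustration:

```python
# Sketch of cap-adjusted news flow: news ids about each company in some
# window, divided by market cap. All tickers and figures are made up.
news_counts = {"AAA": 120, "BBB": 45, "CCC": 200}     # news ids in the window
mcap_bn     = {"AAA": 10.0, "BBB": 50.0, "CCC": 5.0}  # market cap, $bn

flow = {tic: news_counts[tic] / mcap_bn[tic] for tic in news_counts}

# rank highest cap-adjusted flow first: candidate "bubbly" names
for tic in sorted(flow, key=flow.get, reverse=True):
    print(tic, round(flow[tic], 2))
```

The small-cap name with heavy coverage floats to the top, which is the sort of signal the bubble conjecture is about; one would of course want to look at the trend in this measure, not just its level.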
I see news analytics as an emerging new data source for quant modeling. I think it can alleviate quant overcrowding — for several years at least — because there are so many different ways the data can be used.
There are two big players: RavenPack and Thomson Reuters. These two companies sponsored the workshop, and provided the stars of the show: Peter Hafez from RavenPack and Jacob Sisk from Thomson Reuters.
There are smaller players as well. One that I know of is Kulshan Capital, who have a market sentiment indicator based on scraping data from selected sources.
What suppliers have I missed?
What uses have I missed?
And all I ever learned from love
Was how to shoot someone who outdrew ya
from “Hallelujah” by Leonard Cohen