Explanation vs. Prediction

Again, this is an older post I had lying around. Nonetheless, the topic is still relevant.

I recently read a great paper named To Explain or To Predict? by Galit Shmueli. She explains the differences between “old-school” explanatory statistics and predictive statistics. I have seen many of her observations myself.

That means predictions are often regarded as unscientific, and therefore there’s a bit of a lack of good literature. Lately the situation has improved with the rise of machine learning.
Nonetheless, most students don’t learn how to make predictions, and you still see people using R^2 computed on the fitting data to validate predictive models.
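
To make the R^2 point concrete, here is a minimal sketch in Python. Everything in it is my own illustration and not taken from the paper: the data is synthetic, and the "model" is a deliberately silly 1-nearest-neighbor lookup that simply memorizes the training points. It shows that R^2 on the data used for fitting can look perfect while R^2 on held-out data is much lower.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nearest_neighbor_predict(train_x, train_y, x):
    """Predict by copying the y of the closest training x (pure memorization)."""
    idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[idx]


def compare_r_squared(n=200, noise=1.0, seed=0):
    """Return (in-sample R^2, holdout R^2) on synthetic data y = 0.5 * x + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [0.5 * x + rng.gauss(0, noise) for x in xs]

    # First half is used for "fitting", second half is held out.
    half = n // 2
    train_x, train_y = xs[:half], ys[:half]
    test_x, test_y = xs[half:], ys[half:]

    fit_in = [nearest_neighbor_predict(train_x, train_y, x) for x in train_x]
    fit_out = [nearest_neighbor_predict(train_x, train_y, x) for x in test_x]
    return r_squared(train_y, fit_in), r_squared(test_y, fit_out)


if __name__ == "__main__":
    r2_in, r2_out = compare_r_squared()
    print(f"in-sample R^2: {r2_in:.3f}, holdout R^2: {r2_out:.3f}")
```

The memorizing model reproduces every training point exactly, so its in-sample R^2 is 1. Only the holdout number says anything about how well it predicts.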

Sure, there are some departments that teach how to predict, but they are still in the minority. Of course, there’s this other trend with Big Data. Personally, I’m not really excited by Big Data but rather by data in general.

More Info: http://galitshmueli.com/explain-predict

I wrote this post more than 2 years ago. Since then, machine learning has become something of a commodity on a smaller scale, and something strange happened. Some of the people who work with data but didn’t learn good statistical techniques started trying to explain data, which is pretty terrible. It even seems that they are trying to reinvent statistics. Yesterday I read a post called Why big data is in trouble: they forgot about applied statistics, which captured this pretty nicely.

The table at the bottom is just unbelievable. It lists different fields and the application of “big data” or “data science” in each. It also claims that in 2012 these finally started to enter fields like biology, economics, engineering, etc., which is more sad than hilarious. So yeah, I didn’t expect this turn.

Furthermore, I have seen more and more “data science” boot camps and programs popping up that still neglect statistical foundations, resulting in even more terrible studies. This trend will probably follow the Gartner Hype Cycle. As far as I can tell, the peak has already been reached; next comes the disappointment, and in a few years the field will actually reach its plateau. Here is the latest Hype Cycle from July 2013:


I see the term “prescriptive analytics” on there and just looked it up. It’s astonishing that people invent new terms for so much stuff and it still works. Even business intelligence is basic statistics; then came predictive analysis (still statistics), data science (hey, statistics), and now prescriptive analytics (still statistics).

I just have to share one of my favorite quotes on this topic:

Someone (can’t recall the source, sorry) recently defined “data scientist” as “a data analyst who lives in California.” —baconner

Economics of Angel Investing

After writing the last post I thought a bit about the further development of innovation. What if we could predict successful companies (ideas) with high probability? That would allow us to reallocate human capital faster and thus lead to more successes.
One idea is a prediction market for startups. You may think that sites like AngelList go in this direction. I’m not entirely sure about this. I will first look at angel investing from an economic standpoint.

Angel Investor and Entrepreneur
This is a typical principal-agent problem: the Entrepreneur has more information than the Investor. What happens is often exactly what you would expect: the Investor looks for commitments by the Entrepreneur that reveal some of his private information. Examples of such commitments are quitting one’s job, using one’s own money for funding, or buying an expensive domain. Furthermore, of course, Investors try to assess the personal attributes of the Entrepreneur and his team.

Angel Investor and other Angel Investors
This stage is more interesting. Let’s say that our Entrepreneur got his first funding from one Angel Investor. Depending on the status of the first Angel Investor there are two different scenarios.
Firstly, assume the first Angel Investor isn’t famous. The next Angel Investor will probably see that this investment could possibly be profitable, but with high probability he will go back to stage one and use his own judgment.
Secondly, assume that the first Angel Investor is famous, a top-notch one. The second Angel Investor will probably trust the judgment of the first Angel Investor so much that he will skip the first stage or overlook some flaws that he found. This is herding, and it leads to incorrect pricing and maybe a bubble.
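
The two scenarios can be sketched as a toy model in Python. The rule below is my own simplification, not an established result: each investor follows his own judgment unless the publicly visible decisions so far already lean strongly one way, and a famous first investor simply counts double. The names `first_investor_weight` and `threshold` are made up for this illustration.

```python
def simulate_round(signals, first_investor_weight=1, threshold=2):
    """Toy herding model.

    signals[i] is True when investor i's own due diligence looks positive.
    Returns the list of actual invest / pass decisions.
    """
    public_score = 0  # net weight of visible "invest" minus "pass" decisions
    decisions = []
    for i, signal in enumerate(signals):
        if public_score >= threshold:
            invest = True    # herd: invest regardless of own judgment
        elif public_score <= -threshold:
            invest = False   # herd: pass regardless of own judgment
        else:
            invest = signal  # stage one: rely on own judgment
        decisions.append(invest)

        # Assumption: only the first investor can carry extra weight (fame).
        weight = first_investor_weight if i == 0 else 1
        public_score += weight if invest else -weight
    return decisions


if __name__ == "__main__":
    # The first investor likes the deal, the next two privately do not.
    private_views = [True, False, False]
    print("unknown first investor:", simulate_round(private_views, 1))
    print("famous first investor: ", simulate_round(private_views, 2))
```

With an unknown first investor the later investors act on their own negative views. With a famous one they all invest anyway, which is exactly the mispricing described above.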

We could try to make these investments anonymous, but this would be impractical. However, we could at least correct the pricing by allowing short selling.
This all sounds like a stock exchange, which has a similar function, i.e. funding companies.

However, I think stock exchanges have one problem for angel investment: the expectations of the participants differ. Some Investors want 2x exits, some 5x exits; some want them in the next two years, others in the next five. This is totally OK if we use these mechanisms for allocating capital.

Yet the goal is to predict future successes, and here I think prediction markets are more suitable because they have clear goals, e.g. “Company X will reach 5m in sales by 31 December, 2016.” Prediction markets do these things really well. One of the biggest problems will be liquidity, which can be partially solved using aggregation or, even better, by attracting more people to the market.
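
One further option for the liquidity problem, which I did not discuss above, is an automated market maker that always quotes a price. Here is a minimal Python sketch of one well-known design, the logarithmic market scoring rule (LMSR), for a single yes/no contract like the sales example. The class name and the default `liquidity` value are assumptions for illustration; a real market would need fees, accounts, and settlement on top of this.

```python
import math


class LMSRMarket:
    """Binary prediction market run by an LMSR market maker (illustrative sketch)."""

    def __init__(self, liquidity=100.0):
        # Larger liquidity means prices move less per share traded.
        self.b = liquidity
        self.shares = {"yes": 0.0, "no": 0.0}

    def _cost(self, shares):
        # LMSR cost function: C(q) = b * ln(sum_i exp(q_i / b))
        return self.b * math.log(sum(math.exp(q / self.b) for q in shares.values()))

    def price(self, outcome):
        """Current price of one share, readable as the market's probability."""
        weights = {k: math.exp(q / self.b) for k, q in self.shares.items()}
        return weights[outcome] / sum(weights.values())

    def buy(self, outcome, amount):
        """Buy `amount` shares (negative to sell) and return what the trader pays."""
        new_shares = dict(self.shares)
        new_shares[outcome] += amount
        cost = self._cost(new_shares) - self._cost(self.shares)
        self.shares = new_shares
        return cost


if __name__ == "__main__":
    market = LMSRMarket()
    print("initial P(yes):", round(market.price("yes"), 3))
    paid = market.buy("yes", 50)
    print("paid:", round(paid, 2), "new P(yes):", round(market.price("yes"), 3))
```

The market maker is always willing to trade, so a single participant can move the price even when nobody else is around. Betting against a company is just buying "no" shares, which covers the short-selling idea from above.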