#6/25: Problem Solving: A statistician’s guide


  1. Do not attempt to analyse the data until you understand what is being measured and why. Find out whether there is any prior information about likely effects.
  2. Find out how the data were collected.
  3. Look at the structure of the data.
  4. The data then need to be carefully examined in an exploratory way, before attempting a more sophisticated analysis.
  5. Use your common sense at all times.
  6. Report the results in a clear, self-explanatory way.

Thus a statistician needs to understand the general principles involved in tackling statistical problems, and at some stage it is more important to study the strategy of problem solving rather than learn yet more techniques (which can always be looked up in a book).

  • What’s the objective? Which aim? What’s important and why?
  • How were the data selected? What is their quality?
  • How are the results used? Simple vs. complicated models
  • Check the existing literature => it can make the study redundant, or help you collect better data and avoid repeating fundamental errors


  • Test as much as possible in your data collection, e.g. pretest surveys, account for time effects, the order of different studies, etc.
  • Getting the right sample size is often difficult; sometimes it is too small, other times too large; esp. medical research often relies on rules of thumb like 20 patients instead of properly calculated sizes => Tip: look for previous research
  • Iterate over and over again to make the study better
  • Learn by experience. Do studies yourself. It’s often harder than you think, esp. drawing random samples, e.g. selecting random pigs from a herd
  • Anecdote: A pregnant woman had to wait for 3h and therefore had higher blood pressure -> the medical personnel assumed this blood pressure was her normal level and admitted her to a hospital.
    • Always check the environment of the study
  • Non-responses can say a lot, don’t ignore them
  • questionnaire design: important! Learn about halo effects, social desirability, moral effects, etc.
  • Always pretest with a pilot study, if possible
  • The human element is often the weakest factor
  • Try to find pitfalls in your study, like Randy James
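
The sample-size point in the list above can be made concrete with a small calculation. The sketch below uses the common normal-approximation formula for comparing two group means, n = 2·((z_α/2 + z_β)·σ/δ)² per group. The function name and the example numbers (a difference of 5 units with a standard deviation of 10) are my own illustration, not taken from the book.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes).

    delta: smallest difference in means worth detecting (assumed)
    sigma: standard deviation of the outcome (assumed known)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)


# Hypothetical numbers: detect a difference of 5 units when sd = 10
n_needed = sample_size_per_group(delta=5.0, sigma=10.0)
print(n_needed)  # 63 per group
```

Even this rough calculation asks for far more than 20 patients, and the guesses for sigma and delta are exactly what previous research can supply.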

phases of analysis:

  1. Look at data
  2. Formulate a sensible model
  3. Fit the model
  4. Check the fit
  5. Utilize the model and present conclusions

Whatever the situation, one overall message is that the analyst should not be tempted to rush into using a standard statistical technique without first having a careful look at the data.

model formulation:

  • Ask lots of questions and listen
  • Incorporate background theory
  • Look at the data
  • Experience and inspiration are important
  • trying many models is helpful, but can be dangerous; don’t just select the best model based on the highest R^2 or the like; instead, present the different models in your paper
  • alternatively: use Bayesian approach for model selection

model validation:

  • Is model specification satisfactory?
  • How about random component?
  • A few influential observations?
  • important feature overlooked?
  • alternative models which are as good as the used model?
  • Then iterate, iterate, iterate

Initial examination of data (IDA)

  • data structure, how many variables? categorical/binary/continuous?
  • Useful to reduce dimensionality?
  • ordinal data -> coded as numerical or with dummies?
  • data cleaning: coding errors, OCR, etc.
  • data quality: collection, errors & outliers => eyeballing is very helpful, 5-point summaries
  • missing values: are they missing completely at random (MCAR)? Options: imputation, EM algorithm
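
As a small illustration of the eyeballing step above, here is a sketch of a 5-point summary in plain Python. The data are made up, and the 1.5 × IQR rule used to flag suspicious values is a common convention that I am assuming here, not something prescribed in these notes.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two numbers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Made-up measurements with one value that looks like a coding error
sample = [2, 4, 4, 5, 7, 9, 120]
low, q1, median, q3, high = five_number_summary(sample)

# Flag values far outside the middle half of the data (1.5 * IQR convention)
iqr = q3 - q1
suspicious = [x for x in sample if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print((low, q1, median, q3, high))  # (2, 4.0, 5.0, 8.0, 120)
print(suspicious)                   # [120]
```

A flagged value is only a prompt to go back to the source; whether 120 is a typo or a real observation is a question about data collection, not about the code.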

descriptive statistics

  • for complete data set & interesting sub groups
  • 5-point summary, IQR, tables, graphs
  • Tufte’s lie factor = apparent size of effect shown in the graph / actual size of effect in the data
  • graphs: units, title, legend
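
The lie factor from the list above is easy to compute by hand. The sketch below assumes that "size of effect" means the relative change between two values, and it uses a made-up bar chart whose y-axis starts at 95 instead of 0.

```python
def effect_size(first, second):
    """Relative change from first to second (assumed definition of 'effect')."""
    return abs(second - first) / abs(first)


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Effect as drawn in the graph divided by the effect in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


# Hypothetical chart: data rise from 100 to 110 (a 10% increase), but the
# y-axis starts at 95, so the bars are drawn 5 and 15 units tall (+200%).
factor = lie_factor(shown_first=5, shown_second=15, data_first=100, data_second=110)
print(round(factor, 1))  # 20.0
```

A lie factor close to 1 means the picture is honest; here the graph exaggerates the change by a factor of about 20.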

data modification

  • test data transformation
  • estimate missing values
  • adjust extreme values
  • create new variables
  • try box-cox transformation
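
For reference, the Box-Cox transformation mentioned above maps a positive value x to (x^λ − 1)/λ for λ ≠ 0 and to log(x) for λ = 0. Below is a minimal sketch with a hand-picked λ; in practice λ is usually estimated from the data, which this example does not do.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a single positive value for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)  # limiting case as lambda -> 0
    return (x ** lam - 1) / lam


# Made-up right-skewed data: the log case (lambda = 0) pulls in the long tail
skewed = [1, 10, 100, 1000]
transformed = [box_cox(x, 0) for x in skewed]
print([round(t, 2) for t in transformed])  # [0.0, 2.3, 4.61, 6.91]
```

With λ = 1 the data are only shifted by one, and λ = 0.5 behaves like a square-root transformation, so trying a few values of λ is a cheap way to test data transformations.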


  • significance tests are widely overused, esp. in medicine, biology and psychology.
  • Statistically significant effects are not always interesting, esp. with big samples
  • Non-significant is not always the same as no difference, esp. with small samples (the opposite of the previous point)
  • Rigid enforcement of significance levels: why 5% and not 4% or 1% or whatever? This can lead to publication bias.
  • Estimates are more important, because they communicate relationships
  • Often the null hypothesis is silly, e.g. water doesn’t affect the growth of a plant
    • Better: Interesting results should be repeatable in general and under different conditions. (Nelder: significant sameness)
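
To see why significance and relevance are different things, here is a small sketch under simplifying assumptions: a two-sample z-test with a known standard deviation and equal group sizes. The numbers are hypothetical, a difference of 0.1 points on a scale with standard deviation 15, which hardly anyone would call interesting.

```python
import math


def two_sided_p_value(diff, sigma, n_per_group):
    """Two-sided p-value of a two-sample z-test (known sigma, equal n)."""
    standard_error = sigma * math.sqrt(2 / n_per_group)
    z = diff / standard_error
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# The same tiny difference, tested with a small and a huge sample
p_small = two_sided_p_value(diff=0.1, sigma=15.0, n_per_group=100)
p_large = two_sided_p_value(diff=0.1, sigma=15.0, n_per_group=1_000_000)

print(f"n = 100 per group:       p = {p_small:.3f}")
print(f"n = 1,000,000 per group: p = {p_large:.2g}")
```

The effect is identical in both runs; only the sample size changes, and with it the verdict of the test. That is why the estimate (0.1 points) tells you more than the p-value.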

appropriate procedure

  • do more than just one type of analysis, e.g. parametric vs. non-parametric or robust
  • robust, good methods are better than optimal methods with lots of assumptions
  • don’t use a method just because you are familiar with it
  • think in different ways about the problem
  • be prepared to make ad hoc modifications
  • you cannot know everything
  • analysis is more than just fitting the model


  • the assumed model is often more important than frequentist vs. Bayesian


  • learn your statistics software and a scientific programming language
  • learn to use a library, Google Scholar, and searching in general

statistical consulting

  • work with the people; statistics isn’t about numbers, it’s about people
  • understand the problem and the objective
  • ask lots of questions
  • be patient
  • bear in mind resource constraints
  • write in clear language


  • be skeptical
  • understand numbers
  • learn estimating
  • check dimensions
  • My book recommendation: Innumeracy
  • check silly statistics: e.g. mean outside of range
  • avoid graph without title and labels
  • don’t use linear regression for non-linear data
  • check assumptions, e.g. mult. regression: more variables than observations
  • the first time I worked with real data, I saw how different the process was
  • => Real work isn’t like your statistics 101 course; data is messy, and you don’t have an unlimited amount of time or money
  • courses let you think that you get the data, look for your perfect model and you’re done; in reality it is 70% searching for data & thinking about pitfalls, 25% cleaning up and understanding the data, and about 5% doing the actual analysis

The second half of the book is filled with awesome exercises. I’d recommend that everybody who works with statistical techniques or with data check them out. They are insightful, interesting and stimulating. Furthermore, Chatfield shows that you can reveal insights with simple techniques.
Problem Solving: A statistician’s guide is a clear recommendation for everybody working with data on a daily basis, especially people with less than 2 to 5 years of experience. I close with a quote from D. J. Finney: Don’t analyze numbers, analyze data.

#108/111: Punished by Rewards

This is one of the books where I just read the title and bought it. Recently, I talked with a friend about rewards and rules and we noticed that they often lead to the crowding out of intrinsic motivation. He said “if I have to do something in 48 hours, I will take at least 48 hours – if I can choose my time freely, I probably will do it immediately.” You have probably had similar experiences.

Some of these observations will be true. Alfie Kohn wrote lots of other books about schooling and the use of rewards, so this bit in the book is especially interesting.

His main objective is a critique of pop behaviorism, i.e. the idea that you have to give something to get something, or the equivalent with punishments. If I want the kids to learn about history, I have to give them grades. If I want my kids to eat healthier, I have to reward them after eating. Or in business settings: If I want my employees to get three new accounts, I have to pay them extra for each one. It’s so inherent in our thinking that it has to be challenged.

So what is Kohn saying about this? He read a great number of studies and presents his findings. The first and most fundamental one is that rewards often don’t work and sometimes even worsen the situation. There are some things to understand.

Firstly, rewards punish. A typical setting is some superior (teacher, boss, parent) who compliments you if you did something great. What if your superior doesn’t compliment you? It’s basically punishment. Punishment and rewards are two sides of the same coin. There are studies that found that even compliments can be bad if they are linked to some objective. That’s important! Unexpected rewards are sometimes better than none, but as soon as you link a reward to an objective it basically becomes a form of punishment.

Secondly, rewards distort your intentions. If you offer your kid a buck for each carrot she eats, she will eat more carrots because of the buck, not because of the carrot. Eating the carrot becomes the unpleasant thing she has to do to get the buck. You wanted to promote eating healthy food and instead promoted the idea that healthy food is unpleasant.

Thirdly, rewards crowd out intrinsic motivation. There’s a kind of myth that motivations add up, that is, if you are intrinsically motivated and someone gives you money/praise/etc. for doing the task, you will be even more motivated. Actually, motivation doesn’t work that way. If you are not motivated at all, then of course extrinsic motivation motivates you to do the task. However, if you are already intrinsically motivated, the extrinsic motivation can crowd out your intrinsic motivation completely and replace it. This effect is rather famous in economics and studied in psychology.

You will probably think that extrinsic motivation isn’t good but we don’t have any alternatives. Kohn himself thinks that it’s hard because extrinsic motivators are so easy to create. Just throw some money in and you’re done. But there are alternatives which aren’t so easy to implement but have a less damaging effect.

The first one is collaboration. Work together with your subordinate to solve the problem or let him work with other people. Alfie Kohn cites an interesting case where a mother went crazy because her child didn’t want to go to sleep at 9pm. She tried nearly everything but she never tried to understand why her child didn’t want to go to sleep. The same goes for pupils who repeatedly come to school too late or for unmotivated employees. Talk to them and help them solve the problem. If your employee doesn’t like to work at your place then it’s probably best for both of you that he looks for another job. It’s not the easy way but it does solve problems instead of treating symptoms.

Secondly, content is important. It’s rather easy to think about this in the field of schooling. Don’t let kids learn things that are boring. For example, he talks about dates in history and I agree. The interesting thing about Franz Ferdinand’s death isn’t that he died on a Sunday or on June 28 but rather that this event led to the First World War. You can probably make most things interesting and you should!

The last one is choice. The more freedom you allow, the more intrinsically motivated people will be. For example, he shows that for uninteresting work the best one can do is to let people handle it the way they want. Even for interesting work this has a positive effect, and the business literature is beginning to include it. We let people work from home, or they don’t have to be in the office from 8 to 5 but rather just have to get some task done by some date. This is exactly the kind of choice which helps to increase people’s motivation.

This book has so many interesting studies in it that I recommend it to nearly everyone, but especially to everyone who is some form of authority: parents, teachers, supervisors.

#100/111: Talent Is Overrated

What is it about?

How did Mozart become so great at composing music? Why does Tiger Woods rock the PGA Tour? Are they more talented than you and me? Geoff Colvin explores whether talent matters and how you can achieve extraordinary performance.

What can I learn?

Talent is overrated: Colvin cites some studies which show that learning is the critical factor in achieving great performance. You have probably experienced this yourself. There are kids who can read and calculate at age 4 but 10 years later are mediocre at reading and calculating. Or the other way around: there were fellow pupils who really sucked at math and three years later graduated as the best of the class. Generally, talent doesn’t matter. What matters is (deliberate) practice.

You decide: This was my main motivator about four years ago to change my life. Either you accept that you can change your life and that you are responsible for its outcomes, or you accept that you are mostly influenced by others’ actions and can’t really control much. If you choose the first option, you will be able to achieve extraordinary stuff. You don’t have to think about talent. You can just start and learn. That is, if you think there’s talent and it matters, then this book will be worthless for you.

Deliberate practice: Practice isn’t just practice. The most effective form turned out to be deliberate practice. It consists of instant feedback, is repeatable, and keeps you focused on learning, i.e. there’s no automation of your actions. It’s quite easy to picture if you think about learning an instrument. But how do you learn, let’s say, marketing? Colvin recommends different things. For one, you can take case studies in marketing, work through them and create solutions. Afterwards, you compare them with the actual result. For the most effect, you can work on the same case some months later, when you have forgotten the actual result. Probably your solution will improve. Another way is to use simulations. There are a lot of business / marketing games out there which can help you understand the mechanics better. Furthermore, it always helps to read basic literature again. Work through marketing books which you read some years ago. Frankly, one thing I learned in this reading challenge is that there isn’t much new information about marketing/business. It’s just old ideas translated into a new medium, nothing else.


It’s frightening how much I agree with the book. Geoff Colvin did a great job in summing up various research and presenting it to the reader. Some guy named Dan decided about a year ago to try out an experiment. He will put 10,000 hours of deliberate practice into golf and plans to become a professional golfer. The hardest part is persistence. Great job, great book. Recommendation!

#90/111: The C Programming Language

Basics, basics, basics. You could call K&R’s book the one basic book which every programmer should read. Nowadays lots of new programmers think that Ruby on Rails is all you need. If you just build basic CRUD web apps this is probably true. However, if you want to understand what’s really going on in your web server, operating system or music player, you should know C. K&R isn’t an introductory course in C programming but it will help you to get better at it if you have some experience in C / programming.

I really love this book, mostly for its cool exercises (implementing tail, memory management, etc.). Always a recommendation!