Data
On Hacker News I came across a neat website (http://www.sopaopera.org/). The comments proposed several hypotheses, e.g. that donations from entertainment and internet companies predict support for or opposition to the SOPA bill.
The data comes directly from sopaopera.org, which in turn aggregates it from various sites.
Graphs & Tests
After cleaning the data and importing it into Stata, I looked through it and drew this box plot. For each group, it shows the share of contributions that came from entertainment companies, relative to the combined contributions from entertainment and internet companies.
In case you don’t know how to read such a plot: the thin bars indicate the min and max values, and the blue box spans the range between the first and third quartiles, i.e. the middle 50% of the population (from 25% to 75%). The line in the blue box shows the median.
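As a rough illustration of the five numbers a box plot draws, here is a minimal Python sketch. The function name `five_number_summary` and the sample ratios are made up for this example and are not the sopaopera.org data; the quartile method is also an assumption, since Stata may compute quartiles slightly differently.

```python
import statistics


def five_number_summary(values):
    """Return (min, first quartile, median, third quartile, max)."""
    ordered = sorted(values)
    # n=4 yields the three cut points between quarters: Q1, median, Q3.
    # method="inclusive" treats the list as the whole population.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical contribution ratios, for illustration only.
example_ratios = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]

low, q1, median, q3, high = five_number_summary(example_ratios)
print("whiskers:", low, high)
print("box:", q1, q3)
print("median line:", median)
```

The box covers everything between `q1` and `q3`, so half of the values fall inside it.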
You can see that the median contribution ratio is about 35% for the opposition, compared with about 65% for the supporters. Next, I wanted to test whether this difference is significant. In fact, it is highly significant (at the 95% confidence level, t = -4.73).
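For readers who want to see what goes into such a t value, here is a minimal Python sketch of a two-sample t statistic. It assumes the pooled-variance (equal variances) form of the test, which may or may not be the variant used for the number above, and the two lists are invented for illustration, not the real data.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = statistics.fmean(group_a)
    mean_b = statistics.fmean(group_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    degrees_of_freedom = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / degrees_of_freedom
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, degrees_of_freedom


# Hypothetical contribution ratios, for illustration only.
opponents = [0.20, 0.30, 0.35, 0.40, 0.50]
supporters = [0.50, 0.60, 0.65, 0.70, 0.80]

t_value, dof = pooled_t_statistic(opponents, supporters)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

The sign is negative whenever the first group (here the opponents) has the lower mean, which matches the sign of the t value reported above.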
Furthermore, here’s a plot of the log-transformed absolute contributions:
The next step is a logistic regression to check how well each attribute predicts support. I regressed on age, party (is_democrat), seniority and the share of entertainment contributions (quota_ent). You can see the results:
------------------------------------------------------------------------------
     support |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0258551   .0358136     0.72   0.470    -.0443382    .0960485
 is_democrat |  -1.252883   .6243361    -2.01   0.045    -2.476559   -.0292067
   seniority |  -.0262688   .0381962    -0.69   0.492     -.101132    .0485943
   quota_ent |   5.839435   1.447732     4.03   0.000     3.001933    8.676938
       _cons |  -1.968467    2.01512    -0.98   0.329    -5.918029    1.981096
------------------------------------------------------------------------------
We can see that the coefficients of is_democrat and quota_ent are significantly different from zero, with quota_ent being the most significant. This isn’t much of a surprise.
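To make the table a bit more tangible, here is a small Python sketch that recomputes the z column (coefficient divided by standard error), turns coefficients into odds ratios, and plugs values into the logistic function. The coefficients and standard errors are copied from the table above; the helper names and the example member (age 60, seniority 10) are hypothetical and only serve as an illustration.

```python
import math

# (coefficient, standard error) pairs copied from the regression table.
COEFFICIENTS = {
    "age": (0.0258551, 0.0358136),
    "is_democrat": (-1.252883, 0.6243361),
    "seniority": (-0.0262688, 0.0381962),
    "quota_ent": (5.839435, 1.447732),
    "_cons": (-1.968467, 2.01512),
}


def z_statistic(name):
    """z value as reported in the table: coefficient / standard error."""
    coef, std_err = COEFFICIENTS[name]
    return coef / std_err


def odds_ratio(name):
    """Multiplicative change in the odds of support per one-unit increase."""
    return math.exp(COEFFICIENTS[name][0])


def predicted_probability(age, is_democrat, seniority, quota_ent):
    """Probability of support implied by the fitted coefficients."""
    log_odds = (
        COEFFICIENTS["_cons"][0]
        + COEFFICIENTS["age"][0] * age
        + COEFFICIENTS["is_democrat"][0] * is_democrat
        + COEFFICIENTS["seniority"][0] * seniority
        + COEFFICIENTS["quota_ent"][0] * quota_ent
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


for name in COEFFICIENTS:
    print(f"{name:12s} z = {z_statistic(name):6.2f}")

print(f"odds ratio for is_democrat: {odds_ratio('is_democrat'):.2f}")

# Hypothetical member: age 60, seniority 10, compared at two contribution ratios.
for ratio in (0.35, 0.65):
    p = predicted_probability(age=60, is_democrat=1, seniority=10, quota_ent=ratio)
    print(f"quota_ent = {ratio:.2f}: P(support) = {p:.2f}")
```

Since quota_ent is a ratio between 0 and 1, its coefficient describes the jump from no entertainment money to only entertainment money, so comparing predicted probabilities at realistic ratios is easier to interpret than the raw odds ratio.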