Collaborative writing with storyline.io

Yesterday, I read about a great site called storyline.io. It’s a site for collaborative writing. That isn’t new by itself, but this site is well executed. You have 5 minutes to write a paragraph and one minute to proofread it. What I especially love is that the community is still quite small but active. Also, the amount of trolling seems to be very low. This leads to some pretty good stories. Here’s probably my favorite. It’s a comedy called The Activity Club. It’s only a few paragraphs long, so go ahead and read it. One problem is that stories tend not to end because there’s no length specified. So one writer may want to steer the story toward an ending while another tries to lengthen it. A length limit could be a great addition, maybe with variable length?

A few years ago I used a similar site which wasn’t that well executed but was nonetheless fun. I think it’s a gentle way to get into writing because you don’t have the pressure of doing all the work. Post a line and you’re done – just 6 minutes of writing, that’s it.

Core values of Fortune 500 companies

A few blog posts ago I had the idea to compare the values of the Fortune 500 companies. You will find the data at the end for free. Here’s how I did it.

Plan

My plan is simple. I need the core values of all Fortune 500 companies, thus I need a list of all their names and their websites. The official site features a list with subpages, each of which links to the company’s website. Afterwards, I just need to check those websites for core values. I’m also documenting these steps because a few people asked how I get data from websites.

Let’s start crawling the URLs

I want to write a small crawler to get the URLs for each company. You can find all subpages easily on the initial HTML document, no need to load further sites or such. Let’s download it:

% wget "http://money.cnn.com/magazines/fortune/fortune500/2013/full_list/"

If you look at the code you can find that the subpages look like this:
<a href="/magazines/fortune/fortune500/2013/snapshots/54.html">
<a href="/magazines/fortune/fortune500/2013/snapshots/11719.html">

We can easily extract these URLs. Generally, it’s better to use an HTML parser to extract URLs, but in this case I just extract them using a regex. It’s sufficient for this task. If you work with data that isn’t that nicely structured or may contain special characters, use an HTML parser.

% egrep -o '<a href="(.*?\/2013/snapshots/[0-9]+\.html)">' index.html
% egrep -o '<a href="(.*?\/2013/snapshots/[0-9]+\.html)">' index.html | wc -l
500

The regex is straightforward. If you have questions about it, write in the comments. The second command counts the matches; getting 500 is a good indication that the match was successful. Now I remove the clutter and build the final URLs.

% egrep -o '<a href="(.*?\/2013/snapshots/[0-9]+\.html)">' index.html | sed 's/<a href="//' | sed 's/">//' | sed 's/^/http:\/\/money.cnn.com/' > urls

The regex is the same. Afterwards I remove the HTML tags with sed, prepend the domain and redirect the results into a text file called urls. I’m pretty sure the sed part could be improved but it works and is fast.
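If you want to go the HTML parser route instead, a minimal sketch in Python could look like this. It only uses the standard library. The class and function names are made up for illustration, and it assumes the links look exactly like the two examples above.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://money.cnn.com"  # domain from the wget command above
SNAPSHOT_RE = re.compile(r"/2013/snapshots/[0-9]+\.html$")


class SnapshotLinkParser(HTMLParser):
    """Collects the href of every <a> tag that points to a snapshot page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and SNAPSHOT_RE.search(href):
            # Turn the relative link into an absolute URL.
            self.urls.append(urljoin(BASE_URL, href))


def extract_snapshot_urls(html):
    """Return all absolute snapshot URLs found in the given HTML text."""
    parser = SnapshotLinkParser()
    parser.feed(html)
    return parser.urls
```

You would read index.html into a string, pass it to extract_snapshot_urls and write the result to the urls file, one URL per line.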

Getting the websites

I always start off by looking at the pages I want to crawl to find structure. It looks like every subpage has a line like this:

Website: <a href="http://www.fedex.com" target="_blank">www.fedex.com</a>
Website: <a href="http://www.fanniemae.com" target="_blank">www.fanniemae.com</a>
Website: <a href="http://www.owenscorning.com" target="_blank">www.owenscorning.com</a>

Let’s download all the subpages and look for ourselves. Remember the urls file? I create a new directory for all the files so it doesn’t clutter my working space up and download them:

% mkdir subpages
% mv urls subpages
% cd subpages
% wget -w 1 -i urls

I limit wget to one download per second (-w 1) so that I don’t get throttled or banned. In the meantime I create the regexes to test whether the structure from above holds true and to get the company name separately:

% egrep -o 'Website: <a href="(.*?)" target="_blank">' *
% egrep -o '<title>(.*?) - Fortune 500' *

Again I counted the results and looked at them and they looked fine. I remove the clutter again and save the data.

% egrep -o 'Website: <a href="(.*?)" target="_blank">' * | sed 's/Website: <a href="//' | sed 's/" target="_blank">//' > websites
% egrep -o '<title>(.*?) - Fortune 500' * | sed 's/<title>//' | sed 's/ - Fortune 500//' > names

We need to merge these two files. I didn’t remove the file names from the grep output so that I could check that the lines got merged correctly, which they did. The final command is:

% paste -d "\t" names websites | sed -E 's/[0-9]+\.html://g' > ../merged
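paste simply glues line 1 to line 1, so it only works if both files have the same files in the same order. If you don’t want to rely on that, here is a small Python sketch that joins on the file name instead. The function names and the sample lines are made up; it assumes each line looks like grep output over several files, i.e. file name, colon, value.

```python
def parse_grep_lines(lines):
    """Turn 'file.html:value' lines into a dict of file name -> value."""
    result = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the first colon only, URLs contain colons themselves.
        filename, _, value = line.partition(":")
        result[filename] = value
    return result


def merge_by_file(name_lines, website_lines):
    """Return (name, website) pairs for files present in both inputs."""
    names = parse_grep_lines(name_lines)
    websites = parse_grep_lines(website_lines)
    # A file missing from one input is skipped instead of shifting
    # all later rows, which is what a line-by-line paste would do.
    return [(names[f], websites[f]) for f in sorted(names) if f in websites]


# Made-up sample lines, not real data from the crawl.
sample_names = ["54.html:Company A", "11719.html:Company B", "99.html:Company C"]
sample_sites = ["11719.html:http://b.example.com", "54.html:http://a.example.com"]
merged = merge_by_file(sample_names, sample_sites)
```

Here Company C has no website line, so it is dropped and the other two rows still line up.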

Getting the core values

Now, I could sit down and write a crawler that finds the appropriate pages (for example by googling) and extracts the values and all this stuff. But there’s a way which requires less effort: crowd sourcing. I personally use CrowdFlower, which is a great service, because Amazon Mechanical Turk isn’t available in my country. I can still use it by proxy through CrowdFlower.

Before I upload the file I clean it up. There were some errors in it, e.g. a comma instead of a dot in a URL. Then I enclosed each entry in quotes and escaped or replaced characters like quotes. Afterwards I replaced the tabs with commas to make it a csv and added headers.
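I did this cleanup by hand, but the quoting and tab-to-comma part can be sketched with Python’s csv module. This is not what I actually ran; the function name and the header names are assumptions, so use whatever headers your job expects.

```python
import csv
import io


def tsv_to_csv(tsv_text, header=("name", "url")):
    """Convert tab-separated 'name<TAB>url' lines into fully quoted CSV text."""
    out = io.StringIO()
    # QUOTE_ALL wraps every field in quotes and doubles embedded quotes.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(list(header))
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # skip empty lines
            writer.writerow([field.strip() for field in row])
    return out.getvalue()


# Made-up example row with a quote character inside the name.
example = tsv_to_csv('Acme "Best" Corp\thttp://www.example.com\n')
```

Typos like a comma inside a URL still need a manual look, the sketch only takes care of the escaping.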

CrowdFlower offers templates for different jobs. I just created my own. You basically just write an instruction and then create your form. I collected the URL and core values / core beliefs.

The first time I worked with CrowdFlower it took me about 60 minutes to set the task up. Now it takes about 20 minutes. You can’t expect perfect results using crowd sourcing. Some people will limit their effort, other people are extremely diligent. But even if you work with other people you can’t expect perfect results.

Thus the fun part begins where I check the data. I won’t check every detail because this is just for a blog post and not for research purposes. Also, the next time I would change the design of the tasks a bit. But it only costs me about $60 (about 12c per company) and I get the results in less than 4 hours, so I don’t really care.

My initial design was to give the workers the company’s URL and let them find the core values / core beliefs. The next time I would link to Google with ‘site: “core values”‘ and likewise with core beliefs. I found out that some companies have values that only appear in PDFs of their annual reports. I didn’t expect the workers to look there. Thus, the data will be quite incomplete. Yet, completeness wasn’t really my initial goal.

What is your goal btw?

Good to talk about that. While I wrote the blog post mentioned above I thought about how all companies basically have the same values. I expect that some values are very common (>60% of all companies have them) and that there are very few companies, if any at all, that have a unique set of values.

Data cleaning

The fun part. You can download the data directly from CrowdFlower in csv or json. I use the csv file. Trying to import it into Excel doesn’t really work because Excel doesn’t handle the multiline comments correctly. A simple solution is to use R and the xlsx package.

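library(xlsx)
dat <- read.csv("answers.csv")  # input file name is an assumption: the csv downloaded from CrowdFlower
write.xlsx(dat, "answers.xls")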

The import works fine and even the characters aren’t fucked up. To make the text more readable I change the cell format to wrap text (alignment tab) and clean up the spreadsheet a bit.

I check a few of the entries and correct them. However, I’m not aiming for the highest accuracy, just enough for a fun Sunday data project.

Now it’s time to categorize the values. There are various ways: crowd sourcing it, measuring the frequency of words to extract values and then categorizing them, using a dictionary with values, etc. I just do it by hand. It took me about 3 hours to categorize all entries. Some responses of the workers were false. I wonder if they had problems understanding what to look for or they just didn’t care. There are quite a lot missing.

Somehow, I took the time to fill in the missing ones by hand. That was quite a lot of work (about 2 hours) but I’m quite happy.
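I categorized everything by hand, but the dictionary approach mentioned above could be sketched like this in Python. The keyword list is a made-up mini example, not the 60 categories I ended up with, and matching on word stems is a simplification that will miss a lot.

```python
import re

# Hypothetical mini-dictionary: category -> word stems that signal it.
VALUE_KEYWORDS = {
    "integrity": ["integrity", "honest", "ethic"],
    "customer focus": ["customer", "client"],
    "excellence": ["excellence", "excellent"],
}


def categorize(text, keywords=VALUE_KEYWORDS):
    """Return the sorted categories whose stems start any word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    found = set()
    for category, stems in keywords.items():
        if any(word.startswith(stem) for word in words for stem in stems):
            found.add(category)
    return sorted(found)


# Made-up worker response for illustration.
categories = categorize("We act with honesty and put our customers first.")
```

You would still have to read through the results, but it would cut down the manual work for the obvious cases.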

Look at the data

Of the 500 companies I have data for 328 companies (n=328). I grouped them into 60 categories. You can download the data here: data.csv. It is a bit messed up (e.g. I somehow set at least one wrong x, because no company is marked with diligence as a value although there is one).

These are the most used values. Over half of the companies state integrity as a value. Customer focus is quite strong, and so is excellence (32%). This is what I expected. Interestingly, only 2 companies stated effectiveness and 8 efficiency. However, a lot of companies talked about hard work. I’m personally more on the side of smart work but I’m not surprised.
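I did the counting in the spreadsheet, but for the record here is a minimal Python sketch of the two questions I started with: how common is each value, and does any company have a unique set? It assumes the data is a mapping from company to its list of values, and that “unique” means no other company states exactly the same set. The function names and the toy data are made up and not taken from data.csv.

```python
from collections import Counter


def value_shares(company_values):
    """Map each value to the share of companies stating it, most common first."""
    total = len(company_values)
    if total == 0:
        return {}
    counts = Counter()
    for values in company_values.values():
        counts.update(set(values))  # count a value once per company
    return {value: count / total for value, count in counts.most_common()}


def unique_value_sets(company_values):
    """Return companies whose exact set of values no other company shares."""
    signatures = {name: frozenset(vals) for name, vals in company_values.items()}
    seen = Counter(signatures.values())
    return sorted(name for name, sig in signatures.items() if seen[sig] == 1)


# Toy data for illustration only.
toy = {
    "Company A": ["integrity", "excellence"],
    "Company B": ["excellence", "integrity"],
    "Company C": ["integrity", "customer focus"],
    "Company D": ["honor"],
}
shares = value_shares(toy)
unique = unique_value_sets(toy)
```

In the toy data, A and B share the same set, so only C and D count as unique.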

Some of the lesser stated values were honor, objectivity and authenticity. Also, there wasn’t a company with a unique set of values. The data wasn’t that interesting. It could be interesting if you compare stated and lived values. Yet, I’m happy that I’m done. I started today at 9 a.m. and now I’m finished at 11 p.m. I relaxed a few hours but that was basically my project for today. Quite an effort for my initial question.

Assumptions are dangerous

I assume you will love this post so much that I will swim in gold. Thus, I will buy a boat.

Intentional vs. unintentional assumptions

Assumptions are necessary. You can’t live without them because you don’t have perfect information about the world. There is always some factor which is undiscovered or some uncertainty. Nothing special. You already knew this.

Yet there are basically two types of assumptions. The first one is the good one, the one you make explicitly. For example, someone asks you how many barbers there are in your state. You start by making explicit assumptions like “I go to a barber about once a month, therefore ….”. These types of assumptions could also be called hypotheses. You can formulate testable statements. In the barber example, you may assume that because you go once a month, so does everybody on average. The hypothesis is then “Everybody goes to a barber once a month on average”. This is testable and you can reject it. You can take a survey or look up statistics or just sit in front of barbershops all day counting the people.
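To show how explicit assumptions turn into a number, here is the barber estimate as a tiny Python sketch. Every input is an assumption you could test and replace; the figures in the example call are invented and don’t describe any real state.

```python
def estimate_barbers(population, visits_per_person_per_month,
                     cuts_per_barber_per_day, working_days_per_month):
    """Estimate the number of barbers from explicit, testable assumptions."""
    monthly_demand = population * visits_per_person_per_month
    monthly_capacity_per_barber = cuts_per_barber_per_day * working_days_per_month
    return monthly_demand / monthly_capacity_per_barber


# Invented example figures: 1 million people, one visit per month,
# 10 haircuts per barber per day, 20 working days per month.
estimate = estimate_barbers(1_000_000, 1, 10, 20)
```

If a survey shows that people only go every second month, you change one input and the estimate halves. That is the nice thing about explicit assumptions.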

The bad kinds are the unintentional assumptions. These can be abstractions you hold in your unconscious mind. For example, stuff you experienced in your childhood or just made up. Or they can be prejudices. For example, you assume that your fellow students in college won’t like you because your current classmates don’t like you. This can be dangerous. In every part of your life you can face these assumptions. They become especially dangerous if you don’t adjust them or don’t even think about them. This is a big part of cognitive psychology. You have to make your assumptions concrete to correct them.

Assumptions which aren’t necessary

Some assumptions are made although they aren’t necessary. Most often they come from fear. “I will be seen as dumb if I ask this” – that’s also an assumption. These can range from not asking someone to repeat something you didn’t understand to life-threatening events. If a doctor assumes that you don’t have allergies and injects drugs which you are allergic to, you have a good chance of dying. I would call these kinds of assumptions “Just ask”. It’s such a great and easy way to deal with these types of assumptions. They are everywhere. The girl who would reject you. Just ask. The company doesn’t need me. Just ask. My wife hates me for golfing. Just ask. It’s so easy to destroy these assumptions. Which is the first one you destroyed?

Assumptions can ruin everything

I talked a bit about this in the previous paragraphs. They can ruin everything. One of the most important steps is becoming aware of your assumptions. Then you can start thinking rationally about them and reject them if they are wrong or adjust them. Imagine how much better your life would be if you asked and adjusted your assumptions instead of hiding behind them. You might live with the love of your life, have your dream job and have the greatest friends possible. Not very likely, I admit. Still, you can improve your (mental) life a lot if you start to ask instead of just assuming.

Assumptions and self-fulfilling prophecies

This step follows nicely. Your assumptions are your core beliefs about something. People also tend to have a confirmation bias, i.e. they put more weight on things that strengthen their existing beliefs. If you assume that your neighbor’s child must be “bad” (whatever that means) because your neighbor is an asshole, you will start to see more bad than good behavior in the child. Gradually, your assumptions will get stronger and your prophecy will fulfill itself in your perception. If you are somebody who has a lot of power over somebody else (parents, teachers, bosses, etc.) your assumptions will influence your behavior, which will in turn influence the behavior of your protégé. There were multiple studies where teachers were given wrong information about the ability of their students. After some time, the students developed towards this wrong information. The assumptions teachers made became self-fulfilling prophecies.

Job search sucks

Holy cow, job search is inefficient. I read about a guy, a CIO, who searched for over 18 months to find his new job. Terrible.

How does the process look from a higher level?

People are looking for employment and companies are looking for means to get a job done. Now they just hook up together and everything’s fine, right? Not that easy.

Problem: Companies don’t know what they actually need

There are some (semi-)standardized jobs, which mostly no longer exist, where the need becomes quite clear. For most jobs, however, there’s a gap between the problems a company faces and possible solutions. Job titles don’t mean a lot, so there’s no guarantee that a software dev at company A is capable of solving the same problems as a software dev at company B. Then you start to limit the position further using stuff that’s easy to measure. If something is easy to measure and you use it because it’s easy to measure then please don’t do anything. Just no, please stop. Your measurement has to be indicative to a high degree. My favorite examples are in the IT field, mainly because this is where I have the most experience. There are people who have done online marketing for 7 years who suck. Really suck. A devoted intelligent person can outperform these people in less than 8 months. You say, the first one has seen a lot? Great job, knowing how to do SEO for AltaVista.

What do they need?

I don’t know and they probably don’t know either. There are two problems here. First, if you don’t have expertise in the field where you are looking for new employees or other means to get the job done, you generally don’t know what you need. “Hey, let’s hire a web designer because we need a new website because the old one is so slow.” – “What? I didn’t know we got 24 bazillion page views per day. You don’t know how to scale our website? Aren’t you a web designer?”

Second problem. You have to make trade-offs. Either you look for the perfect fit for the current job OR you look for someone a bit more general because you don’t know which problems will appear tomorrow. Also, this doesn’t mean that someone can’t learn. Rather, learning is also a trade-off for both types of people.

What did you learn? Companies don’t know who they need and don’t know how to spot them. Great.

Problem: Job searchers don’t know what they can offer

So you worked 15 years in finance, beginning as an analyst and working your way up to master of the universe. You fill up your CV with all your work experience in finance, so you’ve got to be awesome at it. I don’t know. Maybe you’re just mediocre and you don’t really like the job but you would be awesome as a gardener. Who knows? You probably don’t know if you are good at something if you never tried it. And secondly, even if you tried it you have to invest some time learning to find out if you fit a job or not. Going to the gym once and not squatting 300lb doesn’t mean that you can’t be a great weightlifter. You got to train. Same applies to every endeavor: programming, maths, playing instruments, other sports, accounting, law, design, etc.

Face it. You may have talents you will never discover / you don’t know about. However, the more you think about the stuff you enjoy and you’re good at, and the more stuff you try, the more likely you will discover your talents.

Also, the whole need thing. Same applies for both parties. You have incomplete information of your needs and wishes and you can only discover them by experiencing. Surprise.

Problem: The application process

Hurray. You are looking for jobs. Have fun with coming up with synonyms. You may be a great UX designer. But you can’t just look up UX jobs, no. You have to look up user experience, maybe usability design, maybe just web design or screen design or app design or ….

Fun thing. If people don’t know the lingo (see companies don’t know what they need) there may be the most awesome job you could ever have, but you never found it because they just wrote they needed a webmaster.

Secondly. Those freaking filters. You can filter by experience, position, title and maybe salary. Great fucking job if there are 59 million results. How about filtering stuff people may care about, like team size, company size, values, management style, etc. Recently, I read a study that people care more about the environment they work in than the actual job. Funny, heh?

Now that you have rummaged through 9452 job ads, of which about 6k were duplicates, you have limited your choice to about 90 companies. Have fun looking up each company in detail. Because they’re gonna ask you why you want to work for their company. Pro tip: Most companies are pretty mediocre and don’t stand out. People most often don’t care. They just want a nice work experience and for graduates, they just want a fucking job.

Cool, you are ready to write the application for about 30 jobs. Of course, you got to individualize them. Not for the position, no. For the company, so that they can feel good. Also, use our buggy job portal because we don’t care, we just want to use some HRM software so that we don’t ever have to look you in your eyes.

You made it. You got invited to an assessment center. Not just one, 9 of them. You got to take 2 days off, travel to some company and fight against your fellow applicants to get the job. In the first presentation, they talk to you about the vision and mission statement. About their values: fairness, equality, excellence – you know, the values everybody has but doesn’t live up to (that gave me a nice idea for an upcoming blog post). And afterwards they observe you like vultures to rationalize the decision they made in the first 5 minutes of meeting you. Then you fight against the others and nope, you didn’t get this job. Great stuff, you are going to repeat the same thing about 8 more times.

If you got lucky and don’t have to go to an assessment center then you are ready to go into a job interview. Great thing that you got 20 interviews lined up because you are going to repeat about 80 – 90% of the stuff you said. You can really start to get a routine doing it. What a waste of time.

Yeah, job search sucks. But I don’t want to let you hang there. Enjoy a useful educational video:

Great Job.