
My Way to PHP: Day 9 of 75

Day 9. I just published yesterday’s article and looked back at my first post. It was only about a week ago, but it feels like months. I’ve learned so much in the past few days; it’s absolutely great.

In Mastering the SPL Library I’m on page 115, which is the start of the chapter about interfaces. My goal is 50 pages, i.e. page 165, which is the end of the chapter about exceptions.

I really like the book so far. My biggest criticism is that the source code is printed in gray. If it were just black I would be fine, and if it were syntax-highlighted it would be great. Strange that phparch didn’t do that.


  • IS-A = interfaces
  • HAS-A = abstract class

Traversable: Allows the use of foreach(). However, you can’t implement it yourself; implement Iterator or IteratorAggregate instead.

IteratorAggregate: Returns an iterator for a collection the object holds. E.g. a school has students, so you iterate over them.

ArrayAccess: Just as the name suggests, it provides array access to your object, i.e. you can get and set elements by their index.

Serializable: With this you no longer use __sleep() and __wakeup()!

SeekableIterator: Gives you random access via seek().


In the intro of the chapter about exceptions there’s a beautiful graph showing all exceptions and how they inherit from each other. Nice!

There are two main classes: LogicException (‘compile’ time) and RuntimeException (runtime). It’s usually quite clear which of the two you have to look at.

  • BadMethodCallException: use this if you do some __call() magic
  • DomainException: when something violates a rule of the domain (business rule)
  • InvalidArgumentException
  • RangeException: when something is out of range

I think these will be the most used ones. However, I’m going to check GitHub and look for results. I searched for “new language: PHP”.



Ok, last chapter, which is called Miscellaneous Functionality.

  • iterator_apply() similar to array_walk() but for iterators (or anything that is a traversable)
  • iterator_to_array()
  • spl_object_hash() returns a 32-char hash. The problem is that the hash changes from process to process. That’s why serializing an object and unserializing it doesn’t give you back the same object!

There is also an implementation of arrays as objects. The idea is that you can work with references more easily because copy-on-write doesn’t kick in.


This was a really good book! I’m quite relieved that after the fiasco with the other book, this one shone. It’s short, not fluffy, goes into depth, and you can see that the author took his time writing it.


Reading-wise I want to read one last book on the internals of PHP. It’s called PHP Internals Book and it’s about the internals of PHP! I don’t know how many pages it has, so I will count chapters instead, of which there are 5!

After reading that book I’ll be satisfied with my knowledge of the intricacies of PHP.

The first chapter is about building PHP; I will mostly skim it.

  • Extensions removed from the main PHP distribution will land in PECL
  • You can use pecl to install extensions (dynamically linked) without recompiling the PHP binary: pecl install $extension

Ok, the build chapter is over and now the chapter about zvals starts.

Strings are implemented as a length plus a pointer. The problem is that the length is in bytes, which leads to problems with Unicode. Here’s an example and the correct way to do it:

echo strlen("♞"); // 3
echo mb_strlen("♞"); // 1
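At the C level, the difference between the two calls above comes down to counting bytes versus counting code points. The following is only a rough sketch of that idea, not how PHP implements mb_strlen(): utf8_codepoints is a made-up helper name, and it assumes the input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PHP): counts UTF-8 code points in a
 * buffer of `len` bytes. Continuation bytes have the bit pattern 10xxxxxx,
 * so every byte that does NOT match that pattern starts a new code point.
 * Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; ++i) {
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            ++count;
        }
    }

    return count;
}
```

The knight "♞" is U+265E, which UTF-8 encodes as the three bytes E2 99 9E. A byte-based length such as C’s strlen() therefore reports 3, while the code point count is 1.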

IMHO, PHP should slowly default to the mb_* functions. If you’re developing new code, it’s probably a good idea to work with Unicode from the ground up.


The internal reference count of a zval (refcount__gc) tells you how many other things (variables, arrays, functions, etc.) are currently using the value.

In the case of PHP references (&$foo), the attribute is_ref__gc is set to 1, which explicitly tells the VM not to copy on write.
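To make the interplay of the reference count and the reference flag more concrete, here is a heavily simplified sketch. toy_zval and toy_prepare_write are invented names for illustration; the real zval holds far more than a long, and the engine’s actual separation logic lives in macros that are not reproduced here.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a zval: only the two bookkeeping
 * fields discussed above plus a plain long as the value. */
typedef struct toy_zval {
    long value;
    unsigned int refcount;  /* how many users share this value */
    unsigned char is_ref;   /* 1 if bound with &, 0 otherwise */
} toy_zval;

/* Returns the zval a writer is allowed to modify.
 * - A reference (is_ref == 1) is written in place, so all users see it.
 * - A value with a single user is written in place as well.
 * - Otherwise the value is shared: copy it first (copy on write).
 * Returns NULL if the copy cannot be allocated. */
toy_zval *toy_prepare_write(toy_zval *z)
{
    if (z->is_ref || z->refcount == 1) {
        return z;
    }

    toy_zval *copy = malloc(sizeof *copy);
    if (copy == NULL) {
        return NULL;
    }

    copy->value = z->value;
    copy->refcount = 1;
    copy->is_ref = 0;
    z->refcount--;  /* the writer no longer uses the shared value */

    return copy;
}
```

With a shared, non-reference value (refcount 2) the writer gets a private copy and the original’s count drops to 1. With is_ref set, the same call hands back the original, so the write is visible to everyone.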


Bottleneck Analysis

User perception = reality!

He tells the story of Apple, who use screenshots to “restore” the screen after sleep, so the user quickly sees something. I like the idea. For example, it would make sense for a website to first deliver some cached elements of the page, so that the user sees something quickly, and then inject the more dynamic content.

Start off with checking the user time

You can use the typical network traffic analysis tools in Firefox, Chrome, etc. This gives you a good idea of where the problems are.


There’s a tool called Boomerang which measures the speed of your site as experienced by the end user. It sends the measurements back as a request to a URL on your server, so you can extract the data from your logs.


The next part is about web server optimization: Compression, buffers, network, i/o, etc. You can test all that with a static file.

For PHP use OPcache and FastCGI.

XHProf can be used in production with sampling, and it aggregates the data, which is pretty nice.

Errors can take a lot of time (thanks to writing to the log). You should write error-free code anyway, but in bigger applications errors even become a performance problem.

A good idea is to turn all errors into fatal ones. That way they get fixed pretty fast.

Log slow queries in your database


Again, a talk in German. The title is Code Reviews – Leave your ego at the door. The slides are in English though!

  • Editing is standard in publishing: the writers are good, but they know that they make errors
  • Code quality is about attitude. Be open to criticism and your quality will improve / you will learn
  • Reviews help sharing knowledge
  • No blame culture
  • Try to find problems not solutions
  • Review the critical code

Different types:

  • Adhoc: 5min problem solving
  • Peer desk check: Asynchronous and works for lots of code, some do 1 hour per day
  • Pair programming: Review on the fly
  • Walkthrough: Explains code, others ask questions, educational
  • Code Reading: some other person explains your code, you answer questions (this is good source material for comments)

The Architecture of Stack Overflow:

  • 560 million page views per month handled by 25 servers, developed by 5 devs

How is that possible?

  • Very simple system (around 1/10 of the average project)
  • The newest version is beta-tested on meta.stackoverflow.com by the users
  • Heavily cached (CDN/browser, server level, Redis, 384 GB RAM with the whole DB cached in memory, SSDs)
  • Solve most problems at compile time
  • Have a great team

Something I have to say: I had around 240 talks queued up. I normally watch the first 10% and then decide whether to watch the rest. There seems to be a strong correlation between presenters, topics and community. I enjoyed the talks from the IT security community for a long time, and I never would have thought that there were such good talks in the PHP community. Kudos!


This talk is called “How Instagram.com works”.

It was built as a Single Page App (SPA).

The biggest problem is poor load performance. Even gzipped, the complete JS for instagram.com is around 2.5 MB. To get around that, they load only the minimal JS required for each page (entry point).

They use a module system and bundle the modules in an optimized way, so that each file can be cached and reused! There’s also a front controller which loads these bundles asynchronously.


The next talk is called: Surviving a Prime Time TV Commercial

  • Expected visitors: up to 10k visitors within a minute or two
  • They used an ecommerce solution but rewrote the directly user-facing stuff.
  • Front-page, category pages, search etc. were rewritten
  • They hosted everything in the cloud (EC2)
  • Used Symfony as their framework – everything is bundled and decoupled
  • They stored all their data into Elasticsearch
  • Clever solution: as soon as you put something into your shopping cart (or log in) you switch to HTTPS and therefore to the old system. Otherwise you just browse over HTTP on the new system
  • For loading they set a marker cookie if someone put something into the shopping cart; in that case they included the dynamic cart info, otherwise just the cached version (ESI)
  • Outsource the tedious stuff: CDN, mail servers, hosting

Jim Coplien and Bob Martin Debate TDD

Wow, I’m surprised but Bob Martin actually agreed that good architecture doesn’t just emerge if you did enough TDD.

I also like that Jim says that you don’t start without knowledge if you implement something – especially if you implemented that before.

I really liked the format.


I saw this talk a few days ago but never wrote up my notes on it:

It’s called The Scams That Derail Programming, Motherfucker by Zed Shaw.

Generally, I like the talk because there seems to be very little criticism of the consulting talk within the community. On the other hand, it’s clear that he has his own agenda to push.

There’s the saying: If you want the truth look at what the opposition has to say. I think it’s similar here.

Nonetheless, I can recommend watching it. There’s also a newer recording, although it is a bit more self-censored.


You know, thinking about it: I don’t do much PM today, but if I did, I would look into the papers that research methods in software. I’m pretty sure that there are hundreds of papers about agile, TDD, software quality, etc.

On the other hand, I still hold the opinion, from what I’ve seen, that the individuals are more important than anything else (language, methodology, number of computer screens, etc.).


I’ve written about coding bootcamps / dojos / etc. before and my opinion is that they basically recreate something that failed 15, 25, and 35 years ago.

Now I stumbled upon this post called About Coding House. It’s a long read but it’s absolutely insane.

It’s about a coding FOO called Coding House. People actually paid around $10-$15k to learn coding and get a job within a few months. This alone is insane (like I said), but the practices here are absolutely disastrous.

I won’t go into detail – read the post instead – but it was a machine to deceive.

This makes me fucking angry. I looked at some of the pictures he posted and projects of other people who paid a fuckload of money.

There’s so much wrong with all of that. Jesus Christ.

“The guys of WhalePath interviewed us and were we humiliated since none of the students could answer the JavaScript questions. I remember the CTO telling me after the technical interview that we should ask for our money back.”

Jesus.


To calm down a bit I’m going back to the book!

This chapter is about the implementation of hashtables in PHP.

I’ve written about that before, but PHP handles hash collisions with linked lists. Actually, they are doubly linked lists. There’s also an additional linked list which keeps track of the insertion order of the elements in the hash table.

Here’s what the bucket looks like:

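typedef struct bucket {
    ulong h;                   /* hash of the key, or the key itself for integer keys */
    uint nKeyLength;           /* length of the string key, 0 for integer keys */
    void *pData;               /* pointer to the stored value */
    void *pDataPtr;            /* holds the value itself if it is just a pointer */
    struct bucket *pListNext;  /* next element in insertion order */
    struct bucket *pListLast;  /* previous element in insertion order */
    struct bucket *pNext;      /* next bucket in the same collision chain */
    struct bucket *pLast;      /* previous bucket in the same collision chain */
    char *arKey;               /* string key, NULL for integer keys */
} Bucket;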

What’s interesting is that h holds the key itself if it’s an int; for string keys, h holds the hash and the key is in arKey.

There’s no implicit conversion at this level, which means that a hash table can have both 23 and "23" as keys.

Hash tables also don’t shrink! If they expand, however, they double in size and rehash every time they do. The minimum hash table size is 8, which should work pretty well for the average PHP application.

Also every value is copied into the hashtable!


Neat, so here’s the hashing algorithm which PHP uses called DJBX33A:

static inline ulong zend_inline_hash_func(const char *arKey, uint nKeyLength)
{
    register ulong hash = 5381;

    for (uint i = 0; i < nKeyLength; ++i) {
        hash = hash * 33 + arKey[i];
    }

    return hash;
}

It’s very easy to understand so I won’t comment on it. However, there’s a neat way to generate keys that collide.

The first one is with integer keys. Normally, you create your hash and then apply a bit mask so that it fits into the hash table. The starting size is 8 and integers don’t get hashed, which means that keys which are multiples of 8 all end up in the same slot.
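The masking step can be sketched in a few lines. This is only an illustration under the assumptions stated in the text (table size is a power of two, integer keys are used unhashed); slot_for_int_key and colliding_int_keys are made-up names, not Zend functions.

```c
#include <stddef.h>

/* Slot for an integer key: the key is used as the hash and then masked.
 * Assumes table_size is a power of two, so table_size - 1 is a bit mask. */
unsigned long slot_for_int_key(unsigned long key, unsigned long table_size)
{
    return key & (table_size - 1);
}

/* Hypothetical helper: fills `out` with `n` integer keys that all land in
 * slot 0 of a table with `table_size` slots (0, table_size, 2 * table_size, ...). */
void colliding_int_keys(unsigned long table_size, unsigned long *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = (unsigned long)i * table_size;
    }
}
```

For a table of size 8, the keys 0, 8, 16 and 24 all map to slot 0. Note that the mask changes when the table doubles: in a table of size 16, key 8 maps to slot 8 again, so keys meant to keep colliding would have to be multiples of the size the table eventually reaches.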

For string keys it’s a bit more complicated. Look at the algorithm again.

If we have a nKeyLength of 1 (which would mean that the loop just runs once), we get:

final_hash = 5381 * 33 + arKey[0];

That means that we would only get a collision if we used the same key. So we need a key length of at least two!

Then we get:

final_hash = (5381 * 33 + arKey[0]) * 33 + arKey[1]

Let’s make that more readable and I will introduce two possible keys A and B with length 2. We want them to be equal.

(5381 * 33 + A[0]) * 33 + A[1] = (5381 * 33 + B[0]) * 33 + B[1]

This can be simplified of course:

33 * A[0] + A[1] = 33 * B[0] + B[1]

And you can quickly see: if we increment the character that gets multiplied by 33 by X, we have to decrement the other character by X * 33. If we take X = 1, we just increment the first character by one and decrement the second by 33. And that’s it!
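The X = 1 recipe is easy to check in code. The sketch below repeats the DJBX33A loop from the listing above with plain C types; colliding_pair is a made-up helper that applies the "plus one, minus 33" step to a two-character key and is restricted to 7-bit ASCII for simplicity.

```c
#include <stddef.h>

/* Same DJBX33A loop as in the listing above, using plain C types. */
unsigned long djbx33a(const char *key, size_t len)
{
    unsigned long hash = 5381;

    for (size_t i = 0; i < len; ++i) {
        hash = hash * 33 + key[i];
    }

    return hash;
}

/* Hypothetical helper: derives a second two-character key with the same
 * hash by incrementing the first character by 1 and decrementing the
 * second by 33 (the X = 1 case). Returns 0 on success, or -1 if the
 * result would leave the 7-bit ASCII range. */
int colliding_pair(const char in[2], char out[2])
{
    if (in[0] < 0 || in[0] >= 127 || in[1] < 33) {
        return -1;
    }

    out[0] = (char)(in[0] + 1);
    out[1] = (char)(in[1] - 33);

    return 0;
}
```

Starting from "Ez" this yields "FY", and both hash to the same value. Because the hash is built up character by character, such pairs can also be concatenated: "EzFY" and "FYEz" collide as well, which is how larger sets of colliding keys can be built from two-character blocks.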


Doing Behavior-Driven Development (BDD) with Behat

The waterfall model was initially created for smaller projects. They should be bigger than one-man projects but take no longer than one year. Once again, people didn’t read the source material carefully.

Tests in TDD originally weren’t tests but rather examples for an existing feature. Seen that way, TDD actually makes a ton of sense!

Dan North called this approach BDD.

Instead of writing tests, you write examples of how the system will work. And instead of refactoring you do design which makes sense.

A lot more sense. The focus is on examples. You talk with the client about examples of how the system should work, and you work internally with examples of how to implement that system.

This makes so much sense. There are different kinds of tests: the Behat story type, which is in the language of the customer, and the developer type, which is unit tests, also in their own language!

BDD is when you use examples in conversations to illustrate behavior.


  • 45% of the features are never used by anyone
  • 13% are often used
  • 7% are used always

He talks about talking with the customer. I can only agree; I’ve been doing the same for a few years now.

I repeat myself but I still can say that finding the root cause is one of the most important things you can do as a professional.


For each example ask:

  • Why would anyone want this feature?
  • Who benefits from it the most?
  • What do they need in order to benefit?

Take all examples and let the stakeholder prioritize them.

Now you can get into individual features and start the example process again. Communicate and write.

Scenario:

Given some state
When something happens
Then some result

Useful questions:

  • Is that the only result?
  • Is there a state in which that doesn’t happen?

The language of the customer is the language of your code. Your scenarios written in your customer’s language give hints for classes and methods. The best thing is that your customer can help you to solve business problems.

Really excellent talk! Recommended!


And enough for today.


Updated Goals:

  • Get an overview of the standard library (SPL)
  • Learn the intricacies: how does the interpreter work, the OOP system, etc.
  • Learn about PHPUnit
  • Learn a bit more about legacy systems and how to handle them
  • Learn a bit more about MySQL
  • Learn Symfony2
  • Write at least one web app using Symfony2 and its core components (templating, testing, forms, validation, security, caching, i18n)
  • Get a bit more exposure to OOP and OOD
  • Watch one video per day on average

Progress status

Done

  • Mastering the SPL [done]

In Progress

  • Reading PHP Internals Book [4 of 5 chapters]
  • Watch one video per day on average [21 of 75]
