A few months ago I wondered how BitTorrent deals with resource allocation. I didn’t find so much and I thought about that myself. I think that is just a market. But let’s start at the beginning.
What is BitTorrent?
BitTorrent is a Peer-to-Peer protocol for file sharing. File sharing means down- and uploading files and peer-to-peer means that you don’t connect to a (dedicated) server but to some other person in the network.
However, it neglects one important actor in BitTorrent and that is trackers. If you want to read the BitTorrent specs.
What about trackers?
morehawes.co.uk created an excellent simple overview about BitTorrent and especially trackers. I also took this illustration from there:I quote from the site:
its main role is allow peers to ‘find each other’ and start communication, i.e. to find peers with the data they require. Peers know nothing of each other until a response is received from the tracker. Whenever a peer contacts the tracker, it reports which pieces of a file they have. That way, when another peer queries the tracker, it can provide a random list of peers who are participating in the torrent, and have the required piece.
Which information is available?
Generally we know which pieces (data) a client has and which he needs. In economic terms we have supply and demand.
How is this handled?
There are two main methods used currently. The client requests a random piece or the rarest piece.
Using market mechanism to allocate resources
At the first sight it looks like a market. On the second it doesn’t because you don’t lose your goods if you give it to another (upload one piece). On the third look it looks like a market again because the same applies to digital goods.
So, we have actors in our market and goods (pieces of data). We know that the first actor alone on the market has all data.
What is the objective?
This is a really important question to ask. What do we want? The great thing is that we work with machines (clients) instead of people which means we can build a mechanism which assumes rationality to an extremely high level.
I think the most important objective is distribution. We want a file as quickly distributed and available as possible. So we have two measurements of success:
- How many actors (client) do have the complete file after some time period?
- How much data was transferred after some time period?
Why do we need two? The first one helps us to measure the main objective, i.e. file distribution. The second one helps us to get rid of some artificial time period. Let’s say we have an allocation mechanism which allocates 99% of the data as fast as possible but waits until every client has 99%. In this case it would suck on measurement one but rock on measurement two. You can imagine the other way around.
Prices are subjective. Clients aren’t. So we have to build some which are subjective to some degree. Most interesting is the subjectivity in relation to needs.
For example, if there are 2000 clients at 95% in the network only you and the original uploader have a piece of data that is missing – all the other missing pieces are already distributed evenly. I think it’s pretty clear that they need this piece from you or the original uploader more than the other pieces.
This is our first factor which I call distribution level.
The other factor becomes evidently if we look at our measurements. We want speed. Again, it’s better that a client which has a high upload speed to get an important piece than one with a low upload speed. I name our second factor speed level.
Problem: No information about speed
Yeah, we don’t know the speed level and even if we knew how fast the internet connect is we don’t know how much a client can send. We have two options at this point:
- Throw away the speed level and work without it
- Try to estimate it
Let’s talk about the second option. We know which pieces a client requests, which ones he has, how much he uploaded and downloaded. We can estimate the network speed level. We could try to maximize this with experiments. We could work with assumptions (e.g. avg upload speed ~ 1/8 avg download speed) etc. Because this is more a thought experiment than a real implementation, I will just make speed levels up.
I will stop here and continue this post tomorrow or so.