BitTorrent: Building a BitTorrent client from scratch in C#
BitTorrent
BitTorrent is a protocol for peer-to-peer file sharing. It allows users to share files directly with each other across the internet without any central server acting as a middleman.
In order to do this, the files are divided into small, regular-sized pieces. Each client, or peer, in the network can then either request a piece (if it is missing that piece) or send a piece (if another peer requests it). Peers can send and receive pieces simultaneously from multiple other peers until all peers have the complete file. A peer is called a seeder if it has pieces available to send out and a leecher if it is still requesting pieces.
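To make the piece arithmetic concrete, here is a minimal sketch of how a file length maps onto pieces. The class and method names (PieceMath, PieceCount, PieceSize) are hypothetical and not taken from the project's code. The sample numbers come from the test torrent shown later in this article, which has a length of 59616 bytes and a piece length of 32768 bytes.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the project's code.
public static class PieceMath
{
    // Number of pieces needed to cover totalLength bytes.
    // Every piece is pieceLength bytes except possibly the last one.
    public static int PieceCount(long totalLength, int pieceLength)
    {
        if (totalLength < 0) throw new ArgumentOutOfRangeException(nameof(totalLength));
        if (pieceLength <= 0) throw new ArgumentOutOfRangeException(nameof(pieceLength));

        // Integer ceiling division.
        return (int)((totalLength + pieceLength - 1) / pieceLength);
    }

    // Size in bytes of the piece at the given zero-based index.
    public static int PieceSize(long totalLength, int pieceLength, int index)
    {
        int count = PieceCount(totalLength, pieceLength);
        if (index < 0 || index >= count) throw new ArgumentOutOfRangeException(nameof(index));

        // Bytes left in the file from the start of this piece onwards.
        long remaining = totalLength - (long)index * pieceLength;
        return (int)Math.Min(pieceLength, remaining);
    }
}
```

For the sample values, PieceMath.PieceCount(59616, 32768) gives 2 pieces: the first is a full 32768 bytes and the second holds the remaining 26848 bytes.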
The lack of a central server means that the bandwidth cost of sharing content is reduced for the originator. Initially there will be a single seeder; however, once other peers obtain the files they become seeders too. The protocol tends to favour more popular content: the more peers that want a file, the more peers there will be that have the file to share. Supply scales with demand. In this regard it is also a more resilient method, as the network has no single point of failure once there are multiple seeders.
Unpopular content can be difficult or slow to download if there are only a handful of seeders. Small files can be slower to download than from a traditional server, as there is a certain amount of time overhead in finding peers. The lack of a central server can also lead to a situation where all of the peers in the network are almost complete but are all missing the same piece (although this should be rare due to the algorithms used to select pieces to request).
History
Peer-to-peer networks became mainstream with the creation of Napster in 1999 by Shawn Fanning. Napster maintained a centralised index of selected files on each user's computer and then allowed users to search for and download files directly from each other.
The BitTorrent protocol was created by Bram Cohen in 2001 and made various improvements on the Napster system. It removed the single centralised index of files and replaced it with indexes that could be hosted by anyone (called trackers). It also broke the files into pieces that could each be downloaded from different clients rather than downloading the entire file from a single peer (the combined pieces are verified using hashes). Later improvements in the protocol have removed the need for trackers completely.
Like Napster before it, BitTorrent was heavily used for illegal file sharing. Both have negative legal connotations. Unlike Napster however, the lack of a centralized authority has made it much more resilient to being shut down.
Today, the protocol is still used for file sharing among regular users, but also for content delivery (various open source software projects as well as games) and internal distribution of new code to servers (Facebook and Twitter). Other popular systems that make use of peer-to-peer networks are various cryptocurrencies (Bitcoin, Ethereum) and decentralized marketplaces (OpenBazaar).
Components
The original specification outlined a number of components:
Torrent file: This is a small, simple file that contains basic metadata about either a single file or a group of files that are included in the torrent. It specifies how the file should be broken up into pieces as well as which trackers the torrent is being tracked on.
Tracker: This is a centralized server that maintains a list of torrents with a corresponding list of peers for each one. The most famous example is The Pirate Bay.
Client: This is the program that can create or open existing torrent files. It connects to the specified trackers and starts either sending or receiving parts of the file as required. Some examples are Vuze, Transmission, uTorrent and Deluge.
Recent additions to the protocol mean that neither torrent files nor trackers are necessary anymore, resulting in a complete removal of any centralization.
This Project
Code
You can get the full code for this project on my GitHub.
Scope
The aim of this project was to gain a better understanding of the technical details of the BitTorrent protocol. In terms of the depth of the research, I like to get to the point where I have a good idea of what I don't know. BitTorrent is actually a great topic as it covers a lot of different areas: HTTP, TCP, custom encodings, cryptographic hashing, file IO and (optionally) multi-threading. I used C# because it's the language I'm most familiar with. I essentially built version 1.0 of the protocol; however, further additions are necessary before it could be tested in the real world (see Further Research).
References
You can get the official specification, but there are also some other references with more detail.
Tools
I used the following:
Xamarin Studio as my C# IDE
Deluge as a BitTorrent client
OpenTracker as a BitTorrent tracker (official website or on GitHub)
Ubuntu running on two separate VirtualBox instances for testing
Testing
Before we go digging around, let's do a test run with the existing software to make sure everything is working correctly. First let's create a new torrent file using our BitTorrent client. To use as a test file, I grabbed a few paragraphs of lorem ipsum and threw them into a text file. When adding a tracker to your torrent, make sure you use the IP address of the machine you're going to be running the tracker on. The default port for trackers is 6969.
My testing setup is shown below. I have a Terminal open ready to run the tracker on OS X and I have two VirtualBox machines running with Transmission open in both. Only one of the boxes has the actual underlying file we want to share. One other thing to note is that I had to set both network adapters on the virtual machines to Bridged mode (VirtualBox VM -> Machine -> Settings... -> Network -> Attached To).
Next, add the torrent to each of the clients in our two virtual machines. You should be able to see that one is trying to seed the file and the other is trying to download the file.
Finally, pause both of the clients and then start the tracker in the Terminal:
    ./opentracker
The program doesn't require any arguments and starts running at http://localhost:6969/ by default. You can open http://localhost:6969/stats?mode=everything in your browser to double check it's running if you need to (it'll just spit out some XML output after a few seconds).
You'll probably need to pause and restart each of the clients after starting the tracker. After a few seconds the file should have been successfully copied to the second virtual machine.
Great! Now that we know everything is working, we can go back to the start and try to recreate some of this.
BEncoding
Let's open up the torrent file we created earlier in a text editor:
d8:announce33:http://192.168.1.74:6969/announce7:comment17:Comment goes here10:created by25:Transmission/2.92 (14714)13:creation datei1460444420e8:encoding5:UTF-84:infod6:lengthi59616e4:name9:lorem.txt12:piece lengthi32768e6:pieces40:L@fR3K*Ez>_YS86"&p<6C{9G7:privatei0eee
It's pretty ugly in there. It's clearly encoded in some unusual format. If we take a look at the spec we can see it uses a custom encoding system called BEncoding. Fortunately, it's pretty straightforward. There are only four types that can be encoded.
Strings : 8:announceThey start with the their length followed by : , followed by the string. They use the term "string" quite loosely here in some cases they are UTF-8 encoded strings while in other cases they are raw byte arrays (SHA1 hashes). We will store these as byte[] s because there is no way of knowning beforehand whether it's a UTF-8 string or not. In C#, a string can only contain valid Unicode characters. Using a string to store raw byte arrays can (and almost definitely will) result in a loss of data as any invalid Unicode will be irreversibly replaced by the replacement character U+FFFD (). Note that the length value specific