The Scots Law Student

The SLS : Life and trials of learning law in Scotland

File formats and the pirate bay

The Pirate Bay is a great source of material for blog posting. Oddly enough this isn’t about the issue of, you know, their big court case. This is actually about their rather entertaining “Legal Threats” page. The Pirate Bay has (had?) a policy whereby if you found that someone had posted a torrent of your copyrighted material on the Pirate Bay tracker / search engine, you could write to the Pirate Bay and they… would promptly ignore it. Or they’d send you a cheeky reply.

They post the letters they get on this page. Mostly these are plain text copies of emails, generally with lots of lawyerly signatures including the words “STRICTLY CONFIDENTIAL” etc. However, one of the documents is interesting because it’s a PDF. The Pirate Bay took this and replied with a 1 megabyte image in .BMP format which looked a lot like this:

Pirate Bay message

“I can use annoying formats too” they say. But is PDF annoying? I’m not so sure.

With my techie hat on I know that the best form to find text in is simple, human readable plain text, the sort of thing you’d get if you typed it in Notepad. It’s just the words: you can do anything with it, you can copy and paste it into any other program, and any computer you can find will be able to display it. However, with my (law) student hat on I happen to really like the not so humble Portable Document Format.

What is PDF?

It’s probably worth talking about what PDF is by comparing it to the other options for text.

1) Plain text

Examples, created by: Notepad, Text Editor

Pares everything down to the words themselves. There is no option for formatting, fonts, colours, pages, anything. All you do is type a long sheet of contiguous text. The great thing is the sheer efficiency of what you produce. The document provides all the substantive content of the fancier formats but without messing with formatting issues.

Pros

  1. Very lightweight
  2. Easily transferred
  3. Easily modified in many different programs running on many different systems
  4. Easily adapted into other forms, not burdened by extra code put in for formats etc.

Cons

  1. No formatting, at all. Need to use things like *bold* or /italic/ to distinguish formatting
  2. No diagrams. It’s possible to do using letters and symbols but no chance for images in the text
  3. Can be hard to set out – things like footnoting and tables of contents pretty much need to be set out by hand in the vast majority of plain text editors.
  4. Can be very elegant, can be very crude.

2) Rich text

Examples, created by: MS Word, OpenOffice

Pros

  1. Most common kind of text – every web page and every Word document are rich text.
  2. Allows visible formatting – select text and make it bold, italic etc. Allows fonts
  3. Allows image embedding; depending on the specific format this can be within the file itself (eg, Word documents) or through referencing (eg, web pages)
  4. Can be very feature rich – templates, automated footnoting, automated table of contents etc are all possible.

Cons

  1. Extra features mean compatibility suffers. Documents created in MS Word may have compatibility issues when opened in slightly different programs, eg. OpenOffice, Word Perfect, Abiword.
  2. Although you can choose various fonts for your documents these fonts will only appear on other people’s computers if they also have the same fonts installed. If they don’t they’ll see a fallback option which you may not have chosen. There are ways around this.
  3. Will not look the same on every computer, settings will vary and the resulting document can be affected.

3) Image

Examples, created by: Paint, Photoshop

I might surprise some people by including this option here but I really do think that image formats are a real option (of sorts) for conveying text on a computer. The flexibility that allows the same picture format to contain a picture of a funny cat or a world famous old master also allows it to hold the shape of words.

Pros:

  1. Document looks exactly as it did on your computer for everyone
  2. Very easily shared between users – every modern computer can understand the common picture formats, so no need for specialist software to view it.
  3. Very, very good for diagrams. Will look exactly as intended, allows full colour and photorealistic images to be included directly with the text.
  4. Very flexible layout – not bound by justification or layout tags, can put elements in anywhere on the document

Cons

  1. Very big files for email etc (the Pirate Bay image was 1 megabyte for 7 words)
  2. Can be hard to edit, and editing it well requires specialist software that’s hard to use
  3. Can be hard to add extra pages
  4. Not actually text – only an image so can’t be copied and manipulated like a text document
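
To put a rough number on the first con above, here is a sketch of the arithmetic. It assumes an uncompressed 24-bit BMP with the usual 54-byte header, and a hypothetical 640 by 480 canvas (I don’t know the real dimensions of the Pirate Bay image). The function name is my own, not part of any library.

```python
def bmp_file_size(width, height, bits_per_pixel=24):
    """Approximate size in bytes of an uncompressed BMP with no colour palette."""
    header_bytes = 54  # 14-byte file header plus 40-byte info header
    # Each row of pixels is padded up to a multiple of 4 bytes.
    row_bytes = ((width * bits_per_pixel + 31) // 32) * 4
    return header_bytes + row_bytes * height


message = "I can use annoying formats too"

# Assumed canvas size, for illustration only.
image_bytes = bmp_file_size(640, 480)
text_bytes = len(message.encode("utf-8"))

print(image_bytes)  # 921654
print(text_bytes)   # 30
```

On those assumptions the picture of the sentence is roughly thirty thousand times the size of the sentence itself, which is about the scale of the 1 megabyte reply.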

4) Device Independent formats

Examples, created by: Acrobat, Foxit, TeX

Pros

  1. Will look the same on every computer (is device independent). Designed to be transferred between computers
  2. Allows you to rely on page and line numbers because the document is identical for each user
  3. Allows direct embedding of images, allows for diagrams to be laid in text exactly where intended by the creator
  4. Is still text, so can be copied and pasted as text. Possible to also have original image as well as text, for example if scanning a book, in the same document
  5. Can be pretty immutable, so provides quite a good historical reference. (eg, harder to edit a PDF report from Westlaw than an RTF)

Cons

  1. Can be “annoying” – that is if you’re browsing the internet and you come across a PDF document your browser will need to load an external reader.
  2. Can be expensive. PDF is officially created by Acrobat and that is not cheap. On the other hand, DVI, free PDF tools and so on are open-source, and these formats can be produced by many different programs.
  3. Can be pretty immutable, it can be difficult to just change something in a PDF document.

Now, if I point you to 4ii) I think I will show you a huge reason to like PDF (and other device independent formats): the ability to rely on the page numbers, so that the useful summation of a case’s ratio at the bottom of page 4 is at the bottom of page 4 on everyone’s computer.

I can’t really understand why you would email someone a PDF version of a letter instead of writing your message in the email itself. I find that strange but I don’t think that means that the format is annoying. Feel free to use these formats in your own workflow. They’re good.

Just plain, old fashioned detective work

Part of the reasoning behind my love of typewriters stems from the fact that they can be used, so conclusively, as articles of evidence. This is bad for any criminal enterprises I might have planned but is wonderful from a sense of personality and uniqueness, and that’s something I really quite like. The case that I think of most fondly comes from Ian Frazier’s article “Typewriter Man” (The Atlantic, November 1997), which included a wonderful anecdote about a single key:

Mrs. Tytell tapped her clear-lacquered fingernail on a key in the upper right-hand corner of the keyboard. The key had a plus sign on top and an equal sign below. “This key on this particular kind of typewriter was the deciding piece of evidence in a multi-million-dollar fraud case I worked on a few years ago,” she said. “A younger son of a wealthy man had been specifically excluded from inheriting some theaters the father had owned. An assignment document, typewritten and with the father’s signature, gave the theaters to the older sons instead. The younger son was twelve when his father died, and he always felt that his father wouldn’t have done that to him, because his father used to take him to these theaters all the time. The younger son grew up and became a lawyer and pursued this question, and finally he came to me with the assignment document, and I found that it was typed on an Underwood of this particular model and year. The assignment document had no plus or equal signs on it, but I was able to prove that the machine that had typed it also typed other documents that did have those signs, and that was the clincher. Underwood didn’t add that particular key to their keyboard until well after the document in question was supposed to have been signed. When I explained all this to the lawyer for the older brothers, he said, ‘So what?’ A few weeks later they settled out of court for a lot of money.”
The Atlantic

There are also stories from the days of the manual telegraph of individuals being identified by their “fist”, the subtle differences in how individual senders use their particular equipment, but there is not much in terms of personality in the output from a typical printer. The nearest that happens today is that some laser printers leave coloured dots on the paper that refer to the serial number of that printer. It’s useful for cases of fraud, ransom demands etc but there’s a huge issue of personal privacy for those times when the printer isn’t used to commit a crime but instead, say, is used to print off a primary school book report. That’s a whole different issue though. There is also a method of looking up which computer posted particular messages by comparing the IP address of that particular poster with the records of the ISP that provided the internet connection. It’s the easiest method because the ISP generally possesses a real life name to bill their customers every month.

There’s a vast difference in my eyes between an expert with a loupe identifying the faint wear marks of a typewriter key on a stack of paper documents, or a trained ear picking out the subtle differences in pace and pressure involved in using a telegraph key, and an expert reading off a faint barcode printed between the lines on a page and cross referencing it to a long chart of other codes. The second is not nearly as interesting as a piece of sleuthing and that’s a sad change, although it is a much, much cheaper alternative to paying an expert with the technical skill to identify, to a legal standard of proof, the faint, non scientific details that distinguish each typewriter and telegraph key.

I thought the first sort of sleuthing had died out, replaced entirely by the second kind, this sort of database and spreadsheet lookup. It’s more efficient but it’s not necessarily nicer. I quite like the old style of doing things.

That’s not necessarily the case. As recently as early January 2009 a signed letter, reportedly written by the late Bob Hayes, was read out by his sister. It talked about his feelings on 29 October 1999, which may have contributed to his death on 18 September 2002. The letter, which was photographed extensively, was checked not for a faint barcode or some property of the printer that printed it, but for its typeface. It was typed in Calibri, a font which was invented in 2003 as an internal Microsoft typeface and then released to the world as part of Microsoft Office 2007. I think it’s a great font and I type a lot of my personal work in it. The problem is that there’s no way that Hayes could have typed out a letter in that typeface in 1999. It also couldn’t have been stored on disc and later printed off from a copy of Microsoft Office 2007, because it was signed. How then could it come to have been printed in this typeface? For more details see Dallas News.

Is it a forgery? I can’t say for sure based on the evidence I currently know of. But it’s quite a stretch to say that a blank page was signed, kept alongside a floppy disc or a CD, and then printed with the text of the letter once Microsoft had released a program containing the right font. It is much easier to believe that someone else, who was still alive after 2002, used their new word processor’s default font on their new computer to type a letter and then signed it “Bob Hayes”. Occam’s razor says the simplest answer is usually right. Is it in this case?
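
The reasoning in both the Underwood case and this one boils down to a single date comparison, which can be sketched in a few lines. This is only an illustration: the function name is made up, and because the account above gives only the year 2003 for Calibri, the exact “available from” date used here is an assumption.

```python
from datetime import date


def is_anachronistic(claimed_date, feature_available_from):
    """True if a document claims a date before one of its features existed."""
    return claimed_date < feature_available_from


# Date the letter was said to relate to, as given in the account above.
letter_date = date(1999, 10, 29)

# Assumption: the earliest possible day in 2003, the year the font
# is said to have been invented.
calibri_earliest = date(2003, 1, 1)

print(is_anachronistic(letter_date, calibri_earliest))  # True
```

Even with the most generous date for the font, the letter comes out more than three years too early, which is the whole of the typeface argument.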

The letter in question - signed and typed

Blogging in extreme circumstances

The content of the scotslawstudent is not something particularly worrying to me, and I’m known to write it in places as insecure as the bus. There are other situations, though, where reporting is heavily repressed and there are few ways for the rest of the world to be informed about what is going on in the area. This is seen in the problems in Zimbabwe currently, where everything including spending US dollars has been outlawed, and it can be that the only way news gets out is through eyewitness accounts recorded by people using blog software.

My case

The scotslawstudent is an anonymous blog – I chose this to allow me to post on topics without being immediately identifiable. It means that I can post my grades etc without fear of being mocked in public. It’s hosted at wordpress.com which is a free blog host. Believe it or not this wasn’t a cost saving move: WordPress.com provides its blogs with a great deal of bandwidth, more than I would be able to get from other anonymous hosts. I also operate a full web site under a group name which uses WordPress software, so I’m familiar with it, and that made me go for WordPress.com.

The registration details of the site point to scotslawstudent and the emails link to scotslawstudent@googlemail.com, my handle for this blog, which does not reveal my identity to people with access to the details I gave at sign up for the blog or the email. This keeps my identity safe from people giving the site a rudimentary look over.

I don’t feel that my privacy is under attack enough to take more steps than simply not giving my name out in the posts and when signing up so I stopped at this point. However, for some people, the information they are posting may be very much more important and sensitive than the musings about my education that I do. For this there are many other modes and means of posting to the Internet.

Further steps

Watch what you post

Admittedly, this is something I already do to some extent. It’s mainly common sense – if you know a secret and no one else except, for example, your boss knows it, you will become an obvious suspect if the secret appears on the Internet. If you fail to keep your posting anonymous, perhaps to the point of using your name or initials, then you have severely limited the amount of trust you can put in your own anonymity.

Use a web cafe

The main thing a wannabe secret blogger needs is to keep some distance between himself and the blog he is posting to. In many ways the best way to do this is to use busy web cafes and to pay in cash. This means that the post will come from a public place and there will, potentially, be a lot of noise on the line to cover the post. Any court order to identify the poster will only trace back to the IP address of the web cafe, and if the web cafe uses a NAT router to split its connection among many systems it may only trace back to the single shared connection, with no opportunity to trace to the individual terminals. The Chinese government, for one, has realised this and is now implementing measures to photograph all users of web cafes as they sit down. I think that sounds fairly dodgy, particularly given how useful anonymous web cafes can be to people who want to get their message out.

There are caveats on using a web cafe, not least if the user also leaves personally identifying material behind. I doubt many whistleblowers will take the opportunity to google their own names or the like, but even merely checking their own email can be enough to allow someone who is supervising the web cafe to track a poster down. Timing is a useful method of identifying users of a computer system (I’ve used it myself to identify misuse of a school’s computer systems). If the web cafe is empty but for one customer at the time a whistleblower posts a scorching piece of evidence onto a blog from an IP address belonging to that cafe, then it’s elementary to connect that customer to the post.
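
The timing method is simple enough to sketch. What follows is a toy illustration, not how any particular cafe or school system works: the log format, the terminal numbers and the times are all made up, and the function name is my own.

```python
from datetime import datetime


def sessions_active_at(sessions, moment):
    """Return the sessions whose start and end times bracket the given moment."""
    return [s for s in sessions if s["start"] <= moment <= s["end"]]


# Hypothetical cafe log: terminal number plus login and logout times.
cafe_log = [
    {"terminal": 1, "start": datetime(2009, 1, 5, 9, 0), "end": datetime(2009, 1, 5, 9, 45)},
    {"terminal": 2, "start": datetime(2009, 1, 5, 14, 10), "end": datetime(2009, 1, 5, 14, 50)},
    {"terminal": 3, "start": datetime(2009, 1, 5, 16, 0), "end": datetime(2009, 1, 5, 17, 30)},
]

# Time the blog recorded the post arriving from the cafe's IP address.
post_time = datetime(2009, 1, 5, 14, 30)

suspects = sessions_active_at(cafe_log, post_time)
print([s["terminal"] for s in suspects])  # [2]
```

With a quiet cafe the list comes back with one entry, which is exactly the problem. A busy cafe returns a long list, and that is the “noise” that protects the poster.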

In an oppressive regime this could land a blogger in a great deal of trouble. The UK currently has not touched bloggers but the US, of all places, has taken huge measures to stop the actions of corporate whistleblowers and there are notorious breaches of what we Europeans would call basic human rights around the world, even in countries which are not traditionally considered to be the Third World.

Use an anonymous proxy

Another way to get yourself some distance from the post is to use a proxy server. A proxy is a computer on the network which bounces (technically retransmits) anything which it receives, applying its own identification to the message in the process. This means that a user who posts to a blog through a proxy will be recorded as posting to the blog as if they were sitting at a keyboard at the proxy server.

Proxy servers can be searched for just like any other resource on the Internet – with Google. A search for “proxy server” will reveal long lists of computers which can be connected to by changing some settings in your browser. These servers are dotted around the world and, as any Scot who receives a form letter from English solicitors knows, the act of simply pointing out a jurisdictional difference is enough to stop casual pursuers in their tracks. It is generally a very good idea, if you are posting from a country under a repressive regime, to choose a proxy well away from your location to increase the difficulty in getting your details. Generally, although this is not an absolute tip, a proxy registered as anonymous will be safe enough to use from home.
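
For the programmatically minded, the same “change some settings” step can be sketched with Python’s standard library instead of a browser. This is a minimal sketch under assumptions: the proxy address is a placeholder from a range reserved for documentation, not a real server, and the helper name is mine. Nothing is actually sent over the network here.

```python
import urllib.request

# Hypothetical proxy address (reserved documentation range, not a real proxy).
PROXY_URL = "http://203.0.113.10:8080"


def make_proxy_opener(proxy_url):
    """Build an opener that sends both http and https requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = make_proxy_opener(PROXY_URL)

# A call such as opener.open("http://example.com/") would now be relayed by
# the proxy, so the far end would record the proxy's address, not yours.
# It is left out here because the placeholder proxy does not exist.
```

Bear in mind that the proxy itself still sees where the request came from, which is why the choice of proxy matters as much as the fact of using one.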

Use Tor

The natural evolution of this idea of putting a far away computer between a poster and the blog is to put more than one computer between them, and to vary the computers that are used. This is done automatically by a system called The Onion Router (Tor), a remarkable system which is nearly entirely volunteer run. The poster connects to a network of individual computers around the world, which then route the message between themselves. As long as the message goes through a number of different computers in different countries, the original poster is nigh unrecoverable without an extremely large amount of work. The amount of traffic which goes through the network also helps, creating a large “haystack” for someone to search through to track down a particular poster.

A poster needs to be very careful that they are actually routing their data through the Tor system. This involves installing Tor and configuring any applications which the poster wants to keep anonymous, which can be as simple as activating a Firefox extension or may mean configuring the programs individually. Tor has the disadvantage of being quite slow and low bandwidth. It’s more than enough for web browsing, email or chat but not enough for downloading large files, which is also unfair to the network as a whole: remember that downloading files over Tor ties up resources which could be used for vitally important tasks.
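
The “onion” in the name refers to wrapping a message in layers, one per relay, so that each relay can peel off only its own layer and learns only where to send the rest. The toy below illustrates that idea and nothing more. It is not Tor’s actual protocol, the cipher is a deliberately simple stand-in that offers no real security, and the relay names, keys and function names are all invented for the example.

```python
import hashlib
import json


def toy_cipher(key, data):
    """XOR data with a SHA-256-derived keystream. A toy, NOT real security."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(message, route, keys):
    """Wrap the message in one layer per relay, building the innermost first."""
    payload = message.encode("utf-8")
    next_hop = "destination"
    for relay in reversed(route):
        layer = json.dumps({"next": next_hop, "data": payload.hex()})
        payload = toy_cipher(keys[relay], layer.encode("utf-8"))
        next_hop = relay
    return payload


def peel(relay, keys, blob):
    """A relay removes its own layer and learns only the next hop."""
    layer = json.loads(toy_cipher(keys[relay], blob).decode("utf-8"))
    return layer["next"], bytes.fromhex(layer["data"])


def deliver(blob, route, keys):
    """Pass the blob along the route, peeling one layer at each relay."""
    hop = route[0]
    while hop != "destination":
        hop, blob = peel(hop, keys, blob)
    return blob.decode("utf-8")


# Hypothetical relays and keys, purely for illustration.
route = ["relay-1", "relay-2", "relay-3"]
keys = {name: ("secret-for-" + name).encode("utf-8") for name in route}

onion = wrap("post this for me", route, keys)
print(deliver(onion, route, keys))  # post this for me
```

The first relay in this sketch can see that the next stop is relay-2 but cannot read the message, and the last relay can read the message but was only ever handed it by relay-2. That separation is what makes the original poster so hard to recover.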

Use encryption

There are two main ways to use encryption when blogging – the first is to encrypt the messages you transmit and the second is to encrypt the transmission itself.

Text encryption is not a new phenomenon. Julius Caesar was apparently the first person to use it on a large scale, and computers are extremely good at it. Currently the general standard is RSA encryption, which uses the public key system. Actual usage of this is fairly technical and beyond the scope of this article.
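
To show how simple the oldest version is, here is a sketch of the classic Caesar shift, where every letter is moved a fixed number of places along the alphabet. It is trivially breakable and is here only as an illustration of what encrypting the text itself means. It has nothing to do with how RSA works.

```python
def caesar(text, shift):
    """Shift each ASCII letter by `shift` places, wrapping round the alphabet."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            # Spaces, digits and punctuation pass through unchanged.
            result.append(ch)
    return "".join(result)


secret = caesar("Scots Law", 3)
print(secret)              # Vfrwv Odz
print(caesar(secret, -3))  # Scots Law
```

Decrypting is just shifting back by the same amount, so anyone who knows (or guesses) the shift can read the message. Modern systems exist precisely because that is not good enough.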

Encrypting the transmission itself is often loosely called “end to end” encryption, and this is most commonly associated with the little padlock you see at the bottom of the page when you go to Amazon. However, it can be used for much more than protecting your details. It can also be used to protect your message from being eavesdropped on, and it’s this mode which is very useful here. It can protect the entire connection between a poster and a blog, or the connection between a poster and a mailbox used to get a message out of a country, so that it can be posted from the free parts of the Internet. There are various ways to do this, ranging from SSH tunnelling to just activating a “secure mode” on a webmail service. It depends on the care the user involved is taking.

Use hushmail

Hushmail.com is a high security webmail service using public key encryption, so basically a combination of Hotmail and PGP. This means that email stored on the system is kept encrypted. It doesn’t have the same attitude to protecting its users, though, and will respond to court orders with apparently docile acceptance. That makes it not strong enough protection to keep a poster safe from government action, although the difficulty a foreign government would have in getting a court order in the relevant jurisdiction could keep someone safe from action for the foreseeable future.

Use mail redirects

A good way to hide a post is to email it around the internet. The difference is like that between dropping a letter into a post box and hand delivering it. When a posted letter arrives at the recipient, the sender no longer possesses it and the person who makes the delivery is an unrelated postman. If the sender hand delivers it, they have taken a journey, potentially out of the ordinary, and have kept possession of the letter all the way through the process. If the police, for example, spot the sender as he arrives to deliver the letter then they have a great deal of information about him, whereas if the letter was posted then the observer only knows the identity of the postman.

Mail redirects work by sending a message between various computers on the Internet. The message is bounced between countries, and at each leg of the journey the identity of the two computers involved is the only identifying information which the message contains. At the end of the journey the message does not reveal any information about its source except the final computer to send it on. Obviously all the jumps mean it’s a slow process and it’s quite risky: some emails never actually reach their destination and disappear on one of the hops. But it’s very private if it’s done correctly.
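
The effect of all that bouncing can be shown with a very small model. This is a sketch of the idea only, with made-up relay names and a made-up message format. Real remailers also deal with headers, timing and encryption, none of which appears here.

```python
def relay(message, relay_name):
    """Forward a message, replacing the sender field with this relay's name."""
    return {"sent_by": relay_name, "body": message["body"]}


def send_via(body, sender, relays):
    """Send a message from `sender` through each relay in turn."""
    message = {"sent_by": sender, "body": body}
    for name in relays:
        message = relay(message, name)
    return message


# Hypothetical sender and relays, for illustration only.
arrived = send_via("hello", "author-laptop", ["relay-a", "relay-b", "relay-c"])
print(arrived)  # {'sent_by': 'relay-c', 'body': 'hello'}
```

By the time the message arrives, the only name on it is the last postman’s.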

Legacy note – invisiblog.com

There used to be a blog system online which worked from the very simple foundation of mail redirection and public key encryption. To sign up to invisiblog.com a user would simply email the site with their public key, and this would start a blog titled with the last 16 digits of their key. As long as the key was created specifically for the new blog there could be no way to track the poster down with the information provided. The blog was then posted to by emailing messages to the site, again using mail redirection, signed with the same key. The blog could be read by anyone with a web browser, but even the owners of the site only knew the public key of the person who was writing the posts.

This comes undone should the poster’s computer be seized, which would reveal the poster’s private key and therefore create a massive presumption that the owner of the computer was the one making the posts. However, very few methods will withstand the physical seizure of the equipment used to make the posts, and it should be considered a very difficult state of affairs to get past.

The site has not operated since 2005 and there does not appear to be a similar system running on the Internet at this time. I would be interested in seeing a system like it running in future even though I personally have no use for it.

Super technical note – use other ways of connecting

Optical fibre has changed the way the world works, and connecting the continents together with it has made the world seem a great deal smaller. However, being the main way of transmitting data in the world today does not make it the only one. There are many, many ways to communicate in the world: letter and phone, to name two.

In some cases it may be best to simply get the information out of the country and have it published from elsewhere. This is particularly easy if the person lives near to a border. The real reach of the government may well extend beyond the physical borders of the country, but as far as international law is concerned it is strictly limited to within the geographic limits.

In extreme circumstances, potentially, if a person comes across sensitive information, has a radio operator licence and knows people outside of the country who are not bound by the same repressive rules as he is, then it’s the work of an evening to send a message, encrypted or not, to one of them and have them post it online for him. This clearly has its own not inconsiderable worries and assumes a lot of other factors which may or may not exist in the situation.

The Electronic Frontier Foundation has written at great length on being a blogger in states where it is not a government supported action. It provides great detail on the methods of avoiding detection and how to blog about sensitive topics. Blogging in extreme situations is an extremely risky move, and if the situation has deteriorated to the point where it is the only reporting in an area then things are very bad. It’s a conscientious decision and a very important one in terms of keeping human rights alive in a region where they cannot be checked up on, and hopefully one I will never have to make.

None of this is personal advice, and it should not be expected to provide protection from a repressive regime. Much more care will be needed when actually trying to avoid the attention of a regime than a blog post can cover in theory. Although these notes could be useful to people wishing to abuse the systems mentioned, this is not the purpose of the article, and the use of these techniques in illegal activity is not supported by either the author of this blog or the people who design and run the services. Services which are provided for the benefit of humanity, to give those in difficult situations a safe and protected way to broadcast their experiences, simply should not be misused by those seeking to gain advantage. The design of many of the services is such that it does not really benefit anyone in carrying out illegal activity.

This post was adapted from materials provided by the EFF and personal experience.

http://w2.eff.org/Privacy/Anonymity/blog-anonymously.php

http://www.torproject.org/

http://www.hushmail.com/

http://www.gnupg.org/