December 19, 2023

Recreating the Bitcoin whitepaper (competently)

If you're not familiar with Craig Wright, the notorious Australian who for many years has claimed to be Bitcoin inventor Satoshi Nakamoto (a claim which has been widely and thoroughly debunked and discredited), first of all: congratulations. Consider skipping this article and happily moving on with your life.

If you're still here, you probably also know that a recurring part of Wright's claims is that he authored the Bitcoin whitepaper, and to corroborate this claim he has presented various supposed "drafts" of it. These "drafts" are usually just the whitepaper itself but edited to say Wright's name instead of Satoshi, usually with backdated metadata, and sometimes even printed out with coffee stains on them to look older. But they all have in common that they were rather incompetently created and the deception was quickly uncovered whenever subjected to forensic analysis.

This all raises a somewhat more important question though: what if a competent Faketoshi appeared, with forged documents that were much harder to debunk? How hard would it be to take the Bitcoin whitepaper and create credible precursors and/or source documents in support of some narrative?



Satoshi Nakamoto's whitepaper

On October 31, 2008, Satoshi Nakamoto (widely assumed to be a pseudonym for one or possibly several people) announced the Bitcoin whitepaper in an email to the Cryptography mailing list.

I've been working on a new electronic cash system that's fully
peer-to-peer, with no trusted third party.

The paper is available at:
http://www.bitcoin.org/bitcoin.pdf

This initial whitepaper later received minor revisions, with the final revision being made on March 24, 2009 (this is the most well-known version). Satoshi released the Bitcoin client and source code on January 9, 2009, with the Bitcoin blockchain's root Genesis block being dated January 3, 2009.

Satoshi described having first written the bulk of the code for Bitcoin, to convince himself that it would actually work, before writing and publishing the whitepaper for it. Satoshi also reached out to various people in the cryptography space for comments/review of the whitepaper draft prior to publication, including to Adam Back and subsequently to Wei Dai on August 22, 2008. This allows us to reasonably guess that the whitepaper was drafted during summer 2008, and it's commonly believed Satoshi began coding Bitcoin some time in 2007.

Inspecting its metadata, on its face Satoshi's whitepaper appears to be:

  • Created using OpenOffice Writer 2.4
  • Exported to PDF on October 3, 2008 (initial) / March 24, 2009 (final)
  • Exported in a time zone of UTC-7 (initial) / UTC-6 (final)
  • Authored in the en-GB (British English) locale

The contents and structure of the whitepaper PDF are also consistent with being created by OpenOffice, with all expected entries and values present and valid. In short, there's nothing that stands out that would cause us to second-guess our initial impression of how the whitepaper was created.
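
For readers who want to see what "inspecting the metadata" can look like in practice, here is a minimal Python sketch. It is not the tooling used for this article. It assumes the PDF's Info dictionary is stored uncompressed (typical for older PDFs), it ignores escaped parentheses inside strings, and the SAMPLE bytes are made up to mirror the values listed above (the 12:00:00 time of day in particular is purely illustrative).

```python
import re
from datetime import datetime, timedelta, timezone

# Info-dictionary entries can be literal strings "(...)" or hex strings "<...>".
FIELD = re.compile(
    rb"/(Creator|Producer|CreationDate)\s*(?:\(([^)]*)\)|<([0-9A-Fa-f\s]*)>)"
)
# PDF dates look like D:YYYYMMDDHHmmSS followed by an optional UTC offset.
PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(?:([+-])(\d{2})'(\d{2})'?)?"
)


def decode_pdf_string(literal, hexstr):
    """Decode a literal or hex PDF string, honouring a UTF-16BE byte-order mark."""
    if hexstr is not None:
        digits = re.sub(rb"\s", b"", hexstr).decode("ascii")
        data = bytes.fromhex(digits + "0" * (len(digits) % 2))
    else:
        data = literal
    if data.startswith(b"\xfe\xff"):
        return data[2:].decode("utf-16-be")
    return data.decode("latin-1")


def read_info_fields(pdf_bytes):
    """Return the first Creator/Producer/CreationDate values found in the raw bytes."""
    fields = {}
    for m in FIELD.finditer(pdf_bytes):
        name = m.group(1).decode("ascii")
        fields.setdefault(name, decode_pdf_string(m.group(2), m.group(3)))
    return fields


def parse_pdf_date(text):
    """Parse a PDF date string; a missing offset is treated as UTC in this sketch."""
    m = PDF_DATE.match(text)
    if m is None:
        raise ValueError("not a PDF date: %r" % text)
    parts = [int(g) for g in m.groups()[:6]]
    offset = timedelta(0)
    if m.group(7):
        offset = timedelta(hours=int(m.group(8)), minutes=int(m.group(9)))
        if m.group(7) == "-":
            offset = -offset
    return datetime(*parts, tzinfo=timezone(offset))


# Made-up sample shaped like the metadata described above (not copied from the real file).
SAMPLE = (
    b"<</Creator<FEFF005700720069007400650072>"
    b"/Producer(OpenOffice.org 2.4)"
    b"/CreationDate(D:20090324120000-06'00')>>"
)

info = read_info_fields(SAMPLE)
created = parse_pdf_date(info["CreationDate"])
print(info["Creator"], "|", info["Producer"], "|", created.isoformat())
```

To try it on a real file, pass the file's bytes to read_info_fields instead of SAMPLE. Keep in mind that all of these fields are trivially editable, so they only show what a file claims about itself.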

Craig Wright's whitepaper forgeries

Let's sidestep back to Craig Wright for a moment: how closely do the whitepaper replicas he has produced so far align with the original?

In short: not at all.

The first generation of Wright's whitepaper replicas are just edited versions of the whitepaper PDF itself, such as the ones he posted to SSRN. See, Wright loves the edit feature in Adobe Acrobat. It makes it so easy to just open a PDF, position your text cursor, and simply delete and replace whatever text you want! And if you want the files to look like they're from many years ago, just change your system clock, or perform the operation inside a VM with a similarly altered clock, or just edit the data directly with a metadata editor; it's easy!

The only problem for wannabe forgers with this approach—which apparently world-renowned forensic expert Craig Wright was unaware of—is that it leaves pretty obvious and recognizable markers and artifacts in the edited PDF file, including a literal 'TouchUp_TextEdit' marker whose only purpose is to say "this content was edited". A ton of Wright's PDF files in the Kleiman lawsuit were debunked this way.
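
As a rough idea of how little effort it takes to spot such a marker, here is a minimal Python sketch. It is an illustration only, not the method used by the forensic experts in any of the cases. It assumes page content is either uncompressed or Flate-compressed, which covers many but not all PDFs, and the make_sample helper is a hypothetical stand-in for reading a real file.

```python
import re
import zlib

MARKER = b"TouchUp_TextEdit"
# Naive stream matcher: good enough for a sketch, not a real PDF parser.
STREAM = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def iter_stream_payloads(pdf_bytes):
    """Yield each stream body, inflated when it is Flate-compressed."""
    for m in STREAM.finditer(pdf_bytes):
        raw = m.group(1)
        try:
            # decompressobj tolerates trailing or slightly truncated data.
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            yield raw  # not Flate data; fall back to the raw bytes


def has_touchup_marker(pdf_bytes):
    """True if the edit marker appears in the raw file or in any inflated stream."""
    if MARKER in pdf_bytes:
        return True
    return any(MARKER in payload for payload in iter_stream_payloads(pdf_bytes))


def make_sample(content):
    """Hypothetical helper: wrap content in a minimal compressed stream object."""
    body = zlib.compress(content)
    header = b"1 0 obj\n<</Length %d/Filter/FlateDecode>>\nstream\n" % len(body)
    return b"%PDF-1.4\n" + header + body + b"\nendstream\nendobj\n"


edited = make_sample(b"BT /TouchUp_TextEdit MP (Dr Craig S Wright) Tj ET")
clean = make_sample(b"BT (Satoshi Nakamoto) Tj ET")
print(has_touchup_marker(edited), has_touchup_marker(clean))
```

A negative result from a scan like this proves nothing by itself, of course; it only means this one particular tell is absent.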

When simple edited PDFs weren't enough, Wright instead painstakingly pressed the "Convert to Word document" button in Acrobat (or used other online converters) to instantly produce a Word document roughly matching the whitepaper in terms of content, and then made whatever edits he wanted in that document instead. He'd then claim that this document was an "earlier draft" made in early 2008 or 2007 or whenever.

However, such converters almost never produce perfect results. They usually struggle with diagrams, you need matching fonts installed, and various other quirks. Since Wright doesn't seem to inspect the result for errors, these flaws are almost immediately discoverable. Plus, these files don't jibe with the whitepaper's apparent use of OpenOffice, adding one more inconsistency that would need to be explained.

Several of these forgeries were debunked in the Hodlonaut trial, including some innovative forensic work that allowed even a printed "whitepaper draft" to be debunked due to Wright's computer having substituted a missing font with a modern font, proving the document couldn't possibly predate the public whitepaper.

Wright has used these incredibly basic forgeries and replicas not only to bamboozle the non-technical people/marks he surrounds himself with, but in legal proceedings too. He submitted a copyright application for the whitepaper, and then inserted the URL of his copyright application as metadata into future edited whitepapers. He's sometimes not even hiding the fact that he edits and backdates documents, but claims they're just as good as originals because they're just replacements for originals that really did exist—pinky swear!—and you have to take his word for it since Wright also says he's autistic and that autistic people cannot lie.

It's absurd! It's not merely that Wright is not competent enough to create a good forgery; it's like he has zero regard for honesty in the first place, living in a world where lies are just as good as truths, and the only thing that really matters is who achieves their goals. If you view the world in those terms, then honesty is a handicap: while an honest person can only argue the truth, a liar is armed with an infinitely larger arsenal of lies.

Lawsuits

Our legal system and arguably society itself is built on a cornerstone of determining and agreeing on common truths. It is strange then that instead of holding Wright accountable our courts have thus far largely aided Wright, allowing the legal process to be abused as a weapon to silence and harass Wright's critics with protracted legal proceedings, while also serving as a permanent procession of pretrial Potemkin villages to keep Wright's investors and followers distracted.

In 2021, Wright filed a copyright claim against Cøbra, the current maintainer of bitcoin.org, over hosting the Bitcoin whitepaper. The UK court refused to let Cøbra defend himself at all unless he revealed his identity to Wright. Cøbra refused to unmask—recognizing that deanonymizing critics is in itself one of Wright's outspoken goals in order to further harass people—and the judge subsequently granted Wright a default judgment. Wright won a copyright claim without ever having to prove he was actually the author, over a document which as part of an MIT licensed project has always been freely publishable, and the whitepaper was taken down in the UK.

Wright reserves special ire for the Bitcoin Core developers who dare to ignore his rambling decrees as Lord Satoshi. He is suing them three times over, both over various aspects of intellectual property ownership in Bitcoin (which, again, is MIT licensed) and also demanding that they "modify" Bitcoin to "seize" the coins held in two randomly picked wealthy addresses (one of which is coins stolen from MtGox!) and "reassign" them directly to Wright, who insists that he's the real owner but gosh darn it the dog hackers ate his keys, but don't worry 'cause he's got more bad forgeries to prove it!

Preliminary issue trial

Several of these cases are now gearing up for a joint preliminary issue trial to first of all determine whether Craig Wright is Satoshi Nakamoto in the first place. And since he isn't, this is kind of a "Wile E. Coyote running over the cliff edge" moment for Wright. We don't yet know what kind of evidence Wright has submitted for this case, but judging from the filings it sounds like he's resubmitted evidence from previous cases only for it to get even more destroyed here. And far from accepting his inevitable defeat gracefully, apparently Wright has prepared a third generation of never-before-seen whitepaper forgeries!

According to the most recently published documents in the case, Wright has apparently now adjusted his narrative to insist that even though the whitepaper looks just like an OpenOffice document in every way, and even though he's previously presented "drafts" in all manner of other formats, he now says it was actually authored in LaTeX — and presumably for some reason painstakingly made to look like an OpenOffice document.

Furthermore, it sounds like Wright intends to prove himself as Satoshi by means of uniquely possessing the specific LaTeX document which compiles exactly into Satoshi's whitepaper. Why, that sounds practically as unique and robust as a signed hash and its preimage! Case closed!

...Or, actually, apparently it doesn't quite compile exactly into the whitepaper. More like "materially identical", which I guess means "looks mostly the same unless you look closely"? Hmm. That actually doesn't sound particularly unique or robust.

Why would anyone think the whitepaper was authored in LaTeX? Well, Satoshi almost certainly intentionally formatted the whitepaper to be reminiscent of typical academic LaTeX articles. Most of the effect is just the use of the Century Schoolbook font (which is similar to LaTeX's Computer Modern font) for the headings, but unless you looked more closely, you could be forgiven for guessing it was LaTeX. I suspect there's a much more specific reason why Wright in particular, and against all evidence, insists it was LaTeX though: Wright's forgeries have been caught countless times due to embedded metadata, whereas LaTeX uses simple text files completely lacking metadata.

Another red flag is the mention that Wright's LaTeX whitepaper apparently only produces an approximation of the whitepaper. LaTeX is a highly stable format; if your file exactly compiled into the whitepaper a decade ago it generally ought to still exactly compile into the whitepaper today. But you know what does tend to produce approximations which are "materially identical" to but not quite "exactly" like the original? Easily accessible publicly available PDF-to-LaTeX conversion tools.

Turns out it's not particularly unique at all to be in possession of a LaTeX document that almost compiles into a certain PDF; even a rookie can do it with little effort! Not only that, even one of the lawyers was able to whip up a passable recreation by hand.

So while it might initially sound intuitive, the assertion that it's hard or impossible to reverse-engineer a source document from a PDF is actually generally false. Even if it may be a fair amount of work or even prohibitively hard to do manually, the existence of automated conversion tools makes it almost trivial to achieve a visually closely matching document.

However, since a PDF only contains low level drawing commands, it's hard or impossible for these tools to accurately recreate the original high level, human friendly context and style of the source document, and this makes them fairly easy to tell apart from real human-created documents. So the question should really be: is it possible and feasible to recreate a not only matching but plausible source document from the whitepaper PDF?

Making better forgeries

Jumping back to 2021, even before the travesty of the Cøbra ruling the Bitcoin community was taken aback by the brazenness of Wright's fraudulent copyright claim, and many websites began hosting copies of the whitepaper in solidarity with Cøbra.

I took it one step further and decided to discreetly make a point: https://wizsec.com/bitcoin.pdf is actually the world's first WPFaaS (WhitePaper-Forgery-as-a-Service)! Pass it a new name, email and url as query parameters and it will patch the whitepaper PDF to replace those three lines.

This little script took about an afternoon to code up, and as far as forgeries go its output is superior to any of Wright's edited PDFs: it only makes minimal edits to the PDF code, it doesn't inject any new embedded fonts or anything, and it keeps the metadata consistent. It does however intentionally inject an edit marker into the code, just in case someone gets the idea to actually use this to trick anyone. Additionally, it does seem to bug out on certain inputs...


Reverse-engineering the whitepaper

Wright is far from the only Faketoshi out there, and while fortunately they've all turned out not to be particularly competent (albeit Wright is extremely well funded), in another sense this has potentially left the Bitcoin space more vulnerable to future fakes, if we allow ourselves to get comfortable thinking it's always going to be this easy.

This thought kept bugging me especially after so easily making better whitepaper forgeries myself, so later in 2021 I began to test some of my own unspoken assumptions. Out of all the things the real Satoshi might use to demonstrate their identity, which actually have strong evidentiary value?

I had intuitively felt that (just like Wright now argues in court) having possession of a recognizable source document for the whitepaper was one of the things only the real Satoshi ought to be able to do, on the basis of it seeming pretty infeasible or at least an unreasonable amount of effort to precisely reconstruct a matching OpenOffice document from the PDF. But like Wright has shown, pretending to be Satoshi can be very lucrative, which would justify a lot of effort forging evidence. So is it really so infeasible? I started a small side project to find out.

Allowing myself to make some simple assumptions, I figured that someone like Satoshi would probably have a reasonably structured and clean but pragmatic approach when authoring the whitepaper. Further, I assumed that all of the metadata was accurate and unaltered, e.g. that it was indeed OpenOffice 2.4 that was used, and also guessed that Satoshi worked in the same environment, or one similar to it, that they seemed to use for developing/compiling/testing Bitcoin, i.e. Windows XP.

I therefore spun up a Windows XP SP3 VM (although I think Satoshi used SP2), installed OpenOffice 2.4.2 (2.4.0 and 2.4.2 both identify as "OpenOffice 2.4" in PDF metadata), installed the Century Schoolbook font, and set about simply typing in the whitepaper text while trying to match the formatting as closely as possible.

It turns out this was already a promising start. It did not take much prodding at all to produce output that looked visually identical to the whitepaper PDF, just a bit of trial and error to match font sizes, page margins, and the occasional tweaking of spacing. Notably, very few settings needed to be changed from OpenOffice defaults, and the necessary changes were usually just a few simple tweaks that an author would plausibly make. This was a fairly reassuring confirmation that I was on the right track, and before long I had a first page that perfectly matched the whitepaper.

This was the easy part though. The first page is just discrete text with simple formatting; the rest of the whitepaper contains embedded math formulas and diagrams, and these could have been created in a near-infinite number of ways. Still, I kept going, hoping my guesses about Satoshi were right and would keep things a bit simpler.

The strongest confirmation that I was on the right track came when I inspected the resulting PDF more closely and compared it to Satoshi's whitepaper. See, PDF files are basically just little programs containing rendering commands for what to draw on the screen or page, and just like with computer source code there's a lot of flexibility in exactly how you write the instructions to achieve a particular result. What this means is that in practice, even for the same visual content different PDF software will output very distinct "flavors" of PDF code.

It became immediately obvious when inspecting the PDF code generated by OpenOffice that it's a clear match to what's in Satoshi's whitepaper. Indeed, very little or no coaxing at all was needed to achieve perfectly matching code. And considering OpenOffice uses an internal PDF writer rather than any external library, this "flavor" of PDF is basically unique to OpenOffice (and its later derived forks). It's like a fingerprint.

At this stage I was no longer content with just judging things by eye, so I coded up a script so that I could at any point quickly create a PDF export of the current document and have the script perform a page-by-page comparison of its PDF code against Satoshi's whitepaper, and display the results in human-readable form. This would act as my compass as I slowly worked my way forward.
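
To give a feel for the idea, here is a minimal Python sketch of such a comparison. To be clear, this is not my actual script: it takes the shortcut of comparing Flate-compressed streams in file order instead of properly walking the PDF page tree, it uses a crude heuristic (the presence of BT/ET text operators) to decide what counts as page content, and the make_pdf helper and its contents are made up for the demo.

```python
import difflib
import re
import zlib

# Naive stream matcher; a real tool would resolve objects via the page tree.
STREAM = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def content_streams(pdf_bytes):
    """Return inflated streams that look like page content, in file order."""
    streams = []
    for m in STREAM.finditer(pdf_bytes):
        try:
            data = zlib.decompressobj().decompress(m.group(1))
        except zlib.error:
            continue  # not Flate-compressed; ignored in this sketch
        if b"BT" in data and b"ET" in data:  # heuristic: has text operators
            streams.append(data.decode("latin-1"))
    return streams


def compare(reference, candidate):
    """Return a list of (stream number, unified diff lines) pairs."""
    ref, cand = content_streams(reference), content_streams(candidate)
    report = []
    for index in range(max(len(ref), len(cand))):
        a = ref[index].splitlines() if index < len(ref) else []
        b = cand[index].splitlines() if index < len(cand) else []
        diff = list(
            difflib.unified_diff(a, b, "reference", "candidate", n=0, lineterm="")
        )
        report.append((index + 1, diff))
    return report


def make_pdf(content):
    """Hypothetical helper: a minimal one-stream stand-in for a real export."""
    body = zlib.compress(content)
    header = b"1 0 obj\n<</Length %d/Filter/FlateDecode>>\nstream\n" % len(body)
    return b"%PDF-1.4\n" + header + body + b"\nendstream\nendobj\n"


target = make_pdf(b"BT\n/F1 14 Tf\n56.8 722 Td\n(Bitcoin) Tj\nET")
attempt = make_pdf(b"BT\n/F1 14 Tf\n56.8 721.5 Td\n(Bitcoin) Tj\nET")

for number, diff in compare(target, attempt):
    print("stream", number, "MATCH" if not diff else "DIFFERS")
    for line in diff:
        print("   ", line)
```

Because the comparison works on the drawing commands themselves, a coordinate that is off by a fraction of a point shows up immediately even when it would be invisible on screen.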

The formulas, it turned out, were not too hard to recreate either. I assumed Satoshi used OpenOffice's built-in formula functionality (especially given that the whitepaper uses the OpenSymbol font, which is what OpenOffice uses for formulas and the like), so it was merely a matter of guessing which exact way Satoshi used to represent the formulas. The possible options are pretty finite, and before long I had almost matching output, with only tiny differences in the code. It turns out OpenOffice is pretty sensitive to exactly in which order you do things, and this can affect the final placement and layout of things like formulas and diagrams...

Speaking of diagrams, I dreaded trying to recreate them exactly point by point, but I couldn't put it off any further as they were the only major part left. However, as I sat looking at the whitepaper, I noticed that almost everything in the diagrams looks evenly spaced, and realized Satoshi had drawn them on a grid! Bless their pedantry! This vastly reduced the complexity of the task ahead, and I set about creating the first diagrams.

Consistent with the observations so far, Satoshi likely used OpenOffice Draw for the diagrams, an assumption which seemed confirmed by readily available drawing primitives perfectly matching the appearance of the elements in Satoshi's diagrams. As soon as I found the right grid size, most drawing primitives produced the correct appearance with near-default settings, and recreating the first diagram became straightforward.
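
The grid hunch can also be checked numerically rather than by eye. The Python sketch below shows one way to do that, under loud assumptions: the coordinates are made-up integers in an arbitrary unit (not measurements from the whitepaper), the grid origin is assumed to be at 0, and the candidate pitches and the 80% threshold are arbitrary choices of mine for illustration.

```python
def on_grid_fraction(coords, pitch, origin=0):
    """Fraction of coordinates that are an exact multiple of pitch from origin."""
    hits = sum(1 for c in coords if (c - origin) % pitch == 0)
    return hits / len(coords)


def best_pitch(coords, candidates, threshold=0.8):
    """Largest candidate pitch that puts at least threshold of the points on-grid."""
    for pitch in sorted(candidates, reverse=True):
        if on_grid_fraction(coords, pitch) >= threshold:
            return pitch
    return None


def off_grid(coords, pitch, origin=0):
    """Coordinates that do not sit on the grid (e.g. manually nudged elements)."""
    return [c for c in coords if (c - origin) % pitch != 0]


# Made-up x-coordinates of diagram elements, in hypothetical integer units.
xs = [0, 250, 500, 750, 1250, 1500, 2000, 2010]
pitch = best_pitch(xs, candidates=[50, 100, 125, 250, 500])
print("pitch:", pitch, "nudged:", off_grid(xs, pitch))
```

Tolerating a minority of off-grid points matters here: requiring every coordinate to fit would let a single nudged element drag the answer down to a meaninglessly fine grid.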

The resulting diagram also perfectly reproduces a quirk of the whitepaper: arrow line segments are actually drawn with a slightly thicker line width than other lines. The specifications for all line widths are identical, so this seems to be a bug in OpenOffice and yet another fingerprint.

Not everything was smooth sailing though. First, Satoshi didn't align everything to the grid; some elements were nudged slightly sideways, either for aesthetic reasons or by accident, and recreating this exactly could be tricky. Luckily, it seems Satoshi just nudged these elements manually with the arrow keys rather than by hand with the mouse, and I soon found the right coordinates.

The next step would not be as easy to match. See, while the diagrams are all drawn with the same formatting and grid size, they're not actually embedded in the whitepaper itself at 100% scale, but scaled down by arbitrary amounts in order to look visually balanced. This scaling was probably done by hand by dragging the resize handles, and this turns out to be very hard to recreate exactly, as the exact resulting horizontal and vertical scale factors depend on the exact order of operations performed. So here began a slow process of manual trial and error, trying to solve a puzzle where I didn't even know the exact internal rules.

In September 2021, after having spent about a month on it all told, my OpenOffice recreation of the whitepaper produced perfectly matching code for all content, except that the scaling factors of two diagrams and one formula were slightly off.

These remaining tiny visual differences are likely correctable; it's just a matter of fiddling until stumbling upon the exact input actions Satoshi performed. Likewise, there are tiny non-content differences in the overall PDF structure that are caused by things like in which order various parts of the whitepaper were created, but most of those too are possible to match up with a bit of additional effort.

At this point however, I feel I've already made the point I wanted to make: based on the available evidence, Satoshi definitely used OpenOffice to author the whitepaper, and contrary to intuition it's definitely doable for a third party to perfectly recreate the source document, so that should not be trusted as evidence of being Satoshi.


Update: With the bit of spare time afforded to me by the holidays I picked up this project again, and already by Christmas Day I was done; I now have an OpenOffice document which perfectly reproduces the PDF code for all nine pages of the whitepaper. I did say it was probably just a matter of fiddling a bit more, but had I known I was just an evening or two away I would have kept going back in 2021!

As for non-visual aspects of the PDF there are still minor differences at the binary level, but these parts are not controlled by the source document itself but are instead influenced by the environment in which OpenOffice is executing (e.g. clock time, installed fonts, user name, etc.), or are outright non-deterministic (like the order fonts are written in). I kept tweaking a bit to see if I could get them closer, but to be clear, these things would also differ if I started with Satoshi's actual OpenOffice document and re-exported another PDF from it, so they have very limited evidentiary value. I therefore now consider this proof of concept to be completed.

Update 2: While I was at it, I decided to recreate the original October 2008 version of the whitepaper as well (a.k.a. the whitepaper draft). It has exactly the same illustrations and formulas so it was mostly a matter of tweaking the text, though it does have a few minor formatting differences too. It also contains a mistake or two compared to the final whitepaper, entirely consistent with being a precursor. (This is in pretty stark contrast to Wright's claimed "drafts", which are usually heavily rewritten with synonyms, not unlike how plagiarists tend to attempt to "rewrite" stolen text.)


The PDFs produced by my recreations are available for download here:

https://wizsec.com/bitcoin_recreation.pdf (outdated)
Size: 184,273 bytes
MD5: b261c968a57a111629b24fa483c30a3e
SHA256: 9370d4a920d41a6f97ede41a01f917ed91b3709af98f1e8910943a0fd2273d67

https://wizsec.com/bitcoin_recreation_20240103.pdf
Size: 184,292 bytes (same as Satoshi's whitepaper)
MD5: 3de75e75565533597e459b522da992e4
SHA256: 4ed2af7234f8927b371c9317c0ea54a4c3f0e70200148024e4b0ea5741b70c43

https://wizsec.com/bitcoin_draft_recreation_20240103.pdf
Size: 183,697 bytes (same as Satoshi's draft whitepaper)
MD5: 5a9256c0319ac87b889cf2b8283abe71
SHA256: 858fe103076819dacc13729fc19183588a35a4908fbd05bb5af3b3b778645fdf
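
If you download one of these files and want to confirm you got the same bytes, a short Python sketch like the following will do. The expected values are the ones listed above for the 2024-01-03 recreation; the local filename in the commented-out usage is an assumption about where you saved the file.

```python
import hashlib


def fingerprint(data):
    """Size and digests of a byte string, in the same form as the listing above."""
    return {
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def matches(data, size, md5, sha256):
    """True only if size and both digests agree with the expected values."""
    expected = {"size": size, "md5": md5.lower(), "sha256": sha256.lower()}
    return fingerprint(data) == expected


# Expected values for bitcoin_recreation_20240103.pdf, as listed above.
EXPECTED = {
    "size": 184292,
    "md5": "3de75e75565533597e459b522da992e4",
    "sha256": "4ed2af7234f8927b371c9317c0ea54a4c3f0e70200148024e4b0ea5741b70c43",
}

# Usage (assumes the file was saved next to this script under this name):
# with open("bitcoin_recreation_20240103.pdf", "rb") as handle:
#     print(matches(handle.read(), **EXPECTED))
```

Note that a matching size alone says very little: the recreation has the same byte count as Satoshi's whitepaper yet still differs from it at the binary level, which is exactly what the hashes are there to make visible.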

Q&A

Couldn't Satoshi have made a customized LaTeX document to look like OpenOffice?
Not really, no. Even though LaTeX is highly configurable and you might be able to get very close in terms of visual appearance, no amount of tweaking is going to make it create perfect OpenOffice-flavored PDFs. Also, what would be the point? Making an OpenOffice document look a bit more "academic" makes sense; the other way around does not. At this point, the evidence is already conclusive enough that the LaTeX-masquerading-as-OpenOffice theory is little more than Last Thursdayism.

Why didn't you publish these findings earlier?
I was hoping I wouldn't ever have to. Even though as a researcher I'm naturally curious, I don't really think Satoshi's actions should be publicly put under a microscope like this, and this is why the bulk of my research is private. I was motivated merely to research a defense against possible future fraudsters, and while Wright is still not particularly competent, it seems he's now employing the exact scam I predicted. I therefore feel I have a responsibility to inform the public on the matter.

Will you publish the OpenOffice document you recreated?
Not currently, no. Even though the file itself is not enough (you still need to recreate the right environment etc.), it still massively lowers the bar in terms of effort required to create practically perfect forgeries, so having it loose in the wild would probably do more harm than good. Educating people so that they don't fall for forgeries like these comes first.

Who would get tricked by a Faketoshi with just a whitepaper?
It may seem laughable but you really shouldn't laugh. Social engineering and psychological manipulation are very real and quite powerful, especially if you underestimate them. Wright in particular employs a very successful combination of lots of casual forgeries (enough to make it seem implausible for them all to be fake), brazen assertiveness and abusive reversals to keep people off-balance. It doesn't take an idiot to fall for someone like Wright, and we're all vulnerable to cognitive biases. So consider Wright the practice round.

So is there some simple rule to weed out Faketoshis?
This is why everyone keeps insisting that anyone who wants to prove themselves as Satoshi will have to cryptographically sign with Satoshi's keys (e.g. the block reward addresses of the Genesis block or block 9). Private keys are literally designed to provide strong proof with mathematical certainty, whereas word processor documents are decidedly not. Even considering the possibility that private keys could be stolen, the simplest and best screening criterion we have is that all Satoshi claimants should be considered fake at least until they provide verifiable cryptographic proof.
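
To show what "verifiable" means here, below is a minimal pure-Python sketch of ECDSA over secp256k1, the curve Bitcoin uses. It is for illustration only and comes with assumptions: the private key and nonce are toy values (nothing to do with any real Bitcoin key), the message is hashed with a single SHA-256 rather than Bitcoin's actual message-signing format, and a fixed nonce like this must never be used for real signatures. It needs Python 3.8+ for modular inverses via pow.

```python
import hashlib

# secp256k1 domain parameters: field prime, group order, generator point.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def add(a, b):
    """Add two curve points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def mul(k, point):
    """Scalar multiplication by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def digest(message):
    """Single SHA-256 as an integer (a simplification of Bitcoin's scheme)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(private_key, message, nonce):
    """Textbook ECDSA signing with a caller-supplied nonce (demo only)."""
    r = mul(nonce, G)[0] % N
    s = pow(nonce, -1, N) * (digest(message) + r * private_key) % N
    return (r, s)


def verify(public_key, message, signature):
    """Textbook ECDSA verification: needs only the public key."""
    r, s = signature
    if not (0 < r < N and 0 < s < N):
        return False
    w = pow(s, -1, N)
    point = add(mul(digest(message) * w % N, G), mul(r * w % N, public_key))
    return point is not None and point[0] % N == r


TOY_PRIVATE_KEY = 0x1F2E3D4C5B6A79880123456789ABCDEF  # toy value, demo only
TOY_NONCE = 0x0123456789ABCDEF0FEDCBA987654321  # fixed nonce: insecure, demo only
public_key = mul(TOY_PRIVATE_KEY, G)
message = b"I am the holder of this key"
signature = sign(TOY_PRIVATE_KEY, message, TOY_NONCE)

print(verify(public_key, message, signature))            # signed message
print(verify(public_key, b"I am Satoshi", signature))    # different message
```

The point of the design is the asymmetry: anyone can run verify with nothing but the public key, yet producing a signature that passes for a new message requires the private key. No amount of document fiddling gets you that.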