March 12, 2024

Whitepaper recreation proof package

In my previous post I described how I recreated the Bitcoin whitepaper as an exercise and an experiment to help contextualize and properly interpret potential claims by fraudsters. Specifically, since the whitepaper can be manually reverse-engineered from the PDF to a matching OpenOffice source document, possession of such source documents is not proof of being or having any relation to Satoshi Nakamoto.

In this post I'll disclose (part of) the recreated source document itself, along with instructions for independent verification of its validity. I realize that by calling it a "proof package" I'm evoking the notorious "Sartre blog post" which Craig Wright tried to pass off as proof he had signed a message with the block 9 key, but you'll find this proof package actually does what it says on the label; no deception involved.

Background

My rationale for recreating the whitepaper was vindicated over the years as Craig Wright's various claimed "drafts" of the whitepaper grew steadily more elaborate, culminating in recent claims that he actually authored the whitepaper in LaTeX with a convoluted mix of converters and plugins (any time there's an inconsistency, he just claims there was another step in his "workflow"). This is a particularly strange hill to die on, since it is quite straightforward to verify that the whitepaper PDF was created with OpenOffice and not with LaTeX.

After I published my findings, some Wright fans started claiming they had validated Wright's LaTeX claims by producing their own recreations that perfectly match the whitepaper, with some accusing me of being wrong and/or lying about my findings.

First of all: hey geniuses, Wright's whole argument is that no one can recreate the whitepaper, so by trying to refute me by claiming to have done so, you're basically shooting at Wright. Second, it's pretty rich for a cult that basically fetishizes pathological mythomania to try to smear other people with accusations of dishonesty.

I hadn't planned on releasing my actual recreation unless it became necessary, since I figured it would do more harm than good: Wright and other fraudsters could copy my approach, or simply the file itself, to create superior forgeries, and I don't want to become an indirect accomplice to tricking more impressionable victims.

However, to dispel any doubts and put the other side's claims into perspective once and for all, I've opted to compromise by releasing part of my recreation: specifically, the first two pages of the whitepaper. This is enough for anyone to independently validate my findings, but is of little help to forgers since it's still a relatively significant effort to recreate the remaining seven pages.

A look at recent claims

A few weeks ago, I gave a heads-up on social media to some of the people claiming to have successful whitepaper recreations (including Wright himself) that they ought to timestamp their documents if they want to be able to later prove whatever progress they made was their own (and not copied from me).

To their credit, at least one person did. They claim their recreation is pixel perfect, but it actually isn't, so obviously the PDF code is different too (plus it's only the abstract and it's not positioned on the page correctly). Still, it's a more capable effort than any of the Wright recreations I've seen, so I fail to see why this person still simps for Wright; is this some kind of public application for a job as chief forging officer?

Others who claim to have LaTeX files perfectly producing the whitepaper are clearly lying and/or trolling; there are several aspects of PDF output that are simply inherent to the renderer used and can't be controlled by the input document, and these all perfectly match OpenOffice 2.4's renderer and are quite distinct from all LaTeX renderers.

During the trial, Wright's barrister made the frankly laughable suggestion that Wright could have rewritten the LaTeX renderer to create OpenOffice-looking output. COPA's expert rightly scoffed at the idea as a huge engineering undertaking even for the most skilled LaTeX experts in the world, among whom Wright is most certainly not counted. More importantly, there is absolutely no reason to do such a thing; it is an absurdly contrived explanation whose sole raison d'ĂȘtre is that Wright refuses to admit he was wrong and has lied his way into a corner.

Other suggestions, like using a LaTeX-to-OpenOffice conversion tool to turn OpenOffice into a kind of renderer for LaTeX, are only slightly less ridiculous, as there's still no reason for such an impractical setup other than to explain away Wright's lies. None of the available converters (including the one Wright himself suggests he may have used) can produce the kind of OpenOffice document required to in turn produce the whitepaper PDF, so this "explanation" merely moves the same problem one step back (i.e. "there's no LaTeX file that will produce the required ODT file").

Enough. These people all talk big, but they have had their chance to substantiate their claims and predictably have not done so. In lieu of putting up, perhaps they'll have the decency to do the idiomatic other thing. Meanwhile, here's my recreation.


Whitepaper recreation proof package


This will walk you through how to set up the required environment, generate a PDF from the provided OpenOffice document, and validate that its content precisely matches the public Bitcoin whitepaper.

Provided files

Required software

You have to generate the PDF on Windows XP SP2 or a reasonably contemporaneous Windows version. You can't use modern Windows or a different operating system, as this results in slightly different typesetting in OpenOffice, preventing a perfect match to the whitepaper. You'll presumably want to use a VM, so this guide will walk you through setting up a VirtualBox VM.

Suggested validation software

Validating the resulting PDF does not need to be done in a VM, and can be done in whatever way you prefer. The suggested method relies on standard command-line tools, which may need to be installed depending on your operating system.

  • A typical terminal window with a unix-like shell
    ‱ Linux and macOS work as-is
    ‱ On Windows, MSYS2 (included with Git for Windows) or Cygwin will work
  • qpdf for inspecting/extracting parts of PDF files

Generating the PDF

  1. Install VirtualBox.

  2. Create a new virtual machine (Machine > New...) and install Windows XP SP2 by choosing the downloaded installation media as the ISO image.

  3. Install VM Guest Additions.

    1. Click Devices > Insert Guest Additions CD Image...
    2. Click Start > My Computer
    3. Double-click VirtualBox Guest Additions (D:)
    4. Go through the installation procedure
    5. Reboot the VM as instructed

  4. On your host computer, create a folder containing the provided files (the whitepaper recreation, the whitepaper fonts, and the OpenOffice installer).

  5. Add a shared folder to the VM pointing to this folder (Devices > Shared Folders) and mount it as the Z:\ drive.

  6. Open the shared folder on the VM (Start > My Computer > Z:) and double-click the OpenOffice installer to run it.

  7. Extract the necessary whitepaper fonts to a new subdirectory.

  8. Copy the Century Schoolbook Bold font into the system fonts directory (Start > Control Panel > Fonts) by drag-dropping it from the extracted folder.

    If you want to get the embedded fonts exactly right you'll also need to overwrite the system's Arial, Courier New and Times New Roman fonts with the provided versions. This requires you to first delete the existing system fonts before copying over the provided fonts. (It's also best to reboot the VM after this to ensure the new fonts are used.)

  9. Head back to the shared folder (Z:) and double-click the bitcoin_recreation_20240224.odt file to open it in OpenOffice. Since you're starting OpenOffice for the first time, you'll be asked to register a name. Leave the box empty.

  10. Click File > Export as PDF... and save the output as Z:\bitcoin_recreation_20240224.pdf (use the default export settings).

  11. At this point, feel free to browse through the document for a bit and confirm that it's all pretty normal stuff. For example:

    • Enable View > Nonprinting Characters and verify that there are no non-standard interword spacings or any other funny "watermark" business going on.
    • Check that the paragraph styles are just normal stuff (save for Satoshi's use of slightly quirky font sizes), with no custom kerning or other styling to try to coerce the output in any weird way.
    • Double-click the illustration on page 2 to verify that it's a standard OpenOffice drawing with no weird styling. Try drawing additional elements and observe that the default appearance matches the whitepaper's style.
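
The shared folder from step 5 can also be set up from the host's command line instead of the VirtualBox GUI. The following is a minimal sketch rather than part of the original procedure: it assumes bash, that VBoxManage is on your PATH, and that the VM is powered off when the folder is added. The helper name `share_folder_with_vm` and the default share name `whitepaper` are hypothetical.

```bash
# Sketch only: register a host folder as a permanent VirtualBox shared folder.
# Assumptions: VBoxManage is on PATH and the VM is powered off.
# The default share name "whitepaper" is a hypothetical placeholder.

# share_folder_with_vm VM_NAME HOST_DIR [SHARE_NAME]
share_folder_with_vm() {
  local vm=$1 name=${3:-whitepaper} dir
  # VirtualBox expects an absolute host path, so resolve it first.
  dir=$(cd "$2" 2>/dev/null && pwd) || {
    echo "error: $2 is not a directory" >&2
    return 1
  }
  VBoxManage sharedfolder add "$vm" --name "$name" --hostpath "$dir"
}

# Then, inside the Windows XP guest, map the share to Z: from a command prompt:
#   net use Z: \\VBOXSVR\whitepaper
```

For example, `share_folder_with_vm "Windows XP" ~/whitepaper` followed by the `net use` line in the guest. Mapping the drive letter explicitly in the guest keeps the share on Z: regardless of any automount behavior.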

Validating the PDF

We'll be comparing the generated PDF to Satoshi Nakamoto's March 2009 version of the Bitcoin whitepaper, available at https://bitcoin.org/bitcoin.pdf (if you live in the UK, where this URL is unavailable due to legal abuse by a certain vexatious litigant, use a mirror). Download it to the same directory containing the other provided files.
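
If you'd rather download from a terminal, a small helper can also catch the case where you receive something other than a PDF (such as a block page). This is a sketch assuming bash and curl; `fetch_pdf` is a hypothetical helper name, and its check only looks at the `%PDF-` header that PDF files begin with.

```bash
# fetch_pdf URL OUTFILE
# Downloads URL to OUTFILE with curl, then checks that the file starts with
# the "%PDF-" header. Returns 1 if the download fails or the header is missing.
fetch_pdf() {
  local url=$1 out=$2
  # -f: fail on HTTP errors, -sS: quiet but still show errors, -L: follow redirects
  curl -fsSL -o "$out" "$url" || return 1
  if [ "$(head -c 5 "$out")" != "%PDF-" ]; then
    echo "warning: $out does not look like a PDF (blocked or redirected?)" >&2
    return 1
  fi
}
```

For example: `fetch_pdf https://bitcoin.org/bitcoin.pdf bitcoin.pdf`, or the same call with the URL of whichever mirror you use.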

The validation process can be done on whatever computer you prefer. In this example I'm using a Windows 11 machine.

  1. On your host machine, you'll find the exported PDF in the directory we set up earlier.
    Open it side by side with Satoshi's whitepaper. You should find it's identical to the first two pages of Satoshi's whitepaper (no matter how closely you zoom in).

  2. Open a terminal and navigate (cd) to the shared folder. The following qpdf command will show that the contents (PDF rendering programs) of pages 1 and 2 of Satoshi's whitepaper are stored in PDF objects 2 and 5:

    qpdf --show-pages bitcoin.pdf

  3. You can inspect the PDF rendering programs directly with the following commands:

    qpdf --show-object=2 --filtered-stream-data bitcoin.pdf
    qpdf --show-object=5 --filtered-stream-data bitcoin.pdf

  4. You can now re-run the same commands with bitcoin_recreation_20240224.pdf instead of bitcoin.pdf and observe that the contents look identical. A quicker way to verify this, though, is to calculate and compare their hash digests:

    qpdf --show-object=2 --filtered-stream-data bitcoin.pdf | sha256sum
    qpdf --show-object=2 --filtered-stream-data bitcoin_recreation_20240224.pdf | sha256sum
    qpdf --show-object=5 --filtered-stream-data bitcoin.pdf | sha256sum
    qpdf --show-object=5 --filtered-stream-data bitcoin_recreation_20240224.pdf | sha256sum


    Or compare all four at once with a short shell loop (type it on a single line):

    for p in 2 5; do for f in bitcoin.pdf bitcoin_recreation_20240224.pdf; do qpdf --show-object=$p --filtered-stream-data $f | sha256sum | head -c -3; echo $f:$p; done; done

    The hash digests for the corresponding page contents should be identical between bitcoin.pdf and bitcoin_recreation_20240224.pdf, proving that the PDF render commands are identical.
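
The per-object hash comparison above can be wrapped in a small function that reports a match or mismatch per object and returns a non-zero status on any difference, which is handy if you want to script the check. This is a sketch assuming bash with qpdf, sha256sum and cut on PATH; the function names are hypothetical. It deliberately treats empty stream data as a failure, since a nonexistent object would otherwise "match" in both files with the digest of empty input.

```bash
# Sketch: compare decoded content streams of selected PDF objects in two files.

# SHA-256 of the empty string; seeing it means qpdf produced no stream data
# (e.g. the object does not exist or is not a stream).
EMPTY_SHA256=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

# stream_digest FILE OBJ: print the hex digest of one object's filtered stream.
stream_digest() {
  qpdf --show-object="$2" --filtered-stream-data "$1" | sha256sum | cut -d' ' -f1
}

# compare_streams REFERENCE CANDIDATE OBJ...
# Prints one line per object; returns 1 if any object differs or is empty.
compare_streams() {
  local ref=$1 cand=$2 status=0 obj a b
  shift 2
  for obj in "$@"; do
    a=$(stream_digest "$ref" "$obj")
    b=$(stream_digest "$cand" "$obj")
    if [ "$a" = "$EMPTY_SHA256" ] || [ "$b" = "$EMPTY_SHA256" ]; then
      echo "object $obj: no stream data in one of the files"
      status=1
    elif [ "$a" = "$b" ]; then
      echo "object $obj: match $a"
    else
      echo "object $obj: MISMATCH $a vs $b"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `compare_streams bitcoin.pdf bitcoin_recreation_20240224.pdf 2 5`, where 2 and 5 are the page content objects reported by `qpdf --show-pages`.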

Further explanation

PDF files are collections of render programs written in the PDF language, one program per page. These programs, together with any embedded resources like fonts or images, completely control and define the "contents" of a PDF. If two programs are identical and refer to identical resources, they by definition produce exactly the same output on the page.

In this truncated recreation, the embedded fonts are generally not identical to those of the public whitepaper. This is because the OpenOffice PDF renderer embeds only the subsets of font glyphs that are actually used in the document, so glyphs that are only used on pages 3-9 are "missing" from this recreated PDF. In my full recreation, the font embeddings are identical to those of the public whitepaper.

In case you're skeptical, note that even this truncated version contains the same set of glyphs from the Times New Roman Bold font as the full whitepaper (namely those of the string "Abstract."). This results in an embedded subset font with a SHA256 hash digest of 606daa3077a16a05d9c5ae6e95b6e674c6717a7683d8dc3e201525cbe8199761, present in the public whitepaper as PDF object 59.

If you inspect the exported PDF you'll find an object with the same hash digest, i.e. the exact same subset font (provided you replaced the system font with the appropriate font version). I can't point you to the exact object number because OpenOffice's font embedding order isn't entirely deterministic, so it can be any of the objects. When I performed this procedure, it happened to be object 13.

for p in {1..31}; do printf "%2d: " "$p"; qpdf --show-object=$p --filtered-stream-data bitcoin_recreation_20240224.pdf | sha256sum; done
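
Rather than scanning that loop's output by eye, you can ask directly for the object numbers whose digest equals the one quoted above. This is a sketch assuming bash, qpdf, sha256sum and cut; the helper name is hypothetical, and the default upper bound of 31 simply mirrors the loop above.

```bash
# find_objects_by_digest FILE DIGEST [LAST]
# Prints each object number in 1..LAST (default 31) whose filtered stream data
# has the given SHA-256 digest. Returns 1 if no object matches.
find_objects_by_digest() {
  local file=$1 want=$2 last=${3:-31} obj got found=1
  for ((obj = 1; obj <= last; obj++)); do
    # Non-stream objects make qpdf print an error; that is expected, so hide it.
    got=$(qpdf --show-object="$obj" --filtered-stream-data "$file" 2>/dev/null |
          sha256sum | cut -d' ' -f1)
    if [ "$got" = "$want" ]; then
      echo "$obj"
      found=0
    fi
  done
  return "$found"
}
```

For the recreation, run `find_objects_by_digest bitcoin_recreation_20240224.pdf 606daa3077a16a05d9c5ae6e95b6e674c6717a7683d8dc3e201525cbe8199761`. For bitcoin.pdf, pass a bound of at least 59 as the third argument, since that is the object number given above for the public whitepaper.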

In other words, as you approach a full recreation of the entire public whitepaper, the embedded fonts also become identical to the public whitepaper. This is observable in the full recreation PDF I published.

Note that it is not practically feasible to generate a PDF from OpenOffice that is a byte-for-byte match to the entire public whitepaper PDF. This is partly due to non-deterministic aspects like font embedding order, but also because the embedded document ID is generated on the fly from the current execution environment upon export, including the current time with millisecond precision as well as a random temporary filename.
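
To see the document ID for yourself, you can print the /ID entry of each file's trailer; going by the explanation above, two exports of the same ODT should carry different IDs even though their page streams match. This is a sketch: it relies on qpdf's `--show-object=trailer` and assumes qpdf prints the trailer dictionary with the /ID array on a single line, which the grep pattern depends on. The helper name is hypothetical.

```bash
# pdf_doc_id FILE
# Prints the /ID entry of FILE's trailer dictionary, e.g. "/ID [ <...> <...> ]".
# Assumes qpdf's printed trailer contains "/ID [ ... ]" on one line.
pdf_doc_id() {
  qpdf --show-object=trailer "$1" | grep -o '/ID \[[^]]*]'
}
```

Running `pdf_doc_id bitcoin.pdf` and `pdf_doc_id bitcoin_recreation_20240224.pdf` side by side shows one of the parts that legitimately differs between otherwise matching exports.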

These are neither practical nor particularly meaningful to reproduce, as even Satoshi would get different values if they re-exported the PDF again from the same source document. Therefore, it is fair to say that a generated PDF that is identical to the public whitepaper save for these non-deterministic parts is for all meaningful intents and purposes a perfect recreation.

It is also worth pointing out that a perfect recreation is by no means guaranteed to be unique; you can easily achieve the same output from two different inputs by altering parts of the document that don't affect the output. If someone else made a perfect recreation from scratch it would almost certainly be different from my recreation. And if someone makes a recreation by copying mine, we'll probably be able to tell.

For context, here's a summary of my recreation (or rather, the full non-truncated version) according to how well its PDF output lines up with the whitepaper, and a comparison to Craig Wright's LaTeX recreation (based on what's become known from the ongoing trial):

Comparison to public whitepaper | OpenOffice | CSW LaTeX
--- | --- | ---
Identical metadata | No | No¹
Matching metadata except for timestamps/identifiers | Yes | No¹
Structurally identical (PDF tree topology) | Yes* | No²
Matching PDF magic bytes (renderer specific) | Yes | No²
Matching PDF program style (renderer specific) | Yes | No²
Identical PDF programs | Yes | No³
Visually identical | Yes | No³
Visually similar | Yes | Yes
"Fudge free" (does it render the correct output reasonably "out of the box" with minimum tweaks and adjustments?) | Yes | No⁎

*Unpredictable font object IDs
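
The "magic bytes" row refers to the very start of the file: the `%PDF-` version header and the binary-marker comment line that PDF writers typically place right after it, whose exact bytes differ between producers. A quick way to compare them is to hex-dump the first few bytes of both files. This is a sketch assuming bash, `head -c` (GNU or BSD) and POSIX `od`; the helper names and the default of 16 bytes are hypothetical choices.

```bash
# header_bytes FILE [N]: hex dump of the first N bytes of FILE (default 16).
header_bytes() {
  head -c "${2:-16}" "$1" | od -An -tx1
}

# same_header A B [N]: succeeds if both files are readable and share the
# same first N bytes.
same_header() {
  local n=${3:-16}
  [ -r "$1" ] && [ -r "$2" ] &&
    [ "$(header_bytes "$1" "$n")" = "$(header_bytes "$2" "$n")" ]
}
```

For example, `same_header bitcoin.pdf bitcoin_recreation_20240224.pdf && echo "headers match"`, with `header_bytes` on each file if you want to see the bytes themselves.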

Conclusion

As demonstrated by even this truncated recreation, the whitepaper PDF's page content can be, and here is, perfectly produced by a fairly straightforward OpenOffice document. There is no need for conspicuous styling or spacing adjustments to unnaturally coerce the output. There is no need for special software; just contemporaneous versions of Windows and OpenOffice.

The whitepaper is not some finely crafted relic with hidden secrets waiting to be revealed. It is a plain and uncomplicated explainer created with commonplace software, with Satoshi just mildly tweaking the formatting to be evocative of academic papers.

To be clear, the page content of my recreation is not only visually identical when printed out or when digitally inspected at arbitrarily high zoom levels, but since the PDF rendering instructions and their dependencies are a match to the public whitepaper, the page content is by definition identical.

Let's cut to the chase and return to why we're even discussing how the whitepaper was made in the first place, as we're slowly learning the full outrageousness of what Craig Wright tried to pull in the COPA case.

After getting all of his evidence debunked as forgeries for the umpteenth time, Craig Wright suddenly "found" — as he is wont to do — a batch of new evidence (literally). Never mind all those earlier whitepaper "drafts" he had previously submitted in various formats, because you see the whitepaper was actually authored in LaTeX, and he suddenly had a bunch of never-seen-before files that he claimed perfectly produce the whitepaper PDF, and submitted them as the new lynchpin evidence of his entire case.

As would later be discovered, he had actually just run the whitepaper PDF through publicly available PDF-to-LaTeX conversion tools and then manually massaged the output until it sufficiently approximated the public whitepaper (unbeknownst to Wright, the editing platform he used logged him doing so practically keystroke by keystroke, resulting in the judge being shown a literal video of his forgery in progress). Wright then swore in a witness statement that any such reverse-engineering (i.e. what he had just done) was practically impossible, and that his possession of such files could therefore only be explained by him being the author, i.e. Satoshi Nakamoto.

Can we just take a moment to reflect on the sheer outrageousness of Wright's mentality? To say he is dishonest is like complaining that a cat is bad at filing taxes, in that it is our underlying expectation that is in error. Wright isn't so much lying as he is entirely unfettered by the concept of honesty in the first place, and it is by failing to recognize this lack of shared norms that people fall prey to the manipulations of Wright and others like him.

As far as court cases go, this is a practically unheard of level of continuing, unabashed and unrepentant fraud, by and for the sake of a malignant and pathologically narcissistic mythomaniac's fragile ego but additionally in service of a sinister, well-funded and long-running scheme to harass and torment innocent people. In the UK, "perverting the course of justice" is a serious crime carrying serious penalties, with people having been convicted for far lesser offenses than Wright has demonstrably and repeatedly committed.

Coming back to Wright's argument, though: if we accept his suggested logic of "whosoever holds this whitepaper, if he be able to reproduce it accurately, shall possess the power of Satoshi", then all it would prove is that I, not Craig Wright, am Satoshi Nakamoto, since I hold the superior whitepaper document. (And Craig Wright is therefore a liar.)

I am however not Satoshi Nakamoto, and thus the suggested inference must be false: possession of a source document that generates the Bitcoin whitepaper, no matter how perfectly, is not evidence that someone is Satoshi, since demonstrably anyone can create such a document with some effort.

(And Craig Wright remains a liar.)


Footnotes

  1. Wright's version was referred to as having a misformatted timestamp missing a time zone.
  2. Wright repeatedly stated his version was "compiled on Overleaf", and before that "with MiKTeX"; those renderers produce distinctly different PDF structures, magic bytes, and styles of PDF rendering programs compared to OpenOffice and the public whitepaper.
  3. As is plainly visible in the forgery demonstration video, Wright's version is not actually visually identical to the public whitepaper (with red pixels indicating mismatches). By definition this also means the PDF programs cannot be identical.
  4. It was repeatedly mentioned in court that Wright's version is full of manual spacing adjustments using LaTeX commands like \; and \spaceskip. This equates to a large amount of "fudging" needed to coerce LaTeX's rendering output to approximate the whitepaper, and would result in very awkward and unnatural LaTeX code.