We have made a public, read-only archive of the two Ubuntu wikis, in preparation for their upcoming deprecation.
Finding the archives
You can now access, read, and clone the wiki archives on GitHub, in the ubuntu/wiki-archives repository.
The archive includes pages from wiki.ubuntu.com and help.ubuntu.com/community/CommunityHelpWiki, with wiki pages organized into alphanumeric folders.
Searching the archives
You can find pages quickly using GitHub's built-in search:

Note
We have made every effort to retain all non-empty pages.
If any legitimate page is missing, it is not intentional, and we will try to restore it.
Reading the archives
The files have been converted to MediaWiki syntax, which renders nicely on GitHub.
The result is generally a clean, well-formatted page:
Downloading the archives
We reduced the size of the archive from about 60 GB to 0.34 GB.
Cloning the wiki-archives repo to my local machine takes about 13 seconds:
time git clone git@github.com:ubuntu/wiki-archives.git
...
...
Receiving objects: 100% (124534/124534), 150.47 MiB | 24.88 MiB/s, done.
...
Updating files: 100% (50072/50072), done.
________________________________________________________
Executed in 12.96 secs fish external
Worried that content was lost because of the reduction in size?
Read on.
We have provided tarballs that contain images, attachments, and multiple page versions. Removing these was part of how we achieved the size reduction, which was necessary for making a plaintext archive that was easy to search and download.
Searching locally
Fuzzy-searching through the cloned wikis in an editor is fast:
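If you prefer a script to an editor, a plain case-insensitive substring search over the clone can be sketched as below. This is not fuzzy matching, and the function name `find_pages` and the idea of skipping the `.git` directory are illustrative assumptions, not part of the archive's tooling.

```python
from pathlib import Path
from typing import Iterator


def find_pages(root: Path, term: str) -> Iterator[Path]:
    """Yield files under root whose text contains term, ignoring case."""
    needle = term.casefold()
    for path in sorted(root.rglob("*")):
        # Skip Git's own metadata and anything that is not a regular file.
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        # Some archived pages may not be valid UTF-8, so undecodable
        # bytes are dropped rather than raising an error.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.casefold():
            yield path
```

Calling `find_pages(Path("wiki-archives"), "some term")` from the directory that contains the clone would list every matching page file.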

Getting tarballs
Tarballs are included in the Releases page of the GitHub repo:
These might be useful if you:
- Want wiki pages in the original MoinMoin syntax
- Need images and attachments from the original wiki
- Require individual page versions
Although the tarballs contain images and attachments, they are still significantly smaller than the original backups of the wikis, as files including user information, caches, and other data have been removed.
Purpose of the Ubuntu wiki archives
Preserving Ubuntu history
The Ubuntu wiki has existed almost as long as Ubuntu itself. For over 20 years, members of the community and Ubuntu developers have contributed to it.
The archives will serve to preserve this important part of Ubuntu's history and the contributions made by people through the years.
Supporting future migrations
Many of the wiki pages that received the highest traffic have already been migrated to new homes, such as the Ubuntu Project documentation.
Still, there are likely pages that are viewed infrequently but that could still be important when an individual or team has a particular issue.
If you are worried that a page has been lost, you can find it in the wiki archive.
If you are worried that you missed an opportunity to migrate your team's content, you can find it in the wiki archive.
How the wikis were archived
Our goal was to make the archive available, searchable, and cloneable as a GitHub repository.
Challenges with the wiki sources
The combined size of both wikis was enormous.
We could have simply made tarballs of the raw backups available. From my own experience, however, downloading, extracting and even just deleting the full backups was demanding:

The backups were also difficult to navigate, given the complexity of the file system, the existence of multiple page versions, and the URL-encoding of page names.
For anyone wishing to find a page, the process would be slow and difficult.
Reducing the size
To reduce the size, we removed every image, attachment, and cache.
In addition, we deleted every version of a page except the latest.
Lastly, we reduced the number of spam pages, some of which had existed for years.
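As a rough illustration of keeping only the latest version of a page, the sketch below picks the highest-numbered file from a page's versions directory. It assumes a layout in which each page directory holds a `revisions` subdirectory of zero-padded, numerically named files; that directory name, the naming scheme, and the helper `latest_revision` are assumptions made for this example, not a description of the script we actually ran.

```python
from pathlib import Path
from typing import Optional


def latest_revision(page_dir: Path) -> Optional[Path]:
    """Return the highest-numbered revision file of a page, if any."""
    # Assumed layout: <page_dir>/revisions/00000001, 00000002, ...
    revisions_dir = page_dir / "revisions"
    if not revisions_dir.is_dir():
        return None
    numbered = [
        p for p in revisions_dir.iterdir()
        if p.is_file() and p.name.isdigit()
    ]
    # Compare as integers so the result does not depend on padding width.
    return max(numbered, key=lambda p: int(p.name), default=None)
```

Every other file in the versions directory could then be dropped, leaving one plain-text file per page.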
Cleaning up the files
Each wiki had a pages directory containing one subdirectory per wiki page, each of which in turn contained a subdirectory holding the different versions of that page.
The directories named after the pages were almost unreadable due to some type of URL encoding.
We removed the encoding and simplified the directory tree: each wiki now consists of an alphanumeric list of folders, each containing individual files named after the wiki page itself.
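The decoding and regrouping steps can be sketched as follows. This assumes the directory names used the convention in which runs of non-alphanumeric bytes are written as hexadecimal inside parentheses, for example `(2f)` for a slash; the exact encoding in the backups is not specified here, so treat that as an assumption. The helpers `decode_page_dir` and `bucket_for`, and the `_other` folder for names that do not start with an ASCII letter or digit, are hypothetical.

```python
import re

# Assumed on-disk form: "Foo(2f)Bar" stands for the page name "Foo/Bar".
_HEX_RUN = re.compile(r"\(((?:[0-9a-fA-F]{2})+)\)")


def decode_page_dir(name: str) -> str:
    """Replace each parenthesised hex run with the UTF-8 text it encodes."""
    return _HEX_RUN.sub(
        lambda m: bytes.fromhex(m.group(1)).decode("utf-8", errors="replace"),
        name,
    )


def bucket_for(page_name: str) -> str:
    """Choose a top-level folder from the first character of the page name."""
    first = page_name[:1].upper()
    if first.isascii() and first.isalnum():
        return first
    return "_other"  # hypothetical catch-all folder
```

With these assumptions, a directory named `Foo(2f)Bar` decodes to the page name `Foo/Bar` and is filed under `F`.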
Converting the syntax
We decided to convert the original MoinMoin syntax to MediaWiki syntax, for the following reasons:
- It will make it easier to migrate content to the new MediaWiki-based Ubuntu wiki, as people won't need to do a full syntax conversion each time
- It will make it easier to read the wiki pages on GitHub, which supports MediaWiki syntax but not MoinMoin syntax
The syntax conversion is not perfect
There are some inconsistencies, partly because the original files themselves didn't use MoinMoin syntax consistently. Inter-page linking also does not work in the archive. However, we do not intend the archives to be a perfect reading experience or a functioning wiki. Above all, we wanted the archives to support people who want to find and migrate content.
Acknowledgements
I want to thank @marek-suchanek for contributions and discussions about the archiving project, @rkratky for initial advice on different approaches to archiving content, @nickbellol for first highlighting the need to find an archiving solution, and Canonical's IS team for providing the wiki backups.


