Wget
Wget is a program you can use to download web pages. But it can also read the pages it downloads and then download the pages that first page links to, then the pages those pages link to, and so on. The only real drawback is that it doesn't have a GUI -- you have to type in all the commands. But once you get the hang of that it's usually easier and quicker than clicking on things and dragging other things about.
If your wiki has a password, you can even use wget to log into it first and then download the wiki.
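In outline, the pattern is two wget runs: one that POSTs your credentials and saves the session cookie, and a second that loads that cookie and downloads recursively. (This is just a generic sketch -- the URL and form field names below are made-up placeholders; the real pbworks versions are given further down.)

```shell
# Step 1: log in and save the session cookie. The field names "user" and
# "password" are hypothetical -- inspect your wiki's login form for the real ones.
wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data="user=backup&password=secret" \
     https://wiki.example.com/login

# Step 2: download recursively, sending the saved cookie with every request.
wget --load-cookies cookies.txt -r --convert-links \
     https://wiki.example.com/
```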
Just for a laugh, here are the directions I wrote when I was looking after a wiki on pbwiki (now pbworks), showing how to download an entire wiki (a pretend example, called "*example*") from pbworks:
Backup instructions:
These instructions will create a copy of pretty much the entire wiki which will run on your computer. Everything goes into a folder named after the wiki -- it will have hundreds of files in it; scroll down to "index.html", click it, and you'll be using your local copy of the wiki.
wget is run from the command prompt (Start -- All Programs -- Accessories -- Command Prompt). You type each command in as a single line and press enter, and it will do its job without any further input from you. As it works, the command prompt window will tell you what's happening -- this will not be very interesting to watch, but if it fails you'll probably be able to work out what's gone wrong. The paths in the commands below assume that you have installed wget with the defaults from the sourceforge page above. If you're using Cygwin or another form of wget you probably know what to do.
The command to do the downloading will probably take a long time (as in an hour or so -- with a 3-second wait between requests, a wiki of around a thousand files, counting revisions, works out at roughly an hour). This is deliberate, so as not to annoy the computers hosting the wikis. (It mostly takes so long because I can't work out how to fine-tune it to download only the current version of each page rather than every single version in its history.)
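(One possible way around the every-version problem: wget versions 1.14 and newer have a --reject-regex option that skips URLs matching a pattern. The pattern below is a guess -- check what your wiki's revision links actually look like before relying on it.)

```shell
# Hypothetical: skip any URL whose query string mentions a revision.
# Adjust the regex to match the revision links your wiki actually uses.
wget -r --reject-regex ".*rev=.*" http://*example*.pbworks.com
```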
When the wiki does not require a password:
If the wiki doesn't require a password to read, for example http://*example*.pbworks.com, things are easier and it can be downloaded with a single command. This assumes there are no uploaded documents; if there are, create a Backup page containing a link to each file and alter the URL in the command accordingly.
"C:\Program Files\GnuWin32\bin\wget.exe" --no-check-certificate -r --wait=3 --random-wait -e robots=off --reject=php --exclude-directories /session/,/user/ --convert-links --directory-prefix=c: http://*example*.pbworks.com
When the wiki requires a password:
(Prerequisite) To get this to work you'll need to set up a student account called "backup", with the password "backup_password", which only has reader permissions for the wiki. I did this to keep a clear distinction between normal logins and the backup login, and so that nothing can accidentally happen by running the backup with a permission level that's allowed to edit or delete things.
wget will need to be run twice --
1 -- To log in to the wiki as backup and save the appropriate cookies
2 -- To download all the files on this page, as well as all the wiki pages which can be reached through the links on the normal pages and the sidebar. wget is unable to follow the links within the "files and pages" tab on the top right, which is why they've been copied onto this Backup page prior to backing up the wiki.
The first command, to get the appropriate cookies, is:
"C:\Program Files\GnuWin32\bin\wget.exe" --no-check-certificate --save-cookies cookies.txt --keep-session-cookies --post-data="return=http://*example*.pbworks.com/FrontPage&u_email=backup&u_password=backup_password&u_remember=checked&wiki=*example*&redir=note&submit_submit=Log in" https://my.pbworks.com/
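Before starting the hour-long download it can be worth checking that the login actually worked. A quick way (a sketch -- the string to search for is a guess; check what your wiki shows when you're logged in) is to fetch a single page with the saved cookies and look for some sign of being logged in:

```shell
# Fetch one page using the saved session cookie and save it as test.html,
# then search it for the backup account's name as a sign the login worked.
"C:\Program Files\GnuWin32\bin\wget.exe" --no-check-certificate --load-cookies cookies.txt -O test.html http://*example*.pbworks.com/FrontPage
findstr "backup" test.html
```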
The command to download the wiki is as follows:
"C:\Program Files\GnuWin32\bin\wget.exe" --no-check-certificate --convert-links --directory-prefix=c: --load-cookies cookies.txt -r --wait=3 --random-wait -e robots=off --reject=php --exclude-directories /session/,/user/ http://*example*.pbworks.com/Backup
Explanation of commands:
The first command logs in to the wiki and saves the session cookie on your computer in a file called "cookies.txt". Its options mean as follows:
--no-check-certificate : don't check the site's security certificates. We know the site is okay, and the certificates it offers cause trouble for wget
--save-cookies cookies.txt --keep-session-cookies : save the cookies (including the session cookie) in a file called "cookies.txt" so they can be used again
--post-data="return=http://*example*.pbworks.com/FrontPage&u_email=backup&u_password=backup_password&u_remember=checked&wiki=*example*&redir=note&submit_submit=Log in" https://my.pbworks.com/
This sends the information the server needs to log you in. The field names were worked out using the Web Developer addon for Firefox. The final https://my.pbworks.com/ is the address the login form submits to.
The second command downloads pretty much the entire wiki (including all past revisions -- sorry, I couldn't work out how to stop it doing that). It will take a long time as it includes the instruction to wait a few seconds between each file request, so as not to annoy the nice people hosting our wiki by bashing their server. Its options mean as follows:
--convert-links : changes all the links in the downloaded files to make them refer to the other downloaded files. If a particular file isn't downloaded, its link stays as it is.
--load-cookies cookies.txt : loads the cookies file created by the first command
-r : downloads recursively, i.e. follows links. wget won't go outside of *example*.pbworks.com, so you don't have to worry about it downloading the whole internet
--wait=3 : waits for 3 seconds between each request. This is considered polite behaviour when doing this type of thing, so as not to annoy the computer hosting the site. You can make the wait shorter if you're in a hurry, but it's probably not really that urgent
--random-wait : varies the delay so the wait isn't exactly 3 seconds each time
-e robots=off : makes wget ignore the robots.txt file; otherwise it will only download a single page
--reject=php : don't download any php files. This avoids creating new pages while trying to download the old ones -- which shouldn't happen anyway, because the backup login doesn't have edit permissions, but just in case. (Adding xml to the list, i.e. --reject=php,xml, also avoids downloading the RSS feed.)
--exclude-directories /session/,/user/ : exclude those two directories (mainly because visiting session/logout logs you out, at which point the download will stop)
--directory-prefix=c: : makes all the files download to a folder located at c:. This just makes it easier to transfer between computers -- the default would include all the path bits to wget
************
Those commands are mostly so complicated because of the logging in -- the information you have to send to log in is a bit tricky to work out and then send. But remember that at the end of this you'll have a complete copy of the wiki on your local disk. You might never need to look at it again, but at least you'll now have the option, and it won't disappear if the company hosting the wiki should happen to.