Using wget to mirror websites

On rare occasions, I find a website worth saving. Sometimes it's old and may not be around forever. I just want a local copy, right?

Thanks to the massive help that is search engines, I found a handy program called wget that's installed on just about every Linux box! It's, like, the coolest thing of all time. Cough.

Say you want to download all 125 blog posts of MGL. Script it, baby:

wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains minimalinux.blogspot.com \
--no-parent \
minimalinux.blogspot.com
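
For the record, and paraphrasing the wget man page, here's roughly what each of those switches does:

# --recursive                     follow links and grab the whole site
# --no-clobber                    skip files that have already been downloaded
# --page-requisites               pull in the images, CSS and whatnot each page needs
# --html-extension                save pages with a .html suffix
# --convert-links                 rewrite links so the local copy browses fine offline
# --restrict-file-names=windows   avoid characters Windows filenames can't handle
# --domains minimalinux.blogspot.com   don't wander off to other domains
# --no-parent                     never climb above the starting directory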


The --domains option is what limits it to a single domain. I'm hoping I can limit it even further, or learn enough bash to do the limiting myself via scripting; a couple of untested guesses follow below. Either way, it's a nifty trick. I managed to get all of Old Man Murray in a minute (including those cute little static image advertisements!), and made a local copy of MGL in about twenty seconds. Handy for backing up websites that may not be around forever, for reading on an e-book reader, or just for something to do when the internet's out.
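
If I ever get around to narrowing it, my guess is that wget's own switches can do most of it without any bash at all. Something like this (untested, so check the man page before trusting me; the /2010 path is just my assumption about how Blogspot lays out its archive URLs):

# Only follow links two levels deep instead of crawling the whole site:
wget --recursive --level=2 --no-parent --page-requisites \
--convert-links minimalinux.blogspot.com

# Or only descend into one year's archive directory
# (/2010 is an assumption about Blogspot's layout):
wget --recursive --no-parent --convert-links \
--include-directories=/2010 minimalinux.blogspot.com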

The source for this was Linux Journal. They have more info about the various options, so go give them a visit. Five ad bugs, though.
