
The Great Blog Migration

richard edited this page Jun 3, 2017 · 5 revisions

Migration In Detail

1 - get old blog

The old blog is a Movable Type install. I found that the following wget command will crawl through the site and bring down all the blog entries, thumbnails, and JPGs.

wget -p -P rcb --convert-links -m -nH http://www.richardcampbell.com/blog/

So now I have all the files safe.

Notes

  • 199 main HTML blog entries
  • main entries are named 000NNN.html
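
A quick way to double-check the count above is to list the downloaded files whose names match the entry pattern. This is a minimal sketch, assuming `000NNN.html` means exactly six digits and that the entries sit somewhere under the `rcb` directory created by the wget command; `main_entries` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Assumption: a "main entry" is a file named with exactly six digits, e.g. 000123.html.
ENTRY_RE = re.compile(r"^\d{6}\.html$")


def main_entries(root):
    """Return the sorted paths of all main blog entries found under root."""
    return sorted(p for p in Path(root).rglob("*.html") if ENTRY_RE.match(p.name))


if __name__ == "__main__":
    entries = main_entries("rcb")
    print(len(entries), "main entries found")
```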

2 - get new file names

Each existing MT html file has some boilerplate header, scripts, and stuff before the blog entry itself. There's a "title" H3 element and also a "posted by" element with the date of the post.

As Jekyll requires the posts to have a YYYY-MM-DD-title-text.html format, I need to change 000NNN.html to the new format.

I use good old awk to find the H3 element and the "posted by" element and build a new filename from them. One tweak: titles occasionally contain special characters ('/',"'",'*','#','"','!','?','&'). I don't believe the title part of a Jekyll file name is important beyond making the file unique, so I'll just delete those characters.
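
The renaming step can be sketched in Python instead of awk. This is not the original script, only an illustration under stated assumptions: the title is the first H3 in the file, and the "posted by" text looks like `Posted by NAME at June 3, 2005 ...`. The real Movable Type template may differ, so both regular expressions are guesses that would need adjusting. `jekyll_filename` is a hypothetical helper name.

```python
import re
from datetime import datetime

# The special characters listed above; they are deleted from the title.
SPECIAL_CHARS = "/'*#\"!?&"

# Assumed markup. The real templates may use different tags or wording.
TITLE_RE = re.compile(r"<h3[^>]*>(.*?)</h3>", re.IGNORECASE | re.DOTALL)
POSTED_RE = re.compile(r"Posted by .*? (?:at|on) (\w+ \d{1,2}, \d{4})", re.IGNORECASE)


def jekyll_filename(html, ext="html"):
    """Build a YYYY-MM-DD-title-text.<ext> name from one old entry's HTML."""
    title_match = TITLE_RE.search(html)
    posted_match = POSTED_RE.search(html)
    if not title_match or not posted_match:
        raise ValueError("could not find the title or the posted-by date")

    # Drop any inline tags inside the H3, then delete the special characters.
    title = re.sub(r"<[^>]+>", "", title_match.group(1))
    title = title.translate(str.maketrans("", "", SPECIAL_CHARS))
    slug = "-".join(title.lower().split())

    # Assumed date format: full month name, day, four-digit year.
    date = datetime.strptime(posted_match.group(1), "%B %d, %Y")
    return f"{date:%Y-%m-%d}-{slug}.{ext}"
```

For example, an entry containing `<h3>What's New? C# & More!</h3>` and `Posted by richard at June 3, 2005 10:15 AM` would be renamed to `2005-06-03-whats-new-c-more.html`.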

3 - create first draft of markdown post

There's a well-known html2text.py Python script that produces good-enough Markdown.
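
A first-draft conversion might look roughly like the sketch below. It assumes the pip-installable `html2text` package rather than the standalone script, and it prepends a minimal Jekyll front matter block. The `post` layout name and the helper names `front_matter` and `to_markdown_post` are illustrative assumptions, not part of the original migration.

```python
def front_matter(title, date, layout="post"):
    """Build a minimal Jekyll front matter block (the layout name is an assumption)."""
    # Escape backslashes and double quotes so the title stays a valid YAML string.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return f'---\nlayout: {layout}\ntitle: "{safe_title}"\ndate: {date}\n---\n\n'


def to_markdown_post(entry_html, title, date):
    """Convert one entry's HTML to Markdown and prepend the front matter."""
    import html2text  # third-party package: pip install html2text

    converter = html2text.HTML2Text()
    converter.body_width = 0  # keep paragraphs on one line instead of hard-wrapping
    return front_matter(title, date) + converter.handle(entry_html)
```

The import sits inside `to_markdown_post` so that the front matter helper can be used and checked without the third-party package installed.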

new jekyll format

4 - draft 2 - fix image URLs

5 - draft 3 - fix categories
