The Great Blog Migration
The old blog is a Movable Type install. I found that the following wget command will crawl through the site and bring down all the blog entries, thumbnails, and JPGs.
wget -p -P rcb --convert-links -m -nH http://www.richardcampbell.com/blog/
So now I have all the files safe.
- 199 main HTML blog entries
- main entries are named 000NNN.html
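As a rough sketch, the main entries can be picked out of the mirrored tree by name. This assumes "000NNN.html" means a six-digit, zero-padded number; `find_entries` is a hypothetical helper, not part of the original workflow, and it searches recursively because the exact subdirectory wget wrote the entries to is not stated here.

```python
import re
from pathlib import Path

# Assumption: "000NNN.html" means a six-digit, zero-padded entry number.
ENTRY_NAME = re.compile(r"\d{6}\.html")


def find_entries(root: str) -> list[Path]:
    """Return every file under root whose name looks like 000NNN.html, sorted by name."""
    return sorted(
        (p for p in Path(root).rglob("*.html") if ENTRY_NAME.fullmatch(p.name)),
        key=lambda p: p.name,
    )
```

Calling `find_entries("rcb")` on the directory passed to wget's `-P` option should then list the main entries and skip index and archive pages.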
Each existing MT HTML file has some boilerplate header, scripts, and other stuff before the blog entry itself. There's a "title" H3 element and also a "posted by" element with the date of the post.
As Jekyll requires the posts to have a YYYY-MM-DD-title-text.html format, I need to change 000NNN.html to the new format.
I use good old awk to find the H3 element and the "posted by" element and make a new filename. Tweak: titles occasionally contain special characters ('/', "'", '*', '#', '"', '!', '?', '&') that need to be dealt with. I don't believe the Jekyll file name is important; the title part is just used to make the file unique, so I'll just delete those characters.
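The renaming step can be sketched in Python rather than awk. This is an illustration, not the script actually used: `extract_title` and `jekyll_filename` are hypothetical names, the H3 match assumes the title is the first H3 on the page, and the date is passed in as a `datetime.date` because the format of the "posted by" element is not shown here. Lowercasing and hyphenating the title are also my assumptions.

```python
import datetime
import html
import re

# The characters listed above that get deleted from titles.
SPECIAL_CHARS = "/'*#\"!?&"


def extract_title(page: str) -> str:
    """Return the text of the first H3 element, without nested tags or entities."""
    match = re.search(r"<h3[^>]*>(.*?)</h3>", page, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no H3 title element found")
    text = re.sub(r"<[^>]+>", "", match.group(1))  # drop nested tags such as links
    return html.unescape(text).strip()


def jekyll_filename(title: str, posted: datetime.date, ext: str = "html") -> str:
    """Build a YYYY-MM-DD-title-text.ext filename from a post title and date."""
    # Delete the troublesome characters outright.
    cleaned = title.translate(str.maketrans("", "", SPECIAL_CHARS))
    # Collapse runs of whitespace into single hyphens and lowercase the result.
    slug = re.sub(r"\s+", "-", cleaned.strip()).lower()
    return f"{posted:%Y-%m-%d}-{slug}.{ext}"
```

For example, a title of `What's New? C# & More!` posted on 2009-03-14 becomes `2009-03-14-whats-new-c-more.html`.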
There's a well-known html2text.py Python script that produces good-enough Markdown from the HTML.
New Jekyll format