Web log back in action, Markdown, html2markdown

2005 January 22
by darkness

I’m back up with my (new) web log. But first, let me share with you what I saw this morning when I went for my breakfast bagel:

Bagels in the blender

I have no idea. (Though I did move the bagels a bit to emphasize the fact that my bagels were found in the blender, and not put there by me - to my knowledge.)

So, anyway. I stopped updating my web log for so long because I didn’t have a good way to enter text. WordPress has all these nice plug-ins for formatted text to HTML, for formats such as Textile and Markdown. However, there is at least one major problem: WordPress doesn’t store the information about what plug-in was used to create a post with the post. In my opinion, plug-ins would ideally be able to do both text to HTML and HTML to text. WordPress would ask the plug-in to convert the content to HTML when you submit a post, and then ask the plug-in to convert back to text when you retrieve a post for editing. Markdown didn’t do this. Textile was straight out because it didn’t seem to have support for hard wrapped lines, a la Emacs, my web log writing tool of choice.
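
To make the idea concrete, here is a minimal Python sketch of the two-way arrangement described above. None of it is WordPress code: `FORMATTERS`, `Post`, `submit`, and `retrieve_for_editing` are hypothetical names, and the only point is that the post remembers which formatter produced its HTML.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Converter = Callable[[str], str]

# Hypothetical registry: formatter name -> (text_to_html, html_to_text).
FORMATTERS: Dict[str, Tuple[Converter, Converter]] = {}


@dataclass
class Post:
    html: str
    formatter: str  # stored with the post so editing can reverse the conversion


def submit(text: str, formatter: str) -> Post:
    """Convert the author's text to HTML and record which formatter did it."""
    to_html, _ = FORMATTERS[formatter]
    return Post(html=to_html(text), formatter=formatter)


def retrieve_for_editing(post: Post) -> str:
    """Hand the stored HTML back to the same formatter to recover the text."""
    _, to_text = FORMATTERS[post.formatter]
    return to_text(post.html)
```

A formatter that cannot go from HTML back to text simply would not fit this registry, which is the gap described above.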

Anyway, what this all means is that I’m using Emacs to convert my posts from Markdown to HTML, and HTML back to Markdown. Aaron Swartz wrote something called html2text which is pretty decent, but apparently doesn’t hard wrap things like lists. In my experience it also doesn’t handle some of the more obscure cases of “Markdown markup”. Thus, I wrote html2markdown. Like html2text, it is written in Python. I’m using it to write my posts now, so it’s going to get some testing from that, plus I tested it against many of the Markdown markup examples on the Markdown web pages. You are encouraged to report bugs to me, but please don’t tell me “this HTML input doesn’t produce good Markdown output!” It’s not made to convert general HTML to Markdown, it’s made to convert Markdown-generated HTML to Markdown. Instead, send me a bit of Markdown markup that, once run through Markdown and then html2markdown, fails to come back as something resembling the original input.
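
The kind of bug report asked for above amounts to a round-trip check. Here is a rough Python sketch, assuming `to_html` and `to_markdown` are callables wrapping whatever converters you use. Those names and the loose `normalize` comparison are illustrative, not part of html2markdown.

```python
import re
from typing import Callable


def normalize(text: str) -> str:
    """Collapse whitespace inside each paragraph so re-wrapped lines compare equal.

    This is deliberately loose: it also ignores indentation, so it only
    answers "does the output resemble the input", not "is it identical".
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


def round_trips(source: str,
                to_html: Callable[[str], str],
                to_markdown: Callable[[str], str]) -> bool:
    """Return True if Markdown -> HTML -> Markdown resembles the original."""
    return normalize(to_markdown(to_html(source))) == normalize(source)
```

A snippet for which `round_trips` returns False is the sort of input worth sending along.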

For posterity, the macros used in my .emacs:

;; Skip past the first blank line, then filter the rest of the buffer
;; through PROGRAM, replacing it with the program's output.
(defun weblog-filter-through-program (program)
  (save-excursion
    (goto-char (point-min))
    (search-forward "\n\n")
    (shell-command-on-region (point) (point-max) program t t)))

(defun weblog-entry-text-to-html ()
  (interactive)
  (weblog-filter-through-program "text2html"))

(defun weblog-entry-html-to-text ()
  (interactive)
  (weblog-filter-through-program "html2text"))

;; C-c b h converts the entry to HTML, C-c b t converts it back to text.
(defun my-weblog-mode-hook ()
  (local-set-key "\C-cbh" 'weblog-entry-text-to-html)
  (local-set-key "\C-cbt" 'weblog-entry-html-to-text))

(Also for posterity’s sake, or maybe just mine: Markdown uses four spaces to indicate a pre-formatted block (<pre>). In Emacs, paste in your text, select it as a region, then M-4 C-x TAB.)
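
Outside Emacs, the same four-space indent can be done in a couple of lines of Python. This is only an illustration, and `as_markdown_pre` is a made-up name.

```python
import textwrap


def as_markdown_pre(text: str) -> str:
    """Indent every non-blank line by four spaces, as Markdown expects for <pre>."""
    return textwrap.indent(text, "    ")
```

Blank lines are left empty, which Markdown still treats as part of the block as long as the following line is indented.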

This is used in conjunction with mt.el, as described in a previous post. Also, you’ll need to disable automatic paragraphs in WordPress. For some reason, the WordPress developers seem to think that putting this in as an option will needlessly over-complicate WordPress. Whatever the case, using HTML (or, at least, Markdown’s HTML) with WordPress automatic paragraphs will yield things like <p></p> in your pre-formatted text blocks. Dumb. See the suggested plug-in at the top of http://wordpress.org/support/topic.php?id=11097 for my fix. I put this in a file in the appropriate plug-in directory, activated it, and my worries (thus far) are gone. I note that Markdown and Textile plug-ins for WordPress both do roughly the same thing, as they recognize that they don’t need/can’t use WordPress’ automatic paragraphs.

Now I need to link stuff in from my old web log. This post is going to ping the web log services, hopefully. Will post back here if I can manage to find a way to link my old pages to my new web log.

Update: I was going to do this update because I forgot something… but now I forgot what I was going to say. I was distracted because whenever I edited a post (such as this one) from Emacs, the date on the post would get changed to today’s date. Searching around a bit I found the second hunk in http://mycvs.org/wp/wp-content/xmlrpc.12.patch to hold the answer. According to http://ecto.kung-foo.tv/archives/001142.php this might not be necessary with WP 1.2.2, but the code still used current_time('timestamp',1) rather than strtotime($postdata['post_date']), which uses the original creation time from the post being edited if a new time isn’t supplied by the client.
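
The behaviour the patch is after can be restated in a few lines. This is an illustrative Python sketch, not the WordPress PHP, and `edited_post_date` and its parameters are hypothetical names.

```python
from datetime import datetime
from typing import Optional


def edited_post_date(original: datetime,
                     supplied: Optional[datetime]) -> datetime:
    """Pick the date to store when a post is edited.

    Keep the post's original creation time unless the client explicitly
    supplies a new one; never fall back to the current time.
    """
    return supplied if supplied is not None else original
```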

I think I remember now. What I was going to say is that html2markdown.py is to be used just like Markdown.pl from the command line. My text2html script is:

Markdown.pl | SmartyPants.pl

And my html2text script is:

SmartyPants.pl --reverse | html2markdown.py
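
For anyone who would rather drive the same pipelines from Python than from shell scripts, a sketch might look like this. It assumes a Unix-like shell and that the scripts are on `PATH`. The names `filter_through`, `TEXT2HTML`, and `HTML2TEXT` are invented here.

```python
import subprocess

# The two pipelines from the scripts above, kept as shell command strings.
TEXT2HTML = "Markdown.pl | SmartyPants.pl"
HTML2TEXT = "SmartyPants.pl --reverse | html2markdown.py"


def filter_through(command: str, text: str) -> str:
    """Feed text to a shell pipeline on stdin and return what it writes to stdout."""
    result = subprocess.run(
        command,
        shell=True,           # needed so the "|" in the command is honoured
        input=text,
        capture_output=True,
        text=True,
        check=True,           # raise if the last command in the pipeline fails
    )
    return result.stdout
```

With the scripts installed, `filter_through(TEXT2HTML, entry)` would do what the Emacs command does to the entry body.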
2 Comments
2005 September 18
Anon permalink

Html to Markdown version 0.2 gives permission denied.

2005 September 18
darkness permalink

So it was. It’s been fixed. I’ll note that as of this date I think I have something a little newer than 0.2 that I’m actually using. So if you find bugs in html2markdown, I’m interested to hear about them.
