de vita sua

a blog





10 June 2015

Self-publishing with pandoc, etc

by John D. Muccigrosso

Depending on your thinking, I'm either just done with or approaching the end of a sabbatical. ("Just done with" if you think that once commencement occurs, it's just a regular summer.) Among the things I produced in the past few months is a very short "note" on a topic that doesn't fall within my usual area of research. I sent it to a couple of OA journals, but neither wanted to publish it as is. I'm not interested in doing more with it at this time, but it seems silly to have it just sit on my hard drive doing nothing. It's the kind of thing I'd do as a conference paper, if I went to a conference at which I think it'd be welcome. But since I don't go to such conferences, I figure I'll just put it out there for people to check out anyway. (The advantages of tenure and the internet!)

I could do it as a blog post, though it's already written in a more "academic" style than I write this blog in. Instead I'm going to post it as html on github and as a pdf on my account at figshare, where it's easily accessible, archived, and even gets a DOI. I'll also link to it from my academia.edu page (as well as here, obviously).

The Workflow

I've started using markdown with pandoc to generate documents. I was inspired by Dennis Tenen and Grant Wythoff's post last year, "Sustainable Authorship in Plain Text using Pandoc and Markdown," but I've long been a fan of avoiding proprietary formats that are likely to become obsolete (no doubt in part because I work with very old texts and materials professionally). It's easy enough to do simple stuff this way, but getting to more complex documents requires some work. Here's a list of stuff I do/use:

The overall process then runs like this: write in MacDown, incorporating the citation keys from Zotero; process that with pandoc to generate the desired final file type; publish/share/whatever.

It seems easy when I write it like that.
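
For the pandoc step of that process, here is a minimal sketch of what the commands might look like, wrapped in Python so the two output formats share one definition. The file names (note.md, refs.bib, numbering.css) and the helper pandoc_command are hypothetical, and the options are only one plausible set: --citeproc is the citation flag in recent pandoc releases, while older versions used a separate pandoc-citeproc filter. The script only prints the commands; nothing is run.

```python
import shlex
from pathlib import Path


def pandoc_command(source, output, bibliography, css=None):
    """Build the argument list for one pandoc run (nothing is executed here)."""
    cmd = [
        "pandoc", str(source),
        "--standalone",                       # full document, not a fragment
        "--citeproc",                         # resolve citation keys (recent pandoc)
        "--bibliography", str(bibliography),  # e.g. a file exported from Zotero
        "--output", str(output),              # format is inferred from the extension
    ]
    # A stylesheet only makes sense for html output.
    if css is not None and Path(output).suffix == ".html":
        cmd += ["--css", str(css)]
    return cmd


# Hypothetical file names: one markdown source, two outputs.
html_cmd = pandoc_command("note.md", "note.html", "refs.bib", css="numbering.css")
pdf_cmd = pandoc_command("note.md", "note.pdf", "refs.bib")

# Print the commands as they would be typed in a shell.
print(shlex.join(html_cmd))
print(shlex.join(pdf_cmd))
```

Each list could be handed to subprocess.run(cmd, check=True) on a machine with pandoc installed (and a LaTeX engine for the pdf).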

Numbering for citation

My one concern with the html output of the note I just wrote was that html has no default pagination, and pages are usually the way one cites an article. So instead of numbered pages, I decided to go with numbered paragraphs. (Read about Sebastian Heath's approach to articles he's editing for ISAW.) But how to number them, so that the numbers were visible (for easy citation) and so that I didn't have to put them in manually?

With a little help from the pandoc Google group, I combined some features of pandoc with css. Pandoc automatically gives IDs to headers, and css allows for formatting those headers and even for auto-numbering them. So I put a nearly empty level-6 header at the start of each paragraph in my markdown document, and I used css to number those headers and put the number off in the margin. (They're nearly empty because markdown won't create empty headers.)

Although the numbers are visible to a human reader, the IDs aren't ideal (section, section-1, section-2, and so on), but they are sequential and linkable. The headers are also a bit ugly in markdown, but they work, and they make it possible for me to mark logical paragraphs instead of the actual ones. This is useful, for example, when a block quote, which technically creates a new paragraph, falls right in the middle of a logical paragraph.
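
As an illustration of the idea, and not the stylesheet actually used for the note, here is a minimal sketch. The css is held in a Python string (NUMBERING_CSS, a made-up name), and the counter name para and the margin offset are assumptions. The helper paragraph_anchor is also hypothetical. It maps a visible paragraph number to the ID described above, assuming the placeholder headers are the only headers in the document whose automatic ID falls back to "section".

```python
# css that numbers every level-6 header and shows the number in the margin.
NUMBERING_CSS = """
body { counter-reset: para; }   /* start the count once per page */
h6 {
  counter-increment: para;      /* one step per placeholder header */
  position: relative;
  margin: 0;
  font-weight: normal;
}
h6::before {
  content: counter(para);       /* generated text, not part of the page content */
  position: absolute;
  left: -3em;                   /* push the number out into the margin */
}
"""


def paragraph_anchor(n):
    """Return the ID of the n-th placeholder header, counting from 1."""
    if n < 1:
        raise ValueError("paragraph numbers start at 1")
    # The first header is "section"; later ones get a numeric suffix.
    return "section" if n == 1 else f"section-{n - 1}"


print(paragraph_anchor(1))  # section
print(paragraph_anchor(3))  # section-2
```

So a citation of paragraph 3 would link to the fragment #section-2: the visible number and the ID suffix differ by one.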

One more thing: since I'm using css to number the paragraphs in the html version, those numbers are technically part of the display of the article, not part of its content. So you can see them, but you can't find them if you search in your browser. That's not the case in the PDF; there the numbers are "real" and you can find them in a search.

The Article

So about the article itself...it has to do with the original nature of the Golden Calf in Exodus 32. I speculate that it was in origin a "corn calf," to be associated with a lost harvest ritual. Go have a read:

Tags - AppleScript - markdown - pandoc - technology