We wanted to import our old mailing list entries from the OpenMRS Listserv archives into Nabble. No problem. Finding the GET listname FILELIST and GET listname file1, GET listname file2, … commands was easy enough. A quick search of Nabble support made it clear that I needed to send them mbox files, so I set out in search of a Listserv-to-mbox converter. I found a couple of scripts, one in Perl and another in PHP, but trying them out made it clear that I was going to have to do some tweaking. After a few near misses, I thought: “I could do this more easily in Groovy.” So, I ended up with this script. Basically, it came down to leaving the messages and their headers alone and just adding a From_ line in front of each one. Otherwise, the only tricky part was getting the dates right: the From_ line takes GMT time with no timezone specified, and the Date header in the message needed some reshuffling of its format.
Both for future me and anyone else who might benefit, here’s the script I ended up with:
import java.text.SimpleDateFormat
delim = '=' * 73 // LISTSERV separates messages with a bar of equal signs
foundDelim = false // we skip all content until first delimiter
inHeader = true // true when processing header data
def header = "" // holds current header data
dfListserv = new SimpleDateFormat("E, d MMM yyyy HH:mm:ss z")
dfHeader = new SimpleDateFormat("E MMM dd HH:mm:ss yyyy z")
dfMbox = new SimpleDateFormat("E MMM dd HH:mm:ss yyyy")
dfMbox.timeZone = TimeZone.getTimeZone("GMT") // for mbox, convert to GMT and drop timezone reference
cal = Calendar.instance
// Process input line by line from stdin
System.in.eachLine() { line ->
    if (!foundDelim)
        foundDelim = (line == delim) // skip until we find first delim
    else if (inHeader) {
        // within header
        if (line =~ /^\s*$/) {
            // empty line signals end of header
            // fetch Date from header and reformat it for output
            m1 = header =~ /(?ms)^Date:\s+(.*?)\s*$/
            date = dfListserv.parse(m1[0][1])
            cal.time = date
            mboxDate = dfMbox.format(cal.time)
            headerDate = dfHeader.format(cal.time)
            // fetch From from header
            m2 = header =~ /(?ms)^From:\s+(.*?)\s*$/
            fromHeader = m2[0][1]
            leftBracket = fromHeader.indexOf('<')
            rightBracket = fromHeader.indexOf('>')
            // use the bare address when one is given in angle brackets
            // (>= 0 so a header that starts with '<' is handled too)
            if (leftBracket >= 0 && rightBracket > leftBracket)
                from = fromHeader.substring(leftBracket+1, rightBracket)
            else
                from = fromHeader
            // output header with mbox-required From_ line up front and reformatted date
            header = "From $from $mboxDate\n" + header.replaceAll(/(?m)^Date:\s+.*$/, "Date: $headerDate")
            println "$header\n"
            inHeader = false // no longer in header
            header = "" // clear for next message
        } else {
            header += "$line\n" // accumulate full header data
        }
    } else if (line == delim) {
        // if we find a delim, begin processing next line as header
        print "\n\n"
        inHeader = true
    } else {
        // within a message, just send it through untouched
        println line
    }
}
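To make the date handling concrete, here is a small standalone sketch of how a single message's From_ line gets built. The Date and From values are made up for illustration and are not taken from the real archives. I've also passed Locale.US to the formatters, which the script above does not do, so that day and month names parse the same way whatever the machine's default locale is. The address is pulled out with a regex here instead of indexOf, just to keep the sketch short.

```groovy
import java.text.SimpleDateFormat

// Same patterns as the script above; Locale.US is an addition for this sketch
def dfListserv = new SimpleDateFormat("E, d MMM yyyy HH:mm:ss z", Locale.US)
def dfMbox = new SimpleDateFormat("E MMM dd HH:mm:ss yyyy", Locale.US)
dfMbox.timeZone = TimeZone.getTimeZone("GMT") // From_ line is GMT, no zone suffix

// Hypothetical header values, invented for this example
def dateHeader = "Tue, 3 Jul 2007 14:05:09 -0400"
def fromHeader = "Jane Doe <jane@example.org>"

// Parse the archive date, then re-render it in GMT for the From_ line
def parsed = dfListserv.parse(dateHeader)
String mboxDate = dfMbox.format(parsed)

// Take the bare address from inside the angle brackets if present
def m = fromHeader =~ /<([^>]+)>/
String from = m.find() ? m.group(1) : fromHeader.trim()

String fromLine = "From $from $mboxDate"
println fromLine
```

For those sample values this should print `From jane@example.org Tue Jul 03 18:05:09 2007`: 14:05:09 at -0400 becomes 18:05:09 GMT, and the timezone is dropped.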