Build Pandoc and run it on AWS Lambda

Aug 4, 2017 AWS

Document conversion is a feature that many web-based applications need. To implement it, we can combine Pandoc, a very useful conversion tool, with a disposable computing environment like AWS Lambda. However, Lambda doesn’t provide Pandoc by default, so we have to build a Pandoc binary for it. In this post, I’m going to show you how to build the binary and execute it on AWS Lambda.

Prerequisite

To create a Pandoc binary file for AWS Lambda, we must build it in an environment similar to Lambda’s. Luckily, we can use Docker and docker-lambda to simulate a Lambda environment. To get the Docker image:

docker pull lambci/lambda:build-python2.7

Build Pandoc

Before building, we have to install the development packages (FYI, the default working directory is /var/task):

# Install necessary packages
yum -y install gmp-devel freeglut-devel python-devel zlib-devel gcc m4

Besides those packages, we also need GHC and Cabal:

# Install GHC
GHC='https://downloads.haskell.org/~ghc/8.0.1/ghc-8.0.1-x86_64-centos67-linux.tar.xz'
curl -OL $GHC && tar xf ghc* && cd ghc* && ./configure --prefix=/usr && make install && cd ..

# Install Cabal
CABAL='https://www.haskell.org/cabal/release/cabal-install-1.24.0.0/cabal-install-1.24.0.0-x86_64-unknown-linux.tar.gz'
mkdir .bin && cd .bin && curl -OL $CABAL && tar xf cabal* && cd ..

Now, let’s build:

# Build Pandoc
.bin/cabal sandbox init && \
.bin/cabal update && \
.bin/cabal install hsb2hs && \
.bin/cabal install --disable-documentation pandoc -fembed_data_files

# Wrap it up
cp .cabal-sandbox/bin/pandoc .
gzip pandoc

After the build finishes, copy the compressed Pandoc binary out of the container:

docker cp [container-name]:/var/task/pandoc.gz /path/to/local/pandoc.gz
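
Before uploading, it can be worth confirming that the copied file really is a gzip-compressed Linux executable. The following is a minimal sketch written for Python 3. The function name `looks_like_gzipped_elf` is hypothetical and not part of any tool mentioned in this post. It only checks the first four bytes of the decompressed payload, so it catches an obviously wrong file but does not prove the binary will run.

```python
import gzip

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF executable


def looks_like_gzipped_elf(path):
    """Return True if `path` is a gzip file whose payload starts with the ELF magic."""
    try:
        with gzip.open(path, "rb") as f:
            return f.read(4) == ELF_MAGIC
    except (OSError, EOFError):
        # Not a gzip file, truncated, missing, or unreadable
        return False
```

For example, `looks_like_gzipped_elf("/path/to/local/pandoc.gz")` should return `True` for a binary built with the steps above.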

Use Pandoc

How do we use Pandoc on AWS Lambda? First, create a Lambda function and upload your code together with the compressed binary file. Second, note that /tmp is the only place where AWS Lambda lets us create or download files, so we decompress the Pandoc binary there and then use it to convert files. Check this example:

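import subprocess

def init_cmd(cmd):
    # Start the shell command and capture both output streams
    return subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE
    )

def execute_cmd():
    cmd = (
        # Copy the bundled archive to /tmp, the only writable directory
        "cp pandoc.gz /tmp/pandoc.gz && "
        # -f overwrites a /tmp/pandoc left over from an earlier warm invocation
        "gzip -df /tmp/pandoc.gz && "
        "chmod 755 /tmp/pandoc && "
        "/tmp/pandoc -S -s /tmp/input-file -o /tmp/output-file"
    )
    process = init_cmd(cmd)
    out, err = process.communicate()
    # Fail loudly if any step of the chain returned a non-zero exit code
    if process.returncode != 0:
        raise RuntimeError("pandoc failed: %s" % err)

    # Then do anything you want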
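
The same steps can also be done without a shell. The following is a minimal sketch, not the only way to do it: it decompresses the archive with Python’s `gzip` module, skips the work when the binary is already in /tmp, and runs Pandoc with an argument list. The function names `install_binary` and `convert` and their parameters are hypothetical. The default flags mirror the example above.

```python
import gzip
import os
import shutil
import subprocess


def install_binary(src_gz="pandoc.gz", dest="/tmp/pandoc"):
    """Decompress src_gz to dest and mark it executable; skip if dest exists."""
    if not os.path.exists(dest):
        part = dest + ".part"
        # Write to a temporary name first so a failed copy never leaves
        # a half-written file at the final path.
        with gzip.open(src_gz, "rb") as src, open(part, "wb") as out:
            shutil.copyfileobj(src, out)
        os.chmod(part, 0o755)
        os.rename(part, dest)
    return dest


def convert(input_path, output_path, binary="/tmp/pandoc", flags=("-S", "-s")):
    """Run the binary without a shell; return (returncode, stdout, stderr)."""
    process = subprocess.Popen(
        [binary] + list(flags) + [input_path, "-o", output_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = process.communicate()
    return process.returncode, out, err
```

Inside a handler, this would be called as `install_binary()` followed by `convert("/tmp/input-file", "/tmp/output-file")`. Passing the arguments as a list avoids shell quoting problems when file names come from user input.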

That’s it. Go convert files!
