Aug 4, 2017 AWS
Document conversion is a feature that many web-based applications need. To implement it, we can pair Pandoc, a very useful conversion tool, with a disposable computing environment like AWS Lambda. However, Lambda doesn’t provide Pandoc by default, so we have to build a Pandoc binary for it ourselves. In this post, I’m going to show you how to build the binary and execute it on AWS Lambda.
To create a Pandoc binary for AWS Lambda, we must build it in an environment similar to Lambda’s. Luckily, we can harness Docker and docker-lambda to simulate a Lambda environment. To get the Docker image:
docker pull lambci/lambda:build-python2.7
Before building, we have to install some development packages inside the container (FYI, the default working directory is /var/task):
# Install necessary packages
yum -y install gmp-devel freeglut-devel python-devel zlib-devel gcc m4
Besides those packages, we also need GHC and Cabal:
# Install GHC
GHC='https://downloads.haskell.org/~ghc/8.0.1/ghc-8.0.1-x86_64-centos67-linux.tar.xz'
curl -OL $GHC && tar xf ghc* && cd ghc* && ./configure --prefix=/usr && make install && cd ..
# Install Cabal
CABAL='https://www.haskell.org/cabal/release/cabal-install-1.24.0.0/cabal-install-1.24.0.0-x86_64-unknown-linux.tar.gz'
mkdir .bin && cd .bin && curl -OL $CABAL && tar xf cabal* && cd ..
Now, let’s build:
# Build Pandoc
.bin/cabal sandbox init && \
.bin/cabal update && \
.bin/cabal install hsb2hs && \
.bin/cabal install --disable-documentation pandoc -fembed_data_files
# Wrap it up
cp .cabal-sandbox/bin/pandoc .
gzip pandoc
After the build finishes, copy the compressed Pandoc binary out of the container:
docker cp [container-name]:/var/task/pandoc.gz /path/to/local/pandoc.gz
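Before uploading, it can be worth confirming that the copied archive really contains a Linux executable. The following is only a minimal sketch and not part of the original workflow: the function name `looks_like_linux_binary` is hypothetical, and the check is deliberately shallow. It verifies that the file decompresses and that the data starts with the ELF magic bytes, nothing more.

```python
import gzip

ELF_MAGIC = b"\x7fELF"  # first four bytes of an ELF executable


def looks_like_linux_binary(archive_path):
    """Return True if the gzip archive decompresses to data starting with the ELF magic."""
    try:
        with gzip.open(archive_path, "rb") as archive:
            return archive.read(4) == ELF_MAGIC
    except (OSError, EOFError):
        # Missing file, not a gzip archive, or a truncated download
        return False
```

Calling `looks_like_linux_binary("/path/to/local/pandoc.gz")` with the path given to `docker cp` should return `True` for a usable archive.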
How do we use Pandoc on AWS Lambda? First, create a Lambda function and upload your code together with the compressed binary file. Second, keep in mind that the only place AWS Lambda allows us to create or download files is /tmp, so we put the decompressed Pandoc binary there and then use it to convert files. Check this example:
import subprocess


def init_cmd(cmd):
    # Run the command through the shell and capture both output streams
    return subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE
    )


def execute_cmd():
    cmd = (
        "cp pandoc.gz /tmp/pandoc.gz && "
        # -f overwrites a /tmp/pandoc left behind by an earlier invocation
        "gzip -df /tmp/pandoc.gz && "
        "chmod 755 /tmp/pandoc && "
        "/tmp/pandoc -S -s /tmp/input-file -o /tmp/output-file"
    )
    process = init_cmd(cmd)
    out, err = process.communicate()
    # Then do anything you want
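The example above can also be arranged as a Lambda handler that decompresses the binary only when it is not already in /tmp (a reused container may still have it) and that checks the exit status. This is a rough sketch under assumptions, not a definitive implementation: the names `ensure_binary`, `convert` and `handler`, the returned dictionary, and the idea that the input file is already in /tmp are all hypothetical. The `-S` flag is left out here because its availability depends on the Pandoc version that was built.

```python
import gzip
import os
import shutil
import subprocess

BUNDLED_ARCHIVE = "pandoc.gz"  # shipped next to the handler, as in the post
PANDOC_PATH = "/tmp/pandoc"    # /tmp is the writable location on Lambda


def ensure_binary(archive=BUNDLED_ARCHIVE, target=PANDOC_PATH):
    """Decompress the archive to `target` if it is not there yet."""
    if not os.path.exists(target):
        part = target + ".part"
        with gzip.open(archive, "rb") as src, open(part, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.chmod(part, 0o755)
        # Rename last so a half-written file is never mistaken for the binary
        os.rename(part, target)
    return target


def convert(input_path, output_path, binary=None):
    """Run the converter and raise if it exits with a non-zero status."""
    binary = binary or ensure_binary()
    process = subprocess.Popen(
        [binary, "-s", input_path, "-o", output_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = process.communicate()
    if process.returncode != 0:
        raise RuntimeError(err.decode("utf-8", "replace"))
    return out


def handler(event, context):
    # Assumption: something earlier has already placed the input file in /tmp
    convert("/tmp/input-file", "/tmp/output-file")
    return {"output": "/tmp/output-file"}
```

Passing the arguments as a list avoids going through the shell, so file names with spaces or special characters need no quoting.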
That’s it. Now go convert some files!
If you have any suggestions or questions, or even find some typos, feel free to contact me. Thank you! :)
zeckli.devforgalaxy@gmail.com © 2015-2019 zeckli, thanks to Jekyll and GitHub.