Cropping PDF to MediaBox

Update 2009/10/24: With some Linux ghostscript versions we have seen 600 dpi pixelation that does not appear on Windows XP. So if the pdfcrop in your Linux installation works, use that one instead.

Many authors want to dump diagrams from MS Excel and MS PowerPoint and include them in LaTeX documents. You can dump either EPS via the "MS Publisher Color Printer" driver or PDF via PDFCreator (or some other PDF printer driver). The problem is that the BoundingBox (for EPS) or the MediaBox (for PDF) is set to the page size (typically Letter or A4), not to the actual bounding box of the diagram.
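To see the problem on a concrete file, it helps to turn a bounding box line into a width and height. The following is a minimal sketch, not part of the original script: bbox_size is a helper name made up here, and diagram.pdf is a placeholder file name.

```shell
#!/bin/bash
# Print the width and height (in PostScript points) described by the first
# "%%BoundingBox: llx lly urx ury" line read from standard input.
# bbox_size is a hypothetical helper name, not an existing tool.
bbox_size() {
  awk '/^%%BoundingBox:/ { print $4 - $2, $5 - $3; exit }'
}

# A page-sized box, as written by a printer driver set to Letter paper:
echo '%%BoundingBox: 0 0 612 792' | bbox_size

# Assumed usage on a real file (needs ghostscript, which prints the tight
# box of the drawn content on stderr):
#   gs -q -dNOPAUSE -dBATCH -sDEVICE=bbox diagram.pdf 2>&1 | bbox_size
```

A result of 612 792 corresponds to a full Letter page (8.5 x 11 inches at 72 points per inch). A small diagram should report something much smaller once it is cropped.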

Linux teTeX distributions now ship a tool called pdfcrop (which basically embeds the input image in a stub TeX document and invokes something like pdfLaTeX). This is kludgy, but works fine. Unfortunately, LaTeX packaging has become quite badly messed up because of confusion between teTeX, Web2C and TeX Live, definitely in Cygwin but now also in Debian [1, 2, 3].
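Since pdfcrop may or may not be usable on a given machine, a wrapper can check for it before falling back to something else. This is only a sketch: choose_cropper and the label "fallback" are names invented for illustration, and finding pdfcrop on the PATH does not prove that the TeX installation behind it works.

```shell
#!/bin/bash
# Report which cropping route is available on this machine.
# choose_cropper is a hypothetical helper; "fallback" stands for any
# alternative, such as a ghostscript-based script.
choose_cropper() {
  if command -v pdfcrop >/dev/null 2>&1; then
    echo pdfcrop
  else
    echo fallback
  fi
}

echo "Cropping route: $(choose_cropper)"

# With pdfcrop available, the usual invocation is:
#   pdfcrop input.pdf output.pdf
```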

I have found it easier to write my own pdfcrop script that relies mostly on ghostscript calls (gs and eps2eps), along with pdftops and epstopdf. The script is given below.

#!/bin/bash

# Require exactly one argument: the PDF file to crop.
if [ $# -ne 1 ]
then
  echo "Usage: $(basename "$0") FileToBeCleaned.pdf" >&2
  exit 1
fi

SRCFILE=$1
# Work in a scratch directory that is removed when the script exits.
CTDIR=$(mktemp -d) || exit 1
trap 'rm -rf "$CTDIR"' SIGINT SIGTERM EXIT

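# Convert the PDF to EPS, then let ghostscript normalize the EPS.
pdftops -eps "$SRCFILE" "$CTDIR/1.ps"
eps2eps "$CTDIR/1.ps" "$CTDIR/2.eps"
# The bbox device prints the tight bounding box of the drawn content on stderr.
BBLINE=$(gs -q -sDEVICE=bbox -dNOPAUSE "$CTDIR/2.eps" -c quit 2>&1 | grep -E '^%%BoundingBox')
echo "$BBLINE"
# Stop if no box was found, rather than blanking the header below.
[ -n "$BBLINE" ] || exit 1
# Write the tight box into the EPS header and drop the stale high-resolution box.
sed -i "s/^%%BoundingBox.*$/$BBLINE/g" "$CTDIR/2.eps"
sed -i "/^%%HiResBoundingBox.*$/d" "$CTDIR/2.eps"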

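# Convert the cropped EPS back to PDF. Note that this overwrites the input file.
epstopdf "$CTDIR/2.eps" || exit 1
mv "$CTDIR/2.pdf" "$SRCFILE" || exit 1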

exit 0
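
The two sed calls are the step that actually changes the crop, so it can be useful to try them on their own. Here is a small sketch that applies the same substitutions to a made-up EPS header. fix_eps_bbox is a name invented for this illustration, the box values are arbitrary, and sed -i assumes GNU sed.

```shell
#!/bin/bash
# Rewrite the bounding box comments of an EPS file in place (GNU sed assumed).
# $1: EPS file, $2: replacement "%%BoundingBox: ..." line.
# fix_eps_bbox is a hypothetical helper name.
fix_eps_bbox() {
  sed -i -e "s/^%%BoundingBox.*\$/$2/" -e '/^%%HiResBoundingBox/d' "$1"
}

# Demonstration on a made-up three-line header.
demo=$(mktemp)
printf '%s\n' '%!PS-Adobe-3.0 EPSF-3.0' \
  '%%BoundingBox: 0 0 612 792' \
  '%%HiResBoundingBox: 0.00 0.00 612.00 792.00' > "$demo"
fix_eps_bbox "$demo" '%%BoundingBox: 72 500 340 720'
cat "$demo"
rm -f "$demo"
```

After the call, the file keeps its first header line, carries the new %%BoundingBox line, and no longer has a %%HiResBoundingBox line that could contradict it.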

You can now easily use \includegraphics to include these PDF files in LaTeX and run pdflatex to build. Note that this script has been tested only on MS Excel and MS PowerPoint output produced via PDFCreator. YMMV.
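
When a document has many figures, the cropping can be run over a whole directory before building. The loop below is a sketch under assumptions: crop_all is a made-up helper, and ./pdfcrop.sh, figures and paper.tex are placeholder names. Because the script above overwrites its input, work on copies of the exported PDFs.

```shell
#!/bin/bash
# Run a cropping command once for every PDF in a directory.
# $1: directory; remaining arguments: the cropping command to run.
# crop_all is a hypothetical helper name.
crop_all() {
  local dir=$1
  shift
  local f
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue    # the glob matched nothing
    "$@" "$f" || return 1      # stop at the first failure
  done
}

# Assumed usage, with the script above saved as ./pdfcrop.sh:
#   crop_all figures ./pdfcrop.sh && pdflatex paper.tex
```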