
NoxxieNl

Conversion for documents with libreoffice

Hiya all,

I am trying to figure out the best approach for the following problem. Once a day (occasionally twice), I need to convert 1200+ RTF documents to PDF.

For the time being we are using paid software for the conversion. Long story short, it starts an MS Office instance, sends the document to a printer queue, and then picks up the resulting PostScript file and converts it to PDF. It is not fast, and it sometimes fails at random...

I was playing with LibreOffice on Ubuntu for the conversion and heck, it just works, and boy, it works fast: around 1200 documents done within one minute...

But now I am looking for the best way to deploy this.

I see three options:

  1. Use a Docker image: spin it up, copy the files into the container (or use a shared folder), do the conversion, and done
  2. Spin up a dedicated VM in Azure / AWS, copy the files over, and done
  3. Add LibreOffice to my web server and do the conversion on the web server itself

I have no experience with options 1 and 2, and I do not feel like using option 3. I tested the listener option and the unoconv solution, but simply running LibreOffice in --headless mode is just plain faster.
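For option 1, a rough sketch of the shared-folder variant could look like the following. It assumes an image that already has LibreOffice installed; the image name `my-libreoffice`, the `DOCKER` override, and the function name `convert_in_docker` are hypothetical and only there for illustration.

```shell
#!/bin/bash
# Hypothetical image name: any image with LibreOffice (soffice) installed.
IMAGE="${IMAGE:-my-libreoffice}"
# Overridable so the command can be dry-run, e.g. DOCKER=echo.
DOCKER="${DOCKER:-docker}"

# Convert every RTF in a host directory inside a throwaway container.
# The host directory is bind-mounted at /data, so the PDFs are written
# next to the RTF files and no copying into the container is needed.
convert_in_docker() {
    hostDir=$1
    "$DOCKER" run --rm \
        -v "${hostDir}:/data" \
        "$IMAGE" \
        sh -c 'soffice --headless --convert-to pdf --outdir /data /data/*.rtf'
}

# Usage (the path must be absolute for the bind mount):
#   convert_in_docker /home/user/test/conversion/input
```

The `--rm` flag removes the container once the conversion finishes, so nothing lingers between daily runs. Depending on the user the container runs as, the resulting PDFs may end up owned by root on the host.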

Does anyone have any good tips on how to do this, or any suggestions? Anything would be great!

Currently my test setup is just a bash script (its argument is the directory where the RTF files are stored):

#!/bin/bash
shopt -s nullglob
baseDirectory="/home/user/test/conversion"

# Check if directory is specified.
if [ -z "$1" ]
  then
    echo "No input directory specified."
    exit 1
fi

# Balance files into directories, each containing a maximum of 200 files.
# This is done because Linux has a shell limit of 249 with libreoffice.
rtfs=("$1"/*.rtf)
n=0
for ((i=0; i < ${#rtfs[@]}; i += 200)); do
    printf -v b "${baseDirectory}/balancer/%03d" $((++n))
    mkdir -p "$b" && mv -t "$b" "${rtfs[@]:$i:200}"
done

# Loop over the balance directories and let libreoffice do its magic in converting the RTF files to PDF.
# Remove each balance directory after its conversion is done.
# Assumption: the PDFs are written back to the input directory ($1).
mkdir -p "${baseDirectory}/environments"
for d in "${baseDirectory}"/balancer/*; do
    id=${d: -3}
    soffice "-env:UserInstallation=file://${baseDirectory}/environments/${id}" \
            --headless --convert-to pdf "$d"/*.rtf \
            --outdir "$1" > "${baseDirectory}/environments/output_${id}.log" 2>&1 && \
            rm -rf "$d" &
done

wait
echo "Conversion done"
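
The same batching idea can also be sketched without moving files around, by letting `xargs` split the file list and run the batches in parallel. This is only a sketch, assuming GNU `find` and `xargs` as shipped with Ubuntu; `SOFFICE`, `BATCH_SIZE`, `JOBS`, and the function name `convert_in_batches` are illustrative names, not LibreOffice settings.

```shell
#!/bin/bash
# Illustrative knobs: the binary to call, files per batch, parallel workers.
SOFFICE="${SOFFICE:-soffice}"
BATCH_SIZE="${BATCH_SIZE:-200}"
JOBS="${JOBS:-4}"

# Convert all .rtf files in inDir to PDF in outDir, in parallel batches.
convert_in_batches() {
    inDir=$1
    outDir=$2
    # find emits NUL-terminated paths so unusual filenames survive;
    # xargs hands at most BATCH_SIZE of them to each worker shell.
    find "$inDir" -maxdepth 1 -name '*.rtf' -print0 |
        xargs -0 -r -n "$BATCH_SIZE" -P "$JOBS" sh -c '
            soffice=$1; outDir=$2; shift 2
            # Each worker uses its own throwaway profile directory, like the
            # per-batch UserInstallation in the script above.
            profile=$(mktemp -d)
            "$soffice" "-env:UserInstallation=file://$profile" \
                --headless --convert-to pdf --outdir "$outDir" "$@"
            status=$?
            rm -rf "$profile"
            exit "$status"
        ' _ "$SOFFICE" "$outDir"
}

# Usage:
#   convert_in_batches /path/to/rtf /path/to/pdf
```

The input files stay where they are, so a failed run can simply be started again. Whether this is faster than the balancer directories would need to be measured.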

bobbybouwmann

All options are probably fine. Use whatever you already know ;)

Also, if you're toying with AWS it might be nice to look into lambdas. Basically a function you can fire for each document on AWS. No server needed ;)
