Converting Office Documents on Linux Without MS Office


You’re working on a Linux server. A client sends a .docx file that needs to become a PDF. Or you need to convert 50 .odt files to HTML for a documentation site. Or you’re maintaining docs in Markdown, but stakeholders want Word files.

You don’t have Microsoft Office. You don’t want Microsoft Office. And honestly, you shouldn’t need it.

Linux has solid tools for document conversion. Some work great. Some have quirks. Here’s what actually works.

LibreOffice Headless Mode

LibreOffice isn’t just a desktop app. It has a command-line mode that handles conversions without opening the GUI.

Basic conversion syntax:

bash

libreoffice --headless --convert-to pdf document.docx

This works for most common formats. DOCX to PDF, ODT to DOCX, DOC to HTML, spreadsheets to CSV.

Multiple files:

bash

libreoffice --headless --convert-to pdf *.docx

Specify output directory:

bash

libreoffice --headless --convert-to pdf --outdir ./pdfs *.docx

What works well:

  • Simple documents with standard formatting
  • Spreadsheets to PDF or CSV
  • Batch processing multiple files
  • Basic presentations to PDF

What breaks:

  • Complex Word templates with custom fonts
  • Documents using Windows-specific font rendering
  • Files with embedded objects or unusual formatting
  • Precise layout matching (margins might shift slightly)

The conversion is good enough for most internal docs. For client-facing materials where formatting matters, you might need something more reliable.
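
For batch jobs that run repeatedly, it can help to skip documents whose PDF is already up to date. The following is a minimal sketch of that idea, not a LibreOffice feature: the function names `needs_conversion` and `convert_changed` and the `./pdfs` directory are made up for illustration.

```shell
#!/bin/bash
# Sketch: only reconvert documents whose PDF is missing or older than the source.
# needs_conversion, convert_changed and ./pdfs are illustrative names (assumptions).

needs_conversion() {
    # $1 = source document, $2 = expected output file
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

convert_changed() {
    # $1 = output directory, remaining arguments = source documents
    outdir=$1
    shift
    mkdir -p "$outdir"
    for src in "$@"; do
        [ -e "$src" ] || continue          # an unmatched glob stays literal; skip it
        name=$(basename "$src")
        out="$outdir/${name%.*}.pdf"       # same base name, new extension
        if needs_conversion "$src" "$out"; then
            libreoffice --headless --convert-to pdf --outdir "$outdir" "$src"
        fi
    done
}

# Example: convert_changed ./pdfs *.docx
```

The timestamp comparison keeps reruns cheap, since each LibreOffice start-up is comparatively slow.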

unoconv: The Python Wrapper

unoconv wraps LibreOffice’s conversion engine with a cleaner interface. It’s a Python script that calls LibreOffice’s UNO bindings underneath.

Install on Debian/Ubuntu:

bash

sudo apt install unoconv

Convert to PDF:

bash

unoconv -f pdf document.docx

Convert to HTML:

bash

unoconv -f html report.odt

The advantage is that unoconv handles LibreOffice’s process management better. Early versions of LibreOffice headless mode could leave zombie processes running. unoconv cleans up properly.

Batch script example:

bash

#!/bin/bash
for file in *.docx; do
    unoconv -f pdf "$file"
    echo "Converted $file"
done

unoconv supports formats like txt, html, xml, csv, xls, xlsx, doc, docx, odt, pdf, ppt, and more.

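Important: unoconv needs LibreOffice installed. It’s not a separate converter; it calls LibreOffice under the hood, so the same limitations apply. The upstream project now describes unoconv as deprecated in favor of its successor, unoserver, so check that your distribution still packages it before building a pipeline around it.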

Pandoc for Markdown and Lightweight Formats

Pandoc is different. It’s a document converter that understands markup languages really well. It doesn’t rely on LibreOffice.

Install:

bash

sudo apt install pandoc

Markdown to DOCX:

bash

pandoc -s document.md -o document.docx

DOCX to Markdown:

bash

pandoc document.docx -o document.md

Markdown to PDF (requires LaTeX):

bash

pandoc document.md -o document.pdf

Note about PDF generation: By default, Pandoc produces PDFs through LaTeX (the pdflatex engine). You’ll need to install a LaTeX distribution for PDF output:

bash

sudo apt install texlive

For a lighter alternative, you can use wkhtmltopdf or weasyprint as the PDF engine, but you’ll need to install those separately and specify them with --pdf-engine.
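
Which engine is available tends to differ between machines, so a script can probe for one instead of hard-coding it. This is a rough sketch under that assumption; `first_available` and `md_to_pdf` are hypothetical helper names, and the order of the candidate list is just one reasonable preference.

```shell
#!/bin/bash
# Sketch: use the first PDF engine that is actually installed.
# first_available and md_to_pdf are illustrative helpers, not Pandoc features.

first_available() {
    # Print the first argument that resolves to a command; fail if none does.
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1
}

md_to_pdf() {
    # $1 = Markdown input, $2 = PDF output
    engine=$(first_available xelatex pdflatex weasyprint wkhtmltopdf) || {
        echo "No supported PDF engine found" >&2
        return 1
    }
    pandoc "$1" -o "$2" --pdf-engine="$engine"
}

# Example: md_to_pdf document.md document.pdf
```

Keep in mind that the LaTeX and HTML-based engines lay out pages differently, so output from different machines may not look identical.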

Pandoc shines when you’re working with text-based formats. Markdown to HTML, reStructuredText to PDF, LaTeX conversions. It handles the structure and formatting intelligently.

Where Pandoc wins:

  • Converting between markup formats
  • Generating documentation from Markdown
  • Creating PDFs from Markdown with proper styling (when LaTeX is installed)
  • Batch converting docs for static site generators

Where Pandoc struggles:

  • Complex Word documents with precise layouts
  • Excel spreadsheets (not designed for this)
  • Files with heavy formatting or embedded objects
  • Proprietary binary formats

Pandoc is the tool when you control the input format. If you write docs in Markdown and need to export to various formats, Pandoc is perfect.
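
To make the "one source, many outputs" idea concrete, here is a small sketch that exports a single Markdown file to several formats. The function name `export_formats` and the example paths are assumptions for illustration; the sketch relies on Pandoc inferring the output format from the file extension.

```shell
#!/bin/bash
# Sketch: export one Markdown source to several formats with Pandoc.
# export_formats is an illustrative name (assumption).

export_formats() {
    # $1 = Markdown source, $2 = output directory, remaining arguments = target extensions
    src=$1
    outdir=$2
    shift 2
    base=$(basename "$src" .md)
    mkdir -p "$outdir"
    for ext in "$@"; do
        # -s produces a standalone document (a full HTML page rather than a fragment)
        if pandoc -s "$src" -o "$outdir/$base.$ext"; then
            echo "Wrote $outdir/$base.$ext"
        else
            echo "Failed: $base.$ext" >&2
        fi
    done
}

# Example: export_formats docs/guide.md build html docx odt
```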

Handling Edge Cases

Fonts are the enemy.

A document created on Windows with Calibri or Times New Roman might render differently on Linux. LibreOffice substitutes fonts. Sometimes this is fine. Sometimes it breaks pagination.

Solution: Install Microsoft core fonts:

bash

sudo apt install ttf-mscorefonts-installer

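This gets you Arial, Times New Roman, Courier New, and other common Windows fonts. Not perfect but closer. You’ll need to accept Microsoft’s EULA during installation. Note that the package does not include Calibri; the metric-compatible Carlito font (fonts-crosextra-carlito on Debian/Ubuntu) is a common substitute that keeps line breaks closer to the original.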
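
Before converting a batch, it may be worth checking which of the fonts you expect are actually visible to fontconfig. The sketch below assumes the `fc-list` utility is installed; `missing_fonts` is a hypothetical helper, and the font names in the example are placeholders for whatever your documents use.

```shell
#!/bin/bash
# Sketch: print the fonts from a wanted list that are not installed.
# missing_fonts is an illustrative helper (assumption). It reads installed
# family names on stdin so it can be fed from fc-list.

missing_fonts() {
    installed=$(cat)
    for font in "$@"; do
        # Case-insensitive substring match, so "Arial" also matches "Arial Black".
        printf '%s\n' "$installed" | grep -qiF -- "$font" || printf '%s\n' "$font"
    done
}

# Example: fc-list : family | missing_fonts "Arial" "Times New Roman" "Calibri"
```

An empty result means every requested name matched something; anything printed is a font LibreOffice will have to substitute.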

Embedded objects break.

Word documents with embedded Excel charts, Visio diagrams, or proprietary objects often don’t convert cleanly. LibreOffice tries. Sometimes you get a placeholder. Sometimes the object disappears.

No clean Linux-native fix for this. The objects use Windows-specific rendering.

Macros don’t transfer.

If the document has VBA macros, they won’t work in LibreOffice. The conversion process strips them or renders them non-functional.
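
Since these problems are easier to handle when you know about them up front, a pre-flight check can flag risky files before conversion. The sketch below assumes `unzip` is installed and relies on the OOXML convention that macros live in a `vbaProject.bin` part and embedded objects typically sit under an `embeddings/` folder. `risky_parts` and `preflight` are hypothetical names. It does not cover legacy binary .doc or .xls files, which are not ZIP archives.

```shell
#!/bin/bash
# Sketch: flag OOXML files (.docx, .docm, .xlsx, .xlsm, ...) containing parts
# that tend not to survive conversion. Function names are illustrative (assumptions).

risky_parts() {
    # Reads a ZIP listing on stdin and prints one warning per risky feature found.
    listing=$(cat)
    case $listing in
        *vbaProject.bin*) echo "contains VBA macros" ;;
    esac
    case $listing in
        */embeddings/*) echo "contains embedded objects" ;;
    esac
}

preflight() {
    # Arguments = documents to inspect; prints "file: warning" lines.
    for f in "$@"; do
        unzip -l "$f" 2>/dev/null | risky_parts | while IFS= read -r msg; do
            printf '%s: %s\n' "$f" "$msg"
        done
    done
}

# Example: preflight submissions/*.docx submissions/*.docm
```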

Scripting Production Workflows

Real scenario: You run a documentation site. Contributors write in Markdown. Clients want PDF downloads. You need automated conversion.

Sample script (requires LaTeX for PDF generation; the xelatex engine used here ships in the texlive-xetex package on Debian/Ubuntu):

bash

#!/bin/bash
# Convert all Markdown files in docs/ to PDF
mkdir -p pdfs   # pandoc does not create the output directory
for md in docs/*.md; do
    filename=$(basename "$md" .md)
    pandoc "$md" -o "pdfs/$filename.pdf" \
        --pdf-engine=xelatex \
        --variable mainfont="DejaVu Sans" \
        --variable geometry:margin=1in
    echo "Generated pdfs/$filename.pdf"
done

Another scenario: Batch converting client submissions. They send DOCX, you need PDF for archival.

bash

#!/bin/bash
# Convert every submitted DOCX to PDF in converted/
mkdir -p converted
for docx in submissions/*.docx; do
    libreoffice --headless --convert-to pdf \
        --outdir converted "$docx"
done

Add error handling:

bash

#!/bin/bash
for docx in *.docx; do
    if libreoffice --headless --convert-to pdf "$docx" 2>/dev/null; then
        echo "✓ $docx converted"
    else
        echo "✗ $docx failed"
    fi
done
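
The exit status alone may not tell the whole story: LibreOffice has been reported to exit with status 0 in some cases where no output file was produced. A more defensive sketch, under that assumption, treats a conversion as successful only if a non-empty PDF appears, and records failures. `convert_checked` and `failed.log` are illustrative names.

```shell
#!/bin/bash
# Sketch: judge success by the presence of a non-empty PDF and log failures.
# convert_checked and failed.log are illustrative names (assumptions).

convert_checked() {
    # $1 = output directory, remaining arguments = documents to convert
    outdir=$1
    shift
    mkdir -p "$outdir"
    failed=0
    for src in "$@"; do
        name=$(basename "$src")
        pdf="$outdir/${name%.*}.pdf"
        rm -f "$pdf"                       # avoid mistaking a stale PDF for success
        # The exit status is deliberately ignored; the output file is checked instead.
        libreoffice --headless --convert-to pdf --outdir "$outdir" "$src" >/dev/null 2>&1 || true
        if [ -s "$pdf" ]; then
            echo "✓ $src converted"
        else
            echo "✗ $src failed"
            echo "$src" >> "$outdir/failed.log"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]                    # non-zero return if anything failed
}

# Example: convert_checked converted submissions/*.docx || echo "Some files failed" >&2
```

Returning a non-zero status lets a cron job or CI step notice the failure, while the log gives you a list of files to retry or convert by other means.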

When Linux Tools Aren’t Enough

Sometimes you need perfect fidelity. A contract with specific formatting. A report that must match the original exactly. A presentation with precise layout.

LibreOffice gets you 90% there. That last 10% is where things break. Fonts render slightly differently. Margins shift. Embedded objects go missing.

For these cases, a dedicated web-based conversion service can sometimes handle the edge cases better. These services are built specifically for format conversion, and some preserve complex formatting more accurately than the open-source tools. Test with a representative file before relying on one.

This isn’t a knock against Linux tools—they’re excellent for most use cases. But when you’re dealing with legal docs, client deliverables, or anything where formatting precision matters, having a backup option saves time.

Choosing the Right Tool

Quick decision tree:

  • Markdown or plain text source: Use Pandoc
  • Batch converting office docs: Use LibreOffice headless or unoconv
  • Simple internal documents: LibreOffice headless works fine
  • Complex formatting matters: Test first, have a backup plan
  • Automated pipeline: Script LibreOffice or unoconv with error handling
  • Need PDFs from Markdown: Install Pandoc + LaTeX (texlive)
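
The first two branches of this list can be sketched as a small dispatcher that picks a tool from the file extension. `choose_tool` is a hypothetical helper, and the extension groups are illustrative rather than exhaustive.

```shell
#!/bin/bash
# Sketch: map a file extension to the tool suggested in the list above.
# choose_tool and the extension groups are illustrative (assumptions).

choose_tool() {
    # Lowercase the extension so Report.DOCX and report.docx are treated alike.
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case $ext in
        md|markdown|rst|tex|txt) echo "pandoc" ;;
        doc|docx|odt|rtf|xls|xlsx|ods|ppt|pptx|odp) echo "libreoffice" ;;
        *) echo "unknown" ;;
    esac
}

# Example: for f in inbox/*; do echo "$f -> $(choose_tool "$f")"; done
```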

The Linux ecosystem has solid conversion tools. They’re free, scriptable, and work well for most documents. Know their limitations, test your specific use cases, and have alternatives ready when precision matters.