Converting Office Documents Without Microsoft Office: Linux-Native Solutions
You’re working on a Linux server. A client sends a .docx file that needs to become a PDF. Or you need to convert 50 .odt files to HTML for a documentation site. Or you’re maintaining docs in Markdown, but stakeholders want Word files.
You don’t have Microsoft Office. You don’t want Microsoft Office. And honestly, you shouldn’t need it.
Linux has solid tools for document conversion. Some work great. Some have quirks. Here’s what actually works.
LibreOffice Headless Mode
LibreOffice isn’t just a desktop app. It has a command-line mode that handles conversions without opening the GUI.
Basic conversion syntax:
bash
libreoffice --headless --convert-to pdf document.docx
This works for most common formats: DOCX to PDF, ODT to DOCX, DOC to HTML, spreadsheets to CSV.
Multiple files:
bash
libreoffice --headless --convert-to pdf *.docx
Specify output directory:
bash
libreoffice --headless --convert-to pdf --outdir ./pdfs *.docx
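With --outdir, LibreOffice keeps the input's base name and swaps the extension, so you can predict the output path and check that it actually appeared. Here's a minimal sketch of that idea. The function names expected_output and convert_checked are made up for illustration, not part of LibreOffice.

```bash
#!/bin/bash
# Hypothetical helper: predict the path LibreOffice will write to.
# Usage: expected_output <source-file> <format> [outdir]
expected_output() {
    local src=$1 format=$2 outdir=${3:-.}
    local base
    base=$(basename "$src")
    # Strip only the last extension, then append the target format.
    printf '%s/%s.%s\n' "$outdir" "${base%.*}" "$format"
}

# Hypothetical wrapper: convert one file and succeed only if the
# predicted output file exists afterwards.
convert_checked() {
    local src=$1 format=$2 outdir=${3:-.}
    local out
    out=$(expected_output "$src" "$format" "$outdir")
    mkdir -p "$outdir"
    libreoffice --headless --convert-to "$format" --outdir "$outdir" "$src" >/dev/null 2>&1
    [ -f "$out" ]
}
```

Something like convert_checked report.docx pdf ./pdfs || echo "report.docx failed" then gives you a yes/no answer per file. It won't notice a stale output left over from an earlier run, so clear the output directory first if that matters.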
What works well:
- Simple documents with standard formatting
- Spreadsheets to PDF or CSV
- Batch processing multiple files
- Basic presentations to PDF
What breaks:
- Complex Word templates with custom fonts
- Documents using Windows-specific font rendering
- Files with embedded objects or unusual formatting
- Precise layout matching (margins might shift slightly)
The conversion is good enough for most internal docs. For client-facing materials where formatting matters, you might need something more reliable.
unoconv: The Python Wrapper
unoconv wraps LibreOffice’s conversion engine with a cleaner interface. It’s a Python script that calls LibreOffice’s UNO bindings underneath.
Install on Debian/Ubuntu:
bash
sudo apt install unoconv
Convert to PDF:
bash
unoconv -f pdf document.docx
Convert to HTML:
bash
unoconv -f html report.odt
The advantage is that unoconv handles LibreOffice’s process management better. Early versions of LibreOffice headless mode could leave stray processes running after a conversion. unoconv cleans up properly.
Batch script example:
bash
#!/bin/bash
for file in *.docx; do
    # Report success only when unoconv exits cleanly
    unoconv -f pdf "$file" && echo "Converted $file"
done
unoconv supports formats like txt, html, xml, csv, xls, xlsx, doc, docx, odt, pdf, ppt, and more.
Important: unoconv needs LibreOffice installed. It’s not a separate converter—it’s calling LibreOffice under the hood. Same limitations apply.
Pandoc for Markdown and Lightweight Formats
Pandoc is different. It’s a document converter that understands markup languages really well. It doesn’t rely on LibreOffice.
Install:
bash
sudo apt install pandoc
Markdown to DOCX:
bash
pandoc -s document.md -o document.docx
DOCX to Markdown:
bash
pandoc document.docx -o document.md
Markdown to PDF (requires LaTeX):
bash
pandoc document.md -o document.pdf
Note about PDF generation: Pandoc’s default PDF engine is LaTeX. You’ll need to install a LaTeX distribution for PDF output:
bash
sudo apt install texlive
For a lighter alternative, you can use wkhtmltopdf or weasyprint as the PDF engine, but you’ll need to install those separately and specify them with --pdf-engine.
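If your scripts run on machines with different setups, you can pick whichever engine happens to be installed. This is a rough sketch, not a recommendation of any particular order. pick_pdf_engine is a made-up helper name, and the candidate list is whatever you pass in.

```bash
#!/bin/bash
# Hypothetical helper: print the first candidate command found on PATH.
# Returns non-zero if none of the candidates is installed.
pick_pdf_engine() {
    local engine
    for engine in "$@"; do
        if command -v "$engine" >/dev/null 2>&1; then
            printf '%s\n' "$engine"
            return 0
        fi
    done
    return 1
}
```

You could then run engine=$(pick_pdf_engine xelatex pdflatex weasyprint wkhtmltopdf) && pandoc document.md -o document.pdf --pdf-engine="$engine". Keep in mind that the LaTeX and HTML-based engines produce different-looking PDFs, so output will vary between machines.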
Pandoc shines when you’re working with text-based formats. Markdown to HTML, reStructuredText to PDF, LaTeX conversions. It handles the structure and formatting intelligently.
Where Pandoc wins:
- Converting between markup formats
- Generating documentation from Markdown
- Creating PDFs from Markdown with proper styling (when LaTeX is installed)
- Batch converting docs for static site generators
Where Pandoc struggles:
- Complex Word documents with precise layouts
- Excel spreadsheets (not designed for this)
- Files with heavy formatting or embedded objects
- Proprietary binary formats
Pandoc is the tool when you control the input format. If you write docs in Markdown and need to export to various formats, Pandoc is perfect.
Handling Edge Cases
Fonts are the enemy.
A document created on Windows with Calibri or Times New Roman might render differently on Linux. LibreOffice substitutes fonts. Sometimes this is fine. Sometimes it breaks pagination.
Solution: Install Microsoft core fonts:
bash
sudo apt install ttf-mscorefonts-installer
This gets you Arial, Times New Roman, Courier New, and other common Windows fonts. It does not include newer Office fonts such as Calibri, so the result is closer, not perfect. You’ll need to accept Microsoft’s EULA during installation.
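Before a batch run, it helps to know which fonts will get substituted. Here's a small sketch that compares a list of fonts you care about against what fontconfig reports. missing_fonts is a hypothetical helper, and it assumes the fc-list utility from fontconfig is installed.

```bash
#!/bin/bash
# Hypothetical helper: read installed font family names on stdin and
# print each requested family that does not appear in that list.
# Matching is a case-insensitive substring test, so "Arial" is also
# satisfied by "Arial Black". Good enough for a quick pre-flight check.
missing_fonts() {
    local installed font
    installed=$(cat)
    for font in "$@"; do
        if ! printf '%s\n' "$installed" | grep -qiF -- "$font"; then
            printf '%s\n' "$font"
        fi
    done
}
```

Typical use would be fc-list : family | missing_fonts Calibri "Times New Roman" Arial. Anything it prints is a font LibreOffice will have to substitute.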
Embedded objects break.
Word documents with embedded Excel charts, Visio diagrams, or proprietary objects often don’t convert cleanly. LibreOffice tries. Sometimes you get a placeholder. Sometimes the object disappears.
No clean Linux-native fix for this. The objects use Windows-specific rendering.
Macros don’t transfer.
If the document has VBA macros, they won’t work in LibreOffice. The conversion process strips them or renders them non-functional.
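You can flag macro-carrying files before converting them. In the zip-based Office formats (.docx, .docm, .xlsm and friends), VBA code lives in a part named vbaProject.bin, so listing the archive is enough. This is a sketch only: the function names are made up, it assumes unzip is installed, and it won't work on the legacy binary .doc and .xls formats, which aren't zip archives.

```bash
#!/bin/bash
# Hypothetical helper: read an archive listing on stdin and succeed
# if it mentions a VBA project part.
lists_vba_project() {
    grep -q 'vbaProject\.bin'
}

# Hypothetical pre-flight check for zip-based Office files.
# Assumes the unzip utility is available.
has_macros() {
    unzip -l "$1" 2>/dev/null | lists_vba_project
}
```

For example, has_macros report.docm && echo "report.docm contains macros" lets you set those files aside for manual review.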
Scripting Production Workflows
Real scenario: You run a documentation site. Contributors write in Markdown. Clients want PDF downloads. You need automated conversion.
Sample script (requires a LaTeX installation that includes XeLaTeX; on Debian/Ubuntu that is the texlive-xetex package):
bash
#!/bin/bash
# Convert all Markdown files to PDF
mkdir -p pdfs   # make sure the output directory exists
for md in docs/*.md; do
    filename=$(basename "$md" .md)
    # Report success only when pandoc exits cleanly
    pandoc "$md" -o "pdfs/$filename.pdf" \
        --pdf-engine=xelatex \
        --variable mainfont="DejaVu Sans" \
        --variable geometry:margin=1in \
        && echo "Generated pdfs/$filename.pdf"
done
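LaTeX-based PDF generation is slow, so on a site with many pages you probably don't want to rebuild everything on each run. One simple approach is to skip files whose PDF is already newer than the Markdown source. Here's a sketch. needs_rebuild is a hypothetical helper, and it only compares timestamps, so it won't catch a changed template or font.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the output is missing or older
# than the source, meaning a rebuild is needed.
# Usage: needs_rebuild <source> <output>
needs_rebuild() {
    local src=$1 out=$2
    [ ! -e "$out" ] || [ "$src" -nt "$out" ]
}
```

Inside the loop, that would look like needs_rebuild "$md" "pdfs/$filename.pdf" || continue placed just before the pandoc call.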
Another scenario: Batch converting client submissions. They send DOCX, you need PDF for archival.
bash
#!/bin/bash
mkdir -p converted
for docx in submissions/*.docx; do
    libreoffice --headless --convert-to pdf \
        --outdir converted "$docx"
done
Add error handling:
bash
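#!/bin/bash
for docx in *.docx; do
    pdf="${docx%.docx}.pdf"
    # Check for the output file as well: the exit status alone
    # may not reflect a failed conversion.
    if libreoffice --headless --convert-to pdf "$docx" >/dev/null 2>&1 \
        && [ -f "$pdf" ]; then
        echo "✓ $docx converted"
    else
        echo "✗ $docx failed"
    fi
done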
When Linux Tools Aren’t Enough
Sometimes you need perfect fidelity. A contract with specific formatting. A report that must match the original exactly. A presentation with precise layout.
LibreOffice gets you 90% there. That last 10% is where things break. Fonts render slightly differently. Margins shift. Embedded objects go missing.
For these cases, web-based document conversion tools can handle the edge cases better. They’re built specifically for format conversion and usually preserve formatting more accurately than open-source alternatives.
This isn’t a knock against Linux tools—they’re excellent for most use cases. But when you’re dealing with legal docs, client deliverables, or anything where formatting precision matters, having a backup option saves time.
Choosing the Right Tool
Quick decision tree:
- Markdown or plain text source: Use Pandoc
- Batch converting office docs: Use LibreOffice headless or unoconv
- Simple internal documents: LibreOffice headless works fine
- Complex formatting matters: Test first, have a backup plan
- Automated pipeline: Script LibreOffice or unoconv with error handling
- Need PDFs from Markdown: Install Pandoc + LaTeX (texlive)
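If you want that decision tree in a script, the first cut can be a lookup on file extension. This is a sketch under obvious assumptions: pick_tool is a made-up name, the extension lists are illustrative rather than complete, and an extension tells you nothing about how complex the formatting inside is.

```bash
#!/bin/bash
# Hypothetical helper: suggest a converter based on file extension.
# Prints "pandoc", "libreoffice", or "unknown" (with non-zero status).
pick_tool() {
    local ext
    # Take the text after the last dot and lowercase it.
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        md|markdown|rst|tex|txt)
            echo pandoc ;;
        doc|docx|odt|xls|xlsx|ods|ppt|pptx|odp)
            echo libreoffice ;;
        *)
            echo unknown
            return 1 ;;
    esac
}
```

A pipeline could call tool=$(pick_tool "$file") and branch on the result, sending anything marked unknown to a manual review pile.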
The Linux ecosystem has solid conversion tools. They’re free, scriptable, and work well for most documents. Know their limitations, test your specific use cases, and have alternatives ready when precision matters.



