June 17, 2026·4 min read·Long Nguyen · CTO, ERPFit·Updated June 17, 2026
How much can you really compress a PDF? Honest benchmark

Ghostscript shrinks real scanned PDFs by 78% to 88%; already-optimized files shrink by only 3% to 6%. An honest benchmark of 2 engines × 3 levels on real files.

When building Compress PDF — our Craft suite's PDF compressor, a direct competitor to iLovePDF — we needed to answer one simple question: how much does PDF compression actually save, and which approach works best on real files?

Instead of showing off a few pretty numbers, we measured on real PDFs: multi-page scanned documents, image-heavy files, and plain-text PDFs alike — and we don't hide the cases where compression barely helps. That's the spirit of our whole honest benchmark series.

2 engines tested

| Engine | How it works | License | Best for |
|---|---|---|---|
| Ghostscript + qpdf | Downsample images + re-encode JPEG (lossy) → lossless cleanup | AGPL | Scanned / image-heavy PDFs |
| pdfcpu + qpdf | Lossless structural optimization (object streams, dedup) | Apache (permissive) | Text / already-optimized PDFs |

The key insight: in a scanned or image-heavy PDF, the bytes live in the images. On those files, the only engine that meaningfully shrinks the file is the one willing to downsample images (Ghostscript). pdfcpu compresses "safely" but barely touches images.

3 compression levels

| Level | Image DPI | Equivalent | Use when |
|---|---|---|---|
| Less | 200 DPI | print quality | you need high quality |
| Recommended | 150 DPI | good on screen | balanced (default) |
| Extreme | 72 DPI | quick viewing | smallest possible |

Methodology

The sample set is 8 PDF files: 6 real scanned documents from 4 to 100 pages (pulled from the public portal vanban.chinhphu.vn so the numbers stay verifiable) plus 2 internal sample files. Each file was compressed with both engines at all three levels, then we measured before/after size and processing time.

Test setup: everything runs server-side via CLI binaries, no GPU, on an ERPFit Linux x86 VM. Ghostscript uses -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 with -dColorImageResolution set per level (200 / 150 / 72 DPI) and -dDownsampleColorImages=true. pdfcpu runs its default optimize command, then both pass through a qpdf --linearize cleanup step.
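The setup above can be sketched as three small command builders. This is an illustration under stated assumptions, not the exact production invocation: the function names are hypothetical, and the `-dNOPAUSE`, `-dBATCH`, `-dQUIET` and `-sOutputFile` switches are standard Ghostscript batch options that the post does not list.

```python
from typing import List

# DPI per level, taken from the levels table in this post.
LEVEL_DPI = {"less": 200, "recommended": 150, "extreme": 72}


def ghostscript_cmd(src: str, dst: str, level: str = "recommended") -> List[str]:
    """Build a Ghostscript command line for one compression level."""
    dpi = LEVEL_DPI[level]
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.5",
        "-dDownsampleColorImages=true",
        f"-dColorImageResolution={dpi}",
        # Assumption: the switches below are not named in the post. They are
        # standard Ghostscript options for a non-interactive batch run.
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


def pdfcpu_cmd(src: str, dst: str) -> List[str]:
    """pdfcpu's default optimize command, as described in the test setup."""
    return ["pdfcpu", "optimize", src, dst]


def qpdf_cmd(src: str, dst: str) -> List[str]:
    """The qpdf pass that both engines' outputs go through."""
    return ["qpdf", "--linearize", src, dst]
```

Each list can be handed to `subprocess.run(cmd, check=True)` on a machine where the binaries are installed; building the command separately from running it keeps the level-to-DPI mapping easy to inspect.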

Every result is compared against the original and we keep the smaller one. The numbers below are real measurements, not rounded to look good. The same approach drives our text summarization benchmark and the open-source stack we rely on.

Results — scanned / image-heavy PDFs

This is where PDF compression delivers the most. Extreme level, Ghostscript engine (ERPFit internal benchmark, Jun 2026):

| File (real) | Before | After | Saved |
|---|---|---|---|
| Scanned document · 4 pages | 2.06 MB | 250 KB | −88% |
| Scanned document · 5 pages | 2.85 MB | 408 KB | −86% |
| Scanned document · 10 pages | 2.96 MB | 672 KB | −78% |
| Photo sample · 4 pages | 364 KB | 48 KB | −87% |
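For readers who want to check the "Saved" column, here is a minimal sketch of the arithmetic. It assumes binary units (1 MB = 1024 KB), which the post does not state; under that assumption the rounded results match the table. The helper name `saved_percent` is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB  # assumption: sizes in the table use binary units


def saved_percent(before_bytes: float, after_bytes: float) -> int:
    """Size change in percent, rounded to an integer (negative means smaller)."""
    return round((after_bytes / before_bytes - 1) * 100)


rows = [
    ("Scanned document, 4 pages", 2.06 * MIB, 250 * KIB),
    ("Scanned document, 5 pages", 2.85 * MIB, 408 * KIB),
    ("Scanned document, 10 pages", 2.96 * MIB, 672 * KIB),
    ("Photo sample, 4 pages", 364 * KIB, 48 * KIB),
]

for name, before, after in rows:
    print(f"{name}: {saved_percent(before, after)}%")
```

With decimal units (1 MB = 1000 KB) the 10-page row would round to −77% instead of −78%, so the unit convention matters at the margin.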

Not every file compresses

Most online tools hide this. We don't. Some PDFs are already optimized — their images are already low-resolution or heavily compressed — so there's almost nothing left to squeeze:

| File (real) | Before | After | Saved |
|---|---|---|---|
| Scanned document · 100 pages | 4.07 MB | 3.96 MB | −3% |
| Scanned document · 31 pages | 1.28 MB | 1.20 MB | −6% |

Even at "Less" compression, re-encoding an already-compressed image can make the file bigger. So the tool always compares the result against the original and keeps the smaller one — you never get back a file larger than you started with.
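The keep-the-smaller rule is simple enough to sketch in a few lines. This is an illustration rather than the tool's actual code; the function name and return values are hypothetical.

```python
import os
import shutil


def keep_smaller(original: str, candidate: str, output: str) -> str:
    """Copy whichever file is smaller to `output`.

    Returns "compressed" if the candidate won, otherwise "original".
    Ties go to the original, so the output is never larger than the input.
    """
    if os.path.getsize(candidate) < os.path.getsize(original):
        shutil.copyfile(candidate, output)
        return "compressed"
    shutil.copyfile(original, output)
    return "original"
```

Comparing on-disk sizes after the full pipeline has run (rather than trusting an estimate) is what makes the "never bigger" guarantee hold for every file.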

What about the "safe" engine (pdfcpu)?

pdfcpu is lossless and permissively licensed (Apache), but on the same scanned files it only shaves 0–1%. It's the right choice for pure-text PDFs, or when you must avoid the AGPL license — not for shrinking scans.

| File type | Ghostscript | pdfcpu |
|---|---|---|
| Scanned document, 4 pages | −88% | −1% |
| Pure-text PDF | −20% | −21% |

Speed

Everything runs server-side via CLI binaries (no GPU needed). Ghostscript at extreme: ~200–600 ms for a few-page file; a 100-page scanned document processed in ~580 ms. pdfcpu is faster (~30–200 ms) because it does less work.

Predicting output size before you compress

iLovePDF gives you 3 buttons and hides every number. We do the opposite: before compressing, the tool runs Ghostscript on the first 3 pages and extrapolates the full-file size across all 3 levels. This real-sampling approach is far more accurate than a guesswork formula — within ~10–20% — and it correctly handles PDFs with "hidden" images that structural analyzers miss.
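The post does not spell out how the extrapolation is done, so the following is only one plausible sketch: scale the compressed size of the sampled pages linearly to the full page count, and cap the result at the original size. The function name and parameters are hypothetical. The sample itself could be produced by limiting a Ghostscript run with its `-dFirstPage` and `-dLastPage` options, though the post does not say whether the tool works that way.

```python
def extrapolate_size(
    sample_bytes: int,
    sample_pages: int,
    total_pages: int,
    original_bytes: int,
) -> int:
    """Estimate the full compressed size from a compressed page sample.

    Assumption: pages are roughly uniform, so size scales linearly with
    page count. Real documents vary, which is one source of estimate error.
    """
    if sample_pages <= 0 or total_pages <= 0:
        raise ValueError("page counts must be positive")
    # A short document may have fewer pages than the requested sample.
    sampled = min(sample_pages, total_pages)
    estimate = round(sample_bytes / sampled * total_pages)
    # The tool keeps the smaller file, so never predict above the original.
    return min(estimate, original_bytes)


# Hypothetical numbers: 3 sampled pages compress to 180,000 bytes
# in a 10-page, 2,960,000-byte document.
print(extrapolate_size(180_000, 3, 10, 2_960_000))
```

Running one estimate per level (200, 150 and 72 DPI) would give the three predicted sizes shown before compression.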

Automatic engine selection

Auto mode estimates first, then routes: image-heavy PDFs → Ghostscript, text/already-optimized → pdfcpu, always keeping the smaller result. You don't need to understand the internals — just drop your file.
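One way to picture the routing step is a comparison of the two engines' predicted sizes. This is a sketch under assumptions: the post does not describe the actual rule, and both the function name and the 5% `margin` are hypothetical. The idea is to prefer the lossless engine unless the lossy one is clearly ahead.

```python
def choose_engine(gs_bytes: int, pdfcpu_bytes: int, margin: float = 0.05) -> str:
    """Pick an engine from predicted output sizes.

    Prefers lossless pdfcpu unless Ghostscript's predicted output is
    smaller by more than `margin` (a hypothetical threshold).
    """
    if gs_bytes < pdfcpu_bytes * (1 - margin):
        return "ghostscript"
    return "pdfcpu"


# Predicted sizes as a percentage of the original, using figures from this post.
print(choose_engine(12, 99))  # scanned document: −88% vs −1%
print(choose_engine(80, 79))  # pure-text PDF: −20% vs −21%
```

With these inputs the scan routes to Ghostscript and the text PDF to pdfcpu, which matches the behavior described above. The final keep-the-smaller check still applies after whichever engine runs.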

Conclusion

  • Scanned / image-heavy PDFs: −78% to −88% at extreme — that's most "heavy" real-world PDFs.
  • Already-optimized / pure-text PDFs: tiny gains, and we say so instead of pretending.
  • Never makes a file bigger — always keeps the smaller version.
  • Transparent: shows real DPI, real levels, and a predicted size before you click.

→ Try it now at pdf.erpfit.com

Frequently asked questions

Does compressing a PDF lose text or reduce quality?
Real text stays sharp because it's vector, not pixels. Only images inside the PDF get downsampled. A scanned document is essentially an image, so the "Extreme" 72 DPI level can look slightly soft when zoomed. Pick the "Recommended" 150 DPI level for a good balance.
What DPI is best for compression?
150 DPI (the "Recommended" level) is the sweet spot for most cases: readable on screen while keeping the file small. Choose 200 DPI for printing, or 72 DPI when you only need quick viewing and the smallest file. You can try all three before downloading.
Is PDF compression safe for legal documents?
Yes, if you keep the resolution high enough. For contracts or documents filed with authorities, use 150–200 DPI so stamps, signatures and small print stay legible. The tool never removes content; it only shrinks image data, and it returns the original file whenever compression would not make it smaller.
Why do already-optimized files barely compress further?
Because the bytes in these PDFs live in their images, and those images are already low-resolution or heavily compressed. There's almost nothing left to remove. In our benchmark, a 100-page scanned document shrank by only 3%. We say so plainly instead of over-promising like many online tools.
How do pdfcpu and Ghostscript differ?
Ghostscript downsamples images (lossy), so it compresses scans hard. pdfcpu only does lossless structural optimization, ideal for pure-text PDFs or when you must avoid the AGPL license. Auto mode picks the right engine for each file.
Long Nguyen
CTO, ERPFit

CTO at ERPFit. Hands-on builder of the technical infrastructure, the Craft tool suite, and the service platform for Vietnamese businesses.
