When building Compress PDF (our Craft suite's PDF compressor and a direct competitor to iLovePDF), we needed to answer two simple questions: how much does PDF compression actually save, and which approach works best on real files?
Instead of showing off a few pretty numbers, we measured on real PDFs: multi-page scanned documents, image-heavy files, and plain-text PDFs alike — and we don't hide the cases where compression barely helps. That's the spirit of our whole honest benchmark series.
2 engines tested
| Engine | How it works | License | Best for |
|---|---|---|---|
| Ghostscript + qpdf | Downsample images + re-encode JPEG (lossy) → lossless cleanup | AGPL | Scanned / image-heavy PDFs |
| pdfcpu + qpdf | Lossless structural optimization (object streams, dedup) | Apache (permissive) | Text / already-optimized PDFs |
The key insight: in a scanned or image-heavy PDF, most of the bytes live in the images. On those files, the only engine that meaningfully shrinks the output is the one willing to downsample images (Ghostscript). pdfcpu compresses "safely" but barely touches images.
3 compression levels
| Level | Image DPI | Equivalent | Use when |
|---|---|---|---|
| Less | 200 DPI | print quality | you need high quality |
| Recommended | 150 DPI | good on screen | balanced (default) |
| Extreme | 72 DPI | quick viewing | smallest possible |
Methodology
The sample set is 8 PDF files: 6 real scanned documents from 4 to 100 pages (pulled from the public portal vanban.chinhphu.vn so the numbers stay verifiable) plus 2 internal sample files. Each file was compressed with both engines at all three levels, then we measured before/after size and processing time.
Test setup: everything runs server-side via CLI binaries, no GPU, on an ERPFit Linux x86 VM. Ghostscript uses -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 with -dColorImageResolution set per level (200 / 150 / 72 DPI) and -dDownsampleColorImages=true. pdfcpu runs its default optimize command. The output of each engine then passes through a qpdf --linearize cleanup step.
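To make the setup concrete, here is a minimal Python sketch of how the three command lines might be assembled. The function names and the `LEVEL_DPI` mapping are our own illustration, and the batch-mode flags (`-dNOPAUSE`, `-dBATCH`, `-dQUIET`) are an assumption: they are common Ghostscript options but are not listed in the setup above.

```python
from typing import List

# DPI per level, taken from the compression-levels table.
LEVEL_DPI = {"less": 200, "recommended": 150, "extreme": 72}


def ghostscript_cmd(src: str, dst: str, level: str) -> List[str]:
    """Build a Ghostscript argument list for one compression level."""
    dpi = LEVEL_DPI[level]
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.5",
        "-dDownsampleColorImages=true",
        f"-dColorImageResolution={dpi}",
        # Assumed batch-mode flags; not stated in the article.
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


def pdfcpu_cmd(src: str, dst: str) -> List[str]:
    """pdfcpu's default optimize command."""
    return ["pdfcpu", "optimize", src, dst]


def qpdf_cmd(src: str, dst: str) -> List[str]:
    """qpdf cleanup step applied to either engine's output."""
    return ["qpdf", "--linearize", src, dst]
```

Each list can be handed to `subprocess.run(cmd, check=True)`. Building the arguments as a list rather than a shell string avoids quoting problems with unusual file names.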
Every result is compared against the original and we keep the smaller one. The numbers below are real measurements, not rounded to look good. The same approach drives our text summarization benchmark and the open-source stack we rely on.
Results — scanned / image-heavy PDFs
This is where PDF compression delivers the most. The figures below are for the Extreme level with the Ghostscript engine (ERPFit internal benchmark, Jun 2026):
| File (real) | Before | After | Saved |
|---|---|---|---|
| Scanned document · 4 pages | 2.06 MB | 250 KB | −88% |
| Scanned document · 5 pages | 2.85 MB | 408 KB | −86% |
| Scanned document · 10 pages | 2.96 MB | 672 KB | −78% |
| Photo sample · 4 pages | 364 KB | 48 KB | −87% |
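For anyone who wants to check the "Saved" column, the arithmetic is just one minus the after/before ratio. The sketch below assumes 1 MB = 1024 KB, which is our guess at the convention. The published percentages were presumably computed from exact byte counts, so recomputing from the rounded sizes could differ by a point in other cases.

```python
KB_PER_MB = 1024  # assumption: binary megabytes


def saved_percent(before_kb: float, after_kb: float) -> int:
    """Percentage of the original size removed, rounded to a whole number."""
    return round((1 - after_kb / before_kb) * 100)


# (label, size before in KB, size after in KB), copied from the table.
rows = [
    ("Scanned document, 4 pages", 2.06 * KB_PER_MB, 250),
    ("Scanned document, 5 pages", 2.85 * KB_PER_MB, 408),
    ("Scanned document, 10 pages", 2.96 * KB_PER_MB, 672),
    ("Photo sample, 4 pages", 364, 48),
]

for label, before_kb, after_kb in rows:
    print(f"{label}: -{saved_percent(before_kb, after_kb)}%")
```

Under that assumption the four rows come out to 88, 86, 78 and 87 percent, matching the table.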
Not every file compresses
Most online tools hide this. We don't. Some PDFs are already optimized — their images are already low-resolution or heavily compressed — so there's almost nothing left to squeeze:
| File (real) | Before | After | Saved |
|---|---|---|---|
| Scanned document · 100 pages | 4.07 MB | 3.96 MB | −3% |
| Scanned document · 31 pages | 1.28 MB | 1.20 MB | −6% |
Even at "Less" compression, re-encoding an already-compressed image can make the file bigger. So the tool always compares the result against the original and keeps the smaller one — you never get back a file larger than you started with.
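The "keep the smaller one" rule is simple enough to sketch in a few lines of Python. This is an illustration of the idea, not our production code, and the function name is hypothetical.

```python
from typing import Iterable


def keep_smallest(original: bytes, candidates: Iterable[bytes]) -> bytes:
    """Return the smallest of the original and all compressed candidates.

    The original is listed first, so it wins ties: a candidate is only
    used when it is strictly smaller than what the user uploaded.
    """
    return min([original, *candidates], key=len)
```

For example, `keep_smallest(pdf, [gs_output, pdfcpu_output])` returns `pdf` unchanged whenever neither engine manages to beat it.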
What about the "safe" engine (pdfcpu)?
pdfcpu is lossless and permissively licensed (Apache), but on the same scanned files it only shaves 0–1%. It's the right choice for pure-text PDFs, or when you must avoid the AGPL license — not for shrinking scans.
| File type | Ghostscript | pdfcpu |
|---|---|---|
| Scanned document, 4 pages | −88% | −1% |
| Pure-text PDF | −20% | −21% |
Speed
Everything runs server-side via CLI binaries (no GPU needed). Ghostscript at Extreme takes ~200–600 ms for a few-page file, and the 100-page scanned document was processed in ~580 ms. pdfcpu is faster (~30–200 ms) because it does less work.
Predicting output size before you compress
iLovePDF gives you 3 buttons and hides every number. We do the opposite: before compressing, the tool runs Ghostscript on the first 3 pages and extrapolates the full-file size for all 3 levels. Sampling real pages this way is far more accurate than a guesswork formula (the prediction lands within ~10–20% of the actual result), and it correctly handles PDFs with "hidden" images that structural analyzers miss.
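One simple way to do that extrapolation is to assume the whole file compresses at the same ratio as the sample. The sketch below does exactly that. The linear scaling and the cap at the original size are our assumptions for illustration, and the function name is hypothetical.

```python
def predict_full_size(full_bytes: int, sample_before: int, sample_after: int) -> int:
    """Estimate the compressed size of a whole PDF from a sampled prefix.

    sample_before / sample_after are the sizes of the first pages before
    and after running the engine on them. Assumes the rest of the file
    compresses at the same ratio (a simplification).
    """
    if sample_before <= 0:
        raise ValueError("sample_before must be positive")
    ratio = sample_after / sample_before
    # Cap at 1.0: the tool never returns a file larger than the original.
    return round(full_bytes * min(ratio, 1.0))
```

Running this once per level (200, 150 and 72 DPI) would give the three predicted sizes shown before the user clicks. The estimate is weakest when the first pages are not representative, for example a text cover page in front of a long run of scans.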
Automatic engine selection
Auto mode estimates first, then routes: image-heavy PDFs → Ghostscript, text/already-optimized → pdfcpu, always keeping the smaller result. You don't need to understand the internals — just drop your file.
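The routing step can be pictured as a threshold on the predicted saving. The sketch below is illustrative only: the 10% cut-off and the names are hypothetical, not the values the tool actually uses.

```python
def choose_engine(original_bytes: int, predicted_gs_bytes: int,
                  min_gain: float = 0.10) -> str:
    """Pick an engine from the predicted Ghostscript output size.

    min_gain is a hypothetical threshold: if Ghostscript is predicted to
    save at least that fraction, the file is treated as image-heavy.
    """
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    predicted_saving = 1 - predicted_gs_bytes / original_bytes
    return "ghostscript" if predicted_saving >= min_gain else "pdfcpu"
```

Whichever engine is chosen, its output is still compared against the original afterwards, so a wrong routing decision costs some saving but never produces a larger file.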
Conclusion
- Scanned / image-heavy PDFs: −78% to −88% at extreme — that's most "heavy" real-world PDFs.
- Already-optimized scans: only −3% to −6%. Pure-text PDFs: about −20% with either engine. We say so instead of pretending.
- Never makes a file bigger — always keeps the smaller version.
- Transparent: shows real DPI, real levels, and a predicted size before you click.