Does compressing a PDF lose text or reduce quality?

Real text stays sharp because it's vector, not pixels. Only images inside the PDF get downsampled. A scanned document is essentially an image, so the "Extreme" 72 DPI level can look slightly soft when zoomed. Pick the "Recommended" 150 DPI level for a good balance.

What DPI is best for compression?

150 DPI (the "Recommended" level) is the sweet spot for most cases: readable on screen while keeping the file small. Choose 200 DPI for printing, or 72 DPI when you only need quick viewing and the smallest file. You can try all three before downloading.

Is PDF compression safe for legal documents?

Yes, if you keep the resolution high enough. For contracts or documents filed with authorities, use 150–200 DPI so stamps, signatures and small print stay legible. The tool never removes content; it only shrinks image data, and it returns the compressed file only when that file is smaller than the original.

Why do already-optimized files barely compress further?

Because the bytes in a PDF live in its images, and these files already have low-resolution or heavily compressed images. There's nothing left to remove. In our benchmark, a 100-page scanned document shrank by only 3%. We say so plainly instead of over-promising like many online tools.

How do pdfcpu and Ghostscript differ?

Ghostscript downsamples images (lossy), so it compresses scans hard. pdfcpu only does lossless structural optimization, ideal for pure-text PDFs or when you must avoid the AGPL license. Auto mode picks the right engine for each file.

How much can you really compress a PDF? Honest benchmark

When building Compress PDF — our Craft suite's PDF compressor, a direct competitor to iLovePDF — we needed to answer one simple question: how much does PDF compression actually save, and which approach works best on real files?

Instead of showing off a few pretty numbers, we measured on real PDFs: multi-page scanned documents, image-heavy files, and plain-text PDFs alike — and we don't hide the cases where compression barely helps. That's the spirit of our whole honest benchmark series.

2 engines tested

Engine	How it works	License	Best for
Ghostscript + qpdf	Downsample images + re-encode JPEG (lossy) → lossless cleanup	AGPL	Scanned / image-heavy PDFs
pdfcpu + qpdf	Lossless structural optimization (object streams, dedup)	Apache (permissive)	Text / already-optimized PDFs

The key insight: in a PDF, the bytes live in the images. The only engine that meaningfully shrinks an image-heavy file is the one willing to downsample images (Ghostscript). pdfcpu compresses "safely" but barely touches images.

3 compression levels

Level	Image DPI	Equivalent	Use when
Less	200 DPI	print quality	you need high quality
Recommended	150 DPI	good on screen	balanced (default)
Extreme	72 DPI	quick viewing	smallest possible

Methodology

The sample set is 8 PDF files: 6 real scanned documents from 4 to 100 pages (pulled from the public portal vanban.chinhphu.vn so the numbers stay verifiable) plus 2 internal sample files. Each file was compressed with both engines at all three levels, then we measured before/after size and processing time.

Test setup: everything runs server-side via CLI binaries, no GPU, on an ERPFit Linux x86 VM. Ghostscript uses -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 with -dColorImageResolution set per level (200 / 150 / 72 DPI) and -dDownsampleColorImages=true. pdfcpu runs its default optimize command, then both pass through a qpdf --linearize cleanup step.
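
For readers who want to reproduce the setup, here is a minimal Python sketch of how the commands described above could be assembled. The Ghostscript device and the compatibility, downsampling and resolution flags come from the text; the batch flags (-dNOPAUSE, -dBATCH, -dQUIET), the function names and the file paths are our assumptions for illustration, not the production pipeline.

```python
import subprocess

# DPI per level, taken from the compression-levels table.
LEVEL_DPI = {"less": 200, "recommended": 150, "extreme": 72}


def build_gs_args(src: str, dst: str, level: str) -> list[str]:
    """Build the Ghostscript argument list for one compression level."""
    dpi = LEVEL_DPI[level]
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.5",
        "-dDownsampleColorImages=true",
        f"-dColorImageResolution={dpi}",
        # Assumption: standard batch flags so gs exits without prompting.
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


def build_qpdf_args(src: str, dst: str) -> list[str]:
    """Build the qpdf cleanup command mentioned in the test setup."""
    return ["qpdf", "--linearize", src, dst]


def compress(src: str, tmp: str, dst: str, level: str = "recommended") -> None:
    """Run Ghostscript, then the qpdf cleanup pass (needs both binaries on PATH)."""
    subprocess.run(build_gs_args(src, tmp, level), check=True)
    subprocess.run(build_qpdf_args(tmp, dst), check=True)
```

Note that -dColorImageResolution only covers color images; Ghostscript has separate -dGrayImageResolution and -dMonoImageResolution parameters for grayscale and monochrome images, which this sketch leaves at their defaults.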

Every result is compared against the original and we keep the smaller one. The numbers below are real measurements, not rounded to look good. The same approach drives our text summarization benchmark and the open-source stack we rely on.

Results — scanned / image-heavy PDFs

This is where PDF compression delivers the most. Extreme level, Ghostscript engine (ERPFit internal benchmark, Jun 2026):

File (real)	Before	After	Saved
Scanned document · 4 pages	2.06 MB	250 KB	−88%
Scanned document · 5 pages	2.85 MB	408 KB	−86%
Scanned document · 10 pages	2.96 MB	672 KB	−78%
Photo sample · 4 pages	364 KB	48 KB	−87%
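
The "Saved" column is plain arithmetic, and the short Python sketch below reproduces it from the before/after sizes. It assumes binary units (1 MB = 1024 KB) and rounding to the nearest whole percent; the helper name is ours.

```python
KB = 1024
MB = 1024 * KB


def percent_saved(before_bytes: float, after_bytes: float) -> int:
    """Return the size reduction as a whole percentage of the original size."""
    return round((1 - after_bytes / before_bytes) * 100)


# Rows from the table above: (label, before, after)
rows = [
    ("Scanned document, 4 pages", 2.06 * MB, 250 * KB),
    ("Scanned document, 5 pages", 2.85 * MB, 408 * KB),
    ("Scanned document, 10 pages", 2.96 * MB, 672 * KB),
    ("Photo sample, 4 pages", 364 * KB, 48 * KB),
]

for label, before, after in rows:
    print(f"{label}: -{percent_saved(before, after)}%")
```

Under these assumptions the four rows come out at 88, 86, 78 and 87 percent, matching the table.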

Not every file compresses

Most online tools hide this. We don't. Some PDFs are already optimized — their images are already low-resolution or heavily compressed — so there's almost nothing left to squeeze:

File (real)	Before	After	Saved
Scanned document · 100 pages	4.07 MB	3.96 MB	−3%
Scanned document · 31 pages	1.28 MB	1.20 MB	−6%

Even at "Less" compression, re-encoding an already-compressed image can make the file bigger. So the tool always compares the result against the original and keeps the smaller one — you never get back a file larger than you started with.
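
The "keep the smaller one" rule is simple enough to sketch in a few lines of Python. The function name and paths are hypothetical; the only behaviour taken from the text is that the result is compared against the original and the smaller file wins.

```python
import os
import shutil


def keep_smaller(original_path: str, candidate_path: str, output_path: str) -> str:
    """Copy whichever file is smaller to output_path and return its source path.

    On a tie the original is kept, so the output is never larger than the input.
    """
    if os.path.getsize(candidate_path) < os.path.getsize(original_path):
        chosen = candidate_path
    else:
        chosen = original_path
    shutil.copyfile(chosen, output_path)
    return chosen
```

Keeping the original on a tie avoids handing back a re-encoded file that is no smaller.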

What about the "safe" engine (pdfcpu)?

pdfcpu is lossless and permissively licensed (Apache), but on the same scanned files it only shaves 0–1%. It's the right choice for pure-text PDFs, or when you must avoid the AGPL license — not for shrinking scans.

File type	Ghostscript	pdfcpu
Scanned document, 4 pages	−88%	−1%
Pure-text PDF	−20%	−21%

Speed

Everything runs server-side via CLI binaries (no GPU needed). Ghostscript at the Extreme level takes ~200–600 ms for a few-page file, and the 100-page scanned document took ~580 ms. pdfcpu is faster (~30–200 ms) because it does less work.

Predicting output size before you compress

iLovePDF gives you 3 buttons and hides every number. We do the opposite: before compressing, the tool runs Ghostscript on the first 3 pages and extrapolates the full-file size for all 3 levels. Because it samples the real file instead of applying a guesswork formula, the estimate typically lands within ~10–20% of the final size, and it correctly handles PDFs with "hidden" images that structural analyzers miss.
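
The text does not spell out the extrapolation formula, so the Python sketch below shows one plausible version: scale the compressed size of the sampled pages linearly by page count, then cap the estimate at the original size. The linear scaling, the cap and the function name are all assumptions. One way to produce the sample itself is Ghostscript's -dFirstPage and -dLastPage options.

```python
def estimate_full_size(
    sample_after_bytes: float,
    total_pages: int,
    original_bytes: float,
    sample_pages: int = 3,
) -> float:
    """Estimate the compressed size of the whole file from a compressed sample.

    Assumes every page compresses roughly like the sampled pages.
    """
    # A short document may have fewer pages than the nominal sample size.
    pages_sampled = min(sample_pages, total_pages)
    per_page = sample_after_bytes / pages_sampled
    estimate = per_page * total_pages
    # The tool never returns a file larger than the original, so cap the estimate.
    return min(estimate, original_bytes)


# Example: 3 sampled pages compress to 300 KB in a 10-page, 5 MB file.
print(estimate_full_size(300_000, total_pages=10, original_bytes=5_000_000))
```

A linear estimate like this drifts when the first pages are not representative (a text cover page before a run of scans, for instance), which is one reason to expect an error margin rather than an exact prediction.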

Automatic engine selection

Auto mode estimates first, then routes: image-heavy PDFs → Ghostscript, text/already-optimized → pdfcpu, always keeping the smaller result. You don't need to understand the internals — just drop your file.

Conclusion

Scanned / image-heavy PDFs: −78% to −88% at the Extreme level, and this category covers most "heavy" real-world PDFs.
Already-optimized / pure-text PDFs: tiny gains, and we say so instead of pretending.
Never makes a file bigger — always keeps the smaller version.
Transparent: shows real DPI, real levels, and a predicted size before you click.

→ Try it now at pdf.erpfit.com