Roland's homepage

My random knot in the Web

PDF tricks

This article contains several useful tricks for manipulating PDF files.

The focus of this article is on open source and free software that is available for UNIX-like operating systems. These tools are meant to be used from the command line of a shell.

Adding password restrictions to a PDF file

PDF files can have two passwords:

  • user password (Must be supplied to read a document.)
  • owner password (Can restrict printing, editing, copying. Not necessary to read the document.)

You can use qpdf (see also qpdf on GitHub) to add restrictions.

Adding restrictions is done by “encrypting” the PDF with an owner password. Since this password is easily removed, there is no need to remember it, so I tend to generate one automatically.

The following command uses the SHA-256 checksum of the original file as the owner password.

> qpdf --encrypt '' `sha256 -q unrestricted.pdf` 128 \
--extract=n --modify=none --use-aes=y --cleartext-metadata -- \
unrestricted.pdf restricted.pdf

As given, it prevents copying (--extract=n) and modification (--modify=none), but leaves the document metadata unencrypted. By default, printing is allowed. The user password is an empty string, leaving read access open.
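The sha256 -q command used above is the BSD utility. On systems that lack it, the same kind of owner password can probably be derived with sha256sum (GNU coreutils) or shasum -a 256. The following is a sketch under that assumption; the function names file_sha256 and restrict_pdf are made up for this example.

```shell
# Print the SHA-256 checksum of a file as a bare hex string,
# using whichever of the common tools is installed.
file_sha256() {
    if command -v sha256 >/dev/null 2>&1; then
        sha256 -q "$1"                          # BSD
    elif command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | cut -d ' ' -f 1        # GNU coreutils
    else
        shasum -a 256 "$1" | cut -d ' ' -f 1    # Perl-based fallback
    fi
}

# Apply the same restrictions as the command above.
# Usage: restrict_pdf unrestricted.pdf restricted.pdf
restrict_pdf() {
    qpdf --encrypt '' "$(file_sha256 "$1")" 128 \
        --extract=n --modify=none --use-aes=y --cleartext-metadata -- \
        "$1" "$2"
}
```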

Running both files through pdfinfo shows the restrictions. First, the unrestricted file:

> pdfinfo unrestricted.pdf
Subject:        ...
Keywords:       ...
Author:         ...
Creator:        ...
Producer:       ...
CreationDate:   Tue Mar  1 21:17:23 2016 CET
ModDate:        Tue Mar  1 21:17:23 2016 CET
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          2
Encrypted:      no
Page size:      841.89 x 595.276 pts (A4)
Page rot:       0
File size:      152342 bytes
Optimized:      no
PDF version:    1.7

Contrast that with the output for the restricted file (trimmed for brevity).

> pdfinfo restricted.pdf
...
Encrypted:      yes (print:yes copy:no change:no addNotes:no algorithm:AES)
...

Note that this only protects your documents from laypeople, since qpdf can also remove such restrictions, as shown below.

If you need stronger access control, you should set the user password or use other kinds of encryption that would prevent people from reading the file without knowing the password.
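As a sketch of the first suggestion, qpdf can set a user password with the same --encrypt syntax. The example below uses a 256-bit key, which older PDF readers may not support; 128 with --use-aes=y, as used earlier, is the alternative. The function name lock_pdf is made up for this example, and the password is read from standard input to keep it out of the shell history. Note that qpdf still receives it as an argument, so it may be visible in the process list while qpdf runs.

```shell
# Encrypt a PDF so that it cannot be opened without a password.
# Usage: lock_pdf input.pdf output.pdf   (then type the password and press Enter)
lock_pdf() {
    # Read one line from standard input as the password.
    IFS= read -r password
    # Use the same string as user password and owner password.
    qpdf --encrypt "$password" "$password" 256 -- "$1" "$2"
}
```

Running pdfinfo on the result without supplying the password should then fail instead of listing the document properties.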

Removing restrictions from a PDF file

If a document only has an owner password, you can use qpdf to remove it, without having to provide the owner password!

Note that this only works with one of the standard encryption handlers (RC4 and AES). If a document was encrypted with a custom encryption handler, this might not work.

> qpdf --decrypt restricted.pdf unrestricted2.pdf
> pdfinfo unrestricted2.pdf
...
Encrypted:      no
...

So an owner password is not a protection against knowledgeable people.
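To check a whole directory of files for restrictions, the “Encrypted:” line can be filtered out of the pdfinfo output with standard tools. A small sketch; the function name encryption_status is invented here, and it expects pdfinfo output on standard input.

```shell
# Print the value of the "Encrypted:" line from pdfinfo output read on stdin.
encryption_status() {
    sed -n 's/^Encrypted:[[:space:]]*//p'
}

# Usage: report the status of every PDF in the current directory.
# for f in *.pdf; do
#     printf '%s: %s\n' "$f" "$(pdfinfo "$f" 2>/dev/null | encryption_status)"
# done
```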

Changing the metadata in a PDF file

The exiftool program can be used to change the Info dictionary and XMP tags in a PDF file.

For example, I’ve seen an e-book application on an Android device use the “title” from the Info dictionary to label PDFs in the user interface. However, in some PDF files the title is either empty or bears no resemblance to the actual contents. In cases like this you really want to update the metadata.

> exiftool -Title='Alexit hardener 405-25' -overwrite_original ALEXIT-Hardener_405-25_DE.pdf
    1 image files updated
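When many files have useless titles, the file name is often a better starting point. The following sketch sets the title of each PDF from its file name, using the same exiftool options as above. The helper names title_from_filename and retitle_pdfs are made up, and replacing underscores with spaces is only one possible convention.

```shell
# Turn a file name such as "ALEXIT-Hardener_405-25_DE.pdf" into a title:
# drop the directory and the .pdf extension, replace underscores with spaces.
title_from_filename() {
    basename "$1" .pdf | tr '_' ' '
}

# Set the title of each given PDF from its file name.
# Usage: retitle_pdfs *.pdf
retitle_pdfs() {
    for f in "$@"; do
        exiftool -Title="$(title_from_filename "$f")" -overwrite_original "$f"
    done
}
```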

Overlaying text and images in a PDF file

This is such a substantial topic that it is located in a separate article.

Converting PDF to bitmap formats

Sometimes a PDF needs to be converted to bitmap format, e.g. for display on a webpage. (This is assuming that generating the same image in SVG format is not possible.)

The convert program used below is part of the ImageMagick suite of tools.

to PNG

convert -density 1200 -units PixelsPerInch \
    input.pdf \
    -scale 25% \
    output.png

The first option (which needs to come before the name of the input file) tells convert to render the PDF to a bitmap at 1200 pixels per inch (“PPI”). The default resolution used by convert is only 72 PPI.

After the input file, -scale 25% is used to scale the image back down. This reduces the effective resolution to 300 PPI, but averages the pixels, giving a less pixelated look.
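To see what these numbers mean for the size of the output, the pixel dimensions follow from the page size in points (1 pt is 1/72 inch). The sketch below uses awk for the arithmetic; pixels is a made-up helper name.

```shell
# Pixel count for a length in points, rendered at a given density (PPI)
# and then scaled by a percentage.
# Usage: pixels points density scale-percent
pixels() {
    awk -v pts="$1" -v ppi="$2" -v pct="$3" \
        'BEGIN { printf "%.0f\n", pts / 72 * ppi * pct / 100 }'
}

# The A4 landscape page from the pdfinfo output above (841.89 x 595.276 pts):
pixels 841.89 1200 25     # width
pixels 595.276 1200 25    # height
```

This prints 3508 and 2480, the usual pixel size of an A4 page at 300 PPI. The image that convert actually produces may differ by a pixel because of rounding.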

to JPEG

convert -density 1200 -units PixelsPerInch \
    input.pdf \
    -background white -flatten \
    -scale 25% \
    output.jpg

Here the -background white and -flatten options are needed to prevent a black background with some PDF files. JPEG has no transparency, so transparent areas of a page have to be filled with a color.
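For multi-page PDFs, ImageMagick accepts a zero-based page index in square brackets after the input file name. Also, ImageMagick 7 provides the magick command, and convert may be absent or deprecated there. The following sketch covers both points for the PNG case; pdf_page_to_png is a name invented for this example.

```shell
# Convert one page of a PDF to PNG.
# Usage: pdf_page_to_png input.pdf page-number output.png
# Page numbers start at 1 here; ImageMagick's own index starts at 0.
pdf_page_to_png() {
    if command -v magick >/dev/null 2>&1; then
        im=magick       # ImageMagick 7
    else
        im=convert      # ImageMagick 6
    fi
    "$im" -density 1200 -units PixelsPerInch \
        "${1}[$(($2 - 1))]" \
        -scale 25% \
        "$3"
}
```

For example, pdf_page_to_png input.pdf 2 page2.png renders only the second page.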

Creating a PDF from scanned pages

The following assumes that the pages were scanned at A4 size and that the image files have their resolution embedded in the metadata.

> pdfjam --a4paper -o document.pdf page1.jpg page2.jpg

The pdfjam program is really a front-end for the pdfpages LaTeX package.
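With ten or more pages, a shell glob such as page*.jpg sorts page10.jpg before page2.jpg. A possible workaround is sketched below. It assumes a sort that supports version ordering with -V (GNU coreutils and recent BSDs do) and file names without whitespace; list_pages and scans_to_pdf are made-up names.

```shell
# List the scanned pages in the current directory in numeric order.
list_pages() {
    ls page*.jpg | sort -V
}

# Combine all scanned pages into one PDF.
# Usage: scans_to_pdf document.pdf   (run in the directory with the scans)
scans_to_pdf() {
    # Word splitting on the list is intentional; see the assumption above.
    pdfjam --a4paper -o "$1" $(list_pages)
}
```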

