
1.1.2 Syntax

Felix Schütt edited this page Jul 5, 2017 · 11 revisions

Like any file format, PDF has its own syntax: a set of well-defined rules for how data is written into the file. Before we look more closely at the contents of a PDF file, we have to learn these rules.

Everything in the PDF body is structured into so-called "objects". In the "Hello World" PDF, the document body starts after the %PDF-1.4 header line and ends before the xref keyword (neither of these lines belongs to the body). An "object" in the context of a PDF is simply a piece of information. It has nothing to do with objects in object-oriented programming languages.
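To make these boundaries concrete, here is a minimal Python sketch that cuts the body out of a file shaped like the "Hello World" PDF. It assumes the header is the first line and that the file has a single classic `xref` table (no incremental updates, no cross-reference streams). The function `extract_body` and the truncated `sample` bytes are made up for illustration.

```python
import re

def extract_body(data: bytes) -> bytes:
    """Return the bytes between the header line and the 'xref' keyword.

    Hypothetical helper. Assumes a single classic xref table (no incremental
    updates, no cross-reference streams).
    """
    if not data.startswith(b"%PDF-"):
        raise ValueError("missing %PDF- header")
    # The header line ends at the first line break: "\r\n", "\r" or "\n".
    header_end = re.search(rb"\r\n|\r|\n", data)
    if header_end is None:
        raise ValueError("header line is not terminated")
    # 'xref' must start a line, so 'startxref' near the end is not matched.
    xref = re.search(rb"(?<=[\r\n])xref(?=[\r\n])", data)
    if xref is None:
        raise ValueError("no xref keyword found")
    return data[header_end.end():xref.start()]

# Made-up, truncated sample; the xref table and offsets are not valid.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"xref\n0 2\n"
    b"trailer\n<< /Root 1 0 R >>\n"
    b"startxref\n9\n%%EOF\n"
)
print(extract_body(sample))  # b'1 0 obj\n<< /Type /Catalog >>\nendobj\n'
```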

There are several types of objects, each holding a different kind of information and each serialized differently. However, all of them follow one rule: two consecutive objects have to be separated by one or more whitespace characters, unless the start of the next object is obvious from context, for example because it begins with a delimiter character such as "/", "(" or "[".

A general warning: Everything in PDF is case-sensitive. You always have to match the exact capitalization.

Whitespace

In PDF, the whitespace characters are the space (ASCII 0x20), the horizontal tab (ASCII 0x09) and the line break. A line break can be written as the UNIX "\n" (line feed, ASCII 0x0A), the classic Apple "\r" (carriage return, ASCII 0x0D, used up to Mac OS 9), or both together (the Windows "\r\n").
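When reading a PDF, all three line-break conventions have to be accepted. A minimal Python sketch using the standard `re` module; `split_pdf_lines` is a hypothetical helper, and it assumes the input contains no binary stream data, where 0x0A and 0x0D bytes are payload rather than line breaks.

```python
import re

# A PDF line break is "\r\n", "\r" or "\n". "\r\n" is listed first so that a
# Windows line ending is consumed as one break rather than two.
EOL = re.compile(rb"\r\n|\r|\n")

def split_pdf_lines(data: bytes) -> list[bytes]:
    """Split raw PDF bytes into lines, accepting all three line-break styles."""
    return EOL.split(data)

print(split_pdf_lines(b"unix\nold mac\rwindows\r\nend"))
# [b'unix', b'old mac', b'windows', b'end']
```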

In theory, the NUL character (ASCII 0x00) and the form feed (ASCII 0x0C) also count as whitespace. In practice, however, they rarely appear in everyday PDFs.

All objects should be separated by whitespace. The type and number of whitespace characters do not matter: a single space works just as well as a line break or several tabs.
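The following sketch splits raw bytes at runs of PDF whitespace, treating every whitespace character listed above the same way, so differently spaced input yields identical tokens. `split_on_whitespace` and `PDF_WHITESPACE` are hypothetical names, and the sketch deliberately ignores delimiters, strings and streams, where a real tokenizer has to behave differently.

```python
# Whitespace characters as listed above: NUL, tab, line feed, form feed,
# carriage return and space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "

def split_on_whitespace(data: bytes) -> list[bytes]:
    """Split bytes at runs of PDF whitespace, discarding empty pieces.

    Simplification: ignores delimiters, strings and streams.
    """
    tokens = []
    current = bytearray()
    for byte in data:  # iterating over bytes yields ints
        if byte in PDF_WHITESPACE:
            # End of a token: flush it if anything was collected.
            if current:
                tokens.append(bytes(current))
                current.clear()
        else:
            current.append(byte)
    if current:
        tokens.append(bytes(current))
    return tokens

print(split_on_whitespace(b"1234 -1234"))        # [b'1234', b'-1234']
print(split_on_whitespace(b"1234\r\n\t\t-1234"))  # [b'1234', b'-1234']
```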

Numbers

Numbers can be written as integers (referred to as "Integer") or as floating-point numbers (referred to as "Real"). Integers are written with the digits 0-9; a negative number starts with an ASCII "-" (a leading "+" is also allowed). No other characters are permitted, in particular no whitespace, thousands separators or other punctuation.

Correct:

1234 -1234

Wrong:

1'234 1234-
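A quick way to check a token against the integer rule is a regular expression anchored at both ends. This Python sketch uses the hypothetical helper `is_pdf_integer`; besides "-", it also accepts a leading "+", which the PDF specification permits for integers.

```python
import re

# Optional sign followed by one or more ASCII digits; nothing else.
INTEGER = re.compile(rb"[+-]?[0-9]+")

def is_pdf_integer(token: bytes) -> bool:
    """Return True if the whole token is a valid PDF integer."""
    return INTEGER.fullmatch(token) is not None

# The correct and wrong examples from above.
for token in [b"1234", b"-1234", b"1'234", b"1234-"]:
    print(token, is_pdf_integer(token))
```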

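Floating-point numbers use a decimal point (ASCII ".") to mark the fractional part and may carry the same leading sign as integers. The point may also stand at the very beginning or end of the number (".5", "4."). Commas (",") and exponential notation are not allowed.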

Correct:

123.4 0.1234

Wrong:

123,4 1.234e2
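Both rules can be combined into a small parser, assuming tokens have already been split at whitespace. In this sketch, `parse_pdf_number`, `INTEGER` and `REAL` are hypothetical names; the accepted forms follow the numeric-object rules of the PDF specification (ISO 32000-1, section 7.3.3).

```python
import re
from typing import Union

# Integer: optional sign, then one or more ASCII digits.
INTEGER = re.compile(rb"[+-]?[0-9]+")
# Real: optional sign, digits with a leading, embedded or trailing period,
# e.g. "0.5", ".5" or "4.". No comma, no exponent.
REAL = re.compile(rb"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)")

def parse_pdf_number(token: bytes) -> Union[int, float]:
    """Parse a PDF Integer or Real token, rejecting anything else."""
    # Validate with the PDF grammar first: Python's own int()/float() would
    # also accept forms PDF forbids, such as "1.234e2", "1_234" or "inf".
    if INTEGER.fullmatch(token):
        return int(token.decode("ascii"))
    if REAL.fullmatch(token):
        return float(token.decode("ascii"))
    raise ValueError(f"not a PDF number: {token!r}")

for token in [b"1234", b"-1234", b"123.4", b"0.1234", b"123,4", b"1.234e2"]:
    try:
        print(token, "->", parse_pdf_number(token))
    except ValueError as err:
        print(err)
```

Checking against the grammar before handing the token to Python's converters keeps the parser exactly as strict as the rules above, instead of inheriting Python's more liberal number syntax.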