PDF File Structure

From Drumlin Security Wiki

(source: after Adobe, 2004)

The general structure of a PDF file consists of four components: a header, a body, a cross-reference (xref) table, and a trailer, as shown below:

[Figure: Basic structure of a PDF file]

The header is a single line that identifies the PDF version. For example, %PDF-1.4 is the first line of the sample font file testfonts.pdf. A useful rule of thumb: add the two parts of the version number to get the version of Adobe Reader needed to view the document. For PDF 1.4, 1 + 4 = 5, so Adobe Reader 5 is required. Likewise PDF 1.6, probably the last overall "standard" version in wide use, requires Adobe Reader 7 or later (or another PDF reader that supports PDF 1.6).
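As a minimal sketch, the header can be read and the rule of thumb above applied in Python. The function names `pdf_version` and `reader_version_rule_of_thumb` are illustrative only, and the sketch assumes the header starts at byte 0 of the file.

```python
import re

# Matches a header such as "%PDF-1.4" at the very start of the data.
HEADER_RE = re.compile(rb"^%PDF-(\d+)\.(\d+)")


def pdf_version(data: bytes) -> tuple[int, int]:
    """Return (major, minor) parsed from the %PDF-x.y header line."""
    m = HEADER_RE.match(data)
    if m is None:
        raise ValueError("not a PDF: missing %PDF-x.y header")
    return int(m.group(1)), int(m.group(2))


def reader_version_rule_of_thumb(major: int, minor: int) -> int:
    """Heuristic from the text: add the two parts of the version number."""
    return major + minor


major, minor = pdf_version(b"%PDF-1.6\n")
needed = reader_version_rule_of_thumb(major, minor)
print(f"PDF {major}.{minor} -> Adobe Reader {needed} or later")
```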

The trailer contains the trailer dictionary, which points to key objects such as the document catalog (/Root), followed by the byte offset of the xref table (after the startxref keyword); it ends with %%EOF to mark the end of the file. The xref table contains pointers to all the objects in the PDF file: it states which object numbers it covers (a starting number and a count) and, for each object, where it begins (its byte offset), its generation number, and whether it is in use or free. The body contains all the object data: fonts, images, text, bookmarks, form fields, and so on.
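To make the roles of the body, xref table, and trailer concrete, here is a minimal Python sketch (not a production parser) that assembles a tiny one-page PDF and then follows `startxref` back to the xref table. The helper names `build_minimal_pdf` and `read_xref` are hypothetical; the reader only handles a classic `xref` table in the last section and ignores `/Prev` chains and the cross-reference streams introduced in PDF 1.5.

```python
def build_minimal_pdf() -> bytes:
    """Assemble header, body, xref table and trailer for a one-page PDF."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")                    # header
    offsets = []
    for num, body in enumerate(objects, start=1):     # body
        offsets.append(len(out))                      # byte offset of "N 0 obj"
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)       # xref: first object, count
    out += b"0000000000 65535 f \n"                   # object 0 is always free
    for off in offsets:
        out += b"%010d 00000 n \n" % off              # 20-byte in-use entry
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset  # offset of xref, then %%EOF
    return bytes(out)


def read_xref(pdf: bytes) -> dict[int, tuple[int, int, bool]]:
    """Follow the last startxref; map object number -> (offset, gen, in_use)."""
    pos = pdf.rfind(b"startxref")
    if pos < 0:
        raise ValueError("no startxref keyword found")
    xref_offset = int(pdf[pos + len(b"startxref"):].split()[0])
    lines = pdf[xref_offset:].splitlines()
    if lines[0].strip() != b"xref":
        raise ValueError("expected a classic xref table (not an xref stream)")
    entries = {}
    i = 1
    while not lines[i].startswith(b"trailer"):
        first, count = map(int, lines[i].split())     # subsection header
        for k in range(count):
            off, gen, kind = lines[i + 1 + k].split()
            entries[first + k] = (int(off), int(gen), kind == b"n")
        i += 1 + count
    return entries


pdf = build_minimal_pdf()
for num, (off, gen, in_use) in sorted(read_xref(pdf).items()):
    state = "in use" if in_use else "free"
    print(f"object {num}: offset {off}, generation {gen}, {state}")
```

Because every entry is exactly 20 bytes (10-digit offset, 5-digit generation number, `n` or `f`, and a two-character end of line), a reader can compute where an object's entry sits in the table and then seek directly to the object without scanning the body.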

Save and Save As

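When you perform a Save operation on a PDF file, the changes are appended to the end of the original file as an incremental update (see below): a new body containing the new or modified objects, a new xref table, and a new trailer are added after the original %%EOF, while the original bytes are left unchanged. The new trailer's /Prev entry points back to the previous xref table.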
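The append-only behavior of Save can be sketched in Python as well. The helper `append_incremental_update` is hypothetical and assumes a classic xref table, generation number 0 for the new or replaced object, and a trailer containing /Size and /Root; the demo base file is deliberately minimal (a catalog with no pages) rather than a complete document.

```python
import re


def append_incremental_update(pdf: bytes, obj_num: int, obj_body: bytes) -> bytes:
    """Append a new body, xref section and trailer after the existing file."""
    # The last startxref gives the offset of the most recent xref table.
    sx = pdf.rfind(b"startxref")
    prev_xref = int(pdf[sx + len(b"startxref"):].split()[0])
    # Carry /Size and /Root over from the most recent trailer dictionary.
    trailer = pdf[pdf.rfind(b"trailer"):sx]
    size = int(re.search(rb"/Size\s+(\d+)", trailer).group(1))
    root = re.search(rb"/Root\s+(\d+\s+\d+\s+R)", trailer).group(1)

    out = bytearray(pdf)                 # copy; the original bytes stay as the prefix
    if not out.endswith(b"\n"):
        out += b"\n"
    obj_offset = len(out)                # new body: one object, generation 0 assumed
    out += b"%d 0 obj\n%s\nendobj\n" % (obj_num, obj_body)
    xref_offset = len(out)               # new xref section with a single entry
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_offset)
    # New trailer: /Prev links back to the previous xref table.
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (
        max(size, obj_num + 1), root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


# Demo only: a deliberately minimal base file (a catalog with no pages).
base = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_at = len(base)
base += b"xref\n0 2\n0000000000 65535 f \n%010d 00000 n \n" % 9
base += b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref_at

# "Save": replace object 1 with a new version of the catalog.
updated = append_incremental_update(base, 1, b"<< /Type /Catalog /Lang (en-US) >>")
print(updated[len(base):].decode("ascii"))
```

Because the update only appends bytes, everything up to the first %%EOF is still the original file; a reader starts from the last trailer and follows /Prev to reach older xref sections.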

[Figure: Structure of a PDF file after updates]