PDF View

From Drumlin Security Wiki

Many book publishers have existing PDF files that they wish to make available in electronic form. In some cases the original source material is available, but often this is not the case, so post-processing of these print-ready PDF files may be necessary to make them genuinely usable on screen. Tools like Adobe® Acrobat® enable existing PDF material to be edited to produce screen-friendly versions for distribution in standard PDF format or with Digital Rights Management (DRM) protection. Where the source material and document creation software are available, for example the source QuarkXPress, InDesign or Microsoft Word document, it is far better to start with these than to post-process a PDF that has already been generated.

For books and other material designed for extended reading, e.g. popular fiction, PDF is not the best option because it is a fixed, page-based format. This can be suitable for some material, but in most instances a flowable text format is preferable. The main set of standards for such publication is ePUB (Amazon's Kindle formats are proprietary formats that are broadly similar to ePUB). Many source material editors now provide the option to export their content to a chosen ePUB standard, so for documents that are to be read entirely on-screen and from cover to cover, ePUB is the preferable choice, as long as the document does not rely heavily on complex page layouts such as tables or multiple columns.

Existing PDF files

The broad guidelines for such existing PDF files are as follows (similar guidelines apply for newly generated PDF files):

  • remove any print-production mark-up, typically by turning off any print marks and/or using a crop-box to ensure the displayed page content does not show any such markings
  • ensure the PDFs have no PDF security settings (no password to open, and no other security settings)
  • add any additional copyright and usage statements felt desirable, typically at the start of the document, as a footer on every page, and optionally via a non-intrusive static PDF watermark in the background or foreground of every page
  • if there is a Contents page for the publication, provide explicit links from the Contents page to the various sections of the document - note that the PDF page numbering will not normally match the printed page numbering, because printed page 1 usually falls several pages after page 1 of the PDF (the printed numbering does not count PDF pages such as a cover page, acknowledgements, contents and lists of figures)
  • change text strings that describe external links (e.g. www.mylink.com) into explicit links - this associates the text string with a real URL so the link will be actioned when clicked or touched in a PDF viewer. Note: (i) some PDF viewer software "guesses" that a text string with no associated explicit URL is a real link and acts accordingly, but many viewers do not, so this behaviour should not be relied upon; and (ii) ideally explicit links should be provided at the time of document creation (prior to PDF generation), but post-processing of a PDF to identify and convert such text strings is provided in some PDF editing software, e.g. in Adobe Acrobat (Edit, Links, Auto-create web links from URLs) - see further, below
  • for any document with more than, say, 50 pages, ensure that a navigation tree (also known as a Bookmarks tree or Outline) is included within the PDF file - this can be added manually and may match the elements in the Contents page, or can be generated programmatically using specialized software. An example of the latter is AutoBookmark™, which is an advanced plug-in for Adobe® Acrobat® and Adobe® Acrobat Professional® software
  • provide a suitable single-page cover page for the document, which should NOT be the entire original PDF of the print-ready cover as this will typically be a completely different size and format from the main body of the publication contents. The cover page can be produced from the source document cover PDF by suitable cropping or by creating a blank page at the start of the main content and inserting an image of the book cover to fill the blank cover
  • PDF files that consist entirely, or in large part, of scanned images (e.g. PDF files generated by scanning in a printed book in order to create an electronic version) are not suited to on-screen viewing, other than in exceptional circumstances. Where such files are provided, consider using the Adobe Acrobat "Enhance Scans" tool, as this can produce excellent results from many scanned documents
  • if the PDF filesize is large (e.g. greater than 50 MB), investigate whether it can be reduced (optimized for screen viewing). There are standard tools for reducing filesize (e.g. within Adobe Acrobat), although this is best done with the PDF creation facilities of the source document editor if possible. A screenshot of the Image tab of the Adobe advanced optimization form is shown below:

Adobe optimizer, Image tab
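
The offset between printed page numbers and PDF page numbers, noted in the Contents page guideline above, is simple arithmetic once you know which PDF page carries printed page 1. The following is a minimal Python sketch of that calculation only; the function name, the sample Contents entries and the value 9 are hypothetical, and creating the links themselves is left to your PDF editor.

```python
def pdf_page_for_printed(printed_page: int, pdf_page_of_printed_1: int) -> int:
    """Return the 1-based PDF page that holds a given printed page number.

    pdf_page_of_printed_1 is the 1-based PDF page on which printed page 1
    appears (an assumed value that you read off the file yourself).
    """
    if printed_page < 1 or pdf_page_of_printed_1 < 1:
        raise ValueError("page numbers are 1-based")
    return printed_page + (pdf_page_of_printed_1 - 1)


# Hypothetical Contents entries: (section title, printed page number).
contents = [("Introduction", 1), ("Methods", 15), ("Results", 42)]

# Hypothetical book in which the front matter occupies PDF pages 1 to 8,
# so printed page 1 is PDF page 9.
link_targets = {title: pdf_page_for_printed(page, 9) for title, page in contents}
```

Each value in link_targets is the PDF page that the corresponding Contents link should point to.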

Newly generated PDF files

The guidance for newly generated PDF files is essentially the same as for existing PDF files, but newly generated files benefit from being created with the PDF export facilities of the source editor rather than by post-processing a pre-generated PDF file.

The following recommendations are from the Mozilla organization, which is responsible for the Firefox web browser, combined with some of our own recommendations. See also the page Optimize a PDF on Adobe's website for more information. Note that the list below is specifically for PDF display within web browsers, not for PDF files that are displayed in specialized PDF Viewer software, although there is considerable commonality in the recommendations:

  • Avoid using high resolution images - 150 dpi resolution for scanned images is enough for most screens
  • Ensure the PDFs have no PDF security settings (no password to open, and no other security settings)
  • Try to use JPEG encoding for color images/photos in RGB colorspace when possible
  • Avoid using compositions/effects such as transitions/masking - flatten transparency
  • Avoid using PDF generators (or do not create content) that can produce very poorly structured PDF output (e.g. LibreOffice and several other PDF creators produce lots of tiny images for vector elements/pictures that they do not understand)
  • If there is such a setting, use web-optimized PDF output / linearization
  • Fix or don't produce corrupted PDFs that do not conform to the PDF 32000 (ISO 32000) specification - aim for compatibility with Adobe Reader 7 (PDF 1.6) or earlier
  • Check the PDF to see if there are any hidden objects or transparent text boxes, as these may appear as visible objects or text when displayed on some readers and/or converted to an alternative format (e.g. HTML5) - remove any unwanted content
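
The 150 dpi figure in the list above refers to the effective resolution of an image at the size it is placed on the page, i.e. its pixel width divided by its placed width in inches. The following is a minimal Python sketch of that arithmetic, showing the pixel dimensions an image would need to be downsampled to; the function names and the sample dimensions are hypothetical, and the resampling itself would be done in your image or PDF tool.

```python
TARGET_DPI = 150  # screen-oriented resolution suggested in the list above


def effective_dpi(pixel_width: int, placed_width_inches: float) -> float:
    """Resolution of an image at the width it occupies on the page."""
    if placed_width_inches <= 0:
        raise ValueError("placed width must be positive")
    return pixel_width / placed_width_inches


def downsample_size(pixel_width: int, pixel_height: int,
                    placed_width_inches: float,
                    target_dpi: float = TARGET_DPI) -> tuple[int, int]:
    """Return the pixel dimensions that give target_dpi at the placed width.

    Images already at or below the target are returned unchanged,
    because upsampling adds file size without adding detail.
    """
    dpi = effective_dpi(pixel_width, placed_width_inches)
    if dpi <= target_dpi:
        return pixel_width, pixel_height
    scale = target_dpi / dpi
    return round(pixel_width * scale), round(pixel_height * scale)
```

For example, a 3000 x 2000 pixel image placed 5 inches wide has an effective resolution of 600 dpi, so downsample_size(3000, 2000, 5.0) suggests 750 x 500 pixels.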

PDF filenames

When PDF files are created it is quite common for the filenames to be lengthy and to contain spaces, special characters and details that are not required when the file is distributed. This can cause problems - for example, a filename that includes spaces or an ampersand symbol (&) can result in the file being rejected on some systems and services. The recommended solution is to rename the PDF before distribution, using a naming convention that is unlikely to cause confusion or errors. For conventionally published books the name could simply be based on the book's ISBN (which is unique), i.e. <ISBN13>.pdf, or the existing filename could be simplified by replacing all spaces and special characters with hyphens. In some instances it is also advisable to ensure that filenames are not too long (e.g. less than 64 characters) and are all in the same case (e.g. all in lower case).
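
The hyphen-based convention described above can be automated. The following is a minimal Python sketch, using only the standard library; the function name safe_pdf_name and the 63-character default (chosen to keep names under 64 characters) are assumptions for illustration, not part of any particular distribution system.

```python
import re
import unicodedata


def safe_pdf_name(original: str, max_length: int = 63) -> str:
    """Return a lower-case, hyphenated PDF filename of at most max_length chars."""
    # Drop an existing ".pdf" extension (any case) so it is not slugged.
    stem = original[:-4] if original.lower().endswith(".pdf") else original
    # Reduce accented letters to plain ASCII where possible.
    ascii_stem = (
        unicodedata.normalize("NFKD", stem)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of characters other than a-z and 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_stem.lower()).strip("-")
    if not slug:
        raise ValueError("filename contains no usable characters")
    # Leave room for the extension, and avoid ending the stem on a hyphen.
    room = max_length - len(".pdf")
    return slug[:room].rstrip("-") + ".pdf"
```

For example, safe_pdf_name("My Book & Other Stories (Final v2).PDF") gives my-book-other-stories-final-v2.pdf. Note that two different originals can map to the same name, so an ISBN-based name remains the safer choice where an ISBN exists.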

Page size

Because PDF files are page-based and not reflowable (i.e. are fixed format, unlike most ePUB and HTML files) they can be very difficult to read on small-screen devices. Using larger fonts and/or a smaller underlying page size can help a great deal, assuming this is an option at the time of document editing and PDF creation. Trying to modify the format after PDF generation is not really practical. Where content is difficult to read on small-format devices, end users should be advised to read the document only on larger screens. Single-page formats should always be used (i.e. not double-page spreads).

Document navigation support

Documents that are not intended to be read from cover to cover, which includes most non-fiction publications, should include tools to make it simple and quick for end users to access the specific items of information they are interested in. This can be achieved in a number of ways, as discussed above: include a linked Contents page; include a navigation tree/bookmarks/outline tree; ensure the recommended PDF reader software supports free text search (and the document content itself is searchable).

File size

The practical limit on filesize is determined by the memory handling facilities of the target devices. On PCs and Macs large files (e.g. 100 MB or more) should work in most cases, but memory handling on mobile devices is much less sophisticated, and memory is typically divided into 64 MB chunks per application. We recommend that, for distribution reasons (reliable and fast downloading) and for memory management reasons, files should be kept under 50 MB and preferably under 30 MB - the number of pages does not really matter and can run into the thousands. PDF files can be saved at a reduced size to ensure they are not too large, and for screens (which have relatively low resolution) there is no need to use very high-resolution print-ready PDF formats. Some publishing systems, like InDesign and Word, allow you to export the content as "For screen display" or "Minimum filesize", so this is the recommended approach. If the files are still too large then we recommend splitting them into two or more "volumes". As the files are viewed interactively it is important to include a navigation tree (also known as a bookmarks tree or outline) for fast navigation, particularly on mobile devices. PDF files supplied for conversion should not have Adobe-style protection applied and should not include embedded media such as 3D models or video clips. To provide video and audio support, use links to separately hosted MP4 and MP3 files (see further, below).
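
The size limits above are easy to check before distribution. The following is a minimal Python sketch, using only the standard library; the function names are hypothetical, 1 MB is assumed to mean 1024 x 1024 bytes, and the volume count is only a rough estimate that assumes the file size is spread evenly across the volumes.

```python
import os

MB = 1024 * 1024           # assumption: binary megabytes
MAX_BYTES = 50 * MB        # upper limit recommended above
PREFERRED_BYTES = 30 * MB  # preferred limit recommended above


def classify_size(size_bytes: int) -> str:
    """Classify a file size against the recommended limits."""
    if size_bytes < PREFERRED_BYTES:
        return "ok"
    if size_bytes < MAX_BYTES:
        return "acceptable"
    return "too large"


def volumes_needed(size_bytes: int, limit_bytes: int = MAX_BYTES) -> int:
    """Smallest number of equal volumes that are each under limit_bytes."""
    return size_bytes // limit_bytes + 1


def check_pdf(path: str) -> str:
    """Classify an existing file on disk by its size."""
    return classify_size(os.path.getsize(path))
```

For example, a 120 MB file is classified as "too large" and volumes_needed suggests three volumes. In practice it is usually better to try a reduced-size export first, because splitting by pages does not guarantee evenly sized volumes.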

Embedded media

Embedded media can make some PDF files very large and also requires the use of specific PDF readers, because such embedding is non-standard and typically Adobe-specific. For details on how to embed media using Adobe Acrobat see: Adobe Rich Media support. Solutions for making media-enabled PDFs available more widely and/or with content protection include: retaining the source PDF but replacing the embedded media with links to externally hosted media files (typically web-hosted streaming facilities such as Vimeo, or directly hosted MP3 or MP4 files); automated conversion of the PDF to HTML5 with embedded media; or conversion of the PDF to HTML5 with linked media. We provide solutions for all these options, including solutions for PowerPoint files (with full support for transitions and animations), all with content protection facilities.

Implicit versus Explicit links

Another common issue with PDF creation is the use of implicit rather than explicit URLs. Suppose you include www.adobe.com as text in your source document and then output the file to PDF. Some PDF readers, including Adobe Reader, will "guess" that you meant this to be a web address and will act on it accordingly. However, the item will not be highlighted as a link (the same is true in web browsers, as can be seen from www.adobe.com not showing up as a link here), and in many PDF viewers on other technology platforms clicking it will do nothing at all. The solution, when you want the text to be a link rather than just text, is to make the link explicit. You can do this by selecting the text in the source file and then telling the editor to add a hyperlink at that point. This is like ensuring that a link on text such as "Click here" actually specifies where the click should take you. Adobe Acrobat also includes a facility to create explicit links automatically from implicit URLs (Edit, Links, Auto-create web links from URLs).
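
Before exporting to PDF it can help to list the implicit URLs in the source text so that each one can be turned into an explicit hyperlink. The following is a minimal Python sketch, using only the standard library; the pattern is deliberately simple rather than a full URL grammar, the function name is hypothetical, and prefixing "https://" to www-style addresses is an assumption that should be checked against each site.

```python
import re

# Simple pattern (an assumption, not a full URL grammar): text that starts
# with "http://", "https://" or "www." and runs up to the next whitespace,
# quote, angle bracket or closing bracket.
BARE_URL = re.compile(r"\b(?:https?://|www\.)[^\s<>\"')\]]+", re.IGNORECASE)


def find_implicit_links(text: str) -> list[tuple[str, str]]:
    """Return (text as written, suggested explicit URL) pairs found in text."""
    found = []
    for match in BARE_URL.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        written = match.group().rstrip(".,;:!?")
        if written.lower().startswith("http"):
            url = written
        else:
            url = "https://" + written  # assumed scheme for "www." addresses
        found.append((written, url))
    return found
```

Calling find_implicit_links on text extracted from the source document gives a checklist of strings to convert; the hyperlinks themselves are still added in the source editor or in Adobe Acrobat as described above.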