PDF Output

Bookmarks

PDF bookmarks provide an additional way of navigating a document, similar to a table of contents. The bookmark tree is intended to be used by PDF readers, which usually display it in a side view. Most often, the bookmarks show the logical hierarchy of the book, with pointers to the chapters and sections. Creating bookmarks has no effect on the printed material.

Oxygen PDF Chemistry can create PDF bookmarks by using the standard CSS properties: bookmark-level, bookmark-label, and bookmark-state.

For an HTML document, you can collect the titles from the heading elements text.

h1, h2, h3, h4, h5, h6 {
   bookmark-label: content(text);
}

In the following example, the content of the :before pseudo-element is concatenated with the text of the element, so the bookmark label of each h1 is prefixed with the chapter number:

body {
    counter-reset: chapter;
}
...
h1 {
    bookmark-label: content(before) " / " content(text);
}

h1:before {
    content: counter(chapter);
    counter-increment: chapter;
}

You can define the level (depth in the hierarchy) of the bookmarks. The deeper the section, the higher the level:

h1 { bookmark-level: 1; }
h2 { bookmark-level: 2; }
h3 { bookmark-level: 3; }
h4 { bookmark-level: 4; }
h5 { bookmark-level: 5; }
h6 { bookmark-level: 6; }
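
The same two properties can be combined for a non-HTML vocabulary. The following is a minimal sketch, assuming a hypothetical XML vocabulary where chapter and nested section elements each have a title child. The element names are assumptions, so adapt the selectors to your own document:

```css
/* Hypothetical vocabulary: <chapter> and nested <section> elements,
   each with a <title> child. These element names are assumptions. */
chapter > title {
    bookmark-level: 1;
    bookmark-label: content(text);
}

chapter > section > title {
    bookmark-level: 2;
    bookmark-label: content(text);
}

chapter > section > section > title {
    bookmark-level: 3;
    bookmark-label: content(text);
}
```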

You can also control whether the bookmarks are shown expanded or collapsed in the bookmark view. By default, all bookmarks are open. To collapse all the level 2 bookmarks, you can use:

h2 {
    bookmark-state: closed;
}

Note: In the built-in CSS that Oxygen PDF Chemistry uses for processing HTML, the bookmarks are already configured using the bookmark-level and bookmark-label properties. If you need to set the closed/open state, you should use the bookmark-state property in your custom CSS file.
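
For example, a custom CSS file for an HTML document might keep the chapter bookmarks expanded and collapse everything below them. This is only a sketch that relies on the built-in level and label settings mentioned in the note. Which levels to collapse is an arbitrary choice made for illustration:

```css
/* Chapter bookmarks stay expanded (this is also the default state). */
h1 {
    bookmark-state: open;
}

/* Bookmarks for deeper headings start collapsed. */
h2, h3, h4, h5, h6 {
    bookmark-state: closed;
}
```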

Metadata

PDF files may contain metadata. Metadata provides additional information about a certain document, such as its title, author, organization, creation date, format, or copyright.

HTML defines the meta element for keeping track of information that describes your content. Most of this information should migrate to the PDF document properties. The property values may be either static (specified directly from the CSS) or dynamic (collected from the document) using the following functions:

content(text)
attr()
oxy_xpath()
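
The following minimal sketch contrasts a static value with dynamic ones, using the metadata properties described in this topic. The author string is made up, and the oxy_xpath() expression assumes an un-namespaced document whose root doc element has a meta child with name='description'. Treat both as assumptions to adapt to your own content:

```css
/* Static: the value is written directly in the CSS (made-up name). */
:root {
    -oxy-pdf-meta-author: "Jane Doe";
}

/* Dynamic: the text content of the matched element. */
title {
    -oxy-pdf-meta-title: content(text);
}

/* Dynamic: an attribute of the matched element. */
meta[name='keywords'] {
    -oxy-pdf-meta-keywords: attr(content);
}

/* Dynamic: an XPath expression. The absolute path is an assumption
   about the document structure. */
:root {
    -oxy-pdf-meta-description: oxy_xpath("/doc/meta[@name='description']/@content");
}
```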

Predefined Meta Fields

Examples of common metadata:

Publication title
Author
Keywords
Short description
Copyright information

Oxygen PDF Chemistry automatically extracts this information from HTML documents.

Suppose that you have the following arbitrary XML document:

<doc>
     <title>Publication title</title>
     <meta name='keywords' content='software, network'/>
     <meta name='description' content='This is a publication about software products...'/>
     <meta name='author' content='John, jo@mysite.example.com'/>
     <meta name='copyright' content='Copyright My Company 2021'/>

...

You could use any of the following CSS properties to extract the metadata:

-oxy-pdf-meta-title

It is used to extract the publication title. You can use it by matching the <title> element:

title {
    -oxy-pdf-meta-title: content(text);
}

If this CSS selector matches multiple elements, only the first element in the document order will be used to extract the title.

-oxy-pdf-meta-author

It is used to extract the publication author. You can use it by matching the <meta> element with the attribute name='author':

meta[name='author'] {
    -oxy-pdf-meta-author: attr(content);
}

If this CSS selector matches multiple elements, only the first element in the document order will be used to extract the author.

-oxy-pdf-meta-description

It is used to extract the publication description. You can use it by matching the <meta> element with the attribute name='description' or name='subject':

meta[name='description'], 
meta[name='subject'] {
    -oxy-pdf-meta-description: attr(content);
}

If this CSS selector matches multiple elements, only the first element in the document order will be used to extract the description.

-oxy-pdf-meta-keywords

It is used to extract the publication keywords. For example, you can use it by matching the <meta> element with the attribute name='keywords'. Its value should be a list of tokens, separated by commas:

meta[name='keywords'] {
    -oxy-pdf-meta-keywords: attr(content);
}

If this CSS selector matches multiple elements, only the first element in the document order will be used to extract the keywords.

-oxy-pdf-meta-keyword

It is used to extract a single keyword. Individual keywords are accumulated from elements that match the CSS rule that uses this property and then concatenated into a single string. This single string is then set in the PDF 'keywords' section. For example, if you mark keywords in your HTML document with a span with a "kw" class, you can collect them all by using:

span.kw {
    -oxy-pdf-meta-keyword: content(text);
}

-oxy-pdf-meta-copyright

-oxy-pdf-meta-copyrighted

-oxy-pdf-meta-copyright-url

These properties define the copyright metadata. Acrobat Reader Pro, for example, displays this in the Details tab of the File/Document Properties dialog box.

meta[name='copyright'] {
    -oxy-pdf-meta-copyright: attr(content);
    -oxy-pdf-meta-copyrighted: copyrighted;
    -oxy-pdf-meta-copyright-url: "https://my.company/copyright-notice.html";
}

The -oxy-pdf-meta-copyright property takes the copyright text as its value. The -oxy-pdf-meta-copyrighted property specifies whether or not the publication is copyrighted (it accepts only copyrighted or public-domain as its value). The -oxy-pdf-meta-copyright-url property can be used to specify the location of an external copyright notice.
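
For a publication that is not copyrighted, the other accepted value can be used. This is a minimal sketch that assumes the matched meta element holds a public-domain statement in its content attribute:

```css
meta[name='copyright'] {
    /* Assumes the content attribute holds a public-domain statement. */
    -oxy-pdf-meta-copyright: attr(content);
    -oxy-pdf-meta-copyrighted: public-domain;
}
```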

Custom Meta Fields

Metadata is not restricted to the above cases. You may also define custom metadata fields. Custom metadata is usually displayed in a tabular format (for example, in Acrobat Reader™, it is shown in the Custom tab of the Properties dialog box).

-oxy-pdf-meta-custom

This property defines a list of pairs. Each pair contains the name and the value for the meta information field. The pairs must be separated by a comma:

name1 value1, name2 value2

In the following example, all the HTML meta tags are dumped as custom meta fields in the PDF:

meta {
    -oxy-pdf-meta-custom: attr(name) attr(content);
}

If you have a span that defines the document creation date somewhere in the document content, you can use:

span.created {
    -oxy-pdf-meta-custom: "CreationDate" content(text);
}

In case of conflicts, when two or more elements trigger the setting of a meta field with the same name, only the first definition of a meta field will be used in the PDF output.
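
Because the value is a comma-separated list of pairs, several static fields can be set in a single declaration. The following is a minimal sketch. The field names and values are made up for illustration:

```css
:root {
    /* Two name/value pairs, separated by a comma (made-up fields). */
    -oxy-pdf-meta-custom: "Department" "Documentation",
                          "Status" "Draft";
}
```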

Named Destinations (Anchors)

The named destinations FO extension provides a way to link to a particular anchor within a PDF document.

Suppose your PDF output is published on a website and accessible at the URL http://my_site.com/files/my_document.pdf, and the original XML document has a <section> element with an @id attribute set to installation.

...
<section id="installation">
...
</section>
...

To open it in the PDF reader exactly at that particular section (with the id value of installation), you can use the #installation anchor in the URL: http://my_site.com/files/my_document.pdf#installation.

Oxygen PDF Chemistry declares named destinations for any @id or @xml:id attributes from your input XML document. As an alternative, if you do not want to alter the ids in the document, the @nd:nd-id attribute can be used. In this case, make sure the nd prefix is bound to the xmlns:nd="http://www.oxygenxml.com/css2fo/named-destinations" namespace.

Accessibility (508 Compliance)

It is recommended that you make your PDF output accessible for people who are blind or visually impaired. Many government organizations require documents to be accessible.

PDF Accessibility Tagging

By default, Oxygen PDF Chemistry creates partially accessible PDF documents: most paragraphs, tables, lists, headers, and footers are tagged automatically for any XML vocabulary, and PDF readers use this information to present the content.

In addition, the default CSS files used by Oxygen PDF Chemistry to generate PDF based on HTML define accessibility tags for headings (H1..H6), quotations (Q), sections (SECTION), and pre-formatted text (PRE).

However, this tagging only takes the element name into account. If your element has different semantics, you can impose a different PDF accessibility tag by using the -oxy-pdf-tag-type extension. In the following example, a paragraph with the note class is tagged as a note:

p.note {
  -oxy-pdf-tag-type: "Note";
}

Note: The headers and footers (or other text placed in the page margins) are automatically marked as artifacts, so they are ignored by the screen readers.
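
The same extension may help when headings are marked up with generic elements. The following is a sketch that assumes hypothetical chapter-title and section-title classes, and assumes that the H1 and H2 tag names listed above are accepted as values in the same way as "Note":

```css
/* Hypothetical classes on generic div elements. */
div.chapter-title {
    -oxy-pdf-tag-type: "H1";
}

div.section-title {
    -oxy-pdf-tag-type: "H2";
}
```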

Hints for Making Documents More Accessible

Hint 1: The title of the document must be marked using the metadata.

This is important for accessibility since it will allow the screen reader to identify the publication title. This is an example using the -oxy-pdf-meta-title extension:

title {
    -oxy-pdf-meta-title: content(text);
}

Note: The default CSS files for generating PDF based on HTML already contain this rule.

Hint 2: Specify the language on the root of your document.

For XML documents, use the @xml:lang attribute on the root of your document. For HTML documents, use the @lang attribute.

Hint 3: Set alternate text on all images.

Oxygen PDF Chemistry supports the -oxy-alt-text extension that can be used to associate the alternate text.

The following is an example from the Oxygen PDF Chemistry default CSS for generating PDF based on HTML, where it maps the property to the value of the @alt attribute of the <img> tag:

img {
   -oxy-alt-text: attr(alt);
}

For embedded SVG, Oxygen PDF Chemistry automatically uses the <title> element as the alternate text of the image.

For embedded MathML, Oxygen PDF Chemistry automatically uses the @alttext attribute as the alternate text of the equation.

Fully Accessible PDF (PDF/UA-1)

To make the PDF fully accessible, you have to activate the PDF/UA-1 mode. PDF/UA documents meet the regulations set in Section 508. This mode has special requirements:

Activate the PDF/UA-1 mode from the command line using the -pdf-ua parameter.
All the fonts must be embedded. If you are using one of the basic fonts (such as "Times", "Helvetica", etc.), make sure you explicitly define CSS font faces for them. For details, see: Font Embedding.
Troubleshooting: If you are using fonts other than the basic ones and the processor still reports problems embedding the basic default fonts, make sure all elements are styled using one of your fonts of choice. A catch-all CSS rule might be helpful:
```
:root{
  font-family: Arial;
}

@page {
    @top-left {font-family: Arial }
    @top-right {font-family: Arial }
    @top-center {font-family: Arial }
    @top-left-corner {font-family: Arial }
    @top-right-corner {font-family: Arial }
    
    @bottom-left {font-family: Arial }
    @bottom-right {font-family: Arial }
    @bottom-center {font-family: Arial }
    @bottom-left-corner {font-family: Arial }
    @bottom-right-corner {font-family: Arial }
}
```
The title of the document must be marked using the metadata. This is important for accessibility since it will allow the screen reader to identify the publication title. This is an example using the -oxy-pdf-meta-title extension:
```
title {
    -oxy-pdf-meta-title: content(text);
}
```
Note: The default CSS files for generating PDF based on HTML already contain this rule.
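
The font embedding requirement above can be sketched with standard CSS @font-face rules. The font file paths below are hypothetical and must point to font files that you are licensed to embed. See Font Embedding for the details that apply to Oxygen PDF Chemistry:

```css
/* Explicit font faces for a basic font name. The file paths are
   hypothetical. */
@font-face {
    font-family: "Helvetica";
    src: url("fonts/helvetica-compatible-regular.ttf");
}

@font-face {
    font-family: "Helvetica";
    font-weight: bold;
    src: url("fonts/helvetica-compatible-bold.ttf");
}

/* Style the content with the font declared above. */
:root {
    font-family: "Helvetica";
}
```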

Tools for Checking the Document Accessibility

For smaller documents, this site might be helpful: http://www.access-for-all.ch/ch/pdf-werkstatt/pdf-accessibility-checker-pac.html
From Adobe: https://helpx.adobe.com/acrobat/using/create-verify-pdf-accessibility.html

Archiving

PDF/A is an ISO-standardized version of PDF specialized for the archiving and long-term preservation of electronic documents. To use this mode, you must:

Set a PDF/A mode from the command line using the -pdf-a parameter with one of the following values:
- PDF/A-1a
- PDF/A-1b
- PDF/A-2a
- PDF/A-2b
- PDF/A-2u
- PDF/A-3a
- PDF/A-3b
- PDF/A-3u
Embed all of the fonts. If you use one of the basic fonts (such as "Times", "Helvetica", etc.), make sure you explicitly define CSS font faces for them. For details, see: Font Embedding.