PDF Output
You may have specific requirements for the PDF files you need to produce (such as the set of metadata, bookmarks, the level of accessibility, or the PDF format).
Bookmarks
PDF bookmarks provide an additional way of navigating a document, similar to a table of contents. The bookmark tree is intended to be used by PDF readers, which usually display it in a side view. Most often, the bookmarks show the logical hierarchy of the book, with pointers to the chapters and sections, similar to a TOC. Creating bookmarks has no effect on the printed material.
Oxygen PDF Chemistry can create PDF bookmarks by using the standard CSS properties: bookmark-level, bookmark-label, and bookmark-state.
For an HTML document, you can collect the bookmark titles from the text of the heading elements:
h1, h2, h3, h4, h5, h6 {
bookmark-label: content(text);
}
In the following example, the content of the :before pseudo-element is concatenated with the text from the element. This prefixes the bookmark label of each h1 with the chapter number:
body {
counter-reset: chapter;
}
...
h1 {
bookmark-label: content(before) " / " content(text);
}
h1:before{
content: counter(chapter);
counter-increment:chapter;
}
You can define the level (depth in the hierarchy) of the bookmarks. The deeper the section, the higher the level:
h1 { bookmark-level: 1; }
h2 { bookmark-level: 2; }
h3 { bookmark-level: 3; }
h4 { bookmark-level: 4; }
h5 { bookmark-level: 5; }
h6 { bookmark-level: 6; }
You can also control whether the bookmarks are shown expanded or collapsed in the bookmark view. By default, all bookmarks are open. To close all the level 2 nodes, you can use:
h2 {
bookmark-state:closed;
}
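For an XML vocabulary that has no heading elements, the same three properties can be combined on whichever elements carry the titles. The following is a minimal sketch that assumes a hypothetical structure where chapter and section elements each have a title child. These element names are placeholders and should be adapted to your document:

```css
/* Hypothetical vocabulary: a chapter contains a title and nested sections,
   and each section has its own title. */
chapter > title {
  bookmark-level: 1;
  bookmark-label: content(text);
  /* Show chapters collapsed so the bookmark view starts compact. */
  bookmark-state: closed;
}

chapter > section > title {
  bookmark-level: 2;
  bookmark-label: content(text);
}
```

Closing the level 1 nodes hides the section bookmarks until the reader expands a chapter.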
Note: The default CSS files for generating PDF based on HTML already define the bookmark-level and bookmark-label properties. If you need to set the closed/open state, you should use the bookmark-state property in your custom CSS file.
Metadata
PDF files may contain metadata. Metadata provides additional information about a certain document, such as its title, author, organization, creation date, format, or copyright.
Source documents often use a meta element for keeping track of information that describes the content. Most of this information should migrate to the PDF document properties. The property values may be either static (specified directly in the CSS) or dynamic (collected from the document) using the following functions:
- content(text)
- attr()
- oxy_xpath()
Predefined Meta Fields
- Publication title
- Author
- Keywords
- Short description
- Copyright information
Suppose that you have the following arbitrary XML document:
<doc>
<title>Publication title</title>
<meta name='keywords' content='software, network'/>
<meta name='description' content='This is a publication about software products...'/>
<meta name='author' content='John, jo@mysite.example.com'/>
<meta name='copyright' content='Copyright My Company 2021'/>
...
You could use any of the following CSS properties to extract the metadata:
- -oxy-pdf-meta-title
- It is used to extract the publication title. You can use it by matching the <title> element:
title { -oxy-pdf-meta-title: content(text); }
- -oxy-pdf-meta-author
- It is used to extract the publication author. You can use it by matching the <meta> element with the attribute name='author':
meta[name='author'] { -oxy-pdf-meta-author: attr(content); }
- -oxy-pdf-meta-description
- It is used to extract the publication description. You can use it by matching the <meta> element with the attribute name='description' or name='subject':
meta[name='description'], meta[name='subject'] { -oxy-pdf-meta-description: attr(content); }
- -oxy-pdf-meta-keywords
- It is used to extract the publication keywords. For example, you can use it by matching the <meta> element with the attribute name='keywords'. Its value should be a list of tokens, separated by commas:
meta[name='keywords'] { -oxy-pdf-meta-keywords: attr(content); }
- -oxy-pdf-meta-keyword
- It is used to extract a single keyword. Individual keywords are accumulated from
elements that match the CSS rule that uses this property and then concatenated into a
single string. This single string is then set in the PDF 'keywords' section. For
example, if you mark keywords in your HTML document with a span with a "kw" class, you
can collect them all by
using:
span.kw { -oxy-pdf-meta-keyword: content(text); }
- -oxy-pdf-meta-copyright
- -oxy-pdf-meta-copyrighted
- -oxy-pdf-meta-copyright-url
- These properties define the copyright metadata. Acrobat Reader Pro, for example, displays this in the Details tab of the File/Document Properties dialog box.
meta[name='copyright'] {
  -oxy-pdf-meta-copyright: attr(content);
  -oxy-pdf-meta-copyrighted: copyrighted;
  -oxy-pdf-meta-copyright-url: "https://my.company/copyright-notice.html";
}
The -oxy-pdf-meta-copyright property specifies the copyright text, the -oxy-pdf-meta-copyrighted property specifies whether or not the publication is copyrighted (it accepts only copyrighted or public-domain as the value), and the -oxy-pdf-meta-copyright-url property can be used to specify the location of an external copyright notice.
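Putting the predefined fields together, a stylesheet for the sample document above might look like the following sketch. It mixes a static value with dynamic ones. The fixed author string and the XPath expression are assumptions made for illustration, and the way oxy_xpath() evaluates the expression should be checked against the product documentation:

```css
/* Dynamic: taken from the text of the matched element. */
title {
  -oxy-pdf-meta-title: content(text);
}

/* Dynamic: taken from an attribute of the matched element. */
meta[name='keywords'] {
  -oxy-pdf-meta-keywords: attr(content);
}

/* Static: specified directly in the CSS (illustrative value). */
doc {
  -oxy-pdf-meta-author: "Documentation Team";
}

/* Dynamic: computed with XPath (illustrative expression, written as an
   absolute path so it does not depend on the evaluation context). */
meta[name='description'] {
  -oxy-pdf-meta-description: oxy_xpath("string(/doc/meta[@name='description']/@content)");
}
```

A static value is convenient when the source has no matching element, while attr() and content(text) keep the PDF properties in sync with the document.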
Custom Meta Fields
Metadata is not restricted to the above cases. You may have custom metadata fields. Custom metadata is usually displayed in a tabular format (for example, in Acrobat Reader™, it appears in the Custom tab of the Properties dialog box).
- -oxy-pdf-meta-custom
- This property defines a list of pairs. Each pair contains the name and the value for
the meta information field. The pairs must be separated by a comma:
name1 value1, name2 value2
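As an illustration of the pair syntax, the following sketch sets two custom fields on the root element of the sample document. The field names, the values, the use of quoted strings, and the choice of the doc selector are all assumptions. Only the "name value, name value" pattern comes from the description above:

```css
doc {
  /* Two name/value pairs separated by a comma (illustrative names and values). */
  -oxy-pdf-meta-custom: "Department" "Engineering", "Review status" "Draft";
}
```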
Named Destinations (Anchors)
The named destinations FO extension provides a way to link to a particular anchor within a PDF document.
Suppose your PDF output is published on a website and accessible at the URL http://my_site.com/files/my_document.pdf, and the original XML document has a <section> element with an @id attribute set to installation.
...
<section id="installation">
...
</section>
...
To open it in the PDF reader exactly at that particular section (with the id value of installation), you can use the #installation anchor in the URL: http://my_site.com/files/my_document.pdf#installation.
Oxygen PDF Chemistry declares named destinations for any @id or @xml:id attributes from your input XML document. As an alternative, if you do not want to alter the ids in the document, the @nd:nd-id attribute can be used. In this case, make sure the nd prefix is bound to the named destinations namespace: xmlns:nd="http://www.oxygenxml.com/css2fo/named-destinations".
Accessibility (508 Compliance)
It is recommended that you make your PDF output accessible for people who are blind or visually impaired. Many government organizations require documents to be accessible.
PDF Accessibility Tagging
By default, Oxygen PDF Chemistry creates partially accessible PDF documents, in the sense that most of the paragraphs, tables, lists, headers, and footers are tagged automatically for any XML vocabulary, and PDF readers use this information to present the content.
In addition, the default CSS files used by Oxygen PDF Chemistry to generate PDF based on HTML define accessibility tags for headings (H1..H6), quotations (Q), sections (SECTION), and pre-formatted text (PRE).
However, this tagging just takes the element name into account. If your element has different semantics, you can impose a different PDF accessibility tag by using the -oxy-pdf-tag-type extension. In the following example, a paragraph with the note class will be tagged as a note:
p.note {
-oxy-pdf-tag-type: "Note";
}
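The same extension can map generic markup to the tags listed earlier. The following is a sketch only: the class names are hypothetical, and the tag names are simply the ones mentioned above for the default HTML styling:

```css
/* A div used as a chapter title should be announced as a top-level heading. */
div.chapter-title {
  -oxy-pdf-tag-type: "H1";
}

/* A span that holds an inline quotation. */
span.inline-quote {
  -oxy-pdf-tag-type: "Q";
}
```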
Hints for Making Documents More Accessible
- Hint 1: The title of the document must be marked using the metadata.
- This is important for accessibility since it will allow the screen reader to identify the publication title. This is an example using the -oxy-pdf-meta-title extension:
title { -oxy-pdf-meta-title: content(text); }
Note: The default CSS files for generating PDF based on HTML already contain this rule.
- Hint 2: Specify the language on the root of your document.
- For XML documents, use the @xml:lang attribute on the root of your document. For HTML documents, use the @lang attribute.
- Hint 3: Set alternate text on all images.
- Oxygen PDF Chemistry supports the -oxy-alt-text extension that can be used to associate the alternate text. The following is an example from the Oxygen PDF Chemistry default CSS for generating PDF based on HTML, where it maps the property to the value of the @alt attribute of the <img> tag:
img { -oxy-alt-text: attr(alt); }
For embedded SVG, Oxygen PDF Chemistry automatically uses the <title> element as the alternate text of the image.
For embedded MathML, Oxygen PDF Chemistry automatically uses the @alttext attribute as the alternate text of the equation.
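For an XML vocabulary other than HTML, the alternate text property can be mapped to whatever markup holds the description. The sketch below assumes a hypothetical image element whose description is stored in a desc attribute. Both names are placeholders:

```css
/* Hypothetical vocabulary: the image element carries its description
   in a desc attribute instead of the HTML alt attribute. */
image {
  -oxy-alt-text: attr(desc);
}
```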
Fully Accessible PDF (PDF/UA-1)
To make the PDF fully accessible, you have to activate the PDF/UA-1 mode. PDF/UA documents meet the regulations set in Section 508. This mode has special requirements:
- Activate the PDF/UA-1 mode from the command line using the -pdf-ua parameter.
- All the fonts must be embedded. If you are using one of the basic fonts (such as "Times", "Helvetica", etc.), make sure you explicitly define CSS font faces for them. For details, see: Font Embedding.
Troubleshooting: If you are using fonts other than the basic ones and still have problems embedding the basic default fonts, make sure all elements are styled using one of your fonts of choice. A catch-all CSS rule might be helpful:
:root { font-family: Arial; }
@page {
  @top-left { font-family: Arial }
  @top-right { font-family: Arial }
  @top-center { font-family: Arial }
  @top-left-corner { font-family: Arial }
  @top-right-corner { font-family: Arial }
  @bottom-left { font-family: Arial }
  @bottom-right { font-family: Arial }
  @bottom-center { font-family: Arial }
  @bottom-left-corner { font-family: Arial }
  @bottom-right-corner { font-family: Arial }
}
- The title of the document must be marked using the metadata. This is important for accessibility since it will allow the screen reader to identify the publication title. This is an example using the -oxy-pdf-meta-title extension:
title { -oxy-pdf-meta-title: content(text); }
Note: The default CSS files for generating PDF based on HTML already contain this rule.
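The font embedding requirement can typically be addressed with standard @font-face rules. The following is a minimal sketch, assuming hypothetical font files stored in a fonts folder next to the CSS. The family name and file names are placeholders, and the Font Embedding topic remains the reference for what the processor supports:

```css
/* Regular face (hypothetical file name). */
@font-face {
  font-family: "My Body Font";
  font-style: normal;
  font-weight: normal;
  src: url("fonts/MyBodyFont-Regular.ttf");
}

/* Bold face (hypothetical file name). */
@font-face {
  font-family: "My Body Font";
  font-style: normal;
  font-weight: bold;
  src: url("fonts/MyBodyFont-Bold.ttf");
}

/* Use the declared family for the whole document. */
:root {
  font-family: "My Body Font";
}
```

Declaring each weight and style you actually use avoids a fallback to a font that has no embeddable file.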
Tools for Checking the Document Accessibility
- For smaller documents, this site might be helpful: http://www.access-for-all.ch/ch/pdf-werkstatt/pdf-accessibility-checker-pac.html
- From Adobe: https://helpx.adobe.com/acrobat/using/create-verify-pdf-accessibility.html
Archiving
- Set a PDF/A mode from the command line using the -pdf-a parameter with one of the following values:
  - PDF/A-1a
  - PDF/A-1b
  - PDF/A-2a
  - PDF/A-2b
  - PDF/A-2u
  - PDF/A-3a
  - PDF/A-3b
  - PDF/A-3u
- Embed all of the fonts. If you use one of the basic fonts (such as "Times", "Helvetica", etc.), make sure you explicitly define CSS font faces for them. For details, see: Font Embedding.