Using C# and iTextSharp to create a PDF
April 13, 2011
The Biodiversity Heritage Library (BHL) is a consortium of many of the world’s leading natural history and botanical libraries. The goal of the organization is to digitize and make available legacy biodiversity literature. One popular feature of the BHL web site is the ability for visitors to select up to 100 pages from a book and generate a PDF containing those pages. More than 100 custom PDFs are created each day.
As the primary developer of the site, I want to highlight the tool that we use to generate the PDFs. iTextSharp is a freely-available port of the popular Java component for generating PDFs, iText.
While iTextSharp is powerful, its documentation is not ideal. The official website for the component points you to the documentation for the original Java tool. That documentation is good, but many things that you’d like to accomplish with iTextSharp are implemented slightly differently than with iText. I found that these discrepancies between the Java documentation and the .NET implementation led to many instances of trial-and-error development. I hope that this post will help illustrate how to use the iTextSharp component, and save others some frustration.
Getting Set Up
To get started using iTextSharp, go to http://sourceforge.net/projects/itextsharp/ and download the latest version of iTextSharp (5.0.6 at the time of this writing). You can download the compiled assembly, or if you prefer, the source code.
To make iTextSharp available for use in your application, simply add a reference to the iTextSharp library.
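As a quick check that the reference is working, a minimal sketch like the one below should compile and write a one-page PDF. The class name HelloPdf, the Write() method, and the output file name are placeholders of my own, and the code assumes an iTextSharp 5.x assembly.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class HelloPdf
{
    // Hypothetical helper: writes a single-page PDF to the given path.
    public static void Write(string path)
    {
        using (FileStream stream = new FileStream(path, FileMode.Create))
        {
            Document doc = new Document();
            PdfWriter.GetInstance(doc, stream);   // bind the document to the output stream
            doc.Open();                           // content can only be added after Open()
            doc.Add(new Paragraph("Hello, iTextSharp"));
            doc.Close();                          // finalizes the PDF structure
        }
    }
}
```

Calling HelloPdf.Write("Hello.pdf") from a console application's Main() should leave a small PDF in the working directory.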
How-To: The Code Samples
The following code samples illustrate a number of basic and advanced features of iTextSharp. Included are examples of basic text layout and formatting, image insertion, page sizing, page labeling, metadata assignment, bullet lists, and linking.
Let’s start with a method named Build() which provides the framework for a simple application that builds a five-page PDF. The rest of the code samples build on this one. Here is the code listing:
// Set up the fonts to be used on the pages
private Font _largeFont = new Font(Font.FontFamily.HELVETICA, 18, Font.BOLD, BaseColor.BLACK);
private Font _standardFont = new Font(Font.FontFamily.HELVETICA, 14, Font.NORMAL, BaseColor.BLACK);
private Font _smallFont = new Font(Font.FontFamily.HELVETICA, 10, Font.NORMAL, BaseColor.BLACK);
public void Build()
{
iTextSharp.text.Document doc = null;
try
{
// Initialize the PDF document
doc = new Document();
iTextSharp.text.pdf.PdfWriter writer = iTextSharp.text.pdf.PdfWriter.GetInstance(doc,
new System.IO.FileStream(System.IO.Directory.GetCurrentDirectory() + "\\ScienceReport.pdf",
System.IO.FileMode.Create));
// Set margins and page size for the document
doc.SetMargins(50, 50, 50, 50);
// There are a huge number of possible page sizes, including such sizes as
// EXECUTIVE, LEGAL, LETTER_LANDSCAPE, and NOTE
doc.SetPageSize(new iTextSharp.text.Rectangle(iTextSharp.text.PageSize.LETTER.Width,
iTextSharp.text.PageSize.LETTER.Height));
// Add metadata to the document. This information is visible when viewing the
// document properties within Adobe Reader.
doc.AddTitle("My Science Report");
doc.AddCreator("M. Lichtenberg");
doc.AddKeywords("paper airplanes");
// Add Xmp metadata to the document.
this.CreateXmpMetadata(writer);
// Open the document for writing content
doc.Open();
// Add pages to the document
this.AddPageWithBasicFormatting(doc);
this.AddPageWithInternalLinks(doc);
this.AddPageWithBulletList(doc);
this.AddPageWithExternalLinks(doc);
this.AddPageWithImage(doc, System.IO.Directory.GetCurrentDirectory() + "\\FinalGraph.jpg");
// Add page labels to the document
iTextSharp.text.pdf.PdfPageLabels pdfPageLabels = new iTextSharp.text.pdf.PdfPageLabels();
pdfPageLabels.AddPageLabel(1, iTextSharp.text.pdf.PdfPageLabels.EMPTY, "Basic Formatting");
pdfPageLabels.AddPageLabel(2, iTextSharp.text.pdf.PdfPageLabels.EMPTY, "Internal Links");
pdfPageLabels.AddPageLabel(3, iTextSharp.text.pdf.PdfPageLabels.EMPTY, "Bullet List");
pdfPageLabels.AddPageLabel(4, iTextSharp.text.pdf.PdfPageLabels.EMPTY, "External Links");
pdfPageLabels.AddPageLabel(5, iTextSharp.text.pdf.PdfPageLabels.EMPTY, "Image");
writer.PageLabels = pdfPageLabels;
}
catch (iTextSharp.text.DocumentException dex)
{
// Handle iTextSharp errors
}
finally
{
// Clean up
// Guard against a failure that occurred before the document was created
if (doc != null) doc.Close();
doc = null;
}
}
The code starts by setting up the fonts that will be used within the PDF. In fact, these are used in most of the following code samples. You can see that various font faces, sizes, weights, and colors can be specified.
The first significant lines of the Build() method initialize the file (ScienceReport.pdf) that will be built. Next, margins and page size are set. Following that you see AddTitle(), AddCreator(), and AddKeywords() being called to add metadata to the file. An additional form of metadata is added by the CreateXmpMetadata() function, which will be explained later.
After this basic setup is complete, the new document is opened for writing and five “AddPage…()” methods are called; these also are explained later. After the pages are added to the document, page labels are added by populating a PdfPageLabels object and assigning it to the writer’s PageLabels property. (In Acrobat Reader, these labels are displayed below the page thumbnails shown in the “Pages” navigation panel.) At this point the content of the document has been completely written. Notice that the Close() method is explicitly called on the Document object to finalize the writes to the open file (this happens in the “finally” block).
The only other thing to point out in this sample is the error handling. Catch errors of type iTextSharp.text.DocumentException to handle errors originating from iTextSharp operations.
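To make the error-handling idea a little more concrete, here is a sketch that catches iTextSharp errors and file errors separately and reports each. The class name, the TryBuild() method, and the decision to log to the console are illustrative assumptions of mine, not part of the BHL code.

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class SafeBuildSketch
{
    // Hypothetical helper: returns true when the PDF was written,
    // false when one of the expected failures was caught.
    public static bool TryBuild(string outputPath)
    {
        try
        {
            using (FileStream stream = new FileStream(outputPath, FileMode.Create))
            {
                Document doc = new Document();
                PdfWriter.GetInstance(doc, stream);
                doc.Open();
                doc.Add(new Paragraph("Report body goes here."));
                doc.Close();   // finalizes the PDF before the stream is disposed
            }
            return true;
        }
        catch (DocumentException dex)
        {
            // Errors raised by iTextSharp while building the document
            Console.Error.WriteLine("iTextSharp error: " + dex.Message);
            return false;
        }
        catch (IOException ioex)
        {
            // Errors raised while creating or writing the output file
            Console.Error.WriteLine("File error: " + ioex.Message);
            return false;
        }
    }
}
```

Wrapping the FileStream in a using block means the file handle is released even when an exception interrupts the build, at the cost of possibly leaving an incomplete file behind.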
The next code sample shows two methods: AddPageWithBasicFormatting(), which is one of the methods used to add a page to the document, and AddParagraph(), which is a helper function used to add a paragraph to the current page of the document.
The AddPageWithBasicFormatting() method illustrates the basic methods for adding text and images to a PDF document. It starts by calling the AddParagraph() helper method to add two short text strings to the current page. Notice that when adding a paragraph, you can specify the alignment and font to be used to render the paragraph contents. Next, a small JPG image is read from disk and inserted into the document. The method finishes up by adding two more paragraphs to the page.
/// <summary>
/// Add the header page to the document. This shows an example of a page containing
/// both text and images. The contents of the page are centered and the text is of
/// various sizes.
/// </summary>
/// <param name="doc"></param>
private void AddPageWithBasicFormatting(iTextSharp.text.Document doc)
{
// Write page content. Note the use of fonts and alignment attributes.
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _largeFont, new Chunk("\n\nMY SCIENCE PROJECT\n\n"));
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _standardFont, new Chunk("by M. Lichtenberg\n\n\n\n"));
// Add a logo
String appPath = System.IO.Directory.GetCurrentDirectory();
iTextSharp.text.Image logoImage = iTextSharp.text.Image.GetInstance(appPath + "\\PaperAirplane.jpg");
logoImage.Alignment = iTextSharp.text.Element.ALIGN_CENTER;
doc.Add(logoImage);
logoImage = null;
// Write additional page content
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _largeFont, new Chunk("\n\n\nWhat kind of paper is the best for making paper airplanes?\n\n\n\n\n"));
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _smallFont, new Chunk("Generated " +
DateTime.Now.Day.ToString() + " " +
System.Globalization.CultureInfo.CurrentCulture.DateTimeFormat.GetMonthName(DateTime.Now.Month) + " " +
DateTime.Now.Year.ToString() + " " +
DateTime.Now.ToShortTimeString()));
}
/// <summary>
/// Add a paragraph object containing the specified element to the PDF document.
/// </summary>
/// <param name="doc">Document to which to add the paragraph.</param>
/// <param name="alignment">Alignment of the paragraph.</param>
/// <param name="font">Font to assign to the paragraph.</param>
/// <param name="content">Object that is the content of the paragraph.</param>
private void AddParagraph(Document doc, int alignment, iTextSharp.text.Font font, iTextSharp.text.IElement content)
{
Paragraph paragraph = new Paragraph();
paragraph.SetLeading(0f, 1.2f);
paragraph.Alignment = alignment;
paragraph.Font = font;
paragraph.Add(content);
doc.Add(paragraph);
}
The AddParagraph() method simplifies the process of adding a paragraph to a document by wrapping the basic actions that need to be performed to properly format a new paragraph. These actions include setting the alignment, font, and content. Notice that the content is not restricted to text. Anything that supports the iTextSharp.text.IElement interface can form the content of a paragraph. This means that plain text, anchor tags, external links, and other objects can be used.
The AddPageWithInternalLinks() method, shown in the next code sample, demonstrates how to add links that reference other locations within the PDF document. If you are familiar with how to link to anchor tags in an HTML document, then you should understand what is happening in this example.
As you can see, the method is a simple one. Three Anchor objects are created that reference “#research”, “#graph”, and “#results”. These are references to named anchors found in other locations in the finished PDF document. Creation of the named anchors is explained in the next code sample. Notice that as with paragraphs and other text fragments, you specify a font when creating the Anchor objects.
After the Anchor objects are created, a new page is added to the document, a paragraph of text is added to the page, and then the three Anchor objects are added to the page. Notice that our AddParagraph() helper method is used to add the Anchor objects.
/// <summary>
/// Add a table of contents page whose links point to named anchors elsewhere in the document.
/// </summary>
/// <param name="doc"></param>
private void AddPageWithInternalLinks(iTextSharp.text.Document doc)
{
// Generate links to be embedded in the page
Anchor researchAnchor = new Anchor("Research & Hypothesis\n\n", _standardFont);
researchAnchor.Reference = "#research"; // this link references a named anchor within the document
Anchor graphAnchor = new Anchor("Graph\n\n", _standardFont);
graphAnchor.Reference = "#graph";
Anchor resultsAnchor = new Anchor("Results & Bibliography", _standardFont);
resultsAnchor.Reference = "#results";
// Add a new page to the document
doc.NewPage();
// Add heading text to the page
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _largeFont, new iTextSharp.text.Chunk("TABLE OF CONTENTS\n\n\n\n\n"));
// Add the links to the page
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _standardFont, researchAnchor);
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _standardFont, graphAnchor);
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _standardFont, resultsAnchor);
}
The method in the next code sample, AddPageWithBulletList(), builds on the previous sample. It shows how to create the named anchors that were referenced by the anchors created in the previous example. In addition, it shows a new concept, a bulleted list.
In this method, after adding a new page to the document, a new Anchor object is created and added to the page. The important thing to notice is that this anchor is not assigned a reference; instead it is simply given a name. This is what makes this object a… well.. anchor… and not a link to another resource.
/// <summary>
/// Add a page that includes a bullet list.
/// </summary>
/// <param name="doc"></param>
private void AddPageWithBulletList(iTextSharp.text.Document doc)
{
// Add a new page to the document
doc.NewPage();
// The header at the top of the page is an anchor linked to by the table of contents.
iTextSharp.text.Anchor contentsAnchor = new iTextSharp.text.Anchor("RESEARCH\n\n", _largeFont);
contentsAnchor.Name = "research";
// Add the header anchor to the page
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _largeFont, contentsAnchor);
// Create an unordered bullet list. The 10f argument separates the bullet from the text by 10 points
iTextSharp.text.List list = new iTextSharp.text.List(iTextSharp.text.List.UNORDERED, 10f);
list.SetListSymbol("\u2022"); // Set the bullet symbol (without this a hyphen starts each list item)
list.IndentationLeft = 20f; // Indent the list 20 points
list.Add(new ListItem("Lift, thrust, drag, and gravity are forces that act on a plane.", _standardFont));
list.Add(new ListItem("A plane should be light to help fight against gravity’s pull.", _standardFont));
list.Add(new ListItem("Gravity will have less effect on a plane built from light materials.", _standardFont));
list.Add(new ListItem("In order to fly well, airplanes must be stable.", _standardFont));
list.Add(new ListItem("A plane that is unstable will either pitch up into a stall, or nose-dive.", _standardFont));
doc.Add(list); // Add the list to the page
}
After the named anchor is added to the page, a List object is created. This object is used to define a bulleted list.
After the List object has been instantiated, some additional customizations are made. These include a modification to the leading symbol of each list item (the default hyphen is changed to the bullet symbol) and the indentation of the entire list. Once these actions are complete, five ListItem objects are added to the list, and the list is added to the page.
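The same List class can also produce a numbered list. The sketch below is a variation on the bullet list above, not something taken from the BHL code; the class name, the BuildNumberedList() helper, and the 20-point indents are assumptions chosen for illustration.

```csharp
using iTextSharp.text;

public static class ListSketch
{
    // Hypothetical helper: builds a numbered list from plain strings.
    public static List BuildNumberedList(Font font, params string[] items)
    {
        // List.ORDERED asks iTextSharp to number the items instead of using a symbol;
        // 20f leaves 20 points between the number and the item text.
        List list = new List(List.ORDERED, 20f);
        list.IndentationLeft = 20f;   // indent the whole list 20 points

        foreach (string item in items)
        {
            list.Add(new ListItem(item, font));
        }
        return list;
    }
}
```

The returned list can be passed to doc.Add() in the same way as the bullet list.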
The next sample is very similar to the earlier example that shows how to add links to locations within the PDF. This one shows a method that adds links to external resources. The key difference to note between the method shown here (AddPageWithExternalLinks()) and the one shown earlier (AddPageWithInternalLinks()) is that the Reference properties of the anchors are set to external URLs instead of to internal named anchors.
/// <summary>
/// Add a page that contains embedded hyperlinks to external resources
/// </summary>
/// <param name="doc"></param>
private void AddPageWithExternalLinks(Document doc)
{
// Generate external links to be embedded in the page
iTextSharp.text.Anchor bibliographyAnchor1 = new Anchor("Scholastic.com", _standardFont);
bibliographyAnchor1.Reference = "http://teacher.scholastic.com/paperairplane/airplane.htm";
Anchor bibliographyAnchor2 = new Anchor("Berkeley.edu", _standardFont);
bibliographyAnchor2.Reference = "http://www.eecs.berkeley.edu/Programs/doublex/spring02/paperairplane.html";
Anchor bibliographyAnchor3 = new Anchor("Paper Airplane Science", _standardFont);
bibliographyAnchor3.Reference = "http://www.exo.net/~pauld/activities/flying/PaperAirplaneScience.html";
Anchor bibliographyAnchor4 = new Anchor("LittleToyAirplanes.com", _standardFont);
bibliographyAnchor4.Reference = "http://www.littletoyairplanes.com/theoryofflight/02whyplanes.html";
// Add a new page to the document
doc.NewPage();
// Add text to the page
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _largeFont, new Chunk("BIBLIOGRAPHY\n\n"));
// Add the links to the page
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_LEFT, _standardFont, bibliographyAnchor1);
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_LEFT, _standardFont, bibliographyAnchor2);
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_LEFT, _standardFont, bibliographyAnchor3);
this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_LEFT, _standardFont, bibliographyAnchor4);
}
One final example of adding content to a PDF file is the AddPageWithImage() method in the next code sample. Looking at the body of the method, you can see that the image is read from disk, the page is resized to match the size of the image, and the image is added to the document.
The key thing to notice here is that the modifications to the margins and page size are made before the new page is added. Modifications to margins and page size take effect when a new page is added; the current page is unaffected.
/// <summary>
/// Add a page containing a single image. Set the page size to match the image size.
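To see that ordering rule in isolation, here is a small sketch that writes a two-page PDF in which only the second page picks up the new size and margins. The class name, the Write() method, and the choice of A4 for the second page are my own assumptions for the example.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PageSizeSketch
{
    // Hypothetical helper: page 1 is LETTER, page 2 is A4 with narrow margins.
    public static void Write(string path)
    {
        using (FileStream stream = new FileStream(path, FileMode.Create))
        {
            Document doc = new Document(PageSize.LETTER);
            PdfWriter.GetInstance(doc, stream);
            doc.Open();
            doc.Add(new Paragraph("This page is LETTER size."));

            // These calls do not touch the page that is already in progress...
            doc.SetPageSize(PageSize.A4);
            doc.SetMargins(20, 20, 20, 20);

            // ...they apply starting with the next page.
            doc.NewPage();
            doc.Add(new Paragraph("This page is A4 with narrow margins."));
            doc.Close();
        }
    }
}
```

Opening the result in a PDF viewer and paging through it should show the size change between the two pages.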
/// </summary>
/// <param name="doc"></param>
/// <param name="imagePath"></param>
private void AddPageWithImage(iTextSharp.text.Document doc, String imagePath)
{
// Read the image file
iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(new Uri(imagePath));
// Set the page size to the dimensions of the image BEFORE adding a new page to the document.
float imageWidth = image.Width;
float imageHeight = image.Height;
doc.SetMargins(0, 0, 0, 0);
doc.SetPageSize(new iTextSharp.text.Rectangle(imageWidth, imageHeight));
// Add a new page
doc.NewPage();
// Add the image to the page
doc.Add(image);
image = null;
}
The last code sample shows a method that was called in the Build() method shown in the first code sample. CreateXmpMetadata() adds XMP metadata to a PDF document. You may be familiar with EXIF metadata that many digital cameras embed within photos. XMP is an XML-based metadata standard that is similar to EXIF. It can be embedded in many types of files, including PDFs. Some reference managers and PDF cataloging tools can take advantage of this metadata if it is available.
The method begins by creating an XmpSchema object and adding metadata to it. It then creates an XmpWriter object and writes the XmpSchema to a byte stream. Then (and this is important), the byte stream is shrunk to the size of the metadata that was placed into it. Once that is done, the byte stream is written to the PDF document.
/// <summary>
/// Use this method to write XMP data to a new PDF
/// </summary>
/// <param name="writer"></param>
private void CreateXmpMetadata(iTextSharp.text.pdf.PdfWriter writer)
{
// Set up the buffer to hold the XMP metadata
byte[] buffer = new byte[65536];
System.IO.MemoryStream ms = new System.IO.MemoryStream(buffer, true);
try
{
// XMP supports a number of different schemas, which are made available by iTextSharp.
// Here, the Dublin Core schema is chosen.
iTextSharp.text.xml.xmp.XmpSchema dc = new iTextSharp.text.xml.xmp.DublinCoreSchema();
// Add Dublin Core attributes
iTextSharp.text.xml.xmp.LangAlt title = new iTextSharp.text.xml.xmp.LangAlt();
title.Add("x-default", "My Science Project");
dc.SetProperty(iTextSharp.text.xml.xmp.DublinCoreSchema.TITLE, title);
// Dublin Core allows multiple authors, so we create an XmpArray to hold the values
iTextSharp.text.xml.xmp.XmpArray author = new iTextSharp.text.xml.xmp.XmpArray(iTextSharp.text.xml.xmp.XmpArray.ORDERED);
author.Add("M. Lichtenberg");
dc.SetProperty(iTextSharp.text.xml.xmp.DublinCoreSchema.CREATOR, author);
// Multiple subjects are also possible, so another XmpArray is used
iTextSharp.text.xml.xmp.XmpArray subject = new iTextSharp.text.xml.xmp.XmpArray(iTextSharp.text.xml.xmp.XmpArray.UNORDERED);
subject.Add("paper airplanes");
subject.Add("science project");
dc.SetProperty(iTextSharp.text.xml.xmp.DublinCoreSchema.SUBJECT, subject);
// Create an XmpWriter using the MemoryStream defined earlier
iTextSharp.text.xml.xmp.XmpWriter xmp = new iTextSharp.text.xml.xmp.XmpWriter(ms);
xmp.AddRdfDescription(dc); // Add the completed metadata definition to the XmpWriter
xmp.Close(); // This flushes the XMP metadata into the buffer
//———————————————————————————
// Shrink the buffer to the correct size (discard empty elements of the byte array)
int bufsize = buffer.Length;
int bufcount = 0;
foreach (byte b in buffer)
{
if (b == 0) break;
bufcount++;
}
System.IO.MemoryStream ms2 = new System.IO.MemoryStream(buffer, 0, bufcount);
buffer = ms2.ToArray();
//———————————————————————————
// Add all of the XMP metadata to the PDF doc that we’re building
writer.XmpMetadata = buffer;
}
catch (Exception)
{
// Rethrow without resetting the original stack trace
throw;
}
finally
{
ms.Dispose();
}
}
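One possible alternative to scanning the fixed-size buffer for its first zero byte, offered as a sketch rather than as what the downloadable solution does: let the MemoryStream grow on its own and call ToArray(), which returns only the bytes that were written and still works after the stream has been closed. The class name and the BuildXmp() helper are hypothetical, and only the title property is set to keep the example short.

```csharp
using System.IO;
using iTextSharp.text.xml.xmp;

public static class XmpSketch
{
    // Hypothetical helper: returns XMP bytes sized exactly to their content.
    public static byte[] BuildXmp(string titleText)
    {
        // Describe the document using the Dublin Core schema
        DublinCoreSchema dc = new DublinCoreSchema();
        LangAlt title = new LangAlt();
        title.Add("x-default", titleText);
        dc.SetProperty(DublinCoreSchema.TITLE, title);

        // An expandable stream, so no buffer size has to be guessed up front
        MemoryStream ms = new MemoryStream();
        XmpWriter xmp = new XmpWriter(ms);
        xmp.AddRdfDescription(dc);
        xmp.Close();   // flushes the XMP packet into the stream

        // ToArray() copies only the written bytes, even once the stream is closed
        return ms.ToArray();
    }
}
```

The result can be assigned to writer.XmpMetadata just like the trimmed buffer in the listing above.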
Working Code And Example Output
A ready-to-run Visual Studio 2010 solution can be downloaded from here. The download includes all of the code samples discussed in this post. Many of them include more detail than what is shown here. If you want to skip straight to the output, an example of the PDF created by the ready-to-run code is available here.
Wrapping Up
In my own experience I found iTextSharp to be a powerful tool. It was also a frustrating tool to learn. I hope that the examples that I’ve presented here help others realize the power of iTextSharp while avoiding the frustration.
While putting together this post, I discovered this series of posts from mikesdotnetting.com. I recommend those articles for further reading about iTextSharp.