Using C# and iTextSharp to create a PDF

The Biodiversity Heritage Library (BHL) is a consortium of many of the world’s leading natural history and botanical libraries.  The goal of the organization is to digitize and make available legacy biodiversity literature.  One popular feature of the BHL web site is the ability for visitors to select up to 100 pages from a book and generate a PDF containing those pages.  More than 100 custom PDFs are created each day.

As the primary developer of the site, I want to highlight the tool that we use to generate the PDFs. iTextSharp is a freely-available port of the popular Java component for generating PDFs, iText. 

While iTextSharp is powerful, its documentation is not ideal.  The official website for the component points you to the documentation for the original Java tool.  Unfortunately, while this provides good information, many things that you’d like to accomplish with iTextSharp are implemented slightly differently than with iText.  I found that these discrepancies between the Java documentation and the .NET implementation led to many instances of trial-and-error development.  I hope that this post will help illustrate how to use the iTextSharp component, and save others some frustration.

Getting Set Up

To get started using iTextSharp, go to http://sourceforge.net/projects/itextsharp/ and download the latest version of iTextSharp (5.0.6 at the time of this writing).  You can download the compiled assembly, or if you prefer, the source code.

To make iTextSharp available for use in your application, simply add a reference to the iTextSharp library.

How-To: The Code Samples

The following code samples illustrate a number of basic and advanced features of iTextSharp.  Included are examples of basic text layout and formatting, image insertion, page sizing, page labeling, metadata assignment, bullet lists, and linking.

Let’s start with a method named Build() which provides the framework for a simple application that builds a five-page PDF.  The rest of the code samples build on this one.  Here is the code listing:

using System;
using iTextSharp.text;

// Set up the fonts to be used on the pages
private Font _largeFont = new Font(Font.FontFamily.HELVETICA, 18, Font.BOLD, BaseColor.BLACK);
private Font _standardFont = new Font(Font.FontFamily.HELVETICA, 14, Font.NORMAL, BaseColor.BLACK);
private Font _smallFont = new Font(Font.FontFamily.HELVETICA, 10, Font.NORMAL, BaseColor.BLACK);

public void Build()
{
   iTextSharp.text.Document doc = null;

   try
   {
       // Initialize the PDF document
       doc = new Document();
       iTextSharp.text.pdf.PdfWriter writer = iTextSharp.text.pdf.PdfWriter.GetInstance(doc,
           new System.IO.FileStream(System.IO.Directory.GetCurrentDirectory() + "\\ScienceReport.pdf",
               System.IO.FileMode.Create));

       // Set margins and page size for the document
       doc.SetMargins(50, 50, 50, 50);
       // There are a huge number of possible page sizes, including such sizes as
       // EXECUTIVE, LEGAL, LETTER_LANDSCAPE, and NOTE
       doc.SetPageSize(new iTextSharp.text.Rectangle(iTextSharp.text.PageSize.LETTER.Width,
           iTextSharp.text.PageSize.LETTER.Height));

       // Add metadata to the document.  This information is visible when viewing the
       // document properties within Adobe Reader.
       doc.AddTitle("My Science Report");
       doc.AddCreator("M. Lichtenberg");
       doc.AddKeywords("paper airplanes");

       // Add Xmp metadata to the document.
       this.CreateXmpMetadata(writer);

       // Open the document for writing content
       doc.Open();

       // Add pages to the document
       this.AddPageWithBasicFormatting(doc);
       this.AddPageWithInternalLinks(doc);
       this.AddPageWithBulletList(doc);
       this.AddPageWithExternalLinks(doc);
       this.AddPageWithImage(doc, System.IO.Directory.GetCurrentDirectory() + "\\FinalGraph.jpg");

       // Add page labels to the document
       iTextSharp.text.pdf.PdfPageLabels pdfPageLabels = new iTextSharp.text.pdf.PdfPageLabels();
       pdfPageLabels.AddPageLabel(1, iTextSharp.text.pdf.PdfPageLabels.EMPTY, "Basic Formatting");
       pdfPageLabels.AddPageLabel(2, iTextSharp.text.pdf.PdfPageLabels.EMPTY, "Internal Links");
       pdfPageLabels.AddPageLabel(3, iTextSharp.text.pdf.PdfPageLabels.EMPTY, "Bullet List");
       pdfPageLabels.AddPageLabel(4, iTextSharp.text.pdf.PdfPageLabels.EMPTY, "External Links");
       pdfPageLabels.AddPageLabel(5, iTextSharp.text.pdf.PdfPageLabels.EMPTY, "Image");
       writer.PageLabels = pdfPageLabels;
   }
   catch (iTextSharp.text.DocumentException dex)
   {
       // Handle iTextSharp errors
   }
   finally
   {
       // Clean up.  Close() finalizes the writes to the open file.
       if (doc != null)
       {
           doc.Close();
           doc = null;
       }
   }
}

The code starts by setting up the fonts that will be used within the PDF.  In fact, these are used in most of the following code samples.  You can see that various font faces, sizes, weights, and colors can be specified. 

The first significant lines of the Build() method initialize the file (ScienceReport.pdf) that will be built.  Next, margins and page size are set.  Following that you see AddTitle(), AddCreator(), and AddKeywords() being called to add metadata to the file.  An additional form of metadata is added by the CreateXmpMetadata() function, which will be explained later.

After this basic setup is complete, the new document is opened for writing and five “AddPage…()” methods are called; these also are explained later.  After the pages are added to the document, page labels are added by populating a PdfPageLabels object and assigning it to the writer’s PageLabels property.  (In Acrobat Reader, these labels are displayed below the page thumbnails shown in the “Pages” navigation panel.)  At this point the content of the document has been completely written.  Notice that the Close() method is explicitly called on the Document object to finalize the writes to the open file (this happens in the “finally” block).

The only other thing to point out in this sample is the error handling.  Catch errors of type iTextSharp.text.DocumentException to handle errors originating from iTextSharp operations.
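Build() assembles its file paths by appending a "\\"-prefixed file name to the current directory.  One possible variation, sketched here using only the .NET base class library, is to let Path.Combine supply the separator.  The ReportPaths class and its method names are illustrative and are not part of the original sample.

```csharp
using System.IO;

public static class ReportPaths
{
    // Combine a directory and a file name without hard-coding the "\\" separator.
    public static string InDirectory(string directory, string fileName)
    {
        return Path.Combine(directory, fileName);
    }

    // Resolve a file name against the current working directory, as Build() does.
    public static string InCurrentDirectory(string fileName)
    {
        return InDirectory(Directory.GetCurrentDirectory(), fileName);
    }
}
```

With this helper, the FileStream in Build() could be opened on ReportPaths.InCurrentDirectory("ScienceReport.pdf").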

The next code sample shows two methods: AddPageWithBasicFormatting(), which is one of the methods used to add a page to the document, and AddParagraph(), which is a helper function used to add a paragraph to the current page of the document.

The AddPageWithBasicFormatting() method illustrates the basic methods for adding text and images to a PDF document.  It starts by calling the AddParagraph() helper method to add two short text strings to the current page.  Notice that when adding a paragraph, you can specify the alignment and font to be used to render the paragraph contents.  Next, a small JPG image is read from disk and inserted into the document.  The method finishes up by adding two more paragraphs to the page.

/// <summary>
/// Add the header page to the document.  This shows an example of a page containing
/// both text and images.  The contents of the page are centered and the text is of
/// various sizes.
/// </summary>
/// <param name="doc"></param>
private void AddPageWithBasicFormatting(iTextSharp.text.Document doc)
{
   // Write page content.  Note the use of fonts and alignment attributes.
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _largeFont, new Chunk("\n\nMY SCIENCE PROJECT\n\n"));
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _standardFont, new Chunk("by M. Lichtenberg\n\n\n\n"));

   // Add a logo
   String appPath = System.IO.Directory.GetCurrentDirectory();
   iTextSharp.text.Image logoImage = iTextSharp.text.Image.GetInstance(appPath + "\\PaperAirplane.jpg");
   logoImage.Alignment = iTextSharp.text.Element.ALIGN_CENTER;
   doc.Add(logoImage);
   logoImage = null;

   // Write additional page content
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _largeFont, new Chunk("\n\n\nWhat kind of paper is the best for making paper airplanes?\n\n\n\n\n"));
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _smallFont, new Chunk("Generated " +
       DateTime.Now.Day.ToString() + " " +
       System.Globalization.CultureInfo.CurrentCulture.DateTimeFormat.GetMonthName(DateTime.Now.Month) + " " +
       DateTime.Now.Year.ToString() + " " +
       DateTime.Now.ToShortTimeString()));
}

/// <summary>
/// Add a paragraph object containing the specified element to the PDF document.
/// </summary>
/// <param name="doc">Document to which to add the paragraph.</param>
/// <param name="alignment">Alignment of the paragraph.</param>
/// <param name="font">Font to assign to the paragraph.</param>
/// <param name="content">Object that is the content of the paragraph.</param>
private void AddParagraph(Document doc, int alignment, iTextSharp.text.Font font, iTextSharp.text.IElement content)
{
   Paragraph paragraph = new Paragraph();
   paragraph.SetLeading(0f, 1.2f);
   paragraph.Alignment = alignment;
   paragraph.Font = font;
   paragraph.Add(content);
   doc.Add(paragraph);
}

The AddParagraph() method simplifies the process of adding a paragraph to a document by wrapping the basic actions that need to be performed to properly format a new paragraph.  These actions include setting the alignment, font, and content.  Notice that the content is not restricted to text.  Anything that supports the iTextSharp.text.IElement interface can form the content of a paragraph.  This means that plain text, anchor tags, external links, and other objects can be used.
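The “Generated” line in AddPageWithBasicFormatting() is assembled from several separate DateTime calls.  As a rough alternative, the same kind of string can be produced with two format strings.  This is only a sketch: the ReportStamp class is a hypothetical helper, and in some cultures the month name produced by the "MMMM" specifier may differ in grammatical form from the one returned by GetMonthName().

```csharp
using System;
using System.Globalization;

public static class ReportStamp
{
    // Format a timestamp as "Generated <day> <month name> <year> <short time>".
    public static string Format(DateTime when, CultureInfo culture)
    {
        // "d MMMM yyyy" is a custom pattern: day, full month name, four-digit year.
        // "t" is the standard short time pattern for the given culture.
        return "Generated " + when.ToString("d MMMM yyyy", culture)
            + " " + when.ToString("t", culture);
    }
}
```

Calling ReportStamp.Format(DateTime.Now, CultureInfo.CurrentCulture) yields text that can be wrapped in a Chunk and passed to AddParagraph().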

The AddPageWithInternalLinks() method, shown in the next code sample, demonstrates how to add links that reference other locations within the PDF document.  If you are familiar with how to link to anchor tags in an HTML document, then you should understand what is happening in this example.

As you can see, the method is a simple one.  Three Anchor objects are created that reference “#research”, “#graph”, and “#results”.  These are references to named anchors found in other locations in the finished PDF document.  Creation of the named anchors is explained in the next code sample.  Notice that as with paragraphs and other text fragments, you specify a font when creating the Anchor objects.

After the Anchor objects are created, a new page is added to the document, a paragraph of text is added to the page, and then the three Anchor objects are added to the page.  Notice that our AddParagraph() helper method is used to add the Anchor objects.

/// <summary>
/// Add a table of contents page containing links to named anchors elsewhere in the document.
/// </summary>
/// <param name="doc"></param>
private void AddPageWithInternalLinks(iTextSharp.text.Document doc)
{
   // Generate links to be embedded in the page
   Anchor researchAnchor = new Anchor("Research & Hypothesis\n\n", _standardFont);
   researchAnchor.Reference = "#research"; // this link references a named anchor within the document
   Anchor graphAnchor = new Anchor("Graph\n\n", _standardFont);
   graphAnchor.Reference = "#graph";
   Anchor resultsAnchor = new Anchor("Results & Bibliography", _standardFont);
   resultsAnchor.Reference = "#results";

   // Add a new page to the document
   doc.NewPage();

   // Add heading text to the page
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _largeFont, new iTextSharp.text.Chunk("TABLE OF CONTENTS\n\n\n\n\n"));

   // Add the links to the page
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _standardFont, researchAnchor);
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _standardFont, graphAnchor);
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _standardFont, resultsAnchor);
}

The method in the next code sample, AddPageWithBulletList(), builds on the previous sample.  It shows how to create the named anchors that were referenced by the anchors created in the previous example.  In addition, it shows a new concept, a bulleted list.

In this method, after adding a new page to the document, a new Anchor object is created and added to the page.  The important thing to notice is that this anchor is not assigned a reference; instead it is simply given a name.  This is what makes this object a… well.. anchor… and not a link to another resource.

/// <summary>
/// Add a page that includes a bullet list.
/// </summary>
/// <param name="doc"></param>
private void AddPageWithBulletList(iTextSharp.text.Document doc)
{
   // Add a new page to the document
   doc.NewPage();

   // The header at the top of the page is an anchor linked to by the table of contents.
   iTextSharp.text.Anchor contentsAnchor = new iTextSharp.text.Anchor("RESEARCH\n\n", _largeFont);
   contentsAnchor.Name = "research";

   // Add the header anchor to the page
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _largeFont, contentsAnchor);

   // Create an unordered bullet list.  The 10f argument separates the bullet from the text by 10 points
   iTextSharp.text.List list = new iTextSharp.text.List(iTextSharp.text.List.UNORDERED, 10f);
   list.SetListSymbol("\u2022");   // Set the bullet symbol (without this a hyphen starts each list item)
   list.IndentationLeft = 20f;     // Indent the list 20 points
   list.Add(new ListItem("Lift, thrust, drag, and gravity are forces that act on a plane.", _standardFont));
   list.Add(new ListItem("A plane should be light to help fight against gravity’s pull.", _standardFont));
   list.Add(new ListItem("Gravity will have less effect on a plane built from light materials.", _standardFont));
   list.Add(new ListItem("In order to fly well, airplanes must be stable.", _standardFont));
   list.Add(new ListItem("A plane that is unstable will either pitch up into a stall, or nose-dive.", _standardFont));
   doc.Add(list);  // Add the list to the page
}

After the named anchor is added to the page, a List object is created.  This object is used to define a bulleted list.

After the List object has been instantiated, some additional customizations are made.  These include a modification to the leading symbol of each list item (the default hyphen is changed to the bullet symbol) and the indentation of the entire list.  Once these actions are complete, five ListItem objects are added to the list, and the list is added to the page.
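The internal links work only if each Reference value (for example “#research”) matches the Name given to an anchor elsewhere in the document (“research”).  One way to keep the two in step, sketched below, is to define the names once and derive the references from them.  The AnchorNames class is a hypothetical helper that uses no iTextSharp types; it is not part of the original sample.

```csharp
using System;

public static class AnchorNames
{
    // Names assigned to the target anchors (the Anchor.Name values in the samples).
    public const string Research = "research";
    public const string Graph = "graph";
    public const string Results = "results";

    // Build the Anchor.Reference value that points at a named anchor.
    public static string ToReference(string anchorName)
    {
        if (string.IsNullOrEmpty(anchorName))
            throw new ArgumentException("Anchor name is required.", "anchorName");
        return "#" + anchorName;
    }
}
```

The table of contents could then set researchAnchor.Reference to AnchorNames.ToReference(AnchorNames.Research), while the target page sets contentsAnchor.Name to AnchorNames.Research.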

The next sample is very similar to the earlier example that shows how to add links to locations within the PDF.  This one shows a method that adds links to external resources.  The key difference to note between the method shown here (AddPageWithExternalLinks()) and the one shown earlier (AddPageWithInternalLinks()) is that the Reference properties of the anchors are set to external URLs instead of to internal named anchors.

/// <summary>
/// Add a page that contains embedded hyperlinks to external resources
/// </summary>
/// <param name="doc"></param>
private void AddPageWithExternalLinks(Document doc)
{
   // Generate external links to be embedded in the page
   iTextSharp.text.Anchor bibliographyAnchor1 = new Anchor("Scholastic.com", _standardFont);
   bibliographyAnchor1.Reference = "http://teacher.scholastic.com/paperairplane/airplane.htm";
   Anchor bibliographyAnchor2 = new Anchor("Berkeley.edu", _standardFont);
   bibliographyAnchor2.Reference = "http://www.eecs.berkeley.edu/Programs/doublex/spring02/paperairplane.html";
   Anchor bibliographyAnchor3 = new Anchor("Paper Airplane Science", _standardFont);
   bibliographyAnchor3.Reference = "http://www.exo.net/~pauld/activities/flying/PaperAirplaneScience.html";
   Anchor bibliographyAnchor4 = new Anchor("LittleToyAirplanes.com", _standardFont);
   bibliographyAnchor4.Reference = "http://www.littletoyairplanes.com/theoryofflight/02whyplanes.html"; 

   // Add a new page to the document
   doc.NewPage();

   // Add text to the page 
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_CENTER, _largeFont, new Chunk("BIBLIOGRAPHY\n\n"));

   // Add the links to the page
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_LEFT, _standardFont, bibliographyAnchor1);
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_LEFT, _standardFont, bibliographyAnchor2);
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_LEFT, _standardFont, bibliographyAnchor3);
   this.AddParagraph(doc, iTextSharp.text.Element.ALIGN_LEFT, _standardFont, bibliographyAnchor4);
}

 

One final example of adding content to a PDF file is the AddPageWithImage() method in the next code sample.  Looking at the body of the method, you can see that the image is read from disk, the page is resized to match the size of the image, and the image is added to the document.

The key thing to notice here is that the modifications to the margins and page size are made before the new page is added.  Modifications to margins and page size take effect when a new page is added; the current page is unaffected.

/// <summary>
/// Add a page containing a single image.  Set the page size to match the image size.
/// </summary>
/// <param name="doc"></param>
/// <param name="imagePath"></param>
private void AddPageWithImage(iTextSharp.text.Document doc, String imagePath)
{
   // Read the image file
   iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(new Uri(imagePath));

   // Set the page size to the dimensions of the image BEFORE adding a new page to the document. 
   float imageWidth = image.Width;
   float imageHeight = image.Height;
   doc.SetMargins(0, 0, 0, 0);
   doc.SetPageSize(new iTextSharp.text.Rectangle(imageWidth, imageHeight));

   // Add a new page
   doc.NewPage(); 

   // Add the image to the page 
   doc.Add(image);
   image = null;
}
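Page sizes in a PDF are expressed in points, where one point is 1/72 of an inch.  AddPageWithImage() uses the image’s pixel dimensions directly as the page size, so a high-resolution scan produces a physically large page.  If the scan resolution is known, the conversion below gives the page size that matches the physical size of the original.  This is a sketch only: PageUnits is a hypothetical helper, and the DPI value is assumed to come from the caller.

```csharp
using System;

public static class PageUnits
{
    // A PDF point is 1/72 of an inch.
    public const float PointsPerInch = 72f;

    // Convert a pixel dimension to points for an image captured at the given DPI.
    public static float PixelsToPoints(float pixels, float dpi)
    {
        if (dpi <= 0f) throw new ArgumentOutOfRangeException("dpi");
        return pixels * PointsPerInch / dpi;
    }
}
```

For example, a 2550 x 3300 pixel scan at 300 DPI converts to 612 x 792 points, which is the LETTER page size.  The image itself would also need to be scaled to those dimensions (for instance with the Image class’s ScaleAbsolute() method) before being added to the page.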

The last code sample shows a method that was called in the Build() method shown in the first code sample.  CreateXmpMetadata() adds XMP metadata to a PDF document.  You may be familiar with EXIF metadata that many digital cameras embed within photos.  XMP is an XML-based metadata standard that is similar to EXIF.  It can be embedded in many types of files, including PDFs.  Some reference managers and PDF cataloging tools can take advantage of this metadata if it is available.

The method begins by creating an XmpSchema object and adding metadata to it.  It then creates an XmpWriter object and writes the XmpSchema to a byte stream.  Then (and this is important), the byte stream is shrunk to the size of the metadata that was placed into it.  Once that is done, the byte stream is written to the PDF document.

/// <summary>
/// Use this method to write XMP data to a new PDF
/// </summary>
/// <param name="writer"></param>
private void CreateXmpMetadata(iTextSharp.text.pdf.PdfWriter writer)
{
   // Set up the buffer to hold the XMP metadata
   byte[] buffer = new byte[65536];
   System.IO.MemoryStream ms = new System.IO.MemoryStream(buffer, true);

   try
   {
       // XMP supports a number of different schemas, which are made available by iTextSharp.
       // Here, the Dublin Core schema is chosen.
       iTextSharp.text.xml.xmp.XmpSchema dc = new iTextSharp.text.xml.xmp.DublinCoreSchema();

       // Add Dublin Core attributes
       iTextSharp.text.xml.xmp.LangAlt title = new iTextSharp.text.xml.xmp.LangAlt();
       title.Add("x-default", "My Science Project");
       dc.SetProperty(iTextSharp.text.xml.xmp.DublinCoreSchema.TITLE, title);

       // Dublin Core allows multiple authors, so we create an XmpArray to hold the values
       iTextSharp.text.xml.xmp.XmpArray author = new iTextSharp.text.xml.xmp.XmpArray(iTextSharp.text.xml.xmp.XmpArray.ORDERED);
       author.Add("M. Lichtenberg");
       dc.SetProperty(iTextSharp.text.xml.xmp.DublinCoreSchema.CREATOR, author);

       // Multiple subjects are also possible, so another XmpArray is used
       iTextSharp.text.xml.xmp.XmpArray subject = new iTextSharp.text.xml.xmp.XmpArray(iTextSharp.text.xml.xmp.XmpArray.UNORDERED);
       subject.Add("paper airplanes");
       subject.Add("science project");
       dc.SetProperty(iTextSharp.text.xml.xmp.DublinCoreSchema.SUBJECT, subject);

       // Create an XmpWriter using the MemoryStream defined earlier
       iTextSharp.text.xml.xmp.XmpWriter xmp = new iTextSharp.text.xml.xmp.XmpWriter(ms);
       xmp.AddRdfDescription(dc);  // Add the completed metadata definition to the XmpWriter
       xmp.Close();    // This flushes the XMP metadata into the buffer

       //———————————————————————————
       // Shrink the buffer to the correct size (discard empty elements of the byte array)
       int bufsize = buffer.Length;
       int bufcount = 0;
       foreach (byte b in buffer)
       {
           if (b == 0) break;
           bufcount++;
       }
       System.IO.MemoryStream ms2 = new System.IO.MemoryStream(buffer, 0, bufcount);
       buffer = ms2.ToArray();
       //———————————————————————————

       // Add all of the XMP metadata to the PDF doc that we’re building
       writer.XmpMetadata = buffer;
   }
   catch (Exception)
   {
       throw;  // rethrow without resetting the stack trace
   }
   finally
   {
       ms.Close();
       ms.Dispose();
   }
}
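The fixed 64 KB buffer and the loop that trims it could probably be avoided by writing to a growable MemoryStream and calling ToArray(), which returns only the bytes that were actually written and still works after the stream has been closed.  The sketch below shows the idea using only the base class library.  ExactBuffer is a hypothetical helper, and using it with XmpWriter assumes that XmpWriter needs nothing more than a writable Stream, as the listing above suggests.

```csharp
using System;
using System.IO;

public static class ExactBuffer
{
    // Run a writer callback against a growable MemoryStream and
    // return a byte array containing exactly the bytes written.
    public static byte[] Capture(Action<Stream> write)
    {
        MemoryStream ms = new MemoryStream();   // no fixed-size backing array
        write(ms);
        // ToArray() copies only the written bytes, even if the callback closed the stream.
        return ms.ToArray();
    }
}
```

In CreateXmpMetadata(), the callback would create the XmpWriter on the supplied stream, add the Dublin Core description, and close the writer; the returned array could then be assigned to writer.XmpMetadata without a separate shrinking step.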

 

Working Code And Example Output

A ready-to-run Visual Studio 2010 solution can be downloaded from here.  The download includes all of the code samples discussed in this post.  Many of them include more detail than what is shown here.  If you want to skip straight to the output, an example of the PDF created by the ready-to-run code is available here.

Wrapping Up

In my own experience I found iTextSharp to be a powerful tool.  It was also a frustrating tool to learn.  I hope that the examples that I’ve presented here help others realize the power of iTextSharp while avoiding the frustration.

While putting together this post, I discovered this series of posts from mikesdotnetting.com.  I recommend those articles for further reading about iTextSharp.

Using Internet Archive’s S3(ish) Interface

In my work for the Biodiversity Heritage Library, I transfer large amounts of data to and from the Internet Archive.  Last week (September 23, 2010) I gave a presentation to the BHL Global Tech Meeting in Woods Hole, Massachusetts about some of the methods I use to do these data transfers.  The meeting included BHL representatives from the United States, England, Germany, Egypt, Brazil, Costa Rica, and Australia. 

The slide deck is available for viewing and download at http://www.slideshare.net/mlichtenberg1/bhl-global-tech-meeting-internet-archive-data-transfer.

One topic that did not get much coverage in my presentation was programmatically uploading files to Internet Archive via their S3-like storage API.  (I say “S3-like” because it is meant to mirror the interface to Amazon.com’s S3 service.  It is similar, but not identical.  For more information, see http://www.archive.org/help/abouts3.txt.)  The topic is mentioned on the 2nd-to-last slide of the presentation, but I did not discuss it in depth or include any source code to illustrate how it works.

To remedy that oversight, I present here a class that is used in a production application to upload files to Internet Archive, using their “S3-like” storage API.  Written in C#, it uses the WebClient class from the System.Net namespace in the .NET Framework to handle the data transfers.

Here is the full source of the class, followed by a discussion of its key elements.

using System;
using System.Collections.Generic;
using System.Text;
using System.Net;

namespace InternetArchive.Utilities
{
   public class S3
   {
       // PROPERTIES

       private string _accessKey = "YOUR_ACCESS_KEY";
       private string _secretKey = "YOUR_SECRET_KEY";
       private string _s3BaseDomain = "http://s3.us.archive.org";
       private string _bucketAddressFormat = "{0}/{1}";
       private string _objectAddressFormat = "{0}/{1}/{2}";

       // The WebClient class from the System.Net namespace is used for several Get and Put operations
       private WebClient _webClient = null;

       public WebClient WebClient
       {
           get
           {
               if (_webClient == null)
               {
                   // Set the Internet Archive authorization headers when the WebClient is instantiated
                   _webClient = new WebClient();
                   _webClient.Headers.Add("authorization", this.GetAuthHeaderValue());
               }
               return _webClient;
           }
       }

       // CONSTRUCTORS

       public S3()
       {
       }

       public S3(string accessKey, string secretKey)
       {
           _accessKey = accessKey;
           _secretKey = secretKey;
       }

       ~S3()
       {
           if (_webClient != null)
           {
               _webClient.Dispose();
               _webClient = null;
           }
       }

       // OBJECT OPERATIONS

       // Objects are files that are placed into buckets (folders).

       /// <summary>
       /// Upload a file into the specified bucket.
       /// </summary>
       /// <param name="fileName">The name of the file to be uploaded</param>
       /// <param name="bucketName">The Internet Archive identifier of the destination bucket </param>
       /// <param name="objectName">The name to give the file at Internet Archive</param>
       /// <param name="contentType">A valid MIME type for the file being uploaded</param>
       /// <param name="headers">A list of key-value pairs to be added as HTTP headers</param>
       /// <param name="preventDerive">True if Internet Archive should NOT initiate its derivation process</param>
       /// <param name="makeBucket">True if Internet Archive should create a new bucket</param>
       /// <returns>"Success" if the upload was successful, otherwise an error message.</returns>
       public string PutObject(string fileName, string bucketName, string objectName,
           string contentType, List<KeyValuePair<string, string>> headers,
           bool preventDerive, bool makeBucket)
       {
           string result = string.Empty;
           try
           {
               if (preventDerive)
               {
                   // Set a header to prevent IA from initiating a derive process on this item
                   if (headers == null) headers = new List<KeyValuePair<string, string>>();
                   headers.Add(new KeyValuePair<string, string>("x-archive-queue-derive", "0"));
               }
               if (makeBucket)
               {
                   // Set a header to allow IA to create a "bucket" in which to place this item
                   if (headers == null) headers = new List<KeyValuePair<string, string>>();
                   headers.Add(new KeyValuePair<string, string>("x-archive-auto-make-bucket", "1"));
               }

               string destination = String.Format(_objectAddressFormat,
                           _s3BaseDomain, bucketName, objectName);
               this.HttpRequest(destination, fileName, "PUT", contentType, headers);
               result = "Success";
           }
           catch (Exception ex)
           {
               result = "Error: " + ex.Message;
           }

           return result;
       }

       /// <summary>
       /// Download a file to the specified location.
       /// </summary>
       /// <param name="bucketName">The Internet Archive identifier of the bucket holding the file</param>
       /// <param name="objectName">The name of the file to be downloaded</param>
       /// <param name="fileName">The name of a local file to which to download the object</param>
       /// <returns>True if the download was successful, otherwise false</returns>
       public bool GetObject(string bucketName, string objectName, string fileName)
       {
           bool result = true;
           try
           {
               this.WebClient.DownloadFile(
                     String.Format(_objectAddressFormat, _s3BaseDomain, bucketName, objectName), fileName);
           }
           catch
           {
               result = false;
           }

           return result;
       }

       // BUCKET OPERATIONS

       // Buckets are folders.  They are named with a unique identifier.

       /// <summary>
       /// List all of the buckets owned by the authorized user.
       /// </summary>
       /// <returns>XML listing of buckets</returns>
       public string ListBuckets()
       {
           return this.WebClient.DownloadString(_s3BaseDomain);
       }

       /// <summary>
       /// List the contents of the specified bucket.
       /// </summary>
       /// <param name="bucketName"></param>
       /// <returns>XML listing of the files in the bucket</returns>
       public string GetBucket(string bucketName)
       {
           return this.WebClient.DownloadString(
                 String.Format(_bucketAddressFormat, _s3BaseDomain, bucketName));
       }

       // HELPER METHODS

       /// <summary>
       /// Get the Internet Archive authorization string to be passed in an HTTP header
       /// </summary>
       /// <returns></returns>
       private string GetAuthHeaderValue()
       {
           return String.Format("LOW {0}:{1}", _accessKey, _secretKey);
       }

       /// <summary>
       /// Submit an HTTP request to upload a file
       /// </summary>
       /// <param name="url">The Url to which to submit the file</param>
       /// <param name="fileName">A file to be uploaded</param>
       /// <param name="method">"PUT"</param>
       /// <param name="contentType">A valid MIME type for the file being uploaded</param>
       /// <param name="headers">A list of key-value pairs to be added as HTTP headers</param>
       private void HttpRequest(string url, string fileName, string method,
           string contentType, List<KeyValuePair<string, string>> headers)
       {
           System.IO.Stream stream = null;

           try
           {
               // Read file to be uploaded
               byte[] fileContents = System.IO.File.ReadAllBytes(fileName);

               // Prepare the web request
               HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
               req.Method = method;
               req.Timeout = 600000;    // 10 minutes
               req.ContentType = contentType;
               req.ContentLength = fileContents.Length;
               req.Headers.Add("authorization", this.GetAuthHeaderValue());

               // If additional header values have been specified, add them now
               if (headers != null)
               {
                   foreach (KeyValuePair<string, string> header in headers)
                   {
                       req.Headers.Add(header.Key, header.Value);
                   }
               }

               // Send the data
               stream = req.GetRequestStream();
               stream.Write(fileContents, 0, fileContents.Length);
               stream.Close();

               // Make sure we were successful.  Dispose of the response to release the connection.
               using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
               {
                   if (response.StatusCode != HttpStatusCode.Created)
                   {
                       throw new UnauthorizedAccessException("File not written to " + url + ".  HTTP status: " +
                             response.StatusCode.ToString());
                   }
               }
           }
           catch (WebException)
           {
               throw;  // rethrow, preserving the original stack trace
           }
           finally
           {
               if (stream != null)
               {
                   stream.Close();
                   stream.Dispose();
                   stream = null;
               }
           }
       }
   }
}

 
The first thing to note is in the section of the code labeled PROPERTIES.  If you use this code yourself, you’ll need to set the values of the _secretKey and _accessKey fields to your own Internet Archive API keys.  (Alternatively, you can pass the values to the constructor.)
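If you would rather not keep the keys in source code, one option is to read them from the environment and pass them to the two-argument constructor.  The sketch below is a hypothetical helper, not part of the production class.  The environment variable names mentioned afterwards are made up for illustration, and ToAuthHeader() simply mirrors the “LOW access:secret” format used by GetAuthHeaderValue().

```csharp
using System;

public static class IaCredentials
{
    // Format the authorization header value the same way GetAuthHeaderValue() does.
    public static string ToAuthHeader(string accessKey, string secretKey)
    {
        if (string.IsNullOrEmpty(accessKey) || string.IsNullOrEmpty(secretKey))
            throw new ArgumentException("Both keys are required.");
        return String.Format("LOW {0}:{1}", accessKey, secretKey);
    }

    // Read a key from an environment variable, failing loudly if it is not set.
    public static string FromEnvironment(string variableName)
    {
        string value = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrEmpty(value))
            throw new InvalidOperationException("Environment variable not set: " + variableName);
        return value;
    }
}
```

An S3 instance could then be created with new S3(IaCredentials.FromEnvironment("IA_ACCESS_KEY"), IaCredentials.FromEnvironment("IA_SECRET_KEY")), where both variable names are assumptions.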

The CONSTRUCTORS section of the code is straightforward, and needs no further explanation.

The section labeled OBJECT OPERATIONS includes methods used to upload and download individual files.  The PutObject method is used to upload a file to Internet Archive.  It allows you to set extra HTTP headers (for passing metadata about the uploaded file to Internet Archive), toggle Internet Archive’s derivation process, and toggle creation of a new bucket in which to store the file.  The actual upload is handled by the HttpRequest method, found in the HELPER METHODS section of the code.  The GetObject method is used for downloading a file from Internet Archive.
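The headers argument of PutObject() is a plain list of key-value pairs.  As a sketch of how item metadata might be passed, the hypothetical helper below turns a dictionary of metadata fields into headers using the “x-archive-meta-” prefix described in Internet Archive’s S3 documentation (linked above).  Check that document for the header names your items actually need.

```csharp
using System;
using System.Collections.Generic;

public static class IaHeaders
{
    // Turn metadata name/value pairs into "x-archive-meta-<name>" headers for PutObject().
    public static List<KeyValuePair<string, string>> FromMetadata(IDictionary<string, string> metadata)
    {
        List<KeyValuePair<string, string>> headers = new List<KeyValuePair<string, string>>();
        foreach (KeyValuePair<string, string> item in metadata)
        {
            headers.Add(new KeyValuePair<string, string>("x-archive-meta-" + item.Key, item.Value));
        }
        return headers;
    }
}
```

The resulting list can be passed as the headers argument; PutObject() appends its own derive and auto-make-bucket headers to the same list.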

BUCKET OPERATIONS are methods for sending simple requests to Internet Archive to return the list of buckets associated with the specified API keys (the ListBuckets method), as well as the list of files contained in a particular bucket (the GetBucket method).

Finally, the HELPER METHODS section of the code includes private methods that support the class’ functionality.  You might pay close attention to the HttpRequest method, which handles the uploading of files to Internet Archive.  It uses the System.Net.HttpWebRequest class to perform the uploads; this is the only time that the WebClient instance is NOT used to perform an HTTP operation.  System.Net.HttpWebRequest provides more fine-grained control over the upload process, which is needed here, particularly for setting the HTTP headers.
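One detail the class leaves to the caller: bucket and object names are placed into the URL exactly as given, so a file name containing a space or other reserved character may not produce a valid address.  A possible refinement, sketched below as a hypothetical helper, is to percent-encode each name before formatting the URL.  Note that Uri.EscapeDataString() also encodes “/”, so this version assumes object names that do not contain path separators.

```csharp
using System;

public static class IaUrls
{
    // Build an object URL in the same "{0}/{1}/{2}" shape the S3 class uses,
    // percent-encoding the bucket and object names.
    public static string ForObject(string baseDomain, string bucketName, string objectName)
    {
        return String.Format("{0}/{1}/{2}",
            baseDomain.TrimEnd('/'),
            Uri.EscapeDataString(bucketName),
            Uri.EscapeDataString(objectName));
    }
}
```

For example, IaUrls.ForObject("http://s3.us.archive.org", "mybucket", "my file.xml") returns "http://s3.us.archive.org/mybucket/my%20file.xml".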

Here is a short example of how the S3 class might be used.  This example uploads a file to an existing item at Internet Archive, without setting any additional metadata values.

/// <summary>
/// A simple function that uses the S3 class to upload an XML file to an
/// existing bucket at Internet Archive
/// </summary>
private void UploadXmlFile(string localFileName, string remoteFileName, string bucketName)
{
   S3 s3 = new S3();

   try
   {
       // Upload the file
       string putResult = s3.PutObject(localFileName, bucketName,
           remoteFileName, "application/xml", null, true, false);

       // Evaluate results
       if (putResult == "Success")
       {
           // File uploaded
       }
       else if (putResult.ToLower().Contains("403"))
       {
           // Name file skipped (forbidden) – no permissions to write to bucket
       }
       else
       {
           // Error uploading file
       }
   }
   catch (Exception ex)
   {
       //  Error uploading file
   }
   finally
   {
       if (s3 != null) s3 = null;
   }
}
