Parsing PDF Files using iTextSharp (C#, .NET)

This article shows how to extract text from PDF files using the iTextSharp library. A sample Visual Studio 2010 project (C#) is included.

Downloads

PdfParsingiTextSharp.20140310.zip

License

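Note that iTextSharp (version 5 and later) is licensed under the AGPL. The AGPL does not forbid commercial use as such, but it requires you to release the source code of any application that uses the library under the same license. Closed-source commercial use therefore requires a separate commercial license from the vendor.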

Sample code (C#)

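using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// ...

// Returns the text of all pages in the PDF file at the given path.
public static string ExtractTextFromPdf(string path)
{
    using (PdfReader reader = new PdfReader(path))
    {
        StringBuilder text = new StringBuilder();

        // Page numbers in iTextSharp are 1-based
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            // AppendLine keeps the last word of one page from running into the next
            text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i));
        }

        return text.ToString();
    }
}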
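
If you need the text of each page separately, or want to control how text chunks are ordered, GetTextFromPage also accepts an extraction strategy. The following is a minimal sketch under the assumption that iTextSharp 5.x is referenced. The class name PdfPageText and the method name ExtractPages are hypothetical and not part of the sample project.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class, for illustration only.
public static class PdfPageText
{
    // Returns one string per page instead of a single concatenated string.
    public static List<string> ExtractPages(string path)
    {
        List<string> pages = new List<string>();

        using (PdfReader reader = new PdfReader(path))
        {
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                // Create a fresh strategy for every page: a strategy object
                // accumulates the text it has seen so far.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                pages.Add(PdfTextExtractor.GetTextFromPage(reader, i, strategy));
            }
        }

        return pages;
    }
}
```

SimpleTextExtractionStrategy emits text in the order it appears in the page's content stream. For documents with multi-column or otherwise complex layouts, LocationTextExtractionStrategy, which orders text chunks by their position on the page, may give more readable results.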

Other Options

Other libraries with more permissive licensing can also be used for PDF parsing, such as PDFBox.NET. Download a sample C# project that uses PDFBox to parse PDF files.
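
For comparison, text extraction with PDFBox tends to look roughly like the sketch below. It assumes a PDFBox 1.x build converted to .NET with IKVM, which is why the Java package names and lower-case method names appear in C#. In PDFBox 2.x the PDFTextStripper class lives in a different package. The class name PdfBoxText is hypothetical, and the linked sample project may be organized differently.

```csharp
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;

// Hypothetical helper class, for illustration only.
public static class PdfBoxText
{
    public static string Extract(string path)
    {
        PDDocument doc = null;
        try
        {
            doc = PDDocument.load(path);
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(doc);
        }
        finally
        {
            // PDDocument is a Java class and does not implement IDisposable,
            // so it is closed explicitly rather than with a using block.
            if (doc != null)
            {
                doc.close();
            }
        }
    }
}
```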