Parsing PDF Files using IFilter (C#, .NET)

How to extract text from PDF files using the Microsoft IFilter interface and the Adobe PDF IFilter implementation.

Microsoft provides the IFilter interface for extracting text from files. It is used by the Windows Indexing Service to parse your documents and other files. The interface only works for file types that have an IFilter implementation installed. IFilter support for Microsoft Office documents is installed with Microsoft Office; similarly, the PDF IFilter is installed with Adobe Acrobat or Adobe Reader.

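In order to parse PDF files using the IFilter interface, you need a PDF IFilter implementation installed on the machine, such as the one that is installed with Adobe Acrobat or Adobe Reader.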

Sample Code (C#)

using IFilter;

// ...

// Returns the plain text that the installed PDF IFilter extracts from the file at 'path'.
public static string ExtractTextFromPdf(string path) {
  return DefaultParser.Extract(path);
}

E_NOTIMPL Error Code

The Adobe PDF IFilter implementation that ships with Adobe Reader seems to be limited: it allows only selected processes to use it for parsing.

When accessed from any other process, it returns the E_NOTIMPL error code (0x80004001).
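
To tell this failure apart from other parsing errors, the calling code can compare the HRESULT of a caught exception with E_NOTIMPL. The following is a minimal sketch, not part of the sample project: the class name IFilterErrors and the method IsNotImplemented are hypothetical, and it assumes that the failure reaches your code as a .NET exception (how the IFilter wrapper actually reports it may differ). It relies on Exception.HResult being publicly readable, which is the case in .NET Framework 4.5 and later.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the sample project.
public static class IFilterErrors
{
    // HRESULT quoted in the text: E_NOTIMPL (0x80004001).
    public const int E_NOTIMPL = unchecked((int)0x80004001);

    // Returns true when the exception carries the E_NOTIMPL HRESULT.
    // Assumption: the IFilter failure is surfaced as an exception
    // (for example a COMException or a NotImplementedException).
    public static bool IsNotImplemented(Exception ex)
    {
        return ex != null && ex.HResult == E_NOTIMPL;
    }
}
```

A caller could wrap the extraction call in a try/catch block and use IsNotImplemented to report that the hosting process is probably not accepted by the Adobe PDF IFilter, rather than treating the file as unreadable.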

One of the allowed file names is "filtdump.exe", so the sample project uses this name for its output assembly. Note that this does not work in debugging sessions started from Visual Studio (F5), because the debugger runs the program under a modified file name.
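
Because the outcome depends on the name of the running executable, it can help to check that name at startup and warn early. The sketch below is an illustration under stated assumptions: HostProcessCheck and its members are hypothetical names, only "filtdump.exe" is taken from the text, and the case-insensitive comparison is a guess, since the exact rule the Adobe PDF IFilter applies is not documented here.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of the sample project.
public static class HostProcessCheck
{
    // File name reported in the text as accepted by the Adobe PDF IFilter.
    public const string AllowedFileName = "filtdump.exe";

    // Compares only the file-name part of the path.
    // Assumption: the comparison is case-insensitive.
    public static bool IsAllowedFileName(string executablePath)
    {
        string fileName = Path.GetFileName(executablePath);
        return string.Equals(fileName, AllowedFileName,
                             StringComparison.OrdinalIgnoreCase);
    }

    // Checks the executable that hosts the current process.
    public static bool CurrentProcessIsAllowed()
    {
        string path = Process.GetCurrentProcess().MainModule.FileName;
        return IsAllowedFileName(path);
    }
}
```

If CurrentProcessIsAllowed returns false, for example because a Visual Studio debugging session runs the program under a hosting-process name such as filtdump.vshost.exe, the program can print a hint to run the built executable directly.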

The older standalone version of the Adobe PDF IFilter [adobe.com] does not seem to have this limitation.

Other Options

There are also other options for parsing PDF files in .NET, such as the PDFBox.NET and iTextSharp libraries.