How do I convert Word files to PDF programmatically?

C#vb.netPdfMs Word

C# Problem Overview


I have found several open-source/freeware programs that allow you to convert .doc files to .pdf files, but they're all of the application/printer driver variety, with no SDK attached.

I have found several programs that do have an SDK allowing you to convert .doc files to .pdf files, but they're all of the proprietary type, $2,000 a license or thereabouts.

Does anyone know of any clean, inexpensive (preferably free) programmatic solution to my problem, using C# or VB.NET?

Thanks!

C# Solutions


Solution 1 - C#

Use a foreach loop instead of a for loop - it solved my problem.

int j = 0;
foreach (Microsoft.Office.Interop.Word.Page p in pane.Pages)
{
    var bits = p.EnhMetaFileBits;
    var target = path1 +j.ToString()+  "_image.doc";
    try
    {
        using (var ms = new MemoryStream((byte[])(bits)))
        {
            var image = System.Drawing.Image.FromStream(ms);
            var pngTarget = Path.ChangeExtension(target, "png");
            image.Save(pngTarget, System.Drawing.Imaging.ImageFormat.Png);
        }
    }
    catch (System.Exception ex)
    {
        MessageBox.Show(ex.Message);  
    }
    j++;
}

Here is a modification of a program that worked for me. It uses Word 2007 with the Save As PDF add-in installed. It searches a directory for .doc files, opens them in Word and then saves them as a PDF. Note that you'll need to add a reference to Microsoft.Office.Interop.Word to the solution.

using Microsoft.Office.Interop.Word;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

...

// Create a new Microsoft Word application object
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();

// C# doesn't have optional arguments so we'll need a dummy value
object oMissing = System.Reflection.Missing.Value;

// Get list of Word files in specified directory
DirectoryInfo dirInfo = new DirectoryInfo(@"\\server\folder");
FileInfo[] wordFiles = dirInfo.GetFiles("*.doc");

word.Visible = false;
word.ScreenUpdating = false;

foreach (FileInfo wordFile in wordFiles)
{
    // Cast as Object for word Open method
    Object filename = (Object)wordFile.FullName;
          
    // Use the dummy value as a placeholder for optional arguments
    Document doc = word.Documents.Open(ref filename, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing);
    doc.Activate();
            
    object outputFileName = wordFile.FullName.Replace(".doc", ".pdf");
    object fileFormat = WdSaveFormat.wdFormatPDF;
            
    // Save document into PDF Format
    doc.SaveAs(ref outputFileName,
        ref fileFormat, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing);

    // Close the Word document, but leave the Word application open.
    // doc has to be cast to type _Document so that it will find the
    // correct Close method.                
    object saveChanges = WdSaveOptions.wdDoNotSaveChanges;
    ((_Document)doc).Close(ref saveChanges, ref oMissing, ref oMissing);
    doc = null;
}

// word has to be cast to type _Application so that it will find
// the correct Quit method.
((_Application)word).Quit(ref oMissing, ref oMissing, ref oMissing);
word = null;

Solution 2 - C#

To sum it up for vb.net users, the free option (must have office installed):

Microsoft office assembies download:

VB.NET example:

        Dim word As Application = New Application()
        Dim doc As Document = word.Documents.Open("c:\document.docx")
        doc.Activate()
        doc.SaveAs2("c:\document.pdf", WdSaveFormat.wdFormatPDF)
        doc.Close()

Solution 3 - C#

Just wanted to add that I used Microsoft.Interop libraries, specifically ExportAsFixedFormat function which I did not see used in this thread.

using Microsoft.Office.Interop.Word;
using System.Runtime.InteropServices;
using System.IO;
using Microsoft.Office.Core;

Application app;

public string CreatePDF(string path, string exportDir)
{
    Application app = new Application();
    app.DisplayAlerts = WdAlertLevel.wdAlertsNone;
    app.Visible = true;

    var objPresSet = app.Documents;
    var objPres = objPresSet.Open(path, MsoTriState.msoTrue, MsoTriState.msoTrue, MsoTriState.msoFalse);

    var pdfFileName = Path.ChangeExtension(path, ".pdf");
    var pdfPath = Path.Combine(exportDir, pdfFileName);

    try
    {
        objPres.ExportAsFixedFormat(
            pdfPath,
            WdExportFormat.wdExportFormatPDF,
            false,
            WdExportOptimizeFor.wdExportOptimizeForPrint,
            WdExportRange.wdExportAllDocument
        );
    }
    catch
    {
        pdfPath = null;
    }
    finally
    {
        objPres.Close();
    }
    return pdfPath;
}

Solution 4 - C#

PDFCreator has a COM component, callable from .NET or VBScript (samples included in the download).

But, it seems to me that a printer is just what you need - just mix that with Word's automation, and you should be good to go.

Solution 5 - C#

Solution 6 - C#

I went through the Word to PDF pain when someone dumped me with 10000 word files to convert to PDF. Now I did it in C# and used Word interop but it was slow and crashed if I tried to use PC at all.. very frustrating.

This lead me to discovering I could dump interops and their slowness..... for Excel I use (EPPLUS) and then I discovered that you can get a free tool called Spire that allows converting to PDF... with limitations!

http://www.e-iceblue.com/Introduce/free-doc-component.html#.VtAg4PmLRhE

Solution 7 - C#

Easy code and solution using Microsoft.Office.Interop.Word to converd WORD in PDF

using Word = Microsoft.Office.Interop.Word;

private void convertDOCtoPDF()
{

  object misValue = System.Reflection.Missing.Value;
  String  PATH_APP_PDF = @"c:\..\MY_WORD_DOCUMENT.pdf"

  var WORD = new Word.Application();

  Word.Document doc   = WORD.Documents.Open(@"c:\..\MY_WORD_DOCUMENT.docx");
  doc.Activate();
              
  doc.SaveAs2(@PATH_APP_PDF, Word.WdSaveFormat.wdFormatPDF, misValue, misValue, misValue, 
  misValue, misValue, misValue, misValue, misValue, misValue, misValue);

  doc.Close();
  WORD.Quit();


  releaseObject(doc);
  releaseObject(WORD);

}

Add this procedure to release memory:

private void releaseObject(object obj)
{
  try
  {
      System.Runtime.InteropServices.Marshal.ReleaseComObject(obj);
      obj = null;
  }
  catch (Exception ex)
  {
      //TODO
  }
  finally
  {
     GC.Collect();
  }
}

Solution 8 - C#

Seems to be some relevent info here:

https://stackoverflow.com/questions/159744/converting-ms-word-documents-to-pdf-in-asp-net

Also, with Office 2007 having publish to PDF functionality, I guess you could use office automation to open the *.DOC file in Word 2007 and Save as PDF. I'm not too keen on office automation as it's slow and prone to hanging, but just throwing that out there...

Solution 9 - C#

Microsoft PDF add-in for word seems to be the best solution for now but you should take into consideration that it does not convert all word documents correctly to pdf and in some cases you will see huge difference between the word and the output pdf. Unfortunately I couldn't find any api that would convert all word documents correctly. The only solution I found to ensure the conversion was 100% correct was by converting the documents through a printer driver. The downside is that documents are queued and converted one by one, but you can be sure the resulted pdf is exactly the same as word document layout. I personally preferred using UDC (Universal document converter) and installed Foxit Reader(free version) on server too then printed the documents by starting a "Process" and setting its Verb property to "print". You can also use FileSystemWatcher to set a signal when the conversion has completed.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionShaul BehrView Question on Stackoverflow
Solution 1 - C#Eric NessView Answer on Stackoverflow
Solution 2 - C#Elger MensonidesView Answer on Stackoverflow
Solution 3 - C#zetaView Answer on Stackoverflow
Solution 4 - C#Mark BrackettView Answer on Stackoverflow
Solution 5 - C#Todd GamblinView Answer on Stackoverflow
Solution 6 - C#Ggalla1779View Answer on Stackoverflow
Solution 7 - C#daniele3004View Answer on Stackoverflow
Solution 8 - C#MikeWView Answer on Stackoverflow
Solution 9 - C#SaberView Answer on Stackoverflow