convert pdf to svg

PdfSvgBatikPdfbox

Pdf Problem Overview


I want to convert PDF to SVG please suggest some libraries/executable that will be able to do this efficiently. I have written my own java program using the apache PDFBox and Batik libraries -

PDDocument document = PDDocument.load( pdfFile );
DOMImplementation domImpl =
    GenericDOMImplementation.getDOMImplementation();

// Create an instance of org.w3c.dom.Document.
String svgNS = "http://www.w3.org/2000/svg";
Document svgDocument = domImpl.createDocument(svgNS, "svg", null);
SVGGeneratorContext ctx = SVGGeneratorContext.createDefault(svgDocument);
ctx.setEmbeddedFontsOn(true);

// Ask the test to render into the SVG Graphics2D implementation.

    for(int i = 0 ; i < document.getNumberOfPages() ; i++){
        String svgFName = svgDir+"page"+i+".svg";
        (new File(svgFName)).createNewFile();
        // Create an instance of the SVG Generator.
        SVGGraphics2D svgGenerator = new SVGGraphics2D(ctx,false);
        Printable page  = document.getPrintable(i);
        page.print(svgGenerator, document.getPageFormat(i), i);
        svgGenerator.stream(svgFName);
    }

This solution works great but the size of the resulting svg files in huge.(many times greater than the pdf). I have figured out where the problem is by looking at the svg in a text editor. it encloses every character in the original document in its own block even if the font properties of the characters is the same. For example the word hello will appear as 6 different text blocks. Is there a way to fix the above code? or please suggest another solution that will work more efficiently.

Pdf Solutions


Solution 1 - Pdf

Inkscape can also be used to convert PDF to SVG. It's actually remarkably good at this, and although the code that it generates is a bit bloated, at the very least, it doesn't seem to have the particular issue that you are encountering in your program. I think it would be challenging to integrate it directly into Java, but inkscape provides a convenient command-line interface to this functionality, so probably the easiest way to access it would be via a system call.

To use Inkscape's command-line interface to convert a PDF to an SVG, use:

inkscape -l out.svg in.pdf

Which you can then probably call using:

Runtime.getRuntime().exec("inkscape -l out.svg in.pdf")

http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Runtime.html#exec%28java.lang.String%29

I think exec() is synchronous and only returns after the process completes (although I'm not 100% sure on that), so you shoudl be able to just read "out.svg" after that. In any case, Googling "java system call" will yield more info on how to do that part correctly.

Solution 2 - Pdf

Take a look at pdf2svg:

To use

pdf2svg <input.pdf> <output.svg> [<pdf page no. or "all" >]

When using all give a filename with %d in it (which will be replaced by the page number).

pdf2svg input.pdf output_page%d.svg all

And for some troubleshooting see: http://www.calcmaster.net/personal_projects/pdf2svg/

Solution 3 - Pdf

pdftocairo can be used to convert pdf to svg. pdfcairo is part of poppler-utils.

For example to convert 2nd page of a pdf, following command can be run.

pdftocairo -svg -f 1 -l 1 input.pdf

Solution 4 - Pdf

pdftk 82page.pdf burst
sh to-svg.sh 

contents of to-svg.sh

#!/bin/bash
FILES=burst/*
for f in $FILES
do
  inkscape -l "$f.svg" "$f"
done

Solution 5 - Pdf

I have encountered issues with the suggested inkscape, pdf2svg, pdftocairo, as well as the not suggested convert and mutool when trying to convert large and complex PDFs such as some of the topographical maps from the USGS. Sometimes they would crash, other times they would produce massively inflated files. The only PDF to SVG conversion tool that was able to handle all of them correctly for my use case was dvisvgm. Using it is very simple:

dvisvgm --pdf --output=file.svg file.pdf

It has various extra options for handling how elements are converted, as well as for optimization. Its resulting files can further be compacted by svgcleaner if necessary without perceptual quality loss.

Solution 6 - Pdf

inkscape (@jbeard4) for me produced svgs with no text in them at all, but I was able to make it work by going to postscript as an intermediary using ghostscript.

for page in $(seq 1 `pdfinfo $1.pdf | awk '/^Pages:/ {print $2}'`)
do
    pdf2ps -dFirstPage=$page -dLastPage=$page -dNoOutputFonts $1.pdf $1_$page.ps
    inkscape -z -l $1_$page.svg $1_$page.ps
    rm $1_$page.ps
done

However this is a bit cumbersome, and the winner for ease of use has to go to pdf2svg (@Koen.) since it has that all flag so you don't need to loop.

However, pdf2svg isn't available on CentOS 8, and to install it you need to do the following:

git clone https://github.com/dawbarton/pdf2svg.git && cd pdf2svg
#if you dont have development stuff specific to this project
sudo dnf config-manager --set-enabled powertools
sudo dnf install cairo-devel poppler-glib-devel
#git repo isn't quite ready to ./configure
touch README
autoreconf -f -i
./configure && make && sudo make install

It produces svgs that actually look nicer than the ghostscript-inkscape one above, the font seems to raster better.

pdf2svg $1.pdf $1_%d.svg all

But that installation is a bit much, too much even if you don't have sudo. On top of that, pdf2svg doesn't support stdin/stdout, so the readily available pdftocairo (@SuperNova) worked a treat in these regards, and here's an example of "advanced" use below:

for page in $(seq 1 `pdfinfo $1.pdf | awk '/^Pages:/ {print $2}'`)
do
    pdftocairo -svg -f $page -l $page $1.pdf - | gzip -9 >$1_$page.svg.gz
done

Which produces files of the same quality and size (before compression) as pdf2svg, although not binary-identical (and even visually, jumping between output of the two some pixels of letters shift, but neither looks wrong/bad like inkscape did).

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionuser434541View Question on Stackoverflow
Solution 1 - Pdfjbeard4View Answer on Stackoverflow
Solution 2 - PdfKoen.View Answer on Stackoverflow
Solution 3 - PdfSuperNovaView Answer on Stackoverflow
Solution 4 - PdfLeblanc MenesesView Answer on Stackoverflow
Solution 5 - PdfMrDrMcCoyView Answer on Stackoverflow
Solution 6 - PdfHashbrownView Answer on Stackoverflow