How to extract text from a PDF document?

Php Pdf Text Unicode

Php Problem Overview


How can I extract text from a PDF document using PHP?

(I can't use other tools because I don't have root access.)

I've found some functions that work for plain text, but they don't handle Unicode characters well:

http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-2-pdf-data-extraction-437.html

Php Solutions


Solution 1 - Php

Download class.pdf2text.php from https://pastebin.com/dvwySU1a or http://www.phpclasses.org/browse/file/31030.html (registration required).

Code:

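// Load the downloaded class file (assumed to sit next to this script).
require_once 'class.pdf2text.php';

$a = new PDF2Text();
$a->setFilename('filename.pdf'); // path to the PDF to read
$a->decodePDF();                 // parse the file and collect its text
echo $a->output();               // print the extracted text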

  • class.pdf2text.php Project Home

  • The pdf2text class doesn't work with all the PDFs I've tested. If it doesn't work for you, try PDF Parser.
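
For the PDF Parser fallback, a minimal sketch might look like the following. It assumes that "PDF Parser" refers to the smalot/pdfparser Composer package and that it was installed with `composer require smalot/pdfparser`, which normally runs under a regular user account and so should not need root access. The helper name `pdfToText` and the file name `filename.pdf` are only illustrative.

```php
<?php
// Assumption: smalot/pdfparser was installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Smalot\PdfParser\Parser;

/**
 * Hypothetical helper: returns the text of every page in the given PDF.
 */
function pdfToText(string $path): string
{
    if (!is_readable($path)) {
        throw new InvalidArgumentException("Cannot read PDF file: $path");
    }

    $parser = new Parser();
    $pdf = $parser->parseFile($path); // parse the whole document

    return $pdf->getText();           // text of all pages as one string
}

try {
    echo pdfToText('filename.pdf');
} catch (Exception $e) {
    // Parsing can fail on encrypted or malformed files.
    echo 'Extraction failed: ' . $e->getMessage();
}
```

As with the pdf2text class, results depend on how the PDF was produced, so it is worth testing with a file that contains the Unicode characters you care about.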


Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author    Original Content on Stack Overflow
Question            Sfisioza           View Question on Stack Overflow
Solution 1 - Php    Pedro Lobito       View Answer on Stack Overflow