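The command I used is: pdftotext -enc UTF-8 test.pdf - 2>/dev/null
But the output is empty. Adobe Reader opens the file and can save it as a txt file without problems; only pdftotext fails.
The Windows build of pdftotext fails too. The PDF's encoding is Identity-H. Is there any other solution?
Here is the PDF that fails to convert: http://pan.baidu.com/s/1eQIFZO2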
Thanks to Evian for the tip; the problem is solved.
poppler has a bug in its handling of PDFs that use the Identity-H encoding. The bug is described here: https://bugs.freedesktop.org/show_bug.cgi?id=35468
The patch is here: http://cgit.freedesktop.org/poppler/poppler/commit/?id=018892d4ceccd5e2994cdb74cd2d401293fc929d
After applying the patch and recompiling, Identity-H encoded PDFs convert normally.
In addition, my program passed the -cfg parameter to pdftotext, but poppler 0.25 no longer supports -cfg, so it has to be removed.
PS: The problem shows up in the jifile component of joomla2.5, so the component's code has to be modified for it to work properly.
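For anyone calling pdftotext from their own code, here is a rough sketch of the -cfg cleanup described above. It is written in Python purely for illustration (jifile itself is PHP, and this is not its code). The function names build_pdftotext_args and extract_text are made up for this example, and I am assuming -cfg is always followed by a config file path, as in xpdf's pdftotext.

```python
import subprocess


def build_pdftotext_args(pdf_path, options):
    """Build a pdftotext command line, dropping any '-cfg <file>' pair.

    Assumption: '-cfg' is always followed by exactly one value (the
    config file path), so both tokens are removed together.
    """
    cleaned = []
    skip_next = False
    for opt in options:
        if skip_next:
            # This token is the config file path that followed '-cfg'.
            skip_next = False
            continue
        if opt == "-cfg":
            skip_next = True
            continue
        cleaned.append(opt)
    # A trailing '-' makes pdftotext write the text to stdout.
    return ["pdftotext", *cleaned, pdf_path, "-"]


def extract_text(pdf_path, options=("-enc", "UTF-8")):
    """Run pdftotext and return the extracted text (may be empty)."""
    args = build_pdftotext_args(pdf_path, list(options))
    result = subprocess.run(args, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

With this, an old option list such as ["-cfg", "/etc/xpdfrc", "-enc", "UTF-8"] is reduced to the same command used in the question. If extract_text still returns an empty string for an Identity-H PDF, the installed poppler probably lacks the patch linked above.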