linux - pdftotext fails to convert some Traditional Chinese PDFs
黄舟 2017-04-17 11:34:28

The command I used is:
pdftotext -enc UTF-8 test.pdf - 2>/dev/null

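But the output is empty. Opening the file in Adobe Reader and saving it as a txt file works fine; only pdftotext fails.

The Windows build of pdftotext fails as well. The PDF's encoding is Identity-H. Is there any other solution?

Here is the PDF that fails to convert: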
http://pan.baidu.com/s/1eQIFZO2
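
Because `2>/dev/null` in the command above discards any warnings, it can help to confirm the symptom with stderr left visible and to check which fonts report Identity-H. The following is only a sketch: the function names `extracts_no_text` and `identity_h_fonts` are hypothetical, and it assumes `pdffonts` (shipped alongside pdftotext) is installed.

```sh
# extracts_no_text PDF
# Succeeds (exit status 0) when pdftotext produces nothing but whitespace.
extracts_no_text() {
  # stderr is deliberately NOT redirected, so warnings stay visible.
  text=$(pdftotext -enc UTF-8 "$1" -)
  # Drop spaces, newlines and form feeds; an empty remainder means no text.
  [ -z "$(printf '%s' "$text" | tr -d '[:space:]')" ]
}

# identity_h_fonts PDF
# Prints the pdffonts rows whose encoding column reads Identity-H.
identity_h_fonts() {
  pdffonts "$1" | grep 'Identity-H'
}
```

For example, `extracts_no_text test.pdf && identity_h_fonts test.pdf` lists the Identity-H fonts only when the extraction came back empty.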


All replies (1)
刘奇

Thanks to Evian for the tip; the problem is solved.

It turns out that poppler has a bug in how it handles PDFs with Identity-H encoding. The bug report is here:
https://bugs.freedesktop.org/show_bug.cgi?id=35468

The patch is here:
http://cgit.freedesktop.org/poppler/poppler/commit/?id=018892d4ceccd5e2994cdb74cd2d401293fc929d

After applying the patch and recompiling poppler, pdftotext converts Identity-H encoded PDFs correctly.
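
The patching step can be sketched as below. This is an illustration, not the exact procedure used here: `apply_patch_checked` is a hypothetical helper, and it assumes the commit has already been saved as a patch file and that the poppler source tree is unpacked locally.

```sh
# apply_patch_checked SRC_DIR PATCH_FILE
# PATCH_FILE must be an absolute path, because the function changes
# into SRC_DIR before reading it.
apply_patch_checked() {
  src_dir=$1
  patch_file=$2
  # Dry run first, so a source tree that does not match is left untouched.
  if ! ( cd "$src_dir" && patch -p1 --dry-run < "$patch_file" >/dev/null ); then
    echo "patch does not apply cleanly to $src_dir" >&2
    return 1
  fi
  # -p1 strips the leading a/ and b/ path components of a git-style diff.
  ( cd "$src_dir" && patch -p1 < "$patch_file" )
}
```

Once the patch applies, rebuild with whatever build procedure your poppler version uses (for releases of that era, typically `./configure && make`), then rerun the pdftotext command.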

In addition, my program passed the -cfg option to pdftotext, but poppler 0.25 no longer supports -cfg, so the option has to be removed from the command.
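
If the pdftotext call is assembled by a script, one way to drop the option is to filter it out of the argument list. A minimal sketch, assuming -cfg is always followed by a config file path; `strip_cfg` is a hypothetical helper name and `/etc/xpdfrc` below is only a placeholder path.

```sh
# strip_cfg ARGS...
# Prints the arguments one per line, with "-cfg <file>" removed.
# Assumes no argument contains a newline.
strip_cfg() {
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "-cfg" ]; then
      # Skip the flag itself and, if present, the path that follows it.
      shift
      if [ "$#" -gt 0 ]; then shift; fi
      continue
    fi
    printf '%s\n' "$1"
    shift
  done
}
```

Calling `strip_cfg -cfg /etc/xpdfrc -enc UTF-8 test.pdf -` prints `-enc`, `UTF-8`, `test.pdf` and `-`, one per line.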

PS: This problem surfaced through the jifile component of Joomla 2.5, so the component's code needs to be modified for it to work properly.
