How to Read DOC Files in PHP
When reading DOC or DOCX files in PHP, you may find extraneous characters at the end of the extracted text. This happens when the code reading the file does not parse the DOC format correctly.
PHP has no built-in parser for the binary DOC format, so the approach needs to change slightly. DOCX files are ZIP archives containing XML, which PHP can read through its zip extension, so we handle DOCX files instead.
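Since the approach below only works for DOCX, it can help to check which container a file actually uses before reading it. The following is a minimal sketch, not part of the original answer: the helper name `detect_word_format` and its return values are hypothetical. It relies on the fact that ZIP-based files such as DOCX normally start with the bytes `PK\x03\x04`, while legacy DOC files use the OLE compound file signature `D0 CF 11 E0 A1 B1 1A E1`.

```php
<?php
// Hypothetical helper (not from the original article): guess the container
// format of a Word file from its first bytes.
// Returns 'zip', 'ole', 'unknown', or 'unreadable'.
function detect_word_format(string $filename): string
{
    if (!is_file($filename)) {
        return 'unreadable';
    }

    $handle = @fopen($filename, 'rb');
    if ($handle === false) {
        return 'unreadable';
    }

    // Eight bytes are enough to cover both signatures.
    $header = (string) fread($handle, 8);
    fclose($handle);

    // ZIP local file header: DOCX and other ZIP-based formats.
    if (strncmp($header, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }

    // OLE compound file: legacy binary DOC (and other old Office formats).
    if ($header === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
        return 'ole';
    }

    return 'unknown';
}
```

A result of `'zip'` only says the file is a ZIP archive, not that it is a valid DOCX, so the reader function should still handle a missing `word/document.xml` entry.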
Updated Code for Reading DOCX Files:
<code class="php">// Extract the plain text of a DOCX file; returns false on failure.
// Note: the procedural zip_* functions are deprecated as of PHP 8.0.
function read_file_docx($filename) {
    $content = '';

    if (!$filename || !file_exists($filename)) return false;

    $zip = zip_open($filename);
    if (!$zip || is_numeric($zip)) return false;

    while ($zip_entry = zip_read($zip)) {
        // Check the name first so skipped entries are never left open.
        if (zip_entry_name($zip_entry) != "word/document.xml") continue;
        if (zip_entry_open($zip, $zip_entry) == FALSE) continue;

        $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
        zip_entry_close($zip_entry);
    } // end while

    zip_close($zip);

    // Separate table cells with a space and paragraphs with a line break.
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);

    // Remove the remaining XML tags, leaving only the text.
    return strip_tags($content);
}

$filename = "filepath"; // or /var/www/html/file.docx
$content = read_file_docx($filename);

if ($content !== false) {
    echo nl2br($content);
} else {
    echo 'Couldn\'t read the file. Please check that file.';
}</code>
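This updated code uses PHP's procedural zip functions (zip_open(), zip_read(), zip_entry_read()) to open the DOCX file as a ZIP archive. Specifically, it reads the "word/document.xml" entry, which contains the document's main text as XML, converts paragraph endings to line breaks, and strips the remaining tags. Note that these zip_* functions are deprecated as of PHP 8.0; the ZipArchive class is the current alternative.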
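For newer PHP versions, the same extraction can be sketched with the ZipArchive class. This is an illustrative variant under stated assumptions, not the original author's code: the function name `read_docx_text` is hypothetical, and it assumes the zip extension is installed. It applies the same two tag replacements as the function above.

```php
<?php
// Hypothetical ZipArchive-based variant of read_file_docx().
// Returns the extracted text, or false if the file cannot be read.
function read_docx_text(string $filename)
{
    if (!is_file($filename)) {
        return false;
    }

    $zip = new ZipArchive();
    // open() returns true on success and an integer error code otherwise.
    if ($zip->open($filename) !== true) {
        return false;
    }

    // Read the main document part directly by its entry name.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();

    if ($xml === false) {
        return false; // a ZIP archive, but not a DOCX with a document part
    }

    // Same substitutions as above: a space between table cells,
    // a line break at the end of each paragraph.
    $xml = str_replace('</w:r></w:p></w:tc><w:tc>', ' ', $xml);
    $xml = str_replace('</w:r></w:p>', "\r\n", $xml);

    return strip_tags($xml);
}
```

Calling `read_docx_text('/var/www/html/file.docx')` and passing the result through `nl2br()` mirrors the usage shown earlier. Because `getFromName()` fetches one entry by name, no loop over the archive is needed.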
By using this method, you can successfully read and parse DOCX files in PHP.