Reading docx files
A docx file is actually a ZIP archive made up of many XML files; the text of the document lives in word/document.xml inside it.
Take a docx file and open it as a ZIP archive (or change the .docx extension to .zip and unzip it).
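To check the layout without renaming anything, the archive can also be inspected from PHP. The following is a minimal sketch, assuming the zip extension is enabled; listDocxEntries is a hypothetical helper name, not part of the original article.

```php
<?php
// Hypothetical helper: returns the names of all entries inside a .docx
// (ZIP) archive, or an empty array if the file cannot be opened as a ZIP.
function listDocxEntries(string $file): array
{
    $zip = new ZipArchive();
    if ($zip->open($file) !== true) {
        return []; // open() returns an error code instead of true on failure
    }

    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $names[] = $zip->getNameIndex($i);
    }

    $zip->close();
    return $names;
}
```

For a document saved by Word, the returned list normally includes entries such as [Content_Types].xml, word/document.xml and word/styles.xml.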
The word directory contains document.xml, which holds the content of the docx file, so we only need to read this one file.
The code is as follows:
function parseWord($file)
{
    $content = "";
    $zip = new ZipArchive();
    if ($zip->open($file) !== true) {
        // The file could not be opened as a ZIP archive.
        return false;
    }
    // document.xml is extracted next to $file, into a folder named after it.
    $targetDir = pathinfo($file, PATHINFO_DIRNAME) . "/" . pathinfo($file, PATHINFO_FILENAME);
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $entry = $zip->getNameIndex($i);
        if (pathinfo($entry, PATHINFO_BASENAME) == "document.xml") {
            $zip->extractTo($targetDir, array($entry));
            // Strip the XML markup so that only the text remains.
            $content = strip_tags(file_get_contents($targetDir . "/" . $entry));
            break;
        }
    }
    $zip->close();
    return $content;
}
It is worth noting:
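The file passed as the first argument, $file, should not be in the same directory as the script itself; store it in a separate folder. The function extracts document.xml into a subfolder named after the file, next to the file, so that location must be writable.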
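The extraction step can be avoided altogether by reading the entry straight from the archive. The following is a sketch of that alternative, not the method described above: readDocxText is a hypothetical name, it assumes the text is in word/document.xml, and it relies on ZipArchive::getFromName to return the entry as a string. It also adds a line break after each paragraph, since strip_tags alone runs paragraphs together.

```php
<?php
// Hypothetical variant of parseWord(): reads word/document.xml directly
// from the archive, so nothing is written to disk.
// Returns the text as a string, or false on failure.
function readDocxText(string $file)
{
    $zip = new ZipArchive();
    if ($zip->open($file) !== true) {
        return false; // not a readable ZIP archive
    }

    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        return false; // entry missing: probably not a Word document
    }

    // Each paragraph ends with </w:p>; insert a newline after it so that
    // paragraphs stay separated once the tags are removed.
    $xml = str_replace('</w:p>', "</w:p>\n", $xml);

    // Remove the markup, then turn XML entities such as &amp; back into
    // plain characters.
    return html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');
}
```

Because nothing is extracted, this version does not need write access to the folder that holds the docx file.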