How to safely transliterate Cyrillic .po files to Latin and avoid NUL character pollution

This article explains the root cause of the `NUL NUL NUL` (null byte) garbage that can appear when a `.po` localization file is rewritten in PHP, and gives a fix based on safe file operations. It stresses that you should not read and write the same file through one handle, and recommends a dedicated PO parsing library over manual string replacement.
When working with `.po` localization files (such as `admin-sr_RS.po`) for WordPress or other open source projects, developers often transliterate Cyrillic text to Latin with a simple `str_replace()`. As the question shows, the original code opened the file for reading and writing, called `fread()`, and then called `ftruncate()` and `fwrite()` on the same handle, which easily leaves NUL bytes in the file. The reason is that `ftruncate($handle, 0)` empties the file but does not move the file pointer: after `fread()` the pointer still sits at the old end of the file, so `fwrite()` puts the new content at that offset and the gap in front of it is filled with `\x00` bytes. Parsers and editors then fail on the file or display a long run of `NUL` characters before the content, as in the screenshot.
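A minimal sketch of how the NUL bytes can arise, assuming the original code read, truncated, and wrote through one handle without calling `rewind()`. The temporary file and its contents are made up for the demonstration.

```php
<?php
// Throwaway demo file (hypothetical data): 15 bytes of content.
$path = tempnam(sys_get_temp_dir(), 'po');
file_put_contents($path, 'msgstr "abcdef"');

// Buggy sequence: read, truncate, write on one handle.
$fh = fopen($path, 'r+');
stream_get_contents($fh);        // pointer now sits at offset 15
ftruncate($fh, 0);               // file is empty, pointer is still 15
fwrite($fh, 'msgstr "xyz"');     // 12 bytes written at offset 15
fclose($fh);
$broken = file_get_contents($path);   // 15 NUL bytes, then the new text

// Same sequence with rewind() before writing.
$fh = fopen($path, 'r+');
stream_get_contents($fh);
ftruncate($fh, 0);
rewind($fh);                     // move the pointer back to offset 0
fwrite($fh, 'msgstr "xyz"');
fclose($fh);
$fixed = file_get_contents($path);    // only the new text

unlink($path);
```

Calling `rewind()` removes the NUL prefix, but an interrupted write would still leave a half-written file, which is why writing to a separate file is the safer pattern.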
✅ Correct approach: Separate reading and writing to avoid overwriting in place
The core principle is to never reuse one file handle for a read-truncate-write cycle. Read the file completely, build the new content in memory, write it to a separate new file, and only then replace the original with a rename. The following is the optimized implementation:
function transliterate($textcyr = "") {
    // The 30 letters of the Serbian alphabet, lowercase then uppercase.
    // $lat and $cyr must line up index for index: no duplicates, no stray spaces.
    $lat = [
        'a','b','v','g','d','đ','e','ž','z','i','j','k','l','lj','m','n','nj','o','p','r','s','t','ć','u','f','h','c','č','dž','š',
        'A','B','V','G','D','Đ','E','Ž','Z','I','J','K','L','Lj','M','N','Nj','O','P','R','S','T','Ć','U','F','H','C','Č','Dž','Š'
    ];
    $cyr = [
        'а','б','в','г','д','ђ','е','ж','з','и','ј','к','л','љ','м','н','њ','о','п','р','с','т','ћ','у','ф','х','ц','ч','џ','ш',
        'А','Б','В','Г','Д','Ђ','Е','Ж','З','И','Ј','К','Л','Љ','М','Н','Њ','О','П','Р','С','Т','Ћ','У','Ф','Х','Ц','Ч','Џ','Ш'
    ];
    return str_replace($cyr, $lat, $textcyr);
}

function transliterate_po_safe() {
    $source = 'wp-content/languages/admin-sr_RS.po';
    $target = $source . '.transliterated'; // Temporary output file

    // ✅ Safe reading: file_get_contents() reads the whole file as raw bytes
    $content = file_get_contents($source);
    if ($content === false) {
        throw new RuntimeException("Failed to read {$source}");
    }

    // ✅ Transliteration processing
    $transliterated = transliterate($content);

    // ✅ Safe writing: a separate file, so nothing is truncated in place
    if (file_put_contents($target, $transliterated) === false) {
        throw new RuntimeException("Failed to write {$target}");
    }

    // ✅ Replacement: atomic on Linux/macOS when both paths are on the same file system
    if (!rename($target, $source)) {
        throw new RuntimeException("Failed to replace {$source} with transliterated version");
    }

    echo "✅ Transliteration completed successfully.\n";
}
⚠️ Key Notes
- Encoding consistency: `.po` files are usually UTF-8, possibly with a leading BOM. `str_replace()` compares raw bytes, so it matches multi-byte UTF-8 letters correctly in every PHP version, provided the `.po` file and the PHP source file that holds the mapping arrays are both saved as UTF-8. `mb_internal_encoding()` does not affect it.
- Character mapping integrity: `str_replace()` pairs `$cyr` and `$lat` by position, so a duplicated entry (such as a repeated 'ш' or 'š'), a missing entry, or a stray space inside a quoted letter shifts or breaks the mappings after it. Verify that both arrays have exactly the same length and that no search entry is duplicated.
- PO file structure sensitivity: `.po` is structured text (msgid/msgstr pairs, comments, header metadata). Plain string replacement over the whole file also rewrites Cyrillic characters outside the translations, for example in comments, and can break the format. This approach is strongly discouraged for production environments.
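The encoding and mapping checks from the notes above could be sketched as follows. Both helper names (`normalize_utf8`, `build_translit_map`) are hypothetical and not part of the original code. The sketch uses `strtr()` with an associative map instead of `str_replace()`, which is a swapped-in technique rather than the article's method.

```php
<?php
// Hypothetical helpers for illustration only.

/** Strip a UTF-8 BOM if present and reject input that is not valid UTF-8. */
function normalize_utf8(string $content): string
{
    if (strncmp($content, "\xEF\xBB\xBF", 3) === 0) {
        $content = substr($content, 3);
    }
    // preg_match() with the u modifier fails on malformed UTF-8.
    if (preg_match('//u', $content) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $content;
}

/** Build a Cyrillic => Latin map, refusing misaligned or sloppy arrays. */
function build_translit_map(array $cyr, array $lat): array
{
    if (count($cyr) !== count($lat)) {
        throw new InvalidArgumentException('$cyr and $lat differ in length');
    }
    if (count(array_unique($cyr)) !== count($cyr)) {
        throw new InvalidArgumentException('$cyr contains duplicate letters');
    }
    foreach (array_merge($cyr, $lat) as $entry) {
        if ($entry !== trim($entry)) {
            throw new InvalidArgumentException("Stray whitespace in '{$entry}'");
        }
    }
    return array_combine($cyr, $lat);
}

// Usage with a three-letter excerpt of the alphabet.
$map = build_translit_map(['љ', 'њ', 'ш'], ['lj', 'nj', 'š']);
$latin = strtr(normalize_utf8('шљ'), $map);
```

With an associative map, a misaligned pair is visible at a glance, and the checks fail loudly instead of silently producing wrong letters.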
Recommended solution: Use a professional PO parsing library
The truly robust solution is to use a mature PO parser, which can target the content of the msgstr fields precisely, skip metadata and comments, and keep the file format intact:
| Library | Features | Example usage |
|---|---|---|
| php-gettext/Gettext (4.x API shown) | Most complete feature set; reads, writes, and compiles (`.mo`); MIT license | `$translations = Gettext\Translations::fromPoFile($source); foreach ($translations as $t) { $t->setTranslation(transliterate($t->getTranslation())); } $translations->toPoFile($target);` |
| raulferras/PHP-po-parser | Lightweight, focused on PO parsing, simple API | `$parser = new PoParser(); $entries = $parser->parse(file_get_contents($source)); /* Process entries... */ $output = $parser->build($entries);` |
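To illustrate what working at the msgstr level means, here is a dependency-free sketch that transliterates only msgstr lines and their continuation lines, leaving comments, msgid text, and the header entry untouched. The function name `transliterate_msgstr_only` is hypothetical, and the line-based scan is a simplification: it is not a full PO parser and is no substitute for the libraries above.

```php
<?php
// Hypothetical helper: a simplified line scanner, not a complete PO parser.
function transliterate_msgstr_only(string $po, array $map): string
{
    $section = 'other';   // 'msgid', 'msgstr', or 'other'
    $msgid = '';          // text of the current msgid ('' marks the header entry)
    $out = [];

    foreach (explode("\n", $po) as $line) {
        $trimmed = rtrim($line, "\r");
        if (preg_match('/^msgid\s+"(.*)"\s*$/', $trimmed, $m)) {
            $section = 'msgid';
            $msgid = $m[1];
        } elseif (preg_match('/^msgstr(\[\d+\])?\s+"/', $trimmed)) {
            $section = 'msgstr';
        } elseif ($trimmed !== '' && $trimmed[0] === '"') {
            // Continuation line: belongs to whichever section is open.
            if ($section === 'msgid') {
                $msgid .= substr($trimmed, 1, -1);
            }
        } else {
            $section = 'other';   // comment, blank line, msgctxt, msgid_plural
        }
        // Rewrite translations only, and never the header (empty msgid).
        $out[] = ($section === 'msgstr' && $msgid !== '') ? strtr($line, $map) : $line;
    }
    return implode("\n", $out);
}

// Usage with a tiny made-up file and a partial map.
$map = ['С' => 'S', 'а' => 'a', 'ч' => 'č', 'у' => 'u', 'в' => 'v', 'ј' => 'j'];
$po = implode("\n", [
    '# Коментар',
    'msgid ""',
    'msgstr ""',
    '"Last-Translator: Петар\n"',
    '',
    'msgid "Save"',
    'msgstr "Сачувај"',
]);
$result = transliterate_msgstr_only($po, $map);
```

In this example only the last line changes. The comment and the header keep their Cyrillic text even though both contain letters that appear in the map.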
Summary: The NUL characters come from unsafe in-place overwriting: truncating a file without resetting the file pointer leaves a zero-filled gap in front of the new content. Following the sequence read → process → write a new file → atomic replacement avoids the problem entirely. For long-term localization maintenance, move to a PO parsing library that works at the semantic level, which is as much a matter of engineering reliability as of tool choice.