How to safely transliterate Cyrillic .PO files to Latin and avoid NUL character pollution

Apr 17, 2026 12:44 PM

This article explains the root cause of the `NUL NUL NUL` (null byte) corruption that appears when `.po` localization files are rewritten in PHP, and gives a fix based on safe file operations. It stresses not reading and writing the same file in place, and recommends a dedicated PO parsing library over manual string replacement.

When transliterating .po localization files (such as admin-sr_RS.po) from Serbian Cyrillic to Latin for WordPress or other open-source projects, developers often reach for a simple `str_replace()`. The original code in the question opens the file for both reading and writing (for example in "r+" mode; plain "r" would not allow writing at all), reads it with `fread()`, clears it with `ftruncate($fh, 0)`, and writes the result back with `fwrite()` through the same handle. That is where the NUL bytes come from: `ftruncate()` changes the file size but does not move the file pointer, which `fread()` has left at the old end of file. The following `fwrite()` therefore starts at that old offset, and the file system fills the gap between offset 0 and the write position with `\x00` bytes. The result is a file that begins with one NUL byte for every byte of the original (the long NUL NUL prefix in the screenshot), followed by the transliterated text, which makes PO parsers fail or display garbage.
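The mechanism is easy to reproduce. The sketch below is a minimal demonstration, assuming PHP 7.4+ (for the arrow function) and a writable system temp directory. `rewrite_in_place()` is a hypothetical helper, and `strtoupper()` stands in for the transliteration. With `$rewind = false` it follows the read, truncate, write sequence described above; with `$rewind = true` it adds the `rewind()` call that moves the pointer back to offset 0.

```php
<?php
// Hypothetical demo helper: rewrites a file in place through ONE handle.
// ftruncate() changes the size but NOT the file pointer, so without rewind()
// the write lands at the old offset and the gap is filled with "\0" bytes.
function rewrite_in_place(string $path, callable $transform, bool $rewind): void
{
    $fh = fopen($path, 'r+');              // read/write, no truncation on open
    if ($fh === false) {
        throw new RuntimeException("Cannot open {$path}");
    }
    $content = stream_get_contents($fh);   // pointer is now at end of file
    ftruncate($fh, 0);                     // size becomes 0, pointer unchanged
    if ($rewind) {
        rewind($fh);                       // the missing step: back to offset 0
    }
    fwrite($fh, $transform($content));
    fclose($fh);
}

// strtoupper() stands in for the real transliteration.
$upper = fn(string $s): string => strtoupper($s);

// Buggy variant: no rewind() after ftruncate().
$buggy = tempnam(sys_get_temp_dir(), 'po');
file_put_contents($buggy, "msgstr \"abc\"\n");   // 13 bytes
rewrite_in_place($buggy, $upper, false);
$buggyBytes = file_get_contents($buggy);

// Fixed variant: rewind() before writing.
$fixed = tempnam(sys_get_temp_dir(), 'po');
file_put_contents($fixed, "msgstr \"abc\"\n");
rewrite_in_place($fixed, $upper, true);
$fixedBytes = file_get_contents($fixed);

unlink($buggy);
unlink($fixed);

echo strlen($buggyBytes), "\n";   // 26: 13 NUL bytes, then the 13 new bytes
echo strlen($fixedBytes), "\n";   // 13
```

On the 13-byte input, the buggy run leaves a 26-byte file whose first 13 bytes are NUL, while the rewound run leaves exactly the 13 new bytes.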

✅ Correct approach: Separate reading and writing to avoid overwriting in place

The core principle is never to read, truncate and rewrite a file through the same handle. Calling `rewind()` after `ftruncate()` would remove the NUL bytes, but the file would still be left half-written if the script failed during the write. Instead, read the file completely, build the new content in memory, write it to a separate file, and only then replace the original. The following is the revised implementation:

```php
function transliterate(string $textcyr = ''): string
{
    // Serbian Cyrillic (30 letters) mapped by position to Serbian Latin;
    // lowercase first, then uppercase. Both arrays must have the same length.
    $cyr = [
        'а','б','в','г','д','ђ','е','ж','з','и','ј','к','л','љ','м','н','њ','о','п','р','с','т','ћ','у','ф','х','ц','ч','џ','ш',
        'А','Б','В','Г','Д','Ђ','Е','Ж','З','И','Ј','К','Л','Љ','М','Н','Њ','О','П','Р','С','Т','Ћ','У','Ф','Х','Ц','Ч','Џ','Ш',
    ];
    $lat = [
        'a','b','v','g','d','đ','e','ž','z','i','j','k','l','lj','m','n','nj','o','p','r','s','t','ć','u','f','h','c','č','dž','š',
        'A','B','V','G','D','Đ','E','Ž','Z','I','J','K','L','Lj','M','N','Nj','O','P','R','S','T','Ć','U','F','H','C','Č','Dž','Š',
    ];
    // str_replace() compares bytes, which is safe for valid UTF-8 input.
    return str_replace($cyr, $lat, $textcyr);
}

function transliterate_po_safe(): void
{
    $source = 'wp-content/languages/admin-sr_RS.po';
    // Temporary output file in the same directory, so rename() stays on one filesystem.
    $target = $source . '.transliterated';

    // Read the whole file in one call; file_get_contents() returns raw bytes
    // and leaves no handle or file pointer behind.
    $content = file_get_contents($source);
    if ($content === false) {
        throw new RuntimeException("Failed to read {$source}");
    }

    // Transliterate in memory.
    $transliterated = transliterate($content);

    // Write to a separate file; the source stays untouched until this succeeds.
    if (file_put_contents($target, $transliterated) === false) {
        throw new RuntimeException("Failed to write {$target}");
    }

    // Replace the original. rename() within one directory is atomic on POSIX
    // filesystems; on Windows, replacing an existing file is not guaranteed to be atomic.
    if (!rename($target, $source)) {
        throw new RuntimeException("Failed to replace {$source} with transliterated version");
    }

    echo "✅ Transliteration completed successfully.\n";
}
```
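`transliterate_po_safe()` uses a fixed temporary name, so two concurrent runs could collide, and the replacement file gets fresh permissions rather than the original's. A slightly more defensive variant is sketched below. `atomic_write()` is a hypothetical helper, not a PHP built-in; it assumes the target directory exists and is writable, because `tempnam()` otherwise falls back to the system temp directory (with a notice), and a `rename()` across filesystems is no longer atomic.

```php
<?php
// Hypothetical helper: write $data to $path via a uniquely named temp file in
// the same directory, copy the original permissions, then rename() over it.
function atomic_write(string $path, string $data): void
{
    $dir = dirname($path);
    $tmp = tempnam($dir, 'po_');               // unique name, same directory as $path
    if ($tmp === false) {
        throw new RuntimeException("Cannot create temp file in {$dir}");
    }
    try {
        if (file_exists($path)) {
            chmod($tmp, fileperms($path) & 0777);   // keep the original permissions
        }
        if (file_put_contents($tmp, $data) !== strlen($data)) {
            throw new RuntimeException("Short write to {$tmp}");
        }
        if (!rename($tmp, $path)) {
            throw new RuntimeException("Cannot replace {$path}");
        }
    } catch (Throwable $e) {
        if (file_exists($tmp)) {
            unlink($tmp);                      // never leave a half-written temp file
        }
        throw $e;
    }
}

// Usage with a throwaway directory (names are illustrative).
$dir = sys_get_temp_dir() . '/po_demo_' . bin2hex(random_bytes(4));
mkdir($dir);
$file = $dir . '/admin-sr_RS.po';
file_put_contents($file, "msgstr \"Здраво\"\n");
atomic_write($file, "msgstr \"Zdravo\"\n");

$result = file_get_contents($file);
$leftovers = array_values(array_diff(scandir($dir), ['.', '..']));
unlink($file);
rmdir($dir);
```

Inside `transliterate_po_safe()`, the `file_put_contents()`/`rename()` pair could then be replaced by a single `atomic_write($source, $transliterated)` call.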

⚠️ Key Notes

  • Encoding consistency: .po files are normally UTF-8, with the charset declared in the header entry (`Content-Type: text/plain; charset=UTF-8`), occasionally preceded by a BOM. `str_replace()` works on raw bytes and ignores `mb_internal_encoding()`, yet it matches UTF-8 letters correctly in every PHP version, because the byte sequence of one valid UTF-8 character never occurs inside another. What matters is that the .po file and the PHP source file holding the mapping arrays are both UTF-8; a file in another charset (for example Windows-1251) has to be converted before transliteration.
  • Character mapping integrity: `$cyr` and `$lat` must have exactly the same number of entries, in the same order, and each Cyrillic letter must appear once with no surrounding spaces. A repeated 'ш'/'š' pair or an entry such as 'н ' with a trailing space shifts or silently disables part of the mapping, because `str_replace()` pairs the two arrays by position.
  • PO file structure sensitivity: a .po file is structured text (msgid/msgstr pairs, comments, header metadata). Replacing text across the whole file does not distinguish these parts, so Cyrillic in comments, contexts or source strings is rewritten along with the translations. Whole-file string replacement is therefore not recommended for production use.
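The first two notes can be enforced in code before any file is touched. A minimal sketch, assuming only PHP's bundled PCRE functions; `build_map()` and `require_utf8()` are hypothetical names. It swaps in `strtr()` with an associative array, which performs the whole replacement in a single pass, as an alternative to `str_replace()` with two parallel arrays.

```php
<?php
// Hypothetical pre-flight checks for the mapping and the input encoding.
function build_map(array $cyr, array $lat): array
{
    if (count($cyr) !== count($lat)) {
        throw new InvalidArgumentException(sprintf(
            'Length mismatch: %d Cyrillic vs %d Latin entries', count($cyr), count($lat)
        ));
    }
    if (count(array_unique($cyr)) !== count($cyr)) {
        throw new InvalidArgumentException('Duplicate Cyrillic entries');
    }
    foreach ($cyr as $letter) {
        // Exactly one non-space code point: rejects entries like 'н ' or ' Н'.
        if (preg_match('/^\S\z/u', $letter) !== 1) {
            throw new InvalidArgumentException("Suspicious entry: '{$letter}'");
        }
    }
    return array_combine($cyr, $lat);
}

function require_utf8(string $content): string
{
    if (strncmp($content, "\xEF\xBB\xBF", 3) === 0) {
        $content = substr($content, 3);        // drop a UTF-8 BOM if present
    }
    // With the /u modifier, preg_match() returns false on malformed UTF-8.
    if (preg_match('//u', $content) !== 1) {
        throw new UnexpectedValueException('Input is not valid UTF-8');
    }
    return $content;
}

// Demo with a small subset of the mapping ("шљива" means "plum").
$map = build_map(['ш', 'љ', 'и', 'в', 'а'], ['š', 'lj', 'i', 'v', 'a']);
$result = strtr(require_utf8("\xEF\xBB\xBFшљива"), $map);
echo $result, "\n";
```

Because the map is keyed by letter, a missing or extra Latin entry is caught up front instead of silently shifting every letter after it.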

Recommended solution: Use a professional PO parsing library

The truly robust solution is a mature PO file parser, which locates the content of each msgstr field, skips metadata and comments, and writes the file back in valid PO format:

| Library | Features | Example usage |
| --- | --- | --- |
| php-gettext/Gettext | Most complete feature set: reads and writes PO files and compiles `.mo`; MIT license | `$translations = Gettext\Translations::fromPoFile($source); foreach ($translations as $t) { $t->setTranslation(transliterate($t->getTranslation())); } $translations->toPoFile($target);` (static helpers from the 4.x releases) |
| raulferras/PHP-po-parser | Lightweight, focused on PO parsing, simple API | `$parser = new PoParser(); $entries = $parser->parse(file_get_contents($source)); /* process entries... */ $output = $parser->build($entries);` (schematic; check the README of the installed version for the exact class and method names) |
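To illustrate what "locating the msgstr field" means, here is a deliberately small, hypothetical sketch (not taken from either library) that transliterates only msgstr/msgstr[n] strings and their continuation lines, leaving comments, msgctxt, msgid and msgid_plural untouched. It assumes well-formed PO input, does not treat the header entry specially, and uses a tiny subset of the Serbian mapping for the demo; it is no substitute for the libraries above.

```php
<?php
// Illustrative sketch: transliterate only msgstr / msgstr[n] strings and their
// continuation lines. Assumes well-formed PO input; the header is not skipped.
function transliterate_msgstr_only(string $po, array $map): string
{
    $lines = explode("\n", $po);
    $inMsgstr = false; // true while the current string block belongs to a msgstr

    foreach ($lines as $i => $line) {
        $trimmed = ltrim($line);
        if (preg_match('/^msgstr(\[\d+\])?\s/', $trimmed) === 1) {
            $inMsgstr = true;                   // start of a translation
        } elseif ($trimmed === '' || $trimmed[0] === '#'
            || preg_match('/^(msgid|msgid_plural|msgctxt)\s/', $trimmed) === 1) {
            $inMsgstr = false;                  // blank line, comment or source text
        }
        // Lines starting with '"' are continuations and keep the current state.
        if ($inMsgstr) {
            $lines[$i] = strtr($line, $map);    // keywords are ASCII, so only text changes
        }
    }
    return implode("\n", $lines);
}

// Tiny subset of the Serbian Cyrillic to Latin mapping, enough for the demo.
$map = ['С' => 'S', 'Д' => 'D', 'а' => 'a', 'в' => 'v', 'е' => 'e', 'ј' => 'j',
        'к' => 'k', 'о' => 'o', 'т' => 't', 'у' => 'u', 'ч' => 'č'];

$sample = "# Преводилац: Петар\n"
    . "msgid \"Save\"\n"
    . "msgstr \"\"\n"
    . "\"Сачувај\"\n"
    . "\n"
    . "msgid \"File\"\n"
    . "msgid_plural \"Files\"\n"
    . "msgstr[0] \"Датотека\"\n"
    . "msgstr[1] \"Датотеке\"\n";

$out = transliterate_msgstr_only($sample, $map);
echo $out;
```

The translator comment keeps its Cyrillic text while every msgstr line, including the continuation line after `msgstr ""`, is converted. Escaped quotes, obsolete `#~` entries and other edge cases are exactly what the dedicated parsers handle for you.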

Summary: the NUL characters are not an encoding problem; they appear because the file is rewritten in place through a pointer that was never moved back to offset 0 after `ftruncate()`. Following the "read → process → write new file → atomic replacement" sequence avoids the problem entirely. For long-term localization maintenance, moving to a PO parsing library that works on individual entries is not just a matter of tool choice but a real step up in engineering reliability.
