Detailed explanation of file content deduplication and sorting

jacklove
Release: 2023-03-30 21:58:02

This article shows two ways to deduplicate and sort the contents of a file, one using PHP and one using the Linux sort command, and provides complete demonstration code for both.

1. Create a test file

Write 1000000 random numbers to user_id.txt, one number per line

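<?php
$file = 'user_id.txt';
$num = 1000000;
$tmp = '';

for($i=0; $i<$num; $i++){
    // Buffer one random number per line
    $tmp .= mt_rand(0,999999).PHP_EOL;

    // Flush the buffer every 1000 iterations and on the last iteration
    if($i>0 && $i%1000==0 || $i==$num-1){
        // FILE_APPEND adds to an existing file, so delete user_id.txt before re-running
        file_put_contents($file, $tmp, FILE_APPEND);
        $tmp = '';
    }
}
?>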

Check the number of lines in the file

wc -l user_id.txt
 1000000 user_id.txt

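For readers who would rather build the test file without PHP, a similar file can be sketched with awk. This is an alternative, not the method used above: the helper name make_test_file and its arguments are hypothetical, and awk's rand() is a different generator from mt_rand, so the exact numbers will differ. The demonstration run writes only 1000 lines to a temporary directory.

```shell
#!/bin/sh
# Hypothetical helper: write COUNT random integers in [0, MAX) to FILE, one per line.
make_test_file() {
    count=$1
    max=$2
    file=$3
    awk -v n="$count" -v max="$max" 'BEGIN {
        srand()                      # seed from the current time
        for (i = 0; i < n; i++)
            print int(rand() * max)  # integer in 0 .. max-1
    }' > "$file"
}

# Small demonstration run in a temporary directory.
tmp_dir=$(mktemp -d)
make_test_file 1000 1000000 "$tmp_dir/sample.txt"
sample_lines=$(wc -l < "$tmp_dir/sample.txt" | tr -d ' ')
echo "$sample_lines lines written"
```

A file of the article's size would be produced with make_test_file 1000000 1000000 user_id.txt. Unlike the PHP script, which appends, the > redirection here overwrites any existing file.
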
2. Deduplication and sorting with PHP

Because 1000000 lines of data must be processed, the PHP memory limit is raised to 256M so that the script does not run out of memory during execution.

<?php
/**
 * Deduplicate and sort the contents of a file
 * @param String $source    source file
 * @param String $dest      destination file
 * @param String $order     sort order ('asc' for ascending, anything else for descending)
 * @param Int    $sort_flag sort type
 */
function fileUniSort($source, $dest, $order='asc', $sort_flag=SORT_NUMERIC){

    // Read the file contents
    $file_data = file_get_contents($source);

    // Split the contents into an array of lines
    $file_data_arr = explode(PHP_EOL, $file_data);

    // Remove empty lines
    $file_data_arr = array_filter($file_data_arr, 'filter');

    // Deduplicate: flipping twice leaves one entry per distinct value
    $file_data_arr = array_flip($file_data_arr);
    $file_data_arr = array_flip($file_data_arr);

    // Sort
    if($order=='asc'){
        sort($file_data_arr, $sort_flag);
    }else{
        rsort($file_data_arr, $sort_flag);
    }

    // Join the array back into file contents
    $file_data = implode(PHP_EOL, $file_data_arr).PHP_EOL;

    // Write the file, overwriting any existing contents
    file_put_contents($dest, $file_data);

}

// Filter out empty lines while keeping the string '0'
function filter($data){
    if(!$data && $data!=='0'){
        return false;
    }
    return true;
}

// Set the memory limit to 256M
ini_set('memory_limit', '256m');

$source = 'user_id.txt';
$dest = 'php_sort_user_id.txt';

fileUniSort($source, $dest);
?>

Check the deduplicated and sorted file

wc -l php_sort_user_id.txt 
  632042 php_sort_user_id.txt

head php_sort_user_id.txt 
0
1
2
3
5
7
8
9
11
12
...

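The line count of 632042 can be sanity-checked with a short calculation. Assuming that mt_rand(0, 999999) draws each of the N = 10^6 possible values uniformly and independently (an idealization, not something the article states), the expected number of distinct values after N draws is:

```latex
\[
E[\text{distinct}]
  = N\left(1 - \left(1 - \frac{1}{N}\right)^{N}\right)
  \approx N\left(1 - e^{-1}\right)
  \approx 10^{6} \times 0.632121
  \approx 632121
\]
```

The observed 632042 is within about 0.02% of this estimate, which is consistent with ordinary random variation.
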
3. Deduplication and sorting with the Linux sort command

The Linux sort command sorts the lines of text files.

Format:

sort [OPTION]... [FILE]...

Parameter description:

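-u Output only one line for each distinct value (deduplication)
-n Compare lines by numeric value
-r Reverse the result (descending order)
-o Write the result to the given output file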

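The effect of each option is easier to see on a tiny input. The sketch below uses made-up sample values (not data from the article) and captures the output of sort under different option combinations.

```shell
#!/bin/sh
# Made-up sample: six lines, with 3 and 10 each appearing twice.
sample=$(printf '%s\n' 10 3 25 3 10 7)

# Without -n, lines are compared as text, so "10" and "25" sort before "3".
text_sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort)

# -n compares by numeric value; -u keeps one line per distinct value.
numeric_unique=$(printf '%s\n' "$sample" | sort -n -u)

# Adding -r reverses the order (descending).
numeric_unique_desc=$(printf '%s\n' "$sample" | sort -n -u -r)

echo "$numeric_unique"
```

Here numeric_unique holds 3, 7, 10, 25 (one per line), while text_sorted holds 10, 10, 25, 3, 3, 7, which shows why -n matters for a file of numbers.
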
Use sort to perform deduplication and sorting

sort -uno linux_sort_user_id.txt user_id.txt

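The option cluster -uno in the command above packs three options together: -u, -n, and -o, with the file name that follows taken as the argument to -o. A minimal sketch, using a made-up temporary input, that checks whether the clustered and the spelled-out forms produce the same file:

```shell
#!/bin/sh
work_dir=$(mktemp -d)
printf '%s\n' 42 7 7 100 0 42 > "$work_dir/in.txt"

# Clustered form, as used in the article.
sort -uno "$work_dir/clustered.txt" "$work_dir/in.txt"

# The same options written separately.
sort -u -n -o "$work_dir/separate.txt" "$work_dir/in.txt"

# cmp -s is silent and reports equality through its exit status.
if cmp -s "$work_dir/clustered.txt" "$work_dir/separate.txt"; then
    forms_match=yes
else
    forms_match=no
fi
echo "forms match: $forms_match"
```

Because -o takes an argument, it has to come last in the cluster; written as -onu, the letters "nu" would be read as the output file name.
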
Check the deduplicated and sorted file

wc -l linux_sort_user_id.txt 
  632042 linux_sort_user_id.txt

head linux_sort_user_id.txt 
0
1
2
3
5
7
8
9
11
12
...

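Since both methods report 632042 lines, a natural follow-up is to confirm that the two output files are byte-for-byte identical and that a file really is in strictly increasing numeric order. A rough sketch, with hypothetical helper names (same_file, is_sorted_unique) and small made-up files standing in for the article's outputs:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the two files have identical contents.
same_file() {
    cmp -s "$1" "$2"
}

# Hypothetical helper: succeeds if FILE is already in numeric order with no
# duplicates. With -c, sort only checks the input; adding -u makes the check strict.
is_sorted_unique() {
    sort -n -u -c "$1" 2>/dev/null
}

check_dir=$(mktemp -d)
printf '%s\n' 0 1 2 3 5 > "$check_dir/a.txt"
printf '%s\n' 0 1 2 3 5 > "$check_dir/b.txt"
printf '%s\n' 0 1 1 3 2 > "$check_dir/bad.txt"

same_file "$check_dir/a.txt" "$check_dir/b.txt" && echo "a and b are identical"
is_sorted_unique "$check_dir/a.txt" && echo "a is sorted and unique"
is_sorted_unique "$check_dir/bad.txt" || echo "bad is not sorted and unique"
```

On the article's files, the comparison would be same_file php_sort_user_id.txt linux_sort_user_id.txt.
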
Summary: File deduplication and sorting can be done with either PHP or the Linux sort command, and the difference in execution time is not large. For file operations like this, however, using the system command directly is recommended because it is simpler.

source:php.cn