Home Backend Development PHP Tutorial PHP uses binary heap to implement TopK-algorithm

PHP uses binary heap to implement TopK-algorithm

May 23, 2018 pm 03:06 PM
php method

This article mainly introduces you to the method of using PHP to implement the TopK-algorithm using a binary heap. The introduction in the article is very detailed and has certain reference and learning value for everyone. Friends who need it can follow the editor to learn together. Bar.

Preface

In the past, I often encountered a problem when working or interviewing. How to achieve massive TopN is to achieve a very large result. To quickly find the largest 10 or 100 numbers in a set, while ensuring memory and speed efficiency, our first idea may be to use sorting, and then intercept the top 10 or 100, and sorting is not suitable when the amount is not particularly large. There is no problem, but as long as the amount is extremely large, it is impossible to complete this task. For example, if there are hundreds of millions of numbers in an array or text file, it is impossible to read them all into the memory, so using sorting to solve this problem is not The best, so here we use PHP to implement a small top heap to solve this problem. A heap is a special kind of heap. A binary heap is a complete binary tree or an approximately complete binary tree. There are two types of binary heaps, the maximum heap and the minimum heap. The maximum heap: the key value of the parent node is always greater than or equal to any child. The key value of the node; the minimum heap: the key value of the parent node is always less than or equal to the key value of any child node

小头 heap-(picture from the Internet)

Binary heaps are generally represented by arrays (see the figure above). For example, the position of the root node in the array is 0, and the child nodes at the nth position are respectively 2n 1 and 2n 2. Therefore, the The child nodes of position 0 are at 1 and 2, the child nodes of 1 are at 3 and 4, and so on. This storage method makes it easy to find parent nodes and child nodes.

I won’t go into details about the specific conceptual issues here. If you have any questions about the binary heap, you can have a good understanding of this data structure. Next, we will use PHP code to implement and solve the above topN problems. In order to see To find out the difference, let’s first use the sorting method to implement it and see how it works.



Use the quick sort algorithm to implement TopN


//为了测试运行内存调大一点
ini_set('memory_limit', '2024M');

//实现一个快速排序函数
function quick_sort(array $array){
 $length = count($array);
 $left_array = array();
 $right_array = array();
 if($length <= 1){
  return $array;
 }
 $key = $array[0];
 for($i=1;$i<$length;$i++){
  if($array[$i] > $key){
   $right_array[] = $array[$i];
  }else{
   $left_array[] = $array[$i];
  }
 }
 $left_array = quick_sort($left_array);
 $right_array = quick_sort($right_array);
 return array_merge($right_array,array($key),$left_array); 
}

//构造500w不重复数
for($i=0;$i<5000000;$i++){
 $numArr[] = $i; 
}
//打乱它们
shuffle($numArr);

//现在我们从里面找到top10最大的数
var_dump(time());
print_r(array_slice(quick_sort($all),0,10));
var_dump(time());
Copy after login

Result after running

You can see that the top10 results are printed above, and the running time is output, which is about 99s, but this is only a 5 million number and If everything can be loaded into the memory, if we have a file with 5kw or 500 million numbers, there will definitely be some problems.

Use the binary heap algorithm to implement TopN


The implementation process is: 1. First read 10 or 100 numbers into the array, this is Our topN number.

2. Call the generate small top heap function to generate a small top heap structure from this array. At this time, the top of the heap must be the smallest.
3. Traverse all the remaining numbers from the file or array in sequence.


4. Each time one is traversed, compare the size with the element on the top of the heap. If it is smaller than the element on the top of the heap, discard it. If it is larger than the element on the top of the heap, it will be discarded. Replace the top element.


5. After replacing the top element of the heap, call the generate small top heap function to continue generating the small top heap, because you need to find the smallest one.


6. Repeat the above steps 4~5, so that after all traversals are completed, the largest topN will be in our small top heap, because our small top heap will always exclude the smallest and leave the largest , and the speed of adjusting the small top heap is also very fast. It is just a relative adjustment, as long as the root node is smaller than the left and right nodes.


7. In terms of algorithm complexity, according to top10 in the worst case, That is, every time a number is traversed, if it is replaced with the top of the heap, it needs to be adjusted 10 times, which is faster than the sorting speed, and it does not read all the contents into the memory. It can be understood that the achievement is a linear traversal.


//生成小顶堆函数
function Heap(&$arr,$idx){
 $left = ($idx << 1) + 1;
 $right = ($idx << 1) + 2;

 if (!$arr[$left]){
  return;
 }

 if($arr[$right] && $arr[$right] < $arr[$left]){
  $l = $right;
 }else{
  $l = $left;
 }

 if ($arr[$idx] > $arr[$l]){
   $tmp = $arr[$idx]; 
   $arr[$idx] = $arr[$l];
   $arr[$l] = $tmp;
   Heap($arr,$l);
 }
}

//这里为了保证跟上面一致,也构造500w不重复数
/*
 当然这个数据集并不一定全放在内存,也可以在
 文件里面,因为我们并不是全部加载到内存去进
 行排序
*/
for($i=0;$i<5000000;$i++){
 $numArr[] = $i; 
}
//打乱它们
shuffle($numArr);

//先取出10个到数组
$topArr = array_slice($numArr,0,10);

//获取最后一个有子节点的索引位置
//因为在构造小顶堆的时候是从最后一个有左或右节点的位置
//开始从下往上不断的进行移动构造(具体可看上面的图去理解)
$idx = floor(count($topArr) / 2) - 1;

//生成小顶堆
for($i=$idx;$i>=0;$i--){
 Heap($topArr,$i);
}

var_dump(time());
//这里可以看到,就是开始遍历剩下的所有元素
for($i = count($topArr); $i < count($numArr); $i++){
 //每遍历一个则跟堆顶元素进行比较大小
 if ($numArr[$i] > $topArr[0]){
  //如果大于堆顶元素则替换
  $topArr[0] = $numArr[$i];
  /*
   重新调用生成小顶堆函数进行维护,只不过这次是从堆顶
   的索引位置开始自上往下进行维护,因为我们只是把堆顶
   的元素给替换掉了而其余的还是按照根节点小于左右节点
   的顺序摆放这也就是我们上面说的,只是相对调整下,并
   不是全部调整一遍
  */
  Heap($topArr,0);
 }
}
var_dump(time());
Copy after login



After running the result

You can see that the final result is also top10, only However, it only took about 1 second, and both memory and time efficiency met our requirements. And the best thing about sorting is that we don’t need to read all the data sets into the memory, because we don’t need to sort. The above is for demonstration, so 5 million elements are constructed directly in the memory. However, we can transfer all of them to a file, and then read them line by line for comparison, because the core point of our data structure is linear traversal and there are many elements in the memory. Compare the small top heap structures and finally get TopN.

The above is the entire content of this article, I hope it will be helpful to everyone's study.


Related recommendations:

php Implement hexadecimal conversion_php skills

How to compile redis and phpredis under Linux_php skills

##PHPHow to use the Mysqli class library to achieve perfect paging effect_php skills

The above is the detailed content of PHP uses binary heap to implement TopK-algorithm. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

PHP 8.4 Installation and Upgrade guide for Ubuntu and Debian PHP 8.4 Installation and Upgrade guide for Ubuntu and Debian Dec 24, 2024 pm 04:42 PM

PHP 8.4 brings several new features, security improvements, and performance improvements with healthy amounts of feature deprecations and removals. This guide explains how to install PHP 8.4 or upgrade to PHP 8.4 on Ubuntu, Debian, or their derivati

How To Set Up Visual Studio Code (VS Code) for PHP Development How To Set Up Visual Studio Code (VS Code) for PHP Development Dec 20, 2024 am 11:31 AM

Visual Studio Code, also known as VS Code, is a free source code editor — or integrated development environment (IDE) — available for all major operating systems. With a large collection of extensions for many programming languages, VS Code can be c

7 PHP Functions I Regret I Didn't Know Before 7 PHP Functions I Regret I Didn't Know Before Nov 13, 2024 am 09:42 AM

If you are an experienced PHP developer, you might have the feeling that you’ve been there and done that already.You have developed a significant number of applications, debugged millions of lines of code, and tweaked a bunch of scripts to achieve op

How do you parse and process HTML/XML in PHP? How do you parse and process HTML/XML in PHP? Feb 07, 2025 am 11:57 AM

This tutorial demonstrates how to efficiently process XML documents using PHP. XML (eXtensible Markup Language) is a versatile text-based markup language designed for both human readability and machine parsing. It's commonly used for data storage an

Explain JSON Web Tokens (JWT) and their use case in PHP APIs. Explain JSON Web Tokens (JWT) and their use case in PHP APIs. Apr 05, 2025 am 12:04 AM

JWT is an open standard based on JSON, used to securely transmit information between parties, mainly for identity authentication and information exchange. 1. JWT consists of three parts: Header, Payload and Signature. 2. The working principle of JWT includes three steps: generating JWT, verifying JWT and parsing Payload. 3. When using JWT for authentication in PHP, JWT can be generated and verified, and user role and permission information can be included in advanced usage. 4. Common errors include signature verification failure, token expiration, and payload oversized. Debugging skills include using debugging tools and logging. 5. Performance optimization and best practices include using appropriate signature algorithms, setting validity periods reasonably,

PHP Program to Count Vowels in a String PHP Program to Count Vowels in a String Feb 07, 2025 pm 12:12 PM

A string is a sequence of characters, including letters, numbers, and symbols. This tutorial will learn how to calculate the number of vowels in a given string in PHP using different methods. The vowels in English are a, e, i, o, u, and they can be uppercase or lowercase. What is a vowel? Vowels are alphabetic characters that represent a specific pronunciation. There are five vowels in English, including uppercase and lowercase: a, e, i, o, u Example 1 Input: String = "Tutorialspoint" Output: 6 explain The vowels in the string "Tutorialspoint" are u, o, i, a, o, i. There are 6 yuan in total

Explain late static binding in PHP (static::). Explain late static binding in PHP (static::). Apr 03, 2025 am 12:04 AM

Static binding (static::) implements late static binding (LSB) in PHP, allowing calling classes to be referenced in static contexts rather than defining classes. 1) The parsing process is performed at runtime, 2) Look up the call class in the inheritance relationship, 3) It may bring performance overhead.

What are PHP magic methods (__construct, __destruct, __call, __get, __set, etc.) and provide use cases? What are PHP magic methods (__construct, __destruct, __call, __get, __set, etc.) and provide use cases? Apr 03, 2025 am 12:03 AM

What are the magic methods of PHP? PHP's magic methods include: 1.\_\_construct, used to initialize objects; 2.\_\_destruct, used to clean up resources; 3.\_\_call, handle non-existent method calls; 4.\_\_get, implement dynamic attribute access; 5.\_\_set, implement dynamic attribute settings. These methods are automatically called in certain situations, improving code flexibility and efficiency.

See all articles