Definition: Given n elements in a linear sequence and an integer k, 1≤k≤n, find the kth smallest of these n elements.
(1) In some special cases, it is easy to design a linear time algorithm for the selection problem. For example, selecting the largest or the smallest element can obviously be done in O(n) time: a single pass with n-1 comparisons suffices.
(2) The general selection problem, especially selecting the median, seems harder than finding the smallest (or largest) element. In fact, in the asymptotic sense the two are the same: general selection can also be done in O(n) time.
Linear time selection method one: randomizedSelect
Idea: Adapt randomized quick sort: instead of sorting the entire array, recurse only into the part that contains the kth smallest element (faster).
Time complexity:
(1) In the worst case, the algorithm randomizedSelect requires O(n^2) time.
e.g. we need to find the smallest element, but the position returned by the Partition function is always very large (close to n), that is, the array is always split at its largest element.
(2) However, it can be proved that the algorithm randomizedSelect finds the kth smallest of n input elements in O(n) average time.
The code is as follows:
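#include <cstdio>
#include <cstdlib>
#include <algorithm>
using namespace std;

// Partition a[p..r] around the pivot a[p]; return the pivot's final index.
int Partition(int a[], int p, int r) {
    int i = p, j = r + 1, x = a[p];
    while (1) {
        while (a[++i] < x && i < r);
        while (a[--j] > x);
        if (i >= j) break;
        swap(a[i], a[j]);
    }
    a[p] = a[j];
    a[j] = x;
    return j;
}

// Pick a random pivot in a[p..r], move it to the front, then partition.
int RandomizedPartition(int a[], int p, int r) {
    int i = rand() % (r - p + 1) + p;  // any index in [p, r]
    swap(a[p], a[i]);
    return Partition(a, p, r);
}

// Return the kth smallest element (1-based) of a[p..r].
int RandomizedSelect(int a[], int p, int r, int k) {
    if (p == r) return a[p];
    int i = RandomizedPartition(a, p, r);  // position of the pivot element
    int j = i - p + 1;  // number of elements in a[p..i]: the pivot and everything to its left
    if (k <= j) return RandomizedSelect(a, p, i, k);
    else return RandomizedSelect(a, i + 1, r, k - j);
}

int main() {
    int a[10] = {3, 1, 7, 6, 5, 9, 8, 2, 0, 4};
    int x;
    while (scanf("%d", &x) != EOF) {
        if (x < 1 || x > 10) continue;  // k must satisfy 1<=k<=n
        int ans = RandomizedSelect(a, 0, 9, x);
        printf("%d\n", ans);
    }
}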
Linear time selection method two: select
If a pivot for partitioning can be found in linear time such that each of the two subarrays produced by partitioning around it is at most ε times the length of the original array (0<ε<1 is a constant), then the selection task can be completed in O(n) time in the worst case.
For example, if ε=9/10, each recursive call works on a subarray that is at least 1/10 shorter. Therefore, in the worst case, the running time T(n) of the algorithm satisfies the recurrence T(n)≤T(9n/10)+O(n), from which we get T(n)=O(n). When the teacher covered this, I remember clearly that he emphasized "find" rather than "determine". "Find" means finding the value we want (here, the median), whereas quick sort and the like determine the position of a value, that is, put the reference element in its correct position.
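The step from the recurrence T(n)≤T(9n/10)+O(n) to T(n)=O(n) can be made explicit by unrolling it. A sketch, assuming the O(n) term is bounded by cn for some constant c>0 (c is a symbol introduced here):

```latex
\begin{aligned}
T(n) &\le cn + T\!\left(\tfrac{9}{10}n\right) \\
     &\le cn + \tfrac{9}{10}cn + T\!\left(\left(\tfrac{9}{10}\right)^{2} n\right) \\
     &\le cn \sum_{i=0}^{\infty} \left(\tfrac{9}{10}\right)^{i} + O(1)
      = 10\,cn + O(1) = O(n).
\end{aligned}
```

The geometric series converges because the ratio 9/10 is below 1, and the same argument works for any constant ε<1.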
Steps:
(1) Divide the n input elements into ⌈n/5⌉ groups of 5 elements each; at most one group may contain fewer than 5 elements. Sort each group with any sorting algorithm and take the median of each group, giving ⌈n/5⌉ medians in total.
(2) Recursively call select to find the median of these ⌈n/5⌉ medians. If ⌈n/5⌉ is even, take the larger of the 2 middle elements. Use this element as the pivot for partitioning.
Schematic diagram of the partitioning strategy (figure not reproduced). White dots: median of each group; point x: median of the medians.
Example:
Find the 18th smallest of the following 29 elements: 8,31,60,33,17,4,51,57,49,35,11,43,37,3,13,52,6,19,25,32,54,16,5,41,7,23,22,46,29.
(1) Divide the first 25 elements into 5 (=floor(29/5)) groups: (8,31,60,33,17),(4,51,57,49,35),(11,43,37,3,13),(52,6,19,25,32),(54,16,5,41,7);
(2) Extract the median of each group to form the set {31,49,13,25,16};
(3) Recursively use the algorithm to find the median of this set, which gives m=25;
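(4) Using m=25, divide the 29 elements into 3 subarrays (in the original order):
P={8,17,4,11,3,13,6,19,16,5,7,23,22}
Q={25}
R={31,60,33,51,57,49,35,43,37,52,32,54,41,46,29}
(5) Since |P|=13 and |Q|=1, and 18>13+1, discard P and Q and find the 4th (=18-14) smallest element of R;
(6) Divide R into 3 (=floor(15/5)) groups: {31,60,33,51,57},{49,35,43,37,52},{32,54,41,46,29};
(7) Find the median of each of these three groups: {51,43,41}; the median of this set is 43;
(8) Divide R into 3 subarrays based on 43: {31,33,35,37,32,41,29}, {43}, {60,51,57,49,52,54,46}. The first has 7 elements, so the 4th smallest element of R lies in it; continuing in the same way gives 33, the 18th smallest of the 29 elements.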
Complexity:
Suppose the array length is n. When n<75, the running time of the algorithm select does not exceed some constant C1.
When n≥75, the for loop executes n/5 times, and each iteration takes constant time (it only finds the median of 5 elements). Finding the median of the medians with select works on an array 1/5 of the original length, so its time can be written as T(n/5). After partitioning, the subarray that remains has at most 3n/4 elements, so the time spent on it is written as T(3n/4). So T(n) can be expressed recursively as:
T(n) ≤ C1, for n<75
T(n) ≤ C2·n + T(n/5) + T(3n/4), for n≥75
where C2·n (C2 a constant) bounds the linear work of the for loop and the partition step. Solving this recurrence gives T(n)=O(n).
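The algorithm above fixes the size of each group at 5 and uses 75 as the cutoff for making recursive calls (the algorithm is used when n is larger than 75). These two choices guarantee that the two arguments in the recurrence for T(n) sum to n/5+3n/4=19n/20=εn, with 0<ε<1. This is the key to obtaining T(n)=O(n). Of course, choices other than 5 and 75 are possible.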
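Why the sum 19n/20 matters can be seen from a short induction. A sketch, assuming the non-recursive work per call is at most C_2 n and the base case costs at most C_1 (C_2 and C = max(C_1, C_2) are symbols introduced here), and assuming T(m) ≤ 20Cm already holds for all m<n:

```latex
\begin{aligned}
T(n) &\le C_2 n + T\!\left(\tfrac{n}{5}\right) + T\!\left(\tfrac{3n}{4}\right) \\
     &\le C_2 n + 20C\cdot\tfrac{n}{5} + 20C\cdot\tfrac{3n}{4} \\
     &= C_2 n + 19Cn \;\le\; 20Cn, \qquad C = \max(C_1, C_2).
\end{aligned}
```

The induction closes only because n/5+3n/4 is a fraction of n strictly below 1; if the two arguments summed to n or more, the bound would not be linear.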
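Note:
(1) Let x be the median of the medians. At least 3*(n-5)/10 elements are smaller than x, and at least 3*(n-5)/10 elements are larger than x. Reason:
3*(n/5-1)*1/2
3: in each group whose median is smaller than x, 3 elements are smaller than x
n/5-1: the number of groups that contain 5 elements
1/2: about half of the groups have a median smaller than x
(2) When n≥75, 3(n-5)/10≥n/4, so both subarrays obtained by partitioning around this pivot are shortened by at least 1/4; in other words, the longer one has at most 3/4 of the original length.
As shown in the figure, the upper-left part of the partition is certainly smaller than x (about 1/4 of the elements) and the lower-right part is certainly larger than x (about 1/4). The lower-left and upper-right parts are undetermined; even if both of them end up on the same side (both not smaller than x, or both not larger than x), the subinterval produced by partitioning is still shortened by at least 1/4!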
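The two claims in the note can be checked with a line of algebra each. This sketch only restates the quantities already used above:

```latex
\begin{aligned}
3\cdot\tfrac{1}{2}\cdot\left(\tfrac{n}{5}-1\right) &= \tfrac{3(n-5)}{10}, \\
\tfrac{3(n-5)}{10} \ge \tfrac{n}{4}
  \;&\Longleftrightarrow\; 12(n-5) \ge 10n
  \;\Longleftrightarrow\; n \ge 30 .
\end{aligned}
```

So the inequality holds for every n≥30, and in particular for n≥75; at n=75 it reads 21≥18.75.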
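Core code:
Type Select(Type a[], int p, int r, int k)
{
    if (r - p < 75) {
        // sort the array a[p:r] with some simple sorting algorithm;
        return a[p + k - 1];
    }
    for (int i = 0; i <= (r - p - 4) / 5; i++) {
        // i runs over the groups of the n elements
        // swap the 3rd smallest element of a[p+5*i]..a[p+5*i+4] with a[p+i];
        // this moves each group's median to the front
    }
    // find the median of the medians; r-p-4 is the n-5 mentioned above
    Type x = Select(a, p, p + (r - p - 4) / 5, (r - p - 4) / 10);  // x is the median of the medians
    // i: position where x belongs in [p,r] after one quick sort pass; j: number of elements in [p,i]
    int i = Partition(a, p, r, x), j = i - p + 1;
    if (k <= j) return Select(a, p, i, k);
    else return Select(a, i + 1, r, k - j);
}
The key code is:
for (int i = 0; i <= (r - p - 4) / 5; i++) {
    // i runs over the groups of the n elements
    // swap the 3rd smallest element of a[p+5*i]..a[p+5*i+4] with a[p+i];
    // this moves each group's median to the front
}
There are (r-p+1)/5 groups in total.
Note that i starts from 0 here so that it can be used directly as an array index when swapping. It runs from 0 to (r-p-4)/5, that is, (r-p-4)/5+1 groups in total, which equals (r-p+1)/5 groups.
The code is as follows: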
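#include <cstdio>
#include <algorithm>
using namespace std;

// Simple exchange sort of a[p..r] in ascending order.
void bubbleSort(int a[], int p, int r) {
    for (int i = p; i < r; i++) {
        for (int j = i + 1; j <= r; j++) {
            if (a[j] < a[i]) swap(a[i], a[j]);
        }
    }
}

// Partition a[p..r] around the value val (which must occur in a[p..r]);
// return its final index.
int Partition(int a[], int p, int r, int val) {
    int pos = p;
    for (int q = p; q <= r; q++) {
        if (a[q] == val) { pos = q; break; }
    }
    swap(a[p], a[pos]);
    int i = p, j = r + 1, x = a[p];
    while (1) {
        while (a[++i] < x && i < r);
        while (a[--j] > x);
        if (i >= j) break;
        swap(a[i], a[j]);
    }
    a[p] = a[j];
    a[j] = x;
    return j;
}

// Return the kth smallest element (1-based) of a[p..r].
int Select(int a[], int p, int r, int k) {
    if (r - p < 75) {
        bubbleSort(a, p, r);
        return a[p + k - 1];
    }
    for (int i = 0; i <= (r - p - 4) / 5; i++) {  // move each group's median into [p, p+(r-p-4)/5]
        int s = p + 5 * i, t = s + 4;
        for (int j = 0; j < 3; j++) {  // three bubble passes: the three largest end up sorted at the back
            for (int n = s; n < t - j; n++) {
                if (a[n] > a[n + 1]) swap(a[n], a[n + 1]);
            }
        }
        swap(a[p + i], a[s + 2]);  // move this group's median to the front
    }
    // (r-p-4)/5 is the number of groups minus 1, so [p, p+(r-p-4)/5] holds one median per group
    int x = Select(a, p, p + (r - p - 4) / 5, (r - p - 4) / 10);  // median of the medians
    int i = Partition(a, p, r, x), j = i - p + 1;
    if (k <= j) return Select(a, p, i, k);
    else return Select(a, i + 1, r, k - j);
}

int main() {
    int x;
    // array a holds the numbers 0-79
    int a[80] = {3, 1, 7, 6, 5, 9, 8, 2, 0, 4,
                 13, 11, 17, 16, 15, 19, 18, 12, 10, 14,
                 23, 21, 27, 26, 25, 29, 28, 22, 20, 24,
                 33, 31, 37, 36, 35, 39, 38, 32, 30, 34,
                 43, 41, 47, 46, 45, 49, 48, 42, 40, 44,
                 53, 51, 57, 56, 55, 59, 58, 52, 50, 54,
                 63, 61, 67, 66, 65, 69, 68, 62, 60, 64,
                 73, 71, 77, 76, 75, 79, 78, 72, 70, 74};
    while (scanf("%d", &x) != EOF) {
        if (x < 1 || x > 80) continue;  // k must satisfy 1<=k<=n
        // Select returns the x-th smallest element
        printf("The %d-th smallest number is %d\n", x, Select(a, 0, 79, x));
    }
}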