
Given two files a and b, each storing 5 billion URLs of 64 bytes each, and a memory limit of 4 GB, how do you find the URLs common to both files?

WBOY

Release： 2016-07-13 10:12:02


Problem: files a and b each store 5 billion URLs, every URL occupies 64 bytes, and at most 4 GB of memory is available. Find the URLs that appear in both files.

Each file is about 5 billion × 64 bytes = 320 GB, far more than the 4 GB of memory available, so neither file can be loaded into memory in full. Consider a divide-and-conquer approach.
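The size estimate can be checked with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, and gigabytes are taken as decimal (10^9 bytes), which is an assumption about what the problem statement means.

```python
# Back-of-the-envelope sizing for the problem as stated.
URLS_PER_FILE = 5_000_000_000   # 5 billion URLs per file
BYTES_PER_URL = 64              # fixed size given in the problem
MEMORY_LIMIT = 4 * 10**9        # 4 GB, assumed here to mean decimal gigabytes
NUM_BUCKETS = 1000              # number of small files per input file

def file_size_bytes(urls=URLS_PER_FILE, url_size=BYTES_PER_URL):
    """Total size of one input file in bytes."""
    return urls * url_size

def avg_bucket_bytes(total_bytes, buckets=NUM_BUCKETS):
    """Average size of one small file if the hash spreads URLs evenly."""
    return total_bytes // buckets

total = file_size_bytes()
bucket = avg_bucket_bytes(total)
print(total / 10**9, "GB per file")
print(bucket / 10**6, "MB per small file on average")
print("whole file fits in memory:", total <= MEMORY_LIMIT)
print("one small file fits in memory:", bucket <= MEMORY_LIMIT)
```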
Traverse file a, compute hash(url) % 1000 for each URL, and append the URL to the small file with that index (a0, a1, ..., a999). Each small file then holds roughly 320 MB (5 billion × 64 bytes / 1000). Traverse file b and distribute its URLs into 1000 small files (b0, b1, ..., b999) using the same hash function. After this step, any URL that appears in both a and b must sit in a pair of small files with the same index (a0 vs b0, a1 vs b1, ..., a999 vs b999), because equal URLs have equal hash values; small files with different indices (such as a0 vs b99) cannot share a URL. It is therefore enough to find the common URLs within each of the 1000 pairs.
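The partitioning step might look roughly like the sketch below. It assumes one URL per line (the original only says each URL occupies 64 bytes), and `bucket_of` and `partition` are hypothetical helper names. `zlib.crc32` stands in for the unspecified hash(url) because Python's built-in `hash()` is salted per process for strings and would not give the same index for a and b across separate runs.

```python
import os
import tempfile
import zlib

def bucket_of(url, num_buckets):
    """Map a URL to a bucket index with a hash that is stable across runs."""
    return zlib.crc32(url.encode("utf-8")) % num_buckets

def partition(src_path, out_dir, prefix, num_buckets):
    """Stream src_path once and append each URL to out_dir/<prefix><index>."""
    paths = [os.path.join(out_dir, f"{prefix}{i}") for i in range(num_buckets)]
    outs = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        with open(src_path, encoding="utf-8") as src:
            for line in src:  # one line at a time, so memory use stays small
                url = line.strip()
                if url:
                    outs[bucket_of(url, num_buckets)].write(url + "\n")
    finally:
        for f in outs:
            f.close()
    return paths

# Tiny demonstration with 4 buckets instead of 1000.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("http://example.com/1\nhttp://example.com/2\nhttp://example.com/3\n")
    for p in partition(src, tmp, "a", 4):
        print(os.path.basename(p), os.path.getsize(p), "bytes")
```

Keeping 1000 output files open at once can exceed the default open-file limit on some systems; raising the limit or partitioning in two passes are possible workarounds.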
For example, for a0 vs b0, traverse a0 and store each URL in a hash_map (a hash set is sufficient, since only membership matters). Then traverse b0: any URL found in the hash_map exists in both a and b, so write it to the output file.
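Comparing one pair of small files can be sketched as follows, using a Python `set` in place of the hash_map. The function name `common_urls` and the one-URL-per-line layout are assumptions, and removing a URL from the set after the first match (so that each common URL is written once) is a choice made for this sketch, not something the original specifies.

```python
import os
import tempfile

def common_urls(path_a, path_b, out_path):
    """Write URLs present in both files to out_path; return how many were written."""
    # Load the first small file into a hash-based set.
    with open(path_a, encoding="utf-8") as fa:
        seen = {line.strip() for line in fa if line.strip()}
    written = 0
    # Stream the second small file and probe the set for each URL.
    with open(path_b, encoding="utf-8") as fb, \
         open(out_path, "w", encoding="utf-8") as out:
        for line in fb:
            url = line.strip()
            if url in seen:
                out.write(url + "\n")
                seen.discard(url)  # report each common URL only once
                written += 1
    return written

# Tiny demonstration with made-up URLs.
with tempfile.TemporaryDirectory() as tmp:
    a0 = os.path.join(tmp, "a0")
    b0 = os.path.join(tmp, "b0")
    out = os.path.join(tmp, "common0")
    with open(a0, "w", encoding="utf-8") as f:
        f.write("http://example.com/x\nhttp://example.com/y\n")
    with open(b0, "w", encoding="utf-8") as f:
        f.write("http://example.com/y\nhttp://example.com/z\n")
    print(common_urls(a0, b0, out), "common URL(s)")
```

Note that a set of strings uses noticeably more memory than the raw file size, so a small file of a few hundred megabytes can still need a gigabyte or more once loaded.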
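If the split is uneven and some small files turn out too large (for example, larger than 2 GB), split those files again in a similar way. The second split must use a different hash function, because every URL in an oversized file already has the same value of hash(url) % 1000, and the matching small file from the other input must be split with that same second function so that the pairs still line up.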
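A second-level split could be sketched like this. The names `sub_bucket_of`, `needs_resplit`, and `resplit` are hypothetical, BLAKE2b is just one convenient choice of a second hash that is unrelated to the first, and one URL per line is assumed.

```python
import hashlib
import os
import tempfile

def sub_bucket_of(url, num_sub):
    """Second-level index from a hash unrelated to the first-level one."""
    digest = hashlib.blake2b(url.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_sub

def needs_resplit(path, limit_bytes):
    """True if the small file is larger than the chosen size limit."""
    return os.path.getsize(path) > limit_bytes

def resplit(path, num_sub):
    """Split one oversized small file into path_0 ... path_<num_sub - 1>."""
    sub_paths = [f"{path}_{j}" for j in range(num_sub)]
    outs = [open(p, "w", encoding="utf-8") for p in sub_paths]
    try:
        with open(path, encoding="utf-8") as src:
            for line in src:
                url = line.strip()
                if url:
                    outs[sub_bucket_of(url, num_sub)].write(url + "\n")
    finally:
        for f in outs:
            f.close()
    return sub_paths

# Tiny demonstration: treat anything over 50 bytes as "too large".
with tempfile.TemporaryDirectory() as tmp:
    a7 = os.path.join(tmp, "a7")
    with open(a7, "w", encoding="utf-8") as f:
        for i in range(10):
            f.write(f"http://example.com/page/{i}\n")
    if needs_resplit(a7, limit_bytes=50):
        for p in resplit(a7, 3):
            print(os.path.basename(p), os.path.getsize(p), "bytes")
```

Splitting again does not help when a file is large because one URL repeats many times, since identical URLs always land together; in that case removing duplicates while streaming is the more useful step.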

A Baidu interviewer asked me this question yesterday, so I studied it today.


source：php.cn
