
Super detailed python common interview questions

不言
Release: 2020-09-03 15:25:45

This article is a collection of detailed, commonly asked Python interview questions. It has reference value for readers who need it; I hope it helps you.


1. Reading large data files

① Use a generator

② Iterate over the file object directly: for line in file (see the sketch below)
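A minimal sketch of both approaches; the file path is a placeholder:

```python
def read_in_chunks(path, chunk_size=1024 * 1024):
    """Generator that yields a large file piece by piece instead of loading it all."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

def count_lines(path):
    """Iterate the file object directly: one line at a time, constant memory."""
    total = 0
    with open(path) as f:
        for line in f:
            total += 1
    return total
```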

2. The difference between iterators and generators

1) An iterator is a more abstract concept: any object whose class has a next() method (__next__() in Python 3) and an __iter__() method that returns itself is an iterator. Container objects such as strings, lists, dicts, and tuples are conveniently traversed with a for loop. Behind the scenes, the for statement calls the built-in iter() function on the container object; iter() returns an iterator object that defines a next() method, which accesses the elements of the container one by one. next() is also a Python built-in; when there are no further elements, it raises a StopIteration exception.

2) Generators are a simple and powerful tool for creating iterators. They are written like regular functions, except that they use the yield statement whenever they need to return data. Each time next() is called, the generator resumes from where it left off (it remembers the last statement executed and all of its data values).

Difference: generators can do everything iterators can do, and because the __iter__() and next() (__next__()) methods are created automatically, generators are particularly concise. They are also efficient: using a generator expression instead of a list comprehension can save memory. In addition to the automatically created methods and automatically saved program state, a StopIteration exception is also raised automatically when the generator terminates.
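For illustration, a minimal sketch of a hand-written iterator next to the equivalent generator, plus a generator expression used in place of a list comprehension:

```python
class Countdown:
    """A hand-written iterator: defines __iter__ and __next__ itself."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

def countdown(start):
    """The generator version: yield creates __iter__/__next__ automatically."""
    while start > 0:
        yield start
        start -= 1

print(list(Countdown(3)))   # [3, 2, 1]
print(list(countdown(3)))   # [3, 2, 1]

# Generator expression instead of a list comprehension: values are produced
# lazily, so the whole sequence never sits in memory at once.
total = sum(x * x for x in range(1_000_000))
```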

3. The role and function of decorators (a minimal example follows this list):

Adding logging

Timing function execution

Pre-processing before a function executes

Clean-up after a function executes

Permission verification and other scenarios

Caching
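For illustration, a minimal timing decorator (one of the uses listed above):

```python
import functools
import time

def timed(func):
    """Decorator that logs how long the wrapped function took to run."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.time() - start:.4f}s")
        return result
    return wrapper

@timed
def slow_add(a, b):
    time.sleep(0.1)
    return a + b

slow_add(1, 2)   # prints something like: slow_add took 0.1001s
```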

4. Let’s briefly talk about GIL:

Global Interpreter Lock

The execution of Python code is controlled by the Python virtual machine (also called the interpreter main loop; here we mean CPython). Python was designed so that only one thread executes in the interpreter's main loop at any time; that is, at any moment, only one thread runs in the interpreter. Access to the Python virtual machine is controlled by the Global Interpreter Lock (GIL), which ensures that only one thread runs at a time.

In a multi-threaded environment, the Python virtual machine executes as follows:

1. Set up GIL

2. Switch to a thread to run

3. Run:

a. for a specified number of bytecode instructions, or b. until the thread voluntarily gives up control (for example by calling time.sleep(0))

4. Set the thread to sleep state

5. Unlock the GIL

6. Repeat all the above steps again

When calling external code (such as a C/C++ extension function), the GIL is held until the end of that function (since no Python bytecode is executed during this period, no thread switching is performed).
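A minimal sketch illustrating the effect of the GIL on CPU-bound threads (timings vary by machine; under CPython the two-thread version is usually no faster than the sequential one):

```python
import threading
import time

def count_down(n):
    # A CPU-bound busy loop: the GIL prevents two such threads from
    # executing Python bytecode truly in parallel.
    while n > 0:
        n -= 1

N = 10_000_000

start = time.time()
count_down(N)
count_down(N)
print("sequential :", round(time.time() - start, 2), "s")

start = time.time()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print("two threads:", round(time.time() - start, 2), "s")
```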

5. Find and grep

The grep command is a powerful text-search tool. The search pattern can be a regular expression, allowing pattern-based searching of text files. If a matching pattern is found, grep prints every line containing the pattern.

find is usually used to search for files that meet the conditions in a specific directory. It can also be used to search for files owned by a specific user.

6. What should you do if an online service might go down for various reasons?

Use Supervisor, a powerful background-process management tool on Linux.

After each configuration change, run service supervisord restart on the Linux server.

7. How to improve Python's running efficiency

Use generators; implement key code with external packages (Cython, PyInline, PyPy, Pyrex); optimize for loops — avoid repeatedly accessing variables' attributes inside loops (see the sketch below).
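A minimal sketch of the loop optimization mentioned above: hoist repeated attribute lookups out of the loop (the numbers are arbitrary):

```python
import math

values = range(100_000)

def slow():
    # math.sqrt is looked up on the math module on every iteration.
    out = []
    for v in values:
        out.append(math.sqrt(v))
    return out

def fast():
    # Hoist the attribute lookup once, and let a comprehension do the loop.
    sqrt = math.sqrt
    return [sqrt(v) for v in values]
```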

8. Commonly used Linux commands:

ls, help, cd, more, clear, mkdir, pwd, rm, grep, find, mv, su, date

9. Yield usage in Python

Put simply, yield makes a function a generator, so that the function remembers the position in its body where it last returned; the second (or nth) call to the generator resumes from that point in the function.
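For illustration, a minimal generator showing how each next() call resumes right after the last yield:

```python
def fibonacci(limit):
    """Local state (a, b) is remembered between calls to next()."""
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b

gen = fibonacci(50)
print(next(gen))   # 0
print(next(gen))   # 1  -- resumed where it left off
print(list(gen))   # [1, 2, 3, 5, 8, 13, 21, 34]
```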

10. How Python performs memory management

1. Garbage collection: unlike languages such as C and Java, Python lets you assign values to variables directly without declaring the variable's type in advance. In Python, an object's type and memory are determined at runtime, which is why Python is called a dynamically typed language (here dynamic typing can be loosely reduced to the allocation of variable memory: the variable's type is determined and the value assigned automatically at runtime).

2. Reference counting: Python manages memory in a way similar to Windows kernel objects. Each object maintains a count of the references pointing to it. When a variable is bound to an object, that object's reference count is 1 (other situations also increase the reference count). The system maintains these counts automatically and scans them periodically; when an object's reference count drops to 0, the object is reclaimed.

3. Memory pool mechanism: Python's memory management is organized as a pyramid of layers. Layers -1 and -2 are operated mainly by the operating system;

Layer 0 consists of C's memory allocation and release functions such as malloc and free;

Layers 1 and 2 are memory pools, implemented by Python interface functions such as PyMem_Malloc; when the requested object is small (less than 256 bytes), this layer allocates the memory directly;

Layer 3 is the top layer, where we manipulate Python objects directly;

Calling malloc and free frequently in C causes performance problems, and frequently allocating and releasing small blocks of memory produces memory fragmentation. What Python mainly does here is:

If the requested allocation is between 1 and 256 bytes, Python's own memory management system is used; otherwise malloc is called directly.

malloc is still called here to allocate memory, but each time a large block of 256 KB is allocated.

Memory registered through the memory pool is eventually recycled back into the memory pool rather than released with C's free, so it can be reused next time. For simple Python objects such as numbers, strings, and tuples (tuples cannot be changed), a copy-like approach is used: when variable A is assigned to another variable B, A and B initially refer to the same memory; but when A's value changes, new space is allocated for A, and the addresses of A and B are no longer the same.
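For illustration, a minimal sketch of how reference counting and rebinding behave for an immutable object (the exact count printed depends on the interpreter and is only indicative):

```python
import sys

a = "hello-python"
b = a                      # binding b to the same object increases its reference count
print(sys.getrefcount(a))  # at least 3: a, b, and the temporary argument to getrefcount

print(id(a) == id(b))      # True: both names point to the same object

a = "world"                # rebinding a points it at a different object
print(id(a) == id(b))      # False: b still refers to the original "hello-python" string
```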

11. Describe the differences between arrays, linked lists, queues, and stacks?

Arrays and linked lists are data-storage concepts: arrays store data in contiguous memory, while linked lists can store data in non-contiguous memory;

Queues and stacks describe data-access patterns: a queue is first in, first out, while a stack is last in, first out. Both queues and stacks can be implemented with arrays or linked lists.
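A minimal sketch of both access patterns using built-in types:

```python
from collections import deque

stack = []               # a list used as a stack: last in, first out
stack.append(1)
stack.append(2)
print(stack.pop())       # 2

queue = deque()          # a deque used as a queue: first in, first out
queue.append(1)
queue.append(2)
print(queue.popleft())   # 1
```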

12. How many sorting algorithms do you know? Can you describe the one you are most familiar with?
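One commonly given answer is quicksort; a minimal sketch for illustration (written for clarity rather than in-place efficiency):

```python
def quicksort(items):
    """Average O(n log n); picks the middle element as the pivot."""
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    left = [x for x in items if x < pivot]
    middle = [x for x in items if x == pivot]
    right = [x for x in items if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([3, 6, 1, 8, 2, 9, 4]))   # [1, 2, 3, 4, 6, 8, 9]
```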

web framework part

1. In Django, after a user logs in on application server A (entering a logged-in state), what is the impact if the next request is proxied by nginx to application server B?

If the session data from the user's login on application server A is not shared with application server B, the previous login state is lost.

2. How does Django solve the cross-site request forgery (CSRF) problem (principle)?
Enable the CSRF middleware
Use POST requests
Use a verification code
Add the {% csrf_token %} tag inside the form

3. Please explain or describe Django's architecture

The Django framework follows the MVC design but has its own proper name for it: MVT.
M stands for Model, with the same role as M in MVC: it is responsible for data handling and has an ORM framework built in.
V stands for View, with the same role as C in MVC: it receives an HttpRequest, performs the business logic, and returns an HttpResponse.
T stands for Template, with the same role as V in MVC: it is responsible for building the HTML to be returned, with a template engine built in.
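A minimal MVT sketch inside a hypothetical Django app (the model and view names are illustrative; the template article_list.html would carry the T part):

```python
# models.py -- M: the data layer, backed by the built-in ORM
from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=100)
    views = models.IntegerField(default=0)

# views.py -- V: receives an HttpRequest, does the business logic, returns an HttpResponse
from django.shortcuts import render

def article_list(request):
    articles = Article.objects.all()
    return render(request, "article_list.html", {"articles": articles})
```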

4. How do you sort query results in Django? How do you sort in descending order? How do you query for values greater than a certain field?
Sort with order_by()
For descending order, add a '-' prefix to the field name
To query for values greater than some value, use filter(field__gt=value) — see the sketch below
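A minimal sketch of these queries, assuming the hypothetical Article model above with an integer views field:

```python
newest_first = Article.objects.order_by("-views")   # '-' prefix means descending
oldest_first = Article.objects.order_by("views")    # ascending
popular = Article.objects.filter(views__gt=100)     # double-underscore __gt lookup
```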

5. Talk about the role of Django's middleware (MIDDLEWARE)?
Middleware is a processing layer between the request and the response. It is relatively lightweight and changes Django's input and output globally.
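A minimal sketch of a modern (Django 1.10+) middleware class; it would be activated by adding its dotted path to MIDDLEWARE in settings.py:

```python
class SimpleTimingMiddleware:
    """Illustrative middleware that wraps every request/response cycle."""

    def __init__(self, get_response):
        self.get_response = get_response   # called once when the server starts

    def __call__(self, request):
        # Code here runs before the view is called.
        response = self.get_response(request)
        # Code here runs after the view, before the response is returned.
        return response
```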

6. What do you know about Django?
Django goes in a "big and comprehensive" direction. It is best known for its fully automatic admin backend: using only the ORM and simple object definitions, it can automatically generate the database structure and a full-featured admin interface.
Django's built-in ORM is highly coupled to the other modules in the framework.

An application must use Django's built-in ORM, or it cannot enjoy the various ORM-based conveniences the framework provides; in theory you can swap out its ORM module, but that is like tearing down a finished, renovated house to remodel it — it would be better to start with an empty shell and decorate it from scratch.

The selling point of Django is its very high development efficiency, but its performance does not scale far; projects built with Django need to be refactored to meet performance requirements once traffic reaches a certain scale.

Django is suitable for small and medium-sized websites, or as a tool for large websites to quickly build product prototypes.

The design philosophy of Django templates is to completely separate code from presentation; Django fundamentally rules out coding and processing data in templates.

7. How do you implement redirection in Django? Which status codes are used?
Use HttpResponseRedirect
or the redirect() and reverse() shortcuts
Status codes: 302, 301
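A minimal sketch of these options (the URL paths and pattern names are placeholders):

```python
from django.http import HttpResponseRedirect
from django.shortcuts import redirect
from django.urls import reverse

def old_view(request):
    return HttpResponseRedirect("/new/")          # 302 by default

def old_view_named(request):
    return redirect(reverse("article-list"))      # assumes a URL pattern named 'article-list'

def old_view_permanent(request):
    return redirect("/new/", permanent=True)      # 301
```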

8. nginx forward proxy and reverse proxy?
A forward proxy is a server between the client and the origin server. To obtain content from the origin server, the client sends a request to the proxy and names the target (the origin server); the proxy forwards the request to the origin server and returns the obtained content to the client. The client must be specially configured to use a forward proxy.
A reverse proxy is the opposite: to the client it appears to be the origin server, and the client needs no special configuration. The client sends an ordinary request for content in the reverse proxy's namespace; the reverse proxy then decides where to forward the request (to which origin server) and returns the obtained content to the client as if the content were its own.

9. What is the core of Tornado?
The core of Tornado consists of the two modules ioloop and iostream. The former provides an efficient I/O event loop, and the latter wraps a non-blocking socket. By registering network I/O events with the ioloop, using non-blocking sockets, and attaching the corresponding callback functions, you achieve the coveted efficient asynchronous execution.
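A minimal Tornado "hello world" sketch built on these two ideas (the port number is arbitrary):

```python
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

def make_app():
    # The Application maps URL patterns to handlers.
    return tornado.web.Application([
        (r"/", MainHandler),
    ])

if __name__ == "__main__":
    app = make_app()
    app.listen(8888)                          # non-blocking HTTP server on port 8888
    tornado.ioloop.IOLoop.current().start()   # the ioloop event loop drives everything
```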

10. Django itself provides runserver; why can't it be used for deployment?
The runserver method is the way Django is usually run while debugging. It uses Django's own WSGI server, is intended mainly for testing and development, and runs as a single process.
uWSGI is a web server that implements the WSGI protocol, the uwsgi protocol, HTTP, and other protocols. Note that uwsgi is a communication protocol, while uWSGI is the web server that implements the uwsgi and WSGI protocols. uWSGI offers very fast performance, low memory usage, and multi-app management. Paired with Nginx it forms a production environment: user requests are isolated from the application, achieving a real deployment. By comparison it supports more concurrency, makes it easy to manage multiple processes, takes advantage of multiple cores, and improves performance.

Network programming and front-end part

1. What is AJAX, and how do you use it?
AJAX (asynchronous JavaScript and XML) can refresh part of a web page's data without reloading the entire page.
Step 1: create an XMLHttpRequest object, var xmlhttp = new XMLHttpRequest(); the XMLHttpRequest object is used to exchange data with the server.
Step 2: use the open() and send() methods of the XMLHttpRequest object to send a resource request to the server.
Step 3: use the responseText or responseXML property of the XMLHttpRequest object to obtain the server's response.
Step 4: the onreadystatechange function — when a request has been sent to the server and we want the server's response to trigger some action, we use the onreadystatechange function, which fires every time the XMLHttpRequest object's readyState changes.

2. What are the common HTTP status codes?

200 OK
301 Moved Permanently
302 Found
304 Not Modified
307 Temporary Redirect
400 Bad Request
401 Unauthorized
403 Forbidden
404 Not Found
410 Gone
500 Internal Server Error
501 Not Implemented

3. What is the difference between Post and get?

In a GET request, the data is appended to the URL (a '?' separates the URL from the data), with multiple parameters joined by '&'. URL encoding uses ASCII rather than Unicode, which means all non-ASCII characters must be encoded before transmission.
In a POST request, the data is placed in the body of the HTTP request; for example, item=bandsaw in the body is the actual transmitted data.
Therefore the data of a GET request is exposed in the address bar, while that of a POST request is not.

2. The size of the transmitted data

The HTTP specification places no limit on the length of the URL or the size of the transmitted data. In actual development, however, specific browsers and servers do limit URL length for GET, so the data transferred in a GET request is limited by the maximum URL length.

For POST, since the data is not carried in the URL, there is in theory no limit; in practice, every server sets a limit on the size of submitted POST data, and Apache and IIS each have their own configuration.

3. Security

POST is more secure than GET. Security here means real security, which is different from the "safe" in the safe-method sense mentioned for GET above — that kind of safety merely means not modifying data on the server. For example, when logging in via a GET request, the username and password are exposed in the URL; because the login page may be cached by the browser and other people may view the browser's history, the username and password can easily be obtained by someone else. In addition, data submitted via GET may also enable cross-site request forgery attacks.

4. What is the difference between cookie and session?

1. Cookie data is stored on the client's browser, and session data is stored on the server.
2. Cookies are not very safe: others can analyze cookies stored locally and perform cookie spoofing, so for security-sensitive data the session should be used.
3. The session is kept on the server for a certain period of time; as accesses increase, it consumes more server resources, so to reduce the load on the server, cookies should be used.
4. A single cookie cannot hold more than 4 KB of data, and many browsers limit a site to at most 20 cookies.
5. Suggestion:
Store important information such as login information as SESSION
If other information needs to be retained, it can be placed in COOKIE

5. The process required to create a simple TCP server

1. socket: create a socket
2. bind: bind the IP and port
3. listen: make the socket able to accept connections passively
4. accept: wait for a client connection
5. recv/send: receive and send data (see the sketch below)
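A minimal sketch of these five steps with the standard socket module (the address and port are placeholders):

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)    # 1. create the socket
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8000))                                 # 2. bind IP and port
server.listen(128)                                             # 3. switch to passive/listening mode

while True:
    client, addr = server.accept()                             # 4. wait for a client connection
    data = client.recv(1024)                                   # 5. receive ...
    client.send(b"received: " + data)                          #    ... and send data
    client.close()
```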

Crawler and database part

1. What is the difference between scrapy and scrapy-redis? Why choose redis database?

(1) Scrapy is a Python crawler framework with very high crawling efficiency and a high degree of customizability, but by itself it does not support distributed crawling. scrapy-redis is a set of components, based on the redis database and running on top of the Scrapy framework, that lets Scrapy support a distributed strategy: slave nodes share the item queue, request queue, and request fingerprint set stored in the master node's redis database.

(2) Why choose the redis database? Because redis supports master-slave synchronization and caches data in memory, a redis-based distributed crawler is very efficient at high-frequency reading of requests and data.

2. What crawler frameworks or modules have you used? Talk about their differences or advantages and disadvantages?
Python comes with: urllib, urllib2
Third party: requests
Framework: Scrapy
Both urllib and urllib2 perform operations related to requesting URLs, but they provide different functionality.
urllib2: urllib2.urlopen can accept a Request object or a URL (when given a Request object, you can set the headers for that URL), while urllib.urlopen only accepts a URL.
urllib has urlencode, which urllib2 does not, which is why the two are often used together.
Scrapy is a pre-packaged framework that includes a downloader, a parser, logging, and exception handling. It is based on multi-threading and Twisted. For fixed, single-site crawling it has development advantages, but for crawling, say, 100 different websites it is not flexible enough for concurrent and distributed processing, and it is inconvenient to adjust and extend.

requests is an HTTP library; it is used only for making requests. For HTTP requests it is a powerful library, while downloading and parsing are handled by yourself. It offers greater flexibility, works very flexibly for high concurrency and distributed deployment, and is better suited to implementing custom functionality.

Scrapy advantages and disadvantages:

Advantages: Scrapy is asynchronous

Use more readable xpath instead of regular expressions

Powerful statistics and log system

Crawling on different URLs at the same time

Supports shell mode to facilitate independent debugging

Write middleware to facilitate writing some unified filters

Save to the database through pipelines

Disadvantages: it is a Python-based crawler framework with relatively poor extensibility.

Being based on the Twisted framework, a runtime exception will not kill the reactor, and the asynchronous framework will not stop other tasks after an error occurs, which makes data errors hard to detect.

3. What are your commonly used mysql engines? What are the differences between the engines?

There are two main engines, MyISAM and InnoDB. The main differences are as follows:

1. InnoDB supports transactions, while MyISAM does not. This is very important: a transaction is a high-level way of processing operations, for example in a series of inserts, deletes, and updates, as long as one step fails everything can be rolled back and restored, whereas MyISAM cannot do this;

2. MyISAM is suitable for query- and insert-heavy applications, while InnoDB is suitable for applications with frequent modifications and higher security requirements;

3. InnoDB supports foreign keys, while MyISAM does not;

4. MyISAM is the default engine (in older MySQL versions; since MySQL 5.5 the default is InnoDB), while InnoDB must be specified explicitly;

5. InnoDB does not support FULLTEXT indexes (in older MySQL versions; support was added in MySQL 5.6);

6. InnoDB does not store the table's row count. For example, for select count(*) from table, InnoDB has to scan the whole table to count the rows, whereas MyISAM simply reads out the saved row count. Note that when the count(*) statement contains a where condition, MyISAM also has to scan the whole table;

7. For auto-increment fields, InnoDB must have an index containing only that field, whereas in MyISAM a joint index with other fields can be created;

8. When clearing an entire table, InnoDB deletes rows one by one, which is very slow, whereas MyISAM rebuilds the table;

9. InnoDB supports row locks (although in some cases the whole table is locked, e.g. update table set a=1 where user like '%lee%').

4. Describe the mechanism of scrapy framework operation?

The first batch of URLs is taken from start_urls and requests are sent. The engine hands the requests to the scheduler, which puts them into the request queue. When it is their turn, the scheduler hands the requests in the queue to the downloader, which obtains the response corresponding to each request and passes it to the parsing method you wrote for extraction: 1. if the required data is extracted, it is handed to the pipeline file for processing; 2. if URLs are extracted, the previous steps continue (send the URL requests, the engine hands them to the scheduler, they are queued...) until there are no requests left in the request queue and the program ends.

5. What are join (related-table) queries, and what kinds are there?

Queries that combine multiple tables, mainly including inner join, left join, right join, and full join (outer join).

6. Is it better to write a crawler with multiple processes or with multiple threads? Why?

For IO-intensive code (file handling, web crawlers, etc.), multi-threading can effectively improve efficiency: under a single thread, IO operations cause IO waits and unnecessary time is lost, whereas with multiple threads the program can automatically switch to thread B while thread A is waiting, so CPU resources are not wasted and execution efficiency improves. In the actual data-collection process you also need to consider network speed and response times as well as the hardware of your own machine when deciding between multi-process and multi-threading.
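A minimal sketch of an IO-bound, multi-threaded fetch using the standard library's thread pool together with the requests library (the URLs are placeholders):

```python
import requests
from concurrent.futures import ThreadPoolExecutor

urls = ["https://example.com/page/%d" % i for i in range(10)]   # placeholder URLs

def fetch(url):
    # Each worker thread spends most of its time waiting on network I/O;
    # during that wait the GIL is released, so other threads can run.
    return requests.get(url, timeout=10).status_code

with ThreadPoolExecutor(max_workers=5) as pool:
    for url, status in zip(urls, pool.map(fetch, urls)):
        print(url, status)
```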

7. Database optimization?

1. Optimize indexes and SQL statements, and analyze slow queries;

2. When designing tables, follow the database design normal forms strictly;

3. Use a cache: put frequently accessed data that does not change often into a cache, which saves disk IO;

4. Optimize the hardware: use SSDs and disk-array technology (RAID0, RAID1, RAID5), etc.;

5. Use MySQL's own table partitioning to split data into different files, which improves disk read efficiency;

6. Vertical partitioning: put infrequently read columns into a separate table to save disk I/O;

7. Master-slave replication with read/write splitting: use replication to separate the database's read operations from its write operations;

8. Shard databases, tables, and machines (when the data volume is extremely large); the main principle is data routing;

9. Choose the appropriate table engine and tune its parameters;

10. Apply architecture-level caching, static page generation, and distribution;

11. Do not use full-text indexes;

12. Use faster storage, such as NoSQL, for frequently accessed data.

8. Common anti-crawlers and countermeasures?

1). Anti-crawling through Headers

Header-based anti-crawling is the most common anti-crawling strategy. Many websites check the User-Agent in the request headers, and some also check the Referer (some resource sites implement hot-link protection by checking the Referer). If you encounter this kind of anti-crawler mechanism, you can simply add headers to the crawler, copying the browser's User-Agent into the crawler's headers, or change the Referer value to the target site's domain name. Anti-crawlers that check headers can easily be bypassed by modifying or adding headers in the crawler, as in the sketch below.
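A minimal sketch of adding browser-like headers with the requests library (the URL and header values are placeholders):

```python
import requests

headers = {
    # Copied from a real browser; many header checks look only at these two fields.
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0 Safari/537.36"),
    "Referer": "https://www.example.com/",
}
resp = requests.get("https://www.example.com/data", headers=headers, timeout=10)
print(resp.status_code)
```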

2). Anti-crawlers based on user behavior

Some websites also detect user behavior, for example the same IP visiting the same page many times in a short period, or the same account performing the same operation many times in a short period.

Most websites fall into the former case, which can be solved with IP proxies. You can write a dedicated crawler to collect the proxy IPs published on the internet and keep the ones that pass validation; such proxy-IP crawlers are used often, so it is best to prepare one yourself. Once you have a large number of proxy IPs you can switch to a different IP every few requests, which is easy to do with requests or urllib2, and this easily bypasses the first kind of anti-crawler.

For the second case, you can wait a random few seconds after each request before making the next one. On some websites with logic holes, you can bypass the restriction that the same account cannot make the same request repeatedly in a short period by requesting a few times, logging out, logging back in, and continuing to request.

3). Anti-crawler for dynamic pages

Most of the situations above occur on static pages, but on some websites the data we need to crawl is obtained through ajax requests or generated by JavaScript. First analyze the network requests with Fiddler. If we can find the ajax request and figure out the specific parameters and the meaning of the response, we can use the approach above: simulate the ajax request directly with requests or urllib2 and parse the JSON response to obtain the required data.

Being able to simulate the ajax request directly is ideal, but some websites encrypt all of the ajax request's parameters, so we simply have no way to construct a request for the data we need. In that case, selenium + phantomJS is used to drive a browser kernel, with phantomJS executing the JavaScript to simulate human operations and trigger the page's scripts. Everything from filling in forms to clicking buttons to scrolling the page can be simulated, without caring about the specific request and response process; it simply reproduces, in full, the process of a person browsing the page to obtain the data.

A crawler built with this approach can bypass almost all anti-crawler measures, because it is not pretending to be a browser to fetch data (adding headers, as above, is pretending to be a browser to some degree) — it is itself a browser; phantomJS is simply a browser without an interface, just not one operated by a human. selenium + phantomJS can do many things, for example recognizing touch/click captchas (such as 12306's) or slider captchas, brute-forcing page forms, and so on.

9. What problems does distributed crawler mainly solve?

1)ip

2)bandwidth

3)cpu

4)io

10. How to deal with the verification code during the crawler process?

1. Scrapy's built-in handling

2. A paid captcha-recognition interface


