When we don't know how many positional arguments will be passed to a function (for example, when passing along a list or tuple of values), we use *args:
def func(*args):
    for i in args:
        print(i)

func(3, 2, 1, 4, 7)
3
2
1
4
7
When we don’t know how many keyword arguments to pass, use **kwargs to collect keyword arguments:
def func(**kwargs):
    for i in kwargs:
        print(i, kwargs[i])

func(a=1, b=2, c=7)
a 1
b 2
c 7
Use the command os.remove(filename) or os.unlink(filename)
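A minimal sketch (the filename here is made up for the example):

```python
import os

# Create a throwaway file so there is something to delete (hypothetical name).
with open("temp_demo.txt", "w") as f:
    f.write("to be deleted")

os.remove("temp_demo.txt")              # os.unlink("temp_demo.txt") is equivalent
print(os.path.exists("temp_demo.txt"))  # the file is gone
```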
You can access a module written in Python from C via:
module = PyImport_ImportModule("<modulename>");
It is the floor division operator: it divides one operand by the other and returns the quotient rounded down to the nearest whole number (toward negative infinity), discarding the fractional part.
For example, 10 // 5 = 2 and 10.0 // 5.0 = 2.0.
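A quick sketch in the interpreter; note that floor division rounds toward negative infinity, which matters for negative operands:

```python
print(10 // 5)      # 2
print(10.0 // 5.0)  # 2.0
print(7 // 2)       # 3, the fractional part is dropped
print(-7 // 2)      # -4, rounded down (toward negative infinity), not toward zero
```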
Leading spaces in a string are spaces that appear before the first non-space character in the string.
We use the lstrip() method to remove them from the string.
' Data123 '.lstrip()
Result:
'Data123 '
The original string contains both leading and trailing spaces; calling lstrip() removes only the leading ones. To remove trailing spaces, we can use the rstrip() method.
'Data123 '.rstrip()
'Data123'
a, b = 0, 1
while b < 100:
    print(b)
    a, b = b, a + b
If the string contains only numeric characters, you can use the function int() to convert it to an integer.
int('22')
Let’s check the variable type:
type('22')
<class 'str'>
type(int('22'))
<class 'int'>
To generate random numbers, we can import the function random() from the random module.
from random import random
random()
0.013501571090371978
We can also use the function randint(), which takes two parameters defining an interval and returns a random integer within that interval (both endpoints included).
from random import randint
randint(2, 7)
4
The simplest way is to use the capitalize() method.
'daxie'.capitalize()
'Daxie'
For this problem, we can use the isalnum() method.
'DATA123'.isalnum()
True
'DATA123!'.isalnum()
False
We can also use some other methods:
'123'.isdigit()    # is the string made up of digits only?
True
'123'.isnumeric()  # like isdigit(), but also covers other Unicode numeric characters
True
'data'.islower()   # are all cased characters lowercase?
True
'Data'.isupper()   # are all cased characters uppercase?
False
Concatenation in Python joins two sequences together. We use the + operator to do it:
'22' + '33'
'2233'
[1, 2, 3] + [4, 5, 6]
[1, 2, 3, 4, 5, 6]
(2, 3) + (4)
TypeError                                 Traceback (most recent call last)
<ipython-input-7-69a1660f2fc5> in <module>
----> 1 (2,3)+(4)
TypeError: can only concatenate tuple (not "int") to tuple
The error occurs because (4) is treated as an integer, not a tuple. Add a trailing comma and rerun:
(2, 3) + (4,)
(2, 3, 4)
A function is recursive when it calls itself, directly or indirectly, during its execution. To avoid infinite recursion, there must be a termination condition. For example:
def facto(n):
    if n == 1:
        return 1
    return n * facto(n - 1)

facto(5)
120
The generator will generate a series of values for iteration, so it is an iterable object.
It continuously calculates the next element during the for loop and ends the for loop under appropriate conditions.
We define a function that can "yield" values one by one, and then use a for loop to iterate over it.
def squares(n):
    i = 1
    while i <= n:
        yield i ** 2
        i += 1

for i in squares(5):
    print(i)
1
4
9
16
25
Iterator is a way to access the elements of a collection.
The iterator object starts accessing from the first element of the collection until all elements have been accessed.
The iterator can only move forward, not backward. We create an iterator using the iter() function.
odds = iter([1, 2, 3, 4, 5])
# each time we want the next object, we call next()
next(odds)
1
next(odds)
2
next(odds)
3
next(odds)
4
next(odds)
5
1) With a generator we define a function; with an iterator we use the built-in functions iter() and next();
2) In a generator, the keyword 'yield' generates/returns an object each time;
3) A generator can contain as many 'yield' statements as you like;
4) A generator saves the state of its local variables each time the loop pauses; an iterator only needs an iterable object to iterate over and keeps no local-variable state;
5) You can implement your own iterator using a class, but you cannot implement a generator that way;
6) Generators run fast and have concise, simpler syntax;
7) Iterators can save memory.
Newbies to Python may not be very familiar with this function. zip() can return an iterator of tuples.
list(zip(['a', 'b', 'c'], [1, 2, 3]))
[('a', 1), ('b', 2), ('c', 3)]
Here the zip() function pairs up the data items of the two lists and creates a tuple from each pair.
We can use the getcwd() function imported from the os module.
import os
os.getcwd()
'C:\\Users\\37410\\Desktop\\code'
This is also simple: just call the function len() on the string whose length we want:
len('Data 123')
8
list.pop([index]) removes and returns the element at the given index; called with no argument, it removes and returns the last element of the list.
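A short sketch of both forms:

```python
nums = [1, 2, 3, 4]
last = nums.pop()    # no argument: removes and returns the last element
first = nums.pop(0)  # an index removes and returns that position instead
print(last, first, nums)  # 4 1 [2, 3]
```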
Sometimes, when we want to traverse a list, a few built-in functions come in handy.
1)filter()
filter() lets us keep only the values that satisfy some condition.
list(filter(lambda x: x > 5, range(8)))
[6, 7]
2) map()
map() applies a function to every element of an iterable.
list(map(lambda x: x ** 2, range(8)))
[0, 1, 4, 9, 16, 25, 36, 49]
3) reduce()
reduce() repeatedly combines the elements of a sequence until a single value remains.
from functools import reduce
reduce(lambda x, y: x - y, [1, 2, 3, 4, 5])
-13
def list_sum(num_List):
    if len(num_List) == 1:
        return num_List[0]
    else:
        return num_List[0] + list_sum(num_List[1:])

print(list_sum([3, 4, 5, 6, 11]))
29
import random

def random_line(fname):
    lines = open(fname).read().splitlines()
    return random.choice(lines)

print(random_line('test.txt'))
def file_lengthy(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

print("file of lines:", file_lengthy("test.txt"))
import os

os.chdir(r'C:\Users\lifei\Desktop')  # raw string so the backslashes are not escapes
with open('Today.txt') as today:
    count = 0
    for i in today.read():
        if i.isupper():
            count += 1
    print(count)
The following code can be used to sort a list of numeric strings in Python:
nums = ["1", "4", "0", "6", "9"]  # a name other than list, to avoid shadowing the built-in
nums = [int(i) for i in nums]
nums.sort()
print(nums)

About Django
The Django framework follows an MVC-style design but uses its own term for it: MVT.
M stands for Model, the same role as M in MVC: it is responsible for data handling and embeds the ORM framework;
V stands for View, the same role as C in MVC: it receives the HttpRequest, performs the business logic, and returns an HttpResponse;
T stands for Template, the same role as V in MVC: it is responsible for building the HTML to be returned and embeds the template engine.
Flask is a "microframework" mainly intended for small applications with simpler requirements.
Pyramid suits larger applications: it is flexible and lets developers choose the right tools for their project, including the database, URL structure, and templating style.
Django, like Pyramid, can also be used for larger applications. It includes an ORM.
Django architecture
Developers supply the models, views, and templates, then map them to URLs, and Django serves them to users.
Django uses SQLite as its default database; it stores the data as a single file on the filesystem.
If you have a database server (PostgreSQL, MySQL, Oracle, MSSQL) and want to use it instead of SQLite, create a new database for your Django project with that database's administration tools.
Either way, with your (empty) database in place, all that remains is telling Django how to use it.
That is what the project's settings.py file is for.
We add lines like the following to settings.py:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
}
This is how we write a view in Django:
from django.http import HttpResponse
import datetime

def current_datetime(request):
    now = datetime.datetime.now()
    html = "<html><body>It is now %s</body></html>" % now
    return HttpResponse(html)
It returns the current date and time as an HTML document.
A template is a simple text file.
It can produce any text-based format, such as XML, CSV, HTML, and so on.
A template contains variables, which are replaced with values when the template is evaluated, and tags ({% tag %}) that control the template's logic.
Sessions, provided by Django, let you store and retrieve data on a per-site-visitor basis.
Django abstracts the process of sending and receiving cookies by placing a session ID cookie on the client side and storing all the related data on the server side.
So the data itself is never stored on the client.
From a security standpoint, this is good.
In Django, there are three possible inheritance styles:
Abstract base classes: used when you only want the parent class to hold information that you don't want to type out for every child model;
Multi-table inheritance: used when subclassing an existing model where each model needs its own database table;
Proxy models: used when you only want to modify a model's Python-level behavior without changing its fields.
Data analysis
map() executes the function given as its first argument on every element of the iterable given as its second argument. If the function takes more than one argument, pass the same number of iterables.
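For example, a two-argument function takes two iterables, and map consumes them in step:

```python
# One iterable for a one-argument function:
print(list(map(lambda x: x * 2, [1, 2, 3])))                   # [2, 4, 6]

# Two iterables for a two-argument function:
print(list(map(lambda x, y: x + y, [1, 2, 3], [10, 20, 30])))  # [11, 22, 33]
```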
We can get the indices of the N largest values in a NumPy array with the code below:
import numpy as np

arr = np.array([1, 3, 2, 4, 5])
print(arr.argsort()[-3:][::-1])
[4 3 1]
Q86. How do you compute a percentile with Python/NumPy?
import numpy as np

a = np.array([1, 2, 3, 4, 5])
p = np.percentile(a, 50)  # returns the 50th percentile, i.e. the median
print(p)
3.0
1) Python lists are efficient general-purpose containers.
They support (fairly) efficient insertion, deletion, appending, and concatenation, and Python's list comprehensions make them easy to construct and manipulate.
2) They have certain limitations.
They do not support "vectorized" operations such as element-wise addition and multiplication, and because they may contain objects of different types, Python must store type information for every element and execute type-dispatch code for every operation on every element.
3) NumPy is not only more efficient but also more convenient.
You get a lot of vector and matrix operations, which can sometimes avoid unnecessary work.
4) NumPy arrays are faster
You get FFT, convolution, fast search, basic statistics, linear algebra, histograms, and more built in.
Decorators in Python are used to modify or inject code in functions or classes.
Using decorators, you can wrap a class or function method call so that a section of code is executed before or after the original code is executed.
Decorators can be used to check permissions, modify or track parameters passed to methods, log calls to specific methods, etc.
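A minimal sketch of a logging decorator (the names log_call and add are illustrative):

```python
import functools

def log_call(func):
    """Wrap func so extra code runs before and after the original call."""
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__} with {args}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result}")
        return result
    return wrapper

@log_call
def add(a, b):
    return a + b

add(2, 3)  # logs the call before and after, then returns 5
```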
1) In an ideal world, NumPy would contain only the basic array data type and the most basic operations: indexing, sorting, reshaping, basic element-wise functions, and so on.
2) All numerical code would reside in SciPy. Even so, NumPy keeps backward compatibility as a goal and strives to retain all the features supported by its predecessors.
So, although they belong more appropriately in SciPy, NumPy still includes some linear algebra functions. In any case, SciPy contains a more comprehensive version of the linear algebra module, plus many other numerical algorithms.
If you use Python for scientific computing, it is advisable to install both NumPy and SciPy. Most new features belong in SciPy rather than NumPy.
As with 2D plotting, 3D graphics are beyond the scope of NumPy and SciPy, but just like the 2D case, there are packages that integrate with NumPy.
Matplotlib provides basic 3D plotting in the mplot3d subpackage, while Mayavi uses the powerful VTK engine to provide a variety of high-quality 3D visualization functions.
Crawlers and the Scrapy framework
Scrapy is a Python crawler framework with extremely high crawling efficiency and strong customizability, but it does not natively support distributed crawling.
And scrapy-redis is a set of components based on the redis database that runs on top of the scrapy framework, allowing scrapy to support a distributed strategy: the Slaver side shares the item queue, request queue, and request-fingerprint set stored in the Master side's redis database.
Because redis supports master-slave synchronization and data is cached in memory, distributed crawlers based on redis are very efficient in high-frequency reading of requests and data.
Python comes with: urllib, urllib2
Third party: requests
Framework: Scrapy
Both the urllib and urllib2 modules perform operations related to requesting URLs, but they provide different functionality:
urllib2.urlopen can accept a Request object or a URL (when given a Request object, you can set the headers of the request), while urllib.urlopen only accepts a URL;
urllib has urlencode and urllib2 does not, which is why the two are often used together.
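(In Python 3, these modules were merged into the urllib package; urlencode now lives in urllib.parse.) A quick sketch:

```python
from urllib.parse import urlencode

# Build a query string from a dict of parameters.
query = urlencode({"q": "python", "page": 2})
print(query)  # q=python&page=2
```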
Scrapy is a packaged framework. It includes a downloader, a parser, logging, and exception handling, and is based on multithreading and Twisted. It has advantages for crawling a fixed, single website; but for crawling 100 sites across many domains it is not flexible enough for concurrent and distributed processing, and it is inconvenient to adjust and extend.
requests is an HTTP library used only to make requests; for HTTP requests it is a powerful library, but downloading and parsing are handled by you. This gives higher flexibility, supports high concurrency and distributed deployment well, and makes it easier to implement exactly the functionality you need.
There are two main engines, MyISAM and InnoDB. The main differences are as follows:
1) InnoDB supports transactions, but MyISAM does not. This is very important: a transaction is a high-level processing method; for example, in a series of inserts, deletes, and updates, if anything fails you can roll back and restore, which MyISAM cannot do;
2) MyISAM is more suitable for applications dominated by queries and inserts, while InnoDB suits applications that require frequent modification or higher security;
3)InnoDB supports foreign keys, but MyISAM does not;
4) MyISAM is the default engine, InnoDB needs to be specified;
5) InnoDB does not support FULLTEXT type index;
6) InnoDB does not store the table's row count. A select count(*) from table therefore requires InnoDB to scan the whole table to count the rows, while MyISAM simply reads the stored row count.
Note that when the count(*) statement contains the where condition, MyISAM also needs to scan the entire table;
7) For an auto-increment field, InnoDB must have an index containing only that field, while a MyISAM table can build a joint index together with other fields;
8) When clearing an entire table, InnoDB deletes the rows one by one, which is very slow, while MyISAM rebuilds the table;
9) InnoDB supports row-level locks (though in some cases the whole table is locked, e.g. update table set a=1 where user like '%lee%').
Q94. Describe how the scrapy framework operates.
The scheduler hands the requests in the request queue to the downloader, which fetches the response resource for each request and passes the response to the parsing method you wrote for extraction:
1) If the required data is extracted, it is handed over to the pipeline files for processing;
2) If URLs are extracted, the previous steps are repeated (send the URL request, and the engine hands the request to the scheduler to be queued...) until there are no requests left in the queue and the program ends.
Combine multiple tables in a query, mainly with inner join, left join, right join, and full (outer) join.
For IO-bound code (file processing, web crawlers, etc.), multithreading can effectively improve efficiency: with a single thread, IO operations cause IO waits and unnecessary lost time, while with multithreading enabled, execution automatically switches to thread B while thread A is waiting, so CPU resources are not wasted and program efficiency improves.
In the actual data collection process, you need to consider not only the network speed and response issues, but also the hardware conditions of your own machine to set up multi-process or multi-thread.
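A small sketch with concurrent.futures, using time.sleep as a stand-in for a blocking IO call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(n):
    time.sleep(0.1)  # stands in for a download or a file read
    return n * n

start = time.time()
with ThreadPoolExecutor(max_workers=5) as pool:
    # The five 0.1 s waits overlap, so the total is far less than 5 * 0.1 s.
    results = list(pool.map(fake_io, range(5)))
elapsed = time.time() - start

print(results)  # [0, 1, 4, 9, 16]
```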
1) Optimize indexes, SQL statements, and analyze slow queries;
2) Optimize hardware: use SSDs, use disk-array technology (RAID0, RAID1, RAID5), etc.;
3) Use MySQL’s own table partitioning technology to layer data into different files, which can improve disk reading efficiency;
4) Choose the appropriate table engine and optimize parameters;
5) Carry out architecture-level caching, staticization and distribution;
6) Use faster storage methods, such as NoSQL to store frequently accessed data
1) IP
2) Bandwidth
3) CPU
4) IO
1) Scrapy comes with
2) Paid interface
1) Headers-based anti-crawling. Anti-crawling based on the request Headers is the most common anti-crawler strategy.
You can add Headers directly to the crawler and copy the browser's User-Agent to the crawler's Headers; or modify the Referer value to the target website domain name.
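A sketch with the standard library; the URL and header values here are placeholders, and no request is actually sent:

```python
import urllib.request

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",  # copied from a browser
    "Referer": "https://example.com/",                          # target site's domain
}
req = urllib.request.Request("https://example.com/page", headers=headers)

# urllib.request.urlopen(req) would now send the request with these headers.
print(req.get_header("User-agent"))
```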
2) Anti-crawler based on user behavior
The site detects user behavior, such as the same IP visiting the same page many times in a short period, or the same account performing the same operation many times in a short period.
Most websites are in the former situation. For this situation, using an IP proxy can solve it.
You can write a special crawler to crawl the public proxy IPs on the Internet, and save them all after detection.
After you have a large number of proxy IPs, you can change one IP every few requests. This is easy to do in requests or urllib2, so that you can easily bypass the first anti-crawler.
For the second case, you can randomly wait a few seconds after each request before making the next request.
Some websites with logical loopholes can bypass the restriction that the same account cannot make the same request multiple times in a short period of time by requesting several times, logging out, logging in again, and continuing to request.
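A rough sketch of rotating proxies with the standard library (the proxy addresses are made up, and no request is actually sent here):

```python
import random
import urllib.request

# A hypothetical pool of previously validated proxy IPs.
proxy_pool = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

def opener_with_random_proxy():
    proxy = random.choice(proxy_pool)  # pick a different proxy each time
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)

opener = opener_with_random_proxy()
# opener.open(url) would route the request through the chosen proxy.
```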
3) Anti-crawler for dynamic pages
First use Fiddler to analyze the network requests. If we can find the ajax request and work out its specific parameters and the specific meaning of the response, we can use the method above:
Use requests or urllib2 to simulate an ajax request and parse the response's JSON format to get the required data.
But some websites encrypt all the parameters of the ajax request, making it impossible to construct the request for the data you need.
In that case, use selenium with PhantomJS to drive a browser kernel, and let PhantomJS execute JS to simulate human operations and trigger the page's scripts.
The above is the detailed content of What are the frequently asked interview questions in Python?. For more information, please follow other related articles on the PHP Chinese website!