Home >headlines >15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

青灯夜游
青灯夜游forward
2022-02-21 11:40:264689browse

Read more about the company's interview materials before the interview, which will be very helpful for subsequent interviews. Today I will share with you 15 real Shopee server interview questions (with answer analysis). Come and see what your level is. I hope it can help you!

1. Sort the linked list

Given you the head node of the linked list, please sort it in ascending order and return the sorted linked list.

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

Example 1:

输入:head = [4,2,1,3]
输出:[1,2,3,4]

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

Example 2:

输入:head = [-1,5,3,4,0]
输出:[-1,0,3,4,5]

This question can use double The pointer merge sorting algorithm is solved by the following four steps

 1. Fast and slow pointer method, traverse the linked list to find the intermediate node

 2. The intermediate node cuts off the linked list

 3. Use respectively Merge sort left and right sub-linked lists

 4. Merge sub-linked lists

The complete code is as follows:

class Solution {
    public ListNode sortList(ListNode head) {
        //如果链表为空,或者只有一个节点,直接返回即可,不用排序
        if (head == null || head.next == null)
            return head;
        
        //快慢指针移动,以寻找到中间节点
        ListNode slow = head;
        ListNode fast = head;
        while(fast.next!=null && fast.next.next !=null){
          fast = fast.next.next;
          slow = slow.next;
        }
        //找到中间节点,slow节点的next指针,指向mid
        ListNode mid = slow.next;
        //切断链表
        slow.next = null;
        
        //排序左子链表
        ListNode left = sortList(head);
        //排序左子链表
        ListNode right = sortList(mid);
        
        //合并链表
        return merge(left,right);
    }
    
    public ListNode merge(ListNode left, ListNode right) {
       ListNode head = new ListNode(0);
       ListNode temp = head;
       while (left != null && right != null) {
           if (left.val <= right.val) {
                temp.next = left;
                left = left.next;
            } else {
                temp.next = right;
                right = right.next;
            }
            temp = temp.next;
        }
        if (left != null) {
            temp.next = left;
        } else if (right != null) {
            temp.next = right;
        }
        return head.next;
    }
}

2. The difference between symmetric and asymmetric encryption algorithms

Let’s review the relevant concepts first:

  • Clear text: refers to information/data that has not been encrypted.

  • Ciphertext: After the plaintext is encrypted by the encryption algorithm, it will become ciphertext to ensure data security.

  • Key: is a parameter that is entered in the algorithm that converts plaintext to ciphertext or ciphertext to plaintext. Keys are divided into symmetric keys and asymmetric keys.

  • Encryption: The process of turning plaintext into ciphertext.

  • Decryption: The process of restoring ciphertext to plaintext.

Symmetric encryption algorithm: An encryption algorithm that uses the same key for encryption and decryption. Common symmetric encryption algorithms include AES, 3DES, DES, RC5, RC6, etc.

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

Asymmetric encryption algorithm: Asymmetric encryption algorithm requires two keys (public key and private key). Public keys and private keys exist in pairs. If the public key is used to encrypt data, only the corresponding private key can decrypt it. The main asymmetric encryption algorithms are: RSA, Elgamal, DSA, D-H, ECC.

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

3. How does TCP ensure reliability

  • First of all, TCP connection is based on three-way handshake. And disconnection is four waves. Ensure reliable connection and disconnection.

  • Secondly, the reliability of TCP is also reflected in the state; TCP will record which data is sent, which data is accepted, and which is not accepted, and ensures that the data packets are in order Arrival to ensure that there are no errors in data transmission.

  • Thirdly, the reliability of TCP is also reflected in its controllability. It has mechanisms such as message verification, ACK response, timeout retransmission (sender), out-of-sequence data retransmission (receiver), discarding duplicate data, flow control (sliding window) and congestion control.

4. Let’s talk about the five IO models

##4.1 Blocking IO model

Assume that the application process initiates an IO call, but if the kernel data is not ready yet, the application process will be blocked and waiting until the kernel data is ready and copied from the kernel to the user space. Returning a successful prompt, this IO operation is called blocking IO.

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

4.2 Non-blocking IO model

If the kernel data is not ready yet, you can return an error first The information is given to the user process so that it does not need to wait, but requests again through polling. This is non-blocking IO. The flow chart is as follows:

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

##4.3 IO multiplexing model

IO multiplexing select

The application process can monitor multiple fds at the same time by calling the select function. In the fd monitored by the select function, as long as any data status is ready , the select function will return to the readable state, and then the application process will initiate a recvfrom request to read the data.

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)select has several disadvantages:

    The maximum number of connections is limited, generally 1024 on Linux systems.
  • After the select function returns, the ready descriptor fd is found by traversing fdset.
IO multiplexing epoll

In order to solve the problems of select, the multiplexing model epoll was born, which is event-driven To implement, the flow chart is as follows:

epoll first registers an fd (file descriptor) through epoll_ctl(). Once a certain fd is ready, the kernel will use a callback mechanism to quickly activate the fd. When the process calls epoll_wait(), it will be notified. The tricky operation of traversing file descriptors is removed here, and a mechanism of listening for event callbacks is used. This is the highlight of epoll.

4.4 Signal-driven model of IO model

Signal-driven IO no longer uses active inquiry to confirm whether the data is ready, but to The kernel sends a signal (a SIGIO signal is established when calling sigaction), and then the application user process can do other things without blocking. When the kernel data is ready, the application process is notified through the SIGIO signal that the data is ready for readability. After the application user process receives the signal, it immediately calls recvfrom to read the data.

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

4.5 IO model of asynchronous IO (AIO)

AIO realizes the non-linearization of the entire IO process Blocking means that after the application process issues a system call, it returns immediately, but what is returned immediately is not the processing result, but something like a successful submission. When the kernel data is ready, copy the data to the user process buffer, and send a signal to notify the user process that the IO operation is completed.

The process is as follows:

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

5. Hystrix working principle

Hystrix workflow chart is as follows:

115 real Shopee server interview questions, can you answer them all correctly? (with analysis)

1. Build command

Hystrix provides two command objects: HystrixCommand and HystrixObservableCommand, which will represent one of your dependency request tasks and pass it into the constructor Parameters required to request dependencies.

2. Execute commands

There are four ways to execute Hystrix commands. They are:

  • R execute(): synchronous blocking execution, receiving a single response from the dependent request.

  • Future queue(): Asynchronous execution, returns a Future object containing a single response.

  • Observable observe(): After creating an Observable, it will subscribe to the Observable and return the Observable object representing the response from the dependent request

  • Observable toObservable() : cold observable, returns an Observable, Hystrix commands will only be executed when subscribing, and multiple results can be returned

3. Check whether the response is cached

If enabled Hystrix cache, before task execution, it will first determine whether there is a cache for executing the same command. If there is, the Observable containing the cached response will be returned directly; if there is no cached result but caching is enabled, the execution result will be cached for subsequent use.

4. Check whether the circuit breaker is open. The circuit breaker (circuit-breaker) is similar to a fuse. The fuse will burn out to protect the circuit when danger occurs, while the circuit breaker can open the circuit when it reaches the threshold we set. Trigger a short circuit (for example, the request failure rate reaches 50%) and refuse to execute any request.

If the looper is opened, Hystrix will not execute the command and directly enter the Fallback processing logic.

5. Check the thread pool/semaphore/queue situation. Hystrix isolation methods include thread pool isolation and semaphore isolation. When using the Hystrix thread pool, Hystrix allocates 10 threads to each dependent service by default. When all 10 threads are busy, the execution of the command will be refused, and instead it will immediately jump to the execution of fallback logic.

6. Perform specific tasks. Use HystrixObservableCommand.construct() or HystrixCommand.run() to run the user's real tasks.

7. Calculate the loop health. Every time a command starts to be executed, ends to execute a command, or an exception occurs, the execution status will be recorded, such as: success, failure, rejection, timeout and other indicators, and these will be processed regularly. data, and then determine whether to open the looper according to the set conditions.

8. Execute Fallback logic when the command fails. Execute the user-specified Fallback logic when the command fails. In the above figure, circuit breaking, thread pool rejection, semaphore rejection, execution execution, and execution timeout will all enter Fallback processing.

9. Return the execution result. The original object result will be returned in the form of Observable. Before being returned to the user, some processing will be done depending on the calling method.

6. Delay scenario processing

In daily development, we often encounter this kind of business scenario, such as: if a takeout order is not paid for more than 30 minutes, it will be automatically canceled. Order; 15 minutes after the user successfully registers, a text message will be sent to notify the user, etc. This is the delayed task processing scenario. We mainly have the following solutions for such scenarios:

  • JDK's DelayQueue delay queue

  • Time wheel algorithm

  • Database scheduled tasks (such as Quartz)

  • Redis ZSet implementation

  • MQ delay queue implementation

7.https request process

  • HTTPS = HTTP SSL/TLS, that is, using SSL/TLS to encrypt and decrypt data , HTTP for transmission.

  • SSL, or Secure Sockets Layer, is a security protocol that provides security and data integrity for network communications.

  • TLS, Transport Layer Security (Secure Transport Layer Protocol), is a subsequent version of SSL 3.0.

115 real Shopee server interview questions, can you answer them all correctly? (with analysis)
http request process

1. The user enters an https URL in the browser, and then connects to the 443 port of the server.

2. The server must have a set of digital certificates. It can be made by itself or applied to the organization. The difference is that the certificate issued by itself needs to be verified by the client. This set of certificates is actually a pair of public and private keys.

3. The server sends its own digital certificate (containing public key) to the client.

4. After the client receives the digital certificate from the server, it will check it. If it fails, a warning box will pop up. If the certificate is OK, a key is generated (symmetric encryption) and encrypted with the certificate's public key.

5. The client will initiate a second HTTP request in HTTPS and send the encrypted client key to the server.

6. After the server receives the ciphertext from the client, it will use its own private key to asymmetrically decrypt it. After decryption, it will obtain the client key, and then use the client key pair to return the data. Perform symmetric encryption so that the data becomes ciphertext.

7. The server returns the encrypted ciphertext to the client.

8. The client receives the ciphertext returned by the server, uses its own key (client key) to symmetrically decrypt it, and obtains the data returned by the server.

8. Let’s talk about the transaction isolation level and the implementation principle of repeatable read

##8.1 The four major isolation levels of the database

In order to solve the problems of

dirty reading, non-repeatable reading, and phantom reading in concurrent transactions, the database uncle designed four isolation levels. They are Read uncommitted, Read committed, Repeatable read, Serializable.

  • Read uncommitted isolation level: It only limits that two data cannot be modified at the same time. However, when the data is modified, even if the transaction is not committed, it can be read by other transactions. , this level of transaction isolation has problems with dirty reads, repeated reads, and phantom reads;

  • Read committed isolation level: The current transaction can only read data submitted by other transactions, so This transaction isolation level solves the problem of dirty reads, but there will still be problems of repeated reads and phantom reads;

  • Repeatable reads: When reading data is restricted, it cannot be performed Modification, so the problem of repeated reading is solved, but when reading range data, data can be inserted, so there will still be a phantom reading problem;

  • Serialization: the highest transaction Isolation level, at which all transactions are executed serially. It can avoid all concurrency problems of dirty reads, non-repeatable reads and phantom reads. However, under this transaction isolation level, transaction execution is very performance-intensive.

What are the concurrency problems among the four isolation levels?

##Serialization×××

8.2 Read View Visibility Rules

Isolation level Dirty read Non-repeatable read Phantom read
Read Uncommitted
Read Submitted ×
Repeatable read × ×
##VariablesDescriptionm_idsThe active (uncommitted) read and write transaction IDs in the current system, its data structure is a List. max_limit_idIndicates the id value that should be assigned to the next transaction in the system when a Read View is generated. min_limit_id Indicates the smallest transaction id among the active read and write transactions in the current system when the Read View is generated, that is, the minimum value in m_ids. creator_trx_idCreate the transaction ID of the current Read View

The visibility rules of Read View are as follows:

  • If the data transaction ID trx_id , it indicates that the transaction that generated this version is generating <code>Read Before View, it has been submitted (because the transaction ID is incremented), so this version can be accessed by the current transaction.

  • If trx_id>= max_limit_id, it indicates that the transaction that generated this version was generated after Read View was generated, so this version cannot be Current transaction access.

  • Ifmin_limit_id =<trx_id max_limit_id>, it needs to be discussed in 3 situations</trx_id>

1) If m_ids contains trx_id, it represents the time when Read View is generated. This transaction has not yet been submitted, but if the trx_id of the data is equal to creator_trx_id indicates that the data is generated by itself, so it is visible.

2) If m_ids contains trx_id, and trx_id is not equal to creator_trx_id, then Read ViewWhen generated, the transaction is not submitted and is not produced by itself, so the current transaction is also invisible;

3) If m_ids does not contain trx_id, then It means that your transaction has been submitted before Read View is generated, and the result of the modification can be seen by the current transaction.

8.3 Principle of Repeatable Read Implementation

The database achieves isolation level through locking. For example, if you want to be alone, To be quiet and not be disturbed by others, you can lock yourself in the house and add a lock on the door! The serialization isolation level is implemented by locking. But if you lock frequently, performance will decrease. So the uncle who designed the database thought of MVCC.

The implementation principle of repeatable reading is MVCC multi-version concurrency control. Within the scope of a transaction, two identical queries read the same record, but return different data. This is non-repeatable reading. The repeatable read isolation level is to solve the non-repeatable read problem.

Querying a record based on MVCC, what is the process?

  • Get the version number of the transaction, that is, the transaction ID

  • Get Read View

  • Query The obtained data is then compared with the transaction version number in Read View.

  • If the visibility rules of Read View are not met, the historical snapshot in Undo log is needed;

  • Finally return the data that meets the rules

InnoDB implements MVCC, which is implemented through Read View Undo Log, Undo Log saves historical snapshots,Read ViewVisibility rules help determine whether the current version of data is visible.

Repeatable Read (RR) isolation level, how to solve the problem of non-repeatable read?

Assuming that there are transactions A and B, the SQL execution process is as follows

115 real Shopee server interview questions, can you answer them all correctly? (with analysis)

Under the repeatable read (RR) isolation level, a transaction will only be obtained once read view are shared by the replicas, ensuring that the data in each query is the same.

Assume that there is currently a core_user table and insert an initialization data as follows:

115 real Shopee server interview questions, can you answer them all correctly? (with analysis)

##Based on MVCC, let’s take a look at the execution process

1. A opens a transaction and first gets a transaction ID of 100

2. B opens a transaction and gets a transaction ID of 101

3. Transaction A generates a Read View and the value corresponding to the read view as follows

##m_ids100,101max_limit_id102min_limit_id100 creator_trx_id100

然后回到版本链:开始从版本链中挑选可见的记录:

115 real Shopee server interview questions, can you answer them all correctly? (with analysis)

由图可以看出,最新版本的列name的内容是孙权,该版本的trx_id值为100。开始执行read view可见性规则校验:

min_limit_id(100)=<trx_id(100)<102;
creator_trx_id = trx_id =100;

由此可得,trx_id=100的这个记录,当前事务是可见的。所以查到是name为孙权的记录。

4、事务B进行修改操作,把名字改为曹操。把原数据拷贝到undo log,然后对数据进行修改,标记事务ID和上一个数据版本在undo log的地址。

115 real Shopee server interview questions, can you answer them all correctly? (with analysis)

5、事务B提交事务

6、事务A再次执行查询操作,因为是RR(可重复读)隔离级别,因此会复用老的Read View副本,Read View对应的值如下

Variable Value
##m_ids100,101max_limit_id102min_limit_id100 creator_trx_id100

然后再次回到版本链:从版本链中挑选可见的记录:

115 real Shopee server interview questions, can you answer them all correctly? (with analysis)

从图可得,最新版本的列name的内容是曹操,该版本的trx_id值为101。开始执行read view可见性规则校验:

min_limit_id(100)=<trx_id(101)<max_limit_id(102);

因为m_ids{100,101}包含trx_id(101),
并且creator_trx_id (100) 不等于trx_id(101)

所以,trx_id=101这个记录,对于当前事务是不可见的。这时候呢,版本链roll_pointer跳到下一个版本,trx_id=100这个记录,再次校验是否可见:

min_limit_id(100)=<trx_id(100)< max_limit_id(102);

因为m_ids{100,101}包含trx_id(100),
并且creator_trx_id (100) 等于trx_id(100)

所以,trx_id=100这个记录,对于当前事务是可见的,所以两次查询结果,都是name=孙权的那个记录。即在可重复读(RR)隔离级别下,复用老的Read View副本,解决了不可重复读的问题。

9. 聊聊索引在哪些场景下会失效?

1. 查询条件包含or,可能导致索引失效

2. 如何字段类型是字符串,where时一定用引号括起来,否则索引失效

3. like通配符可能导致索引失效。

4. 联合索引,查询时的条件列不是联合索引中的第一个列,索引失效。

5. 在索引列上使用mysql的内置函数,索引失效。

6. 对索引列运算(如,+、-、*、/),索引失效。

7. 索引字段上使用(!= 或者 ,not in)时,可能会导致索引失效。

8. 索引字段上使用is null, is not null,可能导致索引失效。

9. 左连接查询或者右连接查询查询关联的字段编码格式不一样,可能导致索引失效。

10. mysql估计使用全表扫描要比使用索引快,则不使用索引。

10. 什么是虚拟内存

虚拟内存,是虚拟出来的内存,它的核心思想就是确保每个程序拥有自己的地址空间,地址空间被分成多个块,每一块都有连续的地址空间。同时物理空间也分成多个块,块大小和虚拟地址空间的块大小一致,操作系统会自动将虚拟地址空间映射到物理地址空间,程序只需关注虚拟内存,请求的也是虚拟内存,真正使用却是物理内存。

现代操作系统使用虚拟内存,即虚拟地址取代物理地址,使用虚拟内存可以有2个好处:

  • 虚拟内存空间可以远远大于物理内存空间

  • 多个虚拟内存可以指向同一个物理地址

零拷贝实现思想,就利用了虚拟内存这个点:多个虚拟内存可以指向同一个物理地址,可以把内核空间和用户空间的虚拟地址映射到同一个物理地址,这样的话,就可以减少IO的数据拷贝次数啦,示意图如下:

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

11. 排行榜的实现,比如高考成绩排序

排行版的实现,一般使用redis的zset数据类型。

使用格式如下:

zadd key score member [score member ...],zrank key member
  • 层内部编码:ziplist(压缩列表)、skiplist(跳跃表)

  • 使用场景如排行榜,社交需求(如用户点赞)

实现demo如下:

115 real Shopee server interview questions, can you answer them all correctly? (with analysis)

12.分布式锁实现

分布式锁,是控制分布式系统不同进程共同访问共享资源的一种锁的实现。秒杀下单、抢红包等等业务场景,都需要用到分布式锁,我们项目中经常使用Redis作为分布式锁。

选了Redis分布式锁的几种实现方法,大家来讨论下,看有没有啥问题哈。

  • 命令setnx + expire分开写

  • setnx + value值是过期时间

  • set的扩展命令(set ex px nx)

  • set ex px nx + 校验唯一随机值,再删除

  • Redisson

12.1 命令setnx + expire分开写

if(jedis.setnx(key,lock_value) == 1){ //加锁
    expire(key,100); //设置过期时间
    try {
        do something  //业务请求
    }catch(){
  }
  finally {
       jedis.del(key); //释放锁
    }
}

如果执行完setnx加锁,正要执行expire设置过期时间时,进程crash掉或者要重启维护了,那这个锁就“长生不老”了,别的线程永远获取不到锁啦,所以分布式锁不能这么实现。

12.2 setnx + value值是过期时间

long expires = System.currentTimeMillis() + expireTime; //系统时间+设置的过期时间
String expiresStr = String.valueOf(expires);

// 如果当前锁不存在,返回加锁成功
if (jedis.setnx(key, expiresStr) == 1) {
        return true;
} 
// 如果锁已经存在,获取锁的过期时间
String currentValueStr = jedis.get(key);

// 如果获取到的过期时间,小于系统当前时间,表示已经过期
if (currentValueStr != null && Long.parseLong(currentValueStr) < System.currentTimeMillis()) {

     // 锁已过期,获取上一个锁的过期时间,并设置现在锁的过期时间(不了解redis的getSet命令的小伙伴,可以去官网看下哈)
    String oldValueStr = jedis.getSet(key_resource_id, expiresStr);
    
    if (oldValueStr != null && oldValueStr.equals(currentValueStr)) {
         // 考虑多线程并发的情况,只有一个线程的设置值和当前值相同,它才可以加锁
         return true;
    }
}
        
//其他情况,均返回加锁失败
return false;
}

笔者看过有开发小伙伴就是这么实现分布式锁的,但是这种方案也有这些缺点:

  • 过期时间是客户端自己生成的,分布式环境下,每个客户端的时间必须同步。

  • 没有保存持有者的唯一标识,可能被别的客户端释放/解锁。

  • 锁过期的时候,并发多个客户端同时请求过来,都执行了jedis.getSet(),最终只能有一个客户端加锁成功,但是该客户端锁的过期时间,可能被别的客户端覆盖。

12.3 set的扩展命令(set ex px nx)(注意可能存在的问题)

if(jedis.set(key, lock_value, "NX", "EX", 100s) == 1){ //加锁
    try {
        do something  //业务处理
    }catch(){
  }
  finally {
       jedis.del(key); //释放锁
    }
}

这个方案可能存在这样的问题:

  • 锁过期释放了,业务还没执行完。

  • 锁被别的线程误删。

12.4 set ex px nx + 校验唯一随机值,再删除

if(jedis.set(key, uni_request_id, "NX", "EX", 100s) == 1){ //加锁
    try {
        do something  //业务处理
    }catch(){
  }
  finally {
       //判断是不是当前线程加的锁,是才释放
       if (uni_request_id.equals(jedis.get(key))) {
        jedis.del(key); //释放锁
        }
    }
}

在这里,判断当前线程加的锁和释放锁是不是一个原子操作。如果调用jedis.del()释放锁的时候,可能这把锁已经不属于当前客户端,会解除他人加的锁。

一般也是用lua脚本代替。lua脚本如下:

if redis.call(&#39;get&#39;,KEYS[1]) == ARGV[1] then 
   return redis.call(&#39;del&#39;,KEYS[1]) 
else
   return 0
end;

这种方式比较不错了,一般情况下,已经可以使用这种实现方式。但是存在锁过期释放了,业务还没执行完的问题(实际上,估算个业务处理的时间,一般没啥问题了)。

12.5 Redisson

分布式锁可能存在锁过期释放,业务没执行完的问题。有些小伙伴认为,稍微把锁过期时间设置长一些就可以啦。其实我们设想一下,是否可以给获得锁的线程,开启一个定时守护线程,每隔一段时间检查锁是否还存在,存在则对锁的过期时间延长,防止锁过期提前释放。

当前开源框架Redisson就解决了这个分布式锁问题。我们一起来看下Redisson底层原理是怎样的吧:

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

只要线程一加锁成功,就会启动一个watch dog看门狗,它是一个后台线程,会每隔10秒检查一下,如果线程1还持有锁,那么就会不断的延长锁key的生存时间。因此,Redisson就是使用Redisson解决了锁过期释放,业务没执行完问题。

13. 零拷贝

零拷贝就是不需要将数据从一个存储区域复制到另一个存储区域。它是指在传统IO模型中,指CPU拷贝的次数为0。它是IO的优化方案

传统IO流程

  • 零拷贝实现之mmap+write

  • 零拷贝实现之sendfile

  • 零拷贝实现之带有DMA收集拷贝功能的sendfile

13.1 传统IO流程

流程图如下:

215 real Shopee server interview questions, can you answer them all correctly? (with analysis)

  • 用户应用进程调用read函数,向操作系统发起IO调用,上下文从用户态转为内核态(切换1)

  • DMA控制器把数据从磁盘中,读取到内核缓冲区。

  • CPU把内核缓冲区数据,拷贝到用户应用缓冲区,上下文从内核态转为用户态(切换2),read函数返回

  • 用户应用进程通过write函数,发起IO调用,上下文从用户态转为内核态(切换3)

  • CPU将应用缓冲区中的数据,拷贝到socket缓冲区

  • DMA控制器把数据从socket缓冲区,拷贝到网卡设备,上下文从内核态切换回用户态(切换4),write函数返回

从流程图可以看出,传统IO的读写流程,包括了4次上下文切换(4次用户态和内核态的切换),4次数据拷贝(两次CPU拷贝以及两次的DMA拷贝)。

13.2 mmap+write实现的零拷贝

mmap 的函数原型如下:

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
  • addr:指定映射的虚拟内存地址

  • length:映射的长度

  • prot:映射内存的保护模式

  • flags:指定映射的类型

  • fd:进行映射的文件句柄

  • offset:文件偏移量

  • mmap使用了虚拟内存,可以把内核空间和用户空间的虚拟地址映射到同一个物理地址,从而减少数据拷贝次数!

mmap+write实现的零拷贝流程如下:

215 real Shopee server interview questions, can you answer them all correctly? (with analysis)

  • 用户进程通过mmap方法向操作系统内核发起IO调用,上下文从用户态切换为内核态。

  • CPU利用DMA控制器,把数据从硬盘中拷贝到内核缓冲区。

  • 上下文从内核态切换回用户态,mmap方法返回。

  • 用户进程通过write方法向操作系统内核发起IO调用,上下文从用户态切换为内核态。

  • CPU将内核缓冲区的数据拷贝到的socket缓冲区。

  • CPU利用DMA控制器,把数据从socket缓冲区拷贝到网卡,上下文从内核态切换回用户态,write调用返回。

可以发现,mmap+write实现的零拷贝,I/O发生了4次用户空间与内核空间的上下文切换,以及3次数据拷贝。其中3次数据拷贝中,包括了2次DMA拷贝和1次CPU拷贝。

mmap是将读缓冲区的地址和用户缓冲区的地址进行映射,内核缓冲区和应用缓冲区共享,所以节省了一次CPU拷贝‘’并且用户进程内存是虚拟的,只是映射到内核的读缓冲区,可以节省一半的内存空间。

sendfile实现的零拷贝

sendfile是Linux2.1内核版本后引入的一个系统调用函数,API如下:

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
  • out_fd:为待写入内容的文件描述符,一个socket描述符。,

  • in_fd:为待读出内容的文件描述符,必须是真实的文件,不能是socket和管道。

  • offset:指定从读入文件的哪个位置开始读,如果为NULL,表示文件的默认起始位置。

  • count:指定在fdout和fdin之间传输的字节数。

  • sendfile表示在两个文件描述符之间传输数据,它是在操作系统内核中操作的,避免了数据从内核缓冲区和用户缓冲区之间的拷贝操作,因此可以使用它来实现零拷贝。

sendfile实现的零拷贝流程如下:

215 real Shopee server interview questions, can you answer them all correctly? (with analysis)
sendfile实现的零拷贝

  • 用户进程发起sendfile系统调用,上下文(切换1)从用户态转向内核态

  • DMA控制器,把数据从硬盘中拷贝到内核缓冲区。

  • CPU将读缓冲区中数据拷贝到socket缓冲区

  • DMA控制器,异步把数据从socket缓冲区拷贝到网卡,

  • 上下文(切换2)从内核态切换回用户态,sendfile调用返回。

可以发现,sendfile实现的零拷贝,I/O发生了2次用户空间与内核空间的上下文切换,以及3次数据拷贝。其中3次数据拷贝中,包括了2次DMA拷贝和1次CPU拷贝。那能不能把CPU拷贝的次数减少到0次呢?有的,即带有DMA收集拷贝功能的sendfile

sendfile+DMA scatter/gather实现的零拷贝

linux 2.4版本之后,对sendfile做了优化升级,引入SG-DMA技术,其实就是对DMA拷贝加入了scatter/gather操作,它可以直接从内核空间缓冲区中将数据读取到网卡。使用这个特点搞零拷贝,即还可以多省去一次CPU拷贝。

sendfile+DMA scatter/gather实现的零拷贝流程如下:

215 real Shopee server interview questions, can you answer them all correctly? (with analysis)

  • 用户进程发起sendfile系统调用,上下文(切换1)从用户态转向内核态

  • DMA控制器,把数据从硬盘中拷贝到内核缓冲区。

  • CPU把内核缓冲区中的文件描述符信息(包括内核缓冲区的内存地址和偏移量)发送到socket缓冲区

  • DMA控制器根据文件描述符信息,直接把数据从内核缓冲区拷贝到网卡

  • 上下文(切换2)从内核态切换回用户态,sendfile调用返回。

可以发现,sendfile+DMA scatter/gather实现的零拷贝,I/O发生了2次用户空间与内核空间的上下文切换,以及2次数据拷贝。其中2次数据拷贝都是包DMA拷贝。这就是真正的 零拷贝(Zero-copy) 技术,全程都没有通过CPU来搬运数据,所有的数据都是通过DMA来进行传输的。

14. synchronized

synchronized是Java中的关键字,是一种同步锁。synchronized关键字可以作用于方法或者代码块。

一般面试时。可以这么回答:

  • 反编译后,monitorenter、monitorexit、ACC_SYNCHRONIZED

  • monitor监视器

  • Java Monitor 的工作机理

  • 对象与monitor关联

14.1 monitorenter、monitorexit、ACC_SYNCHRONIZED

如果synchronized作用于代码块,反编译可以看到两个指令:monitorenter、monitorexit,JVM使用monitorenter和monitorexit两个指令实现同步;如果作用synchronized作用于方法,反编译可以看到ACCSYNCHRONIZED标记,JVM通过在方法访问标识符(flags)中加入ACCSYNCHRONIZED来实现同步功能。

  • 同步代码块是通过monitorenter和monitorexit来实现,当线程执行到monitorenter的时候要先获得monitor锁,才能执行后面的方法。当线程执行到monitorexit的时候则要释放锁。

  • 同步方法是通过中设置ACCSYNCHRONIZED标志来实现,当线程执行有ACCSYNCHRONIZED标志的方法,需要获得monitor锁。每个对象都与一个monitor相关联,线程可以占有或者释放monitor。

14.2 monitor监视器

monitor是什么呢?操作系统的管程(monitors)是概念原理,ObjectMonitor是它的原理实现。

215 real Shopee server interview questions, can you answer them all correctly? (with analysis)

在Java虚拟机(HotSpot)中,Monitor(管程)是由ObjectMonitor实现的,其主要数据结构如下:

 ObjectMonitor() {
    _header       = NULL;
    _count        = 0; // 记录个数
    _waiters      = 0,
    _recursions   = 0;
    _object       = NULL;
    _owner        = NULL;
    _WaitSet      = NULL;  // 处于wait状态的线程,会被加入到_WaitSet
    _WaitSetLock  = 0 ;
    _Responsible  = NULL ;
    _succ         = NULL ;
    _cxq          = NULL ;
    FreeNext      = NULL ;
    _EntryList    = NULL ;  // 处于等待锁block状态的线程,会被加入到该列表
    _SpinFreq     = 0 ;
    _SpinClock    = 0 ;
    OwnerIsThread = 0 ;
  }

ObjectMonitor中几个关键字段的含义如图所示:

215 real Shopee server interview questions, can you answer them all correctly? (with analysis)

14.3 Java Monitor 的工作机理

215 real Shopee server interview questions, can you answer them all correctly? (with analysis)

  • 想要获取monitor的线程,首先会进入_EntryList队列。

  • 当某个线程获取到对象的monitor后,进入Owner区域,设置为当前线程,同时计数器count加1。

  • 如果线程调用了wait()方法,则会进入WaitSet队列。它会释放monitor锁,即将owner赋值为null,count自减1,进入WaitSet队列阻塞等待。

  • 如果其他线程调用 notify() / notifyAll() ,会唤醒WaitSet中的某个线程,该线程再次尝试获取monitor锁,成功即进入Owner区域。

  • 同步方法执行完毕了,线程退出临界区,会将monitor的owner设为null,并释放监视锁。

14.4 对象与monitor关联

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)

  • 在HotSpot虚拟机中,对象在内存中存储的布局可以分为3块区域:对象头(Header),实例数据(Instance Data)和对象填充(Padding)。

  • 对象头主要包括两部分数据:Mark Word(标记字段)、Class Pointer(类型指针)。

Mark Word 是用于存储对象自身的运行时数据,如哈希码(HashCode)、GC分代年龄、锁状态标志、线程持有的锁、偏向线程 ID、偏向时间戳等。

215 real Shopee server interview questions, can you answer them all correctly? (with analysis)

重量级锁,指向互斥量的指针。其实synchronized是重量级锁,也就是说Synchronized的对象锁,Mark Word锁标识位为10,其中指针指向的是Monitor对象的起始地址。

15. 分布式id生成方案有哪些?什么是雪花算法?

分布式id生成方案主要有:

  • UUID

  • 数据库自增ID

  • 基于雪花算法(Snowflake)实现

  • 百度 (Uidgenerator)

  • 美团(Leaf)

什么是雪花算法?

雪花算法是一种生成分布式全局唯一ID的算法,生成的ID称为Snowflake IDs。这种算法由Twitter创建,并用于推文的ID。

一个Snowflake ID有64位。

  • 第1位:Java中long的最高位是符号位代表正负,正数是0,负数是1,一般生成ID都为正数,所以默认为0。

  • 接下来前41位是时间戳,表示了自选定的时期以来的毫秒数。

  • 接下来的10位代表计算机ID,防止冲突。

  • 其余12位代表每台机器上生成ID的序列号,这允许在同一毫秒内创建多个Snowflake ID。

15 real Shopee server interview questions, can you answer them all correctly? (with analysis)
雪花算法

最后PHP中文网祝大家找到一份满意的工作!!!

【面试题专题】

Front-end: [Front-end interview questions][js interview questions][ vue interview questions][ajax interview questions][ react interview questions

Database:【mysql interview questions】【redis interview questions】[oracle interview questions

Backend:【PHP interview questions】【thinkphp interview questions】【python interview questions】【java interview questions 】【android interview questions

Variable Value
Statement:
This article is reproduced at:微信公众号. If there is any infringement, please contact admin@php.cn delete