Introduction and in-depth understanding of HTTP protocol

巴扎黑

Aug 23, 2017 pm 03:56 PM

This article summarizes my understanding of the parts of the HTTP protocol that I have run into in real work scenarios.

Request & Response

Request format

Request line. For example: GET /api/index.json HTTP/1.1

Request headers. For example: Accept: */*; User-Agent: Mozilla/4.0;……

Blank line

[Request body] For example: id=1&timestamp=xxxxxx
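To make the layout concrete, here is a minimal Python sketch that assembles a raw request message from its parts: request line, headers, a blank line, and an optional body. The function name build_request and the host example.com are hypothetical and chosen only for illustration.

```python
def build_request(method, path, headers, body=""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line (CRLF CRLF) separates the headers from the optional body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


form = "id=1&timestamp=xxxxxx"
raw = build_request(
    "POST",
    "/api/index.json",
    {
        "Host": "example.com",  # hypothetical host
        "Accept": "*/*",
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(form)),
    },
    body=form,
)
print(raw)
```

A POST is used here only because the example carries a body; a GET request normally ends right after the blank line.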

Response format

Status line. For example: HTTP/1.1 200 OK

Response headers. For example: Content-Type: application/json;……

Blank line

[Response body] For example: {"id": 1,"username":"testuser"}
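The response has the mirror-image layout, which the following sketch splits back into its parts. It is a simplified illustration (the helper name parse_response is hypothetical, and repeated headers and folded header lines are not handled).

```python
def parse_response(raw):
    """Split a raw HTTP response into status line parts, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"id": 1,"username":"testuser"}'
)
version, code, reason, headers, body = parse_response(raw_response)
print(version, code, reason, headers, body)
```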

Status Code

There are nearly 60 HTTP status codes. Here I mainly record the common ones that show up under abnormal conditions. We run into them regularly in day-to-day applications, and knowing them helps us understand and locate problems.

206 - Partial content. Used for resumable (range) downloads: the client requested part of the content and the server successfully returned that part.

301 - Permanent redirect. The original address is no longer in use and the URL now points to another address. This mainly matters for search engines, because it affects how crawlers index the page.

302 - Temporary redirect. The server returns a new URL to the client, and the client requests that URL to obtain the content.

304 - Not modified. The resource has not changed, so the client can use its locally cached copy. This is common for static content.

413 - The request entity is too large. A common case is uploading a large file that exceeds the limit configured on the server (such as nginx). Oversized requests can also be rejected by the back-end server (such as tomcat), for example when there are too many cookies under the current domain name and the request header limit is exceeded, although many servers report an oversized header as 400 or 431 rather than 413.

416 - Related to resumable downloads. The range requested by the client lies beyond the size of the file on the server.

500 - Internal server error. The server cannot return a normal result. The most common example is an application throwing an unhandled null pointer exception.

502 - Bad gateway. A common case is that the back-end server behind the reverse proxy (such as resin or tomcat) has not been started.

503 - Service unavailable. For example, the server load is too high or the server has stopped serving.

504 - Gateway timeout. For example, the back-end server takes longer to respond than the timeout configured on the proxy or gateway.
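When troubleshooting, the first digit of the code already narrows down where to look. The following sketch uses Python's standard http.HTTPStatus to print the reason phrase for a few of the codes above; the helper name status_class is hypothetical.

```python
from http import HTTPStatus


def status_class(code):
    """Return the broad class of an HTTP status code based on its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


for code in (206, 301, 304, 502, 504):
    print(code, HTTPStatus(code).phrase, "-", status_class(code))
```

As a rule of thumb, a 4xx code points at the request itself, while a 5xx code points at the server or something between the proxy and the back end.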

Headers

HTTP headers fall into two categories: request headers (Request Header) and response headers (Response Header). The following are some of the headers we use most often.

1. Cache control

In Internet applications, caches are almost everywhere. In HTTP-based services we can likewise have content that does not change frequently cached on the client side, so that it is reused across multiple visits, which speeds up access and improves the user experience. The HTTP protocol defines several message headers for cache control:

Cache-Control (HTTP/1.1) / Pragma (HTTP/1.0): Indicates whether the client may cache the response and for how long. The private directive means the content may be cached only in the user's private cache (such as the browser) and not in shared caches. For example, Cache-Control: max-age=86400, must-revalidate tells the client that the requested resource can be cached for one day (max-age is a relative time in seconds) and must be revalidated with the server once it expires.

Expires: Specifies the point in time until which the client (unless a forced refresh is requested) can read directly from the local cache without sending a request to the server.

Note:

Priority: Cache-Control > Expires. When both are present, max-age overrides Expires;

Detailed parameter description: http://condor.depaul.edu/dmumaugh/readings/handouts/SE435/HTTP/node24.html

Different browsers may implement user actions (refresh, back, pressing Enter in the address bar, etc.) differently with respect to caching;
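The priority rule can be sketched as a small freshness check. This is a simplified illustration, not a full cache implementation: the function name is_fresh is hypothetical, and directives such as no-cache and no-store as well as the Age header are ignored.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers, stored_at, now):
    """Decide whether a cached response may be reused without asking the server.

    Cache-Control: max-age is checked first and overrides Expires.
    """
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        return now - stored_at < timedelta(seconds=int(match.group(1)))
    expires = headers.get("Expires")
    if expires:
        return now < parsedate_to_datetime(expires)
    return False  # no explicit freshness information: revalidate


stored = datetime(2013, 10, 19, 9, 0, 0, tzinfo=timezone.utc)
both = {
    "Cache-Control": "max-age=86400, must-revalidate",
    "Expires": "Sat, 19 Oct 2013 09:20:15 GMT",
}
# max-age (one day) wins even though the Expires time has already passed.
print(is_fresh(both, stored, stored + timedelta(hours=12)))
```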

Last-Modified/If-Modified-Since: Last-Modified is the last-modified timestamp of the resource, returned by the server to the client. On the next request (for example a refresh), the client sends it back in the If-Modified-Since header to check whether the resource has been updated. If it has not been updated, the server returns a 304 status code and the client uses its locally cached copy. In this case only the request and response headers travel over the network and the body is not transferred. Note: the timestamp must be in Greenwich Mean Time (GMT), for example: Last-Modified: Sat, 19 Oct 2013 09:20:15 GMT
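On the server side, the check amounts to comparing two timestamps. A minimal sketch, assuming the resource's modification time is already known as a UTC datetime (the function name conditional_get is hypothetical):

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(request_headers, last_modified):
    """Return (status, headers) for a GET, honouring If-Modified-Since."""
    # usegmt=True produces the HTTP date format ending in "GMT".
    response_headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since and last_modified <= parsedate_to_datetime(since):
        return 304, response_headers  # not modified: send no body
    return 200, response_headers


modified = datetime(2013, 10, 19, 9, 20, 15, tzinfo=timezone.utc)
print(conditional_get({"If-Modified-Since": "Sat, 19 Oct 2013 09:20:15 GMT"}, modified))
```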

ETag/If-None-Match: An ETag is a resource identifier that the server generates with some algorithm, typically from file attributes. It is also used to determine whether the resource requested by the client has been updated. If the server returns an ETag value to the client, the client sends it back in the If-None-Match header on the next request to check whether the resource has been updated. If it has not, the server returns a 304 status code. (The effect is basically the same as Last-Modified.)

Note:

An ETag has to be computed, which is a cost on servers that are short of computing resources, so some websites choose not to use ETags;

If the servers sit behind a load balancer, requests for the same resource may be distributed to different back-end machines. Because the ETag calculation depends on file attributes, files with identical content on different machines may produce different ETags, so validation can fail for files whose content has not actually changed. There are two solutions: one is to make the ETag calculation independent of the local machine, for example by using the md5 value of the file content directly; the other is to have the load balancer route requests for the same URL to the same back-end machine.
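The first solution, deriving the ETag from the content alone, can be sketched as follows. Because only the bytes are hashed, every back-end machine produces the same value for the same file. The function names are hypothetical, and weak validators (the W/ prefix) are not handled.

```python
import hashlib


def content_etag(content: bytes) -> str:
    """Build an ETag from the MD5 of the content, independent of the host."""
    return '"' + hashlib.md5(content).hexdigest() + '"'


def check_if_none_match(request_headers, content):
    """Return (304, etag) when If-None-Match matches the current ETag, else (200, etag)."""
    etag = content_etag(content)
    candidates = [tag.strip() for tag in request_headers.get("If-None-Match", "").split(",")]
    if etag in candidates or "*" in candidates:
        return 304, etag
    return 200, etag


status, etag = check_if_none_match({}, b"hello")
print(status, etag)
print(check_if_none_match({"If-None-Match": etag}, b"hello"))
```

Hashing the full content costs more CPU than reading file attributes, which is the trade-off mentioned in the first note.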

In real business scenarios, HTTP caching is very useful. Here are some examples:

Make full use of the client's resources. Static files that the client needs frequently, such as logos and advertising images, can be cached locally on the client. This reduces network requests, speeds up rendering on the client, and lowers the request load on the server.

When static content such as news or blog posts is crawled by search engine crawlers, setting the cache parameters appropriately can reduce how often the crawlers fetch it and avoid wasting resources.

If static resources are served through a CDN, setting HTTP cache headers lets the CDN nodes keep a copy of each file, which reduces the number of times the CDN goes back to the origin and lowers both network latency and the load on the origin server.

2. Range requests (breakpoint resume)

Accept-Ranges: When the server supports resumable downloads, it returns this response header to the client. Once the client knows this, it can send range requests.

Content-Length: The length of the response body, telling the client how much data the current request returns. Note that when a request is made with the HEAD method, no body is returned, but Content-Length still gives the size of the complete data.

Range/Content-Range: The client sends a Range header with its request to tell the server which part of the data it wants. For example, Range: bytes=0-1023 requests bytes 0 through 1023. The server then returns those 1024 bytes and includes Content-Range in the response headers, that is: Content-Range: bytes 0-1023/4096, where 4096 is the total file size. The client's next request can start from byte 1024: Range: bytes=1024-xxxx
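The exchange above can be sketched from the server's point of view. This is a simplified illustration that handles only a single bytes=start-end or bytes=start- range (suffix ranges and multiple ranges are left out), and the function name serve_range is hypothetical.

```python
import re


def serve_range(range_header, data: bytes):
    """Handle a single byte range; return (status, headers, body)."""
    total = len(data)
    full = (200, {"Content-Length": str(total)}, data)
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if not match:
        return full  # unsupported or malformed range: send everything
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else total - 1
    if match.group(2) and end < start:
        return full  # a syntactically invalid range is ignored
    if start >= total:
        # The range starts beyond the end of the file.
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    end = min(end, total - 1)
    body = data[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body


data = bytes(4096)
print(serve_range("bytes=0-1023", data)[:2])
print(serve_range("bytes=1024-", data)[:2])
```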

3. Encoding

Accept-Encoding/Content-Encoding: The former lists the content encodings the client can accept. The default is identity, and other values include gzip, compress, and so on. The latter is the content encoding the server applied to the response, most commonly a compression format. The benefit of compression is obvious: it greatly reduces the cost of network transmission, and in most cases that saving outweighs the CPU spent compressing on the server. Common values: gzip, deflate, compress (for example, Content-Encoding: gzip). Usually we compress text responses such as html, js, css, xml, and json.
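A minimal sketch of the negotiation, using Python's standard gzip module. The function name encode_body is hypothetical, and quality values such as gzip;q=0 are ignored for brevity.

```python
import gzip


def encode_body(accept_encoding, body: bytes):
    """Gzip the body when the client lists gzip in Accept-Encoding."""
    offered = [item.split(";")[0].strip().lower() for item in accept_encoding.split(",")]
    if "gzip" in offered:
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body  # identity: send the body unchanged


payload = b'{"id": 1,"username":"testuser"}' * 200
headers, encoded = encode_body("gzip, deflate", payload)
print(headers, len(payload), "->", len(encoded))
```

Repetitive text such as JSON or HTML compresses well, which is why these types are the usual candidates.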

Transfer-Encoding: response header. It specifies the transfer encoding of the response message, that is, the form in which it is sent over the network. It usually appears as Transfer-Encoding: chunked. When the server generates dynamic content and does not know the length of the response in advance, it can send the body in chunks, returning data as soon as it has been processed instead of waiting until everything is ready and returning it all at once. It can be combined with a content encoding such as gzip, so that the compressed content is sent in chunks. Also note that a response sent with this encoding has no Content-Length header, because the content has not been fully generated when the headers are sent.
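On the wire, each chunk is its size in hexadecimal, a CRLF, the data, and another CRLF, with a zero-size chunk marking the end. The sketch below encodes and decodes that framing; the function names are hypothetical, and chunk extensions and trailers are not handled.

```python
def encode_chunked(parts):
    """Yield the bytes of a chunked body for an iterable of byte strings."""
    for part in parts:
        if part:  # a zero-length chunk would end the body early
            yield f"{len(part):x}\r\n".encode("ascii") + part + b"\r\n"
    yield b"0\r\n\r\n"  # terminating chunk


def decode_chunked(raw):
    """Reassemble the payload from a complete chunked body."""
    payload, pos = b"", 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        size = int(raw[pos:line_end], 16)
        if size == 0:
            return payload
        start = line_end + 2
        payload += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF after the chunk data


wire = b"".join(encode_chunked([b"Hello, ", b"world"]))
print(wire)
print(decode_chunked(wire))
```

Because encode_chunked is a generator, each piece can be written to the socket as soon as it is produced, which is exactly the benefit described above.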

4. Others

X-Forwarded-For: request header. Used to identify the user's real IP, especially when the server is accessed through a proxy (forward or reverse) or sits behind a load balancer. Format: X-Forwarded-For: client, proxy1, proxy2,... The leftmost entry is the IP closest to the client.
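Reading the header is straightforward, as the following sketch shows. The function name client_ip is hypothetical and the addresses are documentation examples. Keep in mind that a client can send this header itself, so the value should only be trusted when it was set by proxies you control.

```python
def client_ip(headers, remote_addr):
    """Return the leftmost X-Forwarded-For entry, or the socket peer address."""
    forwarded = headers.get("X-Forwarded-For", "")
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    return hops[0] if hops else remote_addr


print(client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.1, 10.0.0.2"}, "10.0.0.2"))
print(client_ip({}, "198.51.100.23"))
```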

User-Agent: request header. The server uses it to identify basic information about the client. It is typically useful for recognizing search engine crawlers, and in some scenarios it can also be used for client statistics.

Referer: request header. It indicates where the request came from, such as the website that linked to the current page. We often use it for statistics. Another important use is filtering out unauthorized request sources in scenarios that need hotlink protection (note, however, that the Referer can be forged by the client).

Location: response header. It is included in responses with a 301/302 status code to tell the client the new address to use for the resource.

Connection: request/response header. In HTTP/1.1 the client and server keep the connection open by default, which corresponds to Connection: keep-alive. If either side does not want to keep the connection, it can set the value to close. With a persistent connection, the client can send multiple HTTP requests over the same connection, which avoids the cost of repeatedly creating connections. Making good use of this usually requires further settings on the server side, such as the keep-alive timeout and some network parameters of the server kernel (for TCP).

Session and Cookie

HTTP is a stateless protocol, but Internet applications often need to track user state in order to complete interactive operations. For example, user authentication needs to record the login state, a shopping cart needs to remember the products the user selected, and advertising applications need to record the user's browsing history. This is where sessions and cookies come in.

Session: the interaction state between the client and the server across HTTP requests and responses. This information is stored on the server side, for example in memory or in a database. Each session has a unique identifier generated by the server. The identifier must also be kept on the client, so that the client can send it with the next request and the server can determine the client's state.
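As a rough illustration of this idea, the sketch below keeps session data in a server-side dictionary keyed by a random identifier. The class name SessionStore is hypothetical, and a real store would also need expiry and persistence.

```python
import secrets


class SessionStore:
    """In-memory session store keyed by a server-generated random identifier."""

    def __init__(self):
        self._sessions = {}

    def create(self, data):
        # The identifier is unguessable; only this id is handed to the client.
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = dict(data)
        return session_id

    def get(self, session_id):
        return self._sessions.get(session_id)


store = SessionStore()
sid = store.create({"username": "testuser", "logged_in": True})
print(sid, store.get(sid))
```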

Ways for the client to carry the session id:

Save the session id in a cookie and send it to the server with each request.

Carry the session id in the URL parameters.

Carry the session id in a hidden form field.

Session sharing problem:

In distributed applications, the HTTP servers usually sit behind a reverse proxy or load balancer, which raises a session sharing problem: requests from the same user may be distributed to several different machines. If sessions are kept in each machine's local memory, the user's session cannot be shared among them. Generally speaking, there are two ways to solve this:

Store the session in a distributed cache (e.g. memcached) or in centralized storage (e.g. a database).

Have the reverse proxy or load balancer send all requests from the same user to the same machine (in this case we need to handle redistributing requests after a machine goes down).
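The second approach can be sketched as a hash of the user identifier mapped onto the list of back ends. This is a simple modulo scheme for illustration only; the function name pick_backend and the back-end names are hypothetical.

```python
import hashlib


def pick_backend(user_id, backends):
    """Map a user to one backend by hashing the user id (simple modulo scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


backends = ["app1", "app2", "app3"]
print(pick_backend("user-42", backends))
print(pick_backend("user-42", backends))  # same user, same machine
```

With plain modulo, removing a machine changes the mapping for many users at once, which is the redistribution problem mentioned above. Consistent hashing is a common way to limit how many users are moved.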

Cookie: maintains state information on the client. Each cookie belongs to a specific domain and path. For security reasons, a cookie is only sent with requests that match its domain and path, so cookies cannot be shared across unrelated domains.

Session cookie: no expiration time is specified. It is kept in memory and expires when the browser is closed.

Persistent cookie: an expiration time is specified, and the cookie is saved locally by the browser.
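The difference between the two kinds is visible in the Set-Cookie value itself. The sketch below builds both with Python's standard http.cookies module; the helper name make_cookie and the domain example.com are hypothetical.

```python
from http.cookies import SimpleCookie


def make_cookie(name, value, domain, path="/", max_age=None):
    """Return a Set-Cookie header value; max_age=None yields a session cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain
    cookie[name]["path"] = path
    if max_age is not None:
        cookie[name]["max-age"] = max_age  # persistent: survives browser restarts
    return cookie[name].OutputString()


session_cookie = make_cookie("sid", "abc123", "example.com")
persistent_cookie = make_cookie("sid", "abc123", "example.com", max_age=3600)
print(session_cookie)
print(persistent_cookie)
```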

For details, please refer to: http://en.wikipedia.org/wiki/HTTP_cookie

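Note that cookies come with some security concerns. For example, they can be read or tampered with on the client, so sensitive data should not be stored in them directly.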

Here I have only summarized my understanding of the parts of the HTTP protocol that I have encountered at work. There is much more to HTTP worth exploring, and a deeper understanding of it makes application development considerably easier.

Finally, I recommend two excellent HTTP debugging tools: fiddler (windows) and charles (mac). Both provide an HTTP proxy function, so for HTTP applications that are not browser-based (such as mobile apps) you can use them to monitor HTTP requests.
