What is the difference between the HTTP protocol and the TCP protocol?
TCP is a transport-layer protocol, while HTTP is an application-layer protocol, so strictly speaking the two are not directly comparable. HTTP runs on top of TCP. When the browser needs to fetch web page data from the server, it issues an HTTP request, and HTTP opens a connection to the server over TCP. In the original HTTP/1.0 model, once the data for that request has been delivered, the TCP connection is closed immediately. The whole exchange is very short, which is why HTTP is described as using short connections and as being stateless. Strictly, these are two separate properties: "short connection" means a new TCP connection is established for each request instead of reusing an existing one, while "stateless" means the protocol does not require the server to remember anything about earlier requests. In practice they are related. With a long-lived connection, the server process can keep the connection open and hold state in memory. When the connection is closed after each request and the associated resources are released, no state is retained, so every request is handled on its own.
Over time, HTML pages became more complex and often embedded many images, so opening a new TCP connection for every image was inefficient. Keep-Alive was introduced to address this, and starting with HTTP/1.1 persistent connections are enabled by default. Simply put, after a web page is opened, the TCP connection carrying HTTP data between the client and the server is not closed right away. If the client requests another resource from the same server, it reuses the established connection. Keep-Alive does not hold the connection open forever. It has a timeout, which can be configured in the server software (Apache, for example). The TCP connection is kept for a while, but only for a limited time, after which it is closed, so it can still be thought of as a connection that is closed once the work is done. Later, technologies such as Session and Cookies made it possible to maintain some user state across requests. That state travels with each request rather than living in the connection, so HTTP itself is still a stateless protocol.
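The connection reuse described above can be observed with a small experiment. This is only a sketch under stated assumptions: the server is a throwaway local one built with Python's standard `http.server`, and the names `Handler` and `fetch_twice` are made up for illustration. If both requests report the same client-side port, they travelled over the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    # Answering as HTTP/1.1 is what lets http.server keep the connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_twice():
    """Send two GET requests over one HTTPConnection; return the local port used each time."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    ports = []
    try:
        for path in ("/a", "/b"):
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()  # drain the body so the connection can be reused
            ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


if __name__ == "__main__":
    print(fetch_twice())
```

With an HTTP/1.0 style server that closes after every response, the second request would need a fresh TCP connection instead.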
There used to be a point I could never get straight: why is HTTP a stateless short connection while TCP is a stateful long connection? Isn't HTTP based on TCP? How can it still be a short connection? Now I understand: without Keep-Alive, HTTP closes the TCP connection after each request completes, so it is a short connection. When we use TCP directly through socket programming, our own code controls when the connection is opened and closed. As long as the code does not close it, the connection between the client and the server stays open, and the related state can be kept for as long as the connection lasts.
C# has a Socket class. A socket is in fact an encapsulation of the TCP/IP protocol suite. The socket itself is not a protocol but a calling interface (API). Sockets exist to make the TCP/IP protocol stack easier for programmers to use. They abstract TCP/IP into the basic function interfaces we know, such as create, listen, connect, accept, send, read, and write.
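The article talks about C#, but the same socket calls exist in most languages. Below is a minimal sketch in Python, with made-up helper names (`echo_once`, `round_trip`), that walks through create, listen, accept, connect, send, and read on the loopback interface. The "server" simply upper-cases whatever it receives.

```python
import socket
import threading


def echo_once(listener):
    """Server side: accept one client, read its message, reply in upper case."""
    conn, _addr = listener.accept()          # accept
    with conn:
        data = conn.recv(1024)               # read
        conn.sendall(data.upper())           # write


def round_trip(message: bytes) -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create
    listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    listener.listen(1)                       # listen
    port = listener.getsockname()[1]
    worker = threading.Thread(target=echo_once, args=(listener,))
    worker.start()
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:  # connect
        client.sendall(message)              # send
        reply = client.recv(1024)
    worker.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(round_trip(b"ping"))
```

Nothing here knows about HTTP. The bytes exchanged mean whatever the two programs agree they mean, which is exactly the gap an application-layer protocol fills.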
A more vivid description: HTTP is a car, which provides a specific way of packaging and presenting data; a socket is an engine, which provides the ability to communicate over the network. From a C# programming perspective, for convenience you can simply pick the ready-made car, HTTP, to interact with the server. Sometimes, though, environmental constraints or custom requirements mean that TCP must be used directly. In that case you use socket programming and process the received data yourself. It is like building your own truck around an existing engine and using it to interact with the server.
HTTP/1.0 and HTTP/1.1 both use TCP as the underlying transport protocol. The HTTP client first initiates the establishment of a TCP connection with the server. Once the connection is established, the browser process and server process can access TCP through their respective sockets. As mentioned before, the client socket is the "door" between the client process and the TCP connection, and the server side socket is the "door" between the server process and the same TCP connection. The client sends HTTP request messages to its own socket and receives HTTP response messages from its own socket. Similarly, the server receives HTTP request messages from its own socket and sends HTTP response messages to its own socket. Once the client or server sends a message to their respective sockets, the message falls completely under the control of TCP. TCP provides a reliable data transmission service to HTTP; this means that every HTTP request message sent by the client will eventually reach the server without loss, and every HTTP response message sent by the server will eventually reach the client without loss.
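To make the "door" picture concrete, here is a hedged sketch in which the client writes an HTTP request message straight into a TCP socket and reads the response message back out of the same socket. The local test server and the names `Hello`, `raw_get`, and `demo` are assumptions made for this illustration only.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    """Tiny local server; by default it answers as HTTP/1.0 and closes afterwards."""

    def do_GET(self):
        body = b"hi"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def raw_get(host, port, path="/"):
    """Send a hand-written HTTP/1.0 request over TCP and return the raw response bytes."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)                # the request message goes into the socket
        chunks = []
        while True:
            chunk = sock.recv(4096)          # the response message comes out of it
            if not chunk:                    # empty read: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def demo():
    server = HTTPServer(("127.0.0.1", 0), Hello)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return raw_get("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo().decode("iso-8859-1"))
```

The client detects the end of the response because the server closes the TCP connection, which is the short-connection behavior described at the start of the article.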
When C# code connects to a remote database, it also uses TCP. Each time a new connection is created, connection.Open opens the TCP connection, and the connection is closed when connection.Close is called. FTP is also built on TCP, but it uses a long connection, which suits transferring large files. Which to use depends on the scenario. On the server side, a program that uses long connections can cap the number of simultaneous connections to protect the server. With short connections, the number of simultaneous connections is harder to control. That is also an advantage, since the server can accept a large number of connection requests at once, but if there are too many requests the server may stop responding.
A web service does not hold a connection open between requests. Each request is released as soon as it completes, so idle clients consume no memory and the server can handle a very large number of requests. There is generally no fixed limit on the number of simultaneous clients, which is an advantage. A message queue needs an established connection, and supporting thousands of connections can be difficult, because each connection occupies some memory even when it is not requesting data. Limits therefore apply. A database server such as SQL Server, for example, caps the number of simultaneous connections.
By default HTTP uses port 80. Because this port is rarely blocked, HTTP traffic can usually pass through the firewalls on most machines. If you use socket programming, you have to choose a port yourself, and that port may well be blocked in some environments, in which case the traffic cannot get through the firewall. IIS uses port 80 by default, which means it is always listening on that port. When a client tries to connect to the port, IIS responds and the connection is established. The connections discussed here are all short connections. So requests for URLs on the server reach the website program through port 80, and the client browser sends them to that port.
HTTP is an object-oriented protocol belonging to the application layer. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously refined and extended over years of use. When this description was first written, the WWW used the sixth version of HTTP/1.0 and HTTP/1.1 was still being standardized. HTTP/1.1 has since been published as RFC 2616, and proposals for HTTP-NG (Next Generation of HTTP) were also made.
The main features of the HTTP protocol can be summarized as follows:
1. Support client/server mode.
2. Simple and fast: When a client requests a service from the server, it only needs to transmit the request method and path. Commonly used request methods are GET, HEAD, and POST. Each method specifies a different type of contact between the client and the server. Due to the simplicity of the HTTP protocol, the program size of the HTTP server is small and the communication speed is very fast.
3. Flexible: HTTP allows the transmission of any type of data object. The type being transferred is marked by Content-Type.
4. Connectionless: Connectionless here means that each connection handles only one request. After the server has processed the client's request and received the client's acknowledgment, it disconnects. This saves transmission time.
5. Stateless: HTTP is a stateless protocol. Stateless means that the protocol has no memory of earlier transactions. The lack of state means that if later processing needs earlier information, that information must be retransmitted, which may increase the amount of data sent per connection. On the other hand, the server responds faster when it does not need earlier information.
1. Detailed explanation of HTTP protocol - URL
HTTP (Hypertext Transfer Protocol) is a stateless application-layer protocol based on the request/response model, usually carried over TCP connections. HTTP/1.1 provides a persistent connection mechanism. Most web development today consists of web applications built on the HTTP protocol.
HTTP URL (URL is a special type of URI that contains enough information to find a resource) has the following format:
http://host[":" port][abs_path]
http means that the network resource is located through the HTTP protocol; host is a valid Internet host domain name or IP address; port specifies a port number, and if it is omitted the default port 80 is used; abs_path specifies the URI of the requested resource. If abs_path is not given in the URL, it must be given as "/" when the URL is used as the request URI. The browser usually does this for us automatically.
eg:
1. Enter: www.guet.edu.cn
The browser automatically converts it to: http://www.guet.edu.cn/
2. http://192.168.0.116:8080/index.jsp
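The defaulting rules for port and abs_path can be sketched with Python's standard `urllib.parse`. The helper name `http_target` is hypothetical; it returns the host, the port (80 when omitted), and the path (`/` when omitted).

```python
from urllib.parse import urlsplit


def http_target(url: str):
    """Split an http:// URL into (host, port, request path) using the default rules."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("only http:// URLs are handled in this sketch")
    host = parts.hostname
    port = parts.port or 80          # empty port: default to 80
    path = parts.path or "/"         # empty abs_path: the request URI becomes "/"
    if parts.query:
        path += "?" + parts.query
    return host, port, path


if __name__ == "__main__":
    print(http_target("http://www.guet.edu.cn"))
    print(http_target("http://192.168.0.116:8080/index.jsp"))
```

A bare `www.guet.edu.cn` typed into the address bar has no scheme at all, so a browser first prepends `http://` before rules like these apply.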
2. Detailed explanation of HTTP protocol - Requests
An HTTP request consists of three parts: the request line, the message headers, and the request body.
1. The request line starts with a method token, followed by the Request-URI and the protocol version, each separated by a space. The format is as follows: Method Request-URI HTTP-Version CRLF
Method represents the request method; Request-URI is a Uniform Resource Identifier; HTTP-Version indicates the requested HTTP protocol version; CRLF indicates carriage return and line feed (except for the trailing CRLF, no separate CR or LF characters are allowed).
There are many request methods (all method names are in uppercase letters). Each method is explained below:
GET Request to obtain the resource identified by Request-URI
POST Append new data to the resource identified by Request-URI
HEAD Request to obtain the response message headers of the resource identified by Request-URI
PUT Request the server to store a resource and use Request-URI as its identifier
DELETE Request the server to delete the resource identified by Request-URI
TRACE Request the server to send back the received request information, mainly used for testing or diagnosis
CONNECT Reserved for use with a proxy that can switch to acting as a tunnel
OPTIONS Request to query the capabilities of the server, or the options and requirements related to a resource
Application examples:
GET method: When a webpage is accessed by entering its URL in the browser's address bar, the browser uses the GET method to obtain the resource from the server, eg: GET /form.html HTTP/1.1 (CRLF)
The POST method asks the server to accept the data attached to the request, and is often used to submit forms.
eg: POST /reg.jsp HTTP/ (CRLF)
Accept:image/gif,image/x-xbit,... (CRLF)
...
Host:www.guet.edu.cn (CRLF)
Content-Length:21 (CRLF)
Connection:Keep-Alive (CRLF)
Cache-Control:no-cache (CRLF)
(CRLF) // This CRLF marks the end of the message header; everything before it is the message header
user=jeffrey&pwd=1234 // This line is the submitted data
The HEAD method is almost the same as the GET method. In the response to a HEAD request, the HTTP headers carry the same information as would be returned for a GET request, but no body is sent. Using this method, information about the resource identified by the Request-URI can be obtained without transferring the entire resource. It is often used to test whether a hyperlink is valid, whether it is accessible, and whether it has been updated recently.
2. The request header is described later
3. Request body (omitted)
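The POST example in this section can be assembled programmatically. The sketch below is an illustration, not a full client: the function name `build_post` is made up, HTTP/1.1 is an assumed version (the example above leaves the version blank), and the Content-Type line is an addition that form submissions normally carry. Content-Length is computed from the encoded body rather than typed by hand; for `user=jeffrey&pwd=1234` that is 21 bytes.

```python
from urllib.parse import urlencode

CRLF = "\r\n"


def build_post(path: str, host: str, fields: dict) -> bytes:
    """Build request line + headers + blank line + form body as raw bytes."""
    body = urlencode(fields).encode("ascii")             # e.g. b"user=jeffrey&pwd=1234"
    head = CRLF.join([
        f"POST {path} HTTP/1.1",                         # request line (version is an assumption)
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",                  # counted in bytes, never guessed
        "Connection: Keep-Alive",
        "Cache-Control: no-cache",
    ])
    # An empty line (CRLF CRLF) ends the header section; the body follows.
    return head.encode("ascii") + b"\r\n\r\n" + body


if __name__ == "__main__":
    message = build_post("/reg.jsp", "www.guet.edu.cn", {"user": "jeffrey", "pwd": "1234"})
    print(message.decode("ascii"))
```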
3. Detailed explanation of HTTP protocol - Responses
After receiving and interpreting the request message, the server returns an HTTP response message.
HTTP response also consists of three parts, namely: status line, message header, response body
1. The status line format is as follows:
HTTP-Version Status-Code Reason-Phrase CRLF
Among them, HTTP-Version represents the version of the server HTTP protocol; Status-Code represents the response status code sent back by the server; Reason-Phrase represents the text description of the status code.
The status code consists of three digits. The first digit defines the class of the response and has five possible values:
1xx: Informational -- the request has been received and processing continues
2xx: Success -- the request has been successfully received, understood, and accepted
3xx: Redirection -- further action must be taken to complete the request
4xx: Client error -- the request contains a syntax error or cannot be fulfilled
5xx: Server error -- the server failed to fulfill a valid request
Common status codes, reason phrases, and explanations:
200 OK //The client request succeeded
400 Bad Request //The client request has a syntax error and cannot be understood by the server
401 Unauthorized //The request is not authorized; this status code must be used together with the WWW-Authenticate header field
403 Forbidden //The server received the request but refuses to provide the service
404 Not Found //The requested resource does not exist, eg: the wrong URL was entered
500 Internal Server Error //An unexpected error occurred on the server
503 Service Unavailable //The server is currently unable to process the client's request; it may return to normal after a while
eg: HTTP/1.1 200 OK (CRLF)
2. The response header is described later
3. The response body is the content of the resource returned by the server.
HTTP messages consist of client-to-server requests and server-to-client responses. Both request messages and response messages consist of a start line (the request line for a request message, the status line for a response message), message headers (optional), a blank line (a line containing only CRLF), and a message body (optional).
HTTP message headers include general headers, request headers, response headers, and entity headers. Each header field consists of name + ":" + space + value. Header field names are case-insensitive.
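The message layout just described (start line, headers, blank line, body) maps directly onto a small parser. This is a simplified sketch with a made-up function name, `parse_response`: it lower-cases header names because they are case-insensitive, keeps only the last value when a header repeats, and does not handle folded header lines.

```python
STATUS_CLASSES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client error",
    "5": "Server error",
}


def parse_response(raw: bytes) -> dict:
    """Split a raw HTTP response into status line parts, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")          # the blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line format: HTTP-Version Status-Code Reason-Phrase
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # names are case-insensitive
    return {
        "version": version,
        "code": int(code),
        "reason": reason,
        "class": STATUS_CLASSES.get(code[0], "Unknown"),  # first digit gives the class
        "headers": headers,
        "body": body,
    }


if __name__ == "__main__":
    sample = b"HTTP/1.1 200 OK\r\nContent-Type:text/html\r\n\r\n<html></html>"
    print(parse_response(sample))
```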
1. General header: A few header fields apply to both request and response messages. They describe the message being transmitted rather than the entity it carries. eg:
Cache-Control specifies caching directives. The directives are one-way (a directive that appears in a response does not necessarily appear in the request) and independent (a directive on one message does not affect the caching mechanism used for another message). The similar header field used by HTTP/1.0 is Pragma. Request cache directives include: no-cache (the request or response message must not be served from a cache without revalidation), no-store, max-age, max-stale, min-fresh, only-if-cached. Response cache directives include: public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, s-maxage.
eg: To instruct the IE browser (the client) not to cache a page, the server-side JSP program can be written as follows: response.setHeader("Cache-Control","no-cache"); //response.setHeader("Pragma","no-cache"); has the same effect as the code above, and the two are usually used together
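On the receiving side, a Cache-Control value is a comma-separated list of directives, some of which carry an argument such as `max-age=60`. A minimal sketch of reading such a value follows. The function name is hypothetical, and quoted arguments that themselves contain commas are deliberately not handled.

```python
def parse_cache_control(value: str) -> dict:
    """Turn 'private, max-age=60' into {'private': True, 'max-age': '60'}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without '=' are flags; the others keep their argument as text.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


if __name__ == "__main__":
    print(parse_cache_control("private, max-age=60, no-cache"))
```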
5. Use telnet to observe the communication process of the HTTP protocol
Experiment purpose and principle: Using Microsoft's telnet tool, type an HTTP request by hand and send it to the server. After the server receives, interprets, and accepts the request, it returns a response, which is displayed in the telnet window. This gives a hands-on feel for how HTTP communication works.
Experimental steps:
1. Open telnet
1.1 Open telnet
1.2 Turn on the telnet echo function: set localecho
2. Connect to the server and send a request
2.1 open
HEAD /index.asp HTTP/1.0
Host: www.guet.edu.cn
2.2 open www.sina.com.cn 80 //Or enter telnet www.sina.com.cn 80 directly at the command prompt
HEAD /index.asp HTTP/1.0
3.1 The response obtained from request 2.1 is:
HTTP/1.1 200 OK
Thu, 08 Mar 2007 07:17:51 GMT
HTTP/1.0 404 Not Found //Request failed
Server: Apache/2.0.54
X-Powered-By: mod_xlayout_jh/0.0.1vhs.markII.remix
Vary: Accept-Encoding
4. Notes:
1. If the input contains an error, the request will not succeed.
3. To learn more about the HTTP protocol, see RFC2616, which can be found at http://www.letf.org/rfc
4. To develop back-end programs, you must master the HTTP protocol.
Date: a general header field that indicates the date and time at which the message was generated
Connection: a general header field that lets the sender specify options for the connection, for example that the connection is persistent, or "close" to tell the server to close the connection once the response is complete
2. Request header
The request header allows the client to pass additional information about the request, and about the client itself, to the server.
Commonly used request headers
Accept
The Accept request header field is used to specify what types of information the client accepts. eg: Accept: image/gif, indicating that the client wishes to accept resources in GIF image format; Accept: text/html, indicating that the client wishes to accept html text.
Accept-Charset
The Accept-Charset request header field is used to specify the character set accepted by the client. eg: Accept-Charset:iso-8859-1, gb2312. If this field is not set in the request message, the default is that any character set is acceptable.
Accept-Encoding
The Accept-Encoding request header field is similar to Accept, but it specifies the acceptable content encodings. eg: Accept-Encoding:gzip,deflate. If this field is not set in the request message, the server assumes that the client can accept any content encoding.
Accept-Language
The Accept-Language request header field is similar to Accept, but it is used to specify a natural language. eg: Accept-Language:zh-cn. If this header field is not set in the request message, the server assumes that the client can accept various languages.
Authorization
The Authorization request header field is mainly used to prove that the client has the right to view a certain resource. When the browser accesses a page and receives a response code of 401 (Unauthorized) from the server, it can send a request containing the Authorization request header field to ask the server to verify it.
Host (this header field is required when sending an HTTP/1.1 request)
The Host request header field specifies the Internet host and port number of the requested resource. It is usually extracted from the HTTP URL, eg:
We enter in the browser: http://www.guet.edu.cn/index.html
The request message sent by the browser will include the Host request header field, as follows:
Host: www.guet.edu.cn
The default port number 80 is used here. If a port number is specified, it becomes: Host: www.guet.edu.cn:port-number
User-Agent
When we log in to an online forum, we often see a welcome message that lists the name and version of our operating system and of the browser we are using, which many people find surprising. In fact, the server application obtains this information from the User-Agent request header field. The User-Agent request header field lets the client tell the server about its operating system, browser, and other attributes. However, this header field is optional. If we write our own browser and do not send the User-Agent request header field, the server has no way of knowing this information.
Request header example:
GET /form.html HTTP/1.1 (CRLF)
Accept:image/gif,image/x-xbitmap,image/jpeg,application/x-shockwave-flash,application/vnd.ms-excel,application/vnd.ms-powerpoint,application/msword,*/* (CRLF)
Accept-Language:zh-cn (CRLF)
Accept-Encoding:gzip,deflate (CRLF)
If-Modified-Since:Wed,05 Jan 2007 11:21:25 GMT (CRLF)
If-None-Match:W/"80b1a4c018f3c41:8317" (CRLF)
User-Agent:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) (CRLF)
Host:www.guet.edu.cn (CRLF)
Connection:Keep-Alive (CRLF)
(CRLF)
3. Response header
The response header allows the server to pass additional response information that cannot be placed in the status line, as well as information about the server and about further access to the resource identified by the Request-URI.
Commonly used response headers
Location
The Location response header field is used to redirect the recipient to a new location. The Location response header field is often used when changing domain names.
Server
The Server response header field contains information about the software the server uses to process the request. It corresponds to the User-Agent request header field. The following is an example of the Server response header field:
Server:Apache-Coyote/1.1
WWW-Authenticate
The WWW-Authenticate response header field must be included in a 401 (Unauthorized) response message. It is this header field in the server's response that tells the client, when it receives the 401 response, to send a request with the Authorization header field so that the server can verify it.
eg: WWW-Authenticate:Basic realm="Basic Auth Test!" //This shows that the server uses the Basic authentication scheme for the requested resource.
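When the challenge names the Basic scheme, the client answers with an Authorization header whose credentials are `user:password` encoded in Base64. The sketch below shows only that encoding step, using hypothetical helper names. Base64 is an encoding, not encryption, so Basic authentication protects the password only when the connection itself is protected.

```python
import base64


def basic_authorization(user: str, password: str) -> str:
    """Build the Authorization header line for the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"


def decode_basic(header_line: str):
    """Reverse of basic_authorization, as a server would do when verifying."""
    token = header_line.split("Basic ", 1)[1]
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


if __name__ == "__main__":
    line = basic_authorization("jeffrey", "1234")
    print(line)
    print(decode_basic(line))
```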
4. Entity header
Both request and response messages can carry an entity. An entity consists of entity header fields and an entity body, but this does not mean that the two must always be sent together: it is possible to send only the entity header fields. The entity header defines meta-information about the entity body (eg: whether an entity body is present) and about the resource identified by the request.
Commonly used entity headers
Content-Encoding
The Content-Encoding entity header field is used as a modifier of the media type. Its value indicates the additional content encoding that has been applied to the entity body, so the corresponding decoding mechanism must be used to obtain the media type referenced in the Content-Type header field. Content-Encoding is used to record the compression method of the document, eg: Content-Encoding: gzip
Content-Language
The Content-Language entity header field describes the natural language used by the resource. If this field is not set, the entity content is assumed to be intended for readers of all languages. eg: Content-Language:da
Content-Length
The Content-Length entity header field indicates the length of the entity body, expressed as a decimal number of bytes.
Content-Type
The Content-Type entity header field specifies the media type of the entity body sent to the recipient. eg:
Content-Type:text/html;charset=ISO-8859-1
Content-Type:text/html;charset=GB2312
Last-Modified
The Last-Modified entity header field indicates the date and time at which the resource was last modified.
Expires
The Expires entity header field gives the date and time after which the response is considered expired. To make a proxy server or browser refresh a cached page after a period of time (a previously visited page is otherwise loaded straight from the cache, which shortens the response time and reduces server load), we can use the Expires entity header field to specify when the page expires. eg: Expires: Fri, 15 Sep 2006 16:23:12 GMT
HTTP/1.1 clients and caches MUST treat other invalid date formats (including 0) as already expired. eg: To prevent the browser from caching a page, we can also set the Expires entity header field to 0. In JSP this is written as: response.setDateHeader("Expires", 0);
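Two of the entity headers above lend themselves to a short sketch: Content-Length counts bytes after the charset encoding is applied (not characters), and an Expires value that cannot be parsed as an HTTP date, such as `0`, is treated as already expired. The function names are hypothetical, and the date handling relies on Python's standard `email.utils`.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def content_length(body_text: str, charset: str = "utf-8") -> str:
    """Content-Length is the size of the encoded body in bytes."""
    return f"Content-Length: {len(body_text.encode(charset))}"


def expires_header(now: datetime, lifetime_seconds: int) -> str:
    """Format an Expires header; `now` must be a timezone-aware UTC datetime."""
    return "Expires: " + format_datetime(now + timedelta(seconds=lifetime_seconds), usegmt=True)


def is_expired(value: str, now: datetime) -> bool:
    """Treat unparseable values (for example '0') as already expired."""
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return True
    if when.tzinfo is None:                    # no usable zone information: assume UTC
        when = when.replace(tzinfo=timezone.utc)
    return when <= now


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(content_length("user=jeffrey&pwd=1234"))
    print(expires_header(now, 3600))
    print(is_expired("0", now))
```

The same text can therefore have different Content-Length values depending on the charset declared in Content-Type.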
Run --> cmd --> telnet
www.guet.edu.cn 80 //Note that the port number cannot be omitted
/* We can change the request method and request the content of the Guilin Electronics homepage by entering the following message */
open www.guet.edu.cn 80
GET /index.asp HTTP/1.0 //Request the content of the resource
Host:www.guet.edu.cn
Host:www.sina.com.cn
3. Experimental results:
Connection: Keep-Alive
KAJEOIMMH; path=/
Cache-control: private
//Resource content omitted
3.2 The response obtained from request 2.2 is:
Content-Type: text/html
X-Cache: MISS from zjm152-78.sina.com.cn
Via: 1.0 zjm152-78.sina.com.cn:80
X-Cache: MISS from th-143.sina.com.cn
Connection: close
Lost the connection with the host
6. HTTP protocol related technical supplement
1. Basics:
High-level protocols include: the File Transfer Protocol (FTP), the email transfer protocol SMTP, the Domain Name System (DNS), the Network News Transfer Protocol (NNTP), HTTP, and others.
There are three types of intermediaries: proxy, gateway, and tunnel. A proxy accepts requests in the absolute form of the URI, rewrites all or part of the message, and forwards the reformatted request toward the server identified by the URI. A gateway is a receiving agent that acts as a layer above some other server and, if necessary, translates requests into the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages. Tunnels are used when communication needs to pass through an intermediary (such as a firewall), even when the intermediary cannot understand the content of the messages.
Proxy: An intermediary program that acts as both a server and a client in order to make requests on behalf of other clients. Requests are serviced internally or passed on, with possible translation, to other servers. A proxy must interpret and, if necessary, rewrite a request message before forwarding it. A proxy often acts as a client-side portal through a firewall. It can also serve as a helper application that handles requests over protocols the user agent does not implement.
Gateway: A server that acts as an intermediary for other servers. Unlike a proxy, a gateway accepts requests as if it were the origin server for the requested resource; the requesting client is unaware that it is dealing with the gateway.
A gateway often serves as a server-side portal through a firewall. The gateway can also serve as a protocol translator to access resources stored in non-HTTP systems.
Tunnel: An intermediary program that acts as a blind relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, although it may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connection are closed. Tunnels are often used when a portal must exist and the intermediary cannot interpret the relayed traffic.
2. Advantages of protocol analysis - HTTP analyzer detects network attacks
Analyzing and processing high-level protocols in a modular manner will be the direction of future intrusion detection.
The ports commonly used by HTTP and its proxies (80, 3128, and 8080) are specified in the network section using the port tag.
3. HTTP protocol Content-Length restriction vulnerability leading to denial of service attacks
When the POST method is used, Content-Length declares the length of the data to be transmitted, for example Content-Length: 999999999. On a vulnerable server, the memory reserved for the request is not released until the transfer completes. An attacker can exploit this flaw by continuously sending junk data to the web server until it runs out of memory. This attack method leaves almost no trace.
http://www.cnpaf.net/Class/HTTP/0532918532667330.html
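A common defence against oversized declared lengths is to validate Content-Length before reading or buffering any of the body. The sketch below shows that check in isolation. The function name and the 1 MB limit are assumptions chosen for illustration, and real servers expose this as a configuration setting rather than code like this.

```python
MAX_BODY_BYTES = 1_000_000  # assumed limit for this sketch


def check_content_length(headers: dict, limit: int = MAX_BODY_BYTES):
    """Return (status_code, bytes_to_read) for a request's declared body length.

    `headers` is expected to use lower-case header names.
    """
    raw = headers.get("content-length")
    if raw is None:
        return 411, 0            # Length Required
    if not raw.isdigit():
        return 400, 0            # Bad Request: not a plain decimal number
    length = int(raw)
    if length > limit:
        return 413, 0            # Request Entity Too Large: refuse before buffering
    return 200, length


if __name__ == "__main__":
    print(check_content_length({"content-length": "999999999"}))
    print(check_content_length({"content-length": "21"}))
```

Rejecting the request up front means the server never reserves memory for a body it does not intend to accept.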
4. Some ideas for using the characteristics of the HTTP protocol to carry out denial of service attacks
The server is kept busy handling the attacker's forged TCP connection requests and has no time for normal client requests (which make up only a small share of the traffic). From the point of view of a normal client, the server stops responding. This situation is called a SYN flood attack.
Attacks such as Smurf and TearDrop rely on ICMP floods and IP fragmentation. This article instead uses "normal connections" to produce a denial of service.
Port 19 was used early on for Chargen attacks (Chargen_Denial_of_Service). The method then was to set up a UDP exchange between two Chargen servers so that the servers had to process too much traffic and went down. To take down a web server in a similar way, two conditions must hold: 1. a Chargen service exists; 2. an HTTP service exists.
Method: The attacker forges the source IP address and sends connection requests (Connect) to N Chargen services. After accepting the connection, each Chargen service sends a character stream of 72 bytes per second (in practice faster, depending on network conditions) to the server.
5. HTTP fingerprinting technology
The principle behind HTTP fingerprinting is always roughly the same: record the small differences in how different servers implement the HTTP protocol and use them for identification. HTTP fingerprinting is much more complicated than TCP/IP stack fingerprinting. The reason is that customizing an HTTP server's configuration file, or adding plug-ins and components, easily changes the HTTP response information, which makes identification difficult. Changing the behavior of the TCP/IP stack, by contrast, requires modifying the kernel, so the stack is easy to identify.
It is very simple to make a server return different banner information. For open-source HTTP servers such as Apache, users can modify the banner in the source code and then restart the HTTP service for the change to take effect. For closed-source HTTP servers, such as Microsoft's IIS or Netscape, the banner can be modified in the DLL file that stores it. Related articles cover this, so I won't go into details here. Such modifications work reasonably well. Another way to obscure banner information is to use a plug-in.
Commonly used test requests:
1: HEAD / HTTP/1.0 sends a basic HTTP request
2: DELETE / HTTP/1.0 sends a request that is normally not allowed, such as a DELETE request
3: GET / HTTP/3.0 sends a request with an invalid HTTP protocol version
4: GET / JUNK/1.0 sends a request with a malformed protocol specification
Httprint is an HTTP fingerprinting tool. It uses statistical principles combined with fuzzy logic to determine the type of an HTTP server effectively, and it can be used to collect and analyze the signatures generated by different HTTP servers.
6. Others: To improve performance for users, modern browsers also support concurrent access. When browsing a web page, they open multiple connections at the same time to fetch the page's images in parallel, so the whole page finishes loading sooner.
HTTP/1.1 provides the persistent connection mode, and the next-generation HTTP protocol, HTTP-NG, adds support for session control, rich content negotiation, and other mechanisms to provide more efficient connections.