nginx modules are generally divided into three categories: handler, filter, and upstream. The previous chapters covered handlers and filters. With these two kinds of modules, nginx can handle any work that runs on a single machine.
The upstream module lets nginx go beyond the limits of a single machine by receiving, processing, and forwarding network data.
Data forwarding lets nginx scale horizontally beyond one machine. It frees nginx from serving only as an end node that provides a single function, and lets it split, encapsulate, and integrate functionality at the network application level.
Data forwarding is a key part of what lets nginx serve as the foundation of a network application. Because of development cost, the key components of a network application are often first written in high-level programming languages. But once the system reaches a certain scale and performance matters more, those components have to be restructured to meet the required performance goals.
At that point, nginx's upstream module pays off in terms of rework cost, because it is inherently fast. As an aside, nginx's layered, loosely coupled configuration system lets the system scale quite far.
In essence, an upstream module is a kind of handler, but instead of generating content itself, it obtains the content by sending a request to a backend server, hence the name "upstream". The whole process of sending the request and receiving the response is encapsulated inside nginx, so an upstream module only needs to implement a few callback functions that do the module-specific work, such as building the request and parsing the response.
The upstream module callback functions are listed as follows:
Function name | Description |
---|---|
create_request | Builds the request buffer (a buffer chain) to send to the backend server. Called when the upstream is initialized. |
reinit_request | Called when a backend server fails and nginx tries another one. After selecting the new server, nginx first calls this function to reset the upstream module's working state, then connects to the upstream again. |
process_header | Processes the header returned by the backend server. What counts as the "header" is defined by the protocol used with the upstream server, for example the header section of an HTTP response, or the status line of a memcached response. |
abort_request | Called when the client aborts the request. It does not need to close the connection to the backend server, because nginx does that automatically, so this function usually does nothing. |
finalize_request | Called after the request to the backend server completes normally. Like abort_request, it usually does nothing. |
input_filter | Processes the response body returned by the backend server. nginx's default input_filter wraps the received data in an ngx_chain buffer chain referenced by the upstream's out_bufs pointer, so code outside the module can use this pointer to access the body returned by the backend server. The memcached module implements its own input_filter, which is analyzed in detail later. |
input_filter_init | Initializes the input filter context. nginx's default input_filter_init simply returns. |
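To make this division of labor concrete, here is a minimal, self-contained C sketch (not nginx code) of a callback table: the module only fills in function pointers, and a driver standing in for the nginx upstream framework calls them in a fixed order. All `sk_` names, the return-code values, and the simplified signatures are hypothetical; only a subset of the callbacks is modeled, and connecting, sending, and waiting for events are omitted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins: not nginx's real types, values, or signatures. */
#define SK_OK      0
#define SK_AGAIN  -2

typedef struct sk_upstream sk_upstream;

struct sk_upstream {
    /* a subset of the callbacks from the table above */
    int  (*create_request)(sk_upstream *u);
    int  (*process_header)(sk_upstream *u);
    int  (*input_filter_init)(sk_upstream *u);
    int  (*input_filter)(sk_upstream *u);
    void (*finalize_request)(sk_upstream *u, int rc);

    int  header_reads;   /* demo state: how often process_header ran */
    char trace[32];      /* letters recording which callbacks ran */
};

static void sk_note(sk_upstream *u, const char *step)
{
    strncat(u->trace, step, sizeof(u->trace) - strlen(u->trace) - 1);
}

/* Demo callbacks: each one just records that it was called. */
static int demo_create(sk_upstream *u) { sk_note(u, "C"); return SK_OK; }
static int demo_header(sk_upstream *u)
{
    sk_note(u, "H");
    /* pretend the header arrives split across two reads */
    return (++u->header_reads < 2) ? SK_AGAIN : SK_OK;
}
static int demo_filter_init(sk_upstream *u) { sk_note(u, "I"); return SK_OK; }
static int demo_filter(sk_upstream *u) { sk_note(u, "F"); return SK_OK; }
static void demo_finalize(sk_upstream *u, int rc) { (void) rc; sk_note(u, "Z"); }

/* Module side: the only job is to fill in the function pointers. */
static void sk_demo_init(sk_upstream *u)
{
    memset(u, 0, sizeof(*u));
    u->create_request = demo_create;
    u->process_header = demo_header;
    u->input_filter_init = demo_filter_init;
    u->input_filter = demo_filter;
    u->finalize_request = demo_finalize;
}

/* Framework side: calls the callbacks in a fixed order. */
static int sk_run(sk_upstream *u)
{
    int rc;

    assert(u != NULL && u->create_request != NULL);

    if (u->create_request(u) != SK_OK) {
        return -1;
    }
    /* a real framework would connect, send the request, and wait for
     * more data from the backend between SK_AGAIN results */
    while ((rc = u->process_header(u)) == SK_AGAIN) {
    }
    if (rc == SK_OK) {
        u->input_filter_init(u);
        rc = u->input_filter(u);
    }
    u->finalize_request(u, rc);
    return rc;
}
```

After `sk_demo_init(&u)`, calling `sk_run(&u)` leaves the trace `CHHIFZ`: request creation, two header attempts (the first reporting "again"), filter initialization, the body filter, and finalization.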
memcache is a widely used, high-performance distributed cache system. It defines its own private communication protocol, so it cannot be accessed with HTTP requests. The protocol itself is simple and efficient, however, and because memcache is so widely used, most modern programming languages and platforms provide memcache support for developers.
nginx provides the ngx_http_memcached module, which can read data from memcache but cannot write data to it.
An upstream module hooks into nginx the same way a handler module does.
Its directive design also follows the basic rule of handler modules: the module runs only when it has been configured.
So what is special about an upstream module? Its handler function. The handler of an upstream module always performs a fixed sequence of steps. Taking ngx_http_memcached_handler, the handler function of the memcached module, as an example:
1. Create the upstream data structure:

```c
ngx_http_upstream_t  *u;

if (ngx_http_upstream_create(r) != NGX_OK) {
    return NGX_HTTP_INTERNAL_SERVER_ERROR;
}

u = r->upstream;
```
2. Set the module's schema and tag. The schema is currently used only for logging; the tag is used to manage buffer chains:

```c
ngx_str_set(&u->schema, "memcached://");
u->output.tag = (ngx_buf_tag_t) &ngx_http_memcached_module;
```
3. Set the upstream's backend server list configuration:

```c
mlcf = ngx_http_get_module_loc_conf(r, ngx_http_memcached_module);
u->conf = &mlcf->upstream;
```
4. Set the upstream callback functions:

```c
u->create_request = ngx_http_memcached_create_request;
u->reinit_request = ngx_http_memcached_reinit_request;
u->process_header = ngx_http_memcached_process_header;
u->abort_request = ngx_http_memcached_abort_request;
u->finalize_request = ngx_http_memcached_finalize_request;
u->input_filter_init = ngx_http_memcached_filter_init;
u->input_filter = ngx_http_memcached_filter;
```
5. Create the module's per-request context structure and attach it to the upstream:

```c
ctx = ngx_palloc(r->pool, sizeof(ngx_http_memcached_ctx_t));
if (ctx == NULL) {
    return NGX_HTTP_INTERNAL_SERVER_ERROR;
}

ctx->request = r;

ngx_http_set_ctx(r, ctx, ngx_http_memcached_module);

u->input_filter_ctx = ctx;
```
6. Start the upstream and finish up:

```c
r->main->count++;

ngx_http_upstream_init(r);

return NGX_DONE;
```
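The tag assigned to u->output.tag identifies which module a buffer belongs to. The sketch below is a simplified illustration, loosely modeled on the idea behind nginx's ngx_chain_update_chains() (whose exact logic varies between nginx versions): once buffers at the head of the busy list have been fully consumed, only those carrying the module's own tag are reset and kept on its free list for reuse, and the rest are simply unlinked. The `sk_` types and functions are hypothetical, and the list link is folded into the buffer for brevity.

```c
#include <assert.h>
#include <stddef.h>

typedef void *sk_tag;   /* like ngx_buf_tag_t: just an identity pointer */

typedef struct sk_buf {
    char          *start, *pos, *last;
    sk_tag         tag;     /* which module allocated this buf */
    struct sk_buf *next;    /* list link (folded into the buf here) */
} sk_buf;

/* Move fully consumed bufs off the head of the busy list. Bufs with our
 * tag are emptied and pushed onto our free list; others are unlinked. */
static void sk_update(sk_buf **free_list, sk_buf **busy, sk_tag tag)
{
    assert(free_list != NULL && busy != NULL);

    while (*busy != NULL && (*busy)->pos == (*busy)->last) {
        sk_buf *b = *busy;

        *busy = b->next;
        if (b->tag != tag) {
            b->next = NULL;              /* not ours: do not recycle */
            continue;
        }
        b->pos = b->last = b->start;     /* empty it for reuse */
        b->next = *free_list;
        *free_list = b;
    }
}
```

Comparing tags rather than buffer contents is what lets several modules share one output chain without reusing each other's memory.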
Every upstream module follows this pattern, from one as simple as memcached to ones as complex as proxy and fastcgi.
Across upstream modules, these six steps differ mostly in steps 2, 3, 4, and 5.
Steps 2 and 4 are easy to understand: each module naturally sets its own tag and its own callback functions. Step 5 is not hard to understand either.
Only step 3 is a bit confusing. Modules use very different strategies to obtain the backend server list: some, like memcached, are simple and clear, while others, like proxy, involve complex logic.
Step 6 is usually the same across modules: increment r->main->count by 1 and return NGX_DONE.
When a handler returns NGX_DONE, nginx considers the current processing of the request finished, but it neither releases the memory used by the request nor closes the connection to the client.
This is necessary because nginx binds each upstream request one-to-one to a client request. Later, when ngx_event_pipe sends the upstream response back to the client, it still needs these stored client-side data structures.
Binding upstream requests one-to-one to client requests has both advantages and drawbacks. The advantage is that it simplifies module development and lets developers focus on the module's logic. The drawback is equally clear: a one-to-one design often cannot meet the needs of more complex logic.
Callback functions (again using the memcached module as the example):
ngx_http_memcached_create_request: Very simple. It takes the configured key, builds a "get $key" request from it, and places the request in r->upstream->request_bufs.
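A hedged sketch of the bytes create_request has to produce, assuming a plain key: the memcached text-protocol command `get <key>` followed by CRLF. The function name and the caller-supplied buffer are hypothetical, no key escaping is done here, and a real module would place the bytes in an ngx_buf_t linked from r->upstream->request_bufs rather than in a plain array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "get <key>\r\n" into buf. Returns the command length, or 0 if
 * the command does not fit in size bytes (including the final NUL). */
static size_t sk_memcached_get(char *buf, size_t size, const char *key)
{
    int n;

    assert(buf != NULL && key != NULL);
    n = snprintf(buf, size, "get %s\r\n", key);
    if (n < 0 || (size_t) n >= size) {
        return 0;   /* buffer too small */
    }
    return (size_t) n;
}
```

For the key `user:42`, the request is the 13 bytes `get user:42\r\n`.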
ngx_http_memcached_reinit_request: No initialization required.
ngx_http_memcached_abort_request: No additional action required.
ngx_http_memcached_finalize_request: No additional action required.
ngx_http_memcached_process_header: The function that carries the module's core logic. The memcache protocol defines the header as the first line of the response, so the code first looks for the end of that line:
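```c
#define LF     (u_char) '\n'

/* scan the bytes read so far for the end of the first line */
for (p = u->buffer.pos; p < u->buffer.last; p++) {
    if (*p == LF) {
        goto found;
    }
}

/* no complete line yet: nginx will call this function again */
return NGX_AGAIN;
```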
If no LF ('\n') character is found in the data read into the buffer so far, the function returns NGX_AGAIN, meaning the header has not been read completely and more data is needed. nginx calls this function again after receiving new data.
nginx uses only a single buffer for the backend server's response header, and all of the header data is in that buffer, so the parser never has to handle a header that spans multiple buffers. If the header is too large to fit in this buffer, nginx returns an error to the client and logs an error indicating that the buffer is not large enough.
The important responsibility of ngx_http_memcached_process_header is to translate the status returned by the back-end server into the status returned to the client. For example:
```c
u->headers_in.content_length_n = ngx_atoof(start, p - start);
/* ... */
u->headers_in.status_n = 200;
u->state->status = 200;
/* ... */
u->headers_in.status_n = 404;
u->state->status = 404;
```
u->state is used to compute upstream-related variables; for example, u->state->status provides the value of the $upstream_status variable. u->headers_in.status_n becomes the status code of the response returned to the client, and u->headers_in.content_length_n sets the length of that response.
After processing the header, this function must advance the read pointer pos past it; otherwise the header bytes will also be copied into the body of the response returned to the client, and the body will be wrong.
When ngx_http_memcached_process_header has processed the response header successfully, it should return NGX_OK. Returning NGX_AGAIN means the complete header has not been read yet and more data must be read from the backend server. Returning NGX_DECLINED is meaningless here. Any other return value is treated as an error: nginx ends the upstream request and returns an error to the client.
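Putting these responsibilities together, here is a simplified, standalone parser for the first line of a memcached text-protocol response, either `VALUE <key> <flags> <bytes>` or `END`, each terminated by CRLF. It maps the result to a 200 or 404 status, records the content length, and advances the read pointer past the header. The `sk_` names and return codes are hypothetical, and the key and flags fields are not validated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SK_OK      0
#define SK_ERROR  -1
#define SK_AGAIN  -2

/* Hypothetical, simplified view of what process_header fills in. */
typedef struct {
    const char *pos;             /* read pointer into received data */
    const char *last;            /* end of received data */
    int         status;          /* like u->headers_in.status_n */
    long long   content_length;  /* like u->headers_in.content_length_n */
} sk_resp;

static int sk_parse_first_line(sk_resp *r)
{
    const char *eol;
    char        line[128], *sp, *end;
    size_t      len;

    assert(r != NULL);
    eol = memchr(r->pos, '\n', (size_t) (r->last - r->pos));
    if (eol == NULL) {
        return SK_AGAIN;                  /* line not complete yet */
    }
    len = (size_t) (eol - r->pos);
    if (len == 0 || r->pos[len - 1] != '\r' || len >= sizeof(line)) {
        return SK_ERROR;                  /* malformed or oversized line */
    }
    memcpy(line, r->pos, len - 1);        /* copy without the CR */
    line[len - 1] = '\0';

    if (strcmp(line, "END") == 0) {
        r->status = 404;                  /* key not found */
        r->content_length = 0;
    } else if (strncmp(line, "VALUE ", 6) == 0
               && (sp = strrchr(line, ' ')) != NULL) {
        r->content_length = strtoll(sp + 1, &end, 10);
        if (end == sp + 1 || *end != '\0' || r->content_length < 0) {
            return SK_ERROR;              /* bad <bytes> field */
        }
        r->status = 200;
    } else {
        return SK_ERROR;
    }

    r->pos = eol + 1;   /* skip the header so it cannot leak into the body */
    return SK_OK;
}
```

Given `VALUE k 0 5\r\nhello\r\nEND\r\n`, the parser reports status 200 with a content length of 5 and leaves the read pointer at `hello`.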
ngx_http_memcached_filter_init: Corrects the length of the data expected from the backend server, because the length of the trailing CRLF "END" CRLF is not included when the header is processed.
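The length fix-up is simple arithmetic: the byte count in the VALUE line covers only the value, but the backend also sends the 7-byte trailer CRLF "END" CRLF. A minimal sketch with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* The trailer memcached sends after a value: CRLF "END" CRLF. */
static const char sk_memcached_end[] = "\r\nEND\r\n";
#define SK_MEMCACHED_END (sizeof(sk_memcached_end) - 1)   /* 7 bytes */

/* Bytes still to read once the header is parsed: value plus trailer. */
static long long sk_expected_length(long long content_length)
{
    assert(content_length >= 0);
    return content_length + (long long) SK_MEMCACHED_END;
}
```

For a 5-byte value, the filter should therefore expect 5 + 7 = 12 more bytes after the header line.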
ngx_http_memcached_filter:
The memcached module is one of the few modules that implements its own callback for processing the response body.
It does so because it needs to strip the CRLF "END" CRLF at the end of the body.
In practice, processing the body means wrapping the valid part of the body data received from the backend server in ngx_chain_t links and appending them to the end of u->out_bufs.
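The following standalone sketch, with hypothetical `sk_` names, shows the core of such a filter: value bytes are reported as a slice of the received chunk, while the trailing CRLF "END" CRLF is checked byte by byte and dropped, even when it arrives split across chunks. A real filter would wrap each slice in an ngx_buf_t/ngx_chain_t link and append it to u->out_bufs.

```c
#include <assert.h>
#include <stddef.h>

/* The trailer memcached sends after a value: CRLF "END" CRLF (7 bytes). */
static const char sk_end[] = "\r\nEND\r\n";
#define SK_END_LEN (sizeof(sk_end) - 1)

/* Hypothetical filter state: bytes still expected from the backend
 * (value plus trailer), as computed by filter_init. */
typedef struct {
    long long length;
} sk_filter;

/* Feed one chunk. On success, *body and *body_len describe the value
 * bytes in this chunk; they point into data and are not copied.
 * Returns 0 on success, -1 on a bad trailer or unexpected extra data. */
static int sk_filter_chunk(sk_filter *f, const char *data, size_t n,
                           const char **body, size_t *body_len)
{
    long long value_left;
    size_t    take = 0, i;

    assert(f != NULL && body != NULL && body_len != NULL);

    value_left = f->length - (long long) SK_END_LEN;
    if (value_left > 0) {
        take = (size_t) value_left < n ? (size_t) value_left : n;
    }
    *body = data;
    *body_len = take;
    f->length -= (long long) take;

    for (i = take; i < n; i++) {
        /* f->length counts down from 7 to 1 through the trailer */
        if (f->length <= 0
            || data[i] != sk_end[SK_END_LEN - (size_t) f->length])
        {
            return -1;
        }
        f->length--;
    }
    return 0;
}
```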
nginx does not copy the data. Instead, it creates ngx_buf_t structures that point to the memory areas already holding the data, and links those bufs together with ngx_chain_t. This avoids moving large amounts of memory around and is one of the reasons nginx is efficient.
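A miniature version of this idea, using hypothetical `sk_buf` and `sk_chain` types rather than nginx's own: each buf only records where existing bytes start and end, and appending to the output chain walks to the tail's next pointer, similar in spirit to how an input filter appends to u->out_bufs.

```c
#include <assert.h>
#include <stddef.h>

/* A buf only describes bytes that live elsewhere; it never copies them. */
typedef struct {
    const char *pos;
    const char *last;
} sk_buf;

typedef struct sk_chain sk_chain;
struct sk_chain {
    sk_buf   *buf;
    sk_chain *next;
};

/* Append link cl to the end of the chain that starts at *out. */
static void sk_chain_append(sk_chain **out, sk_chain *cl)
{
    sk_chain **ll;

    assert(out != NULL && cl != NULL);
    for (ll = out; *ll != NULL; ll = &(*ll)->next) {
        /* walk to the terminating NULL link */
    }
    cl->next = NULL;
    *ll = cl;
}

/* Total bytes described by a chain, without touching the bytes. */
static size_t sk_chain_size(const sk_chain *cl)
{
    size_t n = 0;

    for (; cl != NULL; cl = cl->next) {
        n += (size_t) (cl->buf->last - cl->buf->pos);
    }
    return n;
}
```

Two bufs pointing into one received string `"hello world"` form an 11-byte chain while the string itself stays where it is.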