In the previous post, I sent, stored, and viewed OpenTelemetry data in Grafana Cloud.
The free tier of Grafana Cloud gives you about 50GB of logs and traces per month. For a service that does not accumulate many traces (or does not record logs) because it has few users, you can use it as is, but if you adopt it even on a small scale, I worry that logs will pile up and exceed that limit.
Sampling means extracting a part from the whole. Here, the goal is to reduce the amount of telemetry data that gets stored.
Why is sampling necessary?
There is no need to save every circle (trace) in the picture above. It is enough to store the important traces (errors, or executions that took too long) plus a sample of the OK traces that is representative of the whole.
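As a rough illustration of that selection rule, here is a minimal sketch. The `TraceSummary` shape, the 1-second slowness threshold, and the "keep every tenth OK trace" rule are assumptions made up for this example; they are not defined by OpenTelemetry or Grafana.

```typescript
// Hypothetical summary of a finished trace (not an OpenTelemetry type).
interface TraceSummary {
  traceId: string;
  hasError: boolean;
  durationMs: number;
}

// Assumed values for this sketch only.
const SLOW_THRESHOLD_MS = 1000;
const KEEP_EVERY_NTH_OK = 10;

// Keep every error or slow trace, plus every Nth of the remaining OK traces.
function selectTraces(traces: TraceSummary[]): TraceSummary[] {
  let okSeen = 0;
  return traces.filter((t) => {
    if (t.hasError || t.durationMs > SLOW_THRESHOLD_MS) return true;
    okSeen += 1;
    return okSeen % KEEP_EVERY_NTH_OK === 0;
  });
}

// 100 made-up traces: one with an error, one slow, 98 fast and OK.
const traces: TraceSummary[] = Array.from({ length: 100 }, (_, i) => ({
  traceId: `trace-${i}`,
  hasError: i === 3,
  durationMs: i === 7 ? 2500 : 40,
}));

const kept = selectTraces(traces);
console.log(`kept ${kept.length} of ${traces.length} traces`);
// prints "kept 11 of 100 traces"
```

The error trace and the slow trace always survive, while only 9 of the 98 ordinary traces are stored.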
Sampling can be broadly divided into Head Sampling and Tail Sampling.
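Head sampling makes the decision at the start of a trace, before anything is known about how the trace will turn out. TraceIdRatioBasedSampler is provided by default.

    import { NodeSDK } from '@opentelemetry/sdk-node';
    import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-node';

    // Keep roughly 10% of traces
    const samplePercentage = 0.1;

    const sdk = new NodeSDK({
      // Other SDK configuration parameters go here
      sampler: new TraceIdRatioBasedSampler(samplePercentage),
    });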
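To show why a ratio sampler keyed on the trace ID is convenient, here is a simplified, standalone sketch of the idea. It is not the SDK's source code, and `foldTraceId` and `headSample` are hypothetical names. The point is that the decision depends only on the trace ID, so the same trace always gets the same answer, and the sampler cannot know whether the trace will later contain an error.

```typescript
// Simplified sketch of a trace-ID-ratio decision (hypothetical helper names).
function foldTraceId(traceId: string): number {
  // XOR the 32-hex-character trace ID together in 8-character (32-bit) chunks.
  let acc = 0;
  for (let pos = 0; pos < traceId.length; pos += 8) {
    const part = parseInt(traceId.slice(pos, pos + 8), 16);
    acc = (acc ^ part) >>> 0; // keep the result as an unsigned 32-bit value
  }
  return acc;
}

function headSample(traceId: string, ratio: number): boolean {
  // Map the ratio onto the 32-bit range and compare the folded ID against it.
  const upperBound = Math.floor(ratio * 0xffffffff);
  return foldTraceId(traceId) < upperBound;
}

const traceId = "4bf92f3577b34da6a3ce929d0e0e4736";

// Asking twice about the same trace ID gives the same decision.
console.log(headSample(traceId, 0.1) === headSample(traceId, 0.1));
```

Because no randomness is involved at decision time, every span of a trace is either kept or dropped together.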
Tail sampling makes the decision at the end, after the spans of a trace have been collected. Because much more information is available at that point, you can filter according to whatever logic you want.
For example, error traces are always sampled.
Usually, sampling is performed in the collector, once all spans of a trace have been received.
It can be difficult to implement. The sampling rules have to be revisited whenever the system or its conditions change.
It is difficult to operate, because the sampler has to keep state in order to make its decisions.
Some tail samplers are vendor-specific.
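To make the "stateful" part concrete, here is a minimal sketch of what a tail sampler has to remember. Everything in it is hypothetical: the `BufferedSpan` shape, the `TailSamplingBuffer` class, and the assumption that the root span is the last span of a trace to end. A real collector also needs timeouts and memory limits, which are left out here.

```typescript
// Hypothetical, minimal span shape for this sketch (not an OpenTelemetry type).
interface BufferedSpan {
  traceId: string;
  isRoot: boolean; // true for the span that started the trace
  hasError: boolean;
  durationMs: number;
}

class TailSamplingBuffer {
  // State that must be held until each trace is complete.
  private pending = new Map<string, BufferedSpan[]>();
  private slowThresholdMs: number;
  private exportTrace: (spans: BufferedSpan[]) => void;

  constructor(
    slowThresholdMs: number,
    exportTrace: (spans: BufferedSpan[]) => void
  ) {
    this.slowThresholdMs = slowThresholdMs;
    this.exportTrace = exportTrace;
  }

  // Buffer every finished span; decide once the root span has ended
  // (assumption: the root span ends last).
  onSpanEnd(span: BufferedSpan): void {
    const spans = this.pending.get(span.traceId) ?? [];
    spans.push(span);
    this.pending.set(span.traceId, spans);

    if (span.isRoot) {
      this.pending.delete(span.traceId);
      const keep = spans.some(
        (s) => s.hasError || s.durationMs > this.slowThresholdMs
      );
      if (keep) this.exportTrace(spans);
    }
  }

  pendingTraceCount(): number {
    return this.pending.size;
  }
}

// Usage: trace "a" contains an error span, trace "b" does not.
const exported: BufferedSpan[][] = [];
const buffer = new TailSamplingBuffer(1000, (spans) => {
  exported.push(spans);
});
buffer.onSpanEnd({ traceId: "a", isRoot: false, hasError: true, durationMs: 20 });
buffer.onSpanEnd({ traceId: "b", isRoot: false, hasError: false, durationMs: 15 });
buffer.onSpanEnd({ traceId: "a", isRoot: true, hasError: false, durationMs: 60 });
buffer.onSpanEnd({ traceId: "b", isRoot: true, hasError: false, durationMs: 30 });
console.log(exported.length, buffer.pendingTraceCount());
```

Only trace "a" is exported, and it is exported whole, including its non-error root span. The `pending` map is exactly the state that makes tail sampling harder to run than head sampling.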
Let’s implement tail sampling with a custom span processor.
Create a sampling-span-processor.ts file.
    import { Context, SpanStatusCode } from "@opentelemetry/api";
    import {
      SpanProcessor,
      ReadableSpan,
      Span,
    } from "@opentelemetry/sdk-trace-node";

    /**
     * Sampling span processor (forwards all error spans and a ratio of other spans)
     */
    export class SamplingSpanProcessor implements SpanProcessor {
      constructor(
        private _spanProcessor: SpanProcessor,
        private _ratio: number
      ) {}

      /**
       * Forces to export all finished spans
       */
      forceFlush(): Promise<void> {
        return this._spanProcessor.forceFlush();
      }

      onStart(span: Span, parentContext: Context): void {
        this._spanProcessor.onStart(span, parentContext);
      }

      /**
       * Maps the trace ID onto a bucket in [0, 0.99] and compares it with the ratio,
       * so every span of the same trace gets the same decision.
       */
      shouldSample(traceId: string): boolean {
        let accumulation = 0;
        for (let idx = 0; idx < traceId.length; idx++) {
          accumulation += traceId.charCodeAt(idx);
        }
        const cmp = (accumulation % 100) / 100;
        return cmp < this._ratio;
      }

      /**
       * Called when a {@link ReadableSpan} is ended, if the `span.isRecording()`
       * returns true.
       * @param span the Span that just ended.
       */
      onEnd(span: ReadableSpan): void {
        // Status code 0 means "UNSET", 1 means "OK", and 2 means "ERROR"
        if (span.status.code === SpanStatusCode.ERROR) {
          // Always forward spans that have an error status
          this._spanProcessor.onEnd(span);
        } else if (this.shouldSample(span.spanContext().traceId)) {
          // Forward only the configured ratio of the remaining spans
          this._spanProcessor.onEnd(span);
        }
      }

      /**
       * Shuts down the processor. Called when SDK is shut down. This is an
       * opportunity for processor to do any cleanup required.
       */
      async shutdown(): Promise<void> {
        return this._spanProcessor.shutdown();
      }
    }
The span is exported by calling this._spanProcessor.onEnd(span) only when status.code is 2 (ERROR) or the span wins the ratio-based draw.
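To check that rule without starting the SDK, the decision can be pulled out into plain functions. This is a standalone sketch that mirrors the logic of the processor; `ratioDecision` and `shouldExport` are hypothetical names used only here.

```typescript
// Standalone mirror of the processor's decision (hypothetical helper names).
const STATUS_ERROR = 2; // numeric value of the ERROR status code

function ratioDecision(traceId: string, ratio: number): boolean {
  // Sum the character codes of the trace ID and map them to a bucket in [0, 0.99].
  let accumulation = 0;
  for (let idx = 0; idx < traceId.length; idx++) {
    accumulation += traceId.charCodeAt(idx);
  }
  return (accumulation % 100) / 100 < ratio;
}

function shouldExport(statusCode: number, traceId: string, ratio: number): boolean {
  return statusCode === STATUS_ERROR || ratioDecision(traceId, ratio);
}

const sampleTraceId = "4bf92f3577b34da6a3ce929d0e0e4736";

console.log(shouldExport(2, sampleTraceId, 0)); // error span: kept even with ratio 0
console.log(shouldExport(1, sampleTraceId, 0)); // OK span with ratio 0: dropped
console.log(shouldExport(1, sampleTraceId, 1)); // OK span with ratio 1: kept
```

Two things follow from this rule. Non-error spans of the same trace always share one decision, because only the trace ID is used. An error span is kept on its own merits, so it can be exported even when the other spans of its trace are dropped.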
Update spanProcessors in main.ts.
    spanProcessors: [
      new SamplingSpanProcessor(
        new BatchSpanProcessor(traceExporter),
        samplePercentage
      ),
    ],