This tutorial is a practical, easy-to-understand guide to span links in OpenTelemetry.
It aims to help developers, especially those working with complex and asynchronous systems, understand what span links are, how they differ from traditional parent-child relationships in tracing, and why they are valuable for better trace correlation.
By the end of this guide, you will gain the skills needed to effectively use span links to track interactions within distributed systems, leading to improved observability and debugging.
In the past, applications were typically monolithic, meaning every process or feature was executed as a single unit on one server. Monitoring such applications was straightforward.
For example, if something went wrong, you could look at the logs from that server to identify the problem. However, the rise of microservices changed this simplicity.
Now, modern applications are often made up of dozens or even hundreds of smaller, independent services that work together. For example, when you use a mobile app to place an order, there might be separate services to handle user authentication, process payments, manage inventory, and send confirmation emails.
These services don’t always live on the same server and can even communicate over the internet, which adds complexity to tracking what happens when you interact with an application.
This is where distributed tracing comes in. Think of distributed tracing as a way to follow a single request as it travels through the various services of a complex application.
In modern applications, requests often travel through multiple services, each running on different machines. Distributed tracing helps us visualize this journey, making it easier to identify bottlenecks and errors.
It’s like a detective’s map that connects the dots between each step of the process, showing you how long each part took and where any issues occurred. When you look at a trace, you see a timeline of how your request moved through different services, making it easier to identify slowdowns, errors, or failures.
Code Repo
Here’s the code repo for this tutorial:
[https://github.com/Noibisjunior/Span-Links-in-OpenTelemetry]
OpenTelemetry is a key player in enabling this kind of visibility. It’s an open-source observability framework that allows developers to collect data like logs, metrics, and traces from their applications. It serves as a toolset for capturing detailed information about what’s happening inside your services.
In the world of modern observability, OpenTelemetry helps you understand the performance and health of your distributed applications. It acts like a bridge that gathers data from various services and sends it to tools like SigNoz, where you can visualize what’s going on. This makes OpenTelemetry invaluable for identifying bottlenecks, tracking down errors, and ensuring that your applications run smoothly.
By using OpenTelemetry with distributed tracing, you can get a full picture of how your applications behave, making it easier to diagnose issues and improve the user experience.
As software systems, especially distributed ones, grow in complexity, understanding their inner workings becomes a challenging task. That's where OpenTelemetry's spans come in.
A span is the fundamental unit of work in OpenTelemetry’s tracing system. It represents a single operation or event that occurs within your application.
It captures what happened during that operation, how long it took, and any relevant details, like whether it succeeded or failed.
For example, imagine your application processes a user request:
Key attributes of a span:
Individually, spans are useful, but they are most effective when they work together to form a trace.
A trace is a collection of spans that represents the entire journey of a request or operation as it flows through your system.
Let’s go back to our user request example:
The trace begins when the request enters the system, and a root span is created. As the request triggers the database query, the database interaction span is attached to the root span as a child, showing that it’s part of the same process.
Additional spans for calling other services get added to the trace. By looking at this trace, you can see the big picture of how the request traveled through different parts of your system. It helps you understand not just what happened, but how different parts of your application are connected.
Pinpointing Problems: Spans help you zoom in on where things go wrong. If a request is slow, spans can tell you whether it’s the database query, the network call, or some other part of the process that’s causing the delay. You can see which span took longer than expected, making it easier to find bottlenecks.
Building Context: Each span contains contextual information like start time, end time, and custom labels (attributes). This data provides insights into what was happening at a particular moment in your system, like the specific user ID involved in a request or the query that was executed.
Creating Relationships: Spans have relationships with one another, often in a parent-child structure. The root span is the parent, and subsequent spans are its children. This structure helps you see the order in which events occurred and how they depend on one another. It’s like looking at a family tree of operations in your app.
Debugging Distributed Systems: For applications with microservices (where different services handle different parts of a request), spans are especially crucial. They help you track a request as it moves between services, even if those services are running on different servers or in different data centers. This is key for understanding complex interactions between services.
What Are Span Links?
In the world of distributed systems, where multiple services work together to handle a user request, tracing is like a detective's map: it shows the path a request takes as it moves through these services. Each activity in this journey is called a span, and a complete journey is called a trace.
Traditionally, spans are connected using parent-child relationships. Imagine these like a family tree: a parent span initiates a process (like making a request to another service), and child spans represent the activities that happen as a result (like the service processing the request). This is a straightforward way to represent a request’s flow.
But what happens when two spans are related, yet they don’t fit perfectly into that parent-child hierarchy? This is where span links come in.
A span link allows you to connect two spans that are related but don’t have a direct parent-child relationship. It is like a “reference” or “shortcut” between two activities in a distributed system.
For example, let’s say you have a user making a request that triggers multiple independent processes, like sending an email and writing to a database. These processes aren’t child activities of each other; they happen side by side. Using a span link, you can indicate that the email sending span and the database writing span are related to the same initial user request, even though they aren’t directly connected in the parent-child concept.
Parent-Child Relationship: This is a straightforward chain of events. A user sends a request (parent), which triggers the creation of a record in a database (child). The child span wouldn’t exist without the parent span, making it a direct consequence.
Span Links: These are more like drawing dotted lines between activities that are related in some context but don’t follow a direct chain of actions. They provide a way to say, “These things are related, even though one didn’t directly cause the other.” Span links are ideal for representing parallel activities or events that interact but aren’t strictly hierarchical.
Importance of Span Links in Complex and Asynchronous Systems
Span links are particularly valuable in complex and asynchronous systems, where the flow of events doesn’t always follow a clear parent-child path. Here are some practical scenarios where they can be used:
Asynchronous Workflows:
Imagine a user request that starts a background job (like generating a report). The initial request finishes, but the report generation continues in the background.
With span links, you can relate the background job span to the initial request span, even though they don’t follow a direct parent-child pattern.
Microservice Communication:
In a microservices architecture, services often communicate with each other in ways that aren’t strictly hierarchical.
For instance, a user action could trigger multiple services to process different parts of the data simultaneously. Span links allow you to track these independent and related spans as part of a broader workflow.
Batch Processing: If you’re processing batches of data where each item in the batch generates its own spans, you can use span links to connect these spans back to the original batch process.
This makes it easier to trace the entire lifecycle of a batch and understand how individual items relate back to the main process.
(1) OpenTelemetry SDK: The OpenTelemetry SDK acts as a bridge between your code and observability systems, making it possible to collect detailed information about how your application is running.
Imagine OpenTelemetry as a “camera” that captures snapshots of your application's operations. By integrating the SDK into your app, you’re positioning this camera to record what’s happening behind the scenes.
You’ll need to install the SDK in your application’s programming language (e.g., Python, Java, JavaScript).
(2) SigNoz Setup: SigNoz is an open-source observability tool that allows you to visualize and analyze the data you collect with OpenTelemetry.
Think of SigNoz as the “control room” where you view the footage captured by your OpenTelemetry setup. It’s where you have a clear picture of traces and metrics in your application.
You’ll need to set up a SigNoz instance, which involves deploying it on your local machine or on a server, usually using Docker or Kubernetes.
SigNoz helps transform the raw data into visualizations, like graphs and charts, making it easier to understand what's happening inside your application.
Traces:
In simple terms, a trace is like a “story” of what happens when a user or a request interacts with your application. It captures all the actions that occur as a result of that interaction, from the initial request to all the services and databases that might be involved.
Imagine a user clicking a button on your website. A trace would record every step of what happens next.
Spans:
Spans are the “chapters” within a trace's story. Each span represents a specific operation or task that takes place as part of a trace.
For instance, if the trace captures the entire process of a user request, a span could represent a single step, like querying the database or calling an external API.
Each span has a start and end time, giving you precise details about how long each step took. This makes it easier to pinpoint any slowdowns or errors.
Instrumenting Code with OpenTelemetry:
Instrumentation is the process of adding code to your application to collect observability data. With OpenTelemetry, this typically involves adding a few lines of code where you want to create traces and spans.
For example, you might instrument a database query to see how long it takes or instrument a user login process to track its performance.
The OpenTelemetry SDK makes this easier by providing libraries and functions that you can integrate directly into your code. Think of it like attaching trackers to parts of a machine to monitor how they work together.
Let’s look at a basic example in Python. We’ll use the OpenTelemetry SDK to create two spans and link them together.
Explanation of the Above Python Code Snippet
(1) Set Up the Tracer Provider:
The above code snippet begins with a tracer provider, which manages the creation of spans.
This is essential for OpenTelemetry to know how to handle spans. We also configure a SimpleSpanProcessor and ConsoleSpanExporter to print span data to the console. This helps us see which spans are being created and how they’re linked.
(2) Create the First Span (span_one):
Using the tracer.start_as_current_span method, we create a span called span_one. This could represent any action, like processing an order.
Inside this span, we add an event Processing order to indicate what’s happening at that particular point in time.
We also simulate an order ID (order_id = "12345") that would be used in the next span.
(3) Create the Second Span with a Link (span_two):
Here, we start another span called span_two to represent a different but related action, like updating the status of the order.
Notice the links parameter. We use Link(span_one.get_span_context()) to create a link between span_two and span_one.
This tells OpenTelemetry, "While these actions aren't parent-child, they are related."
Inside span_two, we add another event, Updating order status, and simulate some work like updating an order status in a database.
(4) Output:
When you run this code, you’ll see output in the console from the ConsoleSpanExporter that shows both spans, along with the link between them. This helps visualize how these two spans relate to each other in a trace.
(1) Missing Span Contexts:
Error: Link expects a span context, not the span object itself. If you pass span_one instead of span_one.get_span_context(), OpenTelemetry cannot build a valid link, and you are likely to see errors when the span is processed or exported.
Solution: Always ensure that you are passing a span context when creating a link. Use the .get_span_context() method of the span you want to link to.
(2) Linking Unstarted or Ended Spans:
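Error: If you attempt to create a link to a span that was never started, its span context is invalid and the link may not be recognized. Links to spans that were never exported (for example, because they were not sampled) can also show up as dangling references in your traces.
Solution: Capture the span context from a span that has actually been started, and keep it for later use. The context remains valid after the span ends, so linking to a finished span works; what matters is that the linked span is also exported so your backend can display it.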
(3) Performance Considerations:
Performance Issue: Linking too many spans can increase the overhead of trace data, leading to performance degradation in high-traffic systems.
Solution: Use links selectively. Only link spans when there is a meaningful relationship that you need to visualize or analyze. For high-traffic environments, you can use OpenTelemetry’s sampling options to reduce the amount of trace data being captured.
When working with distributed systems, understanding how different parts of your system communicate is crucial. Span links in OpenTelemetry play a vital role in connecting traces that may not have a direct parent-child relationship, providing a clearer picture of how requests flow through your services.
Why Visualizing Span Links Matters
Imagine your application has a payment service that triggers a notification service when a payment is successful. While these services interact, they might not share a direct parent-child relationship.
Span links allow you to relate these interactions, showing that the payment triggered the notification. Visualizing these links helps you see the bigger picture of how independent services correlate in real time.
By setting up SigNoz to visualize these span links, you can gain deeper insights into your application's behavior.
Here’s how you can configure SigNoz to capture and view span links:
Step 1: Ensure SigNoz is Properly Installed
Step 2: Configure OpenTelemetry SDK to include your Span Links
The next step is to ensure that your OpenTelemetry SDK is configured to send span link data to SigNoz. In your application's code, you’ll need to add span links as part of your tracing logic.
Here’s a code snippet in Python:
Explanation of the above Python code snippet
We start by configuring the TracerProvider to handle tracing in our application and setting up a span processor.
The OTLPSpanExporter is used to send spans to SigNoz’s backend using the OTLP protocol.
Replace http://localhost:4317 with the appropriate SigNoz endpoint if you are not running SigNoz locally.
Secondly, creating spans and links:
The parent_span is created first, representing an initial operation (e.g., a user request). We extract the context of the parent_span using get_span_context(), which allows us to link another span to it.
The linked_span is then created, and a Link is added to reference the parent_span’s context. This signifies that while the linked_span is not a direct child of parent_span, it is related to it.
Lastly, sending data to SigNoz:
The span data is processed using the BatchSpanProcessor, which ensures that span data is sent to SigNoz efficiently.
trace.get_tracer_provider().shutdown() is called at the end to ensure that any remaining spans are flushed and exported before the program exits.
Step 3: Update SigNoz Configuration for Tracing Data
Step 4: Visualize Span Links in SigNoz
Once your application is sending trace data with span links to SigNoz, you can visualize these links by:
Step 5: Adjust Your View for Clarity
For a clear visualization, take note of the following:
Next Steps
In this tutorial, we learned how to use span links to track interactions within distributed systems.
In the next tutorial, we will cover best practices for using span links and advanced use cases.