What Happened When We Switched to OpenTelemetry for Monitoring

We finally saw what our logs and metrics missed.

When you’re running microservices in Kubernetes, one truth hits hard sooner or later:

“Everything is working... until it’s not.”

And when it’s not, finding out why can be a painful game of guesswork. Logs are scattered across pods. Dashboards show CPU but not context. Traces? What traces?

We’ve been there, especially with one of our clients who runs a high-traffic fintech app on Kubernetes. That’s when we decided to go all-in on OpenTelemetry, and it changed the game.

The Problem: Performance Bottlenecks and Black Boxes

Let me paint you a picture.

A client calls up: “The app is slow again, but only sometimes. During peak hours.”

We check:

  • Logs? Nothing obvious.

  • Metrics? CPU’s fine, memory’s chill.

  • Kubernetes? All pods are healthy.

Still... slowness.

We finally realized we were missing the full picture. We could see symptoms, but not the path a request took through the system. There was no context, just clues.

So we introduced distributed tracing using OpenTelemetry, and the results were immediate.

Why Did We Move to OpenTelemetry?

We wanted standardized, vendor-neutral, full-stack observability without reinventing the wheel.

  • We started with Prometheus and Grafana, but what if we later want to switch to New Relic or Datadog? Because OpenTelemetry is vendor-neutral, routing everything through the OpenTelemetry Collector lets us swap backends as the stack grows without re-instrumenting our services.

  • Huge community support; in fact, it’s the second most active CNCF project after Kubernetes.

What OpenTelemetry Helped Us Discover

After implementing OpenTelemetry:

  • Discovered a slow database call buried three layers deep, responsible for 60% of the delay.

  • Detected an unlogged retry storm between two services.

  • Cut debugging time from hours to minutes.

    Traces provided a time-travel perspective of requests, changing our view from foggy CCTV footage to a clear security camera replay.

Wait, So What is OpenTelemetry?

OpenTelemetry (or OTel) is like a black box recorder for your software systems.

It collects three things:

  1. Metrics - Numbers over time (e.g., request duration, CPU usage)

  2. Logs - Timestamped text entries (e.g., errors, warnings)

  3. Traces - Request journey across services (e.g., auth → API → DB)

Think of it this way:

| Type    | Analogy            | Best For                 |
|---------|--------------------|--------------------------|
| Metrics | Heart rate monitor | Alerting & trends        |
| Logs    | Journal entries    | Debugging                |
| Traces  | Flight path map    | Performance bottlenecks  |
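
To make traces concrete: services end up on the same trace by propagating context between them, most commonly as a W3C traceparent HTTP header. A minimal sketch (the IDs are the example values from the W3C spec, and the URL is just a placeholder for the frontend we port-forward later):

# Sketch only: each service forwards this header so all spans share one trace ID
curl -s http://localhost:8080/ \
  -H 'traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'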

And the best part? It’s vendor-neutral. You can export data to Grafana, Jaeger, Tempo, Datadog, AWS X-Ray, whatever fits your stack.
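
Switching backends is mostly a matter of editing the exporters section of the Collector configuration. Here is a minimal sketch, not the demo’s actual config; the endpoints and the use of a contrib Collector image (needed for the prometheus exporter) are assumptions:

# Sketch of a Collector config: swap the exporters, keep the instrumentation
cat > otel-collector-config.yaml <<'EOF'
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}

processors:
  batch: {}

exporters:
  otlp/jaeger:        # replace with a Tempo, Datadog, or X-Ray exporter to switch vendors
    endpoint: jaeger-collector:4317
    tls:
      insecure: true
  prometheus:         # exposes application metrics for Prometheus to scrape
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
EOF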

Set up OpenTelemetry Step by Step

This example sets up a full OpenTelemetry observability stack, including the Astronomy Shop microservices, the OpenTelemetry Collector, a load generator, and integrations with Jaeger, Prometheus, and Grafana, all deployed with plain Kubernetes manifests.
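
If you would rather use Helm than the raw manifests below, the upstream demo also ships an official chart. These commands follow the official opentelemetry-helm-charts repo and are an alternative path, not the one used in the rest of this post:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm install my-otel-demo open-telemetry/opentelemetry-demo --namespace otel-demo --create-namespace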

Prerequisites

Ensure you have:

  • A running Kubernetes cluster (a single node on an EC2 instance is enough for the demo)

  • kubectl configured to talk to that cluster

  • git to clone the demo repository

Clone the Demo Repo

git clone https://github.com/ezyinfra/ezyinfra-blogs.git
cd ezyinfra-blogs/otel-demo

Deploy the Microservice Application

Create a Namespace

kubectl create namespace otel-demo

Switch to the newly created namespace

kubectl config set-context --current --namespace=otel-demo

# Check that we are working in the otel-demo namespace
kubectl get pods
kubectl get svc
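
(Optional) To double-check which namespace the current kubectl context points at:

kubectl config view --minify --output 'jsonpath={..namespace}'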


Apply all the manifest files in the folder

kubectl apply -f .


Access the dashboard using kubectl port-forward

kubectl port-forward svc/opentelemetry-demo-frontendproxy 8080:8080

If you are running this on an EC2 instance, the port-forward alone is not enough; you also need to tunnel the port to your local machine:

ssh -i /your/location/Downloads/otel-key.pem -L 8080:localhost:8080 ubuntu@<YOUR PUBLIC IP>
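
(As an alternative to the SSH tunnel, and not what we did here, kubectl can bind the forward to all interfaces so the instance's public IP serves it directly; if you do this, lock the port down to your own IP in the security group.)

kubectl port-forward svc/opentelemetry-demo-frontendproxy 8080:8080 --address 0.0.0.0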


Frontend (Astronomy Shop UI)

Now we can perform a few transactions in the UI: add some items to the cart and complete a purchase.


Next, we need to send the telemetry from the microservices to the OpenTelemetry Collector.

Install the OpenTelemetry Collector

This deployment pulls the Collector images; the Collector then receives the telemetry emitted by the microservices and forwards it on to Jaeger.

cd ../opentelemetry-collector

# Apply all the manifest files within the folder
kubectl apply -f .


After applying, check that the resources were created

kubectl get pods
kubectl get svc


Check the Collector logs to confirm the telemetry from the microservices is arriving:

kubectl logs opentelemetry-demo-otelcol-dd9fcfbd6-8x7w6   # substitute your own pod name from kubectl get pods
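
The pipeline itself (receivers, processors, exporters) lives in the Collector's ConfigMap. The ConfigMap name below is an assumption, so list the ConfigMaps first and adjust:

kubectl get configmaps
kubectl get configmap opentelemetry-demo-otelcol -o yaml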

Move on to the Jaeger Installation

Install Jaeger

Jaeger receives the trace data from the Collector and lets us search and visualize individual request traces.

# Move to the Jaeger folder
cd ../jaeger

# Install all the manifests within the jaeger folder
kubectl apply -f .


After applying, check that the resources were created

kubectl get pods
kubectl get svc


Access the Jaeger Dashboard at

http://localhost:8080/jaeger/
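
If the /jaeger/ route through the frontend proxy does not respond, you can also port-forward the Jaeger query service directly (the service name here is an assumption; confirm it with kubectl get svc) and open the UI on its default port 16686:

kubectl port-forward svc/opentelemetry-demo-jaeger-query 16686:16686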


Next, Move On to the Prometheus Installation

Install Prometheus

Prometheus captures the metrics of the microservice application.

# Go to the prometheus folder
cd ../prometheus

# Apply all the manifests within the folder
kubectl apply -f .

After applying, check that the resources were created

kubectl get pods
kubectl get svc

Access it via port forwarding

kubectl port-forward svc/opentelemetry-demo-prometheus-server 9090


Then check the Prometheus dashboard at

http://localhost:9090/


For AWS instances, tunnel the port the same way to access the dashboard from your local terminal

ssh -i /your/location/Downloads/otel-key.pem -L 9090:localhost:9090 ubuntu@<YOUR PUBLIC IP>


In the dashboard, check the Targets and Service Discovery pages to confirm that metrics are being scraped properly.
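
You can also sanity-check from the terminal with Prometheus' HTTP API (this assumes the 9090 port-forward above is still running); the up query lists the scrape targets and their state:

curl -s 'http://localhost:9090/api/v1/query?query=up'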

Now we can perform a few more transactions in the frontend application to generate fresh traces in Jaeger

# Access the Frontend dashboard
http://localhost:8080/

# Access the Jaeger dashboard
http://localhost:8080/jaeger
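
If you prefer generating traffic from the terminal instead of clicking through the UI (the demo's load generator also does this for you), a simple loop against the frontend works:

# Fire 20 requests at the frontend and print the HTTP status codes
for i in $(seq 1 20); do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/
done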


Finally, we can move on to installing the Grafana dashboards

Install Grafana

Grafana visualizes the dashboards we have pre-defined for the OpenTelemetry demo application; they are created automatically from the manifests inside this folder.

# Go to the grafana folder
cd ../grafana

# Apply all the manifests within the folder
kubectl apply -f .


Check that the resources were created

kubectl get pods
kubectl get svc

NOTE: All the port-forwards need to run in parallel, so keep a separate terminal session open for each of the frontend, Prometheus, and Grafana.

Port forward to access the Grafana dashboard

kubectl port-forward svc/demo-grafana 3000:80

On an AWS instance, tunnel the port from your local terminal as before

ssh -i /your/path/Downloads/otel-key.pem -L 3000:localhost:3000 ubuntu@<Your-PUBLIC-IP>

Log in to the dashboard with the default credentials

username: admin
password: admin (you will be asked to change it on first login)
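
Once the 3000 port-forward (or SSH tunnel) is up, a quick way to confirm Grafana is reachable before logging in is its health endpoint:

curl -s http://localhost:3000/api/health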

In the Explore section of the side navigation, select the Jaeger data source to check that the trace data is coming through properly.

We will be able to see the traces.

Then explore the dashboards we applied via the manifests and verify that they are pulling the data collected by the otel-collector.

Dashboards worth checking: Demo Dashboard, OpenTelemetry Collector, and OpenTelemetry Collector DataFlow.

Voilà, we have established the data flow: logs, metrics, and traces flowing from the microservices all the way to the Grafana dashboards.

Microservices ==> OTel Collector ==> Jaeger (traces) & Prometheus (metrics) ==> Grafana



EzyInfra.dev – Expert DevOps & Infrastructure consulting! We help you set up, optimize, and manage cloud (AWS, GCP) and Kubernetes infrastructure—efficiently and cost-effectively. Need a strategy? Get a free consultation now!


Want to discuss DevOps practices, infrastructure audits, or free consulting for your AWS cloud?

Prasanna would be glad to jump on a call.