
Observability

Transform Platform ships a full observability stack out of the box. A single docker compose up gives you structured logs, distributed traces, and real-time metrics, all correlated by the same traceId.


The Three Pillars

Every log line, every trace span, and every metric tag carries the same identifiers so you can jump between tools without losing context.


Signal Flow: How Everything Connects


Component Responsibilities

App: what it emits

| Signal | Library | Transport | Destination |
|--------|---------|-----------|-------------|
| Traces | micrometer-tracing-bridge-otel + opentelemetry-exporter-otlp | OTLP/HTTP to :4318 | OTel Collector |
| Metrics | micrometer-registry-prometheus | Actuator pull at /actuator/prometheus | Prometheus (scrapes every 15s) |
| Logs | logstash-logback-encoder via Logback FILE appender | Structured JSON file + OTel log exporter | OTel Collector → Elasticsearch |

Every HTTP request gets a correlationId injected into MDC by CorrelationIdFilter. Spring's OTel bridge then automatically adds the active traceId and spanId to MDC as well, so every log line carries all three identifiers.
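The header-or-generate rule can be sketched in a few lines. This is a minimal, hypothetical sketch of the logic, not the actual CorrelationIdFilter source; it assumes a blank header is treated the same as a missing one. The real filter additionally puts the resolved value into SLF4J's MDC for the duration of the request so Logback can attach it to every log line.

```kotlin
import java.util.UUID

// Sketch: reuse an incoming X-Correlation-ID header if present,
// otherwise mint a fresh UUID for this request.
// (Assumption: blank header values count as missing.)
fun resolveCorrelationId(headers: Map<String, String>): String =
    headers["X-Correlation-ID"]?.takeIf { it.isNotBlank() }
        ?: UUID.randomUUID().toString()
```

A caller passing mapOf("X-Correlation-ID" to "req-7f3a9c12") gets that value back unchanged, while an empty header map yields a freshly generated UUID.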

OTel Collector: the routing hub

The Collector (:4318) receives both traces and logs from the app over OTLP/HTTP. It:

  • Routes traces → Jaeger via OTLP gRPC (:14250)
  • Routes logs → Elasticsearch via HTTP bulk API (:9200)
  • Exposes its own metrics at :8889 (scraped by Prometheus)

Config: .docker/otel-collector-config.yaml

Prometheus: metrics store

Prometheus pulls metrics from two sources every 15 seconds:

  1. /actuator/prometheus on the app β€” all JVM + custom business metrics
  2. :8889 on the OTel Collector β€” collector pipeline metrics

Config: .docker/prometheus.yml

Grafana: unified dashboard

Grafana connects to both Prometheus and Jaeger as datasources (auto-provisioned at startup). The pre-built Transform Platform dashboard correlates metrics and traces in one view.

Config: .docker/grafana/provisioning/

Elasticsearch + Kibana: log store

Structured JSON logs are forwarded from the OTel Collector to Elasticsearch. Kibana provides the search and visualisation layer. First-time setup requires creating a data view with pattern transform-platform-*.


Trace Lifecycle: One Request End to End

Every span in Jaeger links back to the same traceId present in every log line. This means you can:

  1. Find an error in Kibana → copy traceId
  2. Paste into Jaeger search → see exactly which span failed and how long each step took
  3. Switch to Grafana → view the metrics spike that coincided with the error

Metrics Reference

All custom metrics are registered in TransformMetrics.kt using Micrometer. Tags enable per-spec filtering in Prometheus and Grafana.
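Conceptually, a tagged (dimensional) meter is identified by its name plus its tag set, which is why a specId tag makes transform_records_failed_total{specId="abc-123"} and {specId="def-456"} separate time series that Prometheus can filter or aggregate. The following is a stdlib-only model of that identity semantics, not the Micrometer API; in the real TransformMetrics.kt this is what Micrometer's Counter.builder(...).tag(...) provides.

```kotlin
// Model of dimensional metrics: a counter series is keyed by
// name + tags, so incrementing with different tag values creates
// distinct series that can later be summed or filtered.
data class MeterId(val name: String, val tags: Map<String, String>)

class CounterRegistry {
    private val counters = mutableMapOf<MeterId, Long>()

    fun increment(name: String, vararg tags: Pair<String, String>) {
        counters.merge(MeterId(name, tags.toMap()), 1L, Long::plus)
    }

    // Sum over all series with this name, optionally narrowed by tags --
    // roughly what `sum by(specId)(...)` does on the PromQL side.
    fun total(name: String, tagFilter: Map<String, String> = emptyMap()): Long =
        counters.filterKeys { id ->
            id.name == name && tagFilter.all { (k, v) -> id.tags[k] == v }
        }.values.sum()
}
```

With two increments tagged specId=abc-123 and one tagged specId=def-456, the unfiltered total is 3 while the abc-123 total is 2; that per-tag breakdown is exactly what the PromQL queries below rely on.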

Key PromQL queries

# Throughput: records processed per minute (last 5m)
rate(transform_records_processed_total[5m]) * 60

# Error rate: failed as a percentage of total
rate(transform_records_failed_total[5m])
/ rate(transform_records_processed_total[5m])
* 100

# p99 processing time per spec
histogram_quantile(0.99,
  sum by (specId, le) (
    rate(transform_file_processing_duration_seconds_bucket[5m])
  )
)

# Specs with the most failures in the last hour
topk(5,
  sum by (specId) (increase(transform_records_failed_total[1h]))
)
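For intuition on the p99 query: histogram_quantile works on cumulative buckets (le → count), finds the bucket whose cumulative count crosses the target rank, and interpolates linearly inside it. Below is a simplified sketch of that idea, not Prometheus's exact implementation (it ignores +Inf bucket handling and NaN edge cases).

```kotlin
// Simplified histogram_quantile: buckets are (le, cumulativeCount)
// pairs; we locate the bucket that crosses rank = q * total and
// interpolate linearly within it.
fun histogramQuantile(q: Double, buckets: List<Pair<Double, Double>>): Double {
    val sorted = buckets.sortedBy { it.first }
    val total = sorted.last().second
    val rank = q * total
    var prevLe = 0.0
    var prevCount = 0.0
    for ((le, count) in sorted) {
        if (count >= rank) {
            val inBucket = count - prevCount
            return if (inBucket == 0.0) le
                   else prevLe + (le - prevLe) * (rank - prevCount) / inBucket
        }
        prevLe = le
        prevCount = count
    }
    return sorted.last().first
}
```

For example, with buckets (0.1 → 50), (0.5 → 90), (1.0 → 100), the p99 rank is 99, which falls in the last bucket: 0.5 + 0.5 × (99 − 90) / 10 = 0.95s. This is also why bucket boundaries cap quantile precision.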

Log Structure

Every log line is emitted as JSON (when dev-text profile is not active). Fields injected automatically:

{
  "@timestamp": "2025-01-15T14:32:01.123Z",
  "level": "ERROR",
  "logger_name": "com.transformplatform.core.pipeline.TransformationPipeline",
  "message": "Pipeline execution failed for spec=abc-123",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spanId": "00f067aa0ba902b7",
  "correlationId": "req-7f3a9c12",
  "thread_name": "virtual-23",
  "stack_trace": "com.transformplatform..."
}

correlationId is set per-request by CorrelationIdFilter (from X-Correlation-ID header or auto-generated UUID). traceId and spanId are set automatically by Spring's OTel bridge from the active trace context.


Developer Workflow: Debugging a Failed Transform

This is the recommended workflow when a transform job produces unexpected results or errors.


Accessing the Tools

| Tool | URL | First steps |
|------|-----|-------------|
| Swagger UI | http://localhost:8080/swagger-ui | Open in browser; no setup |
| Actuator health | http://localhost:8080/actuator/health | Open in browser or Postman |
| Kafka UI | http://localhost:8090 | Open in browser; browse topics and messages |
| Prometheus | http://localhost:9090 | Paste PromQL into the Expression bar → Execute |
| Grafana | http://localhost:3001 | Log in with admin/admin → Dashboards → Transform Platform |
| Jaeger | http://localhost:16686 | Select service transform-platform → Find Traces |
| Kibana | http://localhost:5601 | First visit: create data view transform-platform-* |

See .docker/README.md for detailed connection instructions, credentials, CLI commands, and troubleshooting.


Configuration Files

| File | Purpose |
|------|---------|
| .docker/docker-compose.yml | Defines all services, ports, and volumes |
| .docker/otel-collector-config.yaml | OTel Collector pipeline: receivers, processors, exporters |
| .docker/prometheus.yml | Prometheus scrape config: targets and intervals |
| .docker/grafana/provisioning/datasources.yml | Auto-provisions Prometheus + Jaeger datasources |
| .docker/grafana/dashboards/transform-platform.json | Pre-built Grafana dashboard |
| platform-api/src/main/resources/logback-spring.xml | Logback config: JSON appender + rolling file |
| platform-api/src/main/resources/application.yml | OTLP exporter endpoint, tracing sample rate |
| platform-api/src/main/kotlin/.../metrics/TransformMetrics.kt | All custom Micrometer meters |
| platform-api/src/main/kotlin/.../filter/CorrelationIdFilter.kt | Injects correlationId into MDC per request |
| platform-api/src/main/kotlin/.../config/ObservabilityConfig.kt | Global meter tags: service name, environment |