Your Go endpoint takes 1.2 seconds. The handler looks straightforward. You add log.Printf with timestamps before and after each block, redeploy, and try to figure out where the time goes. In a single-service app with sequential logic, this sometimes works. Once you have goroutines, multiple services, and database calls fanning out in parallel, it doesn't. Timestamps in logs can't show you that two goroutines were contending on a mutex, or that a downstream service was slow only when called with a particular user ID.
Distributed tracing captures every operation inside a request as a span with precise timing and parent-child relationships. Instead of correlating log lines, you open a trace and see a waterfall showing which database query took 600ms, which service call blocked the critical path, and where your goroutines actually overlapped versus where they waited.
We'll start with common bottleneck patterns and the manual approach using OpenTelemetry, then look at how modern frameworks can eliminate most of the instrumentation work entirely.
A trace records everything that happens during a single request. Each unit of work becomes a span: an API handler, a SQL query, a service-to-service RPC, an outbound HTTP call. Spans nest to form a tree with start times, durations, and metadata like SQL query text or HTTP URLs.
Here's what a slow request looks like as a waterfall:
GET /orders/abc123 [1180ms]
├── orders.GetOrder [1175ms]
│ ├── DB: SELECT * FROM orders WHERE id = $1 [ 8ms]
│ ├── users.GetProfile [ 450ms]
│ │ └── DB: SELECT * FROM users WHERE id = $1 [ 445ms] ← slow query
│ ├── payments.GetStatus [ 380ms]
│ │ └── HTTP: GET stripe.com/v1/charges/ch_... [ 375ms]
│ └── shipping.GetTracking [ 330ms]
│ └── HTTP: GET fedex.com/track?id=... [ 325ms]
Without a trace, you know the endpoint took 1180ms. With a trace, you can see that the user lookup contains a database query taking 445ms (likely a missing index), and the payment and shipping lookups hit external APIs sequentially (375ms + 325ms). Fix the slow query and parallelize those external calls, and you're under 500ms.
Slow database queries are the most frequent cause of slow Go endpoints: usually a missing index, a full table scan, or an N+1 query pattern hiding inside a loop.
In a trace, this shows up as a single database span taking hundreds of milliseconds on a query that should be fast:
//encore:api public method=GET path=/orders
func ListUserOrders(ctx context.Context, p *ListParams) (*OrderList, error) {
// Fine with 1,000 rows. With 500,000 rows and no index on user_id: 800ms.
rows, err := db.Query(ctx, `
SELECT id, product, amount, created_at FROM orders
WHERE user_id = $1 ORDER BY created_at DESC LIMIT 20
`, p.UserID)
if err != nil {
return nil, err
}
defer rows.Close()
// ... scan rows
}
var db = sqldb.NewDatabase("orders", sqldb.DatabaseConfig{
Migrations: "./migrations",
})
Once you see the 800ms span in a trace, the fix is a migration:
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);
Same query, 800ms to under 5ms. You can't know which query to index unless you can see individual span durations.
N+1 is the pattern where you fetch a list, then run a separate query per item. In a trace, it appears as dozens of identical database spans stacked sequentially:
GET /orders [ 920ms]
├── DB: SELECT * FROM orders LIMIT 50 [ 6ms]
├── DB: SELECT name FROM users WHERE id = $1 [ 15ms]
├── DB: SELECT name FROM users WHERE id = $1 [ 14ms]
├── DB: SELECT name FROM users WHERE id = $1 [ 16ms]
│ ... (47 more)
The Go code producing this often looks harmless:
//encore:api public method=GET path=/orders
func ListOrders(ctx context.Context) (*OrderList, error) {
rows, err := db.Query(ctx, `SELECT id, user_id, amount FROM orders LIMIT 50`)
if err != nil {
return nil, err
}
defer rows.Close()
var orders []OrderWithUser
for rows.Next() {
var o Order
if err := rows.Scan(&o.ID, &o.UserID, &o.Amount); err != nil {
return nil, err
}
var name string
err := db.QueryRow(ctx, `SELECT name FROM users WHERE id = $1`, o.UserID).Scan(&name)
if err != nil {
return nil, err
}
orders = append(orders, OrderWithUser{Order: o, UserName: name})
}
return &OrderList{Orders: orders}, nil
}
The fix is a JOIN:
rows, err := db.Query(ctx, `
SELECT o.id, o.user_id, o.amount, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
ORDER BY o.created_at DESC LIMIT 50
`)
One query instead of 51. The trace goes from a wall of sequential database spans to a single span under 20ms.
When your handler calls multiple external services one after another, the latencies stack. In a trace, you see spans lined up vertically with no overlap:
GET /checkout/summary [1050ms]
├── HTTP: POST stripe.com/v1/payment_intents [ 410ms]
├── HTTP: POST api.taxjar.com/v2/taxes [ 370ms]
├── HTTP: POST api.shipengine.com/v1/rates [ 260ms]
None of these calls depend on each other's result. The sequential code:
//encore:api auth method=GET path=/checkout/summary
func GetSummary(ctx context.Context) (*CheckoutSummary, error) {
payment, err := stripe.CreatePaymentIntent(ctx, params)
if err != nil {
return nil, err
}
tax, err := taxjar.CalculateTax(ctx, taxParams)
if err != nil {
return nil, err
}
shipping, err := shipengine.GetRates(ctx, rateParams)
if err != nil {
return nil, err
}
return &CheckoutSummary{Payment: payment, Tax: tax, Shipping: shipping}, nil
}
Run them in parallel with errgroup (golang.org/x/sync/errgroup):
//encore:api auth method=GET path=/checkout/summary
func GetSummary(ctx context.Context) (*CheckoutSummary, error) {
var (
payment *PaymentIntent
tax *TaxResult
shipping *ShippingRates
)
g, ctx := errgroup.WithContext(ctx)
g.Go(func() error {
var err error
payment, err = stripe.CreatePaymentIntent(ctx, params)
return err
})
g.Go(func() error {
var err error
tax, err = taxjar.CalculateTax(ctx, taxParams)
return err
})
g.Go(func() error {
var err error
shipping, err = shipengine.GetRates(ctx, rateParams)
return err
})
if err := g.Wait(); err != nil {
return nil, err
}
return &CheckoutSummary{Payment: payment, Tax: tax, Shipping: shipping}, nil
}
Total time drops from the sum of all calls (~1050ms) to the slowest one (~410ms). errgroup also propagates cancellation through the context: if one call fails, the others get cancelled.
The same pattern happens with internal services. With Encore, service-to-service calls look like normal function calls, which makes it easy to overlook that each involves a network round trip:
import (
"encore.app/users"
"encore.app/inventory"
"encore.app/pricing"
)
//encore:api auth method=POST path=/orders/preview
func Preview(ctx context.Context, req *PreviewRequest) (*OrderPreview, error) {
user, err := users.GetProfile(ctx, req.UserID)
if err != nil {
return nil, err
}
stock, err := inventory.Check(ctx, &inventory.CheckParams{SKU: req.SKU})
if err != nil {
return nil, err
}
price, err := pricing.Calculate(ctx, &pricing.CalcParams{SKU: req.SKU, Region: user.Region})
if err != nil {
return nil, err
}
return &OrderPreview{User: user, Stock: stock, Price: price}, nil
}
The pricing call depends on the user's region, so you can't parallelize everything. But the inventory check is independent:
//encore:api auth method=POST path=/orders/preview
func Preview(ctx context.Context, req *PreviewRequest) (*OrderPreview, error) {
var (
user *users.Profile
stock *inventory.StockStatus
)
g, ctx := errgroup.WithContext(ctx)
g.Go(func() error {
var err error
user, err = users.GetProfile(ctx, req.UserID)
return err
})
g.Go(func() error {
var err error
stock, err = inventory.Check(ctx, &inventory.CheckParams{SKU: req.SKU})
return err
})
if err := g.Wait(); err != nil {
return nil, err
}
// pricing depends on user.Region, so it runs after the user lookup
price, err := pricing.Calculate(ctx, &pricing.CalcParams{SKU: req.SKU, Region: user.Region})
if err != nil {
return nil, err
}
return &OrderPreview{User: user, Stock: stock, Price: price}, nil
}
Total time drops from ~620ms to ~400ms. The dependency graph in the trace waterfall makes these optimization opportunities visible at a glance.
Go makes concurrency easy to start and hard to get right. Multiple goroutines competing for the same lock, connection pool slot, or shared resource add latency that shows up in traces as gaps: time inside a span that isn't accounted for by any child span.
Consider a handler using a shared cache protected by sync.Mutex:
var (
cacheMu sync.Mutex
cache = make(map[string]*Product)
)
//encore:api public method=GET path=/products/:id
func GetProduct(ctx context.Context, id string) (*Product, error) {
cacheMu.Lock()
if p, ok := cache[id]; ok {
cacheMu.Unlock()
return p, nil
}
cacheMu.Unlock()
p, err := loadProduct(ctx, id)
if err != nil {
return nil, err
}
cacheMu.Lock()
cache[id] = p
cacheMu.Unlock()
return p, nil
}
Under low load, the lock adds negligible time. Under high concurrency, goroutines queue up. In a trace, the GetProduct span might take 200ms while the database query inside takes 5ms. The remaining 195ms is lock wait time that won't appear as a child span.
For a read-heavy cache, switch to sync.RWMutex so readers don't block each other. For more complex patterns, use sync.Map or move to a proper cache layer like Encore's caching primitives. The trace gives you the signal: if total span time far exceeds the sum of child span times, something is blocking the goroutine between operations.
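For illustration, here's a minimal sketch of the RWMutex variant; the Product type and the load function are hypothetical stand-ins for your real types and database lookup. Reads take the shared lock, so concurrent readers no longer queue; only a miss takes the write lock. Note that two goroutines missing on the same key can still both run the load; golang.org/x/sync/singleflight dedupes that if it matters:

```go
import "sync"

type Product struct {
	ID   string
	Name string
}

type productCache struct {
	mu    sync.RWMutex
	items map[string]*Product
	load  func(id string) (*Product, error) // stand-in for the DB lookup
}

func (c *productCache) Get(id string) (*Product, error) {
	// Fast path: shared lock, so concurrent readers don't block each other.
	c.mu.RLock()
	p, ok := c.items[id]
	c.mu.RUnlock()
	if ok {
		return p, nil
	}
	// Miss: load outside any lock, then take the write lock only to store.
	p, err := c.load(id)
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.items[id] = p
	c.mu.Unlock()
	return p, nil
}
```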
You can combine tracing with Go's built-in profiling here. The runtime/pprof package and go tool pprof show mutex contention profiles and goroutine blocking. Tracing tells you which requests are affected; pprof tells you where the lock is.
OpenTelemetry Go is the standard for distributed tracing (see our full OpenTelemetry Go setup guide). Setup involves a tracer provider, an exporter, and auto-instrumentation for your HTTP server and database driver:
import (
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
"go.opentelemetry.io/otel/sdk/resource"
semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)
func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
exporter, err := otlptracehttp.New(ctx,
otlptracehttp.WithEndpoint("localhost:4318"),
otlptracehttp.WithInsecure(),
)
if err != nil {
return nil, err
}
tp := sdktrace.NewTracerProvider(
sdktrace.WithBatcher(exporter),
sdktrace.WithResource(resource.NewWithAttributes(
semconv.SchemaURL,
semconv.ServiceNameKey.String("orders-service"),
)),
)
otel.SetTracerProvider(tp)
return tp, nil
}
You also need instrumentation wrappers for your HTTP handler (otelhttp.NewHandler) and database driver (otelsql.Open). Service-to-service calls still need manual context propagation: extract the trace context from incoming requests and inject it into outgoing ones using otelhttp.NewTransport(). Miss one call, and that service disappears from the trace.
The ecosystem is mature and well-documented. The trade-off is setup overhead and the discipline of keeping instrumentation consistent as services multiply.
Encore.go instruments everything automatically. Every API call, database query, service-to-service call, and Pub/Sub message produces a span without any instrumentation code:
package orders
import (
"context"
"encore.app/users"
"encore.dev/storage/sqldb"
)
//encore:api public method=GET path=/orders/:id
func GetOrder(ctx context.Context, id string) (*OrderDetail, error) {
	var order Order
	err := db.QueryRow(ctx, `SELECT * FROM orders WHERE id = $1`, id).Scan( /* ... */ )
	if err != nil {
		return nil, err
	}
user, err := users.GetProfile(ctx, order.UserID)
if err != nil {
return nil, err
}
return &OrderDetail{Order: order, User: user}, nil
}
var db = sqldb.NewDatabase("orders", sqldb.DatabaseConfig{
Migrations: "./migrations",
})
Run encore run and open the local development dashboard at localhost:9400. Every request shows a full trace with timing breakdowns, SQL queries, and request/response data. Everything is instrumented and collected automatically. Because Encore controls the RPC layer and database connections, context propagation happens behind the scenes, and every operation is always captured in the trace.
Start with total duration. If a request took 1200ms and your target is 200ms, you need to find 1000ms of savings. That focuses the search.
Follow the critical path. The critical path is the longest chain of sequential operations. Parallel spans don't add to total time; only the slowest branch counts. In a trace where three goroutines run service calls taking 100ms, 300ms, and 200ms, the critical path through that section is 300ms.
Look for gaps. Time not accounted for by child spans means the goroutine was doing CPU work, waiting on a lock, blocked on a channel, or waiting for a connection pool slot. In Go, gaps often point to mutex contention or connection pool exhaustion. Cross-reference with go tool pprof if the gap is large.
Check span metadata. Database spans should show the SQL query text. HTTP spans show the URL, method, and status code. In Encore, service call spans include request and response payloads. This tells you not just that something was slow, but what data triggered the slow path.
Compare fast and slow traces. If the median request takes 50ms and a p99 takes 2 seconds, open both side by side. The difference usually jumps out: a cache miss, a different query plan, a lock contention spike, or an external API timeout.
The worst performance problems hide in the tail. An endpoint averaging 40ms might have a p99 of 3 seconds. A few patterns show up again and again in tail traces:
Bimodal latency. Request durations cluster around 30ms and 900ms with nothing in between. Two code paths: cache hit (fast) and cache miss or table scan (slow). The trace for a slow request shows exactly which path it took.
Gradual degradation. p99 creeps up over weeks while p50 stays flat. Usually data growth. A query fast on 10,000 rows degrades at 1,000,000. Add the index, paginate, or archive old data.
Intermittent spikes. Random slow requests from external dependencies. A payment provider at 150ms normally but 4 seconds during load spikes. Add timeouts with context.WithTimeout and consider gobreaker for circuit breaking.
Connection pool exhaustion. Goroutines waiting for a free database connection add latency that appears as gaps before database spans. Tune SetMaxOpenConns and check sql.DB stats with db.Stats().
With Encore, traces are available from the first encore run. The local development dashboard at localhost:9400 shows every request as a clickable trace. In production with Encore Cloud, traces are collected across all environments with filtering by endpoint, latency percentile, status code, or time range.
The alternative is adding log.Printf("step X took %v", time.Since(start)) statements, redeploying, reproducing the issue, reading logs, deciding you need more timestamps, redeploying again. Tracing replaces that entire cycle with a single waterfall view that's always there.