Your endpoint takes 1200ms. The code looks fine. You add console.log(Date.now()) in a few places, redeploy, and try to piece together where the time goes. In a single-service app, you might get lucky. In a system with multiple services, database calls, and external APIs, you won't.
Distributed tracing solves this by recording every operation inside a request as a span, with precise timing and nesting. Instead of guessing, you open a trace and see a waterfall showing exactly which database query took 800ms, which service call blocked the response, and which external API added latency.
This guide covers the most common causes of slow API requests and shows you how to find them with tracing. We'll start with the patterns to look for, then cover both the manual OpenTelemetry approach and how modern frameworks can give you this visibility automatically.
A trace captures everything that happens during a single request. Each operation becomes a span: an API call, a database query, a service-to-service call, an HTTP request to an external provider. Spans nest inside each other and record start time, duration, and metadata.
Here's what a slow request looks like as a waterfall:
GET /orders/123 [1180ms]
├── orders.get [1175ms]
│   ├── DB: SELECT * FROM orders WHERE id = $1 [ 12ms]
│   ├── users.getProfile [ 450ms]
│   │   └── DB: SELECT * FROM users WHERE id = $1 [ 445ms] ← slow query
│   ├── payments.getStatus [ 380ms]
│   │   └── HTTP: GET stripe.com/v1/charges/ch_... [ 375ms]
│   └── shipping.getTracking [ 330ms]
│       └── HTTP: GET fedex.com/track?id=... [ 325ms]
Without a trace, you'd know only that the endpoint took 1180ms. With a trace, you can see the problems at a glance: the user lookup hides a slow database query (445ms), and the payment and shipping lookups call external APIs sequentially (375ms + 325ms). Fix the query and parallelize the external calls, and you're under 500ms.
Slow database queries are the most frequent cause of slow endpoints. The culprit is usually a missing index, a full table scan, or an N+1 query pattern.
In a trace, this shows up as a single database span taking hundreds of milliseconds on a query that should be fast. The query itself looks reasonable, but the database is scanning every row.
// This query runs fine with 100 rows.
// With 500,000 rows and no index on user_id, it takes 800ms.
const orders = await db.query<Order>`
  SELECT * FROM orders WHERE user_id = ${userId}
  ORDER BY created_at DESC
  LIMIT 20
`;
The fix is straightforward once you know which query is slow:
-- Add an index covering the WHERE and ORDER BY columns
CREATE INDEX idx_orders_user_id_created ON orders (user_id, created_at DESC);
After adding the index, the same query drops from 800ms to 2ms. You wouldn't know which query to index without seeing it in the trace.
N+1 is the pattern where you fetch a list, then run a separate query for each item. In a trace, it shows up as dozens of identical database spans stacked sequentially.
GET /orders [920ms]
├── DB: SELECT * FROM orders LIMIT 50 [ 8ms]
├── DB: SELECT * FROM users WHERE id = $1 [ 15ms]
├── DB: SELECT * FROM users WHERE id = $1 [ 14ms]
├── DB: SELECT * FROM users WHERE id = $1 [ 16ms]
│ ... (47 more)
The code doing this often looks harmless:
export const listOrders = api(
  { expose: true, method: "GET", path: "/orders" },
  async (): Promise<{ orders: OrderWithUser[] }> => {
    const rows = db.query<Order>`SELECT * FROM orders LIMIT 50`;
    const orders: OrderWithUser[] = [];
    for await (const order of rows) {
      // This runs a separate query for each order
      const user = await db.queryRow<User>`
        SELECT name, email FROM users WHERE id = ${order.userId}
      `;
      orders.push({ ...order, userName: user!.name });
    }
    return { orders };
  }
);
The fix is a JOIN or a batched IN query:
export const listOrders = api(
  { expose: true, method: "GET", path: "/orders" },
  async (): Promise<{ orders: OrderWithUser[] }> => {
    const rows = db.query<OrderWithUser>`
      SELECT o.*, u.name as "userName", u.email as "userEmail"
      FROM orders o
      JOIN users u ON u.id = o.user_id
      ORDER BY o.created_at DESC
      LIMIT 50
    `;
    const orders: OrderWithUser[] = [];
    for await (const row of rows) {
      orders.push(row);
    }
    return { orders };
  }
);
One query instead of 51. The trace goes from a wall of database spans to a single 15ms span.
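When the second lookup lives in another service or database and a JOIN isn't available, the batched variant works too: collect the ids, resolve them all in one round trip, and join in memory. A minimal sketch; `fetchUsersByIds` is a hypothetical helper standing in for the single `WHERE id = ANY($1)` query:

```typescript
// Sketch of the batched-IN alternative: one query for the orders, one query
// for all referenced users, then an in-memory join. The db calls below are
// shown as comments because client APIs vary; only the helper is concrete.

// Index rows by id so the in-memory join is an O(1) lookup per order.
function indexById<T extends { id: number }>(rows: T[]): Map<number, T> {
  const byId = new Map<number, T>();
  for (const row of rows) {
    byId.set(row.id, row);
  }
  return byId;
}

// Inside the endpoint (illustrative; fetchUsersByIds is hypothetical):
//   const ids = [...new Set(orders.map((o) => o.userId))];
//   const users: User[] = await fetchUsersByIds(ids); // WHERE id = ANY($1)
//   const byId = indexById(users);
//   return orders.map((o) => ({ ...o, userName: byId.get(o.userId)?.name }));
```

Either way, the trace tells the same story: 51 sequential database spans collapse into two.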
When your endpoint calls multiple external services one after another, the latencies add up. In a trace, you'll see the spans stacked vertically with no overlap:
GET /checkout/summary [1100ms]
├── HTTP: POST stripe.com/v1/payment_intents [ 420ms]
├── HTTP: POST api.taxjar.com/v2/taxes [ 380ms]
└── HTTP: POST api.shipengine.com/v1/rates [ 290ms]
Each call waits for the previous one to finish, but none of them depend on each other's result. The code typically looks like this:
export const checkoutSummary = api(
  { expose: true, auth: true, method: "GET", path: "/checkout/summary" },
  async (req: CheckoutRequest): Promise<CheckoutSummary> => {
    const payment = await stripe.paymentIntents.create({ ... });
    const tax = await taxjar.taxForOrder({ ... });
    const shipping = await shipengine.getRates({ ... });
    return { payment, tax, shipping };
  }
);
Run them in parallel with Promise.all:
export const checkoutSummary = api(
  { expose: true, auth: true, method: "GET", path: "/checkout/summary" },
  async (req: CheckoutRequest): Promise<CheckoutSummary> => {
    const [payment, tax, shipping] = await Promise.all([
      stripe.paymentIntents.create({ ... }),
      taxjar.taxForOrder({ ... }),
      shipengine.getRates({ ... }),
    ]);
    return { payment, tax, shipping };
  }
);
The trace now shows overlapping spans. Total time drops from the sum of all calls (~1100ms) to the duration of the slowest one (~420ms).
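One caveat before reaching for Promise.all everywhere: it's all-or-nothing, rejecting the moment any input rejects. When a degraded partial response is acceptable, Promise.allSettled keeps the surviving results. A sketch (the `gather` helper is an illustrative name, not a library API):

```typescript
// Sketch: Promise.all fails the whole request if one dependency fails.
// Promise.allSettled returns every outcome, so the caller can render a
// partial summary when, say, the shipping-rate provider is down.
async function gather<T>(tasks: Promise<T>[]): Promise<(T | null)[]> {
  const settled = await Promise.allSettled(tasks);
  // Map failures to null so the caller can decide how to degrade.
  return settled.map((r) => (r.status === "fulfilled" ? r.value : null));
}
```

Whether that trade-off is right depends on the endpoint; a checkout summary may tolerate a missing shipping estimate, while a payment confirmation cannot.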
The same pattern happens with internal service calls. If your order endpoint calls the user service, then the inventory service, then the pricing service, you're adding their latencies together.
GET /orders/preview [ 650ms]
├── users.getProfile [ 120ms]
├── inventory.check [ 280ms]
└── pricing.calculate [ 240ms]
With Encore.ts, service calls look like regular function calls, which makes it easy to overlook that they involve network round trips:
import { users, inventory, pricing } from "~encore/clients";
export const preview = api(
  { expose: true, auth: true, method: "POST", path: "/orders/preview" },
  async (req: PreviewRequest): Promise<OrderPreview> => {
    const user = await users.getProfile({ id: req.userId });
    const stock = await inventory.check({ sku: req.sku });
    const price = await pricing.calculate({ sku: req.sku, region: user.region });
    return { user, stock, price };
  }
);
Here, the pricing call depends on the user's region, so you can't parallelize everything. But the inventory check is independent:
export const preview = api(
  { expose: true, auth: true, method: "POST", path: "/orders/preview" },
  async (req: PreviewRequest): Promise<OrderPreview> => {
    // Start the inventory check immediately (doesn't depend on user data)
    const stockPromise = inventory.check({ sku: req.sku });

    // Get the user profile (needed for pricing)
    const user = await users.getProfile({ id: req.userId });

    // Now run pricing (needs user.region) and await inventory
    const [stock, price] = await Promise.all([
      stockPromise,
      pricing.calculate({ sku: req.sku, region: user.region }),
    ]);
    return { user, stock, price };
  }
);
The trace shows that the inventory check and user lookup now overlap, and pricing runs in parallel with the inventory await. Total time drops from ~650ms to ~360ms.
Cache misses are sneaky because your endpoint is fast most of the time. Then a cache entry expires, and the request hits the slow path. In a trace, this shows up as an occasional spike where a cache lookup returns empty and triggers a database query or computation.
GET /products/456 (p99) [ 580ms]
├── Cache: GET product:456 [ 1ms] ← miss
├── DB: SELECT * FROM products WHERE id = $1 [ 5ms]
├── DB: SELECT * FROM product_images WHERE ... [ 12ms]
├── pricing.calculate [ 350ms]
│   ├── DB: SELECT * FROM pricing_rules ... [ 180ms]
│   └── DB: SELECT * FROM discounts WHERE ... [ 165ms]
└── Cache: SET product:456 [ 2ms]
The median response is 3ms (cache hit). The p99 is 580ms because the pricing calculation involves two slow queries. The fix depends on the situation: you might warm the cache proactively, extend TTLs, or optimize the cold path itself. The trace shows you which cold path is the problem.
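A related failure mode worth knowing about: when a hot entry expires, every concurrent request misses at once and rebuilds the same value (a cache stampede). A sketch of request coalescing that keeps the cold path to one execution per key, assuming a single-process server; the names are illustrative:

```typescript
// Sketch: coalesce concurrent cache-miss rebuilds. The first miss starts the
// rebuild; concurrent requests for the same key await the same promise
// instead of each running the expensive cold path.
const inFlight = new Map<string, Promise<unknown>>();

async function coalesce<T>(key: string, build: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;
  // Remove the entry once the rebuild settles, success or failure.
  const p = build().finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}

// Usage (illustrative): only the first miss pays the 580ms cold path.
// const detail = await coalesce(`product:${id}`, () => buildProductDetail(id));
```

In a multi-instance deployment you'd need a distributed lock or a cache-level primitive instead, but the trace signature is the same: a burst of identical slow cold paths at the moment a TTL expires.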
With Encore's caching primitives, the pattern looks like this:
import { CacheCluster, CacheKeyspace } from "encore.dev/storage/cache";
const cluster = new CacheCluster("product-cache", {
  evictionPolicy: "allkeys-lru",
});

const productCache = new CacheKeyspace<string, ProductDetail>(cluster, {
  keyPattern: "product/:id",
  defaultExpiry: { ttl: 300 }, // 5 minutes
});
export const getProduct = api(
  { expose: true, method: "GET", path: "/products/:id" },
  async ({ id }: { id: string }): Promise<ProductDetail> => {
    // Check cache first
    const cached = await productCache.get(id);
    if (cached) return cached;

    // Cache miss: build the full product detail
    const product = await db.queryRow<Product>`
      SELECT * FROM products WHERE id = ${id}
    `;
    if (!product) throw APIError.notFound("product not found");

    const price = await pricing.calculate({ sku: product.sku });
    const detail: ProductDetail = { ...product, price };
    await productCache.set(id, detail);
    return detail;
  }
);
When you see p99 latency spikes in your traces, filter by slow requests first. If the slow ones all show cache misses triggering the same expensive path, that's your target.
Here's where tooling choices matter. Distributed tracing requires instrumenting every database call, HTTP request, and service call. You can do this manually with OpenTelemetry, or use a framework that handles it automatically.
OpenTelemetry is the industry standard for distributed tracing. Setting it up in a Node.js application involves installing several packages, configuring exporters, and adding instrumentation:
// tracing.ts (must be imported before anything else)
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { Resource } from "@opentelemetry/resources";
const sdk = new NodeSDK({
  resource: new Resource({ "service.name": "my-api" }),
  traceExporter: new OTLPTraceExporter({
    url: "https://your-collector:4318/v1/traces",
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      "@opentelemetry/instrumentation-pg": { enabled: true },
      "@opentelemetry/instrumentation-http": { enabled: true },
    }),
  ],
});

sdk.start();
This captures HTTP and PostgreSQL spans automatically. But service-to-service calls, custom business logic spans, and framework-specific operations still need manual instrumentation. You also need a collector (Jaeger, Zipkin, or a managed service like Datadog or Honeycomb) to store and visualize traces.
It works, and the ecosystem is mature. The trade-off is setup time, maintenance, and the risk that some operations go uninstrumented because someone forgot to add a span.
Encore.ts instruments everything automatically. Every API call, database query, service-to-service call, Pub/Sub message, and cache operation becomes a span, with no configuration required.
import { api } from "encore.dev/api";
import { db } from "./db";
import { users } from "~encore/clients";
export const getOrder = api(
  { expose: true, method: "GET", path: "/orders/:id" },
  async ({ id }: { id: string }): Promise<OrderDetail> => {
    // Each of these automatically becomes a traced span
    const order = await db.queryRow<Order>`
      SELECT * FROM orders WHERE id = ${id}
    `;
    const user = await users.getProfile({ id: order!.userId });
    return { ...order!, user };
  }
);
Run encore run and open the local dashboard at localhost:9400. Every request shows a full trace with timing breakdowns. No collector to configure, no exporter to set up, no packages to install.
This is the difference that matters when debugging a production issue at 2 AM. You don't need to wonder if the slow operation was instrumented. Everything is.
Knowing how to read a trace is as important as having one. Here's what to look for.
Start with total duration. If a request took 1200ms and your SLA is 500ms, you need to find 700ms of savings.
Follow the critical path. The critical path is the longest chain of sequential operations. Parallel operations don't add to total time; only the slowest branch counts. In a trace with three parallel service calls taking 100ms, 300ms, and 200ms, the critical path through that section is 300ms.
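That rule can be made concrete: merge the span intervals and measure the time they actually cover. A small sketch, with the span shape simplified to start/end timestamps in milliseconds:

```typescript
// Sketch: wall-clock contribution of a set of sibling spans. Sequential
// spans add up; overlapping (parallel) spans are only counted once.
function wallClock(spans: { start: number; end: number }[]): number {
  const sorted = [...spans].sort((a, b) => a.start - b.start);
  let total = 0;
  let cursor = -Infinity; // end of the time range already counted
  for (const s of sorted) {
    const from = Math.max(s.start, cursor);
    if (s.end > from) total += s.end - from;
    cursor = Math.max(cursor, s.end);
  }
  return total;
}

// Three parallel 100/300/200ms calls starting together cover 300ms of wall
// clock; the same three calls run back-to-back cover 600ms.
```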
Look for gaps. Time between spans that isn't accounted for means your application is doing CPU work, waiting on something uninstrumented, or blocked. Gaps in Encore traces are rare since everything is instrumented; when they do appear, they usually represent pure computation time.
Check span metadata. Database spans show the actual SQL query. HTTP spans show the URL, method, and status code. Service call spans show the request and response payloads. This context tells you not just that something was slow, but why.
Filter by latency. Don't look at the average request. Filter for p95 or p99. Those are the requests your users complain about. Often the median is fine and the tail latency tells a different story: cache misses, lock contention, cold starts, or a specific query plan that only triggers with certain data.
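For reference, a nearest-rank percentile over a window of latencies makes the median-vs-tail gap concrete. Real tracing backends compute this for you; this is just a sketch:

```typescript
// Sketch: nearest-rank percentile over a window of request latencies (ms).
function percentile(latenciesMs: number[], p: number): number {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  // Nearest-rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// Nine 30ms requests and one 800ms request: the median is still 30ms,
// but p99 is 800ms. The slow request only shows up in the tail.
```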
The worst performance bugs hide in the tail. An endpoint that averages 50ms might have a p99 of 2 seconds. A few patterns to watch for:
Bimodal distributions. If your latencies cluster around 30ms and 800ms with nothing in between, you have two code paths: one fast (cache hit, indexed query) and one slow (cache miss, full scan). The trace for a slow request will show you which path it took.
Gradual degradation. If p99 creeps up over weeks, it's usually data growth. A query that was fast on 10,000 rows gets slow at 1,000,000. The trace shows the same query getting slower over time. Add the index, or paginate the query.
Intermittent spikes. Random slow requests often come from external dependencies. A payment provider that's usually 200ms but occasionally takes 3 seconds. The trace shows the external HTTP span with the actual response time. Consider adding timeouts and circuit breakers.
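When the trace shows an external span occasionally blowing past its usual latency, a timeout is the first line of defense. A sketch of a generic timeout wrapper; the `withTimeout` helper and its label argument are illustrative, not a specific library API:

```typescript
// Sketch: bound an external call with a timeout so one slow dependency
// can't stall the whole request.
function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  label = "operation",
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage (illustrative): fail fast instead of waiting 3s on a flaky provider.
// const charge = await withTimeout(stripeCall(), 1000, "stripe charge");
```

The trace then shows a clean error span at the timeout boundary instead of a 3-second stall, and you can decide whether to retry, degrade, or open a circuit breaker.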
With Encore, you don't set up tracing. You run encore run, make requests, and open the local development dashboard at localhost:9400. Every request has a trace. Click one, and you get the full waterfall.
In production with Encore Cloud, traces are collected automatically across all environments. You can filter by endpoint, latency percentile, status code, or time range. When a user reports a slow request, you find the trace and have your answer in seconds.
The alternative is spending hours adding console.log timestamps, deploying, reproducing the issue, reading logs, adding more timestamps, and repeating. Tracing replaces that cycle with one click.