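Spring Boot Tracing: A Complete Guide to Distributed Tracing 🔍

What is distributed tracing, and why do you need it?

Imagine you are building an e-commerce system in which placing an order touches several services: the user service verifies identity, the product service checks stock, the order service creates the order, the payment service takes payment, and the logistics service arranges delivery. When a user reports that "ordering is slow", how do you quickly work out which step is at fault?

IMPORTANT

Distributed tracing exists to answer exactly this question. It follows a single request along its full call path across services and records the duration, status, and context of every step.

Core concepts

  • Trace: one complete request call chain, identified by a TraceId
  • Span: a single unit of work within a trace, identified by a SpanId
  • Baggage: key-value pairs carried along the entire trace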
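
To make the relationship between these concepts concrete, here is a minimal Kotlin sketch. The `SpanRecord` type and both helper functions are hypothetical names invented for illustration; real tracers such as OpenTelemetry or Brave keep many more fields per span.

```kotlin
// Hypothetical, simplified model of a recorded span (not a library type).
data class SpanRecord(
    val traceId: String,       // shared by every span of one request
    val spanId: String,        // unique per unit of work
    val parentSpanId: String?, // null for the root span of a trace
    val name: String,
    val durationMs: Long,
)

/** Returns the root span (the one without a parent) of the given trace, if present. */
fun rootOf(spans: List<SpanRecord>, traceId: String): SpanRecord? =
    spans.firstOrNull { it.traceId == traceId && it.parentSpanId == null }

/** Returns the direct children of a span within the same trace. */
fun childrenOf(spans: List<SpanRecord>, parent: SpanRecord): List<SpanRecord> =
    spans.filter { it.traceId == parent.traceId && it.parentSpanId == parent.spanId }
```

Walking from `rootOf` through `childrenOf` is, in essence, how a tracing UI rebuilds the call tree from a flat list of spans.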

Core value of Spring Boot Tracing

1. Pinpointing problems 🎯

kotlin
// Without tracing: each service logs on its own and the entries cannot be correlated
@RestController
class OrderController(private val orderService: OrderService) {
    private val logger = LoggerFactory.getLogger(OrderController::class.java)
    
    @PostMapping("/orders")
    fun createOrder(@RequestBody order: Order): ResponseEntity<Order> {
        logger.info("Creating order")
        // There is no way to see the full call chain of this request
        return ResponseEntity.ok(orderService.create(order))
    }
}
kotlin
// With tracing: trace information is generated automatically
@RestController
class OrderController(private val orderService: OrderService) {
    private val logger = LoggerFactory.getLogger(OrderController::class.java)
    
    @PostMapping("/orders")
    fun createOrder(@RequestBody order: Order): ResponseEntity<Order> {
        logger.info("Creating order")
        // The log line now carries the TraceId and SpanId, so the full call chain can be followed
        return ResponseEntity.ok(orderService.create(order))
    }
}

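2. Performance analysis 📊

With Tracing you can see clearly:

  • which service call takes the longest
  • which database query is the slowest
  • how latency is distributed across network calls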

Quick start: build your first Tracing application

Step 1: Add the dependencies

kotlin
dependencies {
    implementation("org.springframework.boot:spring-boot-starter-actuator")
    implementation("org.springframework.boot:spring-boot-starter-web")
    
    // OpenTelemetry + Zipkin combination
    implementation("io.micrometer:micrometer-tracing-bridge-otel") 
    implementation("io.opentelemetry:opentelemetry-exporter-zipkin") 
}
xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    
    <!-- OpenTelemetry + Zipkin combination -->
    <dependency> 
        <groupId>io.micrometer</groupId> 
        <artifactId>micrometer-tracing-bridge-otel</artifactId> 
    </dependency> 
    <dependency> 
        <groupId>io.opentelemetry</groupId> 
        <artifactId>opentelemetry-exporter-zipkin</artifactId> 
    </dependency> 
</dependencies>

Step 2: Configure Tracing

yaml
management:
  tracing:
    sampling:
      probability: 1.0  # 100% sampling for the demo; use a lower rate such as 0.1 in production
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans

logging:
  pattern:
    correlation: "[${spring.application.name:},%X{traceId:-},%X{spanId:-}] "
  include-application-name: false
properties
# 100% sampling for the demo; use a lower rate such as 0.1 in production
management.tracing.sampling.probability=1.0
management.zipkin.tracing.endpoint=http://localhost:9411/api/v2/spans

# Log correlation settings
logging.pattern.correlation=[${spring.application.name:},%X{traceId:-},%X{spanId:-}] 
logging.include-application-name=false

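Step 3: Create a sample application

kotlin
import org.slf4j.LoggerFactory
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.PathVariable
import org.springframework.web.bind.annotation.RestController

@RestController
@SpringBootApplication
class TracingDemoApplication {
    
    private val logger = LoggerFactory.getLogger(TracingDemoApplication::class.java)
    
    @GetMapping("/")
    fun home(): String {
        logger.info("Handling home page request")
        return "Hello Tracing World!"
    }
    
    @GetMapping("/order/{id}")
    fun getOrder(@PathVariable id: String): String {
        logger.info("Looking up order: $id")
        // Simulate business processing
        Thread.sleep(100)
        return "Order details: $id"
    }
}

fun main(args: Array<String>) {
    runApplication<TracingDemoApplication>(*args)
}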

Step 4: Start Zipkin

bash
# Start Zipkin quickly with Docker
docker run -d -p 9411:9411 openzipkin/zipkin

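Step 5: Try it out

  1. Start the application and open http://localhost:8080/order/12345
  2. Check the console log. Assuming spring.application.name is set to tracing-demo, you will see a line like this (the IDs are illustrative):
[tracing-demo,803b448a0489f84084905d3093480352,3425f23bb2432450] INFO  - Looking up order: 12345
  3. Open the Zipkin UI at http://localhost:9411 and click "Run Query" to view the trace

TIP

The [application name,TraceId,SpanId] prefix in the log is the correlation ID. Use it to locate a problem quickly.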
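
As a rough illustration of how the correlation ID can be used, the following sketch pulls the prefix out of a log line and filters lines by trace. It assumes the `[app,traceId,spanId]` pattern configured in step 2; `Correlation`, `parseCorrelation`, and `linesForTrace` are hypothetical names, and in practice a log aggregation tool would usually do this for you.

```kotlin
/** Parsed form of the "[app,traceId,spanId]" prefix; the type name is illustrative. */
data class Correlation(val app: String, val traceId: String, val spanId: String)

// Matches "[app,traceId,spanId]" at the start of a line. IDs are hex and may be
// empty when a line was logged outside of any trace.
private val correlationPrefix = Regex("""^\[([^,\]]*),([0-9a-fA-F]*),([0-9a-fA-F]*)\]""")

/** Extracts the correlation prefix, or returns null if the line does not start with one. */
fun parseCorrelation(line: String): Correlation? =
    correlationPrefix.find(line)?.destructured?.let { (app, traceId, spanId) ->
        Correlation(app, traceId, spanId)
    }

/** Keeps only the log lines that belong to the given trace (case-insensitive match). */
fun linesForTrace(lines: List<String>, traceId: String): List<String> =
    lines.filter { parseCorrelation(it)?.traceId.equals(traceId, ignoreCase = true) }
```

Calling `linesForTrace(allLines, "803b448a0489f84084905d3093480352")` on the merged logs of several services would return every entry written while handling that one request.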

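Supported Tracer implementations

Spring Boot supports several tracer implementations, each suited to different scenarios:

OpenTelemetry family (recommended)

Why OpenTelemetry

OpenTelemetry is a CNCF project and the de facto standard for cloud-native observability, with broad ecosystem compatibility and strong long-term prospects.

1. OpenTelemetry + Zipkin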

kotlin
// build.gradle.kts
dependencies {
    implementation("io.micrometer:micrometer-tracing-bridge-otel")
    implementation("io.opentelemetry:opentelemetry-exporter-zipkin")
}

2. OpenTelemetry + OTLP

kotlin
// For backends that support the OTLP protocol, such as Jaeger and Grafana Tempo
dependencies {
    implementation("io.micrometer:micrometer-tracing-bridge-otel")
    implementation("io.opentelemetry:opentelemetry-exporter-otlp")
}

OpenZipkin Brave family

kotlin
// The traditional option: mature and stable
dependencies {
    implementation("io.micrometer:micrometer-tracing-bridge-brave")
    implementation("io.zipkin.reporter2:zipkin-reporter-brave")
}

Log correlation: leave problems nowhere to hide

Default correlation format

kotlin
@Service
class OrderService {
    private val logger = LoggerFactory.getLogger(OrderService::class.java)
    
    fun processOrder(orderId: String) {
        logger.info("Processing order: $orderId")
        // Output: [803B448A0489F84084905D3093480352-3425F23BB2432450] INFO - Processing order: 12345
        
        try {
            // Business logic
            validateOrder(orderId)
            calculatePrice(orderId)
            updateInventory(orderId)
        } catch (e: Exception) {
            logger.error("Order processing failed: $orderId", e)
            // The TraceId makes it easy to find every related log entry
        }
    }
}

Custom correlation format

yaml
# Spring Cloud Sleuth style format
logging:
  pattern:
    correlation: "[${spring.application.name:},%X{traceId:-},%X{spanId:-}] "
  include-application-name: false

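Cross-service trace propagation

Automatic propagation (recommended)

WARNING

You must build your HTTP clients from the auto-configured builders that Spring Boot provides; otherwise trace propagation will not work.

kotlin
@Service
class OrderService {
    // ❌ Wrong: clients created by hand are not instrumented, so the TraceId is not propagated
    private val restTemplate = RestTemplate()
    private val webClient = WebClient.create()
}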
kotlin
@Service
class OrderService(
    private val restTemplateBuilder: RestTemplateBuilder, 
    private val webClientBuilder: WebClient.Builder 
) {
    
    private val restTemplate = restTemplateBuilder.build() 
    private val webClient = webClientBuilder.build() 
    
    fun callPaymentService(orderId: String): PaymentResult {
        // ✅ Correct: the TraceId is propagated to the downstream service automatically
        return restTemplate.postForObject(
            "http://payment-service/pay",
            PaymentRequest(orderId),
            PaymentResult::class.java
        ) ?: throw RuntimeException("Payment failed")
    }
    
    fun callInventoryService(productId: String): Mono<InventoryInfo> {
        // ✅ WebClient propagates the TraceId automatically as well
        return webClient.get()
            .uri("http://inventory-service/products/$productId")
            .retrieve()
            .bodyToMono(InventoryInfo::class.java)
    }
}

Trace propagation flow
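
Spring Boot 3 propagates trace context in the W3C Trace Context format by default, so an instrumented client adds a `traceparent` header to each outgoing request and the receiving service continues the same trace from it. As a rough sketch of what travels over the wire, the code below parses and formats that header. The `TraceParent` type and both function names are illustrative and not part of any library; only version `00` of the header is handled.

```kotlin
/** Fields of a W3C `traceparent` header; the class name is illustrative. */
data class TraceParent(val traceId: String, val parentSpanId: String, val sampled: Boolean)

// version "00" - 32 hex chars trace ID - 16 hex chars parent span ID - 2 hex chars flags
private val traceParentPattern = Regex("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

/** Parses a version-00 `traceparent` value, returning null when it is malformed. */
fun parseTraceParent(header: String): TraceParent? {
    val match = traceParentPattern.matchEntire(header.trim()) ?: return null
    val (traceId, spanId, flags) = match.destructured
    // All-zero IDs are invalid according to the specification.
    if (traceId.all { it == '0' } || spanId.all { it == '0' }) return null
    // Bit 0 of the flags byte is the "sampled" flag.
    val sampled = (flags.toInt(16) and 0x01) == 0x01
    return TraceParent(traceId, spanId, sampled)
}

/** Formats the header a service would send downstream for its current span. */
fun formatTraceParent(traceId: String, spanId: String, sampled: Boolean): String =
    "00-$traceId-$spanId-${if (sampled) "01" else "00"}"
```

The downstream service keeps the trace ID, records the incoming span ID as the parent of its own new span, and forwards a fresh `traceparent` on any calls it makes in turn.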

Creating custom Spans

Using the Observation API (recommended)

kotlin
@Service
class OrderService(
    private val observationRegistry: ObservationRegistry
) {
    
    fun processComplexOrder(order: Order) {
        // Create a custom observation
        val observation = Observation.createNotStarted("order.process", observationRegistry)
            .lowCardinalityKeyValue("order.type", order.type) 
            .lowCardinalityKeyValue("customer.level", order.customer.level) 
        
        observation.observe {
            // The business logic is wrapped in a Span
            validateOrder(order)
            calculateDiscount(order)
            updateInventory(order)
        }
    }
    
    // Finer-grained control
    fun processOrderWithDetailedTracking(order: Order) {
        val observation = Observation.createNotStarted("order.detailed-process", observationRegistry)
        
        observation.start()
        try {
            // Validation phase
            observation.event(Observation.Event.of("order.validation.start"))
            validateOrder(order)
            observation.event(Observation.Event.of("order.validation.end"))
            
            // Calculation phase
            observation.event(Observation.Event.of("order.calculation.start"))
            val totalPrice = calculateTotalPrice(order)
            observation.highCardinalityKeyValue("order.total", totalPrice.toString()) 
            observation.event(Observation.Event.of("order.calculation.end"))
            
        } catch (e: Exception) {
            observation.error(e) 
            throw e
        } finally {
            observation.stop()
        }
    }
}

Using the low-level Tracer API

kotlin
@Service
class OrderService(
    private val tracer: Tracer
) {
    
    fun processOrderWithCustomSpan(orderId: String) {
        val span = tracer.nextSpan()
            .name("order.custom-processing") 
            .tag("order.id", orderId) 
            .start()
        
        try {
            tracer.withSpan(span).use {
                // Everything executed inside this scope is associated with this Span
                performBusinessLogic(orderId)
            }
        } catch (e: Exception) {
            span.error(e) // record the exception on the Span
            throw e
        } finally {
            span.end()
        }
    }
}

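Baggage: passing context across services

What is Baggage?

NOTE

Baggage works like luggage: it carries business context along the entire trace, such as a user ID, a tenant ID, or an experiment flag.

Creating and using Baggage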

kotlin
@Service
class OrderService(
    private val tracer: Tracer
) {
    
    fun processOrderWithUserContext(orderId: String, userId: String) {
        // Create Baggage that carries the user context
        tracer.createBaggageInScope("user.id", userId).use { 
            tracer.createBaggageInScope("user.level", getUserLevel(userId)).use { 
                // Inside this scope, every downstream service can read these values
                callPaymentService(orderId)
                callInventoryService(orderId)
                callNotificationService(orderId)
            }
        }
    }
    
    fun getCurrentUserFromBaggage(): String? {
        return tracer.getBaggage("user.id")?.get() 
    }
}

Configuring Baggage propagation

yaml
management:
  tracing:
    baggage:
      # Fields propagated over HTTP headers
      remote-fields: user.id,user.level,tenant.id
      # Fields copied into the MDC (usable in log patterns)
      correlation:
        fields: user.id,tenant.id
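
To give a feel for what "propagated over HTTP headers" means, here is a simplified sketch of the W3C `baggage` header, which carries entries as comma-separated `key=value` pairs. It assumes W3C propagation is in use; other propagation formats carry the fields differently. `encodeBaggage` and `decodeBaggage` are hypothetical helpers, not Micrometer or Spring APIs, and the sketch ignores limits and edge cases that the specification defines.

```kotlin
import java.net.URLDecoder
import java.net.URLEncoder
import java.nio.charset.StandardCharsets.UTF_8

/** Encodes entries as `key=value` pairs joined by commas, percent-encoding the values. */
fun encodeBaggage(entries: Map<String, String>): String =
    entries.entries.joinToString(",") { (key, value) ->
        // URLEncoder emits "+" for spaces; the header format expects "%20".
        "$key=" + URLEncoder.encode(value, UTF_8).replace("+", "%20")
    }

/** Decodes a `baggage` header value, skipping malformed members and ";property" suffixes. */
fun decodeBaggage(header: String): Map<String, String> =
    header.split(",")
        .map { it.substringBefore(";").trim() }
        .filter { it.contains("=") }
        .associate { member ->
            val key = member.substringBefore("=").trim()
            val value = member.substringAfter("=").trim()
            // Protect literal "+" so URLDecoder does not turn it into a space.
            key to URLDecoder.decode(value.replace("+", "%2B"), UTF_8)
        }
```

Because every entry is attached to each outgoing request, it is usually wise to keep baggage small and free of sensitive values.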

Using Baggage in logs

kotlin
@Service
class PaymentService {
    private val logger = LoggerFactory.getLogger(PaymentService::class.java)
    
    fun processPayment(amount: BigDecimal) {
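        // Fields listed under correlation.fields are copied into the MDC; they appear in the
        // output only if the log pattern references them, for example %X{user.id}
        logger.info("Processing payment, amount: $amount")
        // Output (with such a pattern): [TraceId-SpanId] [userId:12345] [tenantId:company-a] INFO - Processing payment, amount: 100.00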
    }
}

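Hands-on example: tracing e-commerce order processing

Let's walk through a complete e-commerce order flow to see what Tracing can do:

Complete e-commerce order processing example
kotlin
// Order controller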
@RestController
class OrderController(
    private val orderService: OrderService,
    private val observationRegistry: ObservationRegistry
) {
    private val logger = LoggerFactory.getLogger(OrderController::class.java)
    
    @PostMapping("/orders")
    fun createOrder(@RequestBody orderRequest: OrderRequest): ResponseEntity<OrderResponse> {
        return Observation.createNotStarted("order.create", observationRegistry)
            .lowCardinalityKeyValue("order.type", orderRequest.type)
            .lowCardinalityKeyValue("customer.level", orderRequest.customerLevel)
            .observe {
                logger.info("Creating order, customer: ${orderRequest.customerId}")
                val order = orderService.createOrder(orderRequest)
                ResponseEntity.ok(OrderResponse.from(order))
            }
    }
}

// Order service
@Service
class OrderService(
    private val paymentService: PaymentService,
    private val inventoryService: InventoryService,
    private val notificationService: NotificationService,
    private val tracer: Tracer,
    private val observationRegistry: ObservationRegistry
) {
    private val logger = LoggerFactory.getLogger(OrderService::class.java)
    
    fun createOrder(request: OrderRequest): Order {
        // Put the user context into Baggage
        return tracer.createBaggageInScope("user.id", request.customerId).use {
            tracer.createBaggageInScope("user.level", request.customerLevel).use {
                processOrderInternal(request)
            }
        }
    }
    
    private fun processOrderInternal(request: OrderRequest): Order {
        val order = Order.from(request)
        
        // 1. Validate the order
        validateOrder(order)
        
        // 2. Check inventory
        checkInventory(order)
        
        // 3. Process payment
        processPayment(order)
        
        // 4. Send notification
        sendNotification(order)
        
        logger.info("Order created: ${order.id}")
        return order
    }
    
    private fun validateOrder(order: Order) {
        Observation.createNotStarted("order.validation", observationRegistry)
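            .highCardinalityKeyValue("order.id", order.id) // order IDs are unbounded, so tag them as high cardinality
            .observe {
                logger.info("Validating order: ${order.id}")
                // Validation logic
                Thread.sleep(50) // Simulated processing time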
            }
    }
    
    private fun checkInventory(order: Order) {
        Observation.createNotStarted("order.inventory-check", observationRegistry)
            .observe {
                logger.info("Checking inventory")
                inventoryService.checkAvailability(order.items)
            }
    }
    
    private fun processPayment(order: Order) {
        Observation.createNotStarted("order.payment", observationRegistry)
            .observe {
                logger.info("Processing payment")
                paymentService.processPayment(order.totalAmount, order.customerId)
            }
    }
    
    private fun sendNotification(order: Order) {
        Observation.createNotStarted("order.notification", observationRegistry)
            .observe {
                logger.info("Sending notification")
                notificationService.sendOrderConfirmation(order)
            }
    }
}

// Payment service
@Service
class PaymentService(
    private val webClientBuilder: WebClient.Builder,
    private val tracer: Tracer
) {
    private val logger = LoggerFactory.getLogger(PaymentService::class.java)
    private val webClient = webClientBuilder.build()
    
    fun processPayment(amount: BigDecimal, customerId: String): PaymentResult {
        val userId = tracer.getBaggage("user.id")?.get()
        val userLevel = tracer.getBaggage("user.level")?.get()
        
        logger.info("Processing payment - user: $userId, level: $userLevel, amount: $amount")
        
        // Call the external payment gateway
        return webClient.post()
            .uri("http://payment-gateway/payments")
            .bodyValue(PaymentRequest(amount, customerId))
            .retrieve()
            .bodyToMono(PaymentResult::class.java)
            .block() ?: throw PaymentException("Payment failed")
    }
}

// Inventory service
@Service  
class InventoryService(
    private val restTemplateBuilder: RestTemplateBuilder
) {
    private val logger = LoggerFactory.getLogger(InventoryService::class.java)
    private val restTemplate = restTemplateBuilder.build()
    
    fun checkAvailability(items: List<OrderItem>): InventoryResult {
        logger.info("Checking stock for ${items.size} items")
        
        // Call the inventory service
        return restTemplate.postForObject(
            "http://inventory-service/check",
            InventoryRequest(items),
            InventoryResult::class.java
        ) ?: throw InventoryException("Inventory check failed")
    }
}

Result

When you create an order, Zipkin shows the complete call chain:

order.create (200ms)
├── order.validation (50ms)
├── order.inventory-check (80ms)
│   └── HTTP POST inventory-service/check (75ms)
├── order.payment (60ms)
│   └── HTTP POST payment-gateway/payments (55ms)
└── order.notification (10ms)

Sample log output:

[order-service,abc123,span1] [userId:12345] INFO - Creating order, customer: 12345
[order-service,abc123,span2] [userId:12345] INFO - Validating order: ORD-001
[order-service,abc123,span3] [userId:12345] INFO - Checking inventory
[inventory-service,abc123,span4] [userId:12345] INFO - Checking stock for 2 items
[order-service,abc123,span5] [userId:12345] INFO - Processing payment
[payment-service,abc123,span6] [userId:12345] INFO - Processing payment - user: 12345, level: VIP, amount: 299.00

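Production best practices

1. Sampling rate

WARNING

Avoid a 100% sampling rate in production: recording and exporting every request adds overhead to the application and load on the tracing backend.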

yaml
management:
  tracing:
    sampling:
      probability: 0.1  # 10% sampling, a balance between performance and observability
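
To illustrate what a sampling probability means in practice, the sketch below makes a deterministic decision from the trace ID, so that every service seeing the same trace reaches the same answer. This is a simplified illustration under that assumption, not the sampler that Spring Boot or OpenTelemetry actually ships; `shouldSample` is a hypothetical function.

```kotlin
/**
 * Decides whether a trace is sampled by comparing the low 64 bits of its
 * hex trace ID with a threshold derived from the probability.
 */
fun shouldSample(traceId: String, probability: Double): Boolean {
    require(probability in 0.0..1.0) { "probability must be between 0 and 1" }
    if (probability >= 1.0) return true // sample everything
    // Read the last 16 hex characters as an unsigned 64-bit value, then clear the sign bit
    // so the result is a non-negative Long.
    val low = traceId.takeLast(16).toULong(16).toLong() and Long.MAX_VALUE
    val threshold = (probability * Long.MAX_VALUE.toDouble()).toLong()
    return low < threshold
}
```

Because trace IDs are effectively random, roughly `probability` of all traces fall below the threshold, and a given trace is either recorded in full or not at all.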

2. Handling sensitive data

kotlin
@Service
class OrderService(
    private val observationRegistry: ObservationRegistry
) {
    
    fun processOrder(order: Order) {
        Observation.createNotStarted("order.process", observationRegistry)
            .highCardinalityKeyValue("order.id", order.id) // order IDs are unbounded, so tag them as high cardinality
            .lowCardinalityKeyValue("order.type", order.type)
            // ❌ Never record sensitive data
            // .lowCardinalityKeyValue("credit.card", order.creditCard)
            .observe {
                // Business logic
            }
    }
}
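
If part of a sensitive value is genuinely needed for debugging, one option is to mask it before it is attached to a span or written to a log. The helper below is a minimal sketch; `maskSensitive` is a hypothetical function, and whether even a masked value may be recorded depends on your own compliance rules.

```kotlin
/** Replaces all but the last [visible] characters with '*'. */
fun maskSensitive(value: String, visible: Int = 4): String {
    require(visible >= 0) { "visible must not be negative" }
    // Values no longer than the visible window are masked completely.
    if (value.length <= visible) return "*".repeat(value.length)
    return "*".repeat(value.length - visible) + value.takeLast(visible)
}
```

For example, `maskSensitive("4111111111111111")` keeps only the final four digits, which is often enough to tell two cards apart without exposing the number.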

3. Error handling

kotlin
fun processOrderWithErrorHandling(order: Order) {
    val observation = Observation.createNotStarted("order.process", observationRegistry)
    
    observation.start()
    try {
        // Business logic
        processOrder(order)
    } catch (e: BusinessException) {
        observation.error(e) 
        observation.lowCardinalityKeyValue("error.type", "business") 
        throw e
    } catch (e: SystemException) {
        observation.error(e) 
        observation.lowCardinalityKeyValue("error.type", "system") 
        throw e
    } finally {
        observation.stop()
    }
}

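Test configuration

TIP

With @SpringBootTest, tracing components that report data are not auto-configured by default. Annotate the test class with @AutoConfigureObservability to enable them.

kotlin
@SpringBootTest
@AutoConfigureObservability
@TestPropertySource(properties = [
    "management.tracing.sampling.probability=1.0"
])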
class OrderServiceTest {
    
    @Autowired
    private lateinit var orderService: OrderService
    
    @Test
    fun `should create order with tracing`() {
        // Test code
    }
}

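Summary

Spring Boot Tracing gives us powerful distributed tracing capabilities:

✅ Fast problem location: correlate all related logs through the TraceId
✅ Performance analysis: see clearly how long each step takes
✅ Context propagation: carry business context with Baggage
✅ Automatic integration: fits seamlessly into the Spring Boot ecosystem
✅ Multiple backends: works with Zipkin, Jaeger, Wavefront, and others

IMPORTANT

Remember: Tracing is not a silver bullet. It is an important tool for understanding and optimizing distributed systems. Choose a sensible sampling rate, protect sensitive data, and handle errors properly to get real value from it.

Start adding Tracing to your microservices today! 🚀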