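Spring Boot Tracing: A Complete Guide to Distributed Tracing 🔍
What Is Distributed Tracing, and Why Do You Need It?
Imagine you are building an e-commerce system in which placing an order touches several services: the user service verifies identity, the product service checks stock, the order service creates the order, the payment service handles the charge, and the logistics service arranges delivery. When a user reports that "placing an order is slow", how do you quickly find out which step is at fault?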
IMPORTANT
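Distributed tracing exists to solve exactly this problem. It follows a single request along its full call path across services and records the duration, status, and context of each step.
Core Concepts
- Trace: the complete call chain of one request, identified by a TraceId
- Span: a single unit of work within a trace, identified by a SpanId
- Baggage: key-value pairs carried along the entire trace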
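To make the relationship between these three concepts concrete, here is a minimal sketch of a data model. The names `SpanRecord`, `TraceContext`, and `childOf` are hypothetical and exist only for illustration; they are not Micrometer Tracing classes.

```kotlin
// Illustrative model only; these are not Micrometer Tracing classes.
data class SpanRecord(
    val traceId: String,        // shared by every span of one request
    val spanId: String,         // unique per unit of work
    val parentSpanId: String?,  // null for the root span
    val name: String,
)

data class TraceContext(
    val traceId: String,
    val spanId: String,
    val baggage: Map<String, String> = emptyMap(),
)

/** Starts a child span: same trace ID, new span ID, parent set to the caller's span. */
fun childOf(parent: TraceContext, name: String, newSpanId: String): Pair<SpanRecord, TraceContext> {
    val span = SpanRecord(parent.traceId, newSpanId, parent.spanId, name)
    // Baggage travels unchanged with the context.
    return span to parent.copy(spanId = newSpanId)
}

fun main() {
    val root = TraceContext(traceId = "trace-1", spanId = "span-a", baggage = mapOf("user.id" to "12345"))
    val (inventorySpan, inventoryContext) = childOf(root, "check-inventory", "span-b")
    println(inventorySpan)
    println(inventoryContext.baggage["user.id"])
}
```

The point of the sketch is that the trace ID never changes along the request, each span records who its parent was, and baggage rides along with the context.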
The Core Value of Spring Boot Tracing
1. Pinpointing Problems 🎯
kotlin
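// Without tracing: each service logs on its own, so entries cannot be correlated
@RestController
class OrderController(private val orderService: OrderService) {
    private val logger = LoggerFactory.getLogger(OrderController::class.java)

    @PostMapping("/orders")
    fun createOrder(@RequestBody order: Order): ResponseEntity<Order> {
        logger.info("Creating order")
        // There is no way to tell which end-to-end request this log line belongs to
        return ResponseEntity.ok(orderService.create(order))
    }
}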
kotlin
// With tracing: trace information is generated automatically
@RestController
class OrderController(private val orderService: OrderService) {
    private val logger = LoggerFactory.getLogger(OrderController::class.java)

    @PostMapping("/orders")
    fun createOrder(@RequestBody order: Order): ResponseEntity<Order> {
        logger.info("Creating order")
        // The log line now carries the TraceId and SpanId, so the full call chain can be followed
        return ResponseEntity.ok(orderService.create(order))
    }
}
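2. Analyzing Performance 📊
With tracing in place, you can see at a glance:
- which service call takes the longest
- which database query is the slowest
- how latency is distributed across network calls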
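The questions above boil down to sorting and grouping span durations. The following sketch shows the idea on a hand-written list; `SpanTiming` and the sample numbers are made up for illustration, and a real backend such as Zipkin does this analysis for you in its UI.

```kotlin
// Hypothetical span summary; real tracing backends expose much richer data.
data class SpanTiming(val service: String, val operation: String, val durationMs: Long)

/** Returns the spans ordered from slowest to fastest. */
fun slowestFirst(spans: List<SpanTiming>): List<SpanTiming> =
    spans.sortedByDescending { it.durationMs }

/** Sums span durations per service to show where a request spends its time. */
fun totalByService(spans: List<SpanTiming>): Map<String, Long> =
    spans.groupBy { it.service }
        .mapValues { (_, group) -> group.sumOf { it.durationMs } }

fun main() {
    // Made-up sample data for one request.
    val spans = listOf(
        SpanTiming("order-service", "validate", 50),
        SpanTiming("inventory-service", "SELECT stock", 75),
        SpanTiming("payment-service", "POST /payments", 55),
    )
    println(slowestFirst(spans).first())
    println(totalByService(spans))
}
```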
Quick Start: Build Your First Tracing Application
Step 1: Add the Dependencies
kotlin
dependencies {
    implementation("org.springframework.boot:spring-boot-starter-actuator")
    implementation("org.springframework.boot:spring-boot-starter-web")
    // OpenTelemetry bridge + Zipkin exporter
    implementation("io.micrometer:micrometer-tracing-bridge-otel")
    implementation("io.opentelemetry:opentelemetry-exporter-zipkin")
}
xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- OpenTelemetry bridge + Zipkin exporter -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-tracing-bridge-otel</artifactId>
    </dependency>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-exporter-zipkin</artifactId>
    </dependency>
</dependencies>
Step 2: Configure Tracing
yaml
management:
  tracing:
    sampling:
      probability: 1.0 # sample 100% of requests; use a lower rate such as 0.1 in production
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans
logging:
  pattern:
    correlation: "[${spring.application.name:},%X{traceId:-},%X{spanId:-}] "
  include-application-name: false
properties
# Sample 100% of requests; use a lower rate such as 0.1 in production
management.tracing.sampling.probability=1.0
management.zipkin.tracing.endpoint=http://localhost:9411/api/v2/spans
# Log correlation settings
logging.pattern.correlation=[${spring.application.name:},%X{traceId:-},%X{spanId:-}]
logging.include-application-name=false
Step 3: Create the Sample Application
kotlin
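import org.slf4j.LoggerFactory
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.PathVariable
import org.springframework.web.bind.annotation.RestController

@RestController
@SpringBootApplication
class TracingDemoApplication {
    private val logger = LoggerFactory.getLogger(TracingDemoApplication::class.java)

    @GetMapping("/")
    fun home(): String {
        logger.info("Handling home page request")
        return "Hello Tracing World!"
    }

    @GetMapping("/order/{id}")
    fun getOrder(@PathVariable id: String): String {
        logger.info("Querying order: $id")
        // Simulate some business processing
        Thread.sleep(100)
        return "Order details: $id"
    }
}

fun main(args: Array<String>) {
    runApplication<TracingDemoApplication>(*args)
}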
Step 4: Start Zipkin
bash
# Start Zipkin quickly with Docker
docker run -d -p 9411:9411 openzipkin/zipkin
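Step 5: Verify the Result
- Start the application and open http://localhost:8080/order/12345
- Check the console log. Assuming spring.application.name is set to tracing-demo, you should see a line like this (the IDs are illustrative):
[tracing-demo,64d3c5c8e6b1a2f3,a1b2c3d4e5f6a7b8] INFO - Querying order: 12345
- Open the Zipkin UI at http://localhost:9411 and click "Run Query" to view the trace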
TIP
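The [application name,TraceId,SpanId] prefix in each log line is the correlation ID. Use it to find every log entry that belongs to the same request.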
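As a small illustration of how that prefix can be used, the sketch below extracts the correlation ID from plain-text log lines and filters them by trace. It assumes the `[application,traceId,spanId]` format shown above; `Correlation`, `parseCorrelation`, and `linesForTrace` are hypothetical helper names, and in practice a log aggregation tool would do this search for you.

```kotlin
data class Correlation(val application: String, val traceId: String, val spanId: String)

// Matches a leading "[application,traceId,spanId]" prefix; the IDs are hexadecimal strings.
private val CORRELATION_PREFIX = Regex("""^\[([^,\]]*),([0-9a-fA-F]*),([0-9a-fA-F]*)]""")

/** Returns the correlation ID of a log line, or null when the line has no such prefix. */
fun parseCorrelation(line: String): Correlation? {
    val match = CORRELATION_PREFIX.find(line) ?: return null
    val (application, traceId, spanId) = match.destructured
    return Correlation(application, traceId, spanId)
}

/** Keeps only the log lines that belong to the given trace. */
fun linesForTrace(lines: List<String>, traceId: String): List<String> =
    lines.filter { parseCorrelation(it)?.traceId == traceId }

fun main() {
    val lines = listOf(
        "[tracing-demo,64d3c5c8e6b1a2f3,a1b2c3d4e5f6a7b8] INFO - Querying order: 12345",
        "[tracing-demo,0000000000000001,00000000000000aa] INFO - Handling home page request",
        "a line without a correlation prefix",
    )
    linesForTrace(lines, "64d3c5c8e6b1a2f3").forEach(::println)
}
```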
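Supported Tracer Implementations
Spring Boot supports several tracer implementations, each suited to different scenarios:
OpenTelemetry Family (Recommended)
Why OpenTelemetry
OpenTelemetry is a CNCF project and has become the de facto standard for cloud-native observability, with broad ecosystem compatibility and strong momentum.
1. OpenTelemetry + Zipkin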
kotlin
// build.gradle.kts
dependencies {
    implementation("io.micrometer:micrometer-tracing-bridge-otel")
    implementation("io.opentelemetry:opentelemetry-exporter-zipkin")
}
2. OpenTelemetry + OTLP
kotlin
// For backends that accept the OTLP protocol, such as Jaeger or Grafana Tempo
dependencies {
    implementation("io.micrometer:micrometer-tracing-bridge-otel")
    implementation("io.opentelemetry:opentelemetry-exporter-otlp")
}
OpenZipkin Brave Family
kotlin
// The long-established option, known for its stability
dependencies {
    implementation("io.micrometer:micrometer-tracing-bridge-brave")
    implementation("io.zipkin.reporter2:zipkin-reporter-brave")
}
Log Correlation: Leave Problems Nowhere to Hide
Default Correlation Format
kotlin
@Service
class OrderService {
    private val logger = LoggerFactory.getLogger(OrderService::class.java)

    fun processOrder(orderId: String) {
        logger.info("Processing order: $orderId")
        // Output: [803B448A0489F84084905D3093480352-3425F23BB2432450] INFO - Processing order: 12345
        try {
            // Business logic
            validateOrder(orderId)
            calculatePrice(orderId)
            updateInventory(orderId)
        } catch (e: Exception) {
            logger.error("Order processing failed: $orderId", e)
            // The TraceId leads you straight to all related log entries
        }
    }
}
Custom Correlation Format
yaml
# Use the format previously used by Spring Cloud Sleuth
logging:
  pattern:
    correlation: "[${spring.application.name:},%X{traceId:-},%X{spanId:-}] "
  include-application-name: false
Propagating Traces Across Services
Automatic Propagation (Recommended)
WARNING
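Build your HTTP clients from the auto-configured builders that Spring Boot provides (RestTemplateBuilder, WebClient.Builder). Clients created any other way are not instrumented, so the trace context is not propagated.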
kotlin
@Service
class OrderService {
    // ❌ Wrong: clients created by hand are not instrumented, so the TraceId is not propagated
    private val restTemplate = RestTemplate()
    private val webClient = WebClient.create()
}
kotlin
@Service
class OrderService(
    private val restTemplateBuilder: RestTemplateBuilder,
    private val webClientBuilder: WebClient.Builder
) {
    private val restTemplate = restTemplateBuilder.build()
    private val webClient = webClientBuilder.build()

    fun callPaymentService(orderId: String): PaymentResult {
        // ✅ Correct: the TraceId is propagated to the downstream service automatically
        return restTemplate.postForObject(
            "http://payment-service/pay",
            PaymentRequest(orderId),
            PaymentResult::class.java
        ) ?: throw RuntimeException("Payment failed")
    }

    fun callInventoryService(productId: String): Mono<InventoryInfo> {
        // ✅ WebClient propagates the TraceId automatically as well
        return webClient.get()
            .uri("http://inventory-service/products/$productId")
            .retrieve()
            .bodyToMono(InventoryInfo::class.java)
    }
}
Trace Propagation Flow (diagram not available in this text version)
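What actually travels between services is a small HTTP header. The sketch below assumes the W3C Trace Context format, which Spring Boot uses for propagation by default, and shows how a `traceparent` header is parsed and rewritten for the next hop. `TraceParent`, `parseTraceParent`, `formatTraceParent`, and `forDownstreamCall` are hypothetical names for illustration; the instrumented HTTP clients do this for you.

```kotlin
// traceparent layout: version "00" - 32 hex trace ID - 16 hex parent span ID - 2 hex flags
data class TraceParent(val traceId: String, val parentSpanId: String, val sampled: Boolean)

private val TRACEPARENT = Regex("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")

fun formatTraceParent(tp: TraceParent): String =
    "00-${tp.traceId}-${tp.parentSpanId}-${if (tp.sampled) "01" else "00"}"

/** Parses a version-00 traceparent header; returns null when it is malformed. */
fun parseTraceParent(header: String): TraceParent? {
    val match = TRACEPARENT.matchEntire(header.trim()) ?: return null
    val (traceId, spanId, flags) = match.destructured
    // All-zero IDs are invalid according to the specification.
    if (traceId.all { it == '0' } || spanId.all { it == '0' }) return null
    val sampled = (flags.toInt(16) and 0x01) == 0x01
    return TraceParent(traceId, spanId, sampled)
}

/** The outgoing header keeps the trace ID and names the caller's current span as parent. */
fun forDownstreamCall(incoming: TraceParent, currentSpanId: String): TraceParent =
    incoming.copy(parentSpanId = currentSpanId)

fun main() {
    // Example header taken from the W3C Trace Context specification.
    val incoming = parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
    if (incoming != null) {
        val outgoing = forDownstreamCall(incoming, currentSpanId = "b7ad6b7169203331")
        println(formatTraceParent(outgoing))
    }
}
```

Each service therefore sees the same trace ID, creates its own span, and passes that span's ID on as the parent for the next call.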
Creating Custom Spans
Using the Observation API (Recommended)
kotlin
@Service
class OrderService(
    private val observationRegistry: ObservationRegistry
) {
    fun processComplexOrder(order: Order) {
        // Create a custom observation
        val observation = Observation.createNotStarted("order.process", observationRegistry)
            .lowCardinalityKeyValue("order.type", order.type)
            .lowCardinalityKeyValue("customer.level", order.customer.level)
        // An explicit Runnable keeps the call unambiguous between observe(Runnable) and observe(Supplier)
        observation.observe(Runnable {
            // The business logic is wrapped in a span
            validateOrder(order)
            calculateDiscount(order)
            updateInventory(order)
        })
    }

    // Finer-grained control
    fun processOrderWithDetailedTracking(order: Order) {
        val observation = Observation.createNotStarted("order.detailed-process", observationRegistry)
        observation.start()
        try {
            // Opening a scope makes this observation current, so logs and nested spans attach to it
            observation.openScope().use {
                // Validation phase
                observation.event(Observation.Event.of("order.validation.start"))
                validateOrder(order)
                observation.event(Observation.Event.of("order.validation.end"))
                // Calculation phase
                observation.event(Observation.Event.of("order.calculation.start"))
                val totalPrice = calculateTotalPrice(order)
                observation.highCardinalityKeyValue("order.total", totalPrice.toString())
                observation.event(Observation.Event.of("order.calculation.end"))
            }
        } catch (e: Exception) {
            observation.error(e)
            throw e
        } finally {
            observation.stop()
        }
    }
}
Using the Low-Level Tracer API
kotlin
@Service
class OrderService(
    private val tracer: Tracer // io.micrometer.tracing.Tracer
) {
    fun processOrderWithCustomSpan(orderId: String) {
        val span = tracer.nextSpan()
            .name("order.custom-processing")
            .tag("order.id", orderId)
            .start()
        try {
            // withSpan puts the span in scope; everything inside is associated with it
            tracer.withSpan(span).use {
                performBusinessLogic(orderId)
            }
        } catch (e: Exception) {
            span.error(e) // record the exception on the span
            throw e
        } finally {
            span.end()
        }
    }
}
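Baggage: Passing Context Across Services
What Is Baggage?
NOTE
Baggage works like luggage: it carries business context along the whole trace, for example a user ID, a tenant ID, or an experiment flag.
Creating and Using Baggage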
kotlin
@Service
class OrderService(
    private val tracer: Tracer
) {
    fun processOrderWithUserContext(orderId: String, userId: String) {
        // Create baggage that carries the user context
        tracer.createBaggageInScope("user.id", userId).use {
            tracer.createBaggageInScope("user.level", getUserLevel(userId)).use {
                // Within this scope, downstream services can read these fields
                // (provided they are configured as remote fields)
                callPaymentService(orderId)
                callInventoryService(orderId)
                callNotificationService(orderId)
            }
        }
    }

    fun getCurrentUserFromBaggage(): String? {
        return tracer.getBaggage("user.id")?.get()
    }
}
Configuring Baggage Propagation
yaml
management:
  tracing:
    baggage:
      # Fields propagated to remote services over HTTP headers
      remote-fields: user.id,user.level,tenant.id
      # Fields copied into the MDC (usable in log patterns)
      correlation:
        fields: user.id,tenant.id
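To give a feel for what "propagated over HTTP headers" means, here is a rough sketch of how baggage entries could be serialized into, and read back from, a W3C-style `baggage` header. It assumes W3C propagation is in use and simplifies the encoding rules; `toBaggageHeader` and `parseBaggageHeader` are hypothetical helpers, not part of Micrometer Tracing, which handles this internally.

```kotlin
import java.net.URLDecoder
import java.net.URLEncoder

/** Serializes only the fields listed as remote fields, e.g. "user.id=12345,user.level=VIP". */
fun toBaggageHeader(baggage: Map<String, String>, remoteFields: Set<String>): String =
    baggage.filterKeys { it in remoteFields }
        .entries
        .joinToString(",") { (key, value) ->
            // Simplified percent-encoding; URLEncoder writes spaces as "+", so convert them to %20.
            "$key=${URLEncoder.encode(value, Charsets.UTF_8).replace("+", "%20")}"
        }

/** Parses "k1=v1,k2=v2" back into a map, ignoring optional ";property" suffixes. */
fun parseBaggageHeader(header: String): Map<String, String> =
    header.split(",")
        .map { it.substringBefore(";").trim() }
        .filter { "=" in it }
        .associate { member ->
            val key = member.substringBefore("=").trim()
            val value = member.substringAfter("=").trim()
            key to URLDecoder.decode(value, Charsets.UTF_8)
        }

fun main() {
    val baggage = mapOf("user.id" to "12345", "user.level" to "VIP gold", "internal.note" to "not sent")
    val header = toBaggageHeader(baggage, remoteFields = setOf("user.id", "user.level", "tenant.id"))
    println(header)
    println(parseBaggageHeader(header))
}
```

Fields that are not listed under remote-fields, such as the made-up `internal.note` entry here, stay local to the service that created them.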
Using Baggage in Logs
kotlin
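@Service
class PaymentService {
    private val logger = LoggerFactory.getLogger(PaymentService::class.java)

    fun processPayment(amount: BigDecimal) {
        // Baggage fields listed under correlation.fields are copied into the MDC;
        // they appear in the log once the log pattern references them, e.g. %X{user.id}
        logger.info("Processing payment, amount: $amount")
        // Example output: [TraceId-SpanId] [userId:12345] [tenantId:company-a] INFO - Processing payment, amount: 100.00
    }
}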
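Case Study: An E-Commerce Order Processing Trace
Let's walk through a complete e-commerce order flow to see what tracing can do:
Complete Order Processing Example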
kotlin
// Order controller
@RestController
class OrderController(
    private val orderService: OrderService,
    private val observationRegistry: ObservationRegistry
) {
    private val logger = LoggerFactory.getLogger(OrderController::class.java)

    @PostMapping("/orders")
    fun createOrder(@RequestBody orderRequest: OrderRequest): ResponseEntity<OrderResponse> {
        return Observation.createNotStarted("order.create", observationRegistry)
            .lowCardinalityKeyValue("order.type", orderRequest.type)
            .lowCardinalityKeyValue("customer.level", orderRequest.customerLevel)
            // An explicit Supplier selects the overload that returns a value
            .observe(Supplier {
                logger.info("Creating order for customer: ${orderRequest.customerId}")
                val order = orderService.createOrder(orderRequest)
                ResponseEntity.ok(OrderResponse.from(order))
            })
    }
}
// Order service
@Service
class OrderService(
    private val paymentService: PaymentService,
    private val inventoryService: InventoryService,
    private val notificationService: NotificationService,
    private val tracer: Tracer,
    private val observationRegistry: ObservationRegistry
) {
    private val logger = LoggerFactory.getLogger(OrderService::class.java)

    fun createOrder(request: OrderRequest): Order {
        // Put the user context into baggage
        return tracer.createBaggageInScope("user.id", request.customerId).use {
            tracer.createBaggageInScope("user.level", request.customerLevel).use {
                processOrderInternal(request)
            }
        }
    }

    private fun processOrderInternal(request: OrderRequest): Order {
        val order = Order.from(request)
        // 1. Validate the order
        validateOrder(order)
        // 2. Check inventory
        checkInventory(order)
        // 3. Process the payment
        processPayment(order)
        // 4. Send the notification
        sendNotification(order)
        logger.info("Order created: ${order.id}")
        return order
    }

    private fun validateOrder(order: Order) {
        Observation.createNotStarted("order.validation", observationRegistry)
            // Order IDs are unbounded, so they belong in a high-cardinality key value
            .highCardinalityKeyValue("order.id", order.id)
            .observe(Runnable {
                logger.info("Validating order: ${order.id}")
                // Validation logic
                Thread.sleep(50) // simulate processing time
            })
    }

    private fun checkInventory(order: Order) {
        Observation.createNotStarted("order.inventory-check", observationRegistry)
            .observe(Runnable {
                logger.info("Checking inventory")
                inventoryService.checkAvailability(order.items)
            })
    }

    private fun processPayment(order: Order) {
        Observation.createNotStarted("order.payment", observationRegistry)
            .observe(Runnable {
                logger.info("Processing payment")
                paymentService.processPayment(order.totalAmount, order.customerId)
            })
    }

    private fun sendNotification(order: Order) {
        Observation.createNotStarted("order.notification", observationRegistry)
            .observe(Runnable {
                logger.info("Sending notification")
                notificationService.sendOrderConfirmation(order)
            })
    }
}
// Payment service
@Service
class PaymentService(
    private val webClientBuilder: WebClient.Builder,
    private val tracer: Tracer
) {
    private val logger = LoggerFactory.getLogger(PaymentService::class.java)
    private val webClient = webClientBuilder.build()

    fun processPayment(amount: BigDecimal, customerId: String): PaymentResult {
        val userId = tracer.getBaggage("user.id")?.get()
        val userLevel = tracer.getBaggage("user.level")?.get()
        logger.info("Processing payment - user: $userId, level: $userLevel, amount: $amount")
        // Call the external payment gateway
        return webClient.post()
            .uri("http://payment-gateway/payments")
            .bodyValue(PaymentRequest(amount, customerId))
            .retrieve()
            .bodyToMono(PaymentResult::class.java)
            .block() ?: throw PaymentException("Payment failed")
    }
}
// Inventory service
@Service
class InventoryService(
    private val restTemplateBuilder: RestTemplateBuilder
) {
    private val logger = LoggerFactory.getLogger(InventoryService::class.java)
    private val restTemplate = restTemplateBuilder.build()

    fun checkAvailability(items: List<OrderItem>): InventoryResult {
        logger.info("Checking inventory for ${items.size} items")
        // Call the remote inventory service
        return restTemplate.postForObject(
            "http://inventory-service/check",
            InventoryRequest(items),
            InventoryResult::class.java
        ) ?: throw InventoryException("Inventory check failed")
    }
}
Result
When you create an order, Zipkin shows the complete call chain:
order.create (200ms)
├── order.validation (50ms)
├── order.inventory-check (80ms)
│ └── HTTP POST inventory-service/check (75ms)
├── order.payment (60ms)
│ └── HTTP POST payment-gateway/payments (55ms)
└── order.notification (10ms)
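Sample log output (IDs shortened for readability; the userId field assumes a log pattern that prints the user.id MDC key, and all lines come from the order service because the classes above run in one application):
[order-service,abc123,span1] INFO - Creating order for customer: 12345
[order-service,abc123,span2] [userId:12345] INFO - Validating order: ORD-001
[order-service,abc123,span3] [userId:12345] INFO - Checking inventory
[order-service,abc123,span3] [userId:12345] INFO - Checking inventory for 2 items
[order-service,abc123,span5] [userId:12345] INFO - Processing payment
[order-service,abc123,span5] [userId:12345] INFO - Processing payment - user: 12345, level: VIP, amount: 299.00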
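Production Best Practices
1. Sampling Rate
WARNING
Avoid sampling 100% of requests in production: every sampled request produces spans that must be recorded, exported, and stored, which can noticeably affect performance and cost. Spring Boot's default probability is 0.1.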
yaml
management:
  tracing:
    sampling:
      probability: 0.1 # sample 10% of requests to balance overhead and observability
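Conceptually, a sampling probability means that a yes/no decision is made once when a trace starts, and the rest of the trace follows that decision. The sketch below illustrates the idea with a plain random draw; `ProbabilitySampler` is a hypothetical class for illustration and not the sampler that Micrometer Tracing, Brave, or OpenTelemetry actually uses.

```kotlin
import kotlin.random.Random

/** Illustrative sampler: keeps roughly `probability` of all new traces. */
class ProbabilitySampler(
    private val probability: Double,
    private val random: Random = Random.Default,
) {
    init {
        require(probability in 0.0..1.0) { "probability must be between 0.0 and 1.0" }
    }

    /** Decided once per trace; downstream services reuse the decision via the sampled flag. */
    fun shouldSample(): Boolean = random.nextDouble() < probability
}

fun main() {
    val sampler = ProbabilitySampler(0.1)
    val kept = (1..10_000).count { sampler.shouldSample() }
    println("Sampled $kept of 10000 traces")
}
```

With a probability of 0.1, about one request in ten is traced end to end, and the other nine add almost no tracing overhead.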
2. Handling Sensitive Data
kotlin
@Service
class OrderService(
    private val observationRegistry: ObservationRegistry
) {
    fun processOrder(order: Order) {
        Observation.createNotStarted("order.process", observationRegistry)
            // Order IDs are unbounded, so they belong in a high-cardinality key value
            .highCardinalityKeyValue("order.id", order.id)
            .lowCardinalityKeyValue("order.type", order.type)
            // ❌ Never record sensitive data
            // .highCardinalityKeyValue("credit.card", order.creditCard)
            .observe(Runnable {
                // Business logic
            })
    }
}
3. Error Handling
kotlin
// Assumes observationRegistry is an injected ObservationRegistry in the enclosing class
fun processOrderWithErrorHandling(order: Order) {
    val observation = Observation.createNotStarted("order.process", observationRegistry)
    observation.start()
    try {
        // Business logic
        processOrder(order)
    } catch (e: BusinessException) {
        observation.error(e)
        observation.lowCardinalityKeyValue("error.type", "business")
        throw e
    } catch (e: SystemException) {
        observation.error(e)
        observation.lowCardinalityKeyValue("error.type", "system")
        throw e
    } finally {
        observation.stop()
    }
}
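Configuration for Tests
TIP
With @SpringBootTest, the tracing components that report data to a backend are not auto-configured by default. In Spring Boot 3.x, annotate the test class with @AutoConfigureObservability to enable them.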
kotlin
@SpringBootTest
@AutoConfigureObservability // Spring Boot 3.x: re-enables tracing auto-configuration in tests
@TestPropertySource(properties = [
    "management.tracing.sampling.probability=1.0"
])
class OrderServiceTest {
    @Autowired
    private lateinit var orderService: OrderService

    @Test
    fun `should create order with tracing`() {
        // Test code
    }
}
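Summary
Spring Boot Tracing gives you solid distributed tracing capabilities. It lets you:
✅ Pinpoint problems quickly: correlate all related logs through the TraceId
✅ Analyze performance: see how long each step takes
✅ Pass context along: carry business context in Baggage
✅ Integrate automatically: works seamlessly with the Spring Boot ecosystem
✅ Choose your backend: supports Zipkin, Jaeger, Wavefront, and others
IMPORTANT
Remember: tracing is not a silver bullet. It is an important tool for understanding and optimizing distributed systems, and it pays off only when you choose a sensible sampling rate, protect sensitive data, and handle errors properly.
Start adding tracing to your microservices today! 🚀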