
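Spring Boot Metrics: A Detailed Guide to Metrics Monitoring 📊

NOTE

Spring Boot Actuator uses Micrometer to collect application metrics and can export them to many monitoring systems, so the health of your application is visible at a glance.

🎯 What Is Spring Boot Metrics?

Spring Boot Metrics is the metrics collection and monitoring support that Spring Boot builds on Micrometer. Think of it as a set of health monitors attached to your application: it collects runtime metrics continuously and helps you:

  • 📈 Monitor application performance: CPU, memory, and thread usage
  • 🔍 Track business metrics: request latency, error rate, and throughput
  • 🚨 Detect problems early: locate faults quickly from abnormal metric values
  • 📊 Make data-driven decisions: tune the application based on real measurements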


🚀 Quick Start

1. Add dependencies

kotlin
dependencies {
    implementation("org.springframework.boot:spring-boot-starter-actuator")
    implementation("io.micrometer:micrometer-registry-prometheus") // Prometheus as an example
}
xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
</dependencies>

2. Basic configuration

yaml
management:
  endpoints:
    web:
      exposure:
        include: metrics,prometheus
  metrics:
    tags:
      application: my-spring-app
      environment: production
properties
management.endpoints.web.exposure.include=metrics,prometheus
management.metrics.tags.application=my-spring-app
management.metrics.tags.environment=production

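📊 Supported Monitoring Systems

Through Micrometer, Spring Boot Metrics supports more than 18 monitoring systems, so you can pick the one that fits your environment:

Configuration for popular monitoring systems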

yaml
# Prometheus: expose the scrape endpoint
management:
  endpoints:
    web:
      exposure:
        include: prometheus
# Metrics are then served at /actuator/prometheus
yaml
# Datadog
management:
  datadog:
    metrics:
      export:
        api-key: "YOUR_API_KEY"
        uri: "https://api.datadoghq.com"
        step: "30s"
yaml
# New Relic
management:
  newrelic:
    metrics:
      export:
        api-key: "YOUR_KEY"
        account-id: "YOUR_ACCOUNT_ID"
        step: "30s"

TIP

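Recommendations

  • 🔥 Prometheus + Grafana: open source, powerful, with an active community
  • 💼 Datadog: commercial, works out of the box, feature rich
  • ☁️ Cloud provider solutions: for example AWS CloudWatch or Azure Monitor

🔧 Registering Custom Metrics

Basic meter registration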

kotlin
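import io.micrometer.core.instrument.Counter
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Timer
import org.springframework.stereotype.Component
import java.time.Duration

@Component
class OrderMetrics(private val meterRegistry: MeterRegistry) {

    // Counts created orders; the "type" tag is fixed at registration time
    private val orderCounter = Counter.builder("orders.created")
        .description("Total number of orders created")
        .tag("type", "online")
        .register(meterRegistry)

    // Records how long order processing takes
    private val orderProcessingTime = Timer.builder("orders.processing.time")
        .description("Order processing time")
        .register(meterRegistry)

    fun recordOrderCreated() {
        orderCounter.increment()
    }

    fun recordOrderProcessingTime(duration: Duration) {
        orderProcessingTime.record(duration)
    }
}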
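To see what the counter and timer above actually record, it can help to exercise the same meters against an in-memory registry. The following is a minimal sketch, assuming `micrometer-core` is on the classpath; the helper name `recordSampleOrders` and the sample durations are made up for illustration.

```kotlin
import io.micrometer.core.instrument.Counter
import io.micrometer.core.instrument.Timer
import io.micrometer.core.instrument.simple.SimpleMeterRegistry
import java.time.Duration
import java.util.concurrent.TimeUnit

// Hypothetical helper: registers the two meters from the example and records sample data.
fun recordSampleOrders(registry: SimpleMeterRegistry) {
    val created = Counter.builder("orders.created")
        .tag("type", "online")
        .register(registry)
    val processing = Timer.builder("orders.processing.time")
        .register(registry)

    repeat(3) { created.increment() }          // three orders created
    processing.record(Duration.ofMillis(120))  // two processing durations
    processing.record(Duration.ofMillis(80))
}

fun main() {
    val registry = SimpleMeterRegistry()
    recordSampleOrders(registry)

    // Look the meters up by name, the same way a test would
    val count = registry.get("orders.created").counter().count()
    val timer = registry.get("orders.processing.time").timer()
    println("orders created: $count")
    println("timed calls: ${timer.count()}, total ms: ${timer.totalTime(TimeUnit.MILLISECONDS)}")
}
```

A `SimpleMeterRegistry` keeps everything in memory, which makes it a convenient stand-in for the auto-configured registry in unit tests.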

Registering more complex metrics with MeterBinder

kotlin
@Configuration
class CustomMetricsConfiguration {

    @Bean
    fun queueSizeMetrics(orderQueue: Queue<Order>): MeterBinder {
        return MeterBinder { registry ->
            // Queue size: Gauge.builder takes the observed object and a value function
            Gauge.builder("queue.size", orderQueue) { it.size.toDouble() }
                .description("Current size of the order queue")
                .tag("queue", "orders")
                .register(registry)

            // Queue utilization in percent, assuming a fixed capacity of 1000
            Gauge.builder("queue.utilization", orderQueue) { it.size.toDouble() / 1000.0 * 100 }
                .description("Order queue capacity utilization")
                .tag("queue", "orders")
                .register(registry)
        }
    }
}
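A gauge does not store a value when it is registered; it calls its value function every time it is read. The sketch below illustrates that behavior with an in-memory registry, assuming `micrometer-core` is on the classpath. The function name `queueGaugeDemo` and the sample queue contents are hypothetical.

```kotlin
import io.micrometer.core.instrument.Gauge
import io.micrometer.core.instrument.simple.SimpleMeterRegistry
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical demo: returns the gauge value before and after two items are queued.
fun queueGaugeDemo(): Pair<Double, Double> {
    val registry = SimpleMeterRegistry()
    val queue = ConcurrentLinkedQueue<String>()

    // The gauge observes the queue; nothing is copied at registration time
    Gauge.builder("queue.size", queue) { it.size.toDouble() }
        .tag("queue", "orders")
        .register(registry)

    val gauge = registry.get("queue.size").tag("queue", "orders").gauge()
    val before = gauge.value()   // sampled now: the queue is empty
    queue.add("order-1")
    queue.add("order-2")
    val after = gauge.value()    // sampled again: two items are queued
    return before to after
}

fun main() {
    val (before, after) = queueGaugeDemo()
    println("before=$before after=$after")
}
```

Because Micrometer holds the observed object through a weak reference, the queue must stay reachable elsewhere (here, as a local variable; in the configuration above, as a Spring bean).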

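📈 Built-in Metrics

Spring Boot automatically collects a wide range of system and application metrics:

JVM metrics

  • jvm.memory.used - memory in use
  • jvm.gc.pause - GC pause time
  • jvm.threads.live - number of live threads

System metrics

  • system.cpu.usage - CPU usage
  • disk.free - free disk space
  • process.uptime - application uptime

Web request metrics

  • http.server.requests - HTTP server request metrics
  • Tagged with uri, method, status, exception, and others

Database metrics

  • jdbc.connections.active - active connections
  • jdbc.connections.max - maximum connections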

🎛️ Customizing Metrics

1. Common tags

kotlin
@Configuration
class MetricsCustomization {
    
    @Bean
    fun commonTagsCustomizer(): MeterRegistryCustomizer<MeterRegistry> {
        return MeterRegistryCustomizer { registry ->
            registry.config().commonTags(
                "service", "order-service",
                "version", "1.0.0",
                "region", "us-east-1"
            )
        }
    }
}

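2. Filtering meters

kotlin
// MeterFilter has no and() combinator, so declare one bean per filter.
// Spring Boot applies every MeterFilter bean to the auto-configured registry.
@Bean
fun denyTomcatMeters(): MeterFilter =
    MeterFilter.denyNameStartsWith("tomcat")

@Bean
fun httpRequestMaxExpected(): MeterFilter =
    MeterFilter.maxExpected("http.server.requests", Duration.ofSeconds(10))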

3. Customizing HTTP request metrics

kotlin
@Bean
fun serverRequestObservationConvention(): ServerRequestObservationConvention {
    return object : DefaultServerRequestObservationConvention() {
        override fun getLowCardinalityKeyValues(context: ServerRequestObservationContext): KeyValues {
            return super.getLowCardinalityKeyValues(context)
                .and("custom.header", context.carrier.getHeader("X-Custom-Header") ?: "unknown") 
        }
    }
}

🔍 Real-World Business Scenarios

Monitoring e-commerce order processing

kotlin
@Service
class OrderService(
    private val meterRegistry: MeterRegistry,
    private val orderRepository: OrderRepository
) {

    private val orderAmountSummary = DistributionSummary.builder("business.orders.amount")
        .description("Distribution of order amounts")
        .baseUnit("yuan")
        .register(meterRegistry)

    // @Timed on a regular bean method needs AspectJ support and a TimedAspect bean
    @Timed(value = "business.orders.processing.time", description = "Order processing time")
    fun processOrder(order: Order): Order {
        return try {
            val savedOrder = orderRepository.save(order)

            // Record business metrics
            orderCounter("success", order).increment()
            orderAmountSummary.record(order.amount.toDouble())

            savedOrder
        } catch (e: Exception) {
            orderCounter("failed", order).increment()
            throw e
        }
    }

    // Counter.increment() accepts no tags, so resolve the counter per tag combination.
    // The tag keys stay the same for every status, which Prometheus requires.
    private fun orderCounter(status: String, order: Order): Counter =
        Counter.builder("business.orders.total")
            .description("Total number of orders")
            .tags("status", status, "type", order.type.name, "payment", order.paymentMethod.name)
            .register(meterRegistry)
}

Monitoring cache performance

kotlin
@Component
class CacheMetrics(private val meterRegistry: MeterRegistry) {

    init {
        // Register the gauge once; it recomputes the hit rate every time it is read,
        // so no @Scheduled or @EventListener method is needed
        Gauge.builder("cache.hit.rate", this) { it.hitRatePercent() }
            .description("Cache hit rate across all caches, in percent")
            .register(meterRegistry)
    }

    fun recordCacheHit(cacheName: String) {
        // Counter.increment() accepts no tags, so resolve the counter per cache name
        meterRegistry.counter("cache.hits", "cache", cacheName).increment()
    }

    fun recordCacheMiss(cacheName: String) {
        meterRegistry.counter("cache.misses", "cache", cacheName).increment()
    }

    // Hit rate = hits / (hits + misses) * 100, summed over every cache
    private fun hitRatePercent(): Double {
        val hits = meterRegistry.find("cache.hits").counters().sumOf { it.count() }
        val misses = meterRegistry.find("cache.misses").counters().sumOf { it.count() }
        return if (hits + misses > 0) hits / (hits + misses) * 100 else 0.0
    }
}
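The hit-rate arithmetic used above is easy to check in isolation. The following is a minimal sketch of just that calculation, with a hypothetical function name and made-up sample counts; it depends only on the Kotlin standard library.

```kotlin
// Hypothetical helper: hit rate in percent, returning 0.0 when nothing has been recorded yet.
fun hitRatePercent(hits: Double, misses: Double): Double {
    val total = hits + misses
    // Guard against division by zero before any cache access has happened
    return if (total > 0.0) hits / total * 100.0 else 0.0
}

fun main() {
    println(hitRatePercent(75.0, 25.0)) // 75 hits out of 100 lookups
    println(hitRatePercent(0.0, 0.0))   // no lookups yet
}
```

Returning 0.0 for an empty cache is one possible convention; reporting `Double.NaN` instead would make "no data" distinguishable from "every lookup missed".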

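🚨 Monitoring and Alerting Configuration

Prometheus + AlertManager example

Prometheus alerting rules
yaml
# prometheus-rules.yml
groups:
  - name: spring-boot-alerts
    rules:
      # JVM heap usage alert; restricted to heap pools because non-heap pools can report max = -1
      - alert: HighMemoryUsage
        expr: sum by (application, instance) (jvm_memory_used_bytes{area="heap"}) / sum by (application, instance) (jvm_memory_max_bytes{area="heap"}) * 100 > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Application memory usage is too high"
          description: "{{ $labels.application }} heap usage is {{ $value }}%"
      
      # HTTP error rate alert; the Prometheus registry exposes the request count as http_server_requests_seconds_count
      - alert: HighErrorRate
        expr: sum by (application) (rate(http_server_requests_seconds_count{status=~"5.."}[5m])) / sum by (application) (rate(http_server_requests_seconds_count[5m])) * 100 > 5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "HTTP error rate is too high"
          description: "{{ $labels.application }} 5xx error rate is {{ $value }}%"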
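The error-rate rule divides the per-second rate of 5xx responses by the per-second rate of all responses. The sketch below works through that arithmetic on made-up counter samples. It is a simplification of what Prometheus `rate()` does (it ignores counter resets and extrapolation), and all names are hypothetical.

```kotlin
// Hypothetical model of one scraped counter value at a point in time.
data class CounterSample(val timestampSeconds: Long, val value: Double)

// Simplified rate(): increase of the counter divided by the elapsed seconds.
fun perSecondRate(start: CounterSample, end: CounterSample): Double {
    val elapsed = end.timestampSeconds - start.timestampSeconds
    require(elapsed > 0) { "samples must be in chronological order" }
    return (end.value - start.value) / elapsed
}

// Share of failing requests in percent, 0.0 when there was no traffic.
fun errorRatePercent(errorRate: Double, totalRate: Double): Double =
    if (totalRate > 0.0) errorRate / totalRate * 100.0 else 0.0

fun main() {
    // Made-up samples taken 300 seconds (5 minutes) apart
    val totalRate = perSecondRate(CounterSample(0, 1000.0), CounterSample(300, 3400.0)) // 2400 requests
    val errorRate = perSecondRate(CounterSample(0, 10.0), CounterSample(300, 160.0))    // 150 errors
    val percent = errorRatePercent(errorRate, totalRate)
    println("total=$totalRate/s errors=$errorRate/s errorRate=$percent%")
    println("alert fires: ${percent > 5}")
}
```

With these sample numbers the error share is 6.25%, which is above the 5% threshold, so the rule would start firing once the condition has held for the configured `for` duration.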

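🔧 Advanced Configuration Tips

1. Enabling metrics conditionally

yaml
management:
  metrics:
    enable:
      # Disable Tomcat metrics
      tomcat: false
      # Enable the detailed JVM GC metric only where ENABLE_DETAILED_METRICS is true, for example in production
      jvm.gc.pause: ${ENABLE_DETAILED_METRICS:false}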

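2. Distribution statistics configuration

yaml
management:
  metrics:
    distribution:
      # Percentiles computed inside the application
      percentiles:
        http.server.requests: 0.5, 0.9, 0.95, 0.99
      # Publish histogram buckets so the backend can compute percentiles
      percentiles-histogram:
        http.server.requests: true
      # Service-level objective (SLO) boundaries
      slo:
        http.server.requests: 100ms, 200ms, 500ms, 1s, 2s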
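Each SLO boundary becomes a cumulative bucket: it counts the requests that completed within that duration. The following sketch reproduces this counting on made-up latencies, using only the standard library; the function name and the sample data are hypothetical, and it models the idea rather than Micrometer's internal implementation.

```kotlin
import java.time.Duration

// Hypothetical helper: for each boundary, count the observations that finished within it.
fun cumulativeBucketCounts(
    boundaries: List<Duration>,
    observed: List<Duration>
): Map<Duration, Int> =
    boundaries.sorted().associateWith { boundary -> observed.count { it <= boundary } }

fun main() {
    // The boundaries from the configuration above: 100ms, 200ms, 500ms, 1s, 2s
    val boundaries = listOf(100L, 200L, 500L, 1000L, 2000L).map { Duration.ofMillis(it) }
    // Made-up request latencies in milliseconds
    val observed = listOf(50L, 150L, 150L, 400L, 1500L, 3000L).map { Duration.ofMillis(it) }

    cumulativeBucketCounts(boundaries, observed).forEach { (boundary, count) ->
        println("le=${boundary.toMillis()}ms -> $count of ${observed.size}")
    }
}
```

In this sample, 4 of 6 requests finish within 500 ms, and the 3000 ms request exceeds every configured boundary, so it appears only in the total count.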

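3. Per-environment configuration

yaml
# Development (Spring Boot 3.x property names): keep metrics in the in-memory simple registry
management:
  simple:
    metrics:
      export:
        enabled: true
  metrics:
    tags:
      environment: development
yaml
# Production (Spring Boot 3.x property names): export to Prometheus and Datadog
management:
  prometheus:
    metrics:
      export:
        enabled: true
  datadog:
    metrics:
      export:
        enabled: true
        api-key: ${DATADOG_API_KEY}
  metrics:
    tags:
      environment: production
      datacenter: ${DATACENTER:unknown}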

📊 Using the Metrics Endpoint

List all metrics

bash
curl http://localhost:8080/actuator/metrics

Inspect a specific metric

bash
# JVM memory in use
curl http://localhost:8080/actuator/metrics/jvm.memory.used

# Filter by tag
curl "http://localhost:8080/actuator/metrics/http.server.requests?tag=uri:/api/orders"
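When these URLs are built in code rather than typed by hand, each tag filter goes into its own `tag=KEY:VALUE` query parameter. The sketch below shows one way to assemble such a URL; the helper name `metricUrl` is hypothetical, and the values are percent-encoded so that characters such as `/` and `:` survive transport.

```kotlin
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Hypothetical helper: builds an /actuator/metrics/{name} URL with optional tag filters.
fun metricUrl(
    baseUrl: String,
    metricName: String,
    tags: Map<String, String> = emptyMap()
): String {
    // One "tag=key:value" parameter per filter, percent-encoded
    val query = tags.entries.joinToString("&") { (key, value) ->
        "tag=" + URLEncoder.encode("$key:$value", StandardCharsets.UTF_8)
    }
    val path = "${baseUrl.trimEnd('/')}/actuator/metrics/$metricName"
    return if (query.isEmpty()) path else "$path?$query"
}

fun main() {
    println(metricUrl("http://localhost:8080", "jvm.memory.used"))
    println(
        metricUrl(
            "http://localhost:8080",
            "http.server.requests",
            mapOf("uri" to "/api/orders", "method" to "GET")
        )
    )
}
```

Passing several tags narrows the result to the time series that match all of them.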

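🎯 Best Practices

IMPORTANT

Metric design principles

  1. Be careful with high-cardinality tags: avoid user IDs, order numbers, and other unbounded values as tag values, because every distinct value creates a new time series
  2. Name metrics consistently: use a dot-separated hierarchy
  3. Collect in moderation: only gather metrics that you actually use
  4. Mind performance: metric collection should not slow down business code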
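For custom meters, one common way to follow the first principle is to normalize raw values before using them as tags. The sketch below replaces numeric path segments with a placeholder so that `/api/orders/1` and `/api/orders/2` share one time series. The function name and the `{id}` placeholder are hypothetical choices, and real paths may need more rules (for example, for UUIDs).

```kotlin
// Matches path segments that consist only of digits, such as "12345"
val numericSegment = Regex("^\\d+$")

// Hypothetical helper: turns a raw request path into a bounded tag value.
fun normalizeUriTag(path: String): String =
    path.split('/').joinToString("/") { segment ->
        if (numericSegment.matches(segment)) "{id}" else segment
    }

fun main() {
    println(normalizeUriTag("/api/orders/12345"))
    println(normalizeUriTag("/api/users/42/orders/7"))
    println(normalizeUriTag("/api/health"))
}
```

The built-in `http.server.requests` metric does not need this for Spring MVC handlers, because its `uri` tag already uses the route template rather than the raw path.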

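Recommended metric categories

Business Metrics

  • Order volume, revenue, user registrations
  • Success rate and conversion rate of business flows
  • Usage frequency of core business features

Application Metrics

  • HTTP request latency and error rate
  • Database connection pool state
  • Cache hit rate

Infrastructure Metrics

  • CPU, memory, and disk usage
  • JVM garbage collection activity
  • Network connection state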

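🔮 Summary

Spring Boot Metrics gives your application broad monitoring coverage. With sensible configuration and use of metrics, you can:

  • Detect problems early: spot potential issues from abnormal metric values
  • Optimize application performance: tune based on real measurements
  • Improve user experience: keep the application stable and efficient
  • Support business decisions: back data-driven decisions with business metrics

Remember, good monitoring is not only about collecting data; what matters more is turning that data into actionable insight! 🚀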