Spring Boot Metrics: A Detailed Guide to Metrics Monitoring 📊
NOTE
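Spring Boot Actuator uses Micrometer to collect application metrics and can export them to many monitoring systems, so the health of your application is visible at a glance.
🎯 What is Spring Boot Metrics?
Spring Boot Metrics is the metrics collection and monitoring support that Spring Boot Actuator builds on Micrometer. It works like health-monitoring equipment fitted to your application: it collects runtime metrics continuously and helps you:
- 📈 Monitor application performance: CPU, memory, and thread usage
- 🔍 Track business-facing metrics: request response time, error rate, throughput
- 🚨 Spot problems early: use metric anomalies to locate faults quickly
- 📊 Make data-driven decisions: tune application performance based on real data
Core design philosophy
🚀 Quick start
1. Add the dependencies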
kotlin
dependencies {
    implementation("org.springframework.boot:spring-boot-starter-actuator")
    implementation("io.micrometer:micrometer-registry-prometheus") // Prometheus, as an example
}
xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
</dependencies>
2. Basic configuration
yaml
management:
  endpoints:
    web:
      exposure:
        include: metrics,prometheus
  metrics:
    tags:
      application: my-spring-app
      environment: production
properties
management.endpoints.web.exposure.include=metrics,prometheus
management.metrics.tags.application=my-spring-app
management.metrics.tags.environment=production
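The `management.metrics.tags.*` entries are applied as common tags, meaning every meter in the registry carries them. The sketch below reproduces that effect with a plain in-memory `SimpleMeterRegistry` so it runs without Spring. The meter name `demo.requests` and the `endpoint` tag are hypothetical and exist only for illustration.

```kotlin
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Builds a registry with the same common tags as the configuration above.
fun registryWithCommonTags(): MeterRegistry {
    val registry = SimpleMeterRegistry()
    registry.config().commonTags(
        "application", "my-spring-app",
        "environment", "production"
    )
    return registry
}

fun main() {
    val registry = registryWithCommonTags()
    // Hypothetical meter; it only declares its own "endpoint" tag.
    val counter = registry.counter("demo.requests", "endpoint", "orders")
    // The common tags appear next to the meter's own tag.
    println(counter.id.tags)
}
```

Because common tags are attached at registration time, they should be configured before any meter is created.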
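📊 Supported monitoring systems
Spring Boot Metrics supports more than 18 monitoring systems, so you can pick the one that fits best:
Configuration for popular monitoring systems
yaml
management:
  endpoints:
    web:
      exposure:
        include: prometheus
# Metrics are served at /actuator/prometheus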
yaml
management:
  datadog:
    metrics:
      export:
        api-key: "YOUR_API_KEY"
        uri: "https://api.datadoghq.com"
        step: "30s"
yaml
management:
  newrelic:
    metrics:
      export:
        api-key: "YOUR_KEY"
        account-id: "YOUR_ACCOUNT_ID"
        step: "30s"
TIP
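How to choose:
- 🔥 Prometheus + Grafana: open source, powerful, active community
- 💼 Datadog: commercial, works out of the box, feature-rich
- ☁️ Cloud-provider offerings: for example AWS CloudWatch or Azure Monitor
🔧 Registering custom metrics
Basic metric registration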
kotlin
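import io.micrometer.core.instrument.Counter
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Timer
import org.springframework.stereotype.Component
import java.time.Duration

@Component
class OrderMetrics(private val meterRegistry: MeterRegistry) {
    // Total number of orders created, tagged by order type
    private val orderCounter = Counter.builder("orders.created")
        .description("Total number of orders created")
        .tag("type", "online")
        .register(meterRegistry)

    // Time spent processing an order
    private val orderProcessingTime = Timer.builder("orders.processing.time")
        .description("Order processing time")
        .register(meterRegistry)

    fun recordOrderCreated() {
        orderCounter.increment()
    }

    fun recordOrderProcessingTime(duration: Duration) {
        orderProcessingTime.record(duration)
    }
}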
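To check that a counter and a timer like these record what you expect, you can exercise them against an in-memory `SimpleMeterRegistry`. The following is a minimal sketch: `OrderMetricsSketch` is a hypothetical Spring-free stand-in for the component above, not part of any library.

```kotlin
import io.micrometer.core.instrument.Counter
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Timer
import io.micrometer.core.instrument.simple.SimpleMeterRegistry
import java.time.Duration
import java.util.concurrent.TimeUnit

// Hypothetical plain-Kotlin stand-in for the OrderMetrics component (no Spring needed).
class OrderMetricsSketch(registry: MeterRegistry) {
    private val created = Counter.builder("orders.created")
        .tag("type", "online")
        .register(registry)
    private val processing = Timer.builder("orders.processing.time")
        .register(registry)

    fun recordOrderCreated() = created.increment()
    fun recordOrderProcessingTime(duration: Duration) = processing.record(duration)
}

fun main() {
    val registry = SimpleMeterRegistry()
    val metrics = OrderMetricsSketch(registry)

    metrics.recordOrderCreated()
    metrics.recordOrderCreated()
    metrics.recordOrderProcessingTime(Duration.ofMillis(120))

    // Read the values back by name and tag, as a unit test might.
    val created = registry.get("orders.created").tag("type", "online").counter().count()
    val timer = registry.get("orders.processing.time").timer()
    println("created=$created, timings=${timer.count()}, totalMs=${timer.totalTime(TimeUnit.MILLISECONDS)}")
}
```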
Registering more complex metrics with MeterBinder
kotlin
@Configuration
class CustomMetricsConfiguration {
    @Bean
    fun queueSizeMetrics(orderQueue: Queue<Order>): MeterBinder {
        return MeterBinder { registry ->
            // Current queue size, sampled each time the gauge is read
            Gauge.builder("queue.size", orderQueue) { it.size.toDouble() }
                .description("Current size of the order queue")
                .tag("queue", "orders")
                .register(registry)
            // Queue utilization as a percentage of an assumed capacity of 1000
            Gauge.builder("queue.utilization", orderQueue) { it.size.toDouble() / 1000.0 * 100 }
                .description("Order queue utilization (%)")
                .tag("queue", "orders")
                .register(registry)
        }
    }
}
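A gauge does not store a value; it calls its function whenever it is sampled. The sketch below shows this with a `MeterBinder` bound to an in-memory registry. It assumes a queue of `String` instead of the `Order` type used above, and the element values are placeholders.

```kotlin
import io.micrometer.core.instrument.Gauge
import io.micrometer.core.instrument.binder.MeterBinder
import io.micrometer.core.instrument.simple.SimpleMeterRegistry
import java.util.Queue
import java.util.concurrent.ConcurrentLinkedQueue

// Returns a binder that exposes the live size of the given queue.
fun queueSizeBinder(queue: Queue<String>): MeterBinder =
    MeterBinder { registry ->
        Gauge.builder("queue.size", queue) { it.size.toDouble() }
            .tag("queue", "orders")
            .register(registry)
    }

fun main() {
    val queue = ConcurrentLinkedQueue<String>()
    val registry = SimpleMeterRegistry()
    queueSizeBinder(queue).bindTo(registry)

    val gauge = registry.get("queue.size").gauge()
    queue.add("order-1")
    queue.add("order-2")
    println(gauge.value()) // read after two adds

    queue.poll()
    println(gauge.value()) // read again after one element was removed
}
```

Micrometer keeps only a weak reference to the object passed to the gauge, so the queue must stay referenced elsewhere (for example as a Spring bean) for the gauge to keep reporting.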
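📈 Built-in metrics
Spring Boot collects a rich set of system and application metrics automatically:
JVM metrics
- jvm.memory.used: memory in use
- jvm.gc.pause: GC pause time
- jvm.threads.live: number of live threads
System metrics
- system.cpu.usage: CPU usage
- disk.free: free disk space
- process.uptime: application uptime
Web request metrics
- http.server.requests: HTTP request metrics
  - includes tags such as uri, method, status, and exception
Database metrics
- jdbc.connections.active: active connections
- jdbc.connections.max: maximum connections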
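Several of these meters come from binder classes in `micrometer-core` that Spring Boot registers for you. To explore them outside a Spring application, a rough sketch is to bind a few of them to an in-memory registry by hand and list the resulting meter names. Which meters appear can vary by JVM and operating system.

```kotlin
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics
import io.micrometer.core.instrument.binder.jvm.JvmThreadMetrics
import io.micrometer.core.instrument.binder.system.ProcessorMetrics
import io.micrometer.core.instrument.binder.system.UptimeMetrics
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Binds a handful of Micrometer's built-in binders to a fresh in-memory registry.
fun bindJvmAndSystemMeters(): SimpleMeterRegistry {
    val registry = SimpleMeterRegistry()
    JvmMemoryMetrics().bindTo(registry)
    JvmThreadMetrics().bindTo(registry)
    ProcessorMetrics().bindTo(registry)
    UptimeMetrics().bindTo(registry)
    return registry
}

fun main() {
    val registry = bindJvmAndSystemMeters()
    // Print each distinct meter name once, sorted alphabetically.
    registry.meters.map { it.id.name }.distinct().sorted().forEach(::println)
}
```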
🎛️ Customizing metrics
1. Custom tags
kotlin
@Configuration
class MetricsCustomization {
    @Bean
    fun commonTagsCustomizer(): MeterRegistryCustomizer<MeterRegistry> {
        return MeterRegistryCustomizer { registry ->
            registry.config().commonTags(
                "service", "order-service",
                "version", "1.0.0",
                "region", "us-east-1"
            )
        }
    }
}
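2. Filtering metrics
kotlin
// Spring Boot applies every MeterFilter bean to the auto-configured registry.
@Bean
fun denyTomcatMeters(): MeterFilter =
    MeterFilter.denyNameStartsWith("tomcat")

// Caps the expected maximum of the HTTP request timer at 10 seconds.
@Bean
fun httpRequestMaxExpected(): MeterFilter =
    MeterFilter.maxExpected("http.server.requests", Duration.ofSeconds(10))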
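The effect of a deny filter is easiest to see on an in-memory registry: a denied meter can still be requested and incremented, but it is never stored. This is a minimal sketch; the two meter names are illustrative.

```kotlin
import io.micrometer.core.instrument.config.MeterFilter
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Creates a registry that rejects every meter whose name starts with "tomcat".
fun filteredRegistry(): SimpleMeterRegistry {
    val registry = SimpleMeterRegistry()
    // Filters only affect meters registered after the filter is added.
    registry.config().meterFilter(MeterFilter.denyNameStartsWith("tomcat"))
    return registry
}

fun main() {
    val registry = filteredRegistry()
    registry.counter("tomcat.sessions.created").increment() // denied: behaves as a no-op
    registry.counter("orders.created").increment()          // accepted

    // Only the accepted meter is present in the registry.
    println(registry.meters.map { it.id.name })
}
```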
3. Customizing HTTP request metrics
kotlin
@Bean
fun serverRequestObservationConvention(): ServerRequestObservationConvention {
    return object : DefaultServerRequestObservationConvention() {
        override fun getLowCardinalityKeyValues(context: ServerRequestObservationContext): KeyValues {
            // Only add headers with a small, fixed set of values: each distinct value creates a new time series.
            return super.getLowCardinalityKeyValues(context)
                .and("custom.header", context.carrier.getHeader("X-Custom-Header") ?: "unknown")
        }
    }
}
🔍 Real-world business scenarios
Monitoring e-commerce order processing
kotlin
@Service
class OrderService(
    private val meterRegistry: MeterRegistry,
    private val orderRepository: OrderRepository
) {
    private val orderAmountSummary = DistributionSummary.builder("business.orders.amount")
        .description("Distribution of order amounts")
        .baseUnit("yuan")
        .register(meterRegistry)

    // Counter.increment() does not accept tags, so resolve the counter for each tag combination.
    // Both outcomes use the same tag keys, which backends such as Prometheus require.
    private fun countOrder(status: String, order: Order) {
        Counter.builder("business.orders.total")
            .description("Total number of orders")
            .tags("status", status, "type", order.type.name, "payment", order.paymentMethod.name)
            .register(meterRegistry)
            .increment()
    }

    // @Timed on a bean method only takes effect when Micrometer's TimedAspect (Spring AOP) is configured.
    @Timed(value = "business.orders.processing.time", description = "Order processing time")
    fun processOrder(order: Order): Order {
        return try {
            val savedOrder = orderRepository.save(order)
            // Record business metrics
            countOrder("success", order)
            orderAmountSummary.record(order.amount.toDouble())
            savedOrder
        } catch (e: Exception) {
            countOrder("failed", order)
            throw e
        }
    }
}
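Tags belong to a meter's identity: asking the registry for the same name with different tag values yields separate counters. The sketch below illustrates this with an in-memory registry, together with a `DistributionSummary` for order amounts. The `recordOrder` helper and the `ONLINE` type value are assumptions made for the example.

```kotlin
import io.micrometer.core.instrument.DistributionSummary
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Hypothetical helper: records one order outcome.
// Each distinct tag combination becomes its own counter (time series).
fun recordOrder(registry: MeterRegistry, status: String, type: String, amount: Double) {
    registry.counter("business.orders.total", "status", status, "type", type).increment()
    if (status == "success") {
        DistributionSummary.builder("business.orders.amount")
            .baseUnit("yuan")
            .register(registry) // returns the already-registered summary on later calls
            .record(amount)
    }
}

fun main() {
    val registry = SimpleMeterRegistry()
    recordOrder(registry, "success", "ONLINE", 99.0)
    recordOrder(registry, "success", "ONLINE", 201.0)
    recordOrder(registry, "failed", "ONLINE", 50.0)

    val succeeded = registry.get("business.orders.total").tag("status", "success").counter().count()
    val failed = registry.get("business.orders.total").tag("status", "failed").counter().count()
    val summary = registry.get("business.orders.amount").summary()
    println("success=$succeeded, failed=$failed, orders=${summary.count()}, total=${summary.totalAmount()}")
}
```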
Monitoring cache performance
kotlin
@Component
class CacheMetrics(private val meterRegistry: MeterRegistry) {
    init {
        // Register the gauge once. Its function is re-evaluated on every sample,
        // so no @Scheduled recalculation is needed.
        Gauge.builder("cache.hit.rate", this) { it.hitRatePercent() }
            .description("Cache hit rate (%) across all caches")
            .register(meterRegistry)
    }

    fun recordCacheHit(cacheName: String) {
        Counter.builder("cache.hits")
            .description("Number of cache hits")
            .tag("cache", cacheName)
            .register(meterRegistry)
            .increment()
    }

    fun recordCacheMiss(cacheName: String) {
        Counter.builder("cache.misses")
            .description("Number of cache misses")
            .tag("cache", cacheName)
            .register(meterRegistry)
            .increment()
    }

    // Hit rate as a percentage, summed over every cache tag
    private fun hitRatePercent(): Double {
        val hits = meterRegistry.find("cache.hits").counters().sumOf { it.count() }
        val misses = meterRegistry.find("cache.misses").counters().sumOf { it.count() }
        return if (hits + misses > 0) hits / (hits + misses) * 100 else 0.0
    }
}
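The hit rate is hits divided by total lookups, expressed as a percentage. Here is a small sketch of a per-cache hit-rate gauge that is registered once and derives its value from two counters at read time. The cache name `products` is a placeholder, and `registerHitRate` is a hypothetical helper.

```kotlin
import io.micrometer.core.instrument.Gauge
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Hit rate as a percentage; 0.0 when nothing has been recorded yet.
fun hitRatePercent(hits: Double, misses: Double): Double {
    val total = hits + misses
    return if (total > 0) hits / total * 100 else 0.0
}

// Registers the gauge once; its lambda re-reads both counters on every sample.
fun registerHitRate(registry: MeterRegistry, cacheName: String): Gauge {
    val hits = registry.counter("cache.hits", "cache", cacheName)
    val misses = registry.counter("cache.misses", "cache", cacheName)
    return Gauge.builder("cache.hit.rate", hits) { h -> hitRatePercent(h.count(), misses.count()) }
        .tag("cache", cacheName)
        .register(registry)
}

fun main() {
    val registry = SimpleMeterRegistry()
    val gauge = registerHitRate(registry, "products")

    repeat(3) { registry.counter("cache.hits", "cache", "products").increment() }
    registry.counter("cache.misses", "cache", "products").increment()

    println(gauge.value()) // 3 hits out of 4 lookups -> 75.0
}
```

When the backend supports queries, computing the ratio there from the two raw counters is a common alternative, since it allows the rate to be evaluated over any time window.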
🚨 Monitoring and alerting
Example Prometheus + Alertmanager configuration
Prometheus alerting rules
yaml
# prometheus-rules.yml
groups:
  - name: spring-boot-alerts
    rules:
      # JVM heap usage alert
      - alert: HighMemoryUsage
        expr: sum by (application, instance) (jvm_memory_used_bytes{area="heap"}) / sum by (application, instance) (jvm_memory_max_bytes{area="heap"}) * 100 > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Application memory usage is too high"
          description: "{{ $labels.application }} heap usage is {{ $value }}%"
      # HTTP error-rate alert
      - alert: HighErrorRate
        expr: sum by (application) (rate(http_server_requests_seconds_count{status=~"5.."}[5m])) / sum by (application) (rate(http_server_requests_seconds_count[5m])) * 100 > 5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "HTTP error rate is too high"
          description: "{{ $labels.application }} 5xx error rate is {{ $value }}%"
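The error-rate rule is plain arithmetic: 5xx requests divided by all requests in the window, times 100, compared against a threshold of 5. The sketch below works through that calculation with made-up request counts. Both function names are hypothetical, and returning 0.0 for an empty window is a simplification chosen for this example.

```kotlin
// Share of 5xx requests among all requests, as a percentage.
// Returns 0.0 for an empty window (a simplification for this sketch).
fun errorRatePercent(serverErrors: Long, totalRequests: Long): Double =
    if (totalRequests == 0L) 0.0 else serverErrors * 100.0 / totalRequests

// True when the error rate is strictly above the threshold, like `> 5` in the rule.
fun shouldAlert(serverErrors: Long, totalRequests: Long, thresholdPercent: Double = 5.0): Boolean =
    errorRatePercent(serverErrors, totalRequests) > thresholdPercent

fun main() {
    println(errorRatePercent(12, 200)) // 6.0
    println(shouldAlert(12, 200))      // true: 6% is above 5%
    println(shouldAlert(10, 200))      // false: exactly 5% is not above the threshold
}
```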
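🔧 Advanced configuration tips
1. Enabling metrics conditionally
yaml
management:
  metrics:
    enable:
      # Disable Tomcat metrics
      tomcat: false
      # Enable GC pause metrics only where ENABLE_DETAILED_METRICS=true is set (for example in production)
      jvm.gc.pause: ${ENABLE_DETAILED_METRICS:false}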
2. Metric distribution configuration
yaml
management:
  metrics:
    distribution:
      # Percentiles to publish
      percentiles:
        http.server.requests: 0.5, 0.9, 0.95, 0.99
      # Publish histogram buckets
      percentiles-histogram:
        http.server.requests: true
      # Service-level objective (SLO) boundaries
      slo:
        http.server.requests: 100ms, 200ms, 500ms, 1s, 2s
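Percentiles and SLO boundaries can also be set on an individual timer through its builder. The sketch below does that on an in-memory registry and prints what a snapshot reports. The meter name `demo.latency` and the recorded durations are made up for illustration, and the exact percentile values depend on Micrometer's histogram implementation.

```kotlin
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Timer
import io.micrometer.core.instrument.simple.SimpleMeterRegistry
import java.time.Duration
import java.util.concurrent.TimeUnit

// Builds a timer with the same percentiles and SLO boundaries as the configuration above.
fun latencyTimer(registry: MeterRegistry): Timer =
    Timer.builder("demo.latency") // hypothetical meter name
        .publishPercentiles(0.5, 0.9, 0.95, 0.99)
        .serviceLevelObjectives(
            Duration.ofMillis(100), Duration.ofMillis(200), Duration.ofMillis(500),
            Duration.ofSeconds(1), Duration.ofSeconds(2)
        )
        .register(registry)

fun main() {
    val timer = latencyTimer(SimpleMeterRegistry())
    // Record 100 synthetic samples from 10 ms to 1000 ms.
    (1..100).forEach { timer.record(Duration.ofMillis(it * 10L)) }

    val snapshot = timer.takeSnapshot()
    snapshot.percentileValues().forEach {
        println("p${it.percentile()} = ${it.value(TimeUnit.MILLISECONDS)} ms")
    }
    snapshot.histogramCounts().forEach {
        println("<= ${it.bucket(TimeUnit.MILLISECONDS)} ms: ${it.count()}")
    }
}
```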
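3. Per-environment configuration
yaml
# Development (Spring Boot 3.x property names; 2.x used management.metrics.export.<system>.*)
management:
  simple:
    metrics:
      export:
        enabled: true
  metrics:
    tags:
      environment: development
yaml
# Production
management:
  prometheus:
    metrics:
      export:
        enabled: true
  datadog:
    metrics:
      export:
        enabled: true
        api-key: ${DATADOG_API_KEY}
  metrics:
    tags:
      environment: production
      datacenter: ${DATACENTER:unknown}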
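When more than one registry is active, Spring Boot places them behind a composite registry, so one recorded value reaches every backend. The sketch below imitates that with Micrometer's `CompositeMeterRegistry` and two in-memory registries standing in for real backends. The function name is hypothetical.

```kotlin
import io.micrometer.core.instrument.composite.CompositeMeterRegistry
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Increments one counter through a composite and returns the count seen by each child registry.
fun fanOutCounts(): Pair<Double, Double> {
    // Two in-memory registries stand in for, say, Prometheus and Datadog.
    val first = SimpleMeterRegistry()
    val second = SimpleMeterRegistry()
    val composite = CompositeMeterRegistry().add(first).add(second)

    composite.counter("orders.created").increment()

    return first.get("orders.created").counter().count() to
        second.get("orders.created").counter().count()
}

fun main() {
    val (firstCount, secondCount) = fanOutCounts()
    println("first=$firstCount, second=$secondCount") // both registries saw the increment
}
```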
📊 Using the metrics endpoint
List all metrics
bash
curl http://localhost:8080/actuator/metrics
Inspect a specific metric
bash
# JVM memory in use
curl http://localhost:8080/actuator/metrics/jvm.memory.used
# Filter by tag
curl "http://localhost:8080/actuator/metrics/http.server.requests?tag=uri:/api/orders"
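Tag filters are passed as `tag=KEY:VALUE` query parameters, and several of them can be combined. When building such URLs in code, the values should be URL-encoded. The helper below is a hypothetical sketch that uses only the JDK, not an Actuator API.

```kotlin
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Builds an Actuator metrics URL; each tag becomes one `tag=KEY:VALUE` query parameter.
fun metricUrl(baseUrl: String, metric: String, tags: Map<String, String> = emptyMap()): String {
    val query = tags.entries.joinToString("&") { (key, value) ->
        "tag=" + URLEncoder.encode("$key:$value", StandardCharsets.UTF_8)
    }
    val path = "$baseUrl/actuator/metrics/$metric"
    return if (query.isEmpty()) path else "$path?$query"
}

fun main() {
    println(metricUrl("http://localhost:8080", "jvm.memory.used"))
    println(
        metricUrl(
            "http://localhost:8080",
            "http.server.requests",
            mapOf("uri" to "/api/orders", "method" to "GET")
        )
    )
}
```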
🎯 Best practices
IMPORTANT
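Metric design principles:
- Be careful with high-cardinality tags: avoid user IDs, order numbers, and similar values as tags
- Name metrics consistently: use a dot-separated hierarchy
- Collect in moderation: keep only the metrics that are actually useful
- Mind performance: metric collection should not slow down business logic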
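The cardinality warning can be made concrete by counting how many separate meters a tagging scheme creates. In this sketch the meter name `orders.viewed`, the user IDs, and the channel values are all invented for the illustration.

```kotlin
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Counts how many distinct meters (time series) result from tagging one counter with the given values.
fun seriesCount(tagKey: String, tagValues: List<String>): Int {
    val registry = SimpleMeterRegistry()
    tagValues.forEach { registry.counter("orders.viewed", tagKey, it).increment() }
    return registry.meters.size
}

fun main() {
    val channelNames = listOf("web", "app", "api")
    val userIds = (1..10_000).map { "user-$it" }             // grows with the user base
    val channels = (1..10_000).map { channelNames[it % 3] }  // always one of three values

    println(seriesCount("userId", userIds))    // 10000 series for 10000 events
    println(seriesCount("channel", channels))  // 3 series for the same number of events
}
```

Each extra series costs memory in the application and storage in the backend, which is why bounded tag values are the safer default.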
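Recommended metric categories
Business metrics
- Order counts, revenue, user registrations
- Success and conversion rates of business processes
- Usage frequency of core business features
Application metrics
- HTTP request response time and error rate
- Database connection pool state
- Cache hit rate
Infrastructure metrics
- CPU, memory, and disk usage
- JVM garbage collection activity
- Network connection state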
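🔮 Summary
Spring Boot Metrics gives your application broad monitoring coverage. With sensible configuration and use of metrics, you can:
✅ Catch problems early: metric anomalies reveal potential issues in time
✅ Improve application performance: tune based on real data
✅ Improve user experience: keep the application stable and efficient
✅ Support business decisions: back data-driven choices with business metrics
Remember, good monitoring is not just about collecting data. What matters more is turning that data into actionable insight! 🚀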