RED

The RED method is a monitoring strategy for microservices that focuses on three key metrics:

  • Rate: The number of requests per second
  • Errors: The number of failed requests per second
  • Duration: The time taken to serve a request
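
As a rough illustration of how the three values relate, the sketch below derives them from a batch of requests observed during one time window. The `Request` and `Red` records and the `summarize` method are hypothetical names invented for this example, and the mean is used for Duration only to keep the code short; in practice duration is usually tracked as a distribution (percentiles or histograms).

```java
import java.time.Duration;
import java.util.List;

public class RedSnapshot {
    /** One observed request (hypothetical type): how long it took and whether it failed. */
    record Request(Duration duration, boolean failed) {}

    /** RED values for a single observation window (hypothetical type). */
    record Red(double ratePerSecond, double errorsPerSecond, Duration meanDuration) {}

    /** Summarizes the requests seen during the given window into Rate, Errors, and Duration. */
    static Red summarize(List<Request> requests, Duration window) {
        if (window.isZero() || window.isNegative()) {
            throw new IllegalArgumentException("window must be positive");
        }
        double seconds = window.toNanos() / 1e9;
        long failed = requests.stream().filter(Request::failed).count();
        long totalNanos = requests.stream().mapToLong(r -> r.duration().toNanos()).sum();
        // Mean duration is a simplification; an empty window reports zero.
        Duration mean = requests.isEmpty()
                ? Duration.ZERO
                : Duration.ofNanos(totalNanos / requests.size());
        return new Red(requests.size() / seconds, failed / seconds, mean);
    }

    public static void main(String[] args) {
        List<Request> observed = List.of(
                new Request(Duration.ofMillis(120), false),
                new Request(Duration.ofMillis(80), false),
                new Request(Duration.ofMillis(400), true),
                new Request(Duration.ofMillis(200), false));
        System.out.println(summarize(observed, Duration.ofSeconds(2)));
    }
}
```

For the four sample requests over a two-second window, this gives a rate of 2.0 requests per second, 0.5 errors per second, and a mean duration of 200 ms.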

Key points:

  • Derived from Google's "Four Golden Signals" for monitoring: it keeps Traffic (as Rate), Errors, and Latency (as Duration), and drops Saturation
  • Aims to provide actionable insights into user experience, unlike resource-oriented methods such as USE
  • Prioritizes meeting service level objectives (SLOs) over hardware metrics
  • Uses the same three metrics for every service, which simplifies automation and standardization
  • Limitations include lack of visibility into resource utilization and potential for false positives

Java Implementation

Use the Micrometer library to record the metrics. Micrometer is a standalone instrumentation library that Spring Boot Actuator auto-configures, and Actuator exposes the recorded metrics through its endpoints.

  • For Rate: Use Counter to track the number of requests.
  • For Errors: Use Counter to track failed requests.
  • For Duration: Use Timer to track request durations.
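
A minimal sketch of the three meters follows, assuming `micrometer-core` is on the classpath. The `RedMetrics` class, its `observe` method, the metric names (`service.requests`, `service.errors`, `service.duration`), and the `service` tag are all made up for illustration; they are not Micrometer or Spring Boot conventions. `SimpleMeterRegistry` is used only so the example runs without a monitoring backend.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class RedMetrics {
    private final Counter requests;
    private final Counter errors;
    private final Timer duration;

    public RedMetrics(MeterRegistry registry, String service) {
        // Names and the "service" tag are illustrative assumptions.
        this.requests = Counter.builder("service.requests")
                .description("Total requests received")
                .tag("service", service)
                .register(registry);
        this.errors = Counter.builder("service.errors")
                .description("Total requests that failed")
                .tag("service", service)
                .register(registry);
        this.duration = Timer.builder("service.duration")
                .description("Time taken to serve a request")
                .tag("service", service)
                .register(registry);
    }

    /** Runs the handler, counting the request, any failure, and the elapsed time. */
    public <T> T observe(Supplier<T> handler) {
        requests.increment();
        long start = System.nanoTime();
        try {
            return handler.get();
        } catch (RuntimeException e) {
            errors.increment();
            throw e;
        } finally {
            duration.record(System.nanoTime() - start, TimeUnit.NANOSECONDS);
        }
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        RedMetrics red = new RedMetrics(registry, "orders");

        red.observe(() -> "ok");
        try {
            red.observe(() -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException expected) {
            // The failure has already been counted by observe().
        }

        System.out.println("requests = " + registry.get("service.requests").counter().count());
        System.out.println("errors   = " + registry.get("service.errors").counter().count());
        System.out.println("timed    = " + registry.get("service.duration").timer().count());
    }
}
```

Note that a Counter only accumulates a total. The per-second rate is normally computed by the monitoring backend from successive readings of that total rather than inside the application.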

RED vs USE

The USE (Utilization, Saturation, Errors) method is focused on monitoring hardware resources like CPU, memory, disk, and network, while the RED (Rate, Errors, Duration) method is designed specifically for monitoring microservices and user experience.

  1. USE is hardware-centric, while RED is service-centric. USE metrics like utilization and saturation are more relevant for physical hardware, but less meaningful for monitoring the performance and reliability of individual microservices.
  2. RED focuses on metrics that directly impact user experience, such as request rate, error rate, and request duration/latency. These metrics provide actionable insights into how users are experiencing the service.
  3. In modern microservice architectures, hardware metrics are less important as long as the service level objectives (SLOs) are being met. RED aligns better with this objective by prioritizing SLO-centric metrics.
  4. RED promotes consistency and standardization across services, making it easier to automate monitoring tasks, alerts, and dashboards. USE metrics can vary significantly across different services and resources.
  5. While USE provides visibility into resource utilization, it lacks insight into the actual workload and request traffic handled by the service, which is crucial for microservices monitoring.

Google's Four Golden Signals

The "Four Golden Signals" are four metrics described by Google's Site Reliability Engineering (SRE) team for monitoring distributed systems and services. Together they give a high-level view of a system's health and performance from the user's perspective:

  1. Latency: The time it takes to service a request. This measures the responsiveness of the system as experienced by users.
  2. Traffic: The amount of demand on the system, typically measured in requests per second. This indicates the load and usage patterns.
  3. Errors: The rate of requests that fail. This captures the frequency of failures and reliability issues.
  4. Saturation: The degree to which the system's resources are being utilized, such as CPU, memory, disk, and network. This indicates if the system is approaching its capacity limits.