---
aliases:
labels:
  products:
    - cloud
    - enterprise
    - oss
menuTitle: Expressions examples
title: Expressions examples
description: Practical expression examples from basic to advanced for common monitoring scenarios
weight: 55
refs:
  grafana-expressions:
    - pattern: /docs/grafana/
      destination: /docs/grafana/<GRAFANA_VERSION>/visualizations/panels-visualizations/query-transform-data/expression-queries/
    - pattern: /docs/grafana-cloud/
      destination: /docs/grafana/<GRAFANA_VERSION>/visualizations/panels-visualizations/query-transform-data/expression-queries/
  grafana-alerting:
    - pattern: /docs/grafana/
      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/
    - pattern: /docs/grafana-cloud/
      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/
---

Expressions examples

This document provides practical expression examples for common monitoring and visualization scenarios. Examples progress from basic to advanced, showing you how to solve real-world problems with Grafana Expressions.

For foundational concepts, refer to Grafana expressions.

Basic examples

Start here if you're new to expressions. These examples demonstrate fundamental patterns you'll use frequently.

Convert units

Scenario: Your metrics are in bytes, but you want to display them in gigabytes.

Setup:

  • Query A (Prometheus): node_memory_MemTotal_bytes
  • Expression B (Math): $A / 1024 / 1024 / 1024

Result: Memory values converted from bytes to gigabytes.

Variations:

  • Bytes to megabytes: $A / 1024 / 1024
  • Bytes to terabytes: $A / 1024 / 1024 / 1024 / 1024
  • Milliseconds to seconds: $A / 1000
  • Celsius to Fahrenheit: $A * 9 / 5 + 32
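
These Math expressions are plain per-sample arithmetic. As a quick sketch (illustration only, not Grafana code), the same conversions in Python:

```python
def bytes_to_gib(value):
    # $A / 1024 / 1024 / 1024
    return value / 1024 / 1024 / 1024

def celsius_to_fahrenheit(value):
    # $A * 9 / 5 + 32
    return value * 9 / 5 + 32
```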

Calculate a simple percentage

Scenario: Show what percentage of total memory is being used.

Setup:

  • Query A (Prometheus): node_memory_MemTotal_bytes
  • Query B (Prometheus): node_memory_MemAvailable_bytes
  • Expression C (Math): ($A - $B) / $A * 100

Result: Memory usage as a percentage (0-100).

Tip: This pattern works for any "used / total * 100" calculation.
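
A minimal Python sketch of the same calculation, applied per sample:

```python
def memory_used_percent(total_bytes, available_bytes):
    # ($A - $B) / $A * 100 — used memory as a share of total
    return (total_bytes - available_bytes) / total_bytes * 100
```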


Get the current (latest) value

Scenario: Display the most recent temperature reading in a stat panel.

Setup:

  • Query A (InfluxDB): Temperature sensor time series data
  • Expression B (Reduce): Input $A, Function: Last

Result: Single number showing the most recent value from the time series.

When to use: Stat panels, gauges, or any visualization that needs a single current value.


Calculate an average over time

Scenario: Show the average CPU usage over the dashboard time range.

Setup:

  • Query A (Prometheus): node_cpu_seconds_total{mode="idle"}
  • Expression B (Reduce): Input $A, Function: Mean

Result: Average CPU value across the selected time range.

Note: Each series (each CPU core, each host) produces its own average, preserving labels.


Find maximum or minimum values

Scenario: Identify the peak memory usage in the last 24 hours.

Setup:

  • Query A (Prometheus): node_memory_MemUsed_bytes (last 24 hours)
  • Expression B (Reduce): Input $A, Function: Max

Result: Peak memory usage value for each host.

Variations:

  • Use Min to find the lowest value
  • Use Count to see how many data points exist

Simple threshold check

Scenario: Create a binary indicator showing whether CPU is above 80%.

Setup:

  • Query A (Prometheus): 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
  • Expression B (Math): $A > 80

Result: Returns 1 when CPU exceeds 80%, 0 otherwise. Useful for alerting or status indicators.


Intermediate examples

These examples combine multiple operations and handle more complex scenarios.

Calculate error rate percentage

Scenario: Display HTTP error rate as a percentage of total requests.

Setup:

  • Query A (Prometheus): sum(rate(http_requests_total{status=~"5.."}[5m]))
  • Query B (Prometheus): sum(rate(http_requests_total[5m]))
  • Expression C (Math): $A / $B * 100

Result: Error rate percentage across all endpoints.

Handling division by zero: If there are zero requests, this produces NaN or infinity. Math expressions have no ternary operator, but comparisons evaluate to 1 or 0, so you can build the guard with arithmetic:

  • Expression C (Math): ($B > 0) * ($A / ($B + ($B == 0)) * 100)

When $B is 0, the ($B == 0) term bumps the denominator to 1 and the leading ($B > 0) zeroes the result, so the expression returns 0 instead of NaN or infinity.
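
A minimal Python sketch of the intended behavior (0 when there are no requests):

```python
def error_rate_percent(errors, requests):
    """Guarded division: 0 when there are no requests,
    otherwise errors as a percentage of requests."""
    if requests == 0:
        return 0.0
    return errors / requests * 100
```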


Calculate available disk space

Scenario: Show available disk space as a percentage for capacity planning.

Setup:

  • Query A (Prometheus): node_filesystem_size_bytes{mountpoint="/"}
  • Query B (Prometheus): node_filesystem_avail_bytes{mountpoint="/"}
  • Expression C (Math): $B / $A * 100

Result: Percentage of disk space available (not used) for each host's root filesystem.

For alerting: Add an alert when available space drops below 10%:

  • Expression D (Math): $C < 10

Aggregate across multiple servers

Scenario: Calculate total requests per second across all web servers.

Setup:

  • Query A (Prometheus): sum(rate(http_requests_total{job="webservers"}[5m]))
  • Expression B (Reduce): Input $A, Function: Last

Result: The current total requests per second across all servers as a single value. The cross-server aggregation happens in PromQL (sum), because the Reduce expression collapses each series over time rather than across series.

Alternative: To get the average per server instead, use avg in the query:

  • Query A (Prometheus): avg(rate(http_requests_total{job="webservers"}[5m]))

Combine metrics from different data sources

Scenario: Calculate efficiency by dividing application throughput (Prometheus) by infrastructure cost metric (CloudWatch).

Setup:

  • Query A (Prometheus): sum(rate(processed_jobs_total[5m]))
  • Query B (CloudWatch): EC2 instance cost metric
  • Expression C (Resample): Input $A, Resample to: 1m, Downsample: Mean
  • Expression D (Resample): Input $B, Resample to: 1m, Downsample: Mean
  • Expression E (Math): $C / $D

Result: Jobs processed per dollar (or cost unit), showing application efficiency.

Why resample: Different data sources often have different collection intervals. Resampling ensures timestamps align for math operations.
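
A rough Python sketch of the downsampling step (an illustration of the idea, not Grafana's implementation): bucket timestamped samples into fixed windows and average each window.

```python
from collections import defaultdict

def resample_mean(points, interval=60):
    """Downsample (timestamp, value) pairs into fixed-width buckets by mean,
    roughly what Resample does before Math combines two series."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}
```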


Compare hosts to fleet average

Scenario: Identify hosts performing worse than the fleet average.

Setup:

  • Query A (Prometheus): node_cpu_usage_percent (returns one series per host)
  • Query B (Prometheus): avg(node_cpu_usage_percent) (fleet average, a single series with no labels)
  • Expression C (Math): $A - $B

Result: Each host shows how far above or below the fleet average it is. Positive values indicate above-average CPU usage. Compute the fleet average in PromQL rather than with Reduce, which would average each host's series over time instead of across hosts; because the avg() result has no labels, the Math expression joins it against every host series.


Filter invalid data

Scenario: Calculate average response time, ignoring any null or NaN values in the data.

Setup:

  • Query A (Time series): Response time data with occasional gaps
  • Expression B (Reduce): Input $A, Function: Mean, Mode: Drop non-numeric

Result: Clean average that ignores invalid data points.

Alternative modes:

  • Strict: Returns NaN if any value is invalid (use when data quality matters)
  • Replace non-numeric: Substitutes a specific value for invalid data points
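
A rough Python sketch of the drop and strict semantics (illustration only, not Grafana's implementation):

```python
import math

def reduce_mean(values, mode="drop"):
    """Mean with non-numeric handling: "drop" ignores invalid points,
    "strict" returns NaN if any point is invalid."""
    numeric = [v for v in values if v is not None and not math.isnan(v)]
    if mode == "strict" and len(numeric) != len(values):
        return math.nan
    return sum(numeric) / len(numeric) if numeric else math.nan
```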

Calculate rate of change

Scenario: Show how quickly memory usage is increasing or decreasing.

Setup:

  • Query A (Prometheus): node_memory_MemUsed_bytes
  • Query B (Prometheus): node_memory_MemUsed_bytes offset 5m
  • Expression C (Math): $A - $B

Result: Bytes of memory change over the last 5 minutes. Positive = increasing, negative = decreasing.

As a percentage change:

  • Expression C (Math): ($A - $B) / $B * 100
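
The percentage-change formula, sketched in Python:

```python
def rate_of_change_percent(current, previous):
    # ($A - $B) / $B * 100
    return (current - previous) / previous * 100
```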

Advanced examples

These examples demonstrate complex multi-step calculations and sophisticated alerting patterns.

Compare current value to 24-hour average

Scenario: Highlight when current traffic is significantly above or below the daily norm.

Setup:

  • Query A (Prometheus): sum(rate(http_requests_total[24h])) (historical average)
  • Query B (Prometheus): sum(rate(http_requests_total[5m])) (current rate)
  • Expression C (Reduce): Input $A, Function: Mean
  • Expression D (Math): ($B - $C) / $C * 100

Result: Percentage difference from the 24-hour average. +50 means 50% above normal, -30 means 30% below normal.

Use cases:

  • Detect traffic anomalies
  • Identify unusual load patterns
  • Trigger alerts for significant deviations

Calculate service level indicator (SLI)

Scenario: Calculate the percentage of requests meeting your latency target (under 200ms).

Setup:

  • Query A (Prometheus): sum(rate(http_request_duration_seconds_bucket{le="0.2"}[5m]))
  • Query B (Prometheus): sum(rate(http_request_duration_seconds_count[5m]))
  • Expression C (Math): $A / $B * 100

Result: Percentage of requests completing in under 200ms (your SLI).

For SLO alerting: Alert when SLI drops below 99%:

  • Expression D (Reduce): Input $C, Function: Mean
  • Expression E (Math): $D < 99

Multi-host alerts with reduction

Scenario: Alert when average CPU across all production servers exceeds 80%.

Setup:

  • Query A (Prometheus): 100 - (avg(rate(node_cpu_seconds_total{mode="idle",env="production"}[5m])) * 100) (fleet-wide average, computed in PromQL)
  • Expression B (Reduce): Input $A, Function: Mean (average over the evaluation window)
  • Expression C (Math): $B > 80

Result: A single alert that fires when the fleet average crosses the threshold, not individual host alerts. Averaging across hosts belongs in the PromQL query; Reduce collapses each series over time, not across series.

Alternative - alert on any host: Keep the per-instance breakdown in the query, avg by(instance)(...), so each host remains its own series. The Reduce and Math steps then evaluate per host, and the rule fires one alert instance for each host above 80%.


Calculate compound metrics

Scenario: Calculate Apdex score (Application Performance Index) from response time buckets.

Setup:

  • Query A (Prometheus): sum(rate(http_request_duration_seconds_bucket{le="0.5"}[5m])) (satisfied: <500ms)
  • Query B (Prometheus): sum(rate(http_request_duration_seconds_bucket{le="2.0"}[5m])) (tolerating: <2s)
  • Query C (Prometheus): sum(rate(http_request_duration_seconds_count[5m])) (total)
  • Expression D (Math): ($A + ($B - $A) / 2) / $C

Result: Apdex score from 0 to 1, where 1 is perfect user satisfaction.

Formula explained: Apdex = (Satisfied + Tolerating/2) / Total
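
The bucket arithmetic, sketched in Python:

```python
def apdex(satisfied, tolerating_cumulative, total):
    """Apdex = (Satisfied + Tolerating/2) / Total. The le buckets are
    cumulative, so tolerating-only traffic is tolerating_cumulative - satisfied."""
    return (satisfied + (tolerating_cumulative - satisfied) / 2) / total
```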


Detect sustained conditions

Scenario: Alert only when CPU has been high for at least 5 minutes, not just a brief spike.

Setup:

  • Query A (Prometheus): avg_over_time(node_cpu_usage_percent[5m])
  • Expression B (Reduce): Input $A, Function: Mean
  • Expression C (Math): $B > 80

Result: Alerts only fire when the 5-minute average exceeds the threshold, filtering out brief spikes.

Alternative approach using count:

  • Query A: node_cpu_usage_percent
  • Expression B (Math): $A > 80
  • Expression C (Reduce): Input $B, Function: Sum (counts "1" values where condition is true)
  • Expression D (Math): $C > 5

This alerts when more than 5 data points in the range exceed the threshold.
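
The count-based approach, sketched in Python: the Math step turns each sample into 1 or 0, Reduce/Sum counts the 1s, and the final Math compares the count.

```python
def sustained_breach(values, threshold=80, min_points=5):
    # Count samples above the threshold, then require more than min_points
    breaches = sum(1 for v in values if v > threshold)
    return breaches > min_points
```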


Correlate metrics across systems

Scenario: Calculate orders processed per database query to measure backend efficiency.

Setup:

  • Query A (Prometheus - App metrics): sum(rate(orders_processed_total[5m]))
  • Query B (MySQL data source): Database queries per second from performance schema
  • Expression C (Resample): Input $A, Resample to: 30s, Downsample: Mean
  • Expression D (Resample): Input $B, Resample to: 30s, Downsample: Mean
  • Expression E (Math): $C / $D

Result: Orders per database query, showing how efficiently your backend processes orders.

Higher is better: More orders per database query (equivalently, fewer queries per order) means more efficient database usage.


Ratio-based alerts with baseline

Scenario: Alert when error ratio increases by more than 2x compared to yesterday's baseline.

Setup:

  • Query A (Prometheus): sum(rate(http_errors_total[5m])) (current errors)
  • Query B (Prometheus): sum(rate(http_requests_total[5m])) (current requests)
  • Query C (Prometheus): sum(rate(http_errors_total[5m] offset 24h)) (yesterday's errors)
  • Query D (Prometheus): sum(rate(http_requests_total[5m] offset 24h)) (yesterday's requests)
  • Expression E (Math): $A / $B (current error rate)
  • Expression F (Math): $C / $D (baseline error rate)
  • Expression G (Reduce): Input $E, Function: Mean
  • Expression H (Reduce): Input $F, Function: Mean
  • Expression I (Math): $G / $H > 2

Result: Alerts when today's error rate is more than double yesterday's rate.

Why this matters: Absolute thresholds don't account for normal variation. Ratio-based alerting adapts to your system's baseline behavior.
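
The ratio-of-ratios comparison, sketched in Python:

```python
def error_ratio_doubled(cur_errors, cur_requests, base_errors, base_requests):
    """Mirror of $G / $H > 2: today's error rate vs. yesterday's baseline."""
    current = cur_errors / cur_requests
    baseline = base_errors / base_requests
    return current / baseline > 2
```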


Calculate percentile-based thresholds

Scenario: Alert when response time exceeds the 95th percentile baseline.

Setup:

  • Query A (Prometheus): histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
  • Query B (Prometheus): histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[1h])) by (le))
  • Expression C (Reduce): Input $A, Function: Last (current p95)
  • Expression D (Reduce): Input $B, Function: Mean (baseline p95)
  • Expression E (Math): $C > $D * 1.5

Result: Alerts when current p95 latency exceeds 1.5x the hourly baseline.


Weighted scores across metrics

Scenario: Create a composite health score from multiple metrics (CPU, memory, disk, network).

Setup:

  • Query A: CPU usage percentage (0-100)
  • Query B: Memory usage percentage (0-100)
  • Query C: Disk usage percentage (0-100)
  • Query D: Network saturation percentage (0-100)
  • Expression E (Reduce): Input $A, Function: Mean
  • Expression F (Reduce): Input $B, Function: Mean
  • Expression G (Reduce): Input $C, Function: Mean
  • Expression H (Reduce): Input $D, Function: Mean
  • Expression I (Math): ($E * 0.3) + ($F * 0.25) + ($G * 0.25) + ($H * 0.2)

Result: Weighted health score from 0-100 where lower is healthier. Weights reflect relative importance (CPU 30%, Memory 25%, Disk 25%, Network 20%).

For alerting:

  • Expression J (Math): $I > 70

Alert when composite score indicates degraded health.
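
The weighted sum, sketched in Python:

```python
def health_score(cpu, mem, disk, net):
    # ($E * 0.3) + ($F * 0.25) + ($G * 0.25) + ($H * 0.2)
    return cpu * 0.3 + mem * 0.25 + disk * 0.25 + net * 0.2
```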


Conditional logic with fallbacks

Scenario: Show error rate, but display 0 instead of NaN or infinity when there are no requests.

Setup:

  • Query A (Prometheus): sum(rate(http_errors_total[5m]))
  • Query B (Prometheus): sum(rate(http_requests_total[5m]))
  • Expression C (Math): ($B > 0) * ($A / ($B + ($B == 0)) * 100)

Result: Error rate percentage that safely handles zero-request periods.

Branchless conditionals: Math expressions have no ternary operator, but comparisons evaluate to 1 or 0, so you can select a branch by multiplying by a condition. Here ($B == 0) bumps the denominator to 1 when there are no requests, and the leading ($B > 0) zeroes the result.

More examples:

  • Cap values at 100: ($A > 100) * 100 + ($A <= 100) * $A
  • Convert negative values to zero: ($A > 0) * $A
  • Binary classification: $A > 80 (a comparison already returns 1 or 0)
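
Because comparisons evaluate to 1 or 0, branches can be selected with multiplication. A Python sketch of the cap-at-100 pattern (Python booleans behave as 1 and 0 in arithmetic, matching the expression semantics):

```python
def cap_at_100(value):
    # ($A > 100) * 100 + ($A <= 100) * $A
    return (value > 100) * 100 + (value <= 100) * value
```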

Time-window comparison for trend detection

Scenario: Detect if metrics are trending up or down by comparing recent data to slightly older data.

Setup:

  • Query A (Prometheus): sum(rate(http_requests_total[5m]))
  • Query B (Prometheus): sum(rate(http_requests_total[5m] offset 5m))
  • Expression C (Reduce): Input $A, Function: Mean
  • Expression D (Reduce): Input $B, Function: Mean
  • Expression E (Math): ($C - $D) / $D * 100

Result: Percentage change in requests between the last 5 minutes and the previous 5-minute window.

Interpretation:

  • Positive values: Traffic increasing
  • Negative values: Traffic decreasing
  • Values near 0: Traffic stable

Use case: Detect rapid traffic changes that might indicate problems or attacks.


Tips for expression development

Follow these best practices to build reliable, maintainable expressions in your visualizations and alerts.

Start simple and iterate

Begin with basic operations and verify each step works before adding complexity. Use the Query Inspector to see intermediate results.

Name your queries clearly

While RefIDs default to letters, you can use descriptive names. Referencing ${errors} and ${total_requests} is clearer than $A and $B.

Test with realistic time ranges

Expressions may behave differently with various time ranges. Test with the same ranges you'll use in production dashboards or alerts.

Handle edge cases

Consider what happens when:

  • Data is missing (NoData)
  • Values are zero (division by zero)
  • Metrics haven't been collected yet
  • Time series have different numbers of points

Document complex expressions

Add panel descriptions or annotation text explaining what complex expressions calculate and why.

Monitor expression performance

If dashboards become slow, check if expressions are processing too much data. Consider moving heavy aggregations to recording rules or data source queries.