[release-12.3.0] Restructure As code and developer resources (#113969)

Co-authored-by: Roberto Jiménez Sánchez <roberto.jimenez@grafana.com>
Co-authored-by: Anna Urbiztondo <anna.urbiztondo@grafana.com>
Jack Baldry
2025-11-16 19:06:56 +00:00
committed by GitHub
parent 0898ec6045
commit 047da1442e
96 changed files with 8186 additions and 230 deletions
@@ -0,0 +1,73 @@
---
cards:
items:
- description: Learn how to set up the Terraform provider and configure your environment to manage Knowledge Graph resources.
height: 24
href: ./getting-started/
title: Get started with Terraform
- description: Configure notification alerts to manage how alerts are processed and routed in your Knowledge Graph.
height: 24
href: ./notification-alerts/
title: Notification alerts
- description: Define suppression rules to temporarily disable specific alerts during maintenance windows or testing.
height: 24
href: ./suppressed-assertions/
title: Suppressed assertions
- description: Create custom entity models and define how entities are discovered based on Prometheus queries.
height: 24
href: ./custom-model-rules/
title: Custom model rules
- description: Configure log data correlation with entities using data source mappings and filtering options.
height: 24
href: ./log-configurations/
title: Log configurations
- description: Set custom thresholds for request, resource, and health assertions to monitor your services.
height: 24
href: ./thresholds/
title: Thresholds
- description: Configure knowledge graph SLOs with entity-centric monitoring and RCA workbench integration for root cause analysis.
height: 24
href: ./knowledge-graph-slo/
title: Knowledge graph SLOs
title_class: pt-0 lh-1
description: Manage Grafana Cloud Knowledge Graph using Terraform
hero:
description: Use Terraform to manage Grafana Cloud Knowledge Graph resources as code. Configure notification alerts, suppressed assertions, custom model rules, log configurations, and threshold configurations using infrastructure as code best practices.
level: 1
title: Manage Knowledge Graph using Terraform
menuTitle: Manage Knowledge Graph in Grafana Cloud using Terraform
title: Manage Knowledge Graph in Grafana Cloud using Terraform
weight: 130
keywords:
- Infrastructure as Code
- Quickstart
- Grafana Cloud
- Terraform
- Knowledge Graph
- Alert Configuration
- Suppressed Assertions
- Custom Model Rules
- Log Configuration
- Threshold Configuration
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/
---
{{< docs/hero-simple key="hero" >}}
---
## Overview
Terraform enables you to manage [Grafana Cloud Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) resources using infrastructure as code. With Terraform, you can define, version control, and deploy Knowledge Graph configurations including alert rules, suppression policies, entity models, log correlations, and thresholds.
## Explore
{{< card-grid key="cards" type="simple" >}}
---
## Related resources
- [Grafana Terraform Provider Documentation](https://registry.terraform.io/providers/grafana/grafana/latest/docs)
- [Knowledge Graph Documentation](/docs/grafana-cloud/knowledge-graph/)
- [Terraform Best Practices](https://www.terraform.io/docs/cloud/guides/recommended-practices/index.html)
@@ -0,0 +1,431 @@
---
description: Define custom entity models for Knowledge Graph using Terraform
menuTitle: Custom model rules
title: Create custom model rules using Terraform
weight: 400
keywords:
- Terraform
- Knowledge Graph
- Custom Model Rules
- Entity Models
- Prometheus
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/custom-model-rules/
---
# Create custom model rules using Terraform
Custom model rules in [Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) allow you to define how entities are discovered and modeled based on Prometheus queries. These rules enable you to create custom entity types, define their relationships, and specify how they should be enriched with additional data.
For information about managing entities and relations in the Knowledge Graph UI, refer to [Manage entities and relations](/docs/grafana-cloud/knowledge-graph/configure/manage-entities-relations/).
## Basic custom model rules
Create a file named `custom-model-rules.tf` and add the following:
```terraform
# Basic custom model rule for services
resource "grafana_asserts_custom_model_rules" "basic_service" {
provider = grafana.asserts
name = "basic-service-model"
rules {
entity {
type = "Service"
name = "service"
defined_by {
query = "up{job!=''}"
label_values = {
service = "job"
}
literals = {
_source = "up_query"
}
}
}
}
}
```
## Advanced service model with scope and lookup
Define service entities with environment scoping and relationship mappings:
```terraform
# Advanced service model with environment scoping
resource "grafana_asserts_custom_model_rules" "advanced_service" {
provider = grafana.asserts
name = "advanced-service-model"
rules {
entity {
type = "Service"
name = "workload | service | job"
scope = {
namespace = "namespace"
env = "asserts_env"
site = "asserts_site"
}
lookup = {
workload = "workload | deployment | statefulset | daemonset | replicaset"
service = "service"
job = "job"
proxy_job = "job"
}
defined_by {
query = "up{job!='', asserts_env!=''}"
label_values = {
service = "service"
job = "job"
workload = "workload"
namespace = "namespace"
}
literals = {
_source = "up_with_workload"
}
}
defined_by {
query = "up{job='maintenance'}"
disabled = true
}
}
}
}
```
## Multi-entity model configuration
Define multiple entity types in a single configuration:
```terraform
# Multiple entity types in a single model
resource "grafana_asserts_custom_model_rules" "multi_entity" {
provider = grafana.asserts
name = "kubernetes-entities"
rules {
# Service entity
entity {
type = "Service"
name = "service"
scope = {
namespace = "namespace"
cluster = "cluster"
}
defined_by {
query = "up{service!=''}"
label_values = {
service = "service"
namespace = "namespace"
cluster = "cluster"
}
}
}
# Pod entity
entity {
type = "Pod"
name = "Pod"
scope = {
namespace = "namespace"
cluster = "cluster"
}
lookup = {
service = "service"
workload = "workload"
}
defined_by {
query = "kube_pod_info{pod!=''}"
label_values = {
Pod = "pod"
namespace = "namespace"
cluster = "cluster"
service = "service"
}
literals = {
_entity_type = "Pod"
}
}
}
# Namespace entity
entity {
type = "Namespace"
name = "namespace"
scope = {
cluster = "cluster"
}
defined_by {
query = "kube_namespace_status_phase{namespace!=''}"
label_values = {
namespace = "namespace"
cluster = "cluster"
}
}
}
}
}
```
## Complex entity with enrichment
Create service entities with multiple data sources and enrichment:
```terraform
# Service entity with enrichment from multiple sources
resource "grafana_asserts_custom_model_rules" "enriched_service" {
provider = grafana.asserts
name = "enriched-service-model"
rules {
entity {
type = "Service"
name = "service"
enriched_by = [
"prometheus_metrics",
"kubernetes_metadata",
"application_logs"
]
scope = {
environment = "asserts_env"
region = "asserts_site"
team = "team"
}
lookup = {
deployment = "workload"
Pod = "pod"
container = "container"
}
# Primary definition from service up metrics
defined_by {
query = "up{service!='', asserts_env!=''}"
label_values = {
service = "service"
environment = "asserts_env"
region = "asserts_site"
team = "team"
}
literals = {
_primary_source = "service_up"
}
}
# Secondary definition from application metrics
defined_by {
query = "http_requests_total{service!=''}"
label_values = {
service = "service"
environment = "environment"
version = "version"
}
literals = {
_secondary_source = "http_metrics"
}
}
# Disabled definition for testing
defined_by {
query = "test_metric{service!=''}"
disabled = true
}
}
}
}
```
## Database and infrastructure entities
Define database and infrastructure entity models:
```terraform
# Database and infrastructure entity models
resource "grafana_asserts_custom_model_rules" "infrastructure" {
provider = grafana.asserts
name = "infrastructure-entities"
rules {
# Database entity
entity {
type = "Database"
name = "database_instance"
scope = {
environment = "env"
region = "region"
}
lookup = {
host = "instance"
port = "port"
db_name = "database"
}
defined_by {
query = "mysql_up{instance!=''}"
label_values = {
database_instance = "instance"
database = "database"
env = "environment"
region = "region"
}
literals = {
_db_type = "mysql"
}
metric_value = "1"
}
defined_by {
query = "postgres_up{instance!=''}"
label_values = {
database_instance = "instance"
database = "datname"
env = "environment"
}
literals = {
_db_type = "postgresql"
}
}
}
# Load balancer entity
entity {
type = "LoadBalancer"
name = "lb_instance"
scope = {
environment = "env"
}
defined_by {
query = "haproxy_up{proxy!=''}"
label_values = {
lb_instance = "instance"
proxy = "proxy"
env = "environment"
}
literals = {
_lb_type = "haproxy"
}
}
}
}
}
```
## Resource reference
### `grafana_asserts_custom_model_rules`
Manage Knowledge Graph custom model rules through the Grafana API. This resource allows you to define custom entity models based on Prometheus queries with advanced mapping and enrichment capabilities.
#### Arguments
| Name | Type | Required | Description |
| ------- | -------------- | -------- | -------------------------------------------------------------------------------------------------------- |
| `name` | `string` | Yes | The name of the custom model rules. This field is immutable and forces recreation if changed. |
| `rules` | `list(object)` | Yes | The rules configuration containing entity definitions. Refer to [rules block](#rules-block) for details. |
#### Rules block
Each `rules` block supports the following:
| Name | Type | Required | Description |
| -------- | -------------- | -------- | ------------------------------------------------------------------------------- |
| `entity` | `list(object)` | Yes | List of entity definitions. Refer to [entity block](#entity-block) for details. |
#### Entity block
Each `entity` block supports the following:
| Name | Type | Required | Description |
| ------------- | -------------- | -------- | ------------------------------------------------------------------------------------------------------ |
| `type` | `string` | Yes | The type of the entity (for example, Service, Pod, Namespace). |
| `name`        | `string`       | Yes      | The name pattern for the entity. It can include pipe-separated alternatives, for example `workload \| service \| job`. |
| `defined_by` | `list(object)` | Yes | List of queries that define this entity. Refer to [`defined_by` block](#defined_by-block) for details. |
| `disabled` | `bool` | No | Whether this entity is disabled. Defaults to `false`. |
| `enriched_by` | `list(string)` | No | List of enrichment sources for the entity. |
| `lookup` | `map(string)` | No | Lookup mappings for the entity to relate different label names. |
| `scope` | `map(string)` | No | Scope labels that define the boundaries of this entity type. |
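As an illustration of the optional entity arguments, the following sketch keeps an entity definition in code but turns it off with the entity-level `disabled` argument. The resource and rule names are hypothetical, and the query reuses the Namespace example from earlier on this page. It assumes that disabling an entity stops discovery from all of its `defined_by` queries.

```terraform
# Hypothetical example: keep the Namespace entity in code but switch it off
resource "grafana_asserts_custom_model_rules" "staged_entities" {
  provider = grafana.asserts
  name     = "staged-entities"

  rules {
    entity {
      type     = "Namespace"
      name     = "namespace"
      disabled = true # assumption: the entity and its queries are not used for discovery

      defined_by {
        query = "kube_namespace_status_phase{namespace!=''}"
        label_values = {
          namespace = "namespace"
        }
      }
    }
  }
}
```

Setting `disabled` back to `false`, or removing the argument, re-enables the entity on the next `terraform apply`.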
#### `defined_by` block
Each `defined_by` block supports the following:
| Name | Type | Required | Description |
| -------------- | ------------- | -------- | ------------------------------------------------------------------------- |
| `query` | `string` | Yes | The Prometheus query that defines this entity. |
| `disabled` | `bool` | No | Whether this query is disabled. Defaults to `false`. |
| `label_values` | `map(string)` | No | Label value mappings for extracting entity attributes from query results. |
| `literals` | `map(string)` | No | Literal value mappings for adding static attributes to entities. |
| `metric_value` | `string` | No | Metric value to use from the query result. |
{{< admonition type="note" >}}
When `disabled = true` is set for a `defined_by` query, only the `query` field is used for matching. All other fields in the block are ignored.
{{< /admonition >}}
## Best practices
### Entity models
- Design your entity models to reflect your actual infrastructure and application architecture
- Use descriptive names for custom model rules that indicate their purpose and scope
- Start with basic entity definitions and gradually add complexity as needed
- Define clear entity scopes using the `scope` parameter to organize entities by environment, region, or team
### Query design and performance
- Write efficient Prometheus queries that don't overload your monitoring system
- Test your Prometheus queries independently before using them in model rules
- Use specific label filters to reduce the scope of your queries where possible
- Consider the cardinality implications of your entity definitions
- Use the `disabled` flag to temporarily disable problematic queries during debugging
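One way to apply the last point without editing the rule each time is to drive `disabled` from a variable. The following is a minimal sketch; the variable and resource names are hypothetical, and the queries come from the examples earlier on this page.

```terraform
# Hypothetical switch for a query that is under investigation
variable "disable_http_metrics_query" {
  description = "Set to true to disable the http_requests_total definition"
  type        = bool
  default     = false
}

resource "grafana_asserts_custom_model_rules" "debuggable_service" {
  provider = grafana.asserts
  name     = "debuggable-service-model"

  rules {
    entity {
      type = "Service"
      name = "service"

      # Stable definition that stays enabled
      defined_by {
        query = "up{service!=''}"
        label_values = {
          service = "service"
        }
      }

      # Definition under investigation, toggled from the command line
      defined_by {
        query    = "http_requests_total{service!=''}"
        disabled = var.disable_http_metrics_query
        label_values = {
          service = "service"
        }
      }
    }
  }
}
```

To turn the second query off while debugging, run `terraform apply -var="disable_http_metrics_query=true"`. While it's disabled, only its `query` field is used for matching and its `label_values` are ignored.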
### Relationships and enrichment
- Use `lookup` mappings to establish relationships between different entity types
- Leverage `enriched_by` to specify additional data sources for entity enrichment
- Map Prometheus labels to entity attributes using clear and descriptive names
- Use meaningful `literals` to add static metadata that helps with entity identification
### Label and attribute management
- Establish consistent labeling conventions across your infrastructure
- Use `label_values` to extract dynamic attributes from your metrics
- Document the meaning and expected values of custom literals
- Ensure label names match across different entity definitions for proper relationship discovery
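A Terraform `locals` block is one way to keep label names identical across entities. The sketch below defines the Kubernetes scope labels once and reuses them in the Service and Namespace entities from the multi-entity example. The local and resource names are hypothetical.

```terraform
# Hypothetical shared label convention, defined once and reused
locals {
  k8s_scope = {
    namespace = "namespace"
    cluster   = "cluster"
  }
}

resource "grafana_asserts_custom_model_rules" "shared_scope" {
  provider = grafana.asserts
  name     = "shared-scope-entities"

  rules {
    entity {
      type  = "Service"
      name  = "service"
      scope = local.k8s_scope

      defined_by {
        query = "up{service!=''}"
        # merge() adds the entity-specific label to the shared mappings
        label_values = merge(local.k8s_scope, { service = "service" })
      }
    }

    entity {
      type  = "Namespace"
      name  = "namespace"
      scope = { cluster = local.k8s_scope.cluster }

      defined_by {
        query        = "kube_namespace_status_phase{namespace!=''}"
        label_values = local.k8s_scope
      }
    }
  }
}
```

If a label is renamed later, only `local.k8s_scope` changes, so the entities can't drift apart.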
## Validation
After applying the Terraform configuration, verify that:
- Custom model rules are applied in your Knowledge Graph instance
- Entities are being discovered according to your defined queries
- Entity relationships and enrichment are working as expected
- Entity graphs display the correct entity types and connections
- Queries perform well without causing excessive load
## Related documentation
- [Manage entities and relations in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/configure/manage-entities-relations/)
- [Get started with Terraform for Knowledge Graph](../getting-started/)
- [Knowledge graph basics](/docs/grafana-cloud/knowledge-graph/knowledge-graph-basics/)
@@ -0,0 +1,140 @@
---
description: Learn how to configure Terraform to manage Knowledge Graph resources
menuTitle: Get started
title: Get started with Terraform for Knowledge Graph
weight: 100
keywords:
- Terraform
- Knowledge Graph
- Provider Setup
- Getting Started
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/getting-started/
---
# Get started with Terraform for Knowledge Graph
Learn how to configure Terraform to manage [Grafana Cloud Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) resources. This guide walks you through setting up the Grafana Terraform provider and preparing your environment.
## Before you begin
Before you begin, ensure you have the following:
- A Grafana Cloud account, as shown in [Get started](/docs/grafana-cloud/get-started/)
- [Terraform](https://www.terraform.io/downloads) installed on your machine
- Administrator permissions in your Grafana instance
- [Knowledge Graph enabled](/docs/grafana-cloud/knowledge-graph/get-started/) in your Grafana Cloud stack
{{< admonition type="note" >}}
Save all Terraform configuration files in the same directory.
{{< /admonition >}}
## Configure the Grafana provider
This Terraform configuration sets up the [Grafana provider](https://registry.terraform.io/providers/grafana/grafana/latest/docs) with the authentication needed to manage Knowledge Graph resources.
You can reuse a similar setup to the one described in [Creating and managing a Grafana Cloud stack using Terraform](/docs/grafana-cloud/as-code/infrastructure-as-code/terraform/terraform-cloud-stack/) to set up a service account and a token.
### Steps
1. Create a service account and token in Grafana.
For instructions, refer to [Service account tokens](/docs/grafana/latest/administration/service-accounts/#service-account-tokens).
1. Create a file named `main.tf` and add the following:
```terraform
terraform {
required_providers {
grafana = {
source = "grafana/grafana"
version = ">= 2.9.0"
}
}
}
provider "grafana" {
# Knowledge Graph resources select this configuration with provider = grafana.asserts
alias = "asserts"
url = "<Stack-URL>"
auth = "<Service-account-token>"
stack_id = "<Stack-ID>"
}
```
1. Replace the following field values:
- `<Stack-URL>` with the URL of your Grafana stack (for example, `https://my-stack.grafana.net/`)
- `<Service-account-token>` with the service account token that you created
- `<Stack-ID>` with your Grafana Cloud stack ID
{{< admonition type="note" >}}
The `stack_id` parameter is required for Knowledge Graph resources to identify the stack where the resources belong.
{{< /admonition >}}
## Apply Terraform configurations
After creating your Terraform configuration files, apply them using the following commands:
1. Initialize a working directory containing Terraform configuration files:
```shell
terraform init
```
1. Preview the changes that Terraform will make:
```shell
terraform plan
```
1. Apply the configuration files:
```shell
terraform apply
```
## Verify your setup
After applying the configuration, verify your setup by checking that:
- Terraform can authenticate with your Grafana Cloud stack
- The provider is properly configured with the correct stack ID
- No errors appear in the Terraform output
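To exercise the credentials before you create any Knowledge Graph resources, you can add a temporary read-only lookup. The following sketch is an assumption-based check: it uses the provider's `grafana_folders` data source, assumes the service account can read folders, and only confirms that the Grafana API accepts the token, not that Knowledge Graph is enabled. The data source and output names are hypothetical.

```terraform
# Temporary read-only lookup that confirms the aliased provider can authenticate
data "grafana_folders" "connection_check" {
  provider = grafana.asserts
}

output "visible_folder_count" {
  description = "Number of folders the service account can see"
  value       = length(data.grafana_folders.connection_check.folders)
}
```

Run `terraform plan`. An invalid URL or token fails while Terraform reads the data source. Remove the block once the check passes.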
## Best practices
When managing Knowledge Graph resources with Terraform, consider the following best practices:
### Name conventions
- Use descriptive names that clearly indicate the purpose of each resource
- Follow a consistent naming pattern across your organization
- Include environment or team identifiers in names when appropriate
### Version control
- Store your Terraform configurations in version control (Git)
- Use separate directories or workspaces for different environments
- Document changes in commit messages
### State management
- Use remote state backends for team collaboration
- Enable state locking to prevent concurrent modifications
- Regularly back up your Terraform state files
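As one example of a remote backend with locking, the following sketch stores state in Amazon S3. S3 is only one of several supported backends, and the bucket, key, and region values are hypothetical. `use_lockfile` enables S3-native state locking and assumes Terraform 1.10 or later.

```terraform
terraform {
  # Hypothetical bucket, key, and region; replace with your own values
  backend "s3" {
    bucket       = "example-terraform-state"
    key          = "knowledge-graph/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true # server-side encryption of the state object
    use_lockfile = true # S3-native state locking (Terraform 1.10 or later)
  }
}
```

A configuration can declare only one backend, so you can place the `backend` block inside the existing `terraform` block in `main.tf`. Then run `terraform init -migrate-state` to move existing local state. Enabling versioning on the bucket also keeps earlier copies of the state file as backups.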
### Security
- Never commit service account tokens or sensitive data to version control
- Use environment variables or secret management tools for credentials
- Rotate service account tokens regularly
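To keep the token out of `main.tf`, you can replace the literal values in the provider block with input variables. This is a minimal sketch, and the variable names are hypothetical.

```terraform
# Hypothetical variable names; values come from the environment, not from code
variable "grafana_url" {
  description = "URL of the Grafana Cloud stack"
  type        = string
}

variable "grafana_auth" {
  description = "Service account token"
  type        = string
  sensitive   = true # redacted from plan and apply output
}

variable "grafana_stack_id" {
  description = "Grafana Cloud stack ID"
  type        = number
}

provider "grafana" {
  alias    = "asserts"
  url      = var.grafana_url
  auth     = var.grafana_auth
  stack_id = var.grafana_stack_id
}
```

This block replaces the provider block in `main.tf`, because a configuration can't declare two provider blocks with the same alias. Terraform reads variables from environment variables that use the `TF_VAR_` prefix, for example `export TF_VAR_grafana_auth="<Service-account-token>"`, so the token never appears in a committed file.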
## Next steps
Now that you have configured the Terraform provider, you can start managing knowledge graph resources:
- [Configure notification alerts](../notification-alerts/)
- [Define suppressed assertions](../suppressed-assertions/)
- [Create custom model rules](../custom-model-rules/)
- [Set up log configurations](../log-configurations/)
- [Configure thresholds](../thresholds/)
- [Configure knowledge graph SLOs](../knowledge-graph-slo/)
@@ -0,0 +1,696 @@
---
description: Learn how to configure knowledge graph SLOs in Grafana using Terraform for entity-centric monitoring and root cause analysis
menuTitle: Knowledge graph SLOs
title: Configure knowledge graph SLOs using Terraform
weight: 650
keywords:
- Terraform
- Knowledge graph
- SLO
- Service Level Objectives
- RCA workbench
---
# Configure knowledge graph SLOs using Terraform
Service level objectives (SLOs) in the [knowledge graph](/docs/grafana-cloud/knowledge-graph/) provide entity-centric service level monitoring with integrated root cause analysis capabilities. By using the `grafana_slo_provenance` label with the value `asserts`, you can create SLOs that display the "asserts" badge in the UI and enable the **Open RCA workbench** button for seamless troubleshooting.
For details about creating and managing SLOs in the knowledge graph UI, refer to [Create and manage the knowledge graph SLOs](/docs/grafana-cloud/knowledge-graph/configure/manage-slos/).
## Overview
Knowledge graph SLOs extend standard Grafana SLOs with entity-centric monitoring and root cause analysis features:
- **Entity-centric monitoring:** SLOs are tied to specific services, applications, or infrastructure entities tracked by the knowledge graph
- **RCA workbench integration:** The **Open RCA workbench** button enables deep-linking to pre-filtered troubleshooting views
- **Knowledge graph provenance badge:** SLOs display an "asserts" badge instead of "provisioned" in the UI
- **Search expressions:** Define custom search expressions to filter entities in RCA workbench when troubleshooting an SLO breach
## Before you begin
To create a knowledge graph SLO using Terraform, you need to:
- Configure the knowledge graph and have metrics flowing into Grafana Cloud
- [Set up Terraform for the knowledge graph](../getting-started/)
- Be familiar with defining SLOs, SLIs, SLAs, and error budgets
- Understand PromQL
## Create a basic knowledge graph SLO
Create a file named `kg-slo.tf` and add the following:
```terraform
# Basic knowledge graph SLO with entity-centric monitoring
resource "grafana_slo" "kg_example" {
provider = grafana.asserts
name = "API Service Availability"
description = "SLO managed by knowledge graph for entity-centric monitoring and RCA"
query {
freeform {
query = "sum(rate(http_requests_total{code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total[$__rate_interval]))"
}
type = "freeform"
}
objectives {
value = 0.995
window = "30d"
}
destination_datasource {
uid = "grafanacloud-prom"
}
# Knowledge graph integration labels
# The grafana_slo_provenance label triggers knowledge graph-specific behavior:
# - Displays "asserts" badge instead of "provisioned"
# - Shows "Open RCA workbench" button in the SLO UI
# - Enables correlation with knowledge graph entity-centric monitoring
label {
key = "grafana_slo_provenance"
value = "asserts"
}
label {
key = "service_name"
value = "api-service"
}
# Search expression for RCA workbench
# This enables the "Open RCA workbench" button to deep-link with pre-filtered context
search_expression = "service=api-service"
alerting {
fastburn {
annotation {
key = "name"
value = "SLO Burn Rate Very High"
}
annotation {
key = "description"
value = "Error budget is burning too fast"
}
}
slowburn {
annotation {
key = "name"
value = "SLO Burn Rate High"
}
annotation {
key = "description"
value = "Error budget is burning at an elevated rate"
}
}
}
}
```
## Configure an SLO with multiple entity labels
Configure SLOs with multiple entity labels for fine-grained filtering in RCA workbench:
```terraform
# Knowledge graph SLO with comprehensive entity labels
resource "grafana_slo" "payment_service" {
provider = grafana.asserts
name = "Payment Service Latency SLO"
description = "Latency SLO for payment processing with team and environment context"
query {
freeform {
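# Ratio of requests that complete within 500 ms (assumes the histogram has an le="0.5" bucket)
query = "sum(rate(http_request_duration_seconds_bucket{service=\"payment\",le=\"0.5\"}[$__rate_interval])) / sum(rate(http_request_duration_seconds_count{service=\"payment\"}[$__rate_interval]))"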
}
type = "freeform"
}
objectives {
value = 0.99
window = "7d"
}
destination_datasource {
uid = "grafanacloud-prom"
}
# Knowledge graph provenance - required for RCA workbench integration
label {
key = "grafana_slo_provenance"
value = "asserts"
}
# Service identification
label {
key = "service_name"
value = "payment-service"
}
# Team ownership
label {
key = "team_name"
value = "payments-team"
}
# Environment
label {
key = "environment"
value = "production"
}
# Business unit
label {
key = "business_unit"
value = "fintech"
}
# Search expression with multiple filters
search_expression = "service=payment-service AND environment=production"
alerting {
fastburn {
annotation {
key = "name"
value = "Payment Latency Critical"
}
annotation {
key = "description"
value = "Payment service P99 latency exceeding SLO - immediate attention required"
}
annotation {
key = "runbook_url"
value = "https://docs.example.com/runbooks/payment-latency"
}
}
slowburn {
annotation {
key = "name"
value = "Payment Latency Warning"
}
annotation {
key = "description"
value = "Payment service experiencing elevated latency"
}
}
}
}
```
## Configure a Kubernetes service SLO
Configure knowledge graph SLOs for Kubernetes services with Pod and namespace context:
```terraform
# Knowledge graph SLO for Kubernetes service
resource "grafana_slo" "k8s_frontend" {
provider = grafana.asserts
name = "Frontend Service Availability"
description = "Availability SLO for frontend service in Kubernetes"
query {
freeform {
query = "sum(rate(http_requests_total{namespace=\"frontend\",code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total{namespace=\"frontend\"}[$__rate_interval]))"
}
type = "freeform"
}
objectives {
value = 0.999
window = "30d"
}
destination_datasource {
uid = "grafanacloud-prom"
}
label {
key = "grafana_slo_provenance"
value = "asserts"
}
label {
key = "service_name"
value = "frontend"
}
label {
key = "namespace"
value = "frontend"
}
label {
key = "cluster"
value = "prod-us-west-2"
}
# Search expression targeting Kubernetes entities
search_expression = "namespace=frontend AND cluster=prod-us-west-2"
alerting {
fastburn {
annotation {
key = "name"
value = "Frontend Service Critical"
}
annotation {
key = "description"
value = "Frontend service availability below SLO"
}
annotation {
key = "severity"
value = "critical"
}
}
slowburn {
annotation {
key = "name"
value = "Frontend Service Degraded"
}
annotation {
key = "description"
value = "Frontend service showing signs of degradation"
}
annotation {
key = "severity"
value = "warning"
}
}
}
}
```
## Configure an API endpoint-specific SLO
Configure knowledge graph SLOs for specific API endpoints with request context:
```terraform
# Knowledge graph SLO for critical API endpoint
resource "grafana_slo" "checkout_api" {
provider = grafana.asserts
name = "Checkout API Availability"
description = "Availability SLO for /api/checkout endpoint"
query {
freeform {
query = "sum(rate(http_requests_total{path=\"/api/checkout\",code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total{path=\"/api/checkout\"}[$__rate_interval]))"
}
type = "freeform"
}
objectives {
value = 0.9999
window = "30d"
}
destination_datasource {
uid = "grafanacloud-prom"
}
label {
key = "grafana_slo_provenance"
value = "asserts"
}
label {
key = "service_name"
value = "checkout-service"
}
label {
key = "endpoint"
value = "/api/checkout"
}
label {
key = "criticality"
value = "high"
}
# Search expression with endpoint context
search_expression = "service=checkout-service AND path=/api/checkout"
alerting {
fastburn {
annotation {
key = "name"
value = "Checkout API Critical Failure"
}
annotation {
key = "description"
value = "Checkout API experiencing high error rates - revenue impact"
}
annotation {
key = "severity"
value = "critical"
}
annotation {
key = "alert_priority"
value = "P0"
}
}
slowburn {
annotation {
key = "name"
value = "Checkout API Degradation"
}
annotation {
key = "description"
value = "Checkout API showing elevated error rates"
}
annotation {
key = "severity"
value = "warning"
}
}
}
}
```
## Configure a multi-environment SLO
Manage knowledge graph SLOs across multiple environments using Terraform workspaces or modules:
```terraform
# Variable for environment-specific configuration
variable "environment" {
description = "Environment name"
type = string
}
variable "slo_target" {
description = "SLO target as a ratio (for example, 0.995 for 99.5%)"
type = number
}
# Environment-aware knowledge graph SLO
resource "grafana_slo" "api_service" {
provider = grafana.asserts
name = "${var.environment} - API Service Availability"
description = "API service availability SLO for ${var.environment} environment"
query {
freeform {
query = "sum(rate(http_requests_total{environment=\"${var.environment}\",code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total{environment=\"${var.environment}\"}[$__rate_interval]))"
}
type = "freeform"
}
objectives {
value = var.slo_target
window = "30d"
}
destination_datasource {
uid = "grafanacloud-prom"
}
label {
key = "grafana_slo_provenance"
value = "asserts"
}
label {
key = "service_name"
value = "api-service"
}
label {
key = "environment"
value = var.environment
}
search_expression = "service=api-service AND environment=${var.environment}"
alerting {
fastburn {
annotation {
key = "name"
value = "${var.environment} API Critical"
}
annotation {
key = "description"
value = "API service in ${var.environment} experiencing critical errors"
}
}
slowburn {
annotation {
key = "name"
value = "${var.environment} API Warning"
}
annotation {
key = "description"
value = "API service in ${var.environment} showing elevated errors"
}
}
}
}
```
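The variables above are typically supplied per environment from a `.tfvars` file. The following is a minimal sketch; the file name and target value are hypothetical.

```terraform
# Contents of a hypothetical production.tfvars file
environment = "production"
slo_target  = 0.999 # 99.9% expressed as a ratio, like objectives.value
```

Create a matching `staging.tfvars` with its own values, then apply each environment in its own workspace, for example `terraform workspace new production` followed by `terraform apply -var-file=production.tfvars`. Separate workspaces matter here because the resource address `grafana_slo.api_service` is the same for every environment. Applying a second `.tfvars` file against the same state would update the existing SLO instead of creating a new one.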
## Resource reference
### `grafana_slo` with knowledge graph provenance
When creating knowledge graph-managed SLOs, the `grafana_slo` resource requires the `grafana_slo_provenance` label set to `asserts` to enable RCA workbench integration.
#### Required knowledge graph configuration
| Name | Type | Required | Description |
| ------------------------------ | -------- | ----------- | -------------------------------------------------------------------------------------------------- |
| `grafana_slo_provenance` label | `string` | Yes | Must be set to `asserts` to enable knowledge graph-specific features and RCA workbench integration |
| `search_expression` | `string` | Recommended | Search expression for filtering entities in RCA workbench |
#### Key arguments for knowledge graph SLOs
| Name | Type | Required | Description |
| ------------------------ | -------------- | -------- | ----------------------------------------------------------------- |
| `name` | `string` | Yes | The name of the SLO |
| `description` | `string` | No | Description of the SLO purpose and scope |
| `query` | `object` | Yes | Query configuration defining how SLO is calculated |
| `objectives` | `object` | Yes | Target objectives including value and time window |
| `destination_datasource` | `object` | Yes | Destination data source for SLO metrics |
| `label` | `list(object)` | Yes | Labels for the SLO, must include `grafana_slo_provenance=asserts` |
| `search_expression` | `string` | No | Search expression for RCA workbench filtering |
| `alerting` | `object` | No | Alerting configuration for fast burn and slow burn alerts |
#### Query block
The `query` block supports the following:
| Name | Type | Required | Description |
| ---------- | -------- | -------- | --------------------------------------------------------- |
| `type` | `string` | Yes | Query type, typically `freeform` for knowledge graph SLOs |
| `freeform` | `object` | Yes | Freeform query configuration |
The `freeform` block supports:
| Name | Type | Required | Description |
| ------- | -------- | -------- | -------------------------------- |
| `query` | `string` | Yes | PromQL query for SLO calculation |
#### Objectives block
The `objectives` block supports the following:
| Name | Type | Required | Description |
| -------- | -------- | -------- | --------------------------------------------------- |
| `value` | `number` | Yes | Target SLO value (for example, 0.995 for 99.5%) |
| `window` | `string` | Yes | Time window for SLO evaluation (for example, "30d") |
#### Label block
Each `label` block supports the following:
| Name | Type | Required | Description |
| ------- | -------- | -------- | ----------- |
| `key` | `string` | Yes | Label key |
| `value` | `string` | Yes | Label value |
**Required label for knowledge graph SLOs:**
- `grafana_slo_provenance` = `asserts` (enables knowledge graph features)
**Recommended labels for entity tracking:**
- `service_name` - Name of the service
- `team_name` - Team responsible for the service
- `environment` - Environment (for example, `production`, `staging`, or `development`)
- `namespace` - Kubernetes namespace
- `cluster` - Kubernetes cluster name
<!-- vale Grafana.Gerunds = NO -->
#### Alerting block
The `alerting` block supports the following:
| Name | Type | Required | Description |
| ---------- | -------- | -------- | ---------------------------------- |
| `fastburn` | `object` | No | Fast burn rate alert configuration |
| `slowburn` | `object` | No | Slow burn rate alert configuration |
Each alert block (`fastburn`, `slowburn`) supports:
| Name | Type | Required | Description |
| ------------ | -------------- | -------- | ------------------------------- |
| `annotation` | `list(object)` | No | Annotations to add to the alert |
Each `annotation` block supports:
| Name | Type | Required | Description |
| ------- | -------- | -------- | ---------------- |
| `key` | `string` | Yes | Annotation key |
| `value` | `string` | Yes | Annotation value |
Common annotation keys:
- `name` - Alert name
- `description` - Alert description
- `severity` - Alert severity level
- `runbook_url` - Link to runbook documentation
<!-- vale Grafana.Gerunds = YES -->
#### Example
```terraform
resource "grafana_slo" "kg_example" {
name = "My Service SLO"
description = "SLO with knowledge graph RCA integration"
query {
freeform {
query = "sum(rate(http_requests_total{code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total[$__rate_interval]))"
}
type = "freeform"
}
objectives {
value = 0.995
window = "30d"
}
destination_datasource {
uid = "grafanacloud-prom"
}
label {
key = "grafana_slo_provenance"
value = "asserts"
}
label {
key = "service_name"
value = "my-service"
}
search_expression = "service=my-service"
alerting {
fastburn {
annotation {
key = "name"
value = "SLO Fast Burn"
}
}
slowburn {
annotation {
key = "name"
value = "SLO Slow Burn"
}
}
}
}
```
## Best practices
Follow these best practices when you configure knowledge graph SLOs.
### Use the knowledge graph provenance label
- Always include the `grafana_slo_provenance` label with value `asserts` for knowledge graph-managed SLOs
- This label enables the "asserts" badge in the UI instead of "provisioned"
- It also enables the **Open RCA workbench** button for troubleshooting SLO breaches
### Define search expressions
- Define meaningful search expressions. The search expression determines which entities populate RCA workbench when you troubleshoot an SLO breach
- Use entity attributes like service name, environment, namespace, and cluster
- Combine multiple filters with `AND` operators for precise filtering
- Test search expressions in RCA workbench before codifying them in Terraform
### Add entity labels
- Add descriptive labels to track service ownership, environment, and criticality
- Use consistent label naming conventions across all SLOs
- Include team names to enable quick identification of ownership
- Tag critical business services with appropriate labels
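To keep label names consistent across SLOs, you can define the labels once in a `locals` map and generate the `label` blocks with a Terraform `dynamic` block. The following sketch rewrites the labels from the example above this way. The `team_name`, `environment`, `namespace`, and `cluster` values are hypothetical placeholders.

```terraform
locals {
  # Hypothetical label values for one service.
  # grafana_slo_provenance = "asserts" is required for knowledge graph SLOs.
  kg_slo_labels = {
    grafana_slo_provenance = "asserts"
    service_name           = "my-service"
    team_name              = "checkout-team"
    environment            = "prod"
    namespace              = "checkout"
    cluster                = "prod-cluster-1"
  }
}

resource "grafana_slo" "kg_labeled" {
  name        = "My Service SLO"
  description = "SLO with knowledge graph RCA integration"

  query {
    freeform {
      query = "sum(rate(http_requests_total{code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total[$__rate_interval]))"
    }
    type = "freeform"
  }

  objectives {
    value  = 0.995
    window = "30d"
  }

  destination_datasource {
    uid = "grafanacloud-prom"
  }

  # Generate one label block for each entry in local.kg_slo_labels.
  dynamic "label" {
    for_each = local.kg_slo_labels
    content {
      key   = label.key
      value = label.value
    }
  }

  search_expression = "service=my-service"
}
```

If several SLOs share labels such as `team_name`, you could keep the shared entries in one map and combine it with per-service entries using Terraform's `merge()` function.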
### Set SLO targets
- Set realistic SLO targets based on service requirements and capabilities
- Use higher targets (0.999+) for critical user-facing services
- Consider different targets for different environments (for example, production and staging)
- Review and adjust targets based on actual service performance
### Add alert annotations
- Add comprehensive descriptions to help on-call engineers understand the alert
- Include runbook URLs in annotations for quick access to troubleshooting guides
- Set appropriate severity levels (critical, warning) based on business impact
- Customize alert names to clearly identify the affected service and issue
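The following `alerting` block is a sketch of how the annotation keys listed in the [Alerting block](#alerting-block) reference can carry this information. It's a fragment to place inside a `grafana_slo` resource such as the example above. The description text, severity values, and runbook URL are placeholders.

```terraform
alerting {
  fastburn {
    annotation {
      key   = "name"
      value = "My Service SLO fast burn"
    }
    annotation {
      key   = "description"
      value = "my-service is consuming its error budget much faster than expected."
    }
    annotation {
      key   = "severity"
      value = "critical"
    }
    annotation {
      key   = "runbook_url"
      value = "https://example.com/runbooks/my-service-slo"
    }
  }

  slowburn {
    annotation {
      key   = "name"
      value = "My Service SLO slow burn"
    }
    annotation {
      key   = "severity"
      value = "warning"
    }
    annotation {
      key   = "runbook_url"
      value = "https://example.com/runbooks/my-service-slo"
    }
  }
}
```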
### Configure queries
- Use PromQL queries that accurately represent service health
- Exclude expected error codes, such as 404, from error calculations when appropriate
- Use `$__rate_interval` as the rate interval so queries adapt to the selected time range
- Test queries in Grafana before adding them to Terraform configurations
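For example, if your service returns 404 for expected cases, you could drop 404 responses from both sides of the ratio. This `query` block is a sketch that assumes the `http_requests_total` metric and `code` label from the example above, and treats every other 4xx or 5xx response as an error:

```terraform
query {
  freeform {
    # Numerator: requests whose status code is not 4xx or 5xx.
    # Denominator: all requests except 404, so 404 responses count neither for nor against the SLO.
    query = "sum(rate(http_requests_total{code!~\"4..|5..\"}[$__rate_interval])) / sum(rate(http_requests_total{code!=\"404\"}[$__rate_interval]))"
  }
  type = "freeform"
}
```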
### Set compliance windows
- Use 30-day windows for production SLOs to align with monthly reporting
- Consider shorter windows (7d) for development or testing environments
- Ensure compliance windows align with business requirements and error budget policies
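One way to apply different targets and compliance windows per environment is to look them up from a map keyed by an input variable. In this sketch, the `environment` variable and all target values are hypothetical. Choose values that match your own requirements and error budget policies.

```terraform
variable "environment" {
  description = "Environment this SLO applies to (hypothetical variable)."
  type        = string
  default     = "production"
}

locals {
  # Hypothetical objectives: a stricter target and a 30-day window for
  # production, shorter windows for non-production environments.
  slo_objectives = {
    production  = { value = 0.999, window = "30d" }
    staging     = { value = 0.99, window = "7d" }
    development = { value = 0.95, window = "7d" }
  }
  objective = local.slo_objectives[var.environment]
}

resource "grafana_slo" "kg_per_environment" {
  name        = "My Service SLO (${var.environment})"
  description = "SLO with per-environment objectives"

  query {
    freeform {
      # Add an environment selector here if your metrics carry one.
      query = "sum(rate(http_requests_total{code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total[$__rate_interval]))"
    }
    type = "freeform"
  }

  objectives {
    value  = local.objective.value
    window = local.objective.window
  }

  destination_datasource {
    uid = "grafanacloud-prom"
  }

  label {
    key   = "grafana_slo_provenance"
    value = "asserts"
  }

  label {
    key   = "environment"
    value = var.environment
  }

  search_expression = "service=my-service"
}
```

For example, `terraform apply -var="environment=staging"` applies the staging objectives.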
## Verify the configuration
After applying the Terraform configuration, verify that:
- SLOs are created in your Grafana Cloud stack
- SLOs appear in **Observability > SLO** with the "asserts" badge
- The **Open RCA workbench** button is visible when you expand **Objective** for an SLO
- You can select a time range in the **Error Budget Burndown** panel and click **Open in RCA workbench**
- Search expressions correctly filter entities in RCA workbench
- Fast burn and slow burn alerts are configured with appropriate thresholds
- Labels are correctly applied and visible in the SLO details
## Troubleshooting
Follow these troubleshooting steps if you have issues configuring knowledge graph SLOs.
### SLO shows "provisioned" instead of "asserts" badge
Ensure the `grafana_slo_provenance` label is set to `asserts`:
```terraform
label {
key = "grafana_slo_provenance"
value = "asserts"
}
```
### Open RCA workbench button not appearing
- Verify the `search_expression` field is populated
- The **Open RCA workbench** button appears after you have added a search expression in the **RCA workbench Context** section
- Ensure the search expression uses valid entity attributes
- Check that the knowledge graph is properly configured and receiving data
### Alerts not triggering
- Verify the PromQL query returns valid results in Grafana
- Check that the destination data source is correctly configured
- Ensure alerting blocks are properly defined with annotations
## Related documentation
- [Create and manage knowledge graph SLOs](/docs/grafana-cloud/knowledge-graph/configure/manage-slos/)
- [Troubleshoot an SLO breach with the knowledge graph](/docs/grafana-cloud/knowledge-graph/troubleshoot-infra-apps/slos/)
- [Get started with Terraform for the knowledge graph](../getting-started/)
- [Introduction to Grafana SLO](/docs/grafana-cloud/alerting-and-irm/slo/introduction/)
- [Configure notifications in the knowledge graph](/docs/grafana-cloud/knowledge-graph/configure/notifications/)
@@ -0,0 +1,290 @@
---
description: Configure log correlation for Knowledge Graph using Terraform
menuTitle: Log configurations
title: Configure log correlation using Terraform
weight: 500
keywords:
- Terraform
- Knowledge Graph
- Log Configuration
- Log Correlation
- Loki
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/log-configurations/
---
# Configure log correlation using Terraform
Log configurations in [Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) let you define how log data is queried and correlated with entities. You can specify data sources, entity matching rules, label mappings, and span and trace ID filtering.
For information about configuring log correlation in the Knowledge Graph UI, refer to [Configure logs correlation](/docs/grafana-cloud/knowledge-graph/configure/logs-correlation/).
## Basic log configuration
Create a file named `log-configs.tf` and add the following:
```terraform
# Basic log configuration for services
resource "grafana_asserts_log_config" "production" {
provider = grafana.asserts
name = "production"
priority = 1000
default_config = false
data_source_uid = "grafanacloud-logs"
error_label = "error"
match {
property = "asserts_entity_type"
op = "EQUALS"
values = ["Service"]
}
match {
property = "environment"
op = "EQUALS"
values = ["production", "staging"]
}
entity_property_to_log_label_mapping = {
"otel_namespace" = "service_namespace"
"otel_service" = "service_name"
"environment" = "env"
"site" = "region"
}
filter_by_span_id = true
filter_by_trace_id = true
}
```
## Log configuration with multiple match rules
Configure log correlation with multiple entity matching criteria:
```terraform
# Development environment log configuration
resource "grafana_asserts_log_config" "development" {
provider = grafana.asserts
name = "development"
priority = 2000
default_config = true
data_source_uid = "elasticsearch-dev"
error_label = "error"
match {
property = "asserts_entity_type"
op = "EQUALS"
values = ["Service"]
}
match {
property = "environment"
op = "EQUALS"
values = ["development", "testing"]
}
match {
property = "site"
op = "EQUALS"
values = ["us-east-1"]
}
match {
property = "service"
op = "EQUALS"
values = ["api"]
}
entity_property_to_log_label_mapping = {
"otel_namespace" = "service_namespace"
"otel_service" = "service_name"
"environment" = "env"
"site" = "region"
"service" = "app"
}
filter_by_span_id = true
filter_by_trace_id = true
}
```
## Minimal log configuration
Create a minimal configuration for all entities:
```terraform
# Minimal configuration for all entities
resource "grafana_asserts_log_config" "minimal" {
provider = grafana.asserts
name = "minimal"
priority = 3000
default_config = false
data_source_uid = "loki-minimal"
match {
property = "asserts_entity_type"
op = "IS_NOT_NULL"
values = []
}
}
```
## Advanced log configuration with complex match rules
Configure log correlation with match rules that use different operations, such as `CONTAINS`, `NOT_EQUALS`, and `IS_NOT_NULL`:
```terraform
# Advanced configuration with multiple operations
resource "grafana_asserts_log_config" "advanced" {
provider = grafana.asserts
name = "advanced"
priority = 1500
default_config = false
data_source_uid = "loki-advanced"
error_label = "level"
match {
property = "service_type"
op = "CONTAINS"
values = ["web", "api"]
}
match {
property = "environment"
op = "NOT_EQUALS"
values = ["test"]
}
match {
property = "team"
op = "IS_NOT_NULL"
values = []
}
entity_property_to_log_label_mapping = {
"service_type" = "type"
"team" = "owner"
"environment" = "env"
"version" = "app_version"
}
filter_by_span_id = true
filter_by_trace_id = false
}
```
## Resource reference
### `grafana_asserts_log_config`
Manage Knowledge Graph log configurations through the Grafana API.
#### Arguments
| Name | Type | Required | Description |
| -------------------------------------- | -------------- | -------- | -------------------------------------------------------------------------------------------- |
| `name` | `string` | Yes | The name of the log configuration. This field is immutable and forces recreation if changed. |
| `priority` | `number` | Yes | Priority of the log configuration. Higher priority configurations are evaluated first. |
| `default_config` | `bool` | Yes | Whether this is the default configuration. Default configurations cannot be deleted. |
| `data_source_uid`                      | `string`       | Yes      | Data source UID to query (for example, a Loki instance).                                     |
| `match` | `list(object)` | No | List of match rules for entity properties. Refer to [match block](#match-block) for details. |
| `error_label` | `string` | No | Label name used to identify error logs. |
| `entity_property_to_log_label_mapping` | `map(string)` | No | Mapping of entity properties to log labels for correlation. |
| `filter_by_span_id` | `bool` | No | Whether to filter logs by span ID for distributed tracing correlation. |
| `filter_by_trace_id` | `bool` | No | Whether to filter logs by trace ID for distributed tracing correlation. |
#### Match block
Each `match` block supports the following:
| Name | Type | Required | Description |
| ---------- | -------------- | -------- | ------------------------------------------------------------------------------------------------------------------------ |
| `property` | `string` | Yes | Entity property to match against. |
| `op` | `string` | Yes | Operation to use for matching. One of: `EQUALS`, `NOT_EQUALS`, `CONTAINS`, `DOES_NOT_CONTAIN`, `IS_NULL`, `IS_NOT_NULL`. |
| `values` | `list(string)` | Yes | Values to match against. Can be empty for `IS_NULL` and `IS_NOT_NULL` operations. |
#### Example
```terraform
resource "grafana_asserts_log_config" "example" {
provider = grafana.asserts
name = "example-logs"
priority = 1000
default_config = false
data_source_uid = "loki-prod"
error_label = "level"
match {
property = "asserts_entity_type"
op = "EQUALS"
values = ["Service", "Pod"]
}
entity_property_to_log_label_mapping = {
"service" = "app"
"namespace" = "k8s_namespace"
"environment" = "env"
}
filter_by_span_id = true
filter_by_trace_id = true
}
```
## Best practices
### Priority management
- Assign lower priority numbers to more specific configurations
- Higher priority configurations are evaluated first
- Use consistent priority ranges for different configuration types
- Document the reasoning behind priority assignments
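A `locals` map is one way to keep priority ranges consistent and to record why each range exists. The band names and values below are hypothetical and follow the guidance above of giving more specific configurations lower numbers:

```terraform
locals {
  # Hypothetical priority bands. Leave gaps so new configurations fit between bands.
  log_config_priority = {
    service_specific = 1000 # matches a single service
    environment      = 2000 # matches every service in an environment
    catch_all        = 3000 # matches any entity (IS_NOT_NULL)
  }
}

resource "grafana_asserts_log_config" "catch_all" {
  provider        = grafana.asserts
  name            = "catch-all"
  priority        = local.log_config_priority.catch_all
  default_config  = false
  data_source_uid = "loki-minimal"

  match {
    property = "asserts_entity_type"
    op       = "IS_NOT_NULL"
    values   = []
  }
}
```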
### Data source configuration
- Ensure the data source UID matches the data source for your Loki instance or other log aggregation system
- Test data source connectivity before applying configurations
- Use descriptive names for log configurations to indicate their purpose
- Consider using separate data sources for different environments
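If each environment has its own log data source, you can create one log configuration per environment with `for_each`. This sketch assumes hypothetical data source UIDs `loki-prod` and `loki-staging` and the `environment` entity property used in the examples above:

```terraform
locals {
  # Hypothetical mapping of environment to log data source UID and priority.
  environment_log_sources = {
    production = { data_source_uid = "loki-prod", priority = 1000 }
    staging    = { data_source_uid = "loki-staging", priority = 1100 }
  }
}

resource "grafana_asserts_log_config" "per_environment" {
  for_each = local.environment_log_sources

  provider        = grafana.asserts
  name            = "logs-${each.key}"
  priority        = each.value.priority
  default_config  = false
  data_source_uid = each.value.data_source_uid

  # Apply this configuration only to entities in the matching environment.
  match {
    property = "environment"
    op       = "EQUALS"
    values   = [each.key]
  }
}
```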
### Label map strategy
- Map entity properties consistently across all log configurations
- Use meaningful log label names that match your logging standards
- Document the mapping relationships in configuration comments
- Verify that mapped labels exist in your log data
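To keep mappings consistent, you could define the shared entity property to log label mapping once and extend it per configuration with Terraform's `merge()` function. The `api-logs` name and its priority in this sketch are hypothetical. The mapping values come from the examples above.

```terraform
locals {
  # Shared entity property to log label mapping, defined once.
  base_log_label_mapping = {
    "otel_namespace" = "service_namespace"
    "otel_service"   = "service_name"
    "environment"    = "env"
    "site"           = "region"
  }
}

resource "grafana_asserts_log_config" "api_logs" {
  provider        = grafana.asserts
  name            = "api-logs"
  priority        = 1200
  default_config  = false
  data_source_uid = "grafanacloud-logs"
  error_label     = "error"

  match {
    property = "service"
    op       = "EQUALS"
    values   = ["api"]
  }

  # merge() adds the service mapping on top of the shared mapping.
  entity_property_to_log_label_mapping = merge(local.base_log_label_mapping, {
    "service" = "app"
  })
}
```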
### Match rules design
- Start with broad match rules and refine based on needs
- Use specific property names that exist in your entity model
- Test match rules with sample data before deploying
- Combine multiple match rules for precise entity targeting
### Distributed trace integration
- Enable `filter_by_span_id` and `filter_by_trace_id` when using OpenTelemetry
- Ensure your logs contain the appropriate trace and span ID labels
- Use consistent label names for trace IDs across your logging infrastructure
- Test trace correlation to verify it works as expected
## Validation
After applying the Terraform configuration, verify that:
- Log configurations are created in your Knowledge Graph instance
- Configurations appear in the Knowledge Graph UI under **Observability > Configuration > Logs**
- Log correlation works when drilling down from entities
- Label mappings correctly translate entity properties to log labels
- Match rules properly filter entities
- Trace and span ID filtering works for distributed tracing
## Related documentation
- [Configure logs correlation in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/configure/logs-correlation/)
- [Get started with Terraform for Knowledge Graph](../getting-started/)
- [Loki documentation](/docs/loki/latest/)
@@ -0,0 +1,224 @@
---
description: Configure notification alerts for Knowledge Graph using Terraform
menuTitle: Notification alerts
title: Configure notification alerts using Terraform
weight: 200
keywords:
- Terraform
- Knowledge Graph
- Notification Alerts
- Alert Configuration
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/notification-alerts/
---
# Configure notification alerts using Terraform
Notification alerts configurations in [Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) allow you to manage how alerts are processed and routed. You can specify match labels to filter alerts, add custom labels, set duration requirements, and control silencing.
For information about configuring notification alerts in the Knowledge Graph UI, refer to [Configure notifications](/docs/grafana-cloud/knowledge-graph/configure/notifications/).
## Basic notification alerts configuration
Create a file named `alert-configs.tf` and add the following:
```terraform
# Basic alert configuration with silencing
resource "grafana_asserts_notification_alerts_config" "prometheus_remote_storage_failures" {
provider = grafana.asserts
name = "PrometheusRemoteStorageFailures"
match_labels = {
alertname = "PrometheusRemoteStorageFailures"
alertgroup = "prometheus.alerts"
asserts_env = "prod"
}
silenced = true
}
# Notify on ErrorBuildup for a specific job and request context
resource "grafana_asserts_notification_alerts_config" "error_buildup_notify" {
provider = grafana.asserts
name = "ErrorBuildupNotify"
match_labels = {
alertname = "ErrorBuildup"
job = "acai"
asserts_request_type = "inbound"
asserts_request_context = "/auth"
}
silenced = false
}
```
## Notification alerts with additional labels and duration
Configure alerts with custom labels and timing requirements:
```terraform
# Alert with additional labels and custom duration
resource "grafana_asserts_notification_alerts_config" "payment_test_alert" {
provider = grafana.asserts
name = "PaymentTestAlert"
match_labels = {
alertname = "PaymentTestAlert"
additional_labels = "asserts_severity=~\"critical\""
alertgroup = "alex-k8s-integration-test.alerts"
}
alert_labels = {
testing = "onetwothree"
}
duration = "5m"
silenced = false
}
```
## Latency and performance notification alerts
Monitor and alert on latency and performance issues:
```terraform
# Latency alert for shipping service
resource "grafana_asserts_notification_alerts_config" "high_shipping_latency" {
provider = grafana.asserts
name = "high shipping latency"
match_labels = {
alertname = "LatencyP99ErrorBuildup"
job = "shipping"
asserts_request_type = "inbound"
}
silenced = false
}
# CPU throttling alert with warning severity
resource "grafana_asserts_notification_alerts_config" "cpu_throttling_sustained" {
provider = grafana.asserts
name = "CPUThrottlingSustained"
match_labels = {
alertname = "CPUThrottlingSustained"
additional_labels = "asserts_severity=~\"warning\""
}
silenced = true
}
```
## Infrastructure and service notification alerts
Configure alerts for infrastructure components and services:
```terraform
# Ingress error rate alert
resource "grafana_asserts_notification_alerts_config" "ingress_error" {
provider = grafana.asserts
name = "ingress error"
match_labels = {
alertname = "ErrorRatioBreach"
job = "ingress-nginx-controller-metrics"
asserts_request_type = "inbound"
}
silenced = false
}
# MySQL Galera cluster alert
resource "grafana_asserts_notification_alerts_config" "mysql_galera_not_ready" {
provider = grafana.asserts
name = "MySQLGaleraNotReady"
match_labels = {
alertname = "MySQLGaleraNotReady"
}
silenced = false
}
```
## Resource reference
### `grafana_asserts_notification_alerts_config`
Manage Knowledge Graph notification alerts configurations through the Grafana API.
#### Arguments
| Name | Type | Required | Description |
| -------------- | ------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------- |
| `name` | `string` | Yes | The name of the notification alerts configuration. This field is immutable and forces recreation if changed. |
| `match_labels` | `map(string)` | No | Labels to match for this notification alerts configuration. Used to filter which alerts this configuration applies to. |
| `alert_labels` | `map(string)` | No | Labels to add to alerts generated by this notification alerts configuration. |
| `duration`     | `string`      | No       | Duration for which the condition must be true before firing (for example, `5m`, `30s`). Maps to `for` in the Knowledge Graph API. |
| `silenced` | `bool` | No | Whether this notification alerts configuration is silenced. Defaults to `false`. |
#### Example
```terraform
resource "grafana_asserts_notification_alerts_config" "example" {
provider = grafana.asserts
name = "ExampleAlert"
match_labels = {
alertname = "HighCPUUsage"
job = "monitoring"
}
alert_labels = {
severity = "warning"
team = "platform"
}
duration = "5m"
silenced = false
}
```
## Best practices
### Label management
- Use specific and meaningful labels in `match_labels` to ensure precise alert filtering
- Reuse existing label conventions from your monitoring setup
- Consider using `asserts_env` and `asserts_site` labels for multi-environment setups
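For example, you could create one notification alerts configuration per environment with `for_each`, matching on `asserts_env`. The environment values in this sketch are hypothetical. You could add `asserts_site` to `match_labels` in the same way.

```terraform
resource "grafana_asserts_notification_alerts_config" "error_ratio_breach" {
  # One configuration per environment; the asserts_env values are hypothetical.
  for_each = toset(["prod", "staging"])

  provider = grafana.asserts
  name     = "ErrorRatioBreach-${each.key}"

  match_labels = {
    alertname   = "ErrorRatioBreach"
    asserts_env = each.key
  }

  alert_labels = {
    environment = each.key
  }

  silenced = false
}
```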
### Silence strategy
- Use the `silenced` argument for temporary suppression rather than deleting notification alerts configurations
- Document the reason for silencing in your Terraform configuration comments
- Regularly review silenced configurations to ensure they're still needed
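To keep the silencing decision and its reason visible in code, you could drive `silenced` from a variable and record the reason next to it. This sketch is a variant of the `CPUThrottlingSustained` configuration above. The variable name and ticket reference are hypothetical.

```terraform
variable "silence_cpu_throttling" {
  description = "Silence CPUThrottlingSustained notifications while a known issue is investigated."
  type        = bool
  default     = false
}

resource "grafana_asserts_notification_alerts_config" "cpu_throttling_sustained" {
  provider = grafana.asserts
  name     = "CPUThrottlingSustained"

  match_labels = {
    alertname         = "CPUThrottlingSustained"
    additional_labels = "asserts_severity=~\"warning\""
  }

  # Reason: hypothetical ticket OPS-123 tracks the noisy throttling alerts.
  # Review this value when the ticket is closed.
  silenced = var.silence_cpu_throttling
}
```

For example, `terraform apply -var="silence_cpu_throttling=true"` silences the configuration, and a later apply without the flag restores notifications.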
### Duration configuration
- Set appropriate duration values based on your alerting requirements
- Consider the nature of the monitored condition when choosing a duration
- Use consistent duration formats across similar alert types
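One way to keep duration formats consistent is to define the standard values once and reference them from each configuration. The alert categories, values, and the `ShippingLatencySustained` name in this sketch are hypothetical:

```terraform
locals {
  # Standard notification durations, all expressed in minutes.
  notification_durations = {
    latency    = "5m"
    saturation = "15m"
  }
}

resource "grafana_asserts_notification_alerts_config" "shipping_latency_sustained" {
  provider = grafana.asserts
  name     = "ShippingLatencySustained"

  match_labels = {
    alertname            = "LatencyP99ErrorBuildup"
    job                  = "shipping"
    asserts_request_type = "inbound"
  }

  # The condition must hold for the standard latency duration before firing.
  duration = local.notification_durations.latency
  silenced = false
}
```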
## Validation
After applying the Terraform configuration, verify that:
- Notification alerts configurations are created in your Knowledge Graph instance
- Configurations appear in the Knowledge Graph UI under **Observability > Rules > Notify**
- Match labels correctly filter the intended alerts
- Custom labels are properly applied to generated alerts
## Related documentation
- [Configure notifications in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/configure/notifications/)
- [Get started with Terraform for Knowledge Graph](../getting-started/)
- [Configure alerts in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/configure/alerts/)
@@ -0,0 +1,308 @@
---
description: Configure suppressed assertions for Knowledge Graph using Terraform
menuTitle: Suppressed assertions
title: Configure suppressed assertions using Terraform
weight: 300
keywords:
- Terraform
- Knowledge Graph
- Suppressed Assertions
- Alert Suppression
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/suppressed-assertions/
---
# Configure suppressed assertions using Terraform
Suppressed assertions configurations allow you to disable specific alerts or assertions based on label matching in [Knowledge Graph](/docs/grafana-cloud/knowledge-graph/). This is useful for maintenance windows, test environments, or when you want to temporarily suppress certain types of alerts.
For information about suppressing insights in the Knowledge Graph UI, refer to [Suppress insights](/docs/grafana-cloud/knowledge-graph/troubleshoot-infra-apps/suppress-insights/).
## Basic suppressed assertions configuration
Create a file named `suppressed-assertions.tf` and add the following:
```terraform
# Basic suppressed alert configuration for maintenance
resource "grafana_asserts_suppressed_assertions_config" "maintenance_window" {
provider = grafana.asserts
name = "MaintenanceWindow"
match_labels = {
service = "api-service"
maintenance = "true"
}
}
# Suppress specific alertname during deployment
resource "grafana_asserts_suppressed_assertions_config" "deployment_suppression" {
provider = grafana.asserts
name = "DeploymentSuppression"
match_labels = {
alertname = "HighLatency"
job = "web-service"
env = "staging"
}
}
# Suppress alerts for specific test environment
resource "grafana_asserts_suppressed_assertions_config" "test_environment_suppression" {
provider = grafana.asserts
name = "TestEnvironmentSuppression"
match_labels = {
alertgroup = "test.alerts"
environment = "test"
}
}
```
## Service-specific suppression configurations
Suppress alerts for specific services during maintenance or operational activities:
```terraform
# Suppress alerts for specific services during maintenance
resource "grafana_asserts_suppressed_assertions_config" "api_service_maintenance" {
provider = grafana.asserts
name = "APIServiceMaintenance"
match_labels = {
service = "api-gateway"
job = "api-gateway"
maintenance = "scheduled"
}
}
# Suppress database alerts during backup operations
resource "grafana_asserts_suppressed_assertions_config" "database_backup" {
provider = grafana.asserts
name = "DatabaseBackupSuppression"
match_labels = {
service = "postgresql"
job = "postgres-exporter"
backup_mode = "active"
}
}
# Suppress monitoring system alerts during updates
resource "grafana_asserts_suppressed_assertions_config" "monitoring_update" {
provider = grafana.asserts
name = "MonitoringSystemUpdate"
match_labels = {
service = "prometheus"
job = "prometheus"
update = "in_progress"
}
}
```
## Environment and team-based suppression
Create suppression rules based on environment or team:
```terraform
# Suppress all alerts for development environment
resource "grafana_asserts_suppressed_assertions_config" "dev_environment" {
provider = grafana.asserts
name = "DevelopmentEnvironmentSuppression"
match_labels = {
environment = "development"
team = "platform"
}
}
# Suppress alerts for specific team during their maintenance window
resource "grafana_asserts_suppressed_assertions_config" "team_maintenance" {
provider = grafana.asserts
name = "TeamMaintenanceWindow"
match_labels = {
team = "backend"
maintenance = "team_scheduled"
timezone = "UTC"
}
}
# Suppress alerts for staging environment during testing
resource "grafana_asserts_suppressed_assertions_config" "staging_testing" {
provider = grafana.asserts
name = "StagingTestingSuppression"
match_labels = {
environment = "staging"
testing = "automated"
job = "integration-tests"
}
}
```
## Alert type and severity-based suppression
Suppress alerts based on their type or severity:
```terraform
# Suppress low-severity alerts labeled timezone = "business_hours". Suppression is label-based, not time-based.
resource "grafana_asserts_suppressed_assertions_config" "low_severity_business_hours" {
provider = grafana.asserts
name = "LowSeverityBusinessHours"
match_labels = {
severity = "warning"
timezone = "business_hours"
}
}
# Suppress specific alert types during known issues
resource "grafana_asserts_suppressed_assertions_config" "known_issue_suppression" {
provider = grafana.asserts
name = "KnownIssueSuppression"
match_labels = {
alertname = "HighMemoryUsage"
service = "legacy-service"
issue_id = "LEG-123"
}
}
# Suppress infrastructure alerts during planned maintenance
resource "grafana_asserts_suppressed_assertions_config" "infrastructure_maintenance" {
provider = grafana.asserts
name = "InfrastructureMaintenance"
match_labels = {
alertgroup = "infrastructure.alerts"
maintenance_type = "planned"
affected_services = "all"
}
}
```
## Complex multi-label suppression
Define complex suppression rules with multiple labels:
```terraform
# Complex suppression for multi-service deployments
resource "grafana_asserts_suppressed_assertions_config" "multi_service_deployment" {
provider = grafana.asserts
name = "MultiServiceDeploymentSuppression"
match_labels = {
deployment_id = "deploy-2024-01-15"
services = "api,worker,frontend"
environment = "production"
deployment_type = "blue_green"
}
}
# Suppress alerts for specific cluster during maintenance
resource "grafana_asserts_suppressed_assertions_config" "cluster_maintenance" {
provider = grafana.asserts
name = "ClusterMaintenanceSuppression"
match_labels = {
cluster = "production-cluster-1"
maintenance = "cluster_upgrade"
affected_nodes = "all"
estimated_duration = "2h"
}
}
# Suppress alerts for specific region during network issues
resource "grafana_asserts_suppressed_assertions_config" "regional_network_issue" {
provider = grafana.asserts
name = "RegionalNetworkIssueSuppression"
match_labels = {
region = "us-west-2"
issue_type = "network"
affected_services = "external_dependencies"
incident_id = "NET-456"
}
}
```
## Resource reference
### `grafana_asserts_suppressed_assertions_config`
Manage Knowledge Graph suppressed assertions configurations through the Grafana API.
#### Arguments
| Name | Type | Required | Description |
| -------------- | ------------- | -------- | ------------------------------------------------------------------------------------------------------------------ |
| `name` | `string` | Yes | The name of the suppressed assertions configuration. This field is immutable and forces recreation if changed. |
| `match_labels` | `map(string)` | No | Labels to match for this suppressed assertions configuration. Used to determine which alerts should be suppressed. |
#### Example
```terraform
resource "grafana_asserts_suppressed_assertions_config" "example" {
provider = grafana.asserts
name = "ExampleSuppression"
match_labels = {
alertname = "TestAlert"
env = "development"
}
}
```
## Best practices
### Suppression strategy
- Use suppression rules for temporary situations rather than permanent solutions
- Document the reason for suppression in your Terraform configuration comments
- Set expiration dates or reminders to review suppression rules
- Prefer fixing alert thresholds over suppressing recurring false positives
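Terraform `check` blocks (Terraform 1.5 or later) offer one way to build a review reminder into the configuration itself. The assertion below reports a warning during `terraform plan` and `terraform apply` once a maintenance window has ended, without blocking the run. The `maintenance_window_end` variable is hypothetical and expects an RFC 3339 timestamp such as `2025-01-16T06:00:00Z`.

```terraform
variable "maintenance_window_end" {
  description = "RFC 3339 timestamp after which the MaintenanceWindow suppression should be reviewed (hypothetical)."
  type        = string
}

# Requires Terraform 1.5 or later for check blocks and plantimestamp().
check "maintenance_window_suppression_expiry" {
  assert {
    # timecmp returns -1 while the plan time is still before the window end.
    condition     = timecmp(plantimestamp(), var.maintenance_window_end) < 0
    error_message = "The MaintenanceWindow suppression rule has outlived its maintenance window. Review or remove it."
  }
}
```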
### Label match rules
- Be specific with match labels to avoid suppressing unintended alerts
- Test suppression rules in non-production environments first
- Use descriptive names that indicate the purpose and scope of the suppression
- Include relevant context in labels (for example, incident IDs, maintenance windows)
### Lifecycle management
- Regularly review active suppression rules to ensure they're still needed
- Remove or update suppression rules after maintenance windows or deployments
- Use version control to track when suppression rules were added and why
- Consider using time-based automation to enable or disable suppression rules
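As one example of automation, you could create a suppression rule only while a flag is set, and have a scheduled pipeline or an operator toggle the flag at the start and end of the window. This sketch is a variant of the `DatabaseBackupSuppression` configuration above. The `database_backup_active` variable is hypothetical.

```terraform
variable "database_backup_active" {
  description = "Set to true while the database backup runs (hypothetical variable)."
  type        = bool
  default     = false
}

resource "grafana_asserts_suppressed_assertions_config" "database_backup" {
  # Create the rule only while the backup is active; remove it otherwise.
  count = var.database_backup_active ? 1 : 0

  provider = grafana.asserts
  name     = "DatabaseBackupSuppression"

  match_labels = {
    service     = "postgresql"
    job         = "postgres-exporter"
    backup_mode = "active"
  }
}
```

For example, run `terraform apply -var="database_backup_active=true"` when the backup starts, then apply again without the flag afterward to delete the rule.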
## Validation
After applying the Terraform configuration, verify that:
- Suppressed assertions configurations are active in your Knowledge Graph instance
- Configurations appear in the Knowledge Graph UI under **Observability > Rules > Suppress**
- Matching alerts are properly suppressed
- Suppression rules don't affect unintended alerts
## Related documentation
- [Suppress insights in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/troubleshoot-infra-apps/suppress-insights/)
- [Get started with Terraform for Knowledge Graph](../getting-started/)
- [Configure notifications](/docs/grafana-cloud/knowledge-graph/configure/notifications/)
@@ -0,0 +1,355 @@
---
description: Configure thresholds for Knowledge Graph using Terraform
menuTitle: Thresholds
title: Configure thresholds using Terraform
weight: 600
keywords:
- Terraform
- Knowledge Graph
- Thresholds
- Request Thresholds
- Resource Thresholds
- Health Thresholds
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/thresholds/
---
# Configure thresholds using Terraform
Threshold configurations in [Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) allow you to define custom thresholds for request, resource, and health assertions. These configurations help you set specific limits and conditions for monitoring your services and infrastructure.
For information about managing thresholds in the Knowledge Graph UI, refer to [Manage thresholds](/docs/grafana-cloud/knowledge-graph/configure/manage-thresholds/).
## Basic threshold configuration
Create a file named `thresholds.tf` and add the following:
```terraform
# Basic threshold configuration with all three types
resource "grafana_asserts_thresholds" "basic" {
provider = grafana.asserts
request_thresholds = [{
entity_name = "payment-service"
assertion_name = "ErrorRatioBreach"
request_type = "inbound"
request_context = "/charge"
value = 0.01
}]
resource_thresholds = [{
assertion_name = "Saturation"
resource_type = "container"
container_name = "worker"
source = "metrics"
severity = "warning"
value = 75
}]
health_thresholds = [{
assertion_name = "ServiceDown"
expression = "up < 1"
entity_type = "Service"
}]
}
```
## Request threshold configurations
Configure thresholds for different service request types and contexts:
```terraform
# Multiple request thresholds for different services
resource "grafana_asserts_thresholds" "request_thresholds" {
provider = grafana.asserts
request_thresholds = [
{
entity_name = "api-service"
assertion_name = "ErrorRatioBreach"
request_type = "inbound"
request_context = "/api/v1/users"
value = 0.02
},
{
entity_name = "api-service"
assertion_name = "LatencyP99ErrorBuildup"
request_type = "inbound"
request_context = "/api/v1/orders"
value = 500
},
{
entity_name = "payment-gateway"
assertion_name = "RequestRateAnomaly"
request_type = "outbound"
request_context = "/payment/process"
value = 1000
}
]
}
```
## Resource threshold configurations
Define resource thresholds for different severity levels:
```terraform
# Resource thresholds for different severity levels
resource "grafana_asserts_thresholds" "resource_thresholds" {
provider = grafana.asserts
resource_thresholds = [
{
assertion_name = "Saturation"
resource_type = "container"
container_name = "web-server"
source = "metrics"
severity = "warning"
value = 75
},
{
assertion_name = "Saturation"
resource_type = "container"
container_name = "web-server"
source = "metrics"
severity = "critical"
value = 90
},
{
assertion_name = "ResourceRateBreach"
resource_type = "Pod"
container_name = "database"
source = "logs"
severity = "warning"
value = 80
}
]
}
```
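Because the warning and critical thresholds for the same container differ only in `severity` and `value`, you can generate them from a map with a `for` expression. This sketch is an alternative way to write the two `web-server` entries from the preceding example. The `database` entry is omitted.

```terraform
locals {
  # Saturation limits per severity for the web-server container.
  web_server_saturation_limits = {
    warning  = 75
    critical = 90
  }
}

resource "grafana_asserts_thresholds" "resource_thresholds" {
  provider = grafana.asserts

  # Terraform iterates map keys in lexical order, so this produces the
  # critical entry followed by the warning entry.
  resource_thresholds = [
    for severity, limit in local.web_server_saturation_limits : {
      assertion_name = "Saturation"
      resource_type  = "container"
      container_name = "web-server"
      source         = "metrics"
      severity       = severity
      value          = limit
    }
  ]
}
```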
## Health threshold configurations
Configure health checks with Prometheus expressions:
```terraform
# Health thresholds with Prometheus expressions
resource "grafana_asserts_thresholds" "health_thresholds" {
  provider = grafana.asserts
  health_thresholds = [
    {
      assertion_name = "ServiceDown"
      expression     = "up{job=\"api-service\"} < 1"
      entity_type    = "Service"
    },
    {
      assertion_name = "HighMemoryUsage"
      expression     = "memory_usage_percent > 85"
      entity_type    = "Service"
    },
    {
      assertion_name = "DatabaseConnectivity"
      expression     = "db_connection_pool_active / db_connection_pool_max > 0.9"
      entity_type    = "Service"
    }
  ]
}
```
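To watch several Prometheus jobs with one health threshold, you can build a regular-expression label matcher from a list with Terraform's `join` function. The `critical_jobs` variable, its job names, and the resource label are hypothetical, and the sketch assumes each job exposes the standard `up` metric.

```terraform
# Hypothetical list of Prometheus job names to watch
variable "critical_jobs" {
  type    = list(string)
  default = ["api-service", "payment-gateway"]
}

resource "grafana_asserts_thresholds" "critical_jobs_health" {
  provider = grafana.asserts

  health_thresholds = [{
    # Renders as: up{job=~"api-service|payment-gateway"} < 1
    assertion_name = "ServiceDown"
    expression     = "up{job=~\"${join("|", var.critical_jobs)}\"} < 1"
    entity_type    = "Service"
  }]
}
```

Job names that contain regular-expression metacharacters, such as `.`, need escaping before you join them.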
## Comprehensive threshold configuration
Combine request, resource, and health thresholds in a single resource to define a complete threshold configuration for a production environment:
```terraform
# Production environment with comprehensive thresholds
resource "grafana_asserts_thresholds" "production" {
  provider = grafana.asserts
  request_thresholds = [
    {
      entity_name     = "frontend"
      assertion_name  = "ErrorRatioBreach"
      request_type    = "inbound"
      request_context = "/"
      value           = 0.005
    },
    {
      entity_name     = "backend-api"
      assertion_name  = "LatencyP99ErrorBuildup"
      request_type    = "inbound"
      request_context = "/api"
      value           = 200
    }
  ]
  resource_thresholds = [
    {
      assertion_name = "Saturation"
      resource_type  = "container"
      container_name = "frontend"
      source         = "metrics"
      severity       = "warning"
      value          = 70
    },
    {
      assertion_name = "Saturation"
      resource_type  = "container"
      container_name = "backend-api"
      source         = "metrics"
      severity       = "critical"
      value          = 85
    }
  ]
  health_thresholds = [
    {
      assertion_name = "ServiceDown"
      expression     = "up < 1"
      entity_type    = "Service"
    },
    {
      assertion_name = "NodeDown"
      expression     = "up{job=\"node-exporter\"} < 1"
      entity_type    = "Service"
    }
  ]
}
```
## Resource reference
### `grafana_asserts_thresholds`
Manage Knowledge Graph threshold configurations through the Grafana API. This resource allows you to define custom thresholds for request, resource, and health assertions.
#### Arguments
| Name | Type | Required | Description |
| --------------------- | -------------- | -------- | ------------------------------------------------------------------------------------------------------------------------ |
| `request_thresholds` | `list(object)` | No | List of request threshold configurations. Refer to [request thresholds block](#request-thresholds-block) for details. |
| `resource_thresholds` | `list(object)` | No | List of resource threshold configurations. Refer to [resource thresholds block](#resource-thresholds-block) for details. |
| `health_thresholds` | `list(object)` | No | List of health threshold configurations. Refer to [health thresholds block](#health-thresholds-block) for details. |
#### Request thresholds block
Each `request_thresholds` block supports the following:
| Name              | Type     | Required | Description                                            |
| ----------------- | -------- | -------- | ------------------------------------------------------ |
| `entity_name`     | `string` | Yes      | The name of the entity to apply the threshold to.      |
| `assertion_name`  | `string` | Yes      | The name of the assertion to configure.                |
| `request_type`    | `string` | Yes      | The type of request (`inbound`, `outbound`).           |
| `request_context` | `string` | Yes      | The request context or path to apply the threshold to. |
| `value`           | `number` | Yes      | The threshold value.                                   |
#### Resource thresholds block
Each `resource_thresholds` block supports the following:
| Name | Type | Required | Description |
| ---------------- | -------- | -------- | ---------------------------------------------------- |
| `assertion_name` | `string` | Yes | The name of the assertion to configure. |
| `resource_type` | `string` | Yes | The type of resource (container, Pod, node). |
| `container_name` | `string` | Yes | The name of the container to apply the threshold to. |
| `source` | `string` | Yes | The source of the metrics (metrics, logs). |
| `severity` | `string` | Yes | The severity level (warning, critical). |
| `value` | `number` | Yes | The threshold value. |
#### Health thresholds block
Each `health_thresholds` block supports the following:
| Name | Type | Required | Description |
| ---------------- | -------- | -------- | ------------------------------------------------------------------------------------ |
| `assertion_name` | `string` | Yes | The name of the assertion to configure. |
| `expression` | `string` | Yes | The Prometheus expression for the health check. |
| `entity_type` | `string` | Yes | Entity type for the health threshold (for example, Service, Pod, Namespace, Volume). |
| `alert_category` | `string` | No | Optional alert category label for the health threshold. |
#### Example
```terraform
resource "grafana_asserts_thresholds" "example" {
  provider = grafana.asserts
  request_thresholds = [{
    entity_name     = "api-service"
    assertion_name  = "ErrorRatioBreach"
    request_type    = "inbound"
    request_context = "/api/v1/users"
    value           = 0.02
  }]
  resource_thresholds = [{
    assertion_name = "Saturation"
    resource_type  = "container"
    container_name = "web-server"
    source         = "metrics"
    severity       = "warning"
    value          = 75
  }]
  health_thresholds = [{
    assertion_name = "ServiceDown"
    expression     = "up{job=\"api-service\"} < 1"
    entity_type    = "Service"
  }]
}
```
## Best practices
### Threshold configuration management
- Set appropriate threshold values based on your service level objectives (SLOs)
- Use different severity levels (warning, critical) to create escalation paths
- Test threshold configurations in non-production environments first
- Monitor threshold effectiveness and adjust values based on actual performance data
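One way to test thresholds outside production first is to drive environment-specific values from a variable. In this sketch, the `environment` variable, the `frontend_error_ratio` local, and the staging value `0.02` are hypothetical; the production value `0.005` comes from the comprehensive threshold example. How you point the `grafana.asserts` provider at each environment's stack is an assumption left outside the sketch.

```terraform
# Hypothetical variable selecting which environment's values to apply
variable "environment" {
  type    = string
  default = "staging"

  validation {
    condition     = contains(["staging", "production"], var.environment)
    error_message = "The environment value must be \"staging\" or \"production\"."
  }
}

locals {
  # Hypothetical staging value; production matches the comprehensive example
  frontend_error_ratio = {
    staging    = 0.02
    production = 0.005
  }
}

resource "grafana_asserts_thresholds" "frontend" {
  provider = grafana.asserts

  request_thresholds = [{
    entity_name     = "frontend"
    assertion_name  = "ErrorRatioBreach"
    request_type    = "inbound"
    request_context = "/"
    value           = local.frontend_error_ratio[var.environment]
  }]
}
```

Run `terraform apply -var="environment=staging"` first, observe how the threshold behaves, and then apply with `environment=production`.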
### Request threshold best practices
- Configure request thresholds for critical user-facing endpoints
- Set separate thresholds for inbound and outbound requests
- Consider request context when setting thresholds for specific API paths
- Use error ratio thresholds to catch service degradation early
- Review historical performance data to set realistic threshold values
### Resource threshold best practices
- Set resource thresholds based on your infrastructure capacity
- Use container-specific thresholds for microservices architectures
- Configure both warning and critical thresholds for gradual escalation
- Monitor resource utilization patterns to set realistic threshold values
- Consider seasonal or periodic patterns in resource usage
### Health threshold best practices
- Use Prometheus expressions that accurately reflect service health
- Test health check expressions independently before applying them
- Set up health thresholds for critical dependencies and external services
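- Combine metrics in a single expression for complex health checks, such as the connection pool ratio in the `DatabaseConnectivity` example
- Keep expressions efficient so that evaluating them doesn't add excessive query load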
### Value selection guidelines
- Start conservative and adjust based on real-world performance
- Express ratio-based thresholds as fractions in the 0-1 range (for example, `0.02` for a 2% error ratio)
- Use milliseconds for latency thresholds
- Document the reasoning behind specific threshold values
- Review and update thresholds regularly based on system evolution
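Terraform input variables with `validation` blocks (Terraform 0.13 or later) can catch unit mistakes, such as entering `2` instead of `0.02` for an error ratio, and their `description` gives you a place to record what a value means. This is a sketch: the variable names and the resource label are hypothetical, and the defaults reuse values from the request threshold example.

```terraform
variable "api_error_ratio" {
  description = "Error ratio threshold for api-service, as a fraction (0.02 means 2%)"
  type        = number
  default     = 0.02

  validation {
    # Catch values entered as percentages, such as 2 instead of 0.02
    condition     = var.api_error_ratio > 0 && var.api_error_ratio < 1
    error_message = "The api_error_ratio value must be a fraction between 0 and 1, not a percentage."
  }
}

variable "api_p99_latency_ms" {
  description = "p99 latency threshold for api-service, in milliseconds"
  type        = number
  default     = 500

  validation {
    condition     = var.api_p99_latency_ms > 0
    error_message = "The api_p99_latency_ms value must be a positive number of milliseconds."
  }
}

resource "grafana_asserts_thresholds" "api_service" {
  provider = grafana.asserts

  request_thresholds = [
    {
      entity_name     = "api-service"
      assertion_name  = "ErrorRatioBreach"
      request_type    = "inbound"
      request_context = "/api/v1/users"
      value           = var.api_error_ratio
    },
    {
      entity_name     = "api-service"
      assertion_name  = "LatencyP99ErrorBuildup"
      request_type    = "inbound"
      request_context = "/api/v1/orders"
      value           = var.api_p99_latency_ms
    }
  ]
}
```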
## Validation
After applying the Terraform configuration, verify that:
- `terraform apply` completes without errors and the threshold configurations are applied to your Knowledge Graph instance
- Configurations appear in the Knowledge Graph UI under **Observability > Rules > Threshold**
- Request thresholds correctly identify breaches for specified services
- Resource thresholds trigger at appropriate severity levels
- Health thresholds accurately reflect service status
- Threshold values align with your SLO commitments
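To compare what Terraform manages with what the Knowledge Graph UI shows, you can expose the configured entries as an output. This sketch assumes the `request_thresholds` resource from the request threshold example earlier on this page; the output name `request_thresholds_summary` is hypothetical.

```terraform
# Assumes grafana_asserts_thresholds.request_thresholds from the earlier example
output "request_thresholds_summary" {
  description = "Configured request thresholds, for comparison with the Knowledge Graph UI"
  value = [
    for t in grafana_asserts_thresholds.request_thresholds.request_thresholds :
    "${t.entity_name} ${t.request_type} ${t.request_context}: ${t.assertion_name} = ${t.value}"
  ]
}
```

After applying, run `terraform output request_thresholds_summary` to print one line per request threshold.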
## Related documentation
- [Manage thresholds in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/configure/manage-thresholds/)
- [Get started with Terraform for Knowledge Graph](../getting-started/)
- [Configure alerts in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/configure/alerts/)