[release-12.3.0] Restructure As code and developer resources (#113969)
Co-authored-by: Roberto Jiménez Sánchez <roberto.jimenez@grafana.com> Co-authored-by: Anna Urbiztondo <anna.urbiztondo@grafana.com>
+73
@@ -0,0 +1,73 @@
---
cards:
  items:
    - description: Learn how to set up the Terraform provider and configure your environment for managing Knowledge Graph resources.
      height: 24
      href: ./getting-started/
      title: Get started with Terraform
    - description: Configure notification alerts to manage how alerts are processed and routed in your Knowledge Graph.
      height: 24
      href: ./notification-alerts/
      title: Notification alerts
    - description: Define suppression rules to temporarily disable specific alerts during maintenance windows or testing.
      height: 24
      href: ./suppressed-assertions/
      title: Suppressed assertions
    - description: Create custom entity models and define how entities are discovered based on Prometheus queries.
      height: 24
      href: ./custom-model-rules/
      title: Custom model rules
    - description: Configure log data correlation with entities using data source mappings and filtering options.
      height: 24
      href: ./log-configurations/
      title: Log configurations
    - description: Set custom thresholds for request, resource, and health assertions to monitor your services.
      height: 24
      href: ./thresholds/
      title: Thresholds
    - description: Configure knowledge graph SLOs with entity-centric monitoring and RCA workbench integration for root cause analysis.
      height: 24
      href: ./knowledge-graph-slo/
      title: Knowledge graph SLOs
  title_class: pt-0 lh-1
description: Manage Grafana Cloud Knowledge Graph using Terraform
hero:
  description: Use Terraform to manage Grafana Cloud Knowledge Graph resources as code. Configure notification alerts, suppressed assertions, custom model rules, log configurations, and threshold configurations using infrastructure as code best practices.
  level: 1
  title: Manage Knowledge Graph using Terraform
menuTitle: Manage Knowledge Graph in Grafana Cloud using Terraform
title: Manage Knowledge Graph in Grafana Cloud using Terraform
weight: 130
keywords:
  - Infrastructure as Code
  - Quickstart
  - Grafana Cloud
  - Terraform
  - Knowledge Graph
  - Alert Configuration
  - Suppressed Assertions
  - Custom Model Rules
  - Log Configuration
  - Threshold Configuration
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/
---
|
||||
|
||||
{{< docs/hero-simple key="hero" >}}
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Terraform enables you to manage [Grafana Cloud Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) resources using infrastructure as code. With Terraform, you can define, version-control, and deploy Knowledge Graph configurations, including alert rules, suppression policies, entity models, log correlations, and thresholds.
|
||||
|
||||
## Explore
|
||||
|
||||
{{< card-grid key="cards" type="simple" >}}
|
||||
|
||||
---
|
||||
|
||||
## Related resources
|
||||
|
||||
- [Grafana Terraform Provider Documentation](https://registry.terraform.io/providers/grafana/grafana/latest/docs)
- [Knowledge Graph Documentation](/docs/grafana-cloud/knowledge-graph/)
- [Terraform Best Practices](https://www.terraform.io/docs/cloud/guides/recommended-practices/index.html)
|
||||
+431
@@ -0,0 +1,431 @@
|
||||
---
description: Define custom entity models for Knowledge Graph using Terraform
menuTitle: Custom model rules
title: Create custom model rules using Terraform
weight: 400
keywords:
  - Terraform
  - Knowledge Graph
  - Custom Model Rules
  - Entity Models
  - Prometheus
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/custom-model-rules/
---
|
||||
|
||||
# Create custom model rules using Terraform
|
||||
|
||||
Custom model rules in [Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) allow you to define how entities are discovered and modeled based on Prometheus queries. These rules enable you to create custom entity types, define their relationships, and specify how they should be enriched with additional data.
|
||||
|
||||
For information about managing entities and relations in the Knowledge Graph UI, refer to [Manage entities and relations](/docs/grafana-cloud/knowledge-graph/configure/manage-entities-relations/).
|
||||
|
||||
## Basic custom model rules
|
||||
|
||||
Create a file named `custom-model-rules.tf` and add the following:
|
||||
|
||||
```terraform
# Basic custom model rule for services
resource "grafana_asserts_custom_model_rules" "basic_service" {
  provider = grafana.asserts

  name = "basic-service-model"

  rules {
    entity {
      type = "Service"
      name = "service"

      defined_by {
        query = "up{job!=''}"
        label_values = {
          service = "job"
        }
        literals = {
          _source = "up_query"
        }
      }
    }
  }
}
```
|
||||
|
||||
## Advanced service model with scope and lookup
|
||||
|
||||
Define service entities with environment scoping and relationship mappings:
|
||||
|
||||
```terraform
# Advanced service model with environment scoping
resource "grafana_asserts_custom_model_rules" "advanced_service" {
  provider = grafana.asserts

  name = "advanced-service-model"

  rules {
    entity {
      type = "Service"
      name = "workload | service | job"

      scope = {
        namespace = "namespace"
        env       = "asserts_env"
        site      = "asserts_site"
      }

      lookup = {
        workload  = "workload | deployment | statefulset | daemonset | replicaset"
        service   = "service"
        job       = "job"
        proxy_job = "job"
      }

      defined_by {
        query = "up{job!='', asserts_env!=''}"
        label_values = {
          service   = "service"
          job       = "job"
          workload  = "workload"
          namespace = "namespace"
        }
        literals = {
          _source = "up_with_workload"
        }
      }

      defined_by {
        query    = "up{job='maintenance'}"
        disabled = true
      }
    }
  }
}
```
|
||||
|
||||
## Multi-entity model configuration
|
||||
|
||||
Define multiple entity types in a single configuration:
|
||||
|
||||
```terraform
|
||||
# Multiple entity types in a single model
|
||||
resource "grafana_asserts_custom_model_rules" "multi_entity" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "kubernetes-entities"
|
||||
|
||||
rules {
|
||||
# Service entity
|
||||
entity {
|
||||
type = "Service"
|
||||
name = "service"
|
||||
|
||||
scope = {
|
||||
namespace = "namespace"
|
||||
cluster = "cluster"
|
||||
}
|
||||
|
||||
defined_by {
|
||||
query = "up{service!=''}"
|
||||
label_values = {
|
||||
service = "service"
|
||||
namespace = "namespace"
|
||||
cluster = "cluster"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Pod entity
|
||||
entity {
|
||||
type = "Pod"
|
||||
name = "Pod"
|
||||
|
||||
scope = {
|
||||
namespace = "namespace"
|
||||
cluster = "cluster"
|
||||
}
|
||||
|
||||
lookup = {
|
||||
service = "service"
|
||||
workload = "workload"
|
||||
}
|
||||
|
||||
defined_by {
|
||||
query = "kube_pod_info{pod!=''}"
|
||||
label_values = {
|
||||
Pod = "pod"
|
||||
namespace = "namespace"
|
||||
cluster = "cluster"
|
||||
service = "service"
|
||||
}
|
||||
literals = {
|
||||
_entity_type = "Pod"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Namespace entity
|
||||
entity {
|
||||
type = "Namespace"
|
||||
name = "namespace"
|
||||
|
||||
scope = {
|
||||
cluster = "cluster"
|
||||
}
|
||||
|
||||
defined_by {
|
||||
query = "kube_namespace_status_phase{namespace!=''}"
|
||||
label_values = {
|
||||
namespace = "namespace"
|
||||
cluster = "cluster"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Complex entity with enrichment
|
||||
|
||||
Create service entities with multiple data sources and enrichment:
|
||||
|
||||
```terraform
|
||||
# Service entity with enrichment from multiple sources
|
||||
resource "grafana_asserts_custom_model_rules" "enriched_service" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "enriched-service-model"
|
||||
|
||||
rules {
|
||||
entity {
|
||||
type = "Service"
|
||||
name = "service"
|
||||
|
||||
enriched_by = [
|
||||
"prometheus_metrics",
|
||||
"kubernetes_metadata",
|
||||
"application_logs"
|
||||
]
|
||||
|
||||
scope = {
|
||||
environment = "asserts_env"
|
||||
region = "asserts_site"
|
||||
team = "team"
|
||||
}
|
||||
|
||||
lookup = {
|
||||
deployment = "workload"
|
||||
Pod = "pod"
|
||||
container = "container"
|
||||
}
|
||||
|
||||
# Primary definition from service up metrics
|
||||
defined_by {
|
||||
query = "up{service!='', asserts_env!=''}"
|
||||
label_values = {
|
||||
service = "service"
|
||||
environment = "asserts_env"
|
||||
region = "asserts_site"
|
||||
team = "team"
|
||||
}
|
||||
literals = {
|
||||
_primary_source = "service_up"
|
||||
}
|
||||
}
|
||||
|
||||
# Secondary definition from application metrics
|
||||
defined_by {
|
||||
query = "http_requests_total{service!=''}"
|
||||
label_values = {
|
||||
service = "service"
|
||||
environment = "environment"
|
||||
version = "version"
|
||||
}
|
||||
literals = {
|
||||
_secondary_source = "http_metrics"
|
||||
}
|
||||
}
|
||||
|
||||
# Disabled definition for testing
|
||||
defined_by {
|
||||
query = "test_metric{service!=''}"
|
||||
disabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Database and infrastructure entities
|
||||
|
||||
Define database and infrastructure entity models:
|
||||
|
||||
```terraform
|
||||
# Database and infrastructure entity models
|
||||
resource "grafana_asserts_custom_model_rules" "infrastructure" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "infrastructure-entities"
|
||||
|
||||
rules {
|
||||
# Database entity
|
||||
entity {
|
||||
type = "Database"
|
||||
name = "database_instance"
|
||||
|
||||
scope = {
|
||||
environment = "env"
|
||||
region = "region"
|
||||
}
|
||||
|
||||
lookup = {
|
||||
host = "instance"
|
||||
port = "port"
|
||||
db_name = "database"
|
||||
}
|
||||
|
||||
defined_by {
|
||||
query = "mysql_up{instance!=''}"
|
||||
label_values = {
|
||||
database_instance = "instance"
|
||||
database = "database"
|
||||
env = "environment"
|
||||
region = "region"
|
||||
}
|
||||
literals = {
|
||||
_db_type = "mysql"
|
||||
}
|
||||
metric_value = "1"
|
||||
}
|
||||
|
||||
defined_by {
|
||||
query = "postgres_up{instance!=''}"
|
||||
label_values = {
|
||||
database_instance = "instance"
|
||||
database = "datname"
|
||||
env = "environment"
|
||||
}
|
||||
literals = {
|
||||
_db_type = "postgresql"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Load balancer entity
|
||||
entity {
|
||||
type = "LoadBalancer"
|
||||
name = "lb_instance"
|
||||
|
||||
scope = {
|
||||
environment = "env"
|
||||
}
|
||||
|
||||
defined_by {
|
||||
query = "haproxy_up{proxy!=''}"
|
||||
label_values = {
|
||||
lb_instance = "instance"
|
||||
proxy = "proxy"
|
||||
env = "environment"
|
||||
}
|
||||
literals = {
|
||||
_lb_type = "haproxy"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Resource reference
|
||||
|
||||
### `grafana_asserts_custom_model_rules`
|
||||
|
||||
Manage Knowledge Graph custom model rules through the Grafana API. This resource allows you to define custom entity models based on Prometheus queries with advanced mapping and enrichment capabilities.
|
||||
|
||||
#### Arguments
|
||||
|
||||
| Name    | Type           | Required | Description                                                                                              |
| ------- | -------------- | -------- | -------------------------------------------------------------------------------------------------------- |
| `name`  | `string`       | Yes      | The name of the custom model rules. This field is immutable and forces recreation if changed.            |
| `rules` | `list(object)` | Yes      | The rules configuration containing entity definitions. Refer to [rules block](#rules-block) for details. |
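
Because changing `name` forces Terraform to destroy and recreate the resource, you may want a guard against accidental renames. The following is a minimal sketch that uses Terraform's standard `lifecycle` meta-argument. The resource label `guarded_service` and the rule name are hypothetical, and the entity definition is borrowed from the basic example.

```terraform
resource "grafana_asserts_custom_model_rules" "guarded_service" {
  provider = grafana.asserts

  # Changing this value forces replacement of the resource.
  name = "guarded-service-model"

  rules {
    entity {
      type = "Service"
      name = "service"

      defined_by {
        query = "up{job!=''}"
        label_values = {
          service = "job"
        }
      }
    }
  }

  lifecycle {
    # Standard Terraform meta-argument: any plan that would destroy this
    # resource, including a replacement triggered by a rename, fails with an error.
    prevent_destroy = true
  }
}
```

When a rename is intentional, remove `prevent_destroy` for that change and restore it afterwards.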
|
||||
|
||||
#### Rules block
|
||||
|
||||
Each `rules` block supports the following:
|
||||
|
||||
| Name     | Type           | Required | Description                                                                     |
| -------- | -------------- | -------- | ------------------------------------------------------------------------------- |
| `entity` | `list(object)` | Yes      | List of entity definitions. Refer to [entity block](#entity-block) for details. |
|
||||
|
||||
#### Entity block
|
||||
|
||||
Each `entity` block supports the following:
|
||||
|
||||
| Name          | Type           | Required | Description                                                                                            |
| ------------- | -------------- | -------- | ------------------------------------------------------------------------------------------------------ |
| `type`        | `string`       | Yes      | The type of the entity (for example, Service, Pod, Namespace).                                         |
| `name`        | `string`       | Yes      | The name pattern for the entity. Can include pipe-separated alternatives.                              |
| `defined_by`  | `list(object)` | Yes      | List of queries that define this entity. Refer to [`defined_by` block](#defined_by-block) for details. |
| `disabled`    | `bool`         | No       | Whether this entity is disabled. Defaults to `false`.                                                  |
| `enriched_by` | `list(string)` | No       | List of enrichment sources for the entity.                                                             |
| `lookup`      | `map(string)`  | No       | Lookup mappings for the entity to relate different label names.                                        |
| `scope`       | `map(string)`  | No       | Scope labels that define the boundaries of this entity type.                                           |
|
||||
|
||||
#### `defined_by` block
|
||||
|
||||
Each `defined_by` block supports the following:
|
||||
|
||||
| Name           | Type          | Required | Description                                                               |
| -------------- | ------------- | -------- | ------------------------------------------------------------------------- |
| `query`        | `string`      | Yes      | The Prometheus query that defines this entity.                            |
| `disabled`     | `bool`        | No       | Whether this query is disabled. Defaults to `false`.                      |
| `label_values` | `map(string)` | No       | Label value mappings for extracting entity attributes from query results. |
| `literals`     | `map(string)` | No       | Literal value mappings for adding static attributes to entities.          |
| `metric_value` | `string`      | No       | Metric value to use from the query result.                                |
|
||||
|
||||
{{< admonition type="note" >}}
|
||||
When `disabled = true` is set for a `defined_by` query, only the `query` field is used for matching. All other fields in the block are ignored.
|
||||
{{< /admonition >}}
|
||||
|
||||
## Best practices
|
||||
|
||||
### Entity models
|
||||
|
||||
- Design your entity models to reflect your actual infrastructure and application architecture
|
||||
- Use descriptive names for custom model rules that indicate their purpose and scope
|
||||
- Start with basic entity definitions and gradually add complexity as needed
|
||||
- Define clear entity scopes using the `scope` parameter to organize entities by environment, region, or team
|
||||
|
||||
### Query design and performance
|
||||
|
||||
- Write efficient Prometheus queries that don't overload your monitoring system
|
||||
- Test your Prometheus queries independently before using them in model rules
|
||||
- Use specific label filters to reduce the scope of your queries where possible
|
||||
- Consider the cardinality implications of your entity definitions
|
||||
- Use the `disabled` flag to temporarily disable problematic queries during debugging
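
As an illustration of the last two points, the following sketch narrows a query with label filters and drives the `disabled` flag from an input variable, so a query can be switched off without editing the rule. The variable name `disable_http_query`, the resource label, and the rule name are hypothetical; the queries are taken from the examples on this page.

```terraform
# Hypothetical input variable used to switch a query off while debugging.
variable "disable_http_query" {
  description = "Set to true to temporarily disable the HTTP-based entity query"
  type        = bool
  default     = false
}

resource "grafana_asserts_custom_model_rules" "debuggable_service" {
  provider = grafana.asserts

  name = "debuggable-service-model"

  rules {
    entity {
      type = "Service"
      name = "service"

      # Specific label filters keep the scope of the query small.
      defined_by {
        query = "up{job!='', asserts_env!=''}"
        label_values = {
          service = "job"
        }
      }

      # Controlled by the variable above. When the query is disabled,
      # only the query field is used and label_values is ignored.
      defined_by {
        query    = "http_requests_total{service!=''}"
        disabled = var.disable_http_query
        label_values = {
          service = "service"
        }
      }
    }
  }
}
```

To disable the second query for a single run, pass the variable on the command line, for example `terraform apply -var="disable_http_query=true"`.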
|
||||
|
||||
### Relationships and enrichment
|
||||
|
||||
- Use `lookup` mappings to establish relationships between different entity types
|
||||
- Leverage `enriched_by` to specify additional data sources for entity enrichment
|
||||
- Map Prometheus labels to entity attributes using clear and descriptive names
|
||||
- Use meaningful `literals` to add static metadata that helps with entity identification
|
||||
|
||||
### Label and attribute management
|
||||
|
||||
- Establish consistent labeling conventions across your infrastructure
|
||||
- Use `label_values` to extract dynamic attributes from your metrics
|
||||
- Document the meaning and expected values of custom literals
|
||||
- Ensure label names match across different entity definitions for proper relationship discovery
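
One way to keep label names aligned across entity definitions is to declare the shared mappings once in a `locals` block and reuse them. The sketch below assumes the `scope` and `label_values` arguments accept any `map(string)` expression, as the resource reference indicates. The local name `k8s_labels`, the resource label, and the rule name are hypothetical.

```terraform
locals {
  # Label mappings shared by every Kubernetes entity in this model.
  k8s_labels = {
    namespace = "namespace"
    cluster   = "cluster"
  }
}

resource "grafana_asserts_custom_model_rules" "shared_labels" {
  provider = grafana.asserts

  name = "shared-label-entities"

  rules {
    entity {
      type  = "Service"
      name  = "service"
      scope = local.k8s_labels

      defined_by {
        query = "up{service!=''}"
        # merge() is a built-in Terraform function that combines maps.
        label_values = merge(local.k8s_labels, { service = "service" })
      }
    }

    entity {
      type = "Namespace"
      name = "namespace"

      scope = {
        cluster = "cluster"
      }

      defined_by {
        query        = "kube_namespace_status_phase{namespace!=''}"
        label_values = local.k8s_labels
      }
    }
  }
}
```

With this layout, renaming a label requires a change in one place only.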
|
||||
|
||||
## Validation
|
||||
|
||||
After applying the Terraform configuration, verify that:
|
||||
|
||||
- Custom model rules are applied in your Knowledge Graph instance
|
||||
- Entities are being discovered according to your defined queries
|
||||
- Entity relationships and enrichment are working as expected
|
||||
- Entity graphs display the correct entity types and connections
|
||||
- Queries perform well without causing excessive load
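
To make the first check easier, you can expose the names of the managed rules as a Terraform output and compare them with what appears in Knowledge Graph. This is a sketch that assumes the `basic_service` and `multi_entity` resources from the examples on this page are defined in the same directory. The output name is hypothetical.

```terraform
output "custom_model_rule_names" {
  description = "Names of the custom model rules managed by this configuration"
  value = [
    grafana_asserts_custom_model_rules.basic_service.name,
    grafana_asserts_custom_model_rules.multi_entity.name,
  ]
}
```

After `terraform apply`, run `terraform output custom_model_rule_names` to print the list.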
|
||||
|
||||
## Related documentation
|
||||
|
||||
- [Manage entities and relations in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/configure/manage-entities-relations/)
|
||||
- [Get started with Terraform for Knowledge Graph](../getting-started/)
|
||||
- [Knowledge graph basics](/docs/grafana-cloud/knowledge-graph/knowledge-graph-basics/)
|
||||
+140
@@ -0,0 +1,140 @@
|
||||
---
|
||||
description: Learn how to configure Terraform to manage Knowledge Graph resources
|
||||
menuTitle: Get started
|
||||
title: Get started with Terraform for Knowledge Graph
|
||||
weight: 100
|
||||
keywords:
|
||||
- Terraform
|
||||
- Knowledge Graph
|
||||
- Provider Setup
|
||||
- Getting Started
|
||||
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/getting-started/
|
||||
---
|
||||
|
||||
# Get started with Terraform for Knowledge Graph
|
||||
|
||||
Learn how to configure Terraform to manage [Grafana Cloud Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) resources. This guide walks you through setting up the Grafana Terraform provider and preparing your environment.
|
||||
|
||||
## Before you begin
|
||||
|
||||
Ensure you have the following:

- A Grafana Cloud account, as described in [Get started](/docs/grafana-cloud/get-started/)
- [Terraform](https://www.terraform.io/downloads) installed on your machine
- Administrator permissions in your Grafana instance
- [Knowledge Graph enabled](/docs/grafana-cloud/knowledge-graph/get-started/) in your Grafana Cloud stack
|
||||
|
||||
{{< admonition type="note" >}}
|
||||
All Terraform configuration files should be saved in the same directory.
|
||||
{{< /admonition >}}
|
||||
|
||||
## Configure the Grafana provider
|
||||
|
||||
This Terraform configuration sets up the [Grafana provider](https://registry.terraform.io/providers/grafana/grafana/latest/docs) with the authentication it needs to manage Knowledge Graph resources.

You can reuse a setup similar to the one described in [Creating and managing a Grafana Cloud stack using Terraform](/docs/grafana-cloud/as-code/infrastructure-as-code/terraform/terraform-cloud-stack/) to create a service account and a token.
|
||||
|
||||
### Steps
|
||||
|
||||
1. Create a service account and token in Grafana.

   To create a new one, refer to [Service account tokens](/docs/grafana/latest/administration/service-accounts/#service-account-tokens).
|
||||
|
||||
1. Create a file named `main.tf` and add the following:
|
||||
|
||||
   ```terraform
   terraform {
     required_providers {
       grafana = {
         source  = "grafana/grafana"
         version = ">= 2.9.0"
       }
     }
   }

   provider "grafana" {
     alias = "asserts"

     url      = "<Stack-URL>"
     auth     = "<Service-account-token>"
     stack_id = "<Stack-ID>"
   }
   ```
|
||||
|
||||
1. Replace the following field values:
   - `<Stack-URL>` with the URL of your Grafana stack (for example, `https://my-stack.grafana.net/`)
   - `<Service-account-token>` with the service account token that you created
   - `<Stack-ID>` with your Grafana Cloud stack ID
|
||||
|
||||
{{< admonition type="note" >}}
|
||||
The `stack_id` parameter is required for Knowledge Graph resources to identify the stack where the resources belong.
|
||||
{{< /admonition >}}
|
||||
|
||||
## Apply Terraform configurations
|
||||
|
||||
After creating your Terraform configuration files, apply them using the following commands:
|
||||
|
||||
1. Initialize a working directory containing Terraform configuration files:

   ```shell
   terraform init
   ```

1. Preview the changes that Terraform makes:

   ```shell
   terraform plan
   ```

1. Apply the configuration files:

   ```shell
   terraform apply
   ```
|
||||
|
||||
## Verify your setup
|
||||
|
||||
After applying the configuration, verify your setup by checking that:
|
||||
|
||||
- Terraform can authenticate with your Grafana Cloud stack
|
||||
- The provider is properly configured with the correct stack ID
|
||||
- No errors appear in the Terraform output
|
||||
|
||||
## Best practices
|
||||
|
||||
When managing Knowledge Graph resources with Terraform, consider the following best practices:
|
||||
|
||||
### Naming conventions
|
||||
|
||||
- Use descriptive names that clearly indicate the purpose of each resource
|
||||
- Follow a consistent naming pattern across your organization
|
||||
- Include environment or team identifiers in names when appropriate
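
A naming pattern like this can be built once and reused. The sketch below derives a prefix from the selected Terraform workspace and a team identifier. The team value, local names, and resource label are hypothetical, and the entity definition is borrowed from the basic custom model rule example.

```terraform
locals {
  # Hypothetical team identifier. terraform.workspace is the name of the
  # currently selected Terraform workspace (for example, "default" or "prod").
  team        = "payments"
  name_prefix = "${terraform.workspace}-${local.team}"
}

resource "grafana_asserts_custom_model_rules" "team_services" {
  provider = grafana.asserts

  # In a workspace named "prod", this resolves to "prod-payments-service-model".
  name = "${local.name_prefix}-service-model"

  rules {
    entity {
      type = "Service"
      name = "service"

      defined_by {
        query = "up{job!=''}"
        label_values = {
          service = "job"
        }
      }
    }
  }
}
```

If you use separate directories instead of workspaces, an input variable can take the place of `terraform.workspace`.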
|
||||
|
||||
### Version control
|
||||
|
||||
- Store your Terraform configurations in version control (Git)
|
||||
- Use separate directories or workspaces for different environments
|
||||
- Document changes in commit messages
|
||||
|
||||
### State management
|
||||
|
||||
- Use remote state backends for team collaboration
|
||||
- Enable state locking to prevent concurrent modifications
|
||||
- Regularly back up your Terraform state files
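
As one possible setup, the following sketch stores state in a Google Cloud Storage bucket using Terraform's built-in `gcs` backend, which supports state locking. The bucket name and prefix are placeholders, and any other backend with locking support works equally well.

```terraform
terraform {
  # Remote state backend. Replace the placeholder bucket and prefix
  # with values for your own environment.
  backend "gcs" {
    bucket = "example-terraform-state"
    prefix = "grafana/knowledge-graph"
  }
}
```

Run `terraform init` again after adding or changing a backend so that Terraform can configure it and offer to migrate any existing state. Enabling object versioning on the bucket keeps earlier state versions available as backups.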
|
||||
|
||||
### Security
|
||||
|
||||
- Never commit service account tokens or sensitive data to version control
|
||||
- Use environment variables or secret management tools for credentials
|
||||
- Rotate service account tokens regularly
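
For example, instead of writing the token into `main.tf`, you can declare it as a sensitive input variable and supply the value through the environment. This sketch is a variation of the provider block from this guide and is meant to replace it, not to be added alongside it. The variable name `grafana_auth` is an assumption.

```terraform
variable "grafana_auth" {
  description = "Service account token for the Grafana stack"
  type        = string
  # Terraform redacts sensitive values in plan and apply output.
  sensitive = true
}

provider "grafana" {
  alias = "asserts"

  url      = "<Stack-URL>"
  auth     = var.grafana_auth
  stack_id = "<Stack-ID>"
}
```

Terraform reads the value from an environment variable named `TF_VAR_grafana_auth`, so the token can come from your shell or a secret manager without appearing in version control.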
|
||||
|
||||
## Next steps
|
||||
|
||||
Now that you have configured the Terraform provider, you can start managing knowledge graph resources:
|
||||
|
||||
- [Configure notification alerts](../notification-alerts/)
|
||||
- [Define suppressed assertions](../suppressed-assertions/)
|
||||
- [Create custom model rules](../custom-model-rules/)
|
||||
- [Set up log configurations](../log-configurations/)
|
||||
- [Configure thresholds](../thresholds/)
|
||||
- [Configure knowledge graph SLOs](../knowledge-graph-slo/)
|
||||
+696
@@ -0,0 +1,696 @@
|
||||
---
|
||||
description: Learn how to configure knowledge graph SLOs in Grafana using Terraform for entity-centric monitoring and root cause analysis
|
||||
menuTitle: Knowledge graph SLOs
|
||||
title: Configure knowledge graph SLOs using Terraform
|
||||
weight: 650
|
||||
keywords:
|
||||
- Terraform
|
||||
- Knowledge graph
|
||||
- SLO
|
||||
- Service Level Objectives
|
||||
- RCA workbench
|
||||
---
|
||||
|
||||
# Configure knowledge graph SLOs using Terraform
|
||||
|
||||
Service level objectives (SLOs) in the [knowledge graph](/docs/grafana-cloud/knowledge-graph/) provide entity-centric service level monitoring with integrated root cause analysis capabilities. By using the `grafana_slo_provenance` label with the value `asserts`, you can create SLOs that display the "asserts" badge in the UI and enable the **Open RCA workbench** button for seamless troubleshooting.
|
||||
|
||||
For details about creating and managing SLOs in the knowledge graph UI, refer to [Create and manage the knowledge graph SLOs](/docs/grafana-cloud/knowledge-graph/configure/manage-slos/).
|
||||
|
||||
## Overview
|
||||
|
||||
Knowledge graph SLOs extend standard Grafana SLOs with entity-centric monitoring and root cause analysis features:
|
||||
|
||||
- **Entity-centric monitoring:** SLOs are tied to specific services, applications, or infrastructure entities tracked by the knowledge graph
|
||||
- **RCA workbench integration:** The **Open RCA workbench** button enables deep-linking to pre-filtered troubleshooting views
|
||||
- **Knowledge graph provenance badge:** SLOs display an "asserts" badge instead of "provisioned" in the UI
|
||||
- **Search expressions:** Define custom search expressions to filter entities in RCA workbench when troubleshooting an SLO breach
|
||||
|
||||
## Before you begin
|
||||
|
||||
To create a knowledge graph SLO using Terraform, you need to:

- Configure the knowledge graph and have metrics flowing into Grafana Cloud
- [Set up Terraform for the knowledge graph](../getting-started/)
- Have experience defining SLOs, SLIs, SLAs, and error budgets
- Understand PromQL
|
||||
|
||||
## Create a basic knowledge graph SLO
|
||||
|
||||
Create a file named `kg-slo.tf` and add the following:
|
||||
|
||||
```terraform
|
||||
# Basic knowledge graph SLO with entity-centric monitoring
|
||||
resource "grafana_slo" "kg_example" {
|
||||
name = "API Service Availability"
|
||||
description = "SLO managed by knowledge graph for entity-centric monitoring and RCA"
|
||||
|
||||
query {
|
||||
freeform {
|
||||
query = "sum(rate(http_requests_total{code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total[$__rate_interval]))"
|
||||
}
|
||||
type = "freeform"
|
||||
}
|
||||
|
||||
objectives {
|
||||
value = 0.995
|
||||
window = "30d"
|
||||
}
|
||||
|
||||
destination_datasource {
|
||||
uid = "grafanacloud-prom"
|
||||
}
|
||||
|
||||
# Knowledge graph integration labels
|
||||
# The grafana_slo_provenance label triggers knowledge graph-specific behavior:
|
||||
# - Displays "asserts" badge instead of "provisioned"
|
||||
# - Shows "Open RCA workbench" button in the SLO UI
|
||||
# - Enables correlation with knowledge graph entity-centric monitoring
|
||||
label {
|
||||
key = "grafana_slo_provenance"
|
||||
value = "asserts"
|
||||
}
|
||||
|
||||
label {
|
||||
key = "service_name"
|
||||
value = "api-service"
|
||||
}
|
||||
|
||||
# Search expression for RCA workbench
|
||||
# This enables the "Open RCA workbench" button to deep-link with pre-filtered context
|
||||
search_expression = "service=api-service"
|
||||
|
||||
alerting {
|
||||
fastburn {
|
||||
annotation {
|
||||
key = "name"
|
||||
value = "SLO Burn Rate Very High"
|
||||
}
|
||||
annotation {
|
||||
key = "description"
|
||||
value = "Error budget is burning too fast"
|
||||
}
|
||||
}
|
||||
slowburn {
|
||||
annotation {
|
||||
key = "name"
|
||||
value = "SLO Burn Rate High"
|
||||
}
|
||||
annotation {
|
||||
key = "description"
|
||||
value = "Error budget is burning too fast"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Configure an SLO with multiple entity labels
|
||||
|
||||
Configure SLOs with multiple entity labels for fine-grained filtering in RCA workbench:
|
||||
|
||||
```terraform
|
||||
# Knowledge graph SLO with comprehensive entity labels
|
||||
resource "grafana_slo" "payment_service" {
|
||||
name = "Payment Service Latency SLO"
|
||||
description = "Latency SLO for payment processing with team and environment context"
|
||||
|
||||
query {
|
||||
freeform {
|
||||
query = "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service=\"payment\"}[$__rate_interval])) by (le)) < 0.5"
|
||||
}
|
||||
type = "freeform"
|
||||
}
|
||||
|
||||
objectives {
|
||||
value = 0.99
|
||||
window = "7d"
|
||||
}
|
||||
|
||||
destination_datasource {
|
||||
uid = "grafanacloud-prom"
|
||||
}
|
||||
|
||||
# Knowledge graph provenance - required for RCA workbench integration
|
||||
label {
|
||||
key = "grafana_slo_provenance"
|
||||
value = "asserts"
|
||||
}
|
||||
|
||||
# Service identification
|
||||
label {
|
||||
key = "service_name"
|
||||
value = "payment-service"
|
||||
}
|
||||
|
||||
# Team ownership
|
||||
label {
|
||||
key = "team_name"
|
||||
value = "payments-team"
|
||||
}
|
||||
|
||||
# Environment
|
||||
label {
|
||||
key = "environment"
|
||||
value = "production"
|
||||
}
|
||||
|
||||
# Business unit
|
||||
label {
|
||||
key = "business_unit"
|
||||
value = "fintech"
|
||||
}
|
||||
|
||||
# Search expression with multiple filters
|
||||
search_expression = "service=payment-service AND environment=production"
|
||||
|
||||
alerting {
|
||||
fastburn {
|
||||
annotation {
|
||||
key = "name"
|
||||
value = "Payment Latency Critical"
|
||||
}
|
||||
annotation {
|
||||
key = "description"
|
||||
value = "Payment service P99 latency exceeding SLO - immediate attention required"
|
||||
}
|
||||
annotation {
|
||||
key = "runbook_url"
|
||||
value = "https://docs.example.com/runbooks/payment-latency"
|
||||
}
|
||||
}
|
||||
slowburn {
|
||||
annotation {
|
||||
key = "name"
|
||||
value = "Payment Latency Warning"
|
||||
}
|
||||
annotation {
|
||||
key = "description"
|
||||
value = "Payment service experiencing elevated latency"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
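
When an SLO carries many labels, repeating `label` blocks becomes verbose. As an alternative sketch, Terraform's standard `dynamic` block can generate one `label` block per entry of a map. The local name `payment_slo_labels`, the resource label, and the availability query are assumptions made for illustration. The `alerting` block is omitted for brevity.

```terraform
locals {
  # Hypothetical label set. The grafana_slo_provenance label with the value
  # "asserts" is the one that enables the RCA workbench integration.
  payment_slo_labels = {
    grafana_slo_provenance = "asserts"
    service_name           = "payment-service"
    team_name              = "payments-team"
    environment            = "production"
  }
}

resource "grafana_slo" "payment_service_labels" {
  name        = "Payment Service Availability"
  description = "Availability SLO for payment processing with labels generated from a map"

  query {
    freeform {
      query = "sum(rate(http_requests_total{service=\"payment\",code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total{service=\"payment\"}[$__rate_interval]))"
    }
    type = "freeform"
  }

  objectives {
    value  = 0.99
    window = "7d"
  }

  destination_datasource {
    uid = "grafanacloud-prom"
  }

  # Generates one label block per map entry.
  dynamic "label" {
    for_each = local.payment_slo_labels
    content {
      key   = label.key
      value = label.value
    }
  }

  search_expression = "service=payment-service AND environment=production"
}
```

Keeping the labels in a single map also makes it straightforward to share the same set across several SLOs.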
|
||||
|
||||
## Configure a Kubernetes service SLO
|
||||
|
||||
Configure knowledge graph SLOs for Kubernetes services with Pod and namespace context:
|
||||
|
||||
```terraform
|
||||
# Knowledge graph SLO for Kubernetes service
|
||||
resource "grafana_slo" "k8s_frontend" {
|
||||
name = "Frontend Service Availability"
|
||||
description = "Availability SLO for frontend service in Kubernetes"
|
||||
|
||||
query {
|
||||
freeform {
|
||||
query = "sum(rate(http_requests_total{namespace=\"frontend\",code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total{namespace=\"frontend\"}[$__rate_interval]))"
|
||||
}
|
||||
type = "freeform"
|
||||
}
|
||||
|
||||
objectives {
|
||||
value = 0.999
|
||||
window = "30d"
|
||||
}
|
||||
|
||||
destination_datasource {
|
||||
uid = "grafanacloud-prom"
|
||||
}
|
||||
|
||||
label {
|
||||
key = "grafana_slo_provenance"
|
||||
value = "asserts"
|
||||
}
|
||||
|
||||
label {
|
||||
key = "service_name"
|
||||
value = "frontend"
|
||||
}
|
||||
|
||||
label {
|
||||
key = "namespace"
|
||||
value = "frontend"
|
||||
}
|
||||
|
||||
label {
|
||||
key = "cluster"
|
||||
value = "prod-us-west-2"
|
||||
}
|
||||
|
||||
# Search expression targeting Kubernetes entities
|
||||
search_expression = "namespace=frontend AND cluster=prod-us-west-2"
|
||||
|
||||
alerting {
|
||||
fastburn {
|
||||
annotation {
|
||||
key = "name"
|
||||
value = "Frontend Service Critical"
|
||||
}
|
||||
annotation {
|
||||
key = "description"
|
||||
value = "Frontend service availability below SLO"
|
||||
}
|
||||
annotation {
|
||||
key = "severity"
|
||||
value = "critical"
|
||||
}
|
||||
}
|
||||
slowburn {
|
||||
annotation {
|
||||
key = "name"
|
||||
value = "Frontend Service Degraded"
|
||||
}
|
||||
annotation {
|
||||
key = "description"
|
||||
value = "Frontend service showing signs of degradation"
|
||||
}
|
||||
annotation {
|
||||
key = "severity"
|
||||
value = "warning"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Configure an API endpoint-specific SLO
|
||||
|
||||
Configure knowledge graph SLOs for specific API endpoints with request context:
|
||||
|
||||
```terraform
# Knowledge graph SLO for critical API endpoint
resource "grafana_slo" "checkout_api" {
  name        = "Checkout API Availability"
  description = "Availability SLO for /api/checkout endpoint"

  query {
    freeform {
      query = "sum(rate(http_requests_total{path=\"/api/checkout\",code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total{path=\"/api/checkout\"}[$__rate_interval]))"
    }
    type = "freeform"
  }

  objectives {
    value  = 0.9999
    window = "30d"
  }

  destination_datasource {
    uid = "grafanacloud-prom"
  }

  label {
    key   = "grafana_slo_provenance"
    value = "asserts"
  }

  label {
    key   = "service_name"
    value = "checkout-service"
  }

  label {
    key   = "endpoint"
    value = "/api/checkout"
  }

  label {
    key   = "criticality"
    value = "high"
  }

  # Search expression with endpoint context
  search_expression = "service=checkout-service AND path=/api/checkout"

  alerting {
    fastburn {
      annotation {
        key   = "name"
        value = "Checkout API Critical Failure"
      }
      annotation {
        key   = "description"
        value = "Checkout API experiencing high error rates - revenue impact"
      }
      annotation {
        key   = "severity"
        value = "critical"
      }
      annotation {
        key   = "alert_priority"
        value = "P0"
      }
    }
    slowburn {
      annotation {
        key   = "name"
        value = "Checkout API Degradation"
      }
      annotation {
        key   = "description"
        value = "Checkout API showing elevated error rates"
      }
      annotation {
        key   = "severity"
        value = "warning"
      }
    }
  }
}
```
|
||||
|
||||
## Configure a multi-environment SLO
|
||||
|
||||
Manage knowledge graph SLOs across multiple environments using Terraform workspaces or modules:
|
||||
|
||||
```terraform
# Variable for environment-specific configuration
variable "environment" {
  description = "Environment name"
  type        = string
}

variable "slo_target" {
  description = "SLO target as a ratio between 0 and 1 (for example, 0.999)"
  type        = number
}

# Environment-aware knowledge graph SLO
resource "grafana_slo" "api_service" {
  name        = "${var.environment} - API Service Availability"
  description = "API service availability SLO for ${var.environment} environment"

  query {
    freeform {
      query = "sum(rate(http_requests_total{environment=\"${var.environment}\",code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total{environment=\"${var.environment}\"}[$__rate_interval]))"
    }
    type = "freeform"
  }

  objectives {
    value  = var.slo_target
    window = "30d"
  }

  destination_datasource {
    uid = "grafanacloud-prom"
  }

  label {
    key   = "grafana_slo_provenance"
    value = "asserts"
  }

  label {
    key   = "service_name"
    value = "api-service"
  }

  label {
    key   = "environment"
    value = var.environment
  }

  search_expression = "service=api-service AND environment=${var.environment}"

  alerting {
    fastburn {
      annotation {
        key   = "name"
        value = "${var.environment} API Critical"
      }
      annotation {
        key   = "description"
        value = "API service in ${var.environment} experiencing critical errors"
      }
    }
    slowburn {
      annotation {
        key   = "name"
        value = "${var.environment} API Warning"
      }
      annotation {
        key   = "description"
        value = "API service in ${var.environment} showing elevated errors"
      }
    }
  }
}
```
|
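
One way to supply the two variables per environment is a variable definitions file for each environment. The following is a minimal sketch: the file names and the staging values are illustrative assumptions, not values taken from this guide.

```terraform
# production.tfvars (hypothetical file name)
environment = "production"
slo_target  = 0.999

# A staging.tfvars file (also hypothetical) could contain, for example:
# environment = "staging"
# slo_target  = 0.99
```

Select the file when you apply, for example with `terraform apply -var-file=production.tfvars`. Pairing each file with its own Terraform workspace keeps the state of each environment separate.
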
||||
|
||||
## Resource reference
|
||||
|
||||
### `grafana_slo` with knowledge graph provenance
|
||||
|
||||
When creating knowledge graph-managed SLOs, the `grafana_slo` resource requires the `grafana_slo_provenance` label set to `asserts` to enable RCA workbench integration.
|
||||
|
||||
#### Required knowledge graph configuration
|
||||
|
||||
| Name                           | Type     | Required    | Description                                                                                        |
| ------------------------------ | -------- | ----------- | -------------------------------------------------------------------------------------------------- |
| `grafana_slo_provenance` label | `string` | Yes         | Must be set to `asserts` to enable knowledge graph-specific features and RCA workbench integration |
| `search_expression`            | `string` | Recommended | Search expression for filtering entities in RCA workbench                                          |
|
||||
|
||||
#### Key arguments for knowledge graph SLOs
|
||||
|
||||
| Name                     | Type           | Required | Description                                                       |
| ------------------------ | -------------- | -------- | ----------------------------------------------------------------- |
| `name`                   | `string`       | Yes      | The name of the SLO                                               |
| `description`            | `string`       | No       | Description of the SLO purpose and scope                          |
| `query`                  | `object`       | Yes      | Query configuration defining how the SLO is calculated            |
| `objectives`             | `object`       | Yes      | Target objectives including value and time window                 |
| `destination_datasource` | `object`       | Yes      | Destination data source for SLO metrics                           |
| `label`                  | `list(object)` | Yes      | Labels for the SLO, must include `grafana_slo_provenance=asserts` |
| `search_expression`      | `string`       | No       | Search expression for RCA workbench filtering                     |
| `alerting`               | `object`       | No       | Alerting configuration for fast burn and slow burn alerts         |
|
||||
|
||||
#### Query block
|
||||
|
||||
The `query` block supports the following:
|
||||
|
||||
| Name       | Type     | Required | Description                                               |
| ---------- | -------- | -------- | --------------------------------------------------------- |
| `type`     | `string` | Yes      | Query type, typically `freeform` for knowledge graph SLOs |
| `freeform` | `object` | Yes      | Freeform query configuration                              |
|
||||
|
||||
The `freeform` block supports:
|
||||
|
||||
| Name    | Type     | Required | Description                      |
| ------- | -------- | -------- | -------------------------------- |
| `query` | `string` | Yes      | PromQL query for SLO calculation |
|
||||
|
||||
#### Objectives block
|
||||
|
||||
The `objectives` block supports the following:
|
||||
|
||||
| Name     | Type     | Required | Description                                         |
| -------- | -------- | -------- | --------------------------------------------------- |
| `value`  | `number` | Yes      | Target SLO value (for example, 0.995 for 99.5%)     |
| `window` | `string` | Yes      | Time window for SLO evaluation (for example, "30d") |
|
||||
|
||||
#### Label block
|
||||
|
||||
Each `label` block supports the following:
|
||||
|
||||
| Name    | Type     | Required | Description |
| ------- | -------- | -------- | ----------- |
| `key`   | `string` | Yes      | Label key   |
| `value` | `string` | Yes      | Label value |
|
||||
|
||||
**Required label for knowledge graph SLOs:**
|
||||
|
||||
- `grafana_slo_provenance` = `asserts` (enables knowledge graph features)
|
||||
|
||||
**Recommended labels for entity tracking:**
|
||||
|
||||
- `service_name` - Name of the service
|
||||
- `team_name` - Team responsible for the service
|
||||
- `environment` - Environment (prod, staging, development)
|
||||
- `namespace` - Kubernetes namespace
|
||||
- `cluster` - Kubernetes cluster name
|
||||
|
||||
<!-- vale Grafana.Gerunds = NO -->
|
||||
|
||||
#### Alerting block
|
||||
|
||||
The `alerting` block supports the following:
|
||||
|
||||
| Name       | Type     | Required | Description                        |
| ---------- | -------- | -------- | ---------------------------------- |
| `fastburn` | `object` | No       | Fast burn rate alert configuration |
| `slowburn` | `object` | No       | Slow burn rate alert configuration |
|
||||
|
||||
Each alert block (`fastburn`, `slowburn`) supports:
|
||||
|
||||
| Name         | Type           | Required | Description                     |
| ------------ | -------------- | -------- | ------------------------------- |
| `annotation` | `list(object)` | No       | Annotations to add to the alert |
|
||||
|
||||
Each `annotation` block supports:
|
||||
|
||||
| Name    | Type     | Required | Description      |
| ------- | -------- | -------- | ---------------- |
| `key`   | `string` | Yes      | Annotation key   |
| `value` | `string` | Yes      | Annotation value |
|
||||
|
||||
Common annotation keys:
|
||||
|
||||
- `name` - Alert name
|
||||
- `description` - Alert description
|
||||
- `severity` - Alert severity level
|
||||
- `runbook_url` - Link to runbook documentation
|
||||
<!-- vale Grafana.Gerunds = YES -->
|
||||
|
||||
#### Example
|
||||
|
||||
```terraform
resource "grafana_slo" "kg_example" {
  name        = "My Service SLO"
  description = "SLO with knowledge graph RCA integration"

  query {
    freeform {
      query = "sum(rate(http_requests_total{code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total[$__rate_interval]))"
    }
    type = "freeform"
  }

  objectives {
    value  = 0.995
    window = "30d"
  }

  destination_datasource {
    uid = "grafanacloud-prom"
  }

  label {
    key   = "grafana_slo_provenance"
    value = "asserts"
  }

  label {
    key   = "service_name"
    value = "my-service"
  }

  search_expression = "service=my-service"

  alerting {
    fastburn {
      annotation {
        key   = "name"
        value = "SLO Fast Burn"
      }
    }
    slowburn {
      annotation {
        key   = "name"
        value = "SLO Slow Burn"
      }
    }
  }
}
```
|
||||
|
||||
## Best practices
|
||||
|
||||
Follow these best practices when setting up knowledge graph SLOs.
|
||||
|
||||
### Use the knowledge graph provenance label
|
||||
|
||||
- Always include the `grafana_slo_provenance` label with value `asserts` for knowledge graph-managed SLOs
|
||||
- This label enables the "asserts" badge in the UI instead of "provisioned"
|
||||
- It also enables the **Open RCA workbench** button for troubleshooting SLO breaches
|
||||
|
||||
### Define search expressions
|
||||
|
||||
- Define meaningful search expressions that filter relevant entities in RCA workbench
|
||||
- The search expression defines which entities populate RCA workbench when you troubleshoot an SLO breach
|
||||
- Use entity attributes like service name, environment, namespace, and cluster
|
||||
- Combine multiple filters with `AND` operators for precise filtering
|
||||
- Test search expressions in RCA workbench before codifying them in Terraform
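
As an illustration of combining filters with `AND`, the following sketch builds a search expression from a map of entity attributes. The local names and attribute values are hypothetical; the `key=value AND key=value` form follows the examples in this guide.

```terraform
locals {
  # Hypothetical entity attributes to filter on in RCA workbench.
  entity_filters = {
    environment = "production"
    namespace   = "frontend"
    service     = "checkout-service"
  }

  # Joins the filters as "key=value AND key=value".
  # Terraform iterates over maps in lexical key order, so this evaluates to
  # "environment=production AND namespace=frontend AND service=checkout-service".
  search_expression = join(" AND ", [for k, v in local.entity_filters : "${k}=${v}"])
}
```

You can then set `search_expression = local.search_expression` on the `grafana_slo` resource, which keeps the filter list in one place when several SLOs target the same entities.
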
|
||||
|
||||
### Add entity labels
|
||||
|
||||
- Add descriptive labels to track service ownership, environment, and criticality
|
||||
- Use consistent label naming conventions across all SLOs
|
||||
- Include team names to enable quick identification of ownership
|
||||
- Tag critical business services with appropriate labels
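
To keep label names consistent across SLOs, one option is to define the label set once and generate the `label` blocks with a Terraform `dynamic` block. This is a sketch under assumptions: the local name, resource name, and label values are placeholders.

```terraform
locals {
  # Label set shared by knowledge graph SLOs; values are placeholders.
  slo_labels = {
    grafana_slo_provenance = "asserts"
    service_name           = "my-service"
    team_name              = "platform"
    environment            = "production"
  }
}

resource "grafana_slo" "labeled_example" {
  name        = "My Service SLO"
  description = "SLO with a shared label set"

  query {
    freeform {
      query = "sum(rate(http_requests_total{code!~\"5..\"}[$__rate_interval])) / sum(rate(http_requests_total[$__rate_interval]))"
    }
    type = "freeform"
  }

  objectives {
    value  = 0.995
    window = "30d"
  }

  destination_datasource {
    uid = "grafanacloud-prom"
  }

  # Generates one label block per entry in local.slo_labels.
  dynamic "label" {
    for_each = local.slo_labels
    content {
      key   = label.key
      value = label.value
    }
  }
}
```
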
|
||||
|
||||
### Set SLO targets
|
||||
|
||||
- Set realistic SLO targets based on service requirements and capabilities
|
||||
- Use higher targets (0.999+) for critical user-facing services
|
||||
- Consider different targets for different environments (production vs staging)
|
||||
- Review and adjust targets based on actual service performance
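
If you pass the target in as a variable, a validation rule can catch values entered as percentages instead of ratios. The following sketch assumes a variable named `slo_target`, as in the multi-environment example; the bounds and message are illustrative.

```terraform
variable "slo_target" {
  description = "SLO target as a ratio between 0 and 1 (for example, 0.999 for 99.9%)"
  type        = number

  # Rejects values such as 99.9 that were meant as percentages.
  validation {
    condition     = var.slo_target > 0 && var.slo_target < 1
    error_message = "The slo_target value must be greater than 0 and less than 1, for example 0.999."
  }
}
```
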
|
||||
|
||||
### Add alert annotations
|
||||
|
||||
- Add comprehensive descriptions to help on-call engineers understand the alert
|
||||
- Include runbook URLs in annotations for quick access to troubleshooting guides
|
||||
- Set appropriate severity levels (critical, warning) based on business impact
|
||||
- Customize alert names to clearly identify the affected service and issue
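
The following fragment sketches what these recommendations can look like inside a `grafana_slo` resource. The annotation values and the runbook URL are hypothetical placeholders.

```terraform
# Fragment of a grafana_slo resource.
alerting {
  fastburn {
    annotation {
      key   = "name"
      value = "Checkout API Critical Failure"
    }
    annotation {
      key   = "description"
      value = "Checkout API error budget is burning fast; checkout requests are failing"
    }
    annotation {
      key   = "severity"
      value = "critical"
    }
    annotation {
      key = "runbook_url"
      # Hypothetical URL; replace with your own runbook location.
      value = "https://example.com/runbooks/checkout-api"
    }
  }
}
```
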
|
||||
|
||||
### Configure queries
|
||||
|
||||
- Use PromQL queries that accurately represent service health
|
||||
- Exclude expected error codes, such as 404, from error calculations when appropriate
|
||||
- Leverage rate intervals with `$__rate_interval` for dynamic time range support
|
||||
- Test queries in Grafana before adding them to Terraform configurations
|
||||
|
||||
### Set compliance windows
|
||||
|
||||
- Use 30-day windows for production SLOs to align with monthly reporting
|
||||
- Consider shorter windows (7d) for development or testing environments
|
||||
- Ensure compliance windows align with business requirements and error budget policies
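
As a sketch, the window can be chosen per environment with a conditional expression. This fragment assumes the `environment` and `slo_target` variables from the multi-environment example, and that your stack accepts the shorter window mentioned above.

```terraform
# Fragment of a grafana_slo resource.
objectives {
  value = var.slo_target
  # 30-day window in production, 7-day window everywhere else.
  window = var.environment == "production" ? "30d" : "7d"
}
```
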
|
||||
|
||||
## Verify the configuration
|
||||
|
||||
After applying the Terraform configuration, verify that:
|
||||
|
||||
- SLOs are created in your Grafana Cloud stack
|
||||
- SLOs appear in **Observability > SLO** with the "asserts" badge
|
||||
- The **Open RCA workbench** button is visible when you expand **Objective** for an SLO
|
||||
- You can select a time range in the **Error Budget Burndown** panel and click **Open in RCA workbench**
|
||||
- Search expressions correctly filter entities in RCA workbench
|
||||
- Fast burn and slow burn alerts are configured with appropriate thresholds
|
||||
- Labels are correctly applied and visible in the SLO details
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
Follow these troubleshooting steps if you experience issues setting up knowledge graph SLOs.
|
||||
|
||||
### SLO shows "provisioned" instead of "asserts" badge
|
||||
|
||||
Ensure the `grafana_slo_provenance` label is set to `asserts`:
|
||||
|
||||
```terraform
label {
  key   = "grafana_slo_provenance"
  value = "asserts"
}
```
|
||||
|
||||
### Open RCA workbench button not appearing
|
||||
|
||||
- Verify the `search_expression` field is populated
|
||||
- The **Open RCA workbench** button appears after you have added a search expression in the **RCA workbench Context** section
|
||||
- Ensure the search expression uses valid entity attributes
|
||||
- Check that the knowledge graph is properly configured and receiving data
|
||||
|
||||
### Alerts not triggering
|
||||
|
||||
- Verify the PromQL query returns valid results in Grafana
|
||||
- Check that the destination data source is correctly configured
|
||||
- Ensure alerting blocks are properly defined with annotations
|
||||
|
||||
## Related documentation
|
||||
|
||||
- [Create and manage knowledge graph SLOs](/docs/grafana-cloud/knowledge-graph/configure/manage-slos/)
|
||||
- [Troubleshoot an SLO breach with the knowledge graph](/docs/grafana-cloud/knowledge-graph/troubleshoot-infra-apps/slos/)
|
||||
- [Get started with Terraform for the knowledge graph](../getting-started/)
|
||||
- [Introduction to Grafana SLO](/docs/grafana-cloud/alerting-and-irm/slo/introduction/)
|
||||
- [Configure notifications in the knowledge graph](/docs/grafana-cloud/knowledge-graph/configure/notifications/)
|
||||
+290
@@ -0,0 +1,290 @@
|
||||
---
|
||||
description: Configure log correlation for Knowledge Graph using Terraform
|
||||
menuTitle: Log configurations
|
||||
title: Configure log correlation using Terraform
|
||||
weight: 500
|
||||
keywords:
|
||||
- Terraform
|
||||
- Knowledge Graph
|
||||
- Log Configuration
|
||||
- Log Correlation
|
||||
- Loki
|
||||
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/log-configurations/
|
||||
---
|
||||
|
||||
# Configure log correlation using Terraform
|
||||
|
||||
Log configurations in [Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) allow you to define how log data is queried and correlated with entities. You can specify data sources, entity matching rules, label mappings, and filtering options for spans and traces.
|
||||
|
||||
For information about configuring log correlation in the Knowledge Graph UI, refer to [Configure logs correlation](/docs/grafana-cloud/knowledge-graph/configure/logs-correlation/).
|
||||
|
||||
## Basic log configuration
|
||||
|
||||
Create a file named `log-configs.tf` and add the following:
|
||||
|
||||
```terraform
# Basic log configuration for services
resource "grafana_asserts_log_config" "production" {
  provider = grafana.asserts

  name            = "production"
  priority        = 1000
  default_config  = false
  data_source_uid = "grafanacloud-logs"
  error_label     = "error"

  match {
    property = "asserts_entity_type"
    op       = "EQUALS"
    values   = ["Service"]
  }

  match {
    property = "environment"
    op       = "EQUALS"
    values   = ["production", "staging"]
  }

  entity_property_to_log_label_mapping = {
    "otel_namespace" = "service_namespace"
    "otel_service"   = "service_name"
    "environment"    = "env"
    "site"           = "region"
  }

  filter_by_span_id  = true
  filter_by_trace_id = true
}
```
|
||||
|
||||
## Log configuration with multiple match rules
|
||||
|
||||
Configure log correlation with multiple entity matching criteria:
|
||||
|
||||
```terraform
# Development environment log configuration
resource "grafana_asserts_log_config" "development" {
  provider = grafana.asserts

  name            = "development"
  priority        = 2000
  default_config  = true
  data_source_uid = "elasticsearch-dev"
  error_label     = "error"

  match {
    property = "asserts_entity_type"
    op       = "EQUALS"
    values   = ["Service"]
  }

  match {
    property = "environment"
    op       = "EQUALS"
    values   = ["development", "testing"]
  }

  match {
    property = "site"
    op       = "EQUALS"
    values   = ["us-east-1"]
  }

  match {
    property = "service"
    op       = "EQUALS"
    values   = ["api"]
  }

  entity_property_to_log_label_mapping = {
    "otel_namespace" = "service_namespace"
    "otel_service"   = "service_name"
    "environment"    = "env"
    "site"           = "region"
    "service"        = "app"
  }

  filter_by_span_id  = true
  filter_by_trace_id = true
}
```
|
||||
|
||||
## Minimal log configuration
|
||||
|
||||
Create a minimal configuration for all entities:
|
||||
|
||||
```terraform
# Minimal configuration for all entities
resource "grafana_asserts_log_config" "minimal" {
  provider = grafana.asserts

  name            = "minimal"
  priority        = 3000
  default_config  = false
  data_source_uid = "loki-minimal"

  match {
    property = "asserts_entity_type"
    op       = "IS_NOT_NULL"
    values   = []
  }
}
```
|
||||
|
||||
## Advanced log configuration with complex match rules
|
||||
|
||||
Configure logs with multiple operations and advanced match rules:
|
||||
|
||||
```terraform
# Advanced configuration with multiple operations
resource "grafana_asserts_log_config" "advanced" {
  provider = grafana.asserts

  name            = "advanced"
  priority        = 1500
  default_config  = false
  data_source_uid = "loki-advanced"
  error_label     = "level"

  match {
    property = "service_type"
    op       = "CONTAINS"
    values   = ["web", "api"]
  }

  match {
    property = "environment"
    op       = "NOT_EQUALS"
    values   = ["test"]
  }

  match {
    property = "team"
    op       = "IS_NOT_NULL"
    values   = []
  }

  entity_property_to_log_label_mapping = {
    "service_type" = "type"
    "team"         = "owner"
    "environment"  = "env"
    "version"      = "app_version"
  }

  filter_by_span_id  = true
  filter_by_trace_id = false
}
```
|
||||
|
||||
## Resource reference
|
||||
|
||||
### `grafana_asserts_log_config`
|
||||
|
||||
Manage Knowledge Graph log configurations through the Grafana API.
|
||||
|
||||
#### Arguments
|
||||
|
||||
| Name                                   | Type           | Required | Description                                                                                  |
| -------------------------------------- | -------------- | -------- | -------------------------------------------------------------------------------------------- |
| `name`                                 | `string`       | Yes      | The name of the log configuration. This field is immutable and forces recreation if changed. |
| `priority`                             | `number`       | Yes      | Priority of the log configuration. Higher priority configurations are evaluated first.       |
| `default_config`                       | `bool`         | Yes      | Whether this is the default configuration. Default configurations cannot be deleted.         |
| `data_source_uid`                      | `string`       | Yes      | UID of the data source to query (for example, a Loki data source).                           |
| `match`                                | `list(object)` | No       | List of match rules for entity properties. Refer to [match block](#match-block) for details. |
| `error_label`                          | `string`       | No       | Label name used to identify error logs.                                                      |
| `entity_property_to_log_label_mapping` | `map(string)`  | No       | Mapping of entity properties to log labels for correlation.                                  |
| `filter_by_span_id`                    | `bool`         | No       | Whether to filter logs by span ID for distributed tracing correlation.                       |
| `filter_by_trace_id`                   | `bool`         | No       | Whether to filter logs by trace ID for distributed tracing correlation.                      |
|
||||
|
||||
#### Match block
|
||||
|
||||
Each `match` block supports the following:
|
||||
|
||||
| Name       | Type           | Required | Description                                                                                                              |
| ---------- | -------------- | -------- | ------------------------------------------------------------------------------------------------------------------------ |
| `property` | `string`       | Yes      | Entity property to match against.                                                                                        |
| `op`       | `string`       | Yes      | Operation to use for matching. One of: `EQUALS`, `NOT_EQUALS`, `CONTAINS`, `DOES_NOT_CONTAIN`, `IS_NULL`, `IS_NOT_NULL`. |
| `values`   | `list(string)` | Yes      | Values to match against. Can be empty for `IS_NULL` and `IS_NOT_NULL` operations.                                        |
|
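
The examples earlier on this page do not use `DOES_NOT_CONTAIN` or `IS_NULL`. The following fragment sketches how they might look; the property names and values are hypothetical and need to exist in your entity model.

```terraform
# Fragment of a grafana_asserts_log_config resource.
# Skips entities whose namespace contains "sandbox".
match {
  property = "namespace"
  op       = "DOES_NOT_CONTAIN"
  values   = ["sandbox"]
}

# Matches entities that have no team property; values stays empty.
match {
  property = "team"
  op       = "IS_NULL"
  values   = []
}
```
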
||||
|
||||
#### Example
|
||||
|
||||
```terraform
resource "grafana_asserts_log_config" "example" {
  provider = grafana.asserts

  name            = "example-logs"
  priority        = 1000
  default_config  = false
  data_source_uid = "loki-prod"
  error_label     = "level"

  match {
    property = "asserts_entity_type"
    op       = "EQUALS"
    values   = ["Service", "Pod"]
  }

  entity_property_to_log_label_mapping = {
    "service"     = "app"
    "namespace"   = "k8s_namespace"
    "environment" = "env"
  }

  filter_by_span_id  = true
  filter_by_trace_id = true
}
```
|
||||
|
||||
## Best practices
|
||||
|
||||
### Priority management
|
||||
|
||||
- Assign lower priority numbers to more specific configurations
|
||||
- Higher priority configurations are evaluated first
|
||||
- Use consistent priority ranges for different configuration types
|
||||
- Document the reasoning behind priority assignments
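
One way to keep priority ranges consistent and documented is to define them once as local values. The band names below are hypothetical; the numbers mirror the examples on this page, where the more specific configurations use the lower numbers.

```terraform
locals {
  # Hypothetical priority bands, documented in one place.
  log_config_priority = {
    service_specific = 1000 # Rules that target a narrow set of services
    environment_wide = 2000 # Rules that target whole environments
    catch_all        = 3000 # Fallback rules that match all entities
  }
}
```

A log configuration can then reference a band, for example `priority = local.log_config_priority.service_specific`.
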
|
||||
|
||||
### Data source configuration
|
||||
|
||||
- Ensure the data source UID matches your actual Loki or log aggregation system
|
||||
- Test data source connectivity before applying configurations
|
||||
- Use descriptive names for log configurations to indicate their purpose
|
||||
- Consider using separate data sources for different environments
|
||||
|
||||
### Label map strategy
|
||||
|
||||
- Map entity properties consistently across all log configurations
|
||||
- Use meaningful log label names that match your logging standards
|
||||
- Document the mapping relationships in configuration comments
|
||||
- Verify that mapped labels exist in your log data
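
To map entity properties consistently, you can define a shared mapping once and extend it per configuration with Terraform's `merge` function. This is a sketch: the local name, resource name, and configuration name are assumptions.

```terraform
locals {
  # Mapping shared by all log configurations (entity property => log label).
  common_log_label_mapping = {
    "otel_namespace" = "service_namespace"
    "otel_service"   = "service_name"
    "environment"    = "env"
    "site"           = "region"
  }
}

resource "grafana_asserts_log_config" "api_logs" {
  provider = grafana.asserts

  name            = "api-logs"
  priority        = 1000
  default_config  = false
  data_source_uid = "grafanacloud-logs"

  match {
    property = "asserts_entity_type"
    op       = "EQUALS"
    values   = ["Service"]
  }

  # Adds one configuration-specific mapping on top of the shared set.
  entity_property_to_log_label_mapping = merge(local.common_log_label_mapping, {
    "service" = "app"
  })
}
```
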
|
||||
|
||||
### Match rules design
|
||||
|
||||
- Start with broad match rules and refine based on needs
|
||||
- Use specific property names that exist in your entity model
|
||||
- Test match rules with sample data before deploying
|
||||
- Combine multiple match rules for precise entity targeting
|
||||
|
||||
### Distributed trace integration
|
||||
|
||||
- Enable `filter_by_span_id` and `filter_by_trace_id` when using OpenTelemetry
|
||||
- Ensure your logs contain the appropriate trace and span ID labels
|
||||
- Use consistent label names for trace IDs across your logging infrastructure
|
||||
- Test trace correlation to verify it works as expected
|
||||
|
||||
## Validation
|
||||
|
||||
After applying the Terraform configuration, verify that:
|
||||
|
||||
- Log configurations are created in your Knowledge Graph instance
|
||||
- Configurations appear in the Knowledge Graph UI under **Observability > Configuration > Logs**
|
||||
- Log correlation works when drilling down from entities
|
||||
- Label mappings correctly translate entity properties to log labels
|
||||
- Match rules properly filter entities
|
||||
- Trace and span ID filtering works for distributed tracing
|
||||
|
||||
## Related documentation
|
||||
|
||||
- [Configure logs correlation in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/configure/logs-correlation/)
|
||||
- [Get started with Terraform for Knowledge Graph](../getting-started/)
|
||||
- [Loki documentation](/docs/loki/latest/)
|
||||
+224
@@ -0,0 +1,224 @@
|
||||
---
|
||||
description: Configure notification alerts for Knowledge Graph using Terraform
|
||||
menuTitle: Notification alerts
|
||||
title: Configure notification alerts using Terraform
|
||||
weight: 200
|
||||
keywords:
|
||||
- Terraform
|
||||
- Knowledge Graph
|
||||
- Notification Alerts
|
||||
- Alert Configuration
|
||||
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/notification-alerts/
|
||||
---
|
||||
|
||||
# Configure notification alerts using Terraform
|
||||
|
||||
Notification alerts configurations in [Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) allow you to manage how alerts are processed and routed. You can specify match labels to filter alerts, add custom labels, set duration requirements, and control silencing.
|
||||
|
||||
For information about configuring notification alerts in the Knowledge Graph UI, refer to [Configure notifications](/docs/grafana-cloud/knowledge-graph/configure/notifications/).
|
||||
|
||||
## Basic notification alerts configuration
|
||||
|
||||
Create a file named `alert-configs.tf` and add the following:
|
||||
|
||||
```terraform
# Basic alert configuration with silencing
resource "grafana_asserts_notification_alerts_config" "prometheus_remote_storage_failures" {
  provider = grafana.asserts

  name = "PrometheusRemoteStorageFailures"

  match_labels = {
    alertname   = "PrometheusRemoteStorageFailures"
    alertgroup  = "prometheus.alerts"
    asserts_env = "prod"
  }

  silenced = true
}

# High severity alert with specific job and context matching
resource "grafana_asserts_notification_alerts_config" "error_buildup_notify" {
  provider = grafana.asserts

  name = "ErrorBuildupNotify"

  match_labels = {
    alertname               = "ErrorBuildup"
    job                     = "acai"
    asserts_request_type    = "inbound"
    asserts_request_context = "/auth"
  }

  silenced = false
}
```
|
||||
|
||||
## Notification alerts with additional labels and duration
|
||||
|
||||
Configure alerts with custom labels and timing requirements:
|
||||
|
||||
```terraform
# Alert with additional labels and custom duration
resource "grafana_asserts_notification_alerts_config" "payment_test_alert" {
  provider = grafana.asserts

  name = "PaymentTestAlert"

  match_labels = {
    alertname         = "PaymentTestAlert"
    additional_labels = "asserts_severity=~\"critical\""
    alertgroup        = "alex-k8s-integration-test.alerts"
  }

  alert_labels = {
    testing = "onetwothree"
  }

  duration = "5m"
  silenced = false
}
```
|
||||
|
||||
## Latency and performance notification alerts
|
||||
|
||||
Monitor and alert on latency and performance issues:
|
||||
|
||||
```terraform
# Latency alert for shipping service
resource "grafana_asserts_notification_alerts_config" "high_shipping_latency" {
  provider = grafana.asserts

  name = "high shipping latency"

  match_labels = {
    alertname            = "LatencyP99ErrorBuildup"
    job                  = "shipping"
    asserts_request_type = "inbound"
  }

  silenced = false
}

# CPU throttling alert with warning severity
resource "grafana_asserts_notification_alerts_config" "cpu_throttling_sustained" {
  provider = grafana.asserts

  name = "CPUThrottlingSustained"

  match_labels = {
    alertname         = "CPUThrottlingSustained"
    additional_labels = "asserts_severity=~\"warning\""
  }

  silenced = true
}
```
|
||||
|
||||
## Infrastructure and service notification alerts
|
||||
|
||||
Configure alerts for infrastructure components and services:
|
||||
|
||||
```terraform
# Ingress error rate alert
resource "grafana_asserts_notification_alerts_config" "ingress_error" {
  provider = grafana.asserts

  name = "ingress error"

  match_labels = {
    alertname            = "ErrorRatioBreach"
    job                  = "ingress-nginx-controller-metrics"
    asserts_request_type = "inbound"
  }

  silenced = false
}

# MySQL Galera cluster alert
resource "grafana_asserts_notification_alerts_config" "mysql_galera_not_ready" {
  provider = grafana.asserts

  name = "MySQLGaleraNotReady"

  match_labels = {
    alertname = "MySQLGaleraNotReady"
  }

  silenced = false
}
```
|
||||
|
||||
## Resource reference
|
||||
|
||||
### `grafana_asserts_notification_alerts_config`
|
||||
|
||||
Manage Knowledge Graph notification alerts configurations through the Grafana API.
|
||||
|
||||
#### Arguments
|
||||
|
||||
| Name           | Type          | Required | Description                                                                                                                       |
| -------------- | ------------- | -------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `name`         | `string`      | Yes      | The name of the notification alerts configuration. This field is immutable and forces recreation if changed.                      |
| `match_labels` | `map(string)` | No       | Labels to match for this notification alerts configuration. Used to filter which alerts this configuration applies to.            |
| `alert_labels` | `map(string)` | No       | Labels to add to alerts generated by this notification alerts configuration.                                                      |
| `duration`     | `string`      | No       | Duration for which the condition must be true before firing (for example, `5m`, `30s`). Maps to `for` in the Knowledge Graph API. |
| `silenced`     | `bool`        | No       | Whether this notification alerts configuration is silenced. Defaults to `false`.                                                  |
|
||||
|
||||
#### Example
|
||||
|
||||
```terraform
resource "grafana_asserts_notification_alerts_config" "example" {
  provider = grafana.asserts

  name = "ExampleAlert"

  match_labels = {
    alertname = "HighCPUUsage"
    job       = "monitoring"
  }

  alert_labels = {
    severity = "warning"
    team     = "platform"
  }

  duration = "5m"
  silenced = false
}
```
|
||||
|
||||
## Best practices
|
||||
|
||||
### Label management
|
||||
|
||||
- Use specific and meaningful labels in `match_labels` to ensure precise alert filtering
|
||||
- Leverage existing label conventions from your monitoring setup
|
||||
- Consider using `asserts_env` and `asserts_site` labels for multi-environment setups
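
For a multi-environment setup, a configuration scoped with `asserts_env` and `asserts_site` might look like the following sketch. The resource name, configuration name, and label values are hypothetical.

```terraform
resource "grafana_asserts_notification_alerts_config" "error_buildup_prod_west" {
  provider = grafana.asserts

  name = "ErrorBuildupProdWest"

  # Applies only to ErrorBuildup alerts from the prod environment in one site.
  match_labels = {
    alertname    = "ErrorBuildup"
    asserts_env  = "prod"
    asserts_site = "us-west-2"
  }

  silenced = false
}
```
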
|
||||
|
||||
### Silence strategy
|
||||
|
||||
- Use the `silenced` parameter for temporary suppression rather than deleting notification alerts configurations
|
||||
- Document the reason for silencing in your Terraform configuration comments
|
||||
- Regularly review silenced configurations to ensure they're still needed
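
The following sketch shows one way to silence a configuration temporarily while recording the reason in a comment. The variable name, resource name, alert name, and the stated reason are assumptions for illustration.

```terraform
variable "silence_memory_pressure" {
  description = "Set to true to temporarily silence memory pressure notifications"
  type        = bool
  default     = false
}

resource "grafana_asserts_notification_alerts_config" "memory_pressure" {
  provider = grafana.asserts

  name = "MemoryPressureNotify"

  match_labels = {
    alertname = "MemoryPressure"
  }

  # Silenced only while the node pool is being resized (example reason).
  # Review regularly and set the variable back to false when the work is done.
  silenced = var.silence_memory_pressure
}
```

Because the configuration stays in place, switching the variable back to `false` restores notifications without recreating the resource.
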
|
||||
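One way to follow the points above is to keep the resource in place, set `silenced = true`, and record the reason next to it. This is an illustrative sketch; the resource name and the reason in the comment are hypothetical.

```terraform
resource "grafana_asserts_notification_alerts_config" "high_cpu_silenced" {
  provider = grafana.asserts

  name = "HighCPUUsageSilenced"

  match_labels = {
    alertname = "HighCPUUsage"
    job       = "monitoring"
  }

  duration = "5m"

  # Silenced during a planned node pool migration (hypothetical reason).
  # Review this setting and return it to false once the work is complete.
  silenced = true
}
```

Because `silenced` isn't described as immutable in the arguments table, toggling it should update the configuration in place rather than recreate it, although it's worth confirming this in the plan output.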
|
||||
### Duration configuration
|
||||
|
||||
- Set appropriate duration values based on your alerting requirements
|
||||
- Consider the nature of the monitored condition when choosing duration
|
||||
- Use consistent duration formats across similar alert types
|
||||
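To keep duration formats consistent, one possible approach is to define the values once in a `locals` block and reference them from each configuration. The local names and resource names below are assumptions made for this sketch.

```terraform
locals {
  # Hypothetical shared durations; tune them to your alerting requirements.
  alert_durations = {
    short    = "30s"
    standard = "5m"
  }
}

resource "grafana_asserts_notification_alerts_config" "latency_standard" {
  provider = grafana.asserts

  name = "HighLatencyStandard"

  match_labels = {
    alertname = "HighLatency"
  }

  # Reuse the shared value instead of repeating a literal string.
  duration = local.alert_durations.standard
}
```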
|
||||
## Validation
|
||||
|
||||
After applying the Terraform configuration, verify that:
|
||||
|
||||
- Notification alerts configurations are created in your Knowledge Graph instance
|
||||
- Configurations appear in the Knowledge Graph UI under **Observability > Rules > Notify**
|
||||
- Match labels correctly filter the intended alerts
|
||||
- Custom labels are properly applied to generated alerts
|
||||
|
||||
## Related documentation
|
||||
|
||||
- [Configure notifications in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/configure/notifications/)
|
||||
- [Get started with Terraform for Knowledge Graph](../getting-started/)
|
||||
- [Configure alerts in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/configure/alerts/)
|
||||
+308
@@ -0,0 +1,308 @@
|
||||
---
|
||||
description: Configure suppressed assertions for Knowledge Graph using Terraform
|
||||
menuTitle: Suppressed assertions
|
||||
title: Configure suppressed assertions using Terraform
|
||||
weight: 300
|
||||
keywords:
|
||||
- Terraform
|
||||
- Knowledge Graph
|
||||
- Suppressed Assertions
|
||||
- Alert Suppression
|
||||
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/suppressed-assertions/
|
||||
---
|
||||
|
||||
# Configure suppressed assertions using Terraform
|
||||
|
||||
Suppressed assertions configurations allow you to disable specific alerts or assertions based on label matching in [Knowledge Graph](/docs/grafana-cloud/knowledge-graph/). This is useful for maintenance windows, test environments, or when you want to temporarily suppress certain types of alerts.
|
||||
|
||||
For information about suppressing insights in the Knowledge Graph UI, refer to [Suppress insights](/docs/grafana-cloud/knowledge-graph/troubleshoot-infra-apps/suppress-insights/).
|
||||
|
||||
## Basic suppressed assertions configuration
|
||||
|
||||
Create a file named `suppressed-assertions.tf` and add the following:
|
||||
|
||||
```terraform
|
||||
# Basic suppressed alert configuration for maintenance
|
||||
resource "grafana_asserts_suppressed_assertions_config" "maintenance_window" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "MaintenanceWindow"
|
||||
|
||||
match_labels = {
|
||||
service = "api-service"
|
||||
maintenance = "true"
|
||||
}
|
||||
}
|
||||
|
||||
# Suppress a specific alert name during deployment
|
||||
resource "grafana_asserts_suppressed_assertions_config" "deployment_suppression" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "DeploymentSuppression"
|
||||
|
||||
match_labels = {
|
||||
alertname = "HighLatency"
|
||||
job = "web-service"
|
||||
env = "staging"
|
||||
}
|
||||
}
|
||||
|
||||
# Suppress alerts for a specific test environment
|
||||
resource "grafana_asserts_suppressed_assertions_config" "test_environment_suppression" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "TestEnvironmentSuppression"
|
||||
|
||||
match_labels = {
|
||||
alertgroup = "test.alerts"
|
||||
environment = "test"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Service-specific suppression configurations
|
||||
|
||||
Suppress alerts for specific services during maintenance or operational activities:
|
||||
|
||||
```terraform
|
||||
# Suppress alerts for specific services during maintenance
|
||||
resource "grafana_asserts_suppressed_assertions_config" "api_service_maintenance" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "APIServiceMaintenance"
|
||||
|
||||
match_labels = {
|
||||
service = "api-gateway"
|
||||
job = "api-gateway"
|
||||
maintenance = "scheduled"
|
||||
}
|
||||
}
|
||||
|
||||
# Suppress database alerts during backup operations
|
||||
resource "grafana_asserts_suppressed_assertions_config" "database_backup" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "DatabaseBackupSuppression"
|
||||
|
||||
match_labels = {
|
||||
service = "postgresql"
|
||||
job = "postgres-exporter"
|
||||
backup_mode = "active"
|
||||
}
|
||||
}
|
||||
|
||||
# Suppress monitoring system alerts during updates
|
||||
resource "grafana_asserts_suppressed_assertions_config" "monitoring_update" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "MonitoringSystemUpdate"
|
||||
|
||||
match_labels = {
|
||||
service = "prometheus"
|
||||
job = "prometheus"
|
||||
update = "in_progress"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Environment and team-based suppression
|
||||
|
||||
Create suppression rules based on environment or team:
|
||||
|
||||
```terraform
|
||||
# Suppress alerts for the platform team in the development environment
|
||||
resource "grafana_asserts_suppressed_assertions_config" "dev_environment" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "DevelopmentEnvironmentSuppression"
|
||||
|
||||
match_labels = {
|
||||
environment = "development"
|
||||
team = "platform"
|
||||
}
|
||||
}
|
||||
|
||||
# Suppress alerts for a specific team during its maintenance window
|
||||
resource "grafana_asserts_suppressed_assertions_config" "team_maintenance" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "TeamMaintenanceWindow"
|
||||
|
||||
match_labels = {
|
||||
team = "backend"
|
||||
maintenance = "team_scheduled"
|
||||
timezone = "UTC"
|
||||
}
|
||||
}
|
||||
|
||||
# Suppress alerts for staging environment during testing
|
||||
resource "grafana_asserts_suppressed_assertions_config" "staging_testing" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "StagingTestingSuppression"
|
||||
|
||||
match_labels = {
|
||||
environment = "staging"
|
||||
testing = "automated"
|
||||
job = "integration-tests"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Alert type and severity-based suppression
|
||||
|
||||
Suppress alerts based on their type or severity:
|
||||
|
||||
```terraform
|
||||
# Suppress low severity alerts during business hours
|
||||
resource "grafana_asserts_suppressed_assertions_config" "low_severity_business_hours" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "LowSeverityBusinessHours"
|
||||
|
||||
match_labels = {
|
||||
severity = "warning"
|
||||
timezone = "business_hours"
|
||||
}
|
||||
}
|
||||
|
||||
# Suppress specific alert types during known issues
|
||||
resource "grafana_asserts_suppressed_assertions_config" "known_issue_suppression" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "KnownIssueSuppression"
|
||||
|
||||
match_labels = {
|
||||
alertname = "HighMemoryUsage"
|
||||
service = "legacy-service"
|
||||
issue_id = "LEG-123"
|
||||
}
|
||||
}
|
||||
|
||||
# Suppress infrastructure alerts during planned maintenance
|
||||
resource "grafana_asserts_suppressed_assertions_config" "infrastructure_maintenance" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "InfrastructureMaintenance"
|
||||
|
||||
match_labels = {
|
||||
alertgroup = "infrastructure.alerts"
|
||||
maintenance_type = "planned"
|
||||
affected_services = "all"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Complex multi-label suppression
|
||||
|
||||
Define complex suppression rules with multiple labels:
|
||||
|
||||
```terraform
|
||||
# Complex suppression for multi-service deployments
|
||||
resource "grafana_asserts_suppressed_assertions_config" "multi_service_deployment" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "MultiServiceDeploymentSuppression"
|
||||
|
||||
match_labels = {
|
||||
deployment_id = "deploy-2024-01-15"
|
||||
services = "api,worker,frontend"
|
||||
environment = "production"
|
||||
deployment_type = "blue_green"
|
||||
}
|
||||
}
|
||||
|
||||
# Suppress alerts for a specific cluster during maintenance
|
||||
resource "grafana_asserts_suppressed_assertions_config" "cluster_maintenance" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "ClusterMaintenanceSuppression"
|
||||
|
||||
match_labels = {
|
||||
cluster = "production-cluster-1"
|
||||
maintenance = "cluster_upgrade"
|
||||
affected_nodes = "all"
|
||||
estimated_duration = "2h"
|
||||
}
|
||||
}
|
||||
|
||||
# Suppress alerts for a specific region during network issues
|
||||
resource "grafana_asserts_suppressed_assertions_config" "regional_network_issue" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "RegionalNetworkIssueSuppression"
|
||||
|
||||
match_labels = {
|
||||
region = "us-west-2"
|
||||
issue_type = "network"
|
||||
affected_services = "external_dependencies"
|
||||
incident_id = "NET-456"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Resource reference
|
||||
|
||||
### `grafana_asserts_suppressed_assertions_config`
|
||||
|
||||
Manage Knowledge Graph suppressed assertions configurations through the Grafana API.
|
||||
|
||||
#### Arguments
|
||||
|
||||
| Name | Type | Required | Description |
|
||||
| -------------- | ------------- | -------- | ------------------------------------------------------------------------------------------------------------------ |
|
||||
| `name` | `string` | Yes | The name of the suppressed assertions configuration. This field is immutable and forces recreation if changed. |
|
||||
| `match_labels` | `map(string)` | No | Labels to match for this suppressed assertions configuration. Used to determine which alerts should be suppressed. |
|
||||
|
||||
#### Example
|
||||
|
||||
```terraform
|
||||
resource "grafana_asserts_suppressed_assertions_config" "example" {
|
||||
provider = grafana.asserts
|
||||
|
||||
name = "ExampleSuppression"
|
||||
|
||||
match_labels = {
|
||||
alertname = "TestAlert"
|
||||
env = "development"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Best practices
|
||||
|
||||
### Suppression strategy
|
||||
|
||||
- Use suppression rules for temporary situations rather than permanent solutions
|
||||
- Document the reason for suppression in your Terraform configuration comments
|
||||
- Set expiration dates or reminders to review suppression rules
|
||||
- Prefer fixing alert thresholds over suppressing recurring false positives
|
||||
|
||||
### Label match rules
|
||||
|
||||
- Be specific with match labels to avoid suppressing unintended alerts
|
||||
- Test suppression rules in non-production environments first
|
||||
- Use descriptive names that indicate the purpose and scope of the suppression
|
||||
- Include relevant context in labels (for example, incident IDs, maintenance windows)
|
||||
|
||||
### Lifecycle management
|
||||
|
||||
- Regularly review active suppression rules to ensure they're still needed
|
||||
- Remove or update suppression rules after maintenance windows or deployments
|
||||
- Use version control to track when suppression rules were added and why
|
||||
- Consider using time-based automation to enable or disable suppression rules
|
||||
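The time-based automation mentioned above can be approximated with the Terraform `count` meta-argument and a boolean variable that a pipeline or scheduler sets. This is a sketch under that assumption; the variable name `maintenance_active` and the resource name are hypothetical.

```terraform
variable "maintenance_active" {
  description = "Hypothetical toggle set by a pipeline or scheduler."
  type        = bool
  default     = false
}

resource "grafana_asserts_suppressed_assertions_config" "scheduled_maintenance" {
  provider = grafana.asserts

  # Create the rule only while the toggle is true. When the toggle is
  # false, the next apply removes the rule.
  count = var.maintenance_active ? 1 : 0

  name = "ScheduledMaintenance"

  match_labels = {
    service     = "api-service"
    maintenance = "true"
  }
}
```

With this layout, a scheduled job could run `terraform apply -var="maintenance_active=true"` at the start of a window and apply again with `false` at the end, so the suppression rule never outlives the maintenance period.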
|
||||
## Validation
|
||||
|
||||
After applying the Terraform configuration, verify that:
|
||||
|
||||
- Suppressed assertions configurations are active in your Knowledge Graph instance
|
||||
- Configurations appear in the Knowledge Graph UI under **Observability > Rules > Suppress**
|
||||
- Matching alerts are properly suppressed
|
||||
- Suppression rules don't affect unintended alerts
|
||||
|
||||
## Related documentation
|
||||
|
||||
- [Suppress insights in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/troubleshoot-infra-apps/suppress-insights/)
|
||||
- [Get started with Terraform for Knowledge Graph](../getting-started/)
|
||||
- [Configure notifications](/docs/grafana-cloud/knowledge-graph/configure/notifications/)
|
||||
+355
@@ -0,0 +1,355 @@
|
||||
---
|
||||
description: Configure thresholds for Knowledge Graph using Terraform
|
||||
menuTitle: Thresholds
|
||||
title: Configure thresholds using Terraform
|
||||
weight: 600
|
||||
keywords:
|
||||
- Terraform
|
||||
- Knowledge Graph
|
||||
- Thresholds
|
||||
- Request Thresholds
|
||||
- Resource Thresholds
|
||||
- Health Thresholds
|
||||
canonical: https://grafana.com/docs/grafana/latest/as-code/infrastructure-as-code/terraform/terraform-knowledge-graph/thresholds/
|
||||
---
|
||||
|
||||
# Configure thresholds using Terraform
|
||||
|
||||
Threshold configurations in [Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) allow you to define custom thresholds for request, resource, and health assertions. These configurations help you set specific limits and conditions for monitoring your services and infrastructure.
|
||||
|
||||
For information about managing thresholds in the Knowledge Graph UI, refer to [Manage thresholds](/docs/grafana-cloud/knowledge-graph/configure/manage-thresholds/).
|
||||
|
||||
## Basic threshold configuration
|
||||
|
||||
Create a file named `thresholds.tf` and add the following:
|
||||
|
||||
```terraform
|
||||
# Basic threshold configuration with all three types
|
||||
resource "grafana_asserts_thresholds" "basic" {
|
||||
provider = grafana.asserts
|
||||
|
||||
request_thresholds = [{
|
||||
entity_name = "payment-service"
|
||||
assertion_name = "ErrorRatioBreach"
|
||||
request_type = "inbound"
|
||||
request_context = "/charge"
|
||||
value = 0.01
|
||||
}]
|
||||
|
||||
resource_thresholds = [{
|
||||
assertion_name = "Saturation"
|
||||
resource_type = "container"
|
||||
container_name = "worker"
|
||||
source = "metrics"
|
||||
severity = "warning"
|
||||
value = 75
|
||||
}]
|
||||
|
||||
health_thresholds = [{
|
||||
assertion_name = "ServiceDown"
|
||||
expression = "up < 1"
|
||||
entity_type = "Service"
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
## Request threshold configurations
|
||||
|
||||
Configure thresholds for different service request types and contexts:
|
||||
|
||||
```terraform
|
||||
# Multiple request thresholds for different services
|
||||
resource "grafana_asserts_thresholds" "request_thresholds" {
|
||||
provider = grafana.asserts
|
||||
|
||||
request_thresholds = [
|
||||
{
|
||||
entity_name = "api-service"
|
||||
assertion_name = "ErrorRatioBreach"
|
||||
request_type = "inbound"
|
||||
request_context = "/api/v1/users"
|
||||
value = 0.02
|
||||
},
|
||||
{
|
||||
entity_name = "api-service"
|
||||
assertion_name = "LatencyP99ErrorBuildup"
|
||||
request_type = "inbound"
|
||||
request_context = "/api/v1/orders"
|
||||
value = 500
|
||||
},
|
||||
{
|
||||
entity_name = "payment-gateway"
|
||||
assertion_name = "RequestRateAnomaly"
|
||||
request_type = "outbound"
|
||||
request_context = "/payment/process"
|
||||
value = 1000
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Resource threshold configurations
|
||||
|
||||
Define resource thresholds for different severity levels:
|
||||
|
||||
```terraform
|
||||
# Resource thresholds for different severity levels
|
||||
resource "grafana_asserts_thresholds" "resource_thresholds" {
|
||||
provider = grafana.asserts
|
||||
|
||||
resource_thresholds = [
|
||||
{
|
||||
assertion_name = "Saturation"
|
||||
resource_type = "container"
|
||||
container_name = "web-server"
|
||||
source = "metrics"
|
||||
severity = "warning"
|
||||
value = 75
|
||||
},
|
||||
{
|
||||
assertion_name = "Saturation"
|
||||
resource_type = "container"
|
||||
container_name = "web-server"
|
||||
source = "metrics"
|
||||
severity = "critical"
|
||||
value = 90
|
||||
},
|
||||
{
|
||||
assertion_name = "ResourceRateBreach"
|
||||
resource_type = "Pod"
|
||||
container_name = "database"
|
||||
source = "logs"
|
||||
severity = "warning"
|
||||
value = 80
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Health threshold configurations
|
||||
|
||||
Configure health checks with Prometheus expressions:
|
||||
|
||||
```terraform
|
||||
# Health thresholds with Prometheus expressions
|
||||
resource "grafana_asserts_thresholds" "health_thresholds" {
|
||||
provider = grafana.asserts
|
||||
|
||||
health_thresholds = [
|
||||
{
|
||||
assertion_name = "ServiceDown"
|
||||
expression = "up{job=\"api-service\"} < 1"
|
||||
entity_type = "Service"
|
||||
},
|
||||
{
|
||||
assertion_name = "HighMemoryUsage"
|
||||
expression = "memory_usage_percent > 85"
|
||||
entity_type = "Service"
|
||||
},
|
||||
{
|
||||
assertion_name = "DatabaseConnectivity"
|
||||
expression = "db_connection_pool_active / db_connection_pool_max > 0.9"
|
||||
entity_type = "Service"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Comprehensive threshold configuration
|
||||
|
||||
Define comprehensive thresholds for production environments:
|
||||
|
||||
```terraform
|
||||
# Production environment with comprehensive thresholds
|
||||
resource "grafana_asserts_thresholds" "production" {
|
||||
provider = grafana.asserts
|
||||
|
||||
request_thresholds = [
|
||||
{
|
||||
entity_name = "frontend"
|
||||
assertion_name = "ErrorRatioBreach"
|
||||
request_type = "inbound"
|
||||
request_context = "/"
|
||||
value = 0.005
|
||||
},
|
||||
{
|
||||
entity_name = "backend-api"
|
||||
assertion_name = "LatencyP99ErrorBuildup"
|
||||
request_type = "inbound"
|
||||
request_context = "/api"
|
||||
value = 200
|
||||
}
|
||||
]
|
||||
|
||||
resource_thresholds = [
|
||||
{
|
||||
assertion_name = "Saturation"
|
||||
resource_type = "container"
|
||||
container_name = "frontend"
|
||||
source = "metrics"
|
||||
severity = "warning"
|
||||
value = 70
|
||||
},
|
||||
{
|
||||
assertion_name = "Saturation"
|
||||
resource_type = "container"
|
||||
container_name = "backend-api"
|
||||
source = "metrics"
|
||||
severity = "critical"
|
||||
value = 85
|
||||
}
|
||||
]
|
||||
|
||||
health_thresholds = [
|
||||
{
|
||||
assertion_name = "ServiceDown"
|
||||
expression = "up < 1"
|
||||
entity_type = "Service"
|
||||
},
|
||||
{
|
||||
assertion_name = "NodeDown"
|
||||
expression = "up{job=\"node-exporter\"} < 1"
|
||||
entity_type = "Service"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Resource reference
|
||||
|
||||
### `grafana_asserts_thresholds`
|
||||
|
||||
Manage Knowledge Graph threshold configurations through the Grafana API. This resource allows you to define custom thresholds for request, resource, and health assertions.
|
||||
|
||||
#### Arguments
|
||||
|
||||
| Name | Type | Required | Description |
|
||||
| --------------------- | -------------- | -------- | ------------------------------------------------------------------------------------------------------------------------ |
|
||||
| `request_thresholds` | `list(object)` | No | List of request threshold configurations. Refer to [request thresholds block](#request-thresholds-block) for details. |
|
||||
| `resource_thresholds` | `list(object)` | No | List of resource threshold configurations. Refer to [resource thresholds block](#resource-thresholds-block) for details. |
|
||||
| `health_thresholds` | `list(object)` | No | List of health threshold configurations. Refer to [health thresholds block](#health-thresholds-block) for details. |
|
||||
|
||||
#### Request thresholds block
|
||||
|
||||
Each `request_thresholds` block supports the following:
|
||||
|
||||
| Name | Type | Required | Description |
|
||||
| ----------------- | -------- | -------- | ------------------------------------------------------ |
|
||||
| `entity_name` | `string` | Yes | The name of the entity to apply the threshold to. |
|
||||
| `assertion_name` | `string` | Yes | The name of the assertion to configure. |
|
||||
| `request_type` | `string` | Yes | The type of request (inbound, outbound). |
|
||||
| `request_context` | `string` | Yes | The request context or path to apply the threshold to. |
|
||||
| `value` | `number` | Yes | The threshold value. |
|
||||
|
||||
#### Resource thresholds block
|
||||
|
||||
Each `resource_thresholds` block supports the following:
|
||||
|
||||
| Name | Type | Required | Description |
|
||||
| ---------------- | -------- | -------- | ---------------------------------------------------- |
|
||||
| `assertion_name` | `string` | Yes | The name of the assertion to configure. |
|
||||
| `resource_type` | `string` | Yes | The type of resource (container, Pod, node). |
|
||||
| `container_name` | `string` | Yes | The name of the container to apply the threshold to. |
|
||||
| `source` | `string` | Yes | The source of the threshold data (metrics, logs). |
|
||||
| `severity` | `string` | Yes | The severity level (warning, critical). |
|
||||
| `value` | `number` | Yes | The threshold value. |
|
||||
|
||||
#### Health thresholds block
|
||||
|
||||
Each `health_thresholds` block supports the following:
|
||||
|
||||
| Name | Type | Required | Description |
|
||||
| ---------------- | -------- | -------- | ------------------------------------------------------------------------------------ |
|
||||
| `assertion_name` | `string` | Yes | The name of the assertion to configure. |
|
||||
| `expression` | `string` | Yes | The Prometheus expression for the health check. |
|
||||
| `entity_type` | `string` | Yes | Entity type for the health threshold (for example, Service, Pod, Namespace, Volume). |
|
||||
| `alert_category` | `string` | No | Optional alert category label for the health threshold. |
|
||||
|
||||
#### Example
|
||||
|
||||
```terraform
|
||||
resource "grafana_asserts_thresholds" "example" {
|
||||
provider = grafana.asserts
|
||||
|
||||
request_thresholds = [{
|
||||
entity_name = "api-service"
|
||||
assertion_name = "ErrorRatioBreach"
|
||||
request_type = "inbound"
|
||||
request_context = "/api/v1/users"
|
||||
value = 0.02
|
||||
}]
|
||||
|
||||
resource_thresholds = [{
|
||||
assertion_name = "Saturation"
|
||||
resource_type = "container"
|
||||
container_name = "web-server"
|
||||
source = "metrics"
|
||||
severity = "warning"
|
||||
value = 75
|
||||
}]
|
||||
|
||||
health_thresholds = [{
|
||||
assertion_name = "ServiceDown"
|
||||
expression = "up{job=\"api-service\"} < 1"
|
||||
entity_type = "Service"
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
## Best practices
|
||||
|
||||
### Threshold configuration management
|
||||
|
||||
- Set appropriate threshold values based on your service level objectives (SLOs)
|
||||
- Use different severity levels (warning, critical) to create escalation paths
|
||||
- Test threshold configurations in non-production environments first
|
||||
- Monitor threshold effectiveness and adjust values based on actual performance data
|
||||
|
||||
### Request threshold best practices
|
||||
|
||||
- Configure request thresholds for critical user-facing endpoints
|
||||
- Set different thresholds for different request types (inbound vs outbound)
|
||||
- Consider request context when setting thresholds for specific API paths
|
||||
- Use error ratio thresholds to catch service degradation early
|
||||
- Review historical performance data to set realistic threshold values
|
||||
|
||||
### Resource threshold best practices
|
||||
|
||||
- Set resource thresholds based on your infrastructure capacity
|
||||
- Use container-specific thresholds for microservices architectures
|
||||
- Configure both warning and critical thresholds for gradual escalation
|
||||
- Monitor resource utilization patterns to set realistic threshold values
|
||||
- Consider seasonal or periodic patterns in resource usage
|
||||
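When many containers need both warning and critical thresholds, a `for` expression can generate the pairs from one map. The sketch below assumes `resource_thresholds` accepts a list of objects, as the attribute syntax in the earlier examples suggests; the container names and limits are placeholders.

```terraform
locals {
  # Hypothetical per-container limits.
  saturation_limits = {
    "web-server" = { warning = 75, critical = 90 }
    "worker"     = { warning = 70, critical = 85 }
  }
}

resource "grafana_asserts_thresholds" "saturation" {
  provider = grafana.asserts

  # Build one threshold object per container and severity, then flatten
  # the nested lists into the single list the argument expects.
  resource_thresholds = flatten([
    for container, limits in local.saturation_limits : [
      for severity, limit in limits : {
        assertion_name = "Saturation"
        resource_type  = "container"
        container_name = container
        source         = "metrics"
        severity       = severity
        value          = limit
      }
    ]
  ])
}
```

Keeping the limits in one map makes it easier to review warning and critical values side by side and to add a container without copying a whole object.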
|
||||
### Health threshold best practices
|
||||
|
||||
- Use Prometheus expressions that accurately reflect service health
|
||||
- Test health check expressions independently before applying them
|
||||
- Set up health thresholds for critical dependencies and external services
|
||||
- Use composite expressions for complex health checks
|
||||
- Ensure expressions perform efficiently without causing excessive load
|
||||
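As an illustration of a composite expression, the following sketch treats a service as down when its `up` series reports less than 1 or is missing entirely. The assertion name and job label are assumptions; test the expression against your own metrics before applying it.

```terraform
resource "grafana_asserts_thresholds" "composite_health" {
  provider = grafana.asserts

  health_thresholds = [{
    # Hypothetical assertion name for this sketch.
    assertion_name = "ServiceDownOrMissing"

    # Fires when the target reports down, or when no `up` series exists
    # for the job at all.
    expression = "up{job=\"api-service\"} < 1 or absent(up{job=\"api-service\"})"

    entity_type = "Service"
  }]
}
```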
|
||||
### Value selection guidelines
|
||||
|
||||
- Start conservative and adjust based on real-world performance
|
||||
- Use fractional values in the 0-1 range for ratio-based metrics (for example, `0.01` for a 1% error ratio)
|
||||
- Use milliseconds for latency thresholds
|
||||
- Document the reasoning behind specific threshold values
|
||||
- Review and update thresholds regularly based on system evolution
|
||||
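The guidelines above can be made visible in the configuration itself by stating the unit and the reasoning next to each value. This sketch reuses values from the earlier examples; the rationale in the comments is hypothetical.

```terraform
resource "grafana_asserts_thresholds" "documented" {
  provider = grafana.asserts

  request_thresholds = [
    {
      entity_name     = "api-service"
      assertion_name  = "ErrorRatioBreach"
      request_type    = "inbound"
      request_context = "/api/v1/users"
      # Ratio in the 0-1 range: 0.02 allows up to 2% failed requests.
      # Hypothetical rationale: matches the error budget for this endpoint.
      value = 0.02
    },
    {
      entity_name     = "api-service"
      assertion_name  = "LatencyP99ErrorBuildup"
      request_type    = "inbound"
      request_context = "/api/v1/orders"
      # Latency in milliseconds.
      value = 500
    }
  ]
}
```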
|
||||
## Validation
|
||||
|
||||
After applying the Terraform configuration, verify that:
|
||||
|
||||
- Threshold configurations are applied in your Knowledge Graph instance
|
||||
- Configurations appear in the Knowledge Graph UI under **Observability > Rules > Threshold**
|
||||
- Request thresholds correctly identify breaches for specified services
|
||||
- Resource thresholds trigger at appropriate severity levels
|
||||
- Health thresholds accurately reflect service status
|
||||
- Threshold values align with your SLO commitments
|
||||
|
||||
## Related documentation
|
||||
|
||||
- [Manage thresholds in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/configure/manage-thresholds/)
|
||||
- [Get started with Terraform for Knowledge Graph](../getting-started/)
|
||||
- [Configure alerts in Knowledge Graph](/docs/grafana-cloud/knowledge-graph/configure/alerts/)
|
||||