# Unified Storage

The unified storage project aims to provide a simple and extensible backend to unify the way we store different objects within the Grafana app platform.

It provides generic storage for k8s objects, and can store data either within dedicated tables in the main Grafana database, or in separate storage.

By default it runs in-process within Grafana, but it can also be run as a standalone gRPC service (`storage-server`).

## Storage Overview

There are two main tables: the `resource` table stores the "current" view of each object, and the `resource_history` table stores a record of every revision of each object.

## Running Unified Storage

### Playlists: baseline configuration

The minimum config settings required are:

```ini
; need to specify target here for override to work later
target = all

[server]
; https is required for kubectl
protocol = https

[feature_toggles]
; store playlists in k8s
kubernetesPlaylists = true

[grafana-apiserver]
; use unified storage for k8s apiserver
storage_type = unified

# Dualwriter modes
# 0: disabled (default mode)
# 1: read from legacy, write to legacy, write to unified best-effort
# 2: read from legacy, write to both
# 3: read from unified, write to both
# 4: read from unified, write to unified
# 5: read from unified, write to unified, ignore background sync state
[unified_storage.playlists.playlist.grafana.app]
dualWriterMode = 0
```

**Note**: When using the Dualwriter, Watch will only work with mode 5.

### Folders: baseline configuration

NOTE: allowing folders to be backed by Unified Storage is under development and so are these instructions.

The minimum config settings required are:

```ini
; need to specify target here for override to work later
target = all

[server]
; https is required for kubectl
protocol = https

[feature_toggles]
grafanaAPIServerWithExperimentalAPIs = true

[unified_storage.folders.folder.grafana.app]
dualWriterMode = 4

[unified_storage.dashboards.dashboard.grafana.app]
dualWriterMode = 4

[grafana-apiserver]
; use unified storage for k8s apiserver
storage_type = unified
```

### Setting up a kubeconfig

With this configuration, you can run everything in-process. Run the Grafana backend with:

```sh
bra run
```

or

```sh
make run
```

The default kubeconfig sends requests directly to the apiserver. To authenticate as a Grafana user instead, create `grafana.kubeconfig`:
```yaml
apiVersion: v1
clusters:
- cluster:
    insecure-skip-tls-verify: true
    server: https://127.0.0.1:3000
  name: default-cluster
contexts:
- context:
    cluster: default-cluster
    namespace: default
    user: default
  name: default-context
current-context: default-context
kind: Config
preferences: {}
users:
- name: default
  user:
    username: <username>
    password: <password>
```
Where `<username>` and `<password>` are credentials for basic auth against Grafana. For example, with the [default credentials](https://github.com/grafana/grafana/blob/HEAD/contribute/developer-guide.md#backend):
```yaml
    username: admin
    password: admin
```

### Playlists: interacting with the k8s API

In this mode, you can interact with the k8s API. Make sure you are in the directory where you created `grafana.kubeconfig`. Then run:
```sh
kubectl --kubeconfig=./grafana.kubeconfig get playlist
```

If this is your first time running the command, a successful response would be:
```sh
No resources found in default namespace.
```
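
Under the hood, `kubectl` issues plain HTTPS requests against Grafana. The sketch below builds (but does not send) a roughly equivalent list request in Go. The URL layout follows the usual Kubernetes API convention and is an assumption here, and `newListRequest` is a hypothetical helper, not part of Grafana.

```go
package main

import (
	"fmt"
	"net/http"
)

// newListRequest builds a GET request for a namespaced resource collection.
// Hypothetical helper: the path layout is assumed from the Kubernetes API convention.
func newListRequest(baseURL, group, version, namespace, resource, user, pass string) (*http.Request, error) {
	url := fmt.Sprintf("%s/apis/%s/%s/namespaces/%s/%s", baseURL, group, version, namespace, resource)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, pass) // same credentials as in grafana.kubeconfig
	req.Header.Set("Accept", "application/json")
	return req, nil
}

func main() {
	req, err := newListRequest("https://127.0.0.1:3000", "playlist.grafana.app", "v0alpha1", "default", "playlists", "admin", "admin")
	if err != nil {
		panic(err)
	}
	// The request is only printed, not sent. Sending it would need an
	// http.Client that skips TLS verification, as the kubeconfig does.
	fmt.Println(req.Method, req.URL.String())
}
```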

To create a playlist, create a file `playlist-generate.yaml`:
```yaml
apiVersion: playlist.grafana.app/v0alpha1
kind: Playlist
metadata:
  generateName: x # anything is ok here... except yes or true -- they become boolean!
  labels:
    foo: bar
  annotations:
    grafana.app/slug: "slugger"
    grafana.app/updatedBy: "updater"
spec:
  title: Playlist with auto generated UID
  interval: 5m
  items:
  - type: dashboard_by_tag
    value: panel-tests
  - type: dashboard_by_uid
    value: vmie2cmWz # dashboard from devenv
```
then run:
```sh
kubectl --kubeconfig=./grafana.kubeconfig create -f playlist-generate.yaml
```

For example, a successful response would be:
```sh
playlist.playlist.grafana.app/u394j4d3-s63j-2d74-g8hf-958773jtybf2 created
```

When running
```sh
kubectl --kubeconfig=./grafana.kubeconfig get playlist
```
you should now see something like:
```sh
NAME                                   TITLE                              INTERVAL   CREATED AT
u394j4d3-s63j-2d74-g8hf-958773jtybf2   Playlist with auto generated UID   5m         2023-12-14T13:53:35Z
```

To update the playlist, update the `playlist-generate.yaml` file then run:
```sh
kubectl --kubeconfig=./grafana.kubeconfig patch playlist <NAME> --patch-file playlist-generate.yaml
```

In the example, `<NAME>` would be `u394j4d3-s63j-2d74-g8hf-958773jtybf2`.

### Folders: interacting with the k8s API

Make sure you are in the directory where you created `grafana.kubeconfig`. Then run:
```sh
kubectl --kubeconfig=./grafana.kubeconfig get folder
```

If this is your first time running the command, a successful response would be:
```sh
No resources found in default namespace.
```

To create a folder, create a file `folder-generate.yaml`:
```yaml
apiVersion: folder.grafana.app/v1beta1
kind: Folder
metadata:
  generateName: x # anything is ok here... except yes or true -- they become boolean!
spec:
  title: Example folder
```
then run:
```sh
kubectl --kubeconfig=./grafana.kubeconfig create -f folder-generate.yaml
```

### Run as a GRPC service

#### Start GRPC storage-server

Make sure you have the gRPC address in the `[grafana-apiserver]` section of your config file:
```ini
[grafana-apiserver]
; your gRPC server address
address = localhost:10000
```

You also need the `[grpc_server_authentication]` section to authenticate incoming requests:
```ini
[grpc_server_authentication]
; http url to Grafana's signing keys to validate incoming id tokens
signing_keys_url = http://localhost:3000/api/signing-keys/keys
mode = "on-prem"
```

This currently only works with a separate database configuration (see previous section).

Start the storage-server with:
```sh
GF_DEFAULT_TARGET=storage-server ./bin/grafana server target
```

The gRPC service will listen on port 10000.

#### Use GRPC server

To run grafana against the storage-server, override the `storage_type` setting:
```sh
GF_GRAFANA_APISERVER_STORAGE_TYPE=unified-grpc ./bin/grafana server
```

You can then list the previously-created playlists with:
```sh
kubectl --kubeconfig=./grafana.kubeconfig get playlist
```

## Changing protobuf interface

- install [protoc](https://grpc.io/docs/protoc-installation/)
- install the protocol compiler plugin for Go
```sh
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
```
- make changes in `.proto` file
- to compile all protobuf files in the repository run `make protobuf` at its top level

## Setting up search
To enable search, add the following to your `custom.ini` under the `[feature_toggles]` section:
```ini
[feature_toggles]
; Used by the Grafana instance
unifiedStorageSearchUI = true

; Used by unified storage
unifiedStorageSearch = true
; (optional) Allows you to sort dashboards by usage insights fields when using enterprise
; unifiedStorageSearchSprinkles = true
```

The dashboard search page has been set up to search unified storage. Additionally, all legacy search calls (e.g. `/api/search`) will go to
unified storage when the dual writer mode is set to 3 or greater. When the mode is 2 or lower, legacy search API calls go to legacy storage.

## Running load tests
Load tests and instructions can be found [here](https://github.com/grafana/grafana-api-tests/tree/main/simulation/src/unified_storage).

## Running with a distributor

For this deployment model, the storage-api server establishes a consistent hashing ring to distribute tenant requests. The distributor serves as the primary request router, mapping incoming traffic to the appropriate storage-api server based on tenant ID. When testing functionalities reliant on this sharded persistence layer, the following steps are mandatory.

### 0. Update your network interface to allow processes to bind to localhost addresses

For this setup to work, we need more than one instance of `storage-api` and at least one instance of the
`distributor` service. This step is required on macOS, which by default only allows processes to bind to `127.0.0.1` and not to
other loopback addresses such as `127.0.0.2`.

Run the command below in your terminal for every IP you want to enable:

```sh
sudo ifconfig lo0 alias <ip> up
```

### 1. Start MySQL DB

The storage server doesn't support `sqlite` so we need to have a dedicated external database. You can start one with
docker in case you don't have one:

```sh
docker run -d --name db -e "MYSQL_DATABASE=grafana" -e "MYSQL_USER=grafana" -e "MYSQL_PASSWORD=grafana" -e "MYSQL_ROOT_PASSWORD=root" -p 3306:3306 docker.io/bitnami/mysql:8.0.31
```

or use our mysql docker block:
```sh
make devenv sources=mysql
```

### 2. Create dedicated ini files for every service

Example distributor ini file:

* Bind grpc/http server to `127.0.0.1`
* Bind and join `memberlist` on `127.0.0.1:7946` (default memberlist port)

```ini
target = search-server-distributor

[server]
http_port = 3000
http_addr = "127.0.0.1"

[grpc_server]
network = "tcp"
address = "127.0.0.1:10000"

[grafana-apiserver]
storage_type = unified

[grpc_server_authentication]
signing_keys_url = http://localhost:3011/api/signing-keys/keys
mode = "on-prem"

[unified_storage]
enable_sharding = true
memberlist_bind_addr = "127.0.0.1"
memberlist_advertise_addr = "127.0.0.1"
memberlist_join_member = "127.0.0.1:7946"
```

Example unified storage ini file:

* Bind grpc/http server to `127.0.0.2`
* Configure MySQL database parameters
* Enable a few feature flags
* Give it a unique `instance_id` (defaults to hostname, so you need to define it locally)
* Bind `memberlist` to `127.0.0.2` and join the member on `127.0.0.1` (the distributor module above)

You can repeat the same configuration for many different storage-api instances by changing the bind address
from `127.0.0.2` to something else, eg `127.0.0.3`

```ini
target = storage-server

[server]
http_port = 3000
http_addr = "127.0.0.2"

[resource_api]
db_type = mysql
db_host = localhost:3306
db_name = grafana ; or whatever you defined in your currently running database
db_user = grafana ; or whatever you defined in your currently running database
db_pass = grafana ; or whatever you defined in your currently running database

[grpc_server]
network = "tcp"
address = "127.0.0.2:10000"

[grafana-apiserver]
storage_type = unified

[grpc_server_authentication]
signing_keys_url = http://localhost:3011/api/signing-keys/keys
mode = "on-prem"

[feature_toggles]
unifiedStorage = true
unifiedStorageSearch = true

[unified_storage]
enable_sharding = true
instance_id = node-0
memberlist_bind_addr = "127.0.0.2"
memberlist_advertise_addr = "127.0.0.2"
memberlist_join_member = "127.0.0.1:7946"
```

Example grafana ini file:

* Bind http server to `127.0.0.2`.
* Explicitly declare the sqlite db. This is so when you run a second instance they don't both try to use the same sqlite
  file.
* Configure the storage api client to talk to the distributor on `127.0.0.1:10000`
* Configure feature flags/modes as desired

Then repeat this configuration and change:

* the `stack_id` to something unique
* the database
* the bind address (so the browser can save the auth for every instance in a different cookie)
```ini
target = all

[environment]
stack_id = 1

[database]
type = sqlite3
name = grafana
user = root
path = grafana1.db

[grafana-apiserver]
address = 127.0.0.1:10000
storage_type = unified-grpc
search_server_address = 127.0.0.1:10000

[server]
protocol = http
http_port = 3011
http_addr = "127.0.0.2"

[feature_toggles]
unifiedStorageSearchUI = true

[unified_storage.dashboards.dashboard.grafana.app]
dualWriterMode = 3
[unified_storage.folders.folder.grafana.app]
dualWriterMode = 3
[unified_storage.playlists.playlist.grafana.app]
dualWriterMode = 4
```

### 3. Run the services

Build the backend:

```sh
GO_BUILD_DEV=1 make build-go
```

You will need a separate process for every service. Each one uses the same command with its own `ini` file. For
example, if you created a `distributor.ini` file in the `conf` directory, this is how you would run the distributor:

```sh
./bin/grafana server target --config conf/distributor.ini
```

Repeat for the other services.

```sh
./bin/grafana server target --config conf/storage-api-1.ini
./bin/grafana server target --config conf/storage-api-2.ini
./bin/grafana server target --config conf/storage-api-3.ini

./bin/grafana server --config conf/grafana1.ini
./bin/grafana server --config conf/grafana2.ini
./bin/grafana server --config conf/grafana3.ini
```

etc

### 4. Verify that it is working

If all is well, you will be able to visit every grafana stack you started and use it normally. Visit
`http://127.0.0.2:3011`, login with `admin`/`admin`, create some dashboards/folders, etc.

For debugging purposes, you can view the memberlist status by visiting `http://127.0.0.1:3000/memberlist` and check
that every instance you create is part of the memberlist.
You can also visit `http://127.0.0.1:3000/ring` to view the ring status and the storage-api servers that are part of the
ring.

---

## Dual Writer System

The Dual Writer system is a critical component of Unified Storage that manages the transition between legacy storage and unified storage during the migration process. It provides six different modes (0-5) that control how data is read from and written to both storage systems.

### Dual Writer Mode Reference Table

| Mode | Description | Read Source | Read Behavior | Write Targets | Write Behavior | Error Handling | Background Sync |
|------|-------------|-------------|---------------|---------------|----------------|----------------|-----------------|
| **0** | Disabled | Legacy Only | Synchronous | Legacy Only | Synchronous | Legacy errors bubble up | None |
| **1** | Legacy Primary + Best Effort Unified | Legacy Only | Legacy: Sync<br/>Unified: Async (background) | Legacy + Unified | Legacy: Sync<br/>Unified: Async (background) | Only legacy errors bubble up.<br/>Unified errors logged but ignored | Active - syncs legacy → unified |
| **2** | Legacy Primary + Unified Sync | Legacy Only | Legacy: Sync<br/>Unified: Sync (verification read) | Legacy + Unified | Legacy: Sync<br/>Unified: Sync | Legacy errors bubble up first.<br/>Unified errors bubble up (except NotFound which is ignored).<br/>If write succeeds in legacy but fails in unified, unified error bubbles up and legacy is cleaned up | Active - syncs legacy → unified |
| **3** | Unified Primary + Legacy Sync | Unified Primary | Unified: Sync<br/>Legacy: Fallback on NotFound | Legacy + Unified | Legacy: Sync<br/>Unified: Sync | Legacy errors bubble up first.<br/>If legacy succeeds but unified fails, unified error bubbles up and legacy is cleaned up | Prerequisite - only available after sync completes |
| **4** | Unified Only (Post-Sync) | Unified Only | Synchronous | Unified Only | Synchronous | Unified errors bubble up | Prerequisite - only available after sync completes |
| **5** | Unified Only (Force) | Unified Only | Synchronous | Unified Only | Synchronous | Unified errors bubble up | None - bypasses sync requirements |


### Dual Writer Architecture

The dual writer acts as an intermediary layer that sits between the API layer and the storage backends, routing read and write operations based on the configured mode.

```mermaid
graph TB
    subgraph "API Layer"
        A[REST API Request]
    end

    subgraph "Dual Writer Layer"
        B[Dual Writer]
        B --> C{Mode Decision}
    end

    subgraph "Storage Backends"
        D[Legacy Storage<br/>SQL Database]
        E[Unified Storage<br/>K8s-style Storage]
    end

    subgraph "Background Services"
        F[Data Syncer<br/>Background Job]
        G[Server Lock Service<br/>Distributed Lock]
    end

    A --> B
    C --> D
    C --> E
    F --> D
    F --> E
    F --> G
```
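
As a rough illustration of the mode decision, the sketch below maps each mode to the backend that serves reads and the backends that receive writes, following the reference table. `readSource` and `writeTargets` are hypothetical names used only for this example and do not come from the Grafana codebase.

```go
package main

import "fmt"

// Backend identifies one of the two storage backends behind the dual writer.
type Backend string

const (
	Legacy  Backend = "legacy"
	Unified Backend = "unified"
)

// readSource returns the backend whose result is returned to the caller.
// Hypothetical helper that mirrors the mode table, not Grafana's code.
func readSource(mode int) (Backend, error) {
	switch mode {
	case 0, 1, 2:
		return Legacy, nil
	case 3, 4, 5:
		return Unified, nil
	default:
		return "", fmt.Errorf("unknown dual writer mode %d", mode)
	}
}

// writeTargets returns the backends a write is sent to, in write order.
func writeTargets(mode int) ([]Backend, error) {
	switch mode {
	case 0:
		return []Backend{Legacy}, nil
	case 1, 2, 3:
		return []Backend{Legacy, Unified}, nil
	case 4, 5:
		return []Backend{Unified}, nil
	default:
		return nil, fmt.Errorf("unknown dual writer mode %d", mode)
	}
}

func main() {
	for mode := 0; mode <= 5; mode++ {
		src, _ := readSource(mode)
		targets, _ := writeTargets(mode)
		fmt.Printf("mode %d: read from %s, write to %v\n", mode, src, targets)
	}
}
```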

### Mode-Specific Data Flow Diagrams

#### Mode 0: Legacy Only (Disabled)
```mermaid
sequenceDiagram
    participant API as API Request
    participant DW as Dual Writer
    participant LS as Legacy Storage
    participant US as Unified Storage

    Note over DW: Mode 0 - Unified Storage Disabled

    API->>DW: Read/Write Request
    DW->>LS: Forward Request
    LS-->>DW: Response
    DW-->>API: Response

    Note over US: Not Used
```

#### Mode 1: Legacy Primary + Best Effort Unified
```mermaid
sequenceDiagram
    participant API as API Request
    participant DW as Dual Writer
    participant LS as Legacy Storage
    participant US as Unified Storage
    participant BG as Background Sync

    Note over DW: Mode 1 - Legacy Primary, Unified Best-Effort

    %% Read Operations
    API->>DW: Read Request
    DW->>LS: Read from Legacy
    LS-->>DW: Data
    DW->>US: Read from Unified (Background)
    Note over US: Errors ignored
    DW-->>API: Legacy Data

    %% Write Operations
    API->>DW: Write Request
    DW->>LS: Write to Legacy
    LS-->>DW: Success/Error
    alt Legacy Write Successful
        DW->>US: Write to Unified (Background)
        Note over US: Errors ignored
        DW-->>API: Legacy Result
    else Legacy Write Failed
        DW-->>API: Legacy Error
    end

    BG->>LS: Periodic Sync Check
    BG->>US: Sync Missing Data
```
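
The write path in the diagram can be sketched as follows: the legacy write is synchronous and its error is returned, while the unified write runs in the background and its error is only logged. `memStore` and `mode1Writer` are hypothetical in-memory stand-ins, not Grafana types.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// memStore is a hypothetical in-memory stand-in for a storage backend.
type memStore struct {
	mu   sync.Mutex
	data map[string]string
	fail bool // when true, every write fails
}

func newMemStore(fail bool) *memStore {
	return &memStore{data: map[string]string{}, fail: fail}
}

func (s *memStore) Write(key, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.fail {
		return errors.New("write failed")
	}
	s.data[key] = value
	return nil
}

func (s *memStore) Has(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.data[key]
	return ok
}

// mode1Writer writes to legacy synchronously and to unified in the background.
type mode1Writer struct {
	legacy, unified *memStore
	background      sync.WaitGroup
}

func (w *mode1Writer) Write(key, value string) error {
	if err := w.legacy.Write(key, value); err != nil {
		return err // legacy errors bubble up; unified is not attempted
	}
	w.background.Add(1)
	go func() {
		defer w.background.Done()
		if err := w.unified.Write(key, value); err != nil {
			fmt.Println("unified write failed (ignored):", err)
		}
	}()
	return nil
}

// Wait blocks until pending background writes have finished.
func (w *mode1Writer) Wait() { w.background.Wait() }

func main() {
	w := &mode1Writer{legacy: newMemStore(false), unified: newMemStore(true)}
	err := w.Write("playlist-1", "{}")
	w.Wait()
	fmt.Println("error returned to caller:", err)
}
```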

#### Mode 2: Legacy Primary + Unified Sync
```mermaid
sequenceDiagram
    participant API as API Request
    participant DW as Dual Writer
    participant LS as Legacy Storage
    participant US as Unified Storage
    participant BG as Background Sync

    Note over DW: Mode 2 - Legacy Primary, Unified Synchronous

    %% Read Operations
    API->>DW: Read Request
    DW->>LS: Read from Legacy
    LS-->>DW: Data
    DW->>US: Verification Read (Foreground)
    Note over US: Verifies unified storage can serve the same object
    US-->>DW: Success/Error
    alt Verification Read Failed (Non-NotFound)
        DW-->>API: Unified Error
    else Verification Read Success or NotFound
        DW-->>API: Legacy Data
    end

    %% Write Operations
    API->>DW: Write Request
    DW->>LS: Write to Legacy
    LS-->>DW: Success/Error
    alt Legacy Write Successful
        DW->>US: Write to Unified (Foreground)
        US-->>DW: Success/Error
        alt Unified Write Failed
            DW->>LS: Cleanup Legacy (Best Effort)
            DW-->>API: Unified Error
        else Both Writes Successful
            DW-->>API: Legacy Result
        end
    else Legacy Write Failed
        DW-->>API: Legacy Error
    end

    BG->>LS: Periodic Sync Check
    BG->>US: Sync Missing Data
```
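
A minimal sketch of the Mode 2 create path, assuming simple in-memory stores: both writes are synchronous, and a failed unified write triggers a best-effort cleanup of the object just written to legacy. The names `memStore`, `createMode2`, and `errUnavailable` are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable is a hypothetical failure used to simulate a broken backend.
var errUnavailable = errors.New("storage unavailable")

// memStore is a hypothetical in-memory stand-in for a storage backend.
type memStore struct {
	data map[string]string
	err  error // when set, Create fails with this error
}

func newMemStore() *memStore { return &memStore{data: map[string]string{}} }

func (s *memStore) Create(key, value string) error {
	if s.err != nil {
		return s.err
	}
	s.data[key] = value
	return nil
}

func (s *memStore) Delete(key string) error {
	delete(s.data, key)
	return nil
}

// createMode2 writes to legacy first, then to unified, both in the foreground.
func createMode2(legacy, unified *memStore, key, value string) error {
	if err := legacy.Create(key, value); err != nil {
		return fmt.Errorf("legacy: %w", err)
	}
	if err := unified.Create(key, value); err != nil {
		// Best-effort cleanup: a cleanup failure is logged, not returned.
		if cleanupErr := legacy.Delete(key); cleanupErr != nil {
			fmt.Println("legacy cleanup failed (ignored):", cleanupErr)
		}
		return fmt.Errorf("unified: %w", err)
	}
	return nil
}

func main() {
	legacy, unified := newMemStore(), newMemStore()
	unified.err = errUnavailable
	err := createMode2(legacy, unified, "playlist-1", "{}")
	fmt.Printf("error: %v, objects left in legacy: %d\n", err, len(legacy.data))
}
```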

#### Mode 3: Unified Primary + Legacy Sync
```mermaid
sequenceDiagram
    participant API as API Request
    participant DW as Dual Writer
    participant LS as Legacy Storage
    participant US as Unified Storage

    Note over DW: Mode 3 - Unified Primary, Legacy Sync
    Note over DW: Only activated after background sync succeeds

    %% Read Operations
    API->>DW: Read Request
    DW->>US: Read from Unified
    US-->>DW: Data/Error
    alt Unified Read NotFound
        DW->>LS: Fallback to Legacy
        LS-->>DW: Data/Error
        DW-->>API: Legacy Result
    else Unified Read Success
        DW-->>API: Unified Data
    end

    %% Write Operations
    API->>DW: Write Request
    DW->>LS: Write to Legacy
    LS-->>DW: Success/Error
    alt Legacy Write Successful
        DW->>US: Write to Unified
        US-->>DW: Success/Error
        alt Unified Write Failed
            DW->>LS: Cleanup Legacy (Best Effort)
            DW-->>API: Unified Error
        else Both Writes Successful
            DW-->>API: Unified Result
        end
    else Legacy Write Failed
        DW-->>API: Legacy Error
    end
```
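
The read fallback in Mode 3 can be sketched as below, under the assumption that both backends report a missing object with a shared `ErrNotFound` sentinel. `Reader`, `mapReader`, and `getMode3` are hypothetical names for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is an assumed sentinel for "object does not exist".
var ErrNotFound = errors.New("not found")

// Reader is a hypothetical minimal read interface.
type Reader interface {
	Get(key string) (string, error)
}

// mapReader is an in-memory Reader used only for this sketch.
type mapReader map[string]string

func (m mapReader) Get(key string) (string, error) {
	value, ok := m[key]
	if !ok {
		return "", ErrNotFound
	}
	return value, nil
}

// getMode3 reads from unified first and falls back to legacy only on NotFound.
func getMode3(unified, legacy Reader, key string) (string, error) {
	value, err := unified.Get(key)
	switch {
	case err == nil:
		return value, nil
	case errors.Is(err, ErrNotFound):
		return legacy.Get(key) // fallback: the legacy result is returned as-is
	default:
		return "", fmt.Errorf("unified read: %w", err)
	}
}

func main() {
	unified := mapReader{"folder-a": "from unified"}
	legacy := mapReader{"folder-a": "from legacy", "folder-b": "from legacy"}
	for _, key := range []string{"folder-a", "folder-b", "folder-c"} {
		value, err := getMode3(unified, legacy, key)
		fmt.Printf("%s: %q (err=%v)\n", key, value, err)
	}
}
```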

#### Mode 4 & 5: Unified Only
```mermaid
sequenceDiagram
    participant API as API Request
    participant DW as Dual Writer
    participant LS as Legacy Storage
    participant US as Unified Storage

    Note over DW: Mode 4/5 - Unified Only
    Note over DW: Mode 4: After background sync succeeds
    Note over DW: Mode 5: Ignores background sync state

    API->>DW: Read/Write Request
    DW->>US: Forward Request
    US-->>DW: Response
    DW-->>API: Response

    Note over LS: Not Used
```

### Background Sync Behavior

The background sync service runs periodically (default: every hour) and is responsible for:

1. **Data Synchronization**: Ensures legacy and unified storage contain the same data
2. **Mode Progression**: Enables transition from Mode 2 → Mode 3 → Mode 4
3. **Conflict Resolution**: Handles cases where data exists in one storage but not the other

#### Sync Process Flow

```mermaid
flowchart TD
    A[Background Sync Trigger] --> B{Current Mode}

    B -->|Mode 1/2| C[Acquire Distributed Lock]
    B -->|Mode 3+| Z[No Sync Needed]

    C --> D[List Legacy Storage Items]
    D --> E[List Unified Storage Items]
    E --> F[Compare All Items]

    F --> G{Item Comparison}

    G -->|Missing in Unified| H[Create in Unified]
    G -->|Missing in Legacy| I[Delete from Unified]
    G -->|Different Content| J[Update Unified with Legacy Version]
    G -->|Identical| K[No Action Needed]

    H --> L[Track Sync Success]
    I --> L
    J --> L
    K --> L

    L --> M{All Items Synced?}
    M -->|Yes| N[Mark Sync Complete<br/>Enable Mode Progression]
    M -->|No| O[Log Failures<br/>Retry Next Cycle]

    N --> P[Release Lock]
    O --> P
    Z --> P
```
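
The comparison step of the flow above can be sketched as a pure function over two name-to-content maps, with legacy treated as the source of truth. This is a simplified illustration: `planSync`, `describe`, and the `Action` values are hypothetical, and locking and batching are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Action is the change to apply to unified storage for one item.
type Action string

const (
	CreateInUnified   Action = "create"
	DeleteFromUnified Action = "delete"
	UpdateUnified     Action = "update"
)

// planSync compares both stores and returns the action needed per item name.
// Items that are identical in both stores do not appear in the plan.
func planSync(legacy, unified map[string]string) map[string]Action {
	plan := map[string]Action{}
	for name, want := range legacy {
		got, ok := unified[name]
		switch {
		case !ok:
			plan[name] = CreateInUnified
		case got != want:
			plan[name] = UpdateUnified
		}
	}
	for name := range unified {
		if _, ok := legacy[name]; !ok {
			plan[name] = DeleteFromUnified
		}
	}
	return plan
}

// describe renders the plan as sorted "name: action" lines.
func describe(plan map[string]Action) []string {
	names := make([]string, 0, len(plan))
	for name := range plan {
		names = append(names, name)
	}
	sort.Strings(names)
	lines := make([]string, 0, len(names))
	for _, name := range names {
		lines = append(lines, fmt.Sprintf("%s: %s", name, plan[name]))
	}
	return lines
}

func main() {
	legacy := map[string]string{"a": "v1", "b": "v2", "c": "v3"}
	unified := map[string]string{"b": "old", "c": "v3", "d": "v4"}
	for _, line := range describe(planSync(legacy, unified)) {
		fmt.Println(line)
	}
}
```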

#### Mode Transition Requirements

- **Mode 0 → Mode 1**: Configuration change only
- **Mode 1 → Mode 2**: Configuration change only
- **Mode 2 → Mode 3**: Requires successful background sync completion
- **Mode 3 → Mode 4**: Requires successful background sync completion
- **Mode 4 → Mode 5**: Configuration change only
- **Any Mode → Mode 5**: Configuration change only (bypasses sync requirements)

### Error Handling Strategies

#### Write Operation Error Priority
1. **Legacy Storage Errors**: Always bubble up immediately if legacy write fails
2. **Unified Storage Errors**:
   - Mode 1: Logged but ignored
   - Mode 2+: Bubble up after legacy cleanup attempt
3. **Cleanup Operations**: Best effort - failures are logged but don't fail the original operation

#### Read Operation Fallback
- **Mode 2**: `NotFound` errors from unified storage are ignored (object may not be synced yet), but other errors bubble up
- **Mode 3**: If unified storage returns `NotFound`, automatically falls back to legacy storage
- **Other Modes**: No fallback - errors bubble up directly

### Configuration

#### Setting Dual Writer Mode
```ini
[unified_storage.{resource}.{kind}.{group}]
dualWriterMode = {0-5}
```

#### Background Sync Configuration
```ini
[unified_storage]
; Enable data sync between legacy and unified storage
enable_data_sync = true

; Sync interval (default: 1 hour)
data_sync_interval = 1h

; Maximum records to sync per run (default: 1000)
data_sync_records_limit = 1000

; Skip data sync requirement for mode transitions
skip_data_sync = false
```
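
To make the interplay between these settings and the sync-dependent modes concrete, here is a small sketch. It assumes that modes 3 and 4 are only permitted once a sync has completed unless `skip_data_sync` is set, as the reference table and the comments above suggest; how Grafana combines these settings internally is not shown here, and `SyncConfig`, `newSyncConfig`, and `modeAllowed` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// SyncConfig mirrors the [unified_storage] sync settings shown above.
type SyncConfig struct {
	EnableDataSync bool
	Interval       time.Duration
	RecordsLimit   int
	SkipDataSync   bool
}

// newSyncConfig parses the interval string (for example "1h") into a duration.
func newSyncConfig(enable bool, interval string, limit int, skip bool) (SyncConfig, error) {
	d, err := time.ParseDuration(interval)
	if err != nil {
		return SyncConfig{}, fmt.Errorf("data_sync_interval: %w", err)
	}
	return SyncConfig{EnableDataSync: enable, Interval: d, RecordsLimit: limit, SkipDataSync: skip}, nil
}

// modeAllowed reports whether a resource may run in the given mode.
// Assumption: modes 3 and 4 need a completed sync unless skip_data_sync is set.
func modeAllowed(mode int, syncComplete bool, cfg SyncConfig) (bool, error) {
	switch {
	case mode < 0 || mode > 5:
		return false, fmt.Errorf("unknown dual writer mode %d", mode)
	case mode == 3 || mode == 4:
		return syncComplete || cfg.SkipDataSync, nil
	default:
		return true, nil
	}
}

func main() {
	cfg, err := newSyncConfig(true, "1h", 1000, false)
	if err != nil {
		panic(err)
	}
	for _, mode := range []int{2, 3, 5} {
		ok, _ := modeAllowed(mode, false, cfg)
		fmt.Printf("mode %d allowed before sync completes: %v\n", mode, ok)
	}
}
```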

### Monitoring and Observability

The dual writer system provides metrics for monitoring:

- `dual_writer_requests_total`: Counter of requests by mode, operation, and status
- `dual_writer_sync_duration_seconds`: Histogram of background sync duration
- `dual_writer_sync_success_total`: Counter of successful sync operations
- `dual_writer_mode_transitions_total`: Counter of mode transitions

Use these metrics to monitor the health of your migration and identify any issues with the dual writer system.

---

## Unified Search System

The Unified Search system provides a scalable, distributed search capability for Grafana's Unified Storage. It uses a ring-based architecture to distribute search requests across multiple search server instances, with namespace-based sharding for optimal performance and data distribution.

### System Architecture

The search system provides both unified and legacy search capabilities, with routing based on dual writer mode configuration:

```mermaid
graph TB
    subgraph "Request Sources"
        A[Grafana UI<br/>User Search]
        B[Dashboard Service<br/>Resource Searches]
        C[Folder Service<br/>Resource Searches]
        D[Alerting Service<br/>Resource Searches]
        E[Provisioning Service<br/>Resource Searches]
        F[API Endpoints<br/>Search Operations]
    end

    subgraph "API Gateway Layer"
        G[Grafana API Server<br/>Search Endpoint]
        H[Search Client<br/>Dual Writer Aware]
    end

    subgraph "Routing Decision"
        I{Dual Writer Mode<br/>Check}
    end

    subgraph "Unified Search Path (Mode 3+)"
        J[Search Distributor<br/>Ring-based Routing]
        K[Ring<br/>Consistent Hashing]
        L[Search API Server 1<br/>Namespace Sharding<br/>+ Embedded Bleve Backend]
        M[Search API Server 2<br/>Namespace Sharding<br/>+ Embedded Bleve Backend]
        N[Search API Server 3<br/>Namespace Sharding<br/>+ Embedded Bleve Backend]
        O[Unified Storage<br/>K8s-style Resources]
    end

    subgraph "Legacy Search Path (Mode 0-2)"
        Q[Legacy Search Service<br/>SQL-based Search]
        R[Legacy Storage<br/>Traditional Tables]
        S[Shadow Traffic<br/>Mode 1-2 + Flag Enabled]
    end

    A --> G
    B --> H
    C --> H
    D --> H
    E --> H
    F --> G
    G --> H
    H --> I

    I -->|Mode 3+| J
    I -->|Mode 0-2| Q

    J --> K
    K --> L
    K --> M
    K --> N
    L -.->|Just-in-Time<br/>Indexing| O
    M -.->|Just-in-Time<br/>Indexing| O
    N -.->|Just-in-Time<br/>Indexing| O

    Q --> R
    Q -.->|Shadow Traffic<br/>Mode 1-2 + Flag| S
    S -.-> J
```

### Search Backend Routing

The search client routes requests based on the dual writer mode configuration for each resource type:

#### Dual Writer Mode → Backend Routing
- **Mode 0-2**: Route to **Legacy Search**
  - Mode 1-2: Shadow traffic to Unified Search (if `unifiedStorageSearchDualReaderEnabled` is enabled)
  - Mode 0: No shadow traffic
- **Mode 3+**: Route to **Unified Search**
  - No shadow traffic needed (unified is primary)

### Feature Flags

Unified Search requires several feature flags to be enabled depending on the desired functionality:

#### Prerequisites (Required for Unified Storage)

| Feature Flag | Purpose | Stage | Required For |
|--------------|---------|-------|--------------|
| `grafanaAPIServerWithExperimentalAPIs` | Allow experimental API groups | Development | Access to v0alpha1 APIs (including search) |

#### Unified Search Specific Flags

| Feature Flag | Purpose | Stage | Required For |
|--------------|---------|-------|--------------|
| `unifiedStorageSearch` | Core search functionality | Experimental | Search API servers, indexing |
| `unifiedStorageSearchUI` | Frontend search interface | Experimental | Grafana UI search |
| `unifiedStorageSearchSprinkles` | Usage insights integration | Experimental | Dashboard usage sorting (Enterprise) |
| `unifiedStorageSearchDualReaderEnabled` | Shadow traffic to unified search | Experimental | Shadow traffic during migration |

#### Basic Configuration
```ini
[feature_toggles]
; Prerequisites for unified storage (required)
grafanaAPIServerWithExperimentalAPIs = true

; Core search functionality (required)
unifiedStorageSearch = true

; Enable search UI (required for frontend)
unifiedStorageSearchUI = true

; Enable shadow traffic during migration (optional)
unifiedStorageSearchDualReaderEnabled = true

; Enable usage insights sorting (Enterprise only)
unifiedStorageSearchSprinkles = true
```

### Request Flow Diagrams

#### Search Request Flow with Dual Writer Mode Routing

Search requests originate from multiple sources, and the search client routes based on dual writer mode configuration:

```mermaid
flowchart TD
    A[Search Request] --> B{Dual Writer Mode}
    B -->|Mode 3+| C[Unified Search]
    B -->|Mode 0-2| D[Legacy Search]
    C --> E[Return Results]
    D --> E
```

#### Search Request Flow with Shadow Traffic

When `unifiedStorageSearchDualReaderEnabled` is enabled and the resource is in dual writer Mode 1-2 (legacy primary), shadow traffic is generated:

```mermaid
flowchart TD
    A[Search Request] --> B{Shadow Traffic Enabled?}
    B -->|Yes| C[Primary: Legacy Search]
    B -->|No| D[Single Search Path]
    C --> E[Background: Unified Search]
    C --> F[Return Legacy Results]
    E --> G[Log Results for Comparison]
    D --> H[Return Results]
```
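
A minimal sketch of this routing, assuming both search backends can be modelled as plain functions: Mode 3+ goes straight to unified search, while Modes 1-2 return legacy results and, if the flag is on, also query unified search in the background for comparison. `Searcher` and `SearchFunc` are hypothetical types, not Grafana's search client.

```go
package main

import (
	"fmt"
	"sync"
)

// SearchFunc is a hypothetical signature for a search backend.
type SearchFunc func(query string) ([]string, error)

// Searcher routes queries based on dual writer mode and the shadow flag.
type Searcher struct {
	Mode          int
	ShadowEnabled bool // stands in for unifiedStorageSearchDualReaderEnabled
	Legacy        SearchFunc
	Unified       SearchFunc
	shadow        sync.WaitGroup
}

func (s *Searcher) Search(query string) ([]string, error) {
	if s.Mode >= 3 {
		return s.Unified(query) // unified is primary, no shadow traffic
	}
	if s.ShadowEnabled && s.Mode >= 1 {
		s.shadow.Add(1)
		go func() {
			defer s.shadow.Done()
			// Shadow results are only logged; they never reach the caller.
			hits, err := s.Unified(query)
			fmt.Printf("shadow: unified returned %d hits (err=%v)\n", len(hits), err)
		}()
	}
	return s.Legacy(query)
}

// WaitShadow blocks until in-flight shadow queries have finished.
func (s *Searcher) WaitShadow() { s.shadow.Wait() }

func main() {
	s := &Searcher{
		Mode:          2,
		ShadowEnabled: true,
		Legacy:        func(q string) ([]string, error) { return []string{"legacy:" + q}, nil },
		Unified:       func(q string) ([]string, error) { return []string{"unified:" + q}, nil },
	}
	hits, err := s.Search("cpu")
	s.WaitShadow()
	fmt.Println(hits, err)
}
```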

### Distributor Architecture

The Search Distributor acts as a smart proxy that routes search requests to the appropriate search API server based on namespace hashing:

```mermaid
flowchart TD
    A[Incoming Search Request] --> B[Hash Namespace]
    B --> C[Select Random Instance]
    C --> D[Proxy Request]
    D --> E[Return Response]
```

#### Key Features:
- **Namespace-based routing**: Each request is routed based on the target namespace
- **Load balancing**: Random selection among available replicas for the namespace
- **Health awareness**: Only routes to `ACTIVE` ring instances
- **Connection pooling**: Reuses gRPC connections for efficiency
- **Proxy headers**: Adds metadata for debugging and tracing

### Ring Architecture

The hash ring provides consistent, distributed assignment of namespaces to search API servers:

```mermaid
flowchart TD
    A[Namespace] --> B[Hash Function]
    B --> C[Ring Position]
    C --> D[Assigned Instance]
    D --> E[Search Processing]
```
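
The following toy ring shows the idea using only the standard library: every instance owns a fixed number of tokens, and a namespace belongs to the instance holding the first token at or after the namespace's hash. It is a simplified sketch, not the ring implementation Grafana uses. Deriving tokens from the instance name is an assumption made here to keep the example deterministic, and replication is left out.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

const tokensPerInstance = 128

type token struct {
	value    uint32
	instance string
}

// Ring is a toy consistent-hash ring used only for illustration.
type Ring struct {
	tokens []token // sorted by value, then by instance name
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error
	return h.Sum32()
}

// NewRing places tokensPerInstance tokens on the ring for every instance.
func NewRing(instances ...string) *Ring {
	r := &Ring{}
	for _, inst := range instances {
		for i := 0; i < tokensPerInstance; i++ {
			r.tokens = append(r.tokens, token{value: hash32(fmt.Sprintf("%s-%d", inst, i)), instance: inst})
		}
	}
	sort.Slice(r.tokens, func(i, j int) bool {
		if r.tokens[i].value != r.tokens[j].value {
			return r.tokens[i].value < r.tokens[j].value
		}
		return r.tokens[i].instance < r.tokens[j].instance
	})
	return r
}

// Owner returns the instance holding the first token at or after the
// namespace hash, wrapping around at the end of the ring.
func (r *Ring) Owner(namespace string) string {
	if len(r.tokens) == 0 {
		return ""
	}
	key := hash32(namespace)
	i := sort.Search(len(r.tokens), func(i int) bool { return r.tokens[i].value >= key })
	if i == len(r.tokens) {
		i = 0
	}
	return r.tokens[i].instance
}

func main() {
	ring := NewRing("node-0", "node-1", "node-2")
	for _, ns := range []string{"default", "stacks-1", "stacks-2"} {
		fmt.Printf("%s -> %s\n", ns, ring.Owner(ns))
	}
}
```

The property that makes this "consistent" is that adding an instance only moves namespaces onto the new instance; assignments between existing instances stay put.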

#### Ring Properties:
- **Consistent hashing**: Uses FNV32 hash function for namespace distribution
- **128 tokens per instance**: Provides good distribution across the ring
- **Replication factor**: Configurable redundancy (default based on cluster size)
- **State management**: Instances transition through JOINING → ACTIVE → LEAVING
- **Automatic rebalancing**: Ring adjusts when instances join/leave

### Namespace-Based Sharding

Unified Search uses namespace-based sharding to distribute search indexes across multiple search API servers:

```mermaid
flowchart LR
    A[Namespaces] --> B[Hash Ring]
    B --> C[Search Server 1]
    B --> D[Search Server 2]
    B --> E[Search Server 3]
    C --> F[Indexes for Assigned Namespaces]
    D --> G[Indexes for Assigned Namespaces]
    E --> H[Indexes for Assigned Namespaces]
```

#### Sharding Benefits:
1. **Horizontal scalability**: Add more search servers to handle more namespaces
2. **Resource isolation**: Each namespace's index is independent
3. **Parallel processing**: Multiple searches can run simultaneously across different servers
4. **Fault tolerance**: Namespace availability depends only on its assigned server(s)

#### Sharding Algorithm:
```go
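// Simplified sketch: r is assumed to be a *ring.Ring from github.com/grafana/dskit/ring
// and searchRingRead a ring.Operation defined elsewhere. Also needs "hash/fnv" and "math/rand".
func getSearchServer(r *ring.Ring, namespace string) (string, error) {
    // Hash the namespace with 32-bit FNV-1a to get its position on the ring
    hash := fnv.New32a()
    hash.Write([]byte(namespace))

    // Get replication set from ring
    replicationSet, err := r.GetWithOptions(
        hash.Sum32(),
        searchRingRead,
        ring.WithReplicationFactor(r.ReplicationFactor()),
    )
    if err != nil {
        return "", err
    }

    // Random load balancing within replication set
    instance := replicationSet.Instances[rand.Intn(len(replicationSet.Instances))]
    return instance.Id, nil
}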
```

### Search Index Management

Each search API server contains an embedded Bleve search engine that manages indexes for its assigned namespaces:

```mermaid
flowchart TD
    A[Search Request] --> B{Index Ready?}
    B -->|Yes| C[Query Index]
    B -->|No| D[Build Index]
    D --> E[Fetch Resources]
    E --> F[Create Index]
    F --> C
    C --> G[Return Results]

    H[Resource Changes] --> I[Update Index]
    I --> F
```
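
The "build on first use" part of this flow can be sketched as follows. A slice of titles stands in for a real Bleve index, and `indexManager` is a hypothetical type; incremental updates from resource changes are omitted to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// index is a stand-in for a real search index: just a list of titles.
type index struct{ titles []string }

func (ix *index) search(term string) []string {
	var hits []string
	for _, title := range ix.titles {
		if strings.Contains(strings.ToLower(title), strings.ToLower(term)) {
			hits = append(hits, title)
		}
	}
	return hits
}

// indexManager builds one index per namespace on first use.
type indexManager struct {
	mu      sync.Mutex
	indexes map[string]*index
	fetch   func(namespace string) []string // loads all resources for a namespace
	builds  int                             // number of indexes built so far
}

func newIndexManager(fetch func(namespace string) []string) *indexManager {
	return &indexManager{indexes: map[string]*index{}, fetch: fetch}
}

func (m *indexManager) Search(namespace, term string) []string {
	m.mu.Lock()
	ix, ok := m.indexes[namespace]
	if !ok {
		// Index not ready: fetch resources and build it before querying.
		fmt.Printf("building index for namespace %q\n", namespace)
		ix = &index{titles: m.fetch(namespace)}
		m.indexes[namespace] = ix
		m.builds++
	}
	m.mu.Unlock()
	return ix.search(term)
}

func main() {
	m := newIndexManager(func(namespace string) []string {
		return []string{"CPU usage", "Memory", "cpu temperature"}
	})
	fmt.Println(m.Search("default", "cpu"))
	fmt.Println(m.Search("default", "memory")) // reuses the index built above
}
```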

#### Index Architecture Details:

**Embedded Bleve Backend**: Each Search API Server contains its own Bleve search engine instance, not a shared external service.

**Just-in-Time Indexing**: When a search request arrives for a namespace that doesn't have an index (or has an outdated index):
1. The Search API Server fetches all resources for that namespace from Unified Storage
2. Builds search documents in memory
3. Creates either a memory-based or disk-based Bleve index depending on size
4. Executes the search query against the newly built index
5. Returns results to the user
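
The steps above can be sketched as a lazily built, per-namespace index. This is an illustration only: `lazyIndexer`, `namespaceIndex`, and the `fetch` callback are hypothetical names, and a plain slice of titles stands in for a Bleve index.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// namespaceIndex is a hypothetical in-memory stand-in for a Bleve index.
type namespaceIndex struct {
	titles []string
}

// search returns the titles that contain the query (case-insensitive).
func (idx *namespaceIndex) search(query string) []string {
	var hits []string
	q := strings.ToLower(query)
	for _, t := range idx.titles {
		if strings.Contains(strings.ToLower(t), q) {
			hits = append(hits, t)
		}
	}
	return hits
}

// lazyIndexer builds a namespace index on the first search that needs it.
type lazyIndexer struct {
	mu      sync.Mutex
	indexes map[string]*namespaceIndex
	fetch   func(namespace string) []string // stands in for reading Unified Storage
	builds  int                             // number of index builds, for illustration
}

func newLazyIndexer(fetch func(string) []string) *lazyIndexer {
	return &lazyIndexer{indexes: map[string]*namespaceIndex{}, fetch: fetch}
}

func (l *lazyIndexer) search(namespace, query string) []string {
	l.mu.Lock()
	idx, ok := l.indexes[namespace]
	if !ok {
		// Steps 1-3: fetch resources, build documents, create the index.
		// Holding one lock during the build keeps the sketch short, but it
		// serializes builds across namespaces.
		idx = &namespaceIndex{titles: l.fetch(namespace)}
		l.indexes[namespace] = idx
		l.builds++
	}
	l.mu.Unlock()
	// Steps 4-5: query the index and return the results.
	return idx.search(query)
}

func main() {
	indexer := newLazyIndexer(func(ns string) []string {
		return []string{"Node Monitoring", "Billing", "API monitoring"}
	})
	fmt.Println(indexer.search("default", "monitoring"))
	fmt.Println(indexer.search("default", "billing"))
	fmt.Println("index builds:", indexer.builds)
}
```

Only the first search for a namespace pays the build cost; later searches reuse the cached index. The sketch does not model outdated indexes.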

**Index Storage Strategy**:
- **Memory indexes**: For small datasets (< `index_file_threshold` documents)
- **Disk indexes**: For large datasets (≥ `index_file_threshold` documents)
- Indexes are stored per Search API Server instance, not globally shared

**Background Updates**: In addition to just-in-time indexing, Search API Servers also maintain indexes through background watch events for incremental updates.

#### Index Configuration:
```ini
[unified_storage]
; Path for disk-based search indexes
index_path = /var/lib/grafana/unified-search/bleve

; Threshold for file-based vs memory indexes
index_file_threshold = 1000

; Maximum batch size for indexing
index_max_batch_size = 100

; Number of worker threads for indexing
index_workers = 4

; Cache TTL for indexes
index_cache_ttl = 1h

; Periodic rebuild interval (for usage insights)
index_rebuild_interval = 24h

; Minimum resource count required to build an index (default: 1)
; If a namespace has fewer resources than this threshold, no index will be created
index_min_count = 1

; Maximum resource count before creating an empty index (default: 0 = no limit)
; When exceeded, creates an empty index instead of indexing all resources for performance
index_max_count = 0
```
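
To make the interaction between `index_min_count`, `index_max_count`, and `index_file_threshold` concrete, here is a small sketch of the decision they describe. `indexConfig` and `chooseIndexKind` are hypothetical names, and the order in which the checks are applied is an assumption rather than something taken from the implementation.

```go
package main

import "fmt"

// indexKind describes what gets created for a namespace.
type indexKind string

const (
	kindNone   indexKind = "none"   // fewer resources than index_min_count
	kindEmpty  indexKind = "empty"  // more resources than index_max_count
	kindMemory indexKind = "memory" // below index_file_threshold
	kindFile   indexKind = "file"   // at or above index_file_threshold
)

// indexConfig mirrors three of the settings above (hypothetical struct).
type indexConfig struct {
	FileThreshold int // index_file_threshold
	MinCount      int // index_min_count
	MaxCount      int // index_max_count, 0 means no limit
}

// chooseIndexKind applies the documented thresholds to a resource count.
// The order of the checks is an assumption made for this sketch.
func chooseIndexKind(cfg indexConfig, resourceCount int) indexKind {
	switch {
	case resourceCount < cfg.MinCount:
		return kindNone
	case cfg.MaxCount > 0 && resourceCount > cfg.MaxCount:
		return kindEmpty
	case resourceCount >= cfg.FileThreshold:
		return kindFile
	default:
		return kindMemory
	}
}

func main() {
	cfg := indexConfig{FileThreshold: 1000, MinCount: 1, MaxCount: 0}
	for _, n := range []int{0, 50, 1000} {
		fmt.Printf("%d resources -> %s\n", n, chooseIndexKind(cfg, n))
	}
}
```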

### Search Request Sources

Unified Search serves multiple types of consumers within the Grafana ecosystem:

#### 1. User-Initiated Searches
- **Source**: Grafana UI search interface
- **Purpose**: Interactive dashboard and folder discovery
- **Characteristics**: Real-time, user-facing, latency-sensitive
- **Endpoint**: `/api/v1/search` (legacy search UI) or `/apis/dashboard.grafana.app/v0alpha1/namespaces/{namespace}/search` (when `unifiedStorageSearchUI` is enabled)

#### 2. Internal Service Searches

Internal services use different search backends depending on dual writer mode configuration:

- **Dashboard Service**:
  - Find related dashboards based on tags, folders, or content
  - Discover dashboards for playlist creation
  - Validate dashboard references during operations
  - **Backend**: Depends on dashboard dual writer mode (Legacy for Mode 0-2, Unified for Mode 3+)

- **Folder Service**:
  - Retrieve folder contents and nested structures
  - Resolve folder hierarchy relationships
  - Check folder permissions and accessibility
  - **Backend**: Depends on folder dual writer mode (Legacy for Mode 0-2, Unified for Mode 3+)

- **Alerting Service**:
  - Discover dashboards and panels for alert rule creation
  - Find existing alert rules across namespaces
  - Resolve dashboard/panel references in alert definitions
  - **Backend**: Mixed. Dashboard searches follow the dashboard dual writer mode, while alert rule searches typically use legacy

- **Provisioning Service**:
  - Check for existing resources before provisioning
  - Validate resource uniqueness and naming conflicts
  - Discover resources for bulk operations
  - **Backend**: Depends on each resource type's dual writer mode configuration

- **API Services**:
  - Backend support for various API endpoints
  - Resource validation and dependency checking
  - **Backend**: Routes based on resource type's dual writer mode
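
The routing rule shared by these services (legacy for modes 0-2, unified for mode 3 and above) can be sketched as below. `searchBackend` and the `modes` map are hypothetical; the map keys reuse the `[unified_storage.*]` section names from the configuration examples earlier in this document.

```go
package main

import "fmt"

// searchBackend returns which search backend serves a resource type,
// based on that resource's dual writer mode. Resources without an entry
// fall back to mode 0, the default (disabled) mode.
func searchBackend(modes map[string]int, resource string) string {
	if modes[resource] >= 3 {
		return "unified"
	}
	return "legacy"
}

func main() {
	// Hypothetical per-resource dualWriterMode values.
	modes := map[string]int{
		"dashboards.dashboard.grafana.app": 2,
		"folders.folder.grafana.app":       4,
	}
	for _, r := range []string{
		"dashboards.dashboard.grafana.app",
		"folders.folder.grafana.app",
		"playlists.playlist.grafana.app",
	} {
		fmt.Printf("%s -> %s\n", r, searchBackend(modes, r))
	}
}
```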

#### 3. Search Operation Types

Unified Search supports multiple types of search operations:

##### Resource Search
- **Purpose**: Find resources (dashboards, folders, etc.) by content
- **Endpoint**: `/api/v1/search` (legacy) or `/apis/dashboard.grafana.app/v0alpha1/namespaces/{namespace}/search` (when `unifiedStorageSearchUI` is enabled)
- **Additional endpoint**: `/apis/dashboard.grafana.app/v0alpha1/namespaces/{namespace}/search/sortable` for retrieving sortable fields
- **Features**: Full-text search, filtering, sorting

**Sortable Fields:**

The `/search/sortable` endpoint currently returns a limited static list:
```json
{
  "fields": [
    {"field": "title", "display": "Title (A-Z)", "type": "string"},
    {"field": "-title", "display": "Title (Z-A)", "type": "string"}
  ]
}
```

However, the search backend actually supports sorting by many more fields:

**Standard Fields:**
- `title` - Resource display name (uses `title_phrase` for exact sorting)
- `name` - Kubernetes resource name
- `description` - Resource description
- `folder` - Parent folder name
- `created` - Creation timestamp (int64)
- `updated` - Last update timestamp (int64)
- `createdBy` - Creator user ID
- `updatedBy` - Last updater user ID
- `tags` - Resource tags (array)
- `rv` - Resource version (int64)

**Dashboard-Specific Fields** (require `fields.` prefix):
- `fields.schema_version` - Dashboard schema version
- `fields.link_count` - Number of dashboard links
- `fields.panel_types` - Panel types used in dashboard
- `fields.ds_types` - Data source types used
- `fields.transformation` - Transformations used

**Usage Insights Fields** (Enterprise only, require `fields.` prefix):
- `fields.views_total` - Total dashboard views
- `fields.views_last_1_days` / `fields.views_last_7_days` / `fields.views_last_30_days` - Recent views
- `fields.views_today` - Today's views
- `fields.queries_total` - Total queries executed
- `fields.queries_last_1_days` / `fields.queries_last_7_days` / `fields.queries_last_30_days` - Recent queries
- `fields.queries_today` - Today's queries
- `fields.errors_total` - Total errors
- `fields.errors_last_1_days` / `fields.errors_last_7_days` / `fields.errors_last_30_days` - Recent errors
- `fields.errors_today` - Today's errors

**Usage Examples:**
```bash
# Sort by title (ascending)
GET /apis/dashboard.grafana.app/v0alpha1/namespaces/{namespace}/search?sortBy=title

# Sort by creation date (descending)
GET /apis/dashboard.grafana.app/v0alpha1/namespaces/{namespace}/search?sortBy=-created

# Sort by usage insights (Enterprise)
GET /apis/dashboard.grafana.app/v0alpha1/namespaces/{namespace}/search?sortBy=-fields.views_total
```

*Note: There's currently a discrepancy between the limited fields exposed by `/search/sortable` and the full range of fields actually supported by the search backend.*
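
In the usage examples above, a leading `-` on the `sortBy` value requests descending order. A minimal sketch of turning such a value into a field name plus direction follows. `sortSpec` and `parseSortBy` are hypothetical names, and the actual handler may parse the parameter differently.

```go
package main

import (
	"fmt"
	"strings"
)

// sortSpec is a hypothetical parsed form of a sortBy value.
type sortSpec struct {
	Field string
	Desc  bool
}

// parseSortBy treats a leading "-" as "sort descending".
func parseSortBy(raw string) sortSpec {
	if strings.HasPrefix(raw, "-") {
		return sortSpec{Field: strings.TrimPrefix(raw, "-"), Desc: true}
	}
	return sortSpec{Field: raw}
}

func main() {
	for _, raw := range []string{"title", "-created", "-fields.views_total"} {
		fmt.Printf("%q -> %+v\n", raw, parseSortBy(raw))
	}
}
```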

##### Federated Search
- **Purpose**: Search across multiple resource types simultaneously
- **Features**: Cross-resource queries, unified result ranking, combined sorting and faceting
- **Implementation**: Uses Bleve IndexAlias to combine multiple indexes for unified searching
- **Default behavior**: When no type is specified, automatically federates dashboards and folders

**How Federated Search Works:**

Federated search is implemented using Bleve's IndexAlias feature, which allows searching across multiple indexes as if they were a single unified index. This enables:

1. **Cross-resource queries**: Search for content across dashboards, folders, and other resource types
2. **Unified sorting**: Results from different resource types are merged and sorted together
3. **Combined faceting**: Aggregate facet statistics across all federated resource types
4. **Permission filtering**: Respects user permissions for each resource type independently

**API Usage Examples:**

**1. Default Federation (Dashboards + Folders):**
```bash
# When no type is specified, automatically searches dashboards and folders
GET /apis/dashboard.grafana.app/v0alpha1/namespaces/{namespace}/search?query=my-search
```

**2. Single Resource Type Search:**
```bash
# Search only folders (despite the "dashboard" API group, type parameter controls what's searched)
GET /apis/dashboard.grafana.app/v0alpha1/namespaces/{namespace}/search?type=folders&query=my-search

# Search only dashboards
GET /apis/dashboard.grafana.app/v0alpha1/namespaces/{namespace}/search?type=dashboards&query=my-search
```

**3. Explicit Two-Type Federation:**
```bash
# Search dashboards (primary) with folders federated
GET /apis/dashboard.grafana.app/v0alpha1/namespaces/{namespace}/search?type=dashboards&type=folders&query=my-search
```
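
For callers that build these requests programmatically, the sketch below assembles the same URLs with the Go standard library. `searchURL` is a hypothetical helper, not part of Grafana; it repeats the `type` parameter once per resource type and rejects more than two types, matching the current API limit described under Limitations.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// searchURL builds the path and query string for a search request.
// With no types, the endpoint's default federation applies.
func searchURL(namespace, query string, types ...string) (string, error) {
	if len(types) > 2 {
		return "", errors.New("at most two resource types can be federated")
	}
	q := url.Values{}
	q.Set("query", query)
	for _, t := range types {
		q.Add("type", t) // repeated parameter, order preserved
	}
	path := "/apis/dashboard.grafana.app/v0alpha1/namespaces/" +
		url.PathEscape(namespace) + "/search"
	return path + "?" + q.Encode(), nil
}

func main() {
	u, err := searchURL("default", "my-search", "dashboards", "folders")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```

Note that `url.Values.Encode` sorts parameters by key, so `query` appears before `type` in the output; the order of the repeated `type` values is kept.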

**4. Protocol Buffer Request Structure:**
```protobuf
message ResourceSearchRequest {
  ListOptions options = 1;          // Primary resource type to search
  repeated ResourceKey federated = 2; // Additional resource types to federate
  string query = 3;                 // Search query applied across all types
  // ... other fields
}
```

**Example gRPC/Protocol Buffer Usage:**
```go
searchRequest := &resourcepb.ResourceSearchRequest{
    Options: &resourcepb.ListOptions{
        Key: dashboardKey, // Primary: search dashboards
    },
    Federated: []*resourcepb.ResourceKey{
        folderKey, // Also search folders
    },
    Query: "monitoring",
    Limit: 50,
    SortBy: []*resourcepb.ResourceSearchRequest_Sort{
        {Field: "title", Desc: false}, // Sort combined results by title
    },
}
```

**5. Unified Results:**
Federated search returns a single result set containing resources from all specified types, with:
- **Unified ranking**: All results scored and ranked together
- **Cross-type sorting**: Resources from different types sorted by common fields (title, tags, etc.)
- **Resource type identification**: Each result includes metadata indicating its resource type
- **Permission-aware filtering**: Only returns resources the user has permission to see
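
As an illustration of the result shape only, the sketch below merges hits from two resource types and sorts them by a common field. In Unified Search the merging is done by Bleve's IndexAlias, not by application code like this; `hit` and `mergeByTitle` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical search result carrying its resource type.
type hit struct {
	Kind  string // e.g. "dashboards" or "folders"
	Title string
}

// mergeByTitle combines per-type results into one list sorted by title.
func mergeByTitle(sets ...[]hit) []hit {
	var all []hit
	for _, s := range sets {
		all = append(all, s...)
	}
	sort.SliceStable(all, func(i, j int) bool { return all[i].Title < all[j].Title })
	return all
}

func main() {
	dashboards := []hit{{"dashboards", "Zeta overview"}, {"dashboards", "Alpha latency"}}
	folders := []hit{{"folders", "Monitoring"}}
	for _, h := range mergeByTitle(dashboards, folders) {
		fmt.Printf("%-10s %s\n", h.Kind, h.Title)
	}
}
```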

**Limitations:**
- Federation only works across resource types with **common fields** (title, tags, folder, etc.)
- All federated indexes must be of the same search backend type (currently Bleve)
- Currently supports up to 2 resource types in federation via the API endpoint
- **Architectural note**: The search endpoint lives under `dashboard.grafana.app` but can search any resource type via the `type` parameter. This is a design choice: the original dashboard search has evolved into a generic search endpoint

##### Managed Objects
- **Purpose**: Administrative queries for resource management
- **Operations**: Count, list, statistics

##### Stats and Monitoring
- **Purpose**: Index health and performance metrics
- **Metrics**: Document counts, index sizes, search latency


### Monitoring and Observability

Key metrics for monitoring Unified Search:

- `unified_search_requests_total`: Search request counts by type and status
- `unified_search_request_duration_seconds`: Search request latency
- `unified_search_index_size_bytes`: Size of search indexes
- `unified_search_documents_total`: Number of indexed documents
- `unified_search_indexing_duration_seconds`: Time to build/update indexes
- `unified_search_shadow_requests_total`: Shadow traffic request counts
- `unified_search_ring_members`: Number of active search server instances