Commit Graph

17 Commits

Author SHA1 Message Date
Alexander Weaver 3634079b8f Alerting: Attach hash of instance labels to state history log lines (#65968)
* Add instanceID which is hash of labels

* Rename field to fingerprint

* Move to prometheus style signature

* Appease linter
2023-04-19 14:22:19 -05:00
Alexander Weaver cf7157f683 Alerting: Capture refID of rule's condition expression in Loki state history entries (#66419)
* Capture condition from rule

* Add test
2023-04-18 14:21:28 -05:00
Alexander Weaver 5e87ea745d Alerting: Fix and re-enable filters instance labels in log line test (#65618)
Fix and reenable test
2023-03-30 09:02:18 -05:00
Dimitris Sotirakis e758b017d0 Alerting: Disable filters instance labels in log line test (#65610)
* Disable filters instance labels in log line test

* Add drone reference
2023-03-30 16:04:29 +03:00
Alexander Weaver a416100abc Alerting: No longer index state history log streams by instance labels (#65474)
* Remove private labels

* No longer index by instance labels

* Labels are now invariant, only build them once

* Remove bucketing since everything is in a single stream

* Refactor statesToStreams to only return a single unified log stream

* Don't query on labels that no longer exist

* Move selector logic to loki layer, genericize client to work in terms of straight logQL

* Add support for line-level label filters in query

* Combine existing selector tests for better parallelism

* Tests for logQL construction

* Underscore instead of dot for unwrapping labels in logql
2023-03-29 11:52:11 -05:00
Alexander Weaver de1637afe5 Alerting: Add alert instance labels to Loki log lines in addition to stream labels (#65403)
Add instance labels to log line
2023-03-28 08:57:51 -05:00
Alexander Weaver dd04757fc9 Alerting: Add "backend" label to state history writes metrics (#65395)
* Add backend label to state history writes metrics

* Update test expectations
2023-03-28 08:49:51 -05:00
Alexander Weaver 07368dec74 Alerting: Fix attachment of external labels to Loki state history log streams (#65140)
Fix attachment of external labels, add tests
2023-03-21 18:00:59 -05:00
Alexander Weaver bf54f2672e Alerting: Switch to snappy-compressed-protobuf for outgoing push requests to Loki (#65077)
* Encode with snappy, always

* JSON encoder type

* Headers

* Copy labels formatter from promtail

* Implement snappy-proto encoding

* Create encoder interface, test both encoders, choose snappy-proto by default

* Make encoder configurable at the LokiCfg level

* Export both encoders

* Touch up comment and tests

* Drop unnecessary conversions after move to plain strings to appease linter
2023-03-21 13:38:42 -05:00
Alexander Weaver cc7e5ce62e Alerting: Fix ambiguous handling of equals in labels when bucketing Loki state history streams (#65013)
* Use JSON instead of data.Labels string format as label repr

* Drop debug log line
2023-03-21 12:33:27 -05:00
Alexander Weaver e39d7f44c9 Alerting: Elide requests to Loki if nothing should be recorded (#65011)
Exit early if no log streams or annotations
2023-03-21 09:30:56 -05:00
Alexander Weaver a31672fa40 Alerting: Create new state history "fanout" backend that dispatches to multiple other backends at once (#64774)
* Rename RecordStatesAsync to Record

* Rename QueryStates to Query

* Implement fanout writes

* Implement primary queries

* Simplify error joining

* Add test for query path

* Add tests for writes and error propagation

* Allow fanout backend to be configured

* Touch up log messages and config validation

* Consistent documentation for all backend structs

* Parse and normalize backend names more consistently against an enum

* Touch-ups to documentation

* Improve clarity around multi-record blocking

* Keep primary and secondaries more distinct

* Rename fanout backend to multiple backend

* Simplify config keys for multi backend mode
2023-03-17 12:41:18 -05:00
Alexander Weaver 19d01dff91 Alerting: Expose Prometheus metrics for persisting state history (#63157)
* Create historian metrics and dependency inject

* Record counter for total number of state transitions logged

* Track write failures

* Track current number of active write goroutines

* Record histogram of how long it takes to write history data

* Don't copy the registerer

* Adjust naming of write failures metric

* Introduce WritesTotal to complement WritesFailedTotal

* Measure TransitionsFailedTotal to complement TransitionsTotal

* Rename all to state_history

* Remove redundant Total suffix

* Increment totals all the time, not just on success

* Drop ActiveWriteGoroutines

* Drop PersistDuration in favor of WriteDuration

* Drop unused gauge

* Make writes and writesFailed per org

* Add metric indicating backend and a spot for future metadata

* Drop _batch_ from names and update help

* Add metric for bytes written

* Better pairing of total + failure metric updates

* Few tweaks to wording and naming

* Record info metric during composition

* Create fakeRequester and simple happy path test using it

* Blocking test for the full historian and test for happy path metrics

* Add tests for failure case metrics

* Smoke test for full annotation persistence

* Create test for metrics on annotation persistence, both happy and failing paths

* Address linter complaints

* More linter complaints

* Remove unnecessary whitespace

* Consistency improvements to help texts

* Update tests to match new descs
2023-03-06 10:40:37 -06:00
Alexander Weaver 958fb2c50a Alerting: Unify structs in Loki client and make them more consistent with Prometheus (#63055)
* Use existing row struct instead of [2]string, add deserialization helper

* Replace Stream struct with stream struct which is exactly the same

* Drop unused status field

* Don't export queryRes and queryData

* Tests for custom marshalling

* Rename row fields to T and V for consistency with prometheus samples

* Rename row to sample
2023-02-11 05:17:44 -06:00
Jean-Philippe Quéméner 1694a35e0c Alerting: implement loki query for alert state history (#61992)
* Alerting: implement loki query for alert state history

* extract selector building

* add unit tests for selector creation

* backup

* give selectors their own type

* build dataframe

* add some tests

* small changes after manual testing

* use struct client

* golint

* more golint

* Make RuleUID optional for Loki implementation

* Drop initial assumption that we only have one series

* Pare down to three columns, fix timestamp overflows, improve failure cases in loki responses

* Embed structred log lines in the dataframe as objects rather than json strings

* Include state history label filter

* Remove dead code

---------

Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
2023-02-02 16:31:51 -06:00
Alexander Weaver 647f73ddc5 Alerting: Add static label to all state history entries (#62817)
* Add static label to all state history entries

* Separate label and value visually
2023-02-02 13:25:26 -06:00
Alexander Weaver 9fa28c11c5 Alerting: Usability adjustments to Loki representation of state history values (#62643)
* Extract label merge, add test file

* Extract error/NoData to first class fields, remove a layer from values

* Include dashUID and panelID as line-level fields

* Drop unnecessary object receiver

* Add tests for stream building

* Drop NoData field from log lines
2023-02-02 10:54:10 -06:00