OpenShift audit works at the API server level, logging all requests coming to the server.
However, if an API server instance is unable to write audit events, an alert must be issued
so that the organization can take relevant action, e.g. shutting down that instance.
By default, Kubernetes exposes metrics that make it possible to write such alerts:
apiserver_audit_event_total
apiserver_audit_error_total
An example of such an alert is shipped in OCP 4.9+:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: audit-errors
  namespace: openshift-kube-apiserver
spec:
  groups:
  - name: apiserver-audit
    rules:
    - alert: AuditLogError
      annotations:
        summary: |-
          An API Server instance was unable to write audit logs. This could be
          triggered by the node running out of space, or a malicious actor
          tampering with the audit logs.
        description: An API Server had an error writing to an audit log.
      expr: |
        sum by (apiserver,instance)(rate(apiserver_audit_error_total{apiserver=~".+-apiserver"}[5m])) / sum by (apiserver,instance) (rate(apiserver_audit_event_total{apiserver=~".+-apiserver"}[5m])) > 0
      for: 1m
      labels:
        severity: warning
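
To make the alert expression easier to reason about, the following is a minimal Python sketch of the arithmetic it performs for a single (apiserver, instance) pair: the per-second rate of audit errors divided by the per-second rate of audit events, with the condition met when the ratio is greater than zero. The function names and the sample values are hypothetical and exist only for illustration. The rate calculation is deliberately simplified: unlike the Prometheus rate() function, it uses only two samples and does not handle counter resets or extrapolation. The "for: 1m" clause, which requires the condition to hold for one minute before the alert fires, is not modelled here.

```python
import math
from typing import Tuple

# A counter sample: (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def simple_rate(first: Sample, last: Sample) -> float:
    """Per-second increase between two counter samples.

    Simplification: no counter-reset handling and no extrapolation,
    both of which the real Prometheus rate() function performs.
    """
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


def audit_error_ratio(errors: Tuple[Sample, Sample],
                      events: Tuple[Sample, Sample]) -> float:
    """Ratio of the audit error rate to the audit event rate."""
    error_rate = simple_rate(*errors)
    event_rate = simple_rate(*events)
    if event_rate == 0:
        # Mirror floating-point division as PromQL does it:
        # x/0 is +Inf for x > 0, and 0/0 is NaN.
        return math.inf if error_rate > 0 else math.nan
    return error_rate / event_rate


def condition_met(errors: Tuple[Sample, Sample],
                  events: Tuple[Sample, Sample]) -> bool:
    """True when the ratio is greater than zero (NaN compares as False)."""
    return audit_error_ratio(errors, events) > 0


# Hypothetical samples taken 5 minutes (300 seconds) apart.
errors_window = ((0.0, 3.0), (300.0, 6.0))        # 3 new audit errors
events_window = ((0.0, 1000.0), (300.0, 4000.0))  # 3000 new audit events
healthy_window = ((0.0, 3.0), (300.0, 3.0))       # no new audit errors

print(condition_met(errors_window, events_window))   # condition met
print(condition_met(healthy_window, events_window))  # condition not met
```

Because the threshold is zero, any increase of the error counter within the five-minute window satisfies the condition, regardless of how many audit events were written successfully in the same window.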
For more information, consult the official Kubernetes documentation.