LAB-004 — OBSERVABILITY RESEARCH

Grafana

Telemetry & ML Defense Monitoring

The visualization layer for our threat intelligence pipeline. It tracks model inference latency, honeypot attack vectors, and autonomous defense actions in real time.

Research Context

Validating model performance.

Our deployment of Grafana goes beyond standard system administration; it is the primary lens through which we analyze the performance of local LLMs and the efficacy of our ML-driven threat detection pipelines. By correlating inference latency with GPU power states, we optimize architectures for edge deployment.
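
As a rough illustration of the correlation described above, the sketch below computes a Pearson coefficient between paired GPU power and inference latency samples. The sample values and the `pearson` helper are hypothetical; in practice the pairs would come from whatever metrics the dashboards already collect.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sample lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired samples: GPU power draw (watts) and inference latency (ms).
gpu_power_w = [80.0, 120.0, 160.0, 200.0]
latency_ms = [410.0, 300.0, 240.0, 190.0]

r = pearson(gpu_power_w, latency_ms)
print(f"power vs. latency correlation: {r:.3f}")
```

A coefficient near -1 in data like this would suggest that latency falls as the GPU is allowed to draw more power. It says nothing about causation, so it is a starting point for a dashboard panel rather than a conclusion.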

Key References

Turnbull, J. (2018). "Monitoring with Prometheus." Turnbull Press.

Live metrics: CPU · Memory · GPU VRAM · Containers

Monitoring Stack

Prometheus → Grafana pipeline.

Node Exporter and cAdvisor collect metrics from both Hub and Satellite nodes. Prometheus scrapes them every 15 seconds, Grafana visualizes the results with custom dashboards, and Uptime Kuma monitors external availability from the edge.
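
To make the pipeline concrete, here is a minimal sketch of reading values back out of Prometheus through its HTTP API (`/api/v1/query`). The base URL `http://prometheus.local:9090`, the helper names, and the sample payload are assumptions for illustration; only the endpoint path and the response shape follow the documented API.

```python
import json
from urllib.parse import urlencode

def instant_query_url(base_url, promql):
    """Build the URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"

def parse_instant_vector(payload):
    """Return {instance: float value} from an instant-vector response body."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error', 'unknown error')}")
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {...labels}, "value": [unix_ts, "value-as-string"]}.
    return {
        s["metric"].get("instance", ""): float(s["value"][1])
        for s in data["result"]
    }

# Hypothetical host name; `up` is the metric Prometheus records for every scrape target.
url = instant_query_url("http://prometheus.local:9090", "up")

# Hand-written sample response in the documented shape (not real data).
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "hub:9100"},
             "value": [1700000000, "1"]},
            {"metric": {"__name__": "up", "instance": "satellite:9100"},
             "value": [1700000000, "0"]},
        ],
    },
})
targets = parse_instant_vector(sample)
```

In this made-up payload the Satellite target reports `0`, which is how a failed scrape would appear. The instance labels are placeholders and may not match the real node names.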

📡

Collectors

Node Exporter · cAdvisor

System metrics, container stats, GPU utilization from all nodes

⚙️

Time-Series DB

Prometheus

15s scrape interval, PromQL queries, alert rules evaluation

📊

Visualization

Grafana

Custom dashboards, real-time graphs, threshold alerts

🔍

External Monitor

Uptime Kuma

External availability checks from edge node with status pages
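
The 15s scrape interval matters most for counter metrics, which only become useful once converted to a per-second rate. The following is a simplified sketch of that idea, not Prometheus's actual `rate()` implementation (which also extrapolates to the window boundaries). The function name and sample values are hypothetical.

```python
def per_second_rate(samples):
    """Approximate the per-second increase of a counter from (timestamp, value) samples.

    A drop in value is treated as a counter reset (for example, an exporter
    restart), so the post-reset value counts as the increase for that step.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed

# Hypothetical counter samples taken 15 seconds apart: (seconds, bytes received).
scrapes = [(0, 1000.0), (15, 1600.0), (30, 2500.0), (45, 3100.0)]
rate = per_second_rate(scrapes)

# The same kind of series with a reset between the second and third scrape.
with_reset = [(0, 1000.0), (15, 1600.0), (30, 300.0), (45, 900.0)]
rate_after_reset = per_second_rate(with_reset)
```

With a 15s interval, a query window needs to span at least two scrapes to yield any rate at all, which is worth keeping in mind when choosing range selectors for dashboard panels.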

Live Charts

CPU over time · Memory over time · GPU utilization · Container count

Dashboards

Everything. Monitored.

Docker Overview

Container health, resource usage, restart counts across 42+ services.

System Resources

CPU, RAM, disk I/O, network throughput for Hub and Satellite nodes.

Security Events

Suricata alerts, CrowdSec bans, honeypot activity, Wazuh SIEM events.

GPU & AI Metrics

VRAM usage, GPU temperature, model inference latency, Ollama throughput.

Service Health

Uptime tracking, response times, SSL cert expiry, DNS resolution.

Alert Rules

Active alerts, firing history, notification routing to ntfy channels.
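
One way the notification routing to ntfy channels could look is sketched below. It builds, but does not send, an HTTP publish request whose topic and priority depend on alert severity. The routing table, topic names, and server URL are all hypothetical; the `Title` and `Priority` headers and the topic-in-path convention follow ntfy's publish API.

```python
import urllib.request

# Hypothetical severity-to-topic routing table; topic names are placeholders.
ROUTES = {
    "critical": ("lab-critical", "urgent"),
    "warning": ("lab-warnings", "default"),
}
FALLBACK = ("lab-info", "low")

def build_ntfy_request(server, severity, title, message):
    """Build (but do not send) an ntfy publish request for an alert."""
    topic, priority = ROUTES.get(severity, FALLBACK)
    return urllib.request.Request(
        url=f"{server.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers={"Title": title, "Priority": priority},
        method="POST",
    )

# Hypothetical server URL and alert text.
req = build_ntfy_request(
    "https://ntfy.example.internal",
    "critical",
    "GPU VRAM high",
    "VRAM usage above threshold on hub",
)
# Sending would be: urllib.request.urlopen(req, timeout=5)
```

Keeping the routing in a small table means a new severity level only needs one extra entry, and unknown severities fall back to a low-priority topic instead of being dropped.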