LAB-004 — OBSERVABILITY RESEARCH

Grafana

Telemetry & ML Defense Monitoring

The visualization layer for our threat intelligence pipeline, tracking model inference latency, honeypot attack vectors, and autonomous defense actions in real time.

Data Monitoring Dashboards

Validating model performance.

Our deployment of Grafana goes beyond standard system administration; it is the primary lens through which we analyze the performance of local LLMs and the efficacy of our ML-driven threat detection pipelines. By correlating inference latency with GPU power states, we optimize architectures for edge deployment.
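
As a rough illustration of the correlation idea above, the sketch below computes a Pearson coefficient between GPU power draw and inference latency. The sample values are made up, and using power draw in watts as a stand-in for "power state" is an assumption; the actual analysis may rely on different signals or methods.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples, assumed to be aligned by scrape timestamp.
gpu_power_watts = [35.0, 60.0, 110.0, 150.0, 180.0]
latency_ms = [910.0, 640.0, 420.0, 330.0, 300.0]

r = pearson(gpu_power_watts, latency_ms)
print(f"Pearson r = {r:.3f}")
```

A strongly negative coefficient would suggest that latency drops as the GPU draws more power, which is the kind of relationship a dashboard panel could then track over time.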

Live gauges: CPU, Memory, GPU VRAM, Containers.

Prometheus → Grafana pipeline.

Node exporters and cAdvisor collect metrics from both Hub and Satellite nodes. Prometheus scrapes every 15 seconds, Grafana visualizes with custom dashboards, and Uptime Kuma monitors external availability from the edge.
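
To make the scrape-and-query flow more concrete, here is a minimal sketch of how a client might read node status from the Prometheus HTTP API (the `/api/v1/query` instant-query endpoint). The server address, the instance labels, and the sample payload are all assumptions for illustration; no network call is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical address; the real Prometheus host is not given in the text.
PROMETHEUS_URL = "http://prometheus.local:9090"


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Map each series' 'instance' label to its current float value."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    values = {}
    for series in data["result"]:
        instance = series["metric"].get("instance", "unknown")
        _timestamp, value = series["value"]  # value arrives as a string
        values[instance] = float(value)
    return values


# Sample response in the documented API shape, with made-up instance labels.
sample = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "up", "instance": "hub:9100"},
       "value": [1700000000.0, "1"]},
      {"metric": {"__name__": "up", "instance": "satellite:9100"},
       "value": [1700000000.0, "0"]}
    ]
  }
}
""")

url = build_query_url(PROMETHEUS_URL, "up")
status = parse_instant_vector(sample)
print(url)
print(status)
```

Here `up` is the built-in metric Prometheus records for every scrape target (1 when the last scrape succeeded, 0 otherwise), so a result like the sample would indicate that the Satellite exporter missed its most recent scrape.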

📡
Collectors
Node Exporter · cAdvisor

System metrics, container stats, GPU utilization from all nodes

⚙️
Time-Series DB
Prometheus

15s scrape interval, PromQL queries, alert rule evaluation

📊
Visualization
Grafana

Custom dashboards, real-time graphs, threshold alerts

🔍
External Monitor
Uptime Kuma

External availability checks from edge node with status pages
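
The threshold alerts and alert rule evaluation listed in the stages above can be sketched as a small state machine. This is a simplified model of the pending-then-firing behaviour that a `for` duration gives a Prometheus alerting rule, not the actual rule set; the threshold, the duration, and the sample values are assumptions, and evaluation is assumed to happen once per 15-second scrape.

```python
def alert_states(samples, threshold, for_seconds):
    """Return one state per sample: 'inactive', 'pending', or 'firing'.

    samples: list of (timestamp_seconds, value) pairs in time order.
    An alert turns 'pending' when the value first exceeds the threshold and
    'firing' once the breach has lasted at least for_seconds.
    """
    states = []
    breach_start = None  # timestamp at which the current breach began
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp
            held_for = timestamp - breach_start
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            breach_start = None  # any recovery resets the timer
            states.append("inactive")
    return states


# Hypothetical GPU VRAM usage (percent), sampled every 15 seconds.
vram_percent = [(0, 70), (15, 92), (30, 95), (45, 96), (60, 80)]

states = alert_states(vram_percent, threshold=90, for_seconds=30)
print(states)
# → ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

Requiring the breach to persist before firing is what keeps a single noisy sample from paging anyone, at the cost of a short delay on real incidents.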

Dashboard panels: CPU over time, Memory over time, GPU utilization, Container count.

Everything. Monitored.

Docker Overview

Container health, resource usage, restart counts across 42+ services.

System Resources

CPU, RAM, disk I/O, network throughput for Hub and Satellite nodes.

Security Events

Suricata alerts, CrowdSec bans, honeypot activity, Wazuh SIEM events.

GPU & AI Metrics

VRAM usage, GPU temperature, model inference latency, Ollama throughput.

Service Health

Uptime tracking, response times, SSL cert expiry, DNS resolution.

Alert Rules

Active alerts, firing history, notification routing to ntfy channels.
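
As an illustration of routing notifications to ntfy, the sketch below builds (but does not send) an HTTP request per alert, picking a topic by severity. ntfy accepts a plain POST to a topic URL with optional `Title`, `Priority`, and `Tags` headers; the server URL, the topic names, and the severity mapping here are hypothetical and not taken from the actual setup.

```python
from urllib.request import Request

# Assumptions: the public ntfy server and made-up topic names.
NTFY_BASE_URL = "https://ntfy.sh"
TOPIC_BY_SEVERITY = {
    "critical": "lab-alerts-critical",
    "warning": "lab-alerts-warning",
}
PRIORITY_BY_SEVERITY = {"critical": "high", "warning": "default"}


def build_ntfy_request(severity, title, message, tags=()):
    """Build a POST request that publishes an alert to the matching topic."""
    topic = TOPIC_BY_SEVERITY[severity]
    headers = {"Title": title, "Priority": PRIORITY_BY_SEVERITY[severity]}
    if tags:
        headers["Tags"] = ",".join(tags)  # ntfy expects a comma-separated list
    return Request(
        f"{NTFY_BASE_URL}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_ntfy_request(
    "critical",
    title="GPU VRAM high",
    message="VRAM above 90% for 30s on hub",
    tags=("warning", "gpu"),
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would publish it; that call is left out so the sketch runs without network access.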