Using Grafana / Prometheus with StoredSafe

Note

Health metrics via SNMPv3 are available from StoredSafe versions later than 4.1.0 (build 7120).

Background

StoredSafe exposes metrics via SNMPv3 using Net-SNMP extend (NET-SNMP-EXTEND-MIB). Monitoring access is read-only and performed by SNMP polling.

Note

StoredSafe requires SNMPv3 with authPriv for all metrics access. Unauthenticated or unencrypted SNMP is not supported, as operational metrics may expose sensitive system state and health information.

SNMPv1, SNMPv2c, and SNMPv3 without privacy (authNoPriv) are not supported.

StoredSafe publishes metrics as plain text in the format:

key value

All values are numeric.
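As a minimal sketch of how a consumer might read this format, the Python below splits each line into a key and a numeric value. The function name parse_metrics and the sample values are illustrative assumptions, not part of StoredSafe.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse 'key value' lines into a dict of floats, skipping malformed lines."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # blank line or unexpected shape
        key, raw = parts
        try:
            metrics[key] = float(raw)
        except ValueError:
            continue  # non-numeric value: skip rather than guess
    return metrics


# Sample values are made up for illustration.
sample = "ok 1\naudit_stale_s 42\ndb_latency_ms 3.5\n"
print(parse_metrics(sample))
```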

Why a “bridge” is needed

Prometheus and Grafana work best with numeric time-series metrics (gauges/counters). SNMP extend returns text values, so customers typically deploy a small “bridge” that:

  • polls SNMPv3

  • extracts numeric values from the returned text

  • exposes them as Prometheus metrics for Grafana dashboards and alerting

StoredSafe is responsible for exposing metrics via SNMP only. All collection, transformation, storage, and alerting logic is external.

StoredSafe does not initiate outbound connections for monitoring purposes.
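The final step of such a bridge, exposing already-parsed values to Prometheus, could be sketched as follows. This assumes every metric is treated as a gauge and uses the storedsafe_ prefix; the function name to_exposition is hypothetical, and a real deployment would more likely rely on snmp_exporter or Telegraf than on hand-written code.

```python
import re

# Prometheus metric names must match this pattern.
NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def to_exposition(metrics: dict[str, float], prefix: str = "storedsafe_") -> str:
    """Render parsed metrics in the Prometheus text exposition format."""
    lines = []
    for key in sorted(metrics):
        name = prefix + key
        if not NAME_RE.fullmatch(name):
            continue  # skip keys that are not valid Prometheus metric names
        # Assumption: all StoredSafe values are exported as gauges.
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(metrics[key])!r}")
    return "\n".join(lines) + "\n"


print(to_exposition({"ok": 1, "audit_stale_s": 42}), end="")
```

The resulting text can be served over HTTP by any small web server for Prometheus to scrape.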

Option B: Telegraf SNMP input -> Prometheus/InfluxDB -> Grafana

Telegraf can poll SNMPv3 targets and transform fields using processor plugins. This is useful when string values must be parsed and converted to numeric fields.

High-level steps

  1. Deploy Telegraf and configure SNMPv3 credentials.

  2. Poll StoredSafe extend output (prefer nsExtendOutLine."metrics").

  3. Extract the numeric value from each key value line and store it in a numeric field.

  4. Export to Prometheus (Telegraf Prometheus output) or to InfluxDB, and visualize in Grafana.
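Step 3 can be illustrated outside Telegraf with a short Python sketch. It assumes the polled rows look like Net-SNMP command-line output for nsExtendOutLine."metrics" (with or without quotes around the string value); the function name extract_fields and the sample rows are hypothetical.

```python
import re

# Matches Net-SNMP CLI output such as:
#   NET-SNMP-EXTEND-MIB::nsExtendOutLine."metrics".1 = STRING: ok 1
LINE_RE = re.compile(
    r'nsExtendOutLine\."metrics"\.\d+\s*=\s*STRING:\s*"?([^"]*?)"?\s*$'
)


def extract_fields(walk_output: str) -> dict[str, float]:
    """Turn nsExtendOutLine rows into numeric fields keyed by metric name."""
    fields: dict[str, float] = {}
    for line in walk_output.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # not an extend output row
        parts = match.group(1).split()
        if len(parts) != 2:
            continue  # row is not a 'key value' pair
        try:
            fields[parts[0]] = float(parts[1])
        except ValueError:
            continue  # value is not numeric
    return fields


# Illustrative rows only; real output depends on the appliance.
rows = (
    'NET-SNMP-EXTEND-MIB::nsExtendOutLine."metrics".1 = STRING: ok 1\n'
    'NET-SNMP-EXTEND-MIB::nsExtendOutLine."metrics".2 = STRING: audit_stale_s 42\n'
)
print(extract_fields(rows))
```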

Grafana / Prometheus FAQ

What do I need to run?

To visualize StoredSafe metrics in Grafana, you need:

  • A StoredSafe appliance with SNMPv3 access configured. Metrics are exposed exclusively via SNMPv3 at the authPriv security level, and credentials are configured by a system administrator in the StoredSafe management console.

  • One of the following bridges:

      - Prometheus snmp_exporter (recommended)

      - Telegraf with SNMP input and parsing/processing

  • Grafana connected to Prometheus or InfluxDB.

No CLI or operating system access to the StoredSafe appliance is available to customers. All configuration, including SNMPv3 credentials, is performed via the StoredSafe management console.

Which SNMP version is supported?

Only SNMPv3 is supported for metrics access.

Supported security level:

  • authPriv (authentication and encryption)

Note

authPriv is mandatory to ensure confidentiality and integrity of management data in transit. StoredSafe does not support unauthenticated or unencrypted SNMP access, as metrics may reveal sensitive operational state and system health information.

SNMPv1, SNMPv2c, and SNMPv3 without privacy (authNoPriv) are not supported.

How often should I poll metrics?

Recommended polling interval:

  • 300 seconds (or slightly higher) for normal environments

  • 300–600 seconds for large environments with many targets

Polling more frequently than the audit interval does not provide additional information and is not recommended.

Polling does not affect audit cadence, backups, or system state.

What does the ok metric mean?

ok is a computed, high-level health indicator:

  • ok = 1 means the system is healthy.

  • ok = 0 means immediate attention is required.

ok represents whether all mandatory operational requirements are currently met.

A failed mandatory backup forces ok = 0, as do other critical conditions such as stale auditing, database failures, or RAID/drive health issues.

How should I alert on backups?

Backups are mandatory and should always be monitored.

Recommended alert conditions:

  • backup_completed_ok == 0 (critical)

  • backup_transfer_ok == 0 (critical)

Optional additional alerts:

  • backup_completed_age_s exceeds your operational threshold

  • backup_transfer_age_s exceeds your operational threshold
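The conditions above can be sketched as a small Python check over already-parsed metrics. The function name backup_alerts, the "warning" severity for the age checks, and the 86400-second threshold in the usage line are assumptions for illustration; choose thresholds that match your own backup schedule.

```python
from typing import Optional


def backup_alerts(
    metrics: dict[str, float],
    max_completed_age_s: Optional[float] = None,
    max_transfer_age_s: Optional[float] = None,
) -> list[tuple[str, str]]:
    """Return (severity, condition) pairs for the backup conditions that fire."""
    alerts: list[tuple[str, str]] = []
    # Mandatory checks: a value of 0 is critical. Missing metrics do not fire here.
    for key in ("backup_completed_ok", "backup_transfer_ok"):
        if metrics.get(key) == 0:
            alerts.append(("critical", f"{key} == 0"))
    # Optional age checks, only evaluated when a threshold is supplied.
    for key, limit in (
        ("backup_completed_age_s", max_completed_age_s),
        ("backup_transfer_age_s", max_transfer_age_s),
    ):
        if limit is not None and metrics.get(key, 0) > limit:
            alerts.append(("warning", f"{key} > {limit}"))
    return alerts


# Hypothetical readings: completion failed and the last good backup is old.
print(backup_alerts(
    {"backup_completed_ok": 0, "backup_transfer_ok": 1, "backup_completed_age_s": 100000},
    max_completed_age_s=86400,
))
```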

How do I detect if auditing has stopped?

Use the audit freshness metric:

  • audit_stale_s

Recommended alert:

  • audit_stale_s > 900 seconds (15 minutes)

Why are metrics returned as text?

Metrics are exposed via Net-SNMP extend and intentionally use a simple key value format to remain compatible with SNMP-based monitoring systems.

Prometheus alerting examples

The expressions below assume metrics are exported into Prometheus with a storedsafe_ prefix.

Overall health

storedsafe_ok == 0

Mandatory backups

storedsafe_backup_completed_ok == 0
storedsafe_backup_transfer_ok == 0

Audit freshness

storedsafe_audit_stale_s > 900

Database health

storedsafe_db_ok == 0

Optional latency threshold:

storedsafe_db_latency_ms > 200

Backup footprint (optional)

storedsafe_backup_occupies_bytes > 0

Storage / RAID

storedsafe_raid_ok == 0
storedsafe_failed_drives > 0
storedsafe_critical_drives > 0
storedsafe_degraded_drives > 0

Warnings and violations

storedsafe_warnings > 0
storedsafe_violations > 0

Licensing (optional)

storedsafe_active_users > storedsafe_licensed_users
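To turn expressions like these into a Prometheus rule file, one option is to generate the YAML from a short list. The sketch below covers a subset of the expressions; the alert names, the 10m "for" duration, the group name, and the severities other than the backup ones are assumptions to adapt, not StoredSafe recommendations.

```python
# (alert name, PromQL expression, severity); names and most severities are hypothetical.
RULES = [
    ("StoredSafeNotOk", "storedsafe_ok == 0", "critical"),
    ("StoredSafeBackupNotCompleted", "storedsafe_backup_completed_ok == 0", "critical"),
    ("StoredSafeBackupNotTransferred", "storedsafe_backup_transfer_ok == 0", "critical"),
    ("StoredSafeAuditStale", "storedsafe_audit_stale_s > 900", "critical"),
    ("StoredSafeLicenseExceeded", "storedsafe_active_users > storedsafe_licensed_users", "warning"),
]


def render_rules(rules, group: str = "storedsafe", hold: str = "10m") -> str:
    """Render alert definitions as a Prometheus alerting rule file (YAML text)."""
    lines = ["groups:", f"  - name: {group}", "    rules:"]
    for alert, expr, severity in rules:
        lines += [
            f"      - alert: {alert}",
            f"        expr: {expr}",
            f"        for: {hold}",  # assumed hold time spanning two 300 s polls
            "        labels:",
            f"          severity: {severity}",
        ]
    return "\n".join(lines) + "\n"


print(render_rules(RULES), end="")
```

The generated file can be validated with promtool check rules before loading it into Prometheus.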