.. _monitoring_storedsafe:

StoredSafe Metrics via SNMPv3
=============================

.. note::

   Health metrics via SNMPv3 are available from StoredSafe versions later than
   4.1.0 (build 7120).

Overview
--------

StoredSafe exposes health and operational metrics via **SNMPv3** using the
standard **NET-SNMP-EXTEND-MIB**.

All monitoring access is read-only and performed via SNMP.

Only **SNMPv3** is supported. SNMPv1 and SNMPv2c are intentionally not
available.

Architecture
------------

StoredSafe uses a two-stage metrics architecture designed for stability,
security, and low operational overhead.

1) Internal audit (scheduled)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An internal audit process runs periodically (typically every 5 minutes).

The audit:

- evaluates system state (backups, database, storage, RAID, certificates,
  licenses, etc.)
- evaluates rolling 7-day activity statistics
- writes the current state to an internal status file

This process is fully controlled by the appliance and **cannot be triggered,
modified, or influenced by customers**.

2) Metrics exposure (on-demand via SNMP extend)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When a monitoring system polls the SNMP extend OID, the appliance executes a
lightweight metrics script.

The script:

- reads the most recent audit results
- formats them as numeric metrics
- returns the result via SNMP

Polling metrics:

- does **not** trigger audits
- does **not** trigger backups
- does **not** change system state

It only returns the most recently audited state.

Output format
-------------

Metrics are returned as plain text using the format:

::

   key value

- One metric per line
- All values are numeric
- Metric names are stable

This format is intentionally simple and deterministic to support SNMP-based
monitoring systems.

Recommended SNMP object
-----------------------

StoredSafe exposes metrics via Net-SNMP extend.


**Recommended (one metric per line):**

- MIB: ``NET-SNMP-EXTEND-MIB``
- Object: ``nsExtendOutLine``
- Extend name: ``metrics``

This returns **one metric per SNMP instance**, which is easier to consume for
monitoring systems.

**Compatibility / debugging:**

- Object: ``nsExtendOutputFull``
- Returns the full output as a single multi-line string

Both objects expose identical data.

Numeric OID reference
---------------------

The extend token ``metrics`` is exposed via the following numeric OID:

::

   1.3.6.1.4.1.8072.1.3.2.4.1.2.7.109.101.116.114.105.99.115

This corresponds to:

- ``NET-SNMP-EXTEND-MIB::nsExtendOutLine."metrics"``
- string index ``"metrics"`` (length 7, ASCII-encoded)

Example (nsExtendOutLine)
-------------------------

::

   NET-SNMP-EXTEND-MIB::nsExtendOutLine."metrics".1 = uptime_s 1201877
   NET-SNMP-EXTEND-MIB::nsExtendOutLine."metrics".2 = mem_used_pct 34
   NET-SNMP-EXTEND-MIB::nsExtendOutLine."metrics".3 = disk_free 39173955584
   ...
   NET-SNMP-EXTEND-MIB::nsExtendOutLine."metrics".59 = ok 1

Example output (healthy)
------------------------

::

   ok 1
   uptime_s 1201877
   cpu_load1 0.35
   mem_used_pct 33
   disk_total 56263483392
   disk_used 14695231488
   disk_used_pct 26
   disk_free 39174152192
   disk_free_pct 70
   audit_last_update_epoch 1767700204
   audit_stale_s 142
   db_ok 1
   db_latency_ms 5
   raid_ok 1
   critical_drives 0
   failed_drives 0
   degraded_drives 0
   backup_completed_ok 1
   backup_completed_last_ok_epoch 1767654012
   backup_completed_age_s 46334
   backup_completed_stale 0
   backup_completed_error 0
   backup_completed_missing 0
   backup_transfer_ok 1
   backup_transfer_last_ok_epoch 1767656704
   backup_transfer_age_s 43642
   backup_transfer_stale 0
   backup_transfer_error 0
   backup_transfer_missing 0
   backup_occupies_bytes 115821462
   violations 1
   warnings 0
   warnings_count 0
   noauth_warnings 0
   noauth_warnings_count 0
   crl_fetch_error 0
   crl_invalid_file 0
   crl_invalid_signature 0
   crl_expired 0
   stats7d_auth_failure 0
   stats7d_weak_passphrase 0
   stats7d_login 0
   stats7d_decrypt 0
   stats7d_vaults_deleted 0
   stats7d_objects_deleted 0
   stats7d_vaults_created 0
   stats7d_objects_created 0
   active_users 176
   deactivated_users 136
   active_vaults 164
   active_objects 12019
   deleted_vaults_total 249
   deleted_objects_total 3488
   version_major 4
   version_minor 1
   version_patch 0
   version_build 7134
   version_full 4107134

Metric semantics
----------------

Overall health
^^^^^^^^^^^^^^

``ok``
   Overall health indicator.

   - ``1`` = system healthy
   - ``0`` = immediate attention required

   The value is derived from the audited state. Mandatory backup failures will
   force ``ok = 0``.

Audit freshness
^^^^^^^^^^^^^^^

``audit_last_update_epoch``
   Unix epoch timestamp of the last successful audit update.

``audit_stale_s``
   Seconds since the last audit update. If the audit process stops running,
   this value increases continuously.

Database health
^^^^^^^^^^^^^^^

``db_ok``
   ``1`` if database connectivity and a test query succeeded during audit,
   otherwise ``0``.

``db_latency_ms``
   Database round-trip time measured during audit (milliseconds).

Storage and RAID
^^^^^^^^^^^^^^^^

``raid_ok``
   Indicates RAID health. A value of ``0`` is considered critical and will
   force the overall ``ok`` metric to ``0``.

``critical_drives`` / ``degraded_drives`` / ``failed_drives``
   Drive-level health indicators derived from the audit.

Disk capacity metrics
^^^^^^^^^^^^^^^^^^^^^

``disk_total`` / ``disk_used`` / ``disk_free``
   Disk capacity values reported in bytes.

``disk_used_pct`` / ``disk_free_pct``
   Disk usage percentages derived from allocatable capacity.

.. note::

   Disk capacity metrics represent allocatable storage as evaluated by the
   audit process. Percentages may not sum to 100% due to reserved space and
   filesystem metadata.

Backup metrics (mandatory)
^^^^^^^^^^^^^^^^^^^^^^^^^^

Backups are mandatory and enforced by the platform.

``backup_completed_ok`` / ``backup_transfer_ok``
   ``1`` if the audit recorded a recent successful backup/transfer with no
   warnings.
   ``0`` if the audit detected missing backups, errors, or missing log entries.

   A value of ``0`` is considered **critical** and results in ``ok = 0``.

``backup_*_last_ok_epoch``
   Unix epoch timestamp of the last successful backup/transfer.
   Reported as ``0`` when the corresponding ``*_ok`` is ``0``.

``backup_*_age_s``
   Age in seconds of the last successful backup/transfer.
   Reported as ``0`` when the corresponding ``*_ok`` is ``0``.

``backup_*_stale`` / ``backup_*_error`` / ``backup_*_missing``
   Optional classification flags derived from audit warnings.
   These simplify alert routing but are not required for parsing.

``backup_occupies_bytes``
   Estimated backup storage footprint (bytes) as recorded during audit.

Violations and warnings
^^^^^^^^^^^^^^^^^^^^^^^

``violations``
   ``1`` if at least one license or policy violation exists.
   Violations are typically administrative, not outage conditions.

``warnings`` / ``warnings_count``
   Indicates operational warnings and how many are currently active.

mTLS / CRL related warnings
^^^^^^^^^^^^^^^^^^^^^^^^^^^

``noauth_warnings`` / ``noauth_warnings_count``
   Indicates issues that may affect certificate-based (mTLS) authentication.

``crl_fetch_error`` / ``crl_invalid_file`` / ``crl_invalid_signature`` / ``crl_expired``
   Fine-grained CRL-related indicators.

7-day activity statistics
^^^^^^^^^^^^^^^^^^^^^^^^^

Rolling 7-day totals (GAUGE semantics):

- ``stats7d_auth_failure``
- ``stats7d_weak_passphrase``
- ``stats7d_login``
- ``stats7d_decrypt``
- ``stats7d_vaults_deleted`` / ``stats7d_objects_deleted``
- ``stats7d_vaults_created`` / ``stats7d_objects_created``

Current inventory
^^^^^^^^^^^^^^^^^

``active_users`` / ``deactivated_users`` / ``active_vaults`` / ``active_objects``
   Current inventory counts from the most recent audit.

User and object metrics semantics
---------------------------------

StoredSafe exposes both **cumulative counters** and **current state metrics**.
It is important to distinguish between these categories when building
dashboards and alerts.

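
As an illustration of that distinction, the sketch below compares two polls:
state metrics are read directly from the latest poll, while cumulative counters
are turned into per-interval deltas. The helper ``summarize_polls`` and the
``previous`` sample values are hypothetical and not part of StoredSafe; only
the metric names and the ``current`` values come from this document.

.. code-block:: python

   # Hypothetical helper for illustration; not shipped with StoredSafe.
   CUMULATIVE = ("deleted_vaults_total", "deleted_objects_total")
   STATE = ("active_users", "deactivated_users", "active_vaults", "active_objects")


   def summarize_polls(previous, current):
       """Return current state values and deltas of the cumulative counters."""
       state = {name: current[name] for name in STATE}
       deltas = {}
       for name in CUMULATIVE:
           delta = current[name] - previous[name]
           # The counters are documented as monotonically increasing, so a
           # negative delta suggests the two polls are not comparable.
           # In that case report None instead of a misleading number.
           deltas[name] = delta if delta >= 0 else None
       return {"state": state, "deltas": deltas}


   # "previous" is invented sample data; "current" uses the example output values.
   previous = {
       "active_users": 175, "deactivated_users": 136,
       "active_vaults": 163, "active_objects": 12025,
       "deleted_vaults_total": 247, "deleted_objects_total": 3480,
   }
   current = {
       "active_users": 176, "deactivated_users": 136,
       "active_vaults": 164, "active_objects": 12019,
       "deleted_vaults_total": 249, "deleted_objects_total": 3488,
   }

   summary = summarize_polls(previous, current)
   print(summary["state"]["active_objects"])
   print(summary["deltas"]["deleted_objects_total"])

Plotting the deltas rather than the raw totals tends to make deletion activity
visible per polling interval, while the state values can be shown unchanged.
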

Object and vault state
^^^^^^^^^^^^^^^^^^^^^^

The following metrics represent the **current state** of the system and may
increase or decrease over time:

- ``active_vaults``
- ``active_objects``

Object and vault counters
^^^^^^^^^^^^^^^^^^^^^^^^^

The following metrics are **cumulative counters** and represent totals since
the StoredSafe system was installed:

- ``deleted_vaults_total``
- ``deleted_objects_total``

These values are monotonically increasing and are **not reset** when objects
or vaults are removed. They are intended for historical insight and capacity
trend analysis, not for alerting on instantaneous state.

User state metrics
^^^^^^^^^^^^^^^^^^

User-related metrics reflect the **current operational state** of the system.

- ``active_users``
  Number of users that are **enabled and allowed to authenticate**
  (users with the ``ACTIVE`` status flag set).

- ``deactivated_users``
  Number of users that are **disabled and not allowed to authenticate**
  (users with the ``ACTIVE`` status flag unset).

These metrics represent a point-in-time snapshot and may increase or decrease
as user accounts are activated or deactivated.

.. note::

   ``active_users`` does not indicate currently logged-in users.
   It represents the number of accounts that are permitted to log in
   at the time of polling.

Metric usage guidance
^^^^^^^^^^^^^^^^^^^^^

- Use **cumulative counters** for trend analysis and reporting.
- Use **state metrics** (such as ``active_users`` and ``deactivated_users``)
  for dashboards and operational visibility.
- Avoid alerting on cumulative counters directly, as they are not expected
  to decrease.

Version information
^^^^^^^^^^^^^^^^^^^

``version_major`` / ``version_minor`` / ``version_patch`` / ``version_build``
   StoredSafe release information.

``version_full``
   Numeric composite version, useful for comparisons::

      major*1_000_000 + minor*100_000 + patch*1_000 + build

   Example::

      4.1.0 build 7034 → 4107034

Polling behavior
----------------

- SNMP polling executes the metrics script only.
- Polling does not affect audit cadence or backups.
- Recommended polling interval: **≥300 seconds**.

Security
--------

- SNMPv3 with authentication and encryption is required.
- Access is read-only.
- No secrets, credentials, or sensitive payload data are exposed.
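
Example: consuming the metrics
------------------------------

The following is a minimal sketch of how a monitoring script might parse the
``key value`` output and apply the semantics described in this document. It is
not part of StoredSafe: the function names ``parse_metrics`` and ``evaluate``
are hypothetical, and the 900-second staleness threshold is an assumption
(roughly three missed 5-minute audits), not a value defined by the appliance.
The sample text is a subset of the healthy example output.

.. code-block:: python

   def parse_metrics(text):
       """Parse 'key value' lines into a dict of numeric values."""
       metrics = {}
       for line in text.splitlines():
           parts = line.split()
           if len(parts) != 2:
               continue  # skip blank lines and anything not shaped like "key value"
           key, raw = parts
           # Values are documented as numeric; try int first, then float
           # (for example cpu_load1). A non-numeric value raises ValueError.
           try:
               metrics[key] = int(raw)
           except ValueError:
               metrics[key] = float(raw)
       return metrics


   def evaluate(metrics, max_audit_stale_s=900):
       """Return a list of problem descriptions; an empty list means none found."""
       problems = []
       if metrics.get("ok") != 1:
           problems.append("ok is not 1: immediate attention required")
       # Assumed threshold, see the text above; tune it to the polling setup.
       if metrics.get("audit_stale_s", 0) > max_audit_stale_s:
           problems.append("audit data is stale")
       for name in ("db_ok", "raid_ok", "backup_completed_ok", "backup_transfer_ok"):
           if metrics.get(name) != 1:
               problems.append(f"{name} is not 1")
       return problems


   SAMPLE = """\
   ok 1
   cpu_load1 0.35
   audit_stale_s 142
   db_ok 1
   raid_ok 1
   backup_completed_ok 1
   backup_transfer_ok 1
   version_major 4
   version_minor 1
   version_patch 0
   version_build 7134
   version_full 4107134
   """

   metrics = parse_metrics(SAMPLE)
   print(evaluate(metrics))

   # Recompute the composite version using the formula given for version_full.
   composite = (metrics["version_major"] * 1_000_000
                + metrics["version_minor"] * 100_000
                + metrics["version_patch"] * 1_000
                + metrics["version_build"])
   print(composite == metrics["version_full"])

The same parser can be applied to each ``nsExtendOutLine`` value, since every
instance carries a single ``key value`` pair.
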