Overview
Infisical exports metrics in OpenTelemetry (OTEL) format, so any backend that supports OTEL ingestion can consume them. While this guide focuses on Grafana, you can also integrate with:
- Cloud-native monitoring: AWS CloudWatch, Google Cloud Monitoring, Azure Monitor
- Observability platforms: Datadog, New Relic, Splunk, Dynatrace
- Custom backends: Any system that supports OTEL ingestion
- Traditional monitoring: Prometheus, Grafana (as covered in this guide)

Infisical supports two export modes:
- Pull-based (Prometheus): Exposes metrics on a dedicated endpoint for Prometheus to scrape
- Push-based (OTLP): Sends metrics to an OpenTelemetry Collector over OTLP
Prerequisites
- Self-hosted Infisical instance running
- Access to deploy monitoring services (Prometheus, Grafana, etc.)
- Basic understanding of Prometheus and Grafana
Setup
Environment Variables
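Configure the following environment variables in your Infisical backend. These include OTEL_TELEMETRY_COLLECTION_ENABLED=true and OTEL_EXPORT_TYPE (prometheus or otlp), depending on whether you use pull-based monitoring (Prometheus) or push-based monitoring (OTLP).

Pull-based Monitoring (Prometheus)
In pull-based mode, Infisical exposes metrics at the /metrics endpoint, allowing Prometheus to scrape the data. The metrics are exposed in Prometheus format but originate from OpenTelemetry instrumentation.
Configuration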
Expose the metrics port
- Docker: Expose port 9464
- Kubernetes: Create a service exposing port 9464
- Other: Ensure port 9464 is accessible to your monitoring stack
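Once the port is exposed, a quick check from the monitoring host can confirm that the endpoint answers and returns metrics. The following is a minimal sketch, not part of Infisical: the helper names, the sample payload, and the `localhost` address are all illustrative assumptions.

```python
import urllib.request


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Fetch the raw Prometheus text exposition from a metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(exposition: str) -> set[str]:
    """Return the metric names found in Prometheus text exposition output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like: name{label="value"} 1.0   or   name 1.0
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


# Hypothetical payload used for illustration instead of a live endpoint.
SAMPLE = """\
# HELP example_requests_total Example counter
# TYPE example_requests_total counter
example_requests_total{route="/api/v1/secrets"} 12
example_queue_depth 3
"""

# Against a running instance (address is an assumption, adjust to your host):
#   print(sorted(metric_names(fetch_metrics("http://localhost:9464/metrics"))))
```

An empty result or a connection error at this step usually points to the port not being reachable rather than to a Prometheus misconfiguration.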
Create Prometheus configuration
Create prometheus.yml with a scrape job that targets the Infisical metrics endpoint. Replace infisical-backend:9464 with the actual hostname and port where your Infisical backend is running. This could be:
- Docker Compose: infisical-backend:9464 (service name)
- Kubernetes: infisical-backend.default.svc.cluster.local:9464 (service name)
- Bare Metal: 192.168.1.100:9464 (actual IP address)
- Cloud: your-infisical.example.com:9464 (domain name)
Deployment Options
Once you’ve configured Infisical to expose metrics, you’ll need to deploy Prometheus to scrape and store them. Below are examples for different deployment environments. Choose the option that matches your infrastructure:
- Docker Compose
- Kubernetes
- Helm
Available Metrics
Infisical emits metrics on two groups of OpenTelemetry meters simultaneously. Choose which to scrape based on your deployment scale.
- High-cardinality, per-actor meters: the Infisical and API meters’ original metrics (infisical.http.server.request.*, infisical.http.server.error.count, infisical.secret.read.count, infisical.auth.attempt.count, infisical.kmip.operation.count) include per-actor labels such as user.email, identity.name, client.address, user_agent.original, organization.name, project.name, secret.path, and secret.name. The SecretSyncs, PkiSyncs, and Integrations meters likewise carry unbounded labels such as syncId. These are useful for self-hosted deployments where you want per-user visibility directly in Grafana, but they may become expensive at large scale (many users, identities, or IPs) due to label cardinality.
- InfisicalCore meter (bounded-cardinality): all newer metrics (queue, audit log, permission cache, secret cache, rate limit, build info, infisical.core.http.error.count, authentication latency, token renewal, SSO config changes, SCIM provisioning, and database connection pool) use only IDs and bounded enums as labels, with no names, emails, IPs, or user agents. Designed for large or multi-tenant deployments. Per-actor detail is available in audit logs instead.

For large or multi-tenant deployments, rely on the InfisicalCore metrics to keep cardinality under control.
To eliminate the in-memory cost of the high-cardinality meters entirely, set OTEL_DROP_HIGH_CARDINALITY_METERS=true. When enabled, the SDK discards all data points from the Infisical, API, SecretSyncs, PkiSyncs, and Integrations meters before aggregation. The instruments still exist in code (no errors), but nothing is stored or exported. Only InfisicalCore metrics are emitted. Defaults to false.
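To see why per-actor labels get expensive, note that the worst-case number of time series for a single metric is the product of the distinct values of each label. The sketch below works through that arithmetic; the label counts are invented for illustration and do not describe any real deployment.

```python
from math import prod


def estimated_series(label_value_counts: dict[str, int]) -> int:
    """Upper bound on time series for one metric.

    Each distinct combination of label values is its own series, so the
    worst case is the product of the distinct value counts per label.
    """
    return prod(label_value_counts.values())


# Hypothetical label cardinalities (assumptions, not measured values).
per_actor_labels = {"user.email": 500, "client.address": 200, "http.route": 50}
bounded_labels = {"http.route": 50, "http.request.method": 4}

print(estimated_series(per_actor_labels))  # 5000000
print(estimated_series(bounded_labels))    # 200
```

Real series counts are usually lower than this bound because not every combination occurs, but the gap between the two cases is what OTEL_DROP_HIGH_CARDINALITY_METERS is meant to address.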
For per-user / per-identity / per-IP breakdowns, query the audit log table. It carries actorId, actorType, ip, userAgent and full event detail. Metrics give you the rate and latency; audit logs give you the who.
Resource attributes (every metric)
Every emitted metric carries these resource-level attributes (no per-metric cardinality cost):
- service.name: the fixed identifier for the Infisical backend service
- service.version: the release version or git SHA of the running Infisical instance
- git.commit.sha: the exact commit the build was produced from, when available
- deployment.environment: the environment the instance is running in (e.g. production, staging, development)
Core API Metrics
These metrics track all HTTP API requests to Infisical, including request counts, latency, and errors. Use these to monitor overall API health, identify performance bottlenecks, and track usage patterns across users and machine identities.
Total API Requests
infisical.http.server.request.count
- Type: Counter
- Unit: {request}
- Description: Total number of API requests to Infisical (covers both human users and machine identities)
- Attributes:
  - infisical.organization.id (string): Organization ID
  - infisical.organization.name (string): Organization name (e.g., “Platform Engineering Team”)
  - infisical.user.id (string, optional): User ID if human user
  - infisical.user.email (string, optional): User email (e.g., “jane.doe@cisco.com”)
  - infisical.identity.id (string, optional): Machine identity ID
  - infisical.identity.name (string, optional): Machine identity name (e.g., “prod-k8s-operator”)
  - infisical.auth.method (string, optional): Auth method used
  - http.request.method (string): HTTP method (GET, POST, PUT, DELETE)
  - http.route (string): API endpoint route pattern
  - http.response.status_code (int): HTTP status code
  - infisical.project.id (string, optional): Project ID
  - infisical.project.name (string, optional): Project name
  - user_agent.original (string, optional): User agent string
  - client.address (string, optional): IP address
Request Duration
infisical.http.server.request.duration
- Type: Histogram
- Unit: s (seconds)
- Description: API request latency
- Buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
- Attributes:
  - infisical.organization.id (string): Organization ID
  - infisical.organization.name (string): Organization name
  - infisical.user.id (string, optional): User ID if human user
  - infisical.user.email (string, optional): User email
  - infisical.identity.id (string, optional): Machine identity ID
  - infisical.identity.name (string, optional): Machine identity name
  - http.request.method (string): HTTP method
  - http.route (string): API endpoint route pattern
  - http.response.status_code (int): HTTP status code
  - infisical.project.id (string, optional): Project ID
  - infisical.project.name (string, optional): Project name
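Because latency is recorded in fixed buckets, percentiles such as p95 are estimates interpolated within a bucket rather than exact values. The sketch below illustrates that estimation with linear interpolation, similar in spirit to what Prometheus-style backends do; the function name and the sample counts are assumptions made for this example.

```python
# Bucket upper bounds (seconds) as listed for the request duration histogram.
BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]


def estimate_quantile(q: float, upper_bounds: list, cumulative_counts: list) -> float:
    """Estimate the q-quantile from cumulative bucket counts.

    cumulative_counts holds one count per finite bucket plus a final entry
    for the +Inf bucket (the total number of observations).
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if len(cumulative_counts) != len(upper_bounds) + 1:
        raise ValueError("expected one count per bucket plus the +Inf bucket")
    total = cumulative_counts[-1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper, count in zip(upper_bounds, cumulative_counts):
        if count >= rank:
            if count == lower_count:
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper - lower_bound) * fraction
        lower_bound, lower_count = upper, count
    # The rank falls in the +Inf bucket: report the highest finite bound.
    return float(upper_bounds[-1])


# Hypothetical cumulative counts for 100 requests (last entry is +Inf).
counts = [0, 10, 40, 70, 90, 95, 98, 99, 100, 100, 100, 100]
p50 = estimate_quantile(0.50, BUCKETS, counts)
p95 = estimate_quantile(0.95, BUCKETS, counts)
```

The estimate can never be more precise than the bucket width, so a p95 that lands in the 0.1 to 0.25 bucket only tells you the true value lies somewhere in that range.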
API Errors by Actor
infisical.http.server.error.count
- Type: Counter
- Unit: {error}
- Description: API errors grouped by actor (for identifying misconfigured services)
- Attributes:
  - infisical.organization.id (string): Organization ID
  - infisical.organization.name (string): Organization name
  - infisical.user.id (string, optional): User ID if human
  - infisical.user.email (string, optional): User email
  - infisical.identity.id (string, optional): Identity ID if machine
  - infisical.identity.name (string, optional): Identity name
  - http.route (string): API endpoint where error occurred
  - http.request.method (string): HTTP method
  - error.type (string): Error category/type (client_error, server_error, auth_error, rate_limit_error, etc.)
  - infisical.project.id (string, optional): Project ID
  - infisical.project.name (string, optional): Project name
  - client.address (string, optional): IP address
  - user_agent.original (string, optional): User agent information
Secret Operations Metrics
These metrics provide visibility into secret access patterns, helping you understand which secrets are being accessed, by whom, and from where. Essential for security auditing and access pattern analysis.
Secret Read Operations
infisical.secret.read.count
- Type: Counter
- Unit: {operation}
- Description: Number of secret read operations
- Attributes:
  - infisical.organization.id (string): Organization ID
  - infisical.organization.name (string): Organization name
  - infisical.project.id (string): Project ID
  - infisical.project.name (string): Project name (e.g., “payment-service-secrets”)
  - infisical.environment (string): Environment (dev, staging, prod)
  - infisical.secret.path (string): Path to secrets (e.g., “/microservice-a/database”)
  - infisical.secret.name (string, optional): Name of secret
  - infisical.user.id (string, optional): User ID if human
  - infisical.user.email (string, optional): User email
  - infisical.identity.id (string, optional): Machine identity ID
  - infisical.identity.name (string, optional): Machine identity name
  - user_agent.original (string, optional): User agent/SDK information
  - client.address (string, optional): IP address
Authentication Metrics
These metrics track authentication attempts and outcomes, enabling you to monitor login success rates, detect potential security threats, and identify authentication issues.
Login Attempts
infisical.auth.attempt.count
- Type: Counter
- Unit: {attempt}
- Description: Authentication attempts (both successful and failed)
- Attributes:
  - infisical.organization.id (string): Organization ID
  - infisical.organization.name (string): Organization name
  - infisical.user.id (string, optional): User ID if human (if identifiable)
  - infisical.user.email (string, optional): User email (if identifiable)
  - infisical.identity.id (string, optional): Identity ID if machine (if identifiable)
  - infisical.identity.name (string, optional): Identity name (if identifiable)
  - infisical.auth.method (string): Authentication method attempted
  - infisical.auth.result (string): success or failure
  - error.type (string, optional): Reason for failure if failed (invalid_credentials, expired_token, invalid_token, etc.)
  - client.address (string): IP address
  - user_agent.original (string, optional): User agent/client information
  - infisical.auth.attempt.username (string, optional): Attempted username/email (if available)
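Since this counter carries infisical.identity.id and infisical.auth.result, it can be used to count identities that authenticated successfully within a time window. The sketch below assumes you have already queried your metrics backend for the per-series increase over that window; the input shape (a list of label dictionaries paired with an increase) is a simplification invented for this example.

```python
def active_identity_ids(series: list) -> set:
    """Return identity IDs with at least one successful attempt in the window.

    series is a list of (labels, increase) pairs, where increase is the
    counter's growth over the chosen window for that label combination.
    """
    active = set()
    for labels, increase in series:
        identity_id = labels.get("infisical.identity.id")
        succeeded = labels.get("infisical.auth.result") == "success"
        if identity_id and succeeded and increase > 0:
            active.add(identity_id)
    return active


# Hypothetical query result for a 30-day window (IDs are made up).
window_series = [
    ({"infisical.identity.id": "id-a", "infisical.auth.result": "success"}, 42),
    ({"infisical.identity.id": "id-a", "infisical.auth.result": "failure"}, 3),
    ({"infisical.identity.id": "id-b", "infisical.auth.result": "failure"}, 7),
    ({"infisical.identity.id": "id-c", "infisical.auth.result": "success"}, 0),
]

print(len(active_identity_ids(window_series)))  # 1
```

Only `id-a` counts as active here: `id-b` never succeeded and `id-c` had no attempts inside the window.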
Authentication Latency (InfisicalCore)
infisical.auth.attempt.duration
- Type: Histogram
- Unit: s (seconds)
- Description: Authentication attempt latency by method and result. External verifications (SAML, OIDC, Kubernetes, AWS, GCP, Azure, OCI, AliCloud, …) include the IdP/provider network round trip, so this is what tells you “SAML logins suddenly got slow” or “Kubernetes auth verification is timing out”. Covers every user and machine identity login flow, including LDAP user login.
- Attributes (bounded):
  - infisical.auth.method (string): Authentication method (email, saml, oidc, google, github, gitlab, ldap, universal-auth, kubernetes-auth, aws-auth, gcp-auth, azure-auth, oci-auth, alicloud-auth, tls-cert-auth, oidc-auth, jwt-auth, ldap-auth, spiffe-auth)
  - infisical.auth.result (string): success or failure
  - error.type (string, optional): Bounded failure classification (present on failures)
  - infisical.organization.id (string, optional): Organization ID when known
Token Renewal (InfisicalCore)
infisical.auth.token.renewal.count
- Type: Counter
- Unit: {renewal}
- Description: Machine identity access token renewal attempts by outcome. Distinct from infisical.auth.attempt.*, which tracks initial logins.
- Attributes (bounded):
  - outcome (string): success or failure
  - infisical.auth.method (string, optional): Identity auth method when known
  - error.type (string, optional): Bounded failure classification (present on failures)
Note: the infisical.auth.attempt.count counter lives on the high-cardinality Infisical meter, while infisical.auth.attempt.duration and infisical.auth.token.renewal.count live on the bounded InfisicalCore meter. There is no per-identity “active identity” gauge because the schema has no reliable last-authenticated timestamp. Monthly or weekly active identity counts are therefore best derived from the infisical.auth.attempt.* series in your metrics backend (e.g. count by (...) over a time window) rather than from a snapshot gauge.
SSO Configuration Metrics (InfisicalCore)
These metrics track changes to SSO configuration, helping you detect unexpected reconfiguration of identity providers.
SSO Config Changes
infisical.sso.config.change.count
- Type: Counter
- Unit: {change}
- Description: SSO configuration create/update events by provider.
- Attributes (bounded):
  - sso.provider (string): saml, oidc, or ldap
  - sso.action (string): create or update
  - infisical.organization.id (string, optional): Organization ID
SCIM Provisioning Metrics (InfisicalCore)
These metrics track SCIM provisioning operations (user and group lifecycle), enabling you to monitor directory-sync throughput and failures.
SCIM Operations
infisical.scim.operation.count
- Type: Counter
- Unit: {operation}
- Description: SCIM provisioning operations by type and outcome.
- Attributes (bounded):
  - scim.operation (string): create_user, update_user, replace_user, delete_user, create_group, update_group, replace_group, delete_group
  - outcome (string): success or failure
  - infisical.organization.id (string, optional): Organization ID
  - error.type (string, optional): Bounded failure classification (present on failures)
SCIM Operation Latency
infisical.scim.operation.duration
- Type: Histogram
- Unit: s (seconds)
- Description: Latency of SCIM provisioning operations. Same attributes as infisical.scim.operation.count.
Database Metrics (InfisicalCore)
This metric provides visibility into connection pool health. Query latency is intentionally not emitted, as managed databases (for example, Amazon RDS Performance Insights) already report per-statement latency at the server.
Connection Pool
infisical.db.pool.connections
- Type: Observable Gauge
- Unit: {connection}
- Description: Knex/tarn connection pool counts, observed on each export. Watch for pending rising while used is saturated, which indicates pool exhaustion.
- Attributes (bounded):
  - db.pool.state (string): used, free, or pending
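The exhaustion signal described above can be expressed as a comparison between two consecutive observations of the gauge. This is a minimal sketch: the `PoolSample` type and the `pool_max` value are assumptions for illustration, and `pool_max` should match whatever maximum your connection pool is configured with.

```python
from typing import NamedTuple


class PoolSample(NamedTuple):
    """One observation of the pool gauge, one field per db.pool.state value."""
    used: int
    free: int
    pending: int


def is_exhausting(previous: PoolSample, current: PoolSample, pool_max: int) -> bool:
    """True when the pool is saturated and the pending queue is growing."""
    saturated = current.used >= pool_max
    pending_rising = current.pending > previous.pending
    return saturated and pending_rising


# Hypothetical observations with an assumed pool maximum of 10 connections.
healthy = is_exhausting(PoolSample(6, 4, 0), PoolSample(7, 3, 0), pool_max=10)
exhausted = is_exhausting(PoolSample(10, 0, 2), PoolSample(10, 0, 9), pool_max=10)
```

A saturated pool with a flat or shrinking pending count is usually just busy, which is why the check requires both conditions.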
Key Management Interoperability Protocol Metrics
These metrics track Key Management Interoperability Protocol (KMIP) operations, providing visibility into key management activities including key creation, retrieval, activation, revocation, and destruction.
KMIP Operations
infisical.kmip.operation.count
- Type: Counter
- Unit: {operation}
- Description: Number of KMIP operations performed
- Attributes:
  - infisical.kmip.operation.type (string): Operation type (create, get, get_attributes, activate, revoke, destroy, locate, register)
  - infisical.organization.id (string): Organization ID
  - infisical.project.id (string): Project ID
  - infisical.kmip.client.id (string): KMIP client ID performing the operation
  - infisical.kmip.object.id (string, optional): Managed object/key ID
  - infisical.kmip.object.name (string, optional): Managed object/key name
  - infisical.identity.id (string, optional): Machine identity ID
  - infisical.identity.name (string, optional): Machine identity name
  - user_agent.original (string, optional): User agent string
  - client.address (string, optional): Client IP address
Integration & Secret Sync Metrics
These metrics monitor secret synchronization operations between Infisical and external systems, helping you track sync health, identify integration failures, and troubleshoot connectivity issues.
integration_secret_sync_errors
- Labels: version, integration, integrationId, type, status, name, projectId
- Example: Monitor integration sync failures across different services
secret_sync_sync_secrets_errors
- Labels: version, destination, syncId, projectId, type, status, name
- Example: Track secret sync failures to external systems
secret_sync_import_secrets_errors
- Labels: version, destination, syncId, projectId, type, status, name
- Example: Monitor secret import failures
secret_sync_remove_secrets_errors
- Labels: version, destination, syncId, projectId, type, status, name
- Example: Track secret removal operation failures
Job Queue Metrics (InfisicalCore)
These metrics give per-queue visibility into BullMQ worker health: throughput, latency, contention, failures, and stalls. Use these to detect stuck workers, queue backlog, and which queues are failing.
Queue Job Count
infisical.queue.job.count
- Type: Counter
- Unit: {job}
- Description: Jobs processed by outcome.
- Attributes:
  - queue.name (string): e.g. audit-log, secret-sync, secret-rotation-v2
  - job.name (string): BullMQ job name
  - outcome (string): completed or failed
Queue Job Duration
infisical.queue.job.duration
- Type: Histogram
- Unit: s
- Description: Job processing duration (worker pickup to completion). Skipped on framework-level failures where processedOn is undefined, so the histogram is not polluted with phantom zero-duration points.
- Attributes: queue.name, job.name, outcome
Queue Job Wait
infisical.queue.job.wait
- Type: Histogram
- Unit: s
- Description: Time the job spent waiting for a worker (queue contention). Subtracts the configured job.opts.delay so intentional scheduling doesn’t inflate percentiles. Only recorded on completed jobs.
- Attributes: queue.name, job.name
Queue Job Failure (classified)
infisical.queue.job.failure.count
- Type: Counter
- Unit: {failure}
- Description: Failures classified by error type. Alert when attempts.exhausted="true": those are real failures (all retries spent), not transient errors.
- Attributes:
  - queue.name, job.name
  - error.type (string): one of validation, auth, permission, not_found, rate_limit, db, timeout, network, cryptography, policy, scim, oidc, internal, unknown
  - attempts.exhausted (string): "true" or "false"
Queue Stalled
infisical.queue.stalled.count
- Type: Counter
- Unit: {job}
- Description: Stalled jobs (the worker’s lock on a job expired without completing it). Strongest signal of a stuck worker, OOM, or network partition. Previously invisible.
- Attributes: queue.name
Queue Depth
infisical.queue.depth
- Type: Observable Gauge
- Unit: {job}
- Description: Current number of jobs in each queue state. The SDK invokes the callback on each scrape / push interval.
- Attributes:
  - queue.name (string)
  - queue.state (string): waiting, active, delayed, failed, completed, etc.
Audit Log Metrics (InfisicalCore)
End-to-end audit-log pipeline observability: how many events get enqueued per event type / actor, how long persistence takes, and how many are ultimately dropped.
Audit Log Enqueued
infisical.audit_log.enqueued.count
- Type: Counter
- Unit: {event}
- Description: Audit log events enqueued to BullMQ for persistence.
- Attributes:
  - audit_log.event_type (string): e.g. LOGIN_USER, CREATE_SECRET, …
  - audit_log.actor_type (string): user, identity, service
  - infisical.organization.id (string, optional)
Audit Log Persist Duration
infisical.audit_log.persist.duration
- Type: Histogram
- Unit: s
- Description: Latency from worker pickup to durable storage.
- Attributes:
  - audit_log.backend (string): postgres or clickhouse
  - audit_log.event_type (string)
  - infisical.organization.id (string)
Audit Log Dropped
infisical.audit_log.dropped.count
- Type: Counter
- Unit: {event}
- Description: Audit log events that exhausted BullMQ retries and were not persisted. Operators should alert when this is non-zero, because a dropped audit event is a compliance signal.
- Attributes:
  - audit_log.event_type (string)
  - audit_log.drop_reason (string): max_retries
  - infisical.organization.id (string, optional)
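Alerting on "non-zero" for a counter means alerting on its increase over a window, not on its absolute value, and the calculation has to tolerate the counter restarting from zero when the process restarts. The sketch below shows that reset-aware increase over raw samples; the function name and sample values are illustrative assumptions.

```python
def counter_increase(samples: list) -> float:
    """Total increase across ordered counter samples, tolerating resets.

    A sample lower than its predecessor is treated as a counter reset
    (for example after a process restart), so its full value is counted.
    """
    increase = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            increase += current  # counter restarted from zero
    return increase


# Hypothetical samples of the dropped-event counter within one window.
dropped_in_window = counter_increase([0, 2, 5, 1, 3])
should_alert = dropped_in_window > 0
```

Most metrics backends provide this calculation as a built-in function, so in practice the same logic would live in an alert rule rather than in application code.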
Audit Log Stream Metrics (InfisicalCore)
Per-provider observability for the audit-log stream feature (Datadog, Splunk, Custom HTTP, Azure, Cribl).
Audit Log Stream Delivery
infisical.audit_log_stream.delivery.count
- Type: Counter
- Unit: {delivery}
- Description: Per-provider stream delivery attempts.
- Attributes:
  - audit_log_stream.provider (string): datadog, splunk, custom, azure, cribl
  - infisical.organization.id (string)
  - outcome (string): success or failure
  - error.type (string, only on failure): one of the closed enum values
Audit Log Stream Delivery Duration
infisical.audit_log_stream.delivery.duration
- Type: Histogram
- Unit: s
- Description: Per-provider stream delivery latency (HTTP round trip to the SIEM).
- Attributes: audit_log_stream.provider, infisical.organization.id, outcome, error.type (on failure)
Permission Cache Metrics (InfisicalCore)
The CASL permission cache uses a fingerprint-based two-tier scheme. These metrics tell you whether the cache is doing its job.
Permission Cache Lookup
infisical.permission_cache.lookup.count
- Type: Counter
- Unit: {lookup}
- Description: Per-lookup branch: marker_hit (fast path, 0 DB reads), fingerprint_match (1 DB read, cached data returned), full_refetch (full DB re-fetch), or fingerprint_error (fingerprint fetch failed, bypassing cache).
- Attributes:
  - cache.result (string): marker_hit, fingerprint_match, full_refetch, fingerprint_error
Permission Cache Fingerprint Duration
infisical.permission_cache.fingerprint.duration
- Type: Histogram
- Unit: s
- Description: Time to compute the lightweight permission fingerprint (1 DB read on marker expiry).
Secret Cache Metrics (InfisicalCore)
The secret service caches encrypted secret payloads to avoid redundant decryption on repeated reads. These metrics tell you whether the cache is effective and whether entries are being skipped for exceeding the size cap.
Secret Cache Access
infisical.secret.cache.access.count
- Type: Counter
- Unit: {access}
- Description: Secret service-layer cache accesses by outcome: not_modified (client revalidation returned 304), hit (served from cache), or miss (cache empty/stale, full read performed).
- Attributes:
  - cache.result (string): not_modified, hit, miss
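One way to judge cache effectiveness from this counter is the share of accesses that avoided a full read. The sketch below computes that ratio from per-outcome counts; counting not_modified together with hit as "avoided a full read" is an assumption of this example, and the numbers are invented.

```python
from typing import Optional


def cache_effectiveness(counts: dict) -> Optional[float]:
    """Fraction of cache accesses that did not require a full read.

    counts maps each cache.result value to its increase over a window.
    Returns None when there were no accesses at all.
    """
    avoided = counts.get("hit", 0) + counts.get("not_modified", 0)
    total = avoided + counts.get("miss", 0)
    if total == 0:
        return None
    return avoided / total


# Hypothetical per-outcome increases over one window.
ratio = cache_effectiveness({"hit": 60, "not_modified": 20, "miss": 20})
print(ratio)  # 0.8
```

Returning None for an idle window keeps a quiet period from being misread as a 0% hit rate on a dashboard.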
Secret Cache Entry Bytes
infisical.secret.cache.entry.bytes
- Type: Histogram
- Unit: By
- Description: Encrypted secret cache entry size computed at write time. Use this to size the cache and tune the per-entry byte cap.
Secret Cache Oversize Skip
infisical.secret.cache.oversize_skip.count
- Type: Counter
- Unit: {skip}
- Description: Secret cache writes skipped because the entry exceeded the max byte cap. A high rate means large payloads are never being cached and will always incur a full read.
Rate Limit Metrics (InfisicalCore)
Rate Limit Exceeded
infisical.rate_limit.exceeded.count
- Type: Counter
- Unit: {request}
- Description: HTTP 429 responses (rate limit exceeded). Labels are intentionally bounded to the route and HTTP method. For per-actor breakdowns, query the audit log.
- Attributes:
  - http.route (string)
  - http.request.method (string)
Build Info (InfisicalCore)
Build Info
infisical.build.info
- Type: Observable Gauge
- Description: Always emits 1. The labels carry the deployed version, git SHA, and Node version. Use this to filter Grafana dashboards by deployed version without paying per-metric cardinality.
- Attributes:
  - service.version (string)
  - git.commit.sha (string)
  - node.version (string)
Node Runtime Metrics (auto)
Heap usage, GC pause, event loop lag, and other Node runtime metrics are auto-emitted via @opentelemetry/instrumentation-runtime-node. Metric names follow the OTel runtime semantic conventions (nodejs.eventloop.delay.*, v8js.heap.size.*, etc.).
System Metrics
These low-level HTTP metrics are automatically collected by OpenTelemetry’s instrumentation layer, providing baseline performance data for all HTTP traffic.
- http_server_duration
- http_client_duration
Troubleshooting
Metrics not appearing
- Verify OTEL_TELEMETRY_COLLECTION_ENABLED=true is set in your Infisical environment variables
- Ensure the correct OTEL_EXPORT_TYPE is set (prometheus or otlp)
- Check network connectivity between Infisical and your monitoring services (Prometheus or OTLP collector)
- For pull-based monitoring: Verify port 9464 is exposed and accessible
- For push-based monitoring: Verify the OTLP endpoint URL is correct and reachable
- Check Infisical backend logs for any errors related to metrics export
Authentication errors
- Verify basic auth credentials in your OTLP configuration match between Infisical and the collector
- Check that OTEL_COLLECTOR_BASIC_AUTH_USERNAME and OTEL_COLLECTOR_BASIC_AUTH_PASSWORD match the credentials in your otel-collector-config.yaml
- Ensure the htpasswd format in the collector configuration is correct
- Test the collector endpoint manually using curl with the same credentials to verify they work
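If curl is not available, the same credential check can be scripted. The sketch below builds an HTTP Basic Authorization header from a username and password and attaches it to a request; the helper names, the example credentials, and the collector URL are assumptions, so substitute the values from your own configuration.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_probe_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Create a request that carries the Basic auth credentials."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Hypothetical credentials and collector address, for illustration only.
probe = build_probe_request("http://localhost:4318/v1/metrics", "user", "pass")
print(probe.get_header("Authorization"))  # Basic dXNlcjpwYXNz

# To send it: urllib.request.urlopen(probe, timeout=5)
# An HTTP 401 response means the collector rejected the credentials.
```

Any response other than 401 suggests the credentials were accepted, even if the collector then rejects the request for other reasons such as the HTTP method.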