Guardian Policy Engine Flaw Allows Remediation Actions to Permanently Hang Agent Threads, Suspending All Rule Enforcement
A critical security vulnerability in the GuaranteedState rule engine allows remediation actions to permanently block guardian worker threads, effectively suspending all policy enforcement on affected agents. The flaw stems from the absence of a maximum execution timeout for remediation actions, including registry writes, service restarts, and file operations defined in GuaranteedState rules.
The `GuaranteedStateRule` protocol buffer definition and corresponding `guardian_engine.cpp` implementation contain no `action_timeout` field and no enforcement mechanism to limit remediation action execution time. While the `remediation_latency_us` field is logged in events after action completion, this metric serves only for post-hoc monitoring and never triggers termination of hung operations. An attacker who can create or modify rules can exploit this design gap by crafting remediation actions that target unreachable resources, such as registry writes to UNC network paths, causing the guardian worker thread to block for the full operating system TCP timeout—approximately 75 seconds by default—per rule evaluation.
In a worst-case scenario, a single malicious rule pushed across an agent fleet would cause every affected agent to permanently stall its guardian worker on that rule. During this blocking period, no other GuaranteedState rules can be evaluated on the affected agents. The `remediation_latency_us` metric, logged after the fact, provides no protective value since it is recorded only after the blocking operation completes or times out. The absence of pre-execution resource validation or deadline enforcement means any rule targeting an unresponsive endpoint creates a denial-of-service condition against the policy enforcement subsystem itself.