AUDIT-005: Ledger Cleanup Bug Risks Permanent Data Loss by Ignoring Publish Queue
A flaw in the ledger's cleanup routine risks permanently deleting bucket files that are still needed, causing publication failures and leaving the system in an inconsistent state. The bug resides in the `bucket` crate: the cleanup process fails to account for bucket files still referenced by queued checkpoint snapshots, so files required for future publish operations can be erroneously purged.
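A minimal model may make the deletion path concrete. The sketch below is hypothetical (the real `retain_buckets()` operates on files on disk); it models the bucket directory as an in-memory map keyed by hash and shows that anything outside the supplied keep-set is dropped, regardless of whether a queued snapshot still needs it.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified stand-in for the on-disk bucket directory: hash -> file contents.
type BucketStore = HashMap<String, Vec<u8>>;

/// Hypothetical model of `BucketManager::retain_buckets()`: delete every
/// bucket whose hash is absent from `keep`, with no other safety check.
fn retain_buckets(store: &mut BucketStore, keep: &HashSet<String>) {
    store.retain(|hash, _| keep.contains(hash));
}

/// Reproduces the failure mode: the keep-set is built solely from the
/// live bucket list, so the bucket referenced only by a queued snapshot
/// is purged even though publication will need it later.
fn demo() -> BucketStore {
    let mut store: BucketStore = HashMap::new();
    store.insert("live-1".into(), vec![1]);
    store.insert("queued-1".into(), vec![2]); // referenced only by a queued snapshot

    let keep: HashSet<String> = ["live-1".to_string()].into_iter().collect();
    retain_buckets(&mut store, &keep);
    store // "queued-1" is gone; reloading the snapshot will fail
}
```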
The core failure is a mismatch between the data referenced by the live system and the data required by pending operations. Specifically, `LedgerManager::all_referenced_bucket_hashes()` returns only the hashes in the current live and hot-archive bucket lists, ignoring the hashes persisted in SQLite for queued checkpoint snapshots. This incomplete set is passed from the cleanup call sites in `App` and `CatchupImpl` down to `BucketManager::retain_buckets()`, which deletes every on-disk bucket file absent from the supplied keep-set, with no check against the publish queue or persisted snapshots. When checkpoint publication later reloads a queued snapshot from the database, it fails to open the now-deleted bucket files.
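The natural shape of a fix is to make the keep-set the union of the live references and the publish-queue references. The sketch below is illustrative, not the actual patch: `live_hashes` stands in for what `all_referenced_bucket_hashes()` returns today, and `queued_hashes` for hashes loaded from the SQLite publish queue; both parameter names are assumptions.

```rust
use std::collections::HashSet;

/// Hypothetical corrected keep-set computation: union the live/hot-archive
/// bucket hashes with the hashes persisted for queued checkpoint snapshots,
/// so cleanup can never delete a bucket the publish queue still references.
fn all_referenced_bucket_hashes(
    live_hashes: &HashSet<String>,
    queued_hashes: &HashSet<String>,
) -> HashSet<String> {
    live_hashes.union(queued_hashes).cloned().collect()
}
```

With this shape, `retain_buckets()` itself needs no change; it simply receives a keep-set that is complete.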
Although rated MEDIUM severity, the bug exposes a dangerous gap in the system's data-lifecycle management. It creates a silent failure mode in which cleanup, intended to reclaim disk space, instead undermines data integrity. The consequence is not just a failed publication but permanent loss of historical checkpoint data, potentially breaking history continuity for downstream nodes. The flaw warrants a review of all data-retention logic that intersects with asynchronous operations such as the publish queue.
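Independent of correcting the keep-set computation, one defense-in-depth option such a review might consider is having the deletion routine verify the invariant itself: no hash still referenced by the publish queue may be absent from the keep-set. A sketch under that assumption, with hypothetical names:

```rust
use std::collections::HashSet;

/// Hypothetical guard to run before any deletion: report every hash the
/// publish queue still references that is missing from the keep-set, so
/// cleanup can abort instead of silently destroying queued-snapshot data.
fn check_keep_set_covers_queue(
    keep: &HashSet<String>,
    queued: &HashSet<String>,
) -> Result<(), Vec<String>> {
    let missing: Vec<String> = queued.difference(keep).cloned().collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing) // caller should refuse to delete and log these hashes
    }
}
```

Turning the invariant into an explicit precondition means a future regression in keep-set construction surfaces as a loud cleanup failure rather than as data loss discovered at publication time.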