Container images are fundamentally structured as a series of read-only layers, with a writable layer typically added on top when a container instance is created. This layered architecture facilitates efficient storage and distribution by allowing multiple images to share common base layers. However, removing a layer, especially one that multiple images or containers might reference, is a complex operation.
Removing a layer involves not only deleting the physical data but also updating metadata, managing references, and ensuring that no critical data is inadvertently removed or left in an inconsistent state. Throughout this entire operation, a lock is held, preventing other processes from accessing the storage.
The storage lock, however, does not need to be held for the entire duration of layer deletion. The most time-consuming part of deleting a layer is removing its data from disk. SSDs speed this up, but only partially: frequent layer deletions and other I/O-intensive operations can still prolong the time the lock is held, leaving the storage unusable for other processes in the meantime.
How can files be removed more efficiently, and what operation could be faster than deleting them outright? One approach is to move the files to a trash directory, so that their actual removal can happen outside the storage lock. A “trash” mechanism, however, has to handle scenarios where a process fails to empty the trash or where multiple processes need access to its contents. Physically moving files is itself I/O intensive, but renaming them to a different location on the same filesystem is cheap: only directory entries are rewritten and no data is copied, which matters when a layer contains many files.
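On Linux this amounts to a single rename(2) call per layer directory, regardless of how many files the layer contains. A minimal sketch in Go, with hypothetical paths:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Hypothetical paths, for illustration only.
	const layerDir = "/var/lib/containers/storage/overlay/LAYERID"
	const stagingDir = "/var/lib/containers/storage/overlay/staging"

	if err := os.MkdirAll(stagingDir, 0o700); err != nil {
		fmt.Println("creating staging directory failed:", err)
		return
	}
	// Rename moves the entire layer tree by rewriting directory entries;
	// no file data is copied, so it is fast even for large layers. Both
	// paths must live on the same filesystem, or the rename fails.
	if err := os.Rename(layerDir, stagingDir+"/LAYERID"); err != nil {
		fmt.Println("rename failed:", err)
	}
}
```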
A new internal structure is needed to move files between locations while preserving locking (so other threads cannot access the same directory) and allowing robust recovery after failures. To achieve this, an internal temporary directory with recovery capabilities has been added to containers/storage. The same temporary directory can be leveraged in the future to facilitate parallel image pulls without the pulls blocking one another.
To effectively manage parallel access to temporary directories, each TempDir instance immediately acquires and holds an exclusive lock on its corresponding file upon NewTempDir() initialization. This lock serves as an indicator that the temporary directory is actively being used by the process or goroutine holding the TempDir object. Conversely, if the lock is not held, the temporary directory is deemed stale and subsequently cleaned as part of a recovery process.
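The sketch below only illustrates that lock-file idea with flock(2); the helper names and on-disk layout are hypothetical and not the actual containers/storage API. A directory whose lock is held is in use; a directory whose lock can be acquired without blocking has no live owner and is safe to clean up.

```go
// Illustration of the staleness-detection idea only; this is NOT the
// containers/storage API, just a sketch of the same pattern.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// newTempDir creates a temporary directory and immediately takes an exclusive
// flock on its companion lock file, marking the directory as in use.
func newTempDir(parent, name string) (*os.File, string, error) {
	dir := filepath.Join(parent, name)
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return nil, "", err
	}
	lockFile, err := os.OpenFile(dir+".lock", os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return nil, "", err
	}
	if err := syscall.Flock(int(lockFile.Fd()), syscall.LOCK_EX); err != nil {
		lockFile.Close()
		return nil, "", err
	}
	return lockFile, dir, nil
}

// recoverStale removes directories whose lock can be grabbed without blocking,
// i.e. whose owner no longer holds them.
func recoverStale(parent string) error {
	entries, err := os.ReadDir(parent)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		lockPath := filepath.Join(parent, e.Name()+".lock")
		f, err := os.OpenFile(lockPath, os.O_RDWR, 0o600)
		if err != nil {
			continue // no lock file; skip in this sketch
		}
		if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
			f.Close()
			continue // lock still held: directory is actively in use
		}
		// Lock acquired: the directory is stale, clean it up.
		os.RemoveAll(filepath.Join(parent, e.Name()))
		os.Remove(lockPath)
		f.Close()
	}
	return nil
}

func main() {
	parent, err := os.MkdirTemp("", "tempdir-demo-")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(parent)

	lock, dir, err := newTempDir(parent, "in-use")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer lock.Close()

	// The lock on "in-use" is still held, so recovery leaves it alone.
	if err := recoverStale(parent); err != nil {
		fmt.Println(err)
	}
	if _, err := os.Stat(dir); err == nil {
		fmt.Println("in-use directory survived recovery, as expected")
	}
}
```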
The temporary directory also required a new, separate staging lock; reusing the lock files already used by storage would have caused the temporary directory’s old lock files to leak during cleanup.
Layer removal now uses this temporary directory mechanism. The deletion process still updates metadata, manages references, and ensures that no critical data is inadvertently removed or left in an inconsistent state, but instead of physically deleting files it moves them into the temporary directory. Once all deletion steps are complete, the temporary directory is cleaned up outside of the storage lock, which significantly reduces how long the lock is held.
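Put together, the ordering looks roughly like this; the sketch uses a sync.Mutex as a stand-in for the cross-process storage lock, and the paths are hypothetical:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// storageLock stands in for the containers/storage lock; the real lock is a
// cross-process file lock, but a mutex is enough to show the ordering.
var storageLock sync.Mutex

// deleteLayer stages the layer under the lock (cheap) and removes the staged
// files after the lock is released (slow, but no longer blocking others).
func deleteLayer(layerDir, stagingDir string) error {
	storageLock.Lock()
	// ... update layer metadata and reference counts here ...
	if err := os.MkdirAll(stagingDir, 0o700); err != nil {
		storageLock.Unlock()
		return err
	}
	staged := filepath.Join(stagingDir, filepath.Base(layerDir))
	err := os.Rename(layerDir, staged) // cheap: same filesystem, metadata only
	storageLock.Unlock()
	if err != nil {
		return err
	}
	// Expensive part, performed outside the lock.
	return os.RemoveAll(staged)
}

func main() {
	// Hypothetical paths, for illustration only.
	if err := deleteLayer(
		"/var/lib/containers/storage/overlay/LAYERID",
		"/var/lib/containers/storage/overlay/staging",
	); err != nil {
		fmt.Println("layer deletion failed:", err)
	}
}
```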
To validate the impact of these optimizations, benchmarking was performed, comparing layer removal with parallel listing of images before and after the implementation of layer deletion optimization.
Benchmark
The benchmark aims to quantify the performance improvement of optimized layer deletion by simulating a common real-world scenario involving two concurrent operations.
- Background Container Image Deletion: The optimized layer removal mechanism initiates a container image deletion in the background. This operation acquires the storage lock but releases it much faster, thanks to the deferred physical file deletion. (The docker.io/tensorflow/tensorflow:latest-gpu image was used for this deletion.)
- Immediate Image Listing: Immediately after initiating the deletion, an image listing operation is attempted. This operation also requires the storage lock. The time it takes for the listing to acquire the lock and complete is the key metric.
Measuring the time it takes for the image listing operation to complete allows us to directly assess the impact of the reduced storage lock holding time during layer deletion. This demonstrates the improved throughput and concurrency for other storage operations in Podman v5.6.
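The original benchmark was a basic shell script; a rough Go equivalent of the same procedure (assuming podman is on the PATH and the image is present locally) could look like this:

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// Start the image removal in the background.
	rm := exec.Command("podman", "rmi", "docker.io/tensorflow/tensorflow:latest-gpu")
	if err := rm.Start(); err != nil {
		fmt.Println("failed to start removal:", err)
		return
	}

	// Immediately attempt an image listing and time it; this is the
	// operation that previously waited on the storage lock.
	start := time.Now()
	out, err := exec.Command("podman", "images").CombinedOutput()
	listDuration := time.Since(start)
	if err != nil {
		fmt.Println("listing failed:", err)
	}
	fmt.Print(string(out))
	fmt.Println("podman images took:", listDuration)

	if err := rm.Wait(); err != nil {
		fmt.Println("removal failed:", err)
	}
}
```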
These are the results of a basic script that executed the benchmark:
Before:
Untagged: docker.io/tensorflow/tensorflow:latest-gpu
Deleted: 94f506c52991baf5ead292f5a82dafe0480ef177d6a7846d671a206e9b0dd91b
real 0m1.787s
user 0m0.033s
sys 0m1.319s
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.fedoraproject.org/fedora latest 2856f68043ad 5 weeks ago 176 MB
<none> <none> feedd0289cd5 8 weeks ago 415 MB
<none> <none> b56eb9950ba9 8 weeks ago 415 MB
<none> <none> a30cf268d19a 2 months ago 176 MB
docker.io/library/eclipse-temurin 17-jdk-alpine e3c32894dd61 2 months ago 334 MB
quay.io/libpod/testimage 20241011 13dc0b3d0b0a 9 months ago 14.2 MB
quay.io/libpod/alpine_nginx latest ecea49d99daa 3 years ago 23.1 MB
quay.io/libpod/alpine latest 915beeae4675 5 years ago 5.59 MB
real 0m1.792s
user 0m0.050s
sys 0m1.364s
After:
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.fedoraproject.org/fedora latest 2856f68043ad 5 weeks ago 176 MB
<none> <none> feedd0289cd5 8 weeks ago 415 MB
<none> <none> b56eb9950ba9 8 weeks ago 415 MB
<none> <none> a30cf268d19a 2 months ago 176 MB
docker.io/library/eclipse-temurin 17-jdk-alpine e3c32894dd61 2 months ago 334 MB
quay.io/libpod/testimage 20241011 13dc0b3d0b0a 9 months ago 14.2 MB
quay.io/libpod/alpine_nginx latest ecea49d99daa 3 years ago 23.1 MB
quay.io/libpod/alpine latest 915beeae4675 5 years ago 5.59 MB
real 0m0.545s
user 0m0.007s
sys 0m0.057s
Untagged: docker.io/tensorflow/tensorflow:latest-gpu
Deleted: 94f506c52991baf5ead292f5a82dafe0480ef177d6a7846d671a206e9b0dd91b
real 0m1.492s
user 0m0.021s
sys 0m0.989s
With these containers/storage optimizations, image listing now takes 0.545 seconds, down from 1.792 seconds, roughly a 70% reduction in elapsed time in this instance. The improvement is measurable even on high-performance machines (e.g., an M3 Pro with a fast SSD) because the image listing no longer waits for the entire deletion to complete, and the image being deleted is already excluded from the listing while its files are still being cleaned up in the background.
These results clearly demonstrate a substantial reduction in the time the storage lock is held, leading to significantly improved throughput and concurrency for layer removal operations in Podman v5.6. The ability to move file deletion outside the critical storage lock path has transformed a bottleneck into an efficient background process, enabling smoother and faster container image management.
