I have yet to bump into perfect software. Bugs, failures, and shortcomings are a reality of software development. They often have upsides, though, whether that is learning about a new area of code in a larger application or coming up with ideas to prevent future problems.
We recently had an interesting problem brought to our attention: the Podman database had become corrupted. The database's primary job is to keep track of containers, their states, and their configurations. In this case, the reporters were testing Podman by simulating power loss on a special appliance. While the Podman team has done a limited amount of power-loss testing, as have our users either on purpose or accidentally, we rarely see database corruption because, in order for the database to become corrupted, it must be in the middle of a write (at least theoretically). Writing to our database is usually a very quick action, so the window is quite small.
While we could not immediately pin down the root cause, we were asked how to recover from this type of failure. If your database is corrupted, most Podman commands will not work. In this situation, the reporters' containers were normally recreated on each boot, and this was done programmatically using the RESTful API.
Podman DB corruption
If your Podman database becomes corrupted, in almost all cases you will not be able to recover any existing containers.
In this case, the containers were being run by a privileged user: root. As such, I will show the recovery as a privileged user and add in any notes for rootless users. The procedure is roughly the same. First, the error:
$ sudo podman ps
panic: invalid freelist page: 66, page type is leaf

goroutine 1 [running]:
go.etcd.io/bbolt.(*freelist).read(0x50c95d?, 0x7f96a7e42000)
        /home/baude/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/freelist.go:266 +0x22e
go.etcd.io/bbolt.(*DB).loadFreelist.func1()
        /home/baude/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:323 +0xb8
sync.(*Once).doSlow(0xc00011e1c8?, 0x10?)
        /usr/lib/golang/src/sync/once.go:74 +0xc2
sync.(*Once).Do(...)
        /usr/lib/golang/src/sync/once.go:65
go.etcd.io/bbolt.(*DB).loadFreelist(0xc00011e000?)
        /home/baude/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:316 +0x47
go.etcd.io/bbolt.Open({0x7ffd1855f26a, 0x23}, 0x1b6?, 0x0)
        /home/baude/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:293 +0x48b
main.(*CheckCommand).Run(0xc00005fe58, {0xc0000141a0, 0x1, 0x1})
        /home/baude/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/cmd/bbolt/main.go:202 +0x1a5
main.(*Main).Run(0xc000104f40, {0xc000014190, 0x2, 0x2})
        /home/baude/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/cmd/bbolt/main.go:112 +0x979
main.main()
        /home/baude/go/pkg/mod/go.etcd.io/bbolt@v1.3.6/cmd/bbolt/main.go:70 +0xae
The first step to recovery is to delete the existing database.
$ sudo rm /var/lib/containers/storage/libpod/bolt_state.db
Rootless database path
The privileged user’s database is by default stored at /var/lib/containers/storage/libpod/bolt_state.db. The rootless user’s is stored at ~/.local/share/containers/storage/libpod/bolt_state.db by default.
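For the rootless case, a small sketch can compute the path before you delete anything. The XDG_DATA_HOME fallback shown here mirrors how rootless Podman picks its default storage location; verify the printed path on your own system before removing the file.

```shell
# Locate the rootless Podman database. Rootless storage lives under
# $XDG_DATA_HOME, which defaults to ~/.local/share when unset.
db="${XDG_DATA_HOME:-$HOME/.local/share}/containers/storage/libpod/bolt_state.db"
echo "$db"

# Once you have confirmed the path, remove the corrupted database:
# rm -i "$db"
```

The removal is left commented out on purpose; deleting the database discards all record of existing containers, as noted above.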
When Podman is unable to find its database, it will create a new empty database.
$ sudo podman ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
Now we can recreate our containers, which were called container1, container2, and container3.
$ sudo podman create --name container1 alpine top
Error: creating container storage: the container name "container1" is already in use by e77786c096e083b258bad2e196255f7dc1a2859cfb9dd35436648e1541bdce23. You have to remove that container to be able to reuse that name: that name is already in use
How can the container already exist yet not appear in a list of all containers? The error message could be more helpful. There is a little-known option for `podman ps` called `--external` that shows containers in this external storage state.
$ sudo podman ps -a --external
CONTAINER ID  IMAGE                            COMMAND  CREATED             STATUS   PORTS  NAMES
69f78dfaa0a6  docker.io/library/alpine:latest  storage  About a minute ago  Storage         container1
e5db3ad9125e  docker.io/library/alpine:latest  storage  About a minute ago  Storage         container2
52591b8b7676  docker.io/library/alpine:latest  storage  About a minute ago  Storage         container3
Notice how the STATUS column lists Storage for all three of the containers. Again, this is because these containers are still on the filesystem, which explains the earlier error. Rather than deleting them and recreating the containers (which is also perfectly valid), we can simply use the --replace option, which is available for both podman run and podman create.
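If you need those storage-only container names in a script, and assuming the listing format shown above holds, awk can pull out every row whose STATUS is Storage. The captured text below stands in for live podman ps output, which you could pipe in directly instead:

```shell
# Extract the names of containers left in the "Storage" state from a
# captured `podman ps -a --external` listing. The NAMES column is the
# last field and STATUS is the one before it, because the empty PORTS
# column collapses under whitespace splitting.
listing='69f78dfaa0a6  docker.io/library/alpine:latest  storage  About a minute ago  Storage  container1
e5db3ad9125e  docker.io/library/alpine:latest  storage  About a minute ago  Storage  container2
52591b8b7676  docker.io/library/alpine:latest  storage  About a minute ago  Storage  container3'

names=$(printf '%s\n' "$listing" | awk '$(NF-1) == "Storage" {print $NF}')
printf '%s\n' "$names"
```

In a live pipeline, a Go-template format string passed to podman ps would be a sturdier way to get machine-readable names than parsing the human-oriented table.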
$ sudo podman create --replace --name container1 alpine top
container1
128876a5b3828350c7bfbc268dee99f69fb2374b5c8ce94cff879b4f83e78c6d
The --replace option removes the previous container from the filesystem and then runs or creates the new container with that name. This option also applies to containers that are in your database and regular storage.
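The recreation step can then be scripted. This is only a sketch: the container names and the alpine top command are stand-ins for whatever your boot-time provisioning (or REST API client) would normally create, and the PODMAN variable is a convenience added here so the loop can be dry-run before touching a real system.

```shell
# Re-create a set of containers with --replace after a database reset.
# PODMAN is overridable so the loop can be previewed without running podman.
recreate_containers() {
    for name in "$@"; do
        ${PODMAN:-sudo podman} create --replace --name "$name" alpine top
    done
}

PODMAN=echo   # dry run: print each command instead of executing it
recreate_containers container1 container2 container3
```

With the dry-run line removed (or PODMAN set to "sudo podman"), the loop issues one create --replace per container, clearing each leftover storage entry as it goes.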