The Model Context Protocol (MCP) is gaining significant traction in AI. This protocol, supported by most large language model (LLM) providers, enhances an LLM’s capabilities, allowing it to manage resources like your laptop’s filesystem, AWS services, and most importantly for this discussion, containers.

MCP uses a client-server architecture. The local MCP client connects an LLM provider to one or more MCP Servers executing specialized tasks. MCP server development is straightforward, leading to a rapid increase in available servers.
While the MCP protocol is fascinating (see introductory posts here and here), this post focuses on demonstrating how an LLM can manage containers using an MCP server. In particular, we will use RamaLama to run the Qwen 2.5 model locally, Goose CLI as the MCP client, and mcp-server-docker to execute Podman commands.
Running an AI model locally with RamaLama
RamaLama is an engine for running AI models inside local containers, which makes it extremely simple and secure to run a model on a personal laptop. RamaLama analyzes the machine hardware (CPU, GPU, NPU, etc.) and automatically chooses the best container images and inference server options to run the AI workload.
RamaLama runs on macOS, Windows, and Linux. We have installed it on macOS with this command (see the documentation for different installation options):
curl -fsSL https://ramalama.ai/install.sh | bash
RamaLama requires Podman or Docker to run AI models in a container. On macOS and Windows, a Podman machine must be created first. For macOS, we use the following command to create the Podman machine (any existing Podman machine must be stopped first):
$ CONTAINERS_MACHINE_PROVIDER=libkrun \
podman machine init --memory 4096 --now mcp-test
It’s crucial to use libkrun rather than the default applehv provider, so that the Apple GPU can be used from within the container where the AI model runs.
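Before moving on, it’s easy to double-check that the new machine is up and is the default connection:
$ podman machine list
$ podman machine inspect mcp-test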
Once Podman is configured, RamaLama’s serve command allows us to run a Qwen 2.5 model within a container:
$ ramalama --container --runtime=llama.cpp serve \
--ctx-size=32768 \
--port=11434 \
--webui off \
qwen2.5:7b
We specified a large context size using the --ctx-size parameter: AI models perform better with a 32K token context when interacting with an MCP server than with the default 4K. For other options, refer to the RamaLama documentation.
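Once the model is being served, a quick curl against port 11434 confirms that the inference server is answering. The exact path depends on the inference server; the /v1/models endpoint below is an assumption based on llama.cpp’s OpenAI-compatible API, not something RamaLama itself documents:
$ curl http://localhost:11434/v1/models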
Installing the MCP Client: Goose CLI
A local MCP client is a prerequisite for using MCP servers. There are many options, such as Claude Desktop and Visual Studio Code. We will use Goose CLI, a command-line agent well suited to CLI scripts and automation.
On macOS, the Goose CLI can be installed by running the following command in a terminal (there are instructions for Linux and Windows, too):
curl -fsSL \
https://github.com/block/goose/releases/download/stable/download_cli.sh \
| bash
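After installation, a quick sanity check that the goose binary is reachable on your PATH:
goose --help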
Goose’s connection to an LLM provider can be configured interactively with the command goose configure, or by editing its configuration file (~/.config/goose/config.yaml on macOS):
cat << EOF > ~/.config/goose/config.yaml
GOOSE_PROVIDER: ollama
OLLAMA_HOST: localhost
GOOSE_MODEL: qwen2.5
EOF
This configuration points Goose at the Ollama provider, which also works with RamaLama because RamaLama is serving the model on Ollama’s default port (11434). The default model used will be Qwen 2.5.
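Before adding any MCP server, it’s worth a quick smoke test that Goose can reach the model, for example by opening an interactive session and asking a trivial question, then exiting:
goose session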
Let the AI model run containers using mcp-server-docker
To demonstrate managing containers with an MCP server, we will use mcp-server-docker. It requires uv (installation instructions) and a properly configured Podman or Docker.
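If uv is not already installed, one common way to install it on macOS and Linux is the upstream installer script (pip and Homebrew are alternatives; check the uv documentation):
curl -LsSf https://astral.sh/uv/install.sh | sh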
Despite its name, mcp-server-docker can also use Podman, provided Podman is running as a service. It relies on the Python Docker SDK and connects to the default Docker socket; the DOCKER_HOST environment variable can be used to specify a different socket path.
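On Linux, Podman’s Docker-compatible API can be enabled as a rootless user service; the resulting socket is the one we will point the MCP server at via DOCKER_HOST below:
$ systemctl --user enable --now podman.socket
$ ls $XDG_RUNTIME_DIR/podman/podman.sock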
Goose’s run command, using the mcp-server-docker extension, instructs the AI model to “deploy an nginx container exposing port 80 on port 9000”:
goose run \
--with-extension "uvx mcp-server-docker" \
-n container-tests \
-t "deploy an nginx container exposing port 80 on port 9000"
The example above works on macOS with the Podman machine configured to listen on the default Docker socket (unix:///var/run/docker.sock). On Linux, you may need to specify the Podman socket path explicitly:
--with-extension \
"DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock \
uvx mcp-server-docker"
As a result, the MCP server will start an nginx container and report back some helpful information about what it did.
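Independently of what the model reports, the deployment can be verified with Podman and a quick HTTP request:
$ podman ps
$ curl -I http://localhost:9000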

Note that Goose saves session state in the folder ~/.local/share/goose/sessions/ and writes logs to ~/.local/state/goose/logs/cli, both of which can be helpful for investigating its operations.
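For example, listing those directories should show the session created by the run above (which we named container-tests) along with the CLI log files:
$ ls ~/.local/share/goose/sessions/
$ ls ~/.local/state/goose/logs/cli/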
It’s also worth noting that although the AI model runs in a container, Goose and the MCP Server execute on the host with full access to host resources. If this raises security concerns, running them inside a container is possible.
Appendix 1: Using different model providers
Experimenting with Anthropic and Ollama was beneficial during the preparation of this blog post. Anthropic’s hosted models are more capable and offer a faster feedback loop than local models, which helps in understanding the MCP server’s capabilities. Ollama makes it easy to experiment with local models without needing to configure Podman or Docker. Once a setup works with these providers, it can easily be adapted to run the same local model within a container using RamaLama.
To use Goose with Anthropic (with the model Claude 3.5 Sonnet):
$ export ANTHROPIC_API_KEY=<your key> # Get Key at https://console.anthropic.com/
$ cat << EOF > ~/.config/goose/config.yaml # Configure Goose
GOOSE_PROVIDER: anthropic
GOOSE_MODEL: claude-3-5-sonnet-latest
ANTHROPIC_HOST: https://api.anthropic.com
EOF
To use Goose with Ollama (with model Qwen 2.5):
$ cat << EOF > ~/.config/goose/config.yaml # Configure Goose
GOOSE_PROVIDER: ollama
OLLAMA_HOST: localhost
GOOSE_MODEL: qwen2.5
EOF
$ OLLAMA_CONTEXT_LENGTH=32768 ollama serve # Start Ollama
$ curl http://localhost:11434 # Check that Ollama is running
$ ollama run qwen2.5 # Pull Qwen 2.5 and run it locally
$ ollama ps # Show details of the running models
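Ollama also exposes a REST API on the same port; for example, the /api/tags endpoint lists the locally pulled models:
$ curl http://localhost:11434/api/tags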
Appendix 2: Using podman-mcp-server rather than mcp-server-docker
Another MCP server for LLM container management is podman-mcp-server. It has fewer GitHub stars than `mcp-server-docker` and isn’t yet listed on the official MCP servers page, but it doesn’t require Podman to run as a service and is developed by fellow Red Hatter Marc Nuri, so I decided to test it:
goose run \
--with-extension "npx -y podman-mcp-server" \
-n container-tests \
-t "deploy an nginx container exposing port 80 on port 9000"
The output produced is similar, and the container is created correctly. I noted that `podman-mcp-server` avoids some of the status confusion between ‘started’ and ‘created’ that I observed with mcp-server-docker.

Appendix 3: Hallucinations
Due to the non-deterministic nature of AI models, the same prompt (‘deploy an nginx container exposing it on port 9000’) may yield slightly different answers from one run to the next. Occasionally, models follow an incorrect path and become convinced that something false is true (known as ‘hallucinations’). Here’s an example, encountered while preparing this blog post, in which the AI model became convinced that it should create a volume and then failed to do so.
