Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
N/A (not using Kubernetes controller, running dind image directly on Docker)
Deployment Method
Other
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Start an actions-runner-dind container with the command above
2. Verify inner Docker is working: docker exec github-runner-1 docker ps (should succeed)
3. Forcefully terminate the container: docker kill github-runner-1
4. Restart the container: docker start github-runner-1
5. Check inner Docker: docker exec github-runner-1 docker ps
-> Fails with: "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"

Describe the bug
When an actions-runner-dind container is forcefully terminated (e.g., via docker kill, VM preemption, or system crash), the inner Docker daemon's PID file (/var/run/docker.pid) remains inside the container. On subsequent container restart, the inner Docker daemon fails to start because it detects the stale PID file.
This is a common scenario when running self-hosted runners on Spot/Preemptible VMs, where the VM can be terminated at any time without graceful shutdown.
Root Cause:
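When the container is killed with SIGKILL, the inner Docker daemon has no chance to remove /var/run/docker.pid. Because `docker start` reuses the container's writable layer, the file is still there on restart. dockerd reads the PID recorded in it and refuses to start if a process with that PID is running. The restarted container gets a new PID namespace that again hands out small PIDs, so the old PID can easily belong to an unrelated process, and the stale file blocks the new dockerd.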
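To make the stale-file condition concrete, here is a minimal Bash sketch of a check a startup script could run. It assumes that dockerd only refuses to start when the recorded PID belongs to a live process, and it goes one step further by also treating the file as stale when the live process is not named `dockerd`, which covers a PID reused after restart. `pid_file_is_stale` is a hypothetical helper name, and reading `/proc/<pid>/comm` assumes a Linux `/proc`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the runner image).
# Returns 0 ("stale") when the PID file is missing, holds no valid PID,
# or names a PID that is not a running process called "dockerd".
pid_file_is_stale() {
  local pid_file="$1" pid comm
  [ -f "$pid_file" ] || return 0              # no file: nothing can block dockerd
  pid="$(tr -d '[:space:]' < "$pid_file")"
  case "$pid" in
    '' | *[!0-9]*) return 0 ;;                # empty or non-numeric content
  esac
  # /proc/<pid>/comm holds the process name (Linux only); unreadable => process gone.
  comm="$(cat "/proc/$pid/comm" 2>/dev/null)" || return 0
  # A live process with another name means the old PID was reused after restart.
  [ "$comm" != "dockerd" ]
}
```

An entrypoint could then run `if pid_file_is_stale /var/run/docker.pid; then sudo rm -f /var/run/docker.pid; fi` right before starting dockerd, so a PID file that really belongs to a running daemon is left alone.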
Describe the expected behavior
The inner Docker daemon should start successfully after container restart, even if the container was previously forcefully terminated.
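A minimal sketch of startup logic that would give this behavior, assuming a Bash entrypoint running as a user with passwordless `sudo` (the workaround commands in this issue make the same assumption). The function name `start_inner_dockerd`, the log path `/tmp/dockerd.log`, and the 30-second readiness timeout are hypothetical choices, not taken from the image's real entrypoint.

```shell
#!/usr/bin/env bash
# Hypothetical startup sketch; not the actual actions-runner-dind entrypoint.
PID_FILE=/var/run/docker.pid
DOCKERD_LOG=/tmp/dockerd.log   # hypothetical log location

start_inner_dockerd() {
  local pid="" i
  [ -f "$PID_FILE" ] && pid="$(tr -d '[:space:]' < "$PID_FILE")"

  if [ -n "$pid" ] && [ "$(cat "/proc/$pid/comm" 2>/dev/null)" = "dockerd" ]; then
    echo "dockerd already running as PID $pid"
  else
    # Leftover from a SIGKILLed container (or a reused PID): remove it and start fresh.
    sudo rm -f "$PID_FILE"
    sudo dockerd > "$DOCKERD_LOG" 2>&1 &
  fi

  # Poll the daemon once per second instead of sleeping a fixed amount of time.
  for i in $(seq 1 30); do
    if docker info > /dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  echo "dockerd not ready after 30s; see $DOCKERD_LOG" >&2
  return 1
}
```

The entrypoint would call `start_inner_dockerd || exit 1` before launching the runner. Polling `docker info` makes the script wait only as long as the daemon actually needs, and it fails loudly instead of letting jobs hit "Cannot connect to the Docker daemon" later.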
Additional Context
Environment:
- Image: ghcr.io/actions-runner-controller/actions-runner-controller/actions-runner-dind:ubuntu-22.04
- Runner Version: 2.320.0
- Platform: GCP Spot VMs (both amd64 and arm64)
- Docker version on host: 24.x
Workaround:
After container restart, manually clean up the PID file and restart dockerd:
```shell
# For a single runner (dockerd output goes to a log file inside the container)
docker exec github-runner-1 bash -c 'sudo rm -f /var/run/docker.pid && sudo dockerd > /tmp/dockerd.log 2>&1 &'
sleep 3

# For all running runners whose inner Docker is unreachable
docker ps --format '{{.Names}}' | grep github-runner | while read -r runner; do
  echo "$runner"
  docker exec "$runner" docker ps 2>&1 || \
    docker exec "$runner" bash -c 'sudo rm -f /var/run/docker.pid && sudo dockerd > /tmp/dockerd.log 2>&1 & sleep 3'
done
```

Suggested Fix:
The container's entrypoint script should clean up stale PID files before starting the Docker daemon:

```shell
# In the entrypoint script, before starting dockerd
rm -f /var/run/docker.pid /var/run/containerd/containerd.pid
```

Related Issues:
- #3794 - Intermittently getting "Cannot connect to the Docker daemon" (different root cause: iptables)
- #3257 - Cannot connect to the Docker daemon (general issue)
- moby/moby#46988 - How to make sure docker daemon successfully started even through PID file existed
Controller Logs
N/A (not using Kubernetes controller)
Runner Pod Logs
```console
# After forced termination and restart, checking docker inside container:
$ docker exec github-runner-1 docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
# The stale PID file exists:
$ docker exec github-runner-1 ls -la /var/run/docker.pid
-rw-r--r-- 1 root root 5 Jan 25 07:00 /var/run/docker.pid
$ docker exec github-runner-1 cat /var/run/docker.pid
123 # Old PID from before termination
```
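The manual checks above can be repeated across all runners with a small host-side script. This is a sketch that assumes the runner container names contain `github-runner` (as in the workaround) and that `cat` is available in the image; `report_runner` is a hypothetical function name. For each runner whose inner daemon does not answer, it prints the PID recorded in the leftover file and the name of whatever process now holds that PID, which helps tell a stale file apart from a genuinely running daemon.

```shell
#!/usr/bin/env bash
# Hypothetical host-side diagnostic for dind runner containers.
report_runner() {
  local runner="$1" pid owner
  if docker exec "$runner" docker info > /dev/null 2>&1; then
    echo "$runner: inner dockerd is reachable"
    return 0
  fi
  pid="$(docker exec "$runner" cat /var/run/docker.pid 2>/dev/null | tr -d '[:space:]')"
  if [ -z "$pid" ]; then
    echo "$runner: inner dockerd unreachable, no PID file"
    return 0
  fi
  # Name of the process currently using the recorded PID inside the container, if any.
  owner="$(docker exec "$runner" cat "/proc/$pid/comm" 2>/dev/null)"
  echo "$runner: inner dockerd unreachable, PID file says $pid (current owner: ${owner:-none})"
}

# Only run the loop when the docker CLI is present on the host.
if command -v docker > /dev/null 2>&1; then
  docker ps --format '{{.Names}}' | grep github-runner | while read -r runner; do
    report_runner "$runner"
  done
fi
```

An owner such as `bash` or `sleep` for the recorded PID would indicate the reused-PID case; `none` means the PID file is stale but the PID is free.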