Docker daemon fails to start after container forced termination due to stale PID file #4362

### Checks

- [x] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [x] I am using charts that are officially provided

### Controller Version

N/A (not using Kubernetes controller, running dind image directly on Docker)

### Deployment Method

Other

### Checks

- [x] This isn't a question or user support case (For Q&A and community support, go to [Discussions](https://github.com/actions/actions-runner-controller/discussions)).
- [x] I've read the [Changelog](https://github.com/actions/actions-runner-controller/blob/master/docs/gha-runner-scale-set-controller/README.md#changelog) before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

### To Reproduce

```markdown
1. Start an actions-runner-dind container named github-runner-1
2. Verify inner Docker is working: docker exec github-runner-1 docker ps (should succeed)
3. Forcefully terminate the container: docker kill github-runner-1
4. Restart the container: docker start github-runner-1
5. Check inner Docker: docker exec github-runner-1 docker ps
   -> Fails with: "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
```

### Describe the bug

When an `actions-runner-dind` container is forcefully terminated (e.g., via `docker kill`, VM preemption, or system crash), the inner Docker daemon's PID file (`/var/run/docker.pid`) remains inside the container. On subsequent container restart, the inner Docker daemon fails to start because it detects the stale PID file.

This is a common scenario when running self-hosted runners on Spot/Preemptible VMs, where the VM can be terminated at any time without graceful shutdown.

**Root Cause:**
When the container is killed with SIGKILL, the inner Docker daemon doesn't have a chance to clean up `/var/run/docker.pid`. When the container restarts, the stale PID file prevents the new dockerd process from starting.
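To confirm that the PID file really is stale, the check can be sketched as below. `pidfile_is_stale` is a hypothetical helper name, not something shipped in the image. It assumes it runs as root inside the runner container, since `kill -0` also fails when the caller lacks permission to signal the process.

```shell
# Sketch: succeed (exit 0) when a PID file exists but names no live process.
# pidfile_is_stale is a hypothetical helper, not part of the runner image.
pidfile_is_stale() {
  pidfile="$1"
  # No file at all: there is nothing stale to report.
  [ -f "$pidfile" ] || return 1
  pid="$(cat "$pidfile" 2>/dev/null)"
  # Empty or non-numeric content cannot belong to a running daemon.
  case "$pid" in
    ''|*[!0-9]*) return 0 ;;
  esac
  # kill -0 sends no signal; it only checks that the process can be signalled.
  if kill -0 "$pid" 2>/dev/null; then
    return 1
  fi
  return 0
}

# Example, inside the container:
#   pidfile_is_stale /var/run/docker.pid && echo "stale PID file"
```

One caveat: PIDs start again from low numbers in a restarted container, so the recorded PID may belong to an unrelated live process. This sketch does not compare the process name, so a "not stale" answer is not proof that dockerd is running.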

### Describe the expected behavior

The inner Docker daemon should start successfully after container restart, even if the container was previously forcefully terminated.

### Additional Context

**Environment:**
- Image: `ghcr.io/actions-runner-controller/actions-runner-controller/actions-runner-dind:ubuntu-22.04`
- Runner Version: 2.320.0
- Platform: GCP Spot VMs (both amd64 and arm64)
- Docker version on host: 24.x

**Workaround:**
After container restart, manually clean up the PID file and restart dockerd:


```shell
# For a single runner
docker exec github-runner-1 bash -c 'sudo rm -f /var/run/docker.pid && sudo dockerd &'
sleep 3

# For all runners
docker ps -a --format '{{.Names}}' | grep github-runner | while read -r runner; do
  echo "$runner"
  docker exec "$runner" docker ps 2>&1 || \
    docker exec "$runner" bash -c 'sudo rm -f /var/run/docker.pid && sudo dockerd & sleep 3'
done
```
```
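The fixed `sleep 3` in the workaround is a guess at how long dockerd takes to come up. A polling loop may be more reliable. The sketch below uses a hypothetical helper, `wait_until`, that retries a command once per second until it succeeds or a timeout expires. The 30-second timeout in the example is an arbitrary assumption.

```shell
# Sketch: retry a command once per second until it succeeds or the
# timeout (in seconds) elapses. wait_until is a hypothetical helper name.
wait_until() {
  timeout_s="$1"
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    # Give up once the time budget is spent.
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Example (needs Docker and the container name used in this report):
#   wait_until 30 docker exec github-runner-1 docker info \
#     || echo "inner dockerd did not become ready"
```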

**Suggested Fix:**
The container's entrypoint script should clean up stale PID files before starting the Docker daemon:


```shell
# In the entrypoint script, before starting dockerd
rm -f /var/run/docker.pid /var/run/containerd/containerd.pid
```
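A slightly more verbose variant of the same idea, as a sketch only: `remove_stale_pidfiles` is a hypothetical function name, and the paths in the example are the ones suggested above, not verified against the image's actual entrypoint. It logs each file it removes so that a forced termination is visible in the container logs.

```shell
# Sketch: delete leftover PID files and log each removal to stderr.
# remove_stale_pidfiles is a hypothetical helper, not existing entrypoint code.
remove_stale_pidfiles() {
  for pidfile in "$@"; do
    if [ -e "$pidfile" ]; then
      echo "removing stale PID file: $pidfile" >&2
      rm -f "$pidfile"
    fi
  done
}

# In the entrypoint, before dockerd is launched:
#   remove_stale_pidfiles /var/run/docker.pid /var/run/containerd/containerd.pid
```

The removal is unconditional on purpose. When the entrypoint runs, no process from the previous container run can still be alive, and a liveness check on the recorded PID could be fooled by an unrelated process that received the same PID after the restart.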

**Related Issues:**
- #3794 - Intermittently getting "Cannot connect to the Docker daemon" (different root cause: iptables)
- #3257 - Cannot connect to the Docker daemon (general issue)
- moby/moby#46988 - How to make sure docker daemon successfully started even through PID file existed

### Controller Logs

N/A (not using Kubernetes controller)

### Runner Pod Logs

```shell
# After forced termination and restart, checking docker inside container:
$ docker exec github-runner-1 docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

# The stale PID file exists:
$ docker exec github-runner-1 ls -la /var/run/docker.pid
-rw-r--r-- 1 root root 5 Jan 25 07:00 /var/run/docker.pid

$ docker exec github-runner-1 cat /var/run/docker.pid
123  # Old PID from before termination
```
