fix: add NVIDIA CDI device for WSL2 GPU support #3895
Conversation
Diff context:
Type: 'bind',
});
devices.push({
question: This is a flag, not a path to share; what is the rationale for doing that?
This is a CDI (Container Device Interface) device identifier. Podman uses nvidia.com/gpu=all as a CDI spec name to automatically mount all NVIDIA GPU devices.
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Based on that link - I'm not an expert - I believe the Podman Devices array accepts CDI device names like nvidia.com/gpu=all in PathOnHost, so when Podman sees that format, it automatically resolves it via CDI and mounts all GPU devices.
This is the same pattern used for Linux (see the screenshot above).
The alternative would be the --device CLI flag, but since we're using the API, this is the equivalent approach (see the sketch below).
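To make the shape concrete, here is a minimal sketch assuming a Dockerode-style device mapping; the surrounding variable names are assumptions, not the exact code in this PR:

```typescript
// Minimal sketch (assumed shape, not the PR's exact code): a Dockerode-style
// device mapping where PathOnHost carries a CDI device name instead of a real
// host path. Podman resolves 'nvidia.com/gpu=all' against the CDI specs under
// /etc/cdi/ and injects all NVIDIA GPUs into the container.
interface DeviceMapping {
  PathOnHost: string;
  PathInContainer: string;
  CgroupPermissions: string;
}

const devices: DeviceMapping[] = [];

devices.push({
  PathOnHost: 'nvidia.com/gpu=all', // CDI device name, not a filesystem path
  PathInContainer: '',
  CgroupPermissions: '',
});

// The array is then passed as part of the container create options
// (HostConfig.Devices) when the inference container is created.
```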
Can confirm that adding nvidia.com/gpu=all works (mimicking what the NVIDIA docs describe and what RamaLama uses), but we already have this as part of the driver enablement. I will investigate more, but NVIDIA GPU passthrough was working via WSL before, and no code relating to the GPU has changed. I have a hunch it could be something else. I know the QE team was recently checking out GPU stuff and would appreciate their knowledge on this too! Thank you!
Thanks for testing. Regarding "nvidia gpu passthrough was working via wsl before" - that was with the old ai-lab-playground-chat-cuda image. The switch to ramalama/cuda-llama-server in e34d59f changed this - the new image expects CDI injection.
The old image had the CUDA stack baked in; the new one doesn't.
Ah, thanks for the in-depth explanation, that makes sense!
axel7083
left a comment
We have a pretty old issue #1824 on detecting the NVIDIA CDI.
As of today, we do some magic 🪄 trick to let the container access the GPU on WSL, which is not ideal but works for all users, even when they do not have CDI installed.
I am okay with this change if it does not cause errors for users that do not have it.
Diff context:
});
devices.push({
  PathOnHost: 'nvidia.com/gpu=all',
question: what happens if the Podman machine does not have the NVIDIA CDI installed?
I guess if CDI isn't configured, Podman will fail to resolve nvidia.com/gpu=all and the container won't start.
But users enabling GPU support should have nvidia-container-toolkit installed, which generates the CDI spec.
Maybe we should add a check like the Linux case does with isNvidiaCDIConfigured()? A sketch of what such a check could look like follows below.
I'll confirm by reproducing this scenario and drop more details later on.
Test Scenario: What happens without CDI?
Check the current CDI status in the Podman machine:
podman machine ssh cat /etc/cdi/nvidia.yaml
The file exists, so CDI is configured.
Temporarily disable CDI:
- SSH into the Podman machine: podman machine ssh
- Disable/back up the CDI config: sudo mv /etc/cdi/nvidia.yaml /etc/cdi/nvidia.yaml.disabled
- Exit the SSH session: exit
Test Results
- Inference server with [ GPU ENABLED | no CDI ] in AI Lab.
- Inference server with [ GPU DISABLED | no CDI ] in AI Lab.
Why this behavior is correct:
Thanks to the conditional checks at:
- LlamaCppPython.ts:230 - the GPU is only used if the experimentalGPU setting is enabled
- LlamaCppPython.ts:108 - the CDI device is only added if the gpu object exists
The CDI device is only added when GPU is explicitly enabled in settings (see the sketch below).
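For illustration, a condensed sketch of that gating with assumed names; the LlamaCppPython.ts line numbers above refer to the real code, this sketch is not a copy of it:

```typescript
// Condensed sketch of the gating described above (assumed names and types).
interface GpuInfo {
  vendor: string;
}
interface Config {
  experimentalGPU: boolean;
}
interface DeviceMapping {
  PathOnHost: string;
  PathInContainer: string;
  CgroupPermissions: string;
}

// ~LlamaCppPython.ts:230: no GPU is considered unless the setting is enabled.
function selectGpu(config: Config, detected?: GpuInfo): GpuInfo | undefined {
  return config.experimentalGPU ? detected : undefined;
}

// ~LlamaCppPython.ts:108: the CDI device is only pushed when a gpu object exists.
function buildDevices(gpu?: GpuInfo): DeviceMapping[] {
  const devices: DeviceMapping[] = [];
  if (gpu) {
    devices.push({ PathOnHost: 'nvidia.com/gpu=all', PathInContainer: '', CgroupPermissions: '' });
  }
  return devices;
}
```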
Conclusion:
This is the correct behavior since RamaLama requires CDI:
https://github.com/containers/ramalama/blob/main/docs/ramalama-cuda.7.md
- CPU mode is unaffected (no CDI device added when GPU is disabled)
- GPU mode gives clear error when CDI is missing
- GPU mode works when CDI is properly configured
- RamaLama requires CDI (documented)
- AI Lab Extension requires CDI (documented)
Background:
The "magic trick" in #1824 worked with the old ai-lab-playground-chat-cuda image (CUDA embedded).
RamaLama images expect CDI injection instead; this change happened in e34d59f.
We should update the AI Lab documentation to mention that CDI is required for WSL GPU support. Maybe?
Hi @limam-B, please rebase and we'll check if the tests pass; we should get this in.
Force-pushed from 311c901 to 08cd7f8
Hello @bmahabirbu
Force-pushed from 08cd7f8 to 29a33b8
Ah, I'm so sorry @limam-B, I forgot about this one. IMO this is all set to merge; would you like to do one more rebase and check the tests? I promise I'll get on it this time!! And thank you kindly for the contribution.
Force-pushed from 29a33b8 to 5f620d6
@limam-B Actually, after testing I ran into:
"Something went wrong while pulling ChatBot: Error: (HTTP code 500) server error - setting up CDI devices: unresolvable CDI devices nvidia.com/gpu=all"
which is fine since I didn't have CDI, and after setting it up via the official docs everything worked as expected. This error message should have a link directing the user to the docs for setting up the NVIDIA CDI! Should be an easy fix, just point to https://podman-desktop.io/docs/podman/gpu (see the sketch after this comment).
What I did notice: before the change, the magic of the old code still somehow worked. I need to investigate more as to why, but the latest main branch still allows GPU usage without CDI setup! Edit: after investigation, the symlink strategy that's employed is still working!
I am on Windows 11 with the latest WSL2 installed. I made sure to start with a fresh Podman machine in Podman Desktop before testing, first against main and then against this PR. I confirmed that the CDI spec is not created on the machine when testing main.
I do agree this is the safer route in the long run, but maybe a follow-up PR should be made to auto-set up the CDI for the Podman machine on NVIDIA detection, so the end user won't know the difference? For now the error pointing to the documentation should be sufficient. Thoughts?
Can you try again on main and see if it still runs on CPU despite GPU being enabled? If so, I guess the trick doesn't work for every machine.
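For illustration, a hypothetical sketch of how that failure could be rewrapped to point at the Podman GPU docs; the helper and its call site are assumptions, not existing AI Lab code, and only the error text fragment and the URL come from the comment above:

```typescript
// Hypothetical helper: rewrap CDI resolution failures with a pointer to the
// Podman GPU setup docs. Names are illustrative, not the extension's real API.
const PODMAN_GPU_DOCS = 'https://podman-desktop.io/docs/podman/gpu';

export function wrapCdiError(err: unknown): Error {
  const message = err instanceof Error ? err.message : String(err);
  if (message.includes('unresolvable CDI devices')) {
    return new Error(
      `${message}\nNVIDIA CDI does not appear to be configured in the Podman machine. ` +
        `See ${PODMAN_GPU_DOCS} for setup instructions.`,
    );
  }
  return err instanceof Error ? err : new Error(message);
}

// Usage sketch at the container-creation call site (illustrative names):
// try {
//   await createInferenceContainer(options);
// } catch (err: unknown) {
//   throw wrapCdiError(err);
// }
```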
Signed-off-by: limam-B <73091373+limam-B@users.noreply.github.com>
Force-pushed from 5f620d6 to 7229d19
What does this PR do?
Adds the NVIDIA CDI device (nvidia.com/gpu=all) to WSL2 container creation to enable actual GPU access. Previously, WSL2 containers had the GPU environment variables but no device mounting, causing inference to run on CPU despite showing the "GPU Inference" badge.
Screenshot / video of UI
No UI changes, backend fix only.
What issues does this PR fix or reference?
Fixes #3431
How to test this PR?
1- Windows 11 + WSL2 with NVIDIA GPU and drivers installed
2- Install the NVIDIA Container Toolkit in WSL2 and generate the CDI config (nvidia-ctk cdi generate)
3- Enable "Experimental GPU" in AI Lab settings
4- Create a new service with any model
5- In WSL2, run nvidia-smi - it should show GPU usage and the llama-server process
6- Verify container devices: podman inspect <container-id> | grep -A5 Devices - it should show the nvidia.com/gpu device (not empty)