feat: Add Intel Arc GPU support for inference servers #4006
base: main
Conversation
Force-pushed from 43a37a8 to 6ad37bc
Force-pushed from 6885791 to 004430e
bmahabirbu left a comment:

I wish I could test, but LGTM. This is awesome, thanks!!
| "default": "quay.io/ramalama/ramalama-llama-server@sha256:9560fdb4f0bf4f44fddc4b1d8066b3e65d233c1673607e0029b78ebc812f3e5a", | ||
| "cuda": "quay.io/ramalama/cuda-llama-server@sha256:1a6d4fe31b527ad34b3d049eea11f142ad660485700cb9ac8c1d41d8887390cf" | ||
| "cuda": "quay.io/ramalama/cuda-llama-server@sha256:1a6d4fe31b527ad34b3d049eea11f142ad660485700cb9ac8c1d41d8887390cf", | ||
| "intel": "docker.io/intelanalytics/ipex-llm-inference-cpp-xpu:latest" |
Guess it would be better to pin the image to its digest. I'll update it.
Force-pushed from 004430e to 5a4eed2
On Mon, 12 Jan 2026 at 18:08, Jeff MAURY wrote:

    In packages/backend/src/assets/inference-images.json
    <#4006 (comment)>:

    @@ -4,7 +4,8 @@
        },
        "llamacpp": {
          "default": "quay.io/ramalama/ramalama-llama-server@sha256:9560fdb4f0bf4f44fddc4b1d8066b3e65d233c1673607e0029b78ebc812f3e5a",
    -     "cuda": "quay.io/ramalama/cuda-llama-server@sha256:1a6d4fe31b527ad34b3d049eea11f142ad660485700cb9ac8c1d41d8887390cf"
    +     "cuda": "quay.io/ramalama/cuda-llama-server@sha256:1a6d4fe31b527ad34b3d049eea11f142ad660485700cb9ac8c1d41d8887390cf",
    +     "intel": "docker.io/intelanalytics/ipex-llm-inference-cpp-xpu@sha256:74c7fba6e12a083ff664ae54e1ff16a977a39caa03d272125db406eeddaee09e"

    question: there are ramalama images for Intel GPU
    (https://quay.io/repository/ramalama/intel-gpu-llama-server?tab=tags);
    why not use them?

Thanks for this pointer - I'll give it a try. Are those getting updated regularly?
        return llamacpp.default;
      case VMType.LIBKRUN:
      case VMType.LIBKRUN_LABEL:
        if (gpu?.vendor === GPUVendor.INTEL) return llamacpp.intel;
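A hedged sketch of how this branch might sit in the selection logic; only the four quoted lines come from the PR, and everything else (the declarations, the WSL case, the trailing default) is an assumption about the typical shape of such a switch:

    // Assumed minimal declarations for the sketch.
    enum VMType { WSL = 'wsl', LIBKRUN = 'libkrun', LIBKRUN_LABEL = 'libkrun-label' }
    enum GPUVendor { NVIDIA = 'nvidia', INTEL = 'intel' }
    declare const llamacpp: { default: string; cuda: string; intel: string };

    // Sketch: pick a llamacpp image per VM type and GPU vendor.
    function getLlamaCppInferenceImage(vmType: VMType, gpu?: { vendor: GPUVendor }): string {
      switch (vmType) {
        case VMType.WSL:
          // Assumed: NVIDIA GPUs on WSL get the CUDA image.
          if (gpu?.vendor === GPUVendor.NVIDIA) return llamacpp.cuda;
          return llamacpp.default;
        case VMType.LIBKRUN:
        case VMType.LIBKRUN_LABEL:
          // Branch under review below; later removed because Podman's libkrun
          // machines are macOS-only, where Intel Arc passthrough does not apply.
          if (gpu?.vendor === GPUVendor.INTEL) return llamacpp.intel;
          return llamacpp.default;
        default:
          return llamacpp.default;
      }
    }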
issue: libkrun machines are Apple-only; I don't think this makes sense
libkrun runs on both Linux and Mac: https://github.com/containers/libkrun?tab=readme-ov-file#libkrun
Maybe, but in the Podman landscape it is restricted to macOS.
Ack, removing
Force-pushed from 5a4eed2 to 77fb333
Can you review again, please? Thanks
Force-pushed from 2813f52 to 0947e71
@rgolangh just a few comments, but LGTM again; just wondering about the digest.

The ramalama image is set with a digest:

@rgolangh ah, sorry for not being clear. I thought the cuda digest was changed, but it looks to be the same; just a space edit made it seem different. LGTM then! I really appreciate the contribution and the effort.
      );
    });

    test('LIBKRUN vmtype with Intel GPU should use llamacpp.intel image and no custom entrypoint', async () => {
suggestion: this test does not make sense to me, as libkrun is macOS-only
ack, removing
Force-pushed from 0947e71 to 306294e
- Add Intel IPEX image to llamacpp image definitions
- Update getLlamaCppInferenceImage() to detect and use Intel GPUs
- Add Intel GPU device passthrough (/dev/dri) for container creation
- Add Intel-specific environment variables (ZES_ENABLE_SYSMAN, OLLAMA_NUM_GPU)
- Set user=0 for Intel GPU on Linux and disable DeviceRequests

This enables AI Lab to leverage Intel IPEX containers for hardware acceleration on Intel Arc GPUs, providing better performance for inference workloads on Intel hardware.

Signed-off-by: Roy Golan <rgolan@redhat.com>
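A minimal sketch of the container-creation wiring those bullets describe; the field names follow the Docker/Podman engine API, and the env values shown are illustrative assumptions (the commit message only names the variables):

    // Assumed minimal declarations for the sketch.
    enum GPUVendor { NVIDIA = 'nvidia', INTEL = 'intel' }
    declare const gpu: { vendor: GPUVendor } | undefined;

    // Sketch: Intel GPU wiring for container creation (assumed shape).
    const devices: { PathOnHost: string; PathInContainer: string; CgroupPermissions: string }[] = [];
    const env: string[] = [];
    let user: string | undefined;

    if (gpu?.vendor === GPUVendor.INTEL) {
      // Pass the DRI render nodes through to the container.
      devices.push({ PathOnHost: '/dev/dri', PathInContainer: '/dev/dri', CgroupPermissions: 'rwm' });
      // Variables named in the commit message; the values here are assumptions.
      env.push('ZES_ENABLE_SYSMAN=1', 'OLLAMA_NUM_GPU=999');
      // Run as root on Linux so the container can open /dev/dri, and skip
      // DeviceRequests, which target the NVIDIA/CDI path.
      user = '0';
    }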
Force-pushed from 306294e to 0928d89
@jeffmaury I removed the LIBKRUN changes. Please take a look.
    });

      user = '0';
    } else if (gpu.vendor === GPUVendor.INTEL) {
praise: this part should be removed as well
Motivation
This enables AI Lab to leverage Intel IPEX containers for hardware
acceleration on Intel Arc GPUs, providing better performance for
inference workloads on Intel hardware.
Modifications
How was this tested
- Tested with ibm-granite/granite-4.0-micro. Note: 'hybrid' models with the -h- in their name do not work, e.g. ibm-granite/granite-4.0-h-micro.
- Used intel_gpu_top to examine GPU utilization.

Signed-off-by: Roy Golan rgolan@redhat.com