feat: Add Intel Arc GPU support for inference servers #4006
base: main
Conversation
Force-pushed from 43a37a8 to 6ad37bc
Force-pushed from 6885791 to 004430e
bmahabirbu left a comment:

I wish I could test, but LGTM. This is awesome, thanks!!
| "default": "quay.io/ramalama/ramalama-llama-server@sha256:9560fdb4f0bf4f44fddc4b1d8066b3e65d233c1673607e0029b78ebc812f3e5a", | ||
| "cuda": "quay.io/ramalama/cuda-llama-server@sha256:1a6d4fe31b527ad34b3d049eea11f142ad660485700cb9ac8c1d41d8887390cf" | ||
| "cuda": "quay.io/ramalama/cuda-llama-server@sha256:1a6d4fe31b527ad34b3d049eea11f142ad660485700cb9ac8c1d41d8887390cf", | ||
| "intel": "docker.io/intelanalytics/ipex-llm-inference-cpp-xpu:latest" |
Guess it would be better to pin the image to its digest. I'll update it.
Force-pushed from 004430e to 5a4eed2
On Mon, 12 Jan 2026 at 18:08, Jeff MAURY wrote:

    In packages/backend/src/assets/inference-images.json
    <#4006 (comment)>:

    @@ -4,7 +4,8 @@
        },
        "llamacpp": {
          "default": "quay.io/ramalama/ramalama-llama-server@sha256:9560fdb4f0bf4f44fddc4b1d8066b3e65d233c1673607e0029b78ebc812f3e5a",
    -     "cuda": "quay.io/ramalama/cuda-llama-server@sha256:1a6d4fe31b527ad34b3d049eea11f142ad660485700cb9ac8c1d41d8887390cf"
    +     "cuda": "quay.io/ramalama/cuda-llama-server@sha256:1a6d4fe31b527ad34b3d049eea11f142ad660485700cb9ac8c1d41d8887390cf",
    +     "intel": "docker.io/intelanalytics/ipex-llm-inference-cpp-xpu@sha256:74c7fba6e12a083ff664ae54e1ff16a977a39caa03d272125db406eeddaee09e"

    question: there are ramalama images for Intel GPU
    (https://quay.io/repository/ramalama/intel-gpu-llama-server?tab=tags);
    why not use them?

Thanks for this pointer - I'll give it a try. Are those getting updated regularly?
        return llamacpp.default;
      case VMType.LIBKRUN:
      case VMType.LIBKRUN_LABEL:
        if (gpu?.vendor === GPUVendor.INTEL) return llamacpp.intel;
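A hedged sketch of how this branch might sit in the selection logic; only the four quoted lines come from the PR, and everything else (the declarations, the WSL case, the trailing default) is an assumption about the typical shape of such a switch:

    // Assumed minimal declarations for the sketch.
    enum VMType { WSL = 'wsl', LIBKRUN = 'libkrun', LIBKRUN_LABEL = 'libkrun-label' }
    enum GPUVendor { NVIDIA = 'nvidia', INTEL = 'intel' }
    declare const llamacpp: { default: string; cuda: string; intel: string };

    // Sketch: pick a llamacpp image per VM type and GPU vendor.
    function getLlamaCppInferenceImage(vmType: VMType, gpu?: { vendor: GPUVendor }): string {
      switch (vmType) {
        case VMType.WSL:
          // Assumed: NVIDIA GPUs on WSL get the CUDA image.
          if (gpu?.vendor === GPUVendor.NVIDIA) return llamacpp.cuda;
          return llamacpp.default;
        case VMType.LIBKRUN:
        case VMType.LIBKRUN_LABEL:
          // Branch under review below; later removed because Podman's libkrun
          // machines are macOS-only, where Intel Arc passthrough does not apply.
          if (gpu?.vendor === GPUVendor.INTEL) return llamacpp.intel;
          return llamacpp.default;
        default:
          return llamacpp.default;
      }
    }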
issue: libkrun machines are Apple-only; I don't think this makes sense
libkrun runs on both Linux and Mac: https://github.com/containers/libkrun?tab=readme-ov-file#libkrun
Maybe, but in the Podman landscape it is restricted to macOS.
Ack, removing
Force-pushed from 5a4eed2 to 77fb333
Can you review again, please? Thanks
Force-pushed from 2813f52 to 0947e71
@rgolangh just a few comments, but LGTM again; just wondering about the digest.

The ramalama image is set with a digest:

@rgolangh ah, sorry for not being clear. I thought the cuda digest was changed, but it looks to be the same; just a space edit made it seem different. LGTM then! I really appreciate the contribution and the effort.
      );
    });

    test('LIBKRUN vmtype with Intel GPU should use llamacpp.intel image and no custom entrypoint', async () => {
suggestion: this test does not make sense to me, as libkrun is macOS-only
ack, removing
Force-pushed from 0947e71 to 306294e
- Add Intel IPEX image to llamacpp image definitions
- Update getLlamaCppInferenceImage() to detect and use Intel GPUs
- Add Intel GPU device passthrough (/dev/dri) for container creation
- Add Intel-specific environment variables (ZES_ENABLE_SYSMAN, OLLAMA_NUM_GPU)
- Set user=0 for Intel GPU on Linux and disable DeviceRequests

This enables AI Lab to leverage Intel IPEX containers for hardware acceleration on Intel Arc GPUs, providing better performance for inference workloads on Intel hardware.

Signed-off-by: Roy Golan <rgolan@redhat.com>
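A minimal sketch of the container-creation wiring those bullets describe; the field names follow the Docker/Podman engine API, and the env values shown are illustrative assumptions (the commit message only names the variables):

    // Assumed minimal declarations for the sketch.
    enum GPUVendor { NVIDIA = 'nvidia', INTEL = 'intel' }
    declare const gpu: { vendor: GPUVendor } | undefined;

    // Sketch: Intel GPU wiring for container creation (assumed shape).
    const devices: { PathOnHost: string; PathInContainer: string; CgroupPermissions: string }[] = [];
    const env: string[] = [];
    let user: string | undefined;

    if (gpu?.vendor === GPUVendor.INTEL) {
      // Pass the DRI render nodes through to the container.
      devices.push({ PathOnHost: '/dev/dri', PathInContainer: '/dev/dri', CgroupPermissions: 'rwm' });
      // Variables named in the commit message; the values here are assumptions.
      env.push('ZES_ENABLE_SYSMAN=1', 'OLLAMA_NUM_GPU=999');
      // Run as root on Linux so the container can open /dev/dri, and skip
      // DeviceRequests, which target the NVIDIA/CDI path.
      user = '0';
    }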
Force-pushed from 306294e to 0928d89
@jeffmaury I removed the LIBKRUN changes. Please take a look.
    });

      user = '0';
    } else if (gpu.vendor === GPUVendor.INTEL) {
praise: this part should be removed as well
Motivation
This enables AI Lab to leverage Intel IPEX containers for hardware
acceleration on Intel Arc GPUs, providing better performance for
inference workloads on Intel hardware.
Modifications
How was this tested
- Tested with ibm-granite/granite-4.0-micro. Note: 'hybrid' models with the -h- in their name do not work, e.g. ibm-granite/granite-4.0-h-micro.
- Used intel_gpu_top to examine GPU utilization.

Signed-off-by: Roy Golan rgolan@redhat.com