Skip to content

Conversation

@Abhishek-Kushwaha1
Copy link

@Abhishek-Kushwaha1 Abhishek-Kushwaha1 commented Jan 8, 2026

  • This PR expands the E2E test coverage for the ai-services CLI, focusing on the application lifecycle s flow and document ingestion using the ingest-docs pod.
  • The tests are written using Ginkgo/Gomega and follow Kubernetes-style E2E patterns.
  • Any test cases that are not yet implemented or are unstable are explicitly skipped using XContext and are not silently ignore.

Commands covered:
ai-services -help
ai-services bootstrap
ai-services bootstrap configure
ai-services bootstrap validate
ai-services application create
ai-services application ps
ai-services application stop
ai-services application delete

mayuka-c and others added 21 commits January 13, 2026 08:23
- Skips downloading of images during application creation if the image
with given tag already exist locally.
- Introduces a new `--image-pull-policy` flag which takes 3 values
`Always`, `IfNotPresent` and `Never`.
- If the flag is not set, then it uses `IfNotPresent` by default.
- Deprecates `--skip-image-download` flag and moving
`--image-pull-policy` flag can be used

Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
…ework

Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek Kushwaha <Abhishek.Kushwaha@ibm.com>
Copy link
Contributor

@Shubhamag12 Shubhamag12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added few comments
will review this PR in multiple iterations

return bin, nil
}

return "", fmt.Errorf("AI_SERVICES_BIN=%s failed verification: %w", bin, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my understanding, if ai-serivces binary is not found in first step, we proceed for next step (to either lookup or build one).

If that is the case, should we return from here? or should we just log that :AI_SERVICES_BIN not set, will check temp bin, and proceed for next step

Please correct me if I am wrong

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Shubhamag12
If AI_SERVICES_BIN is set, we assume the user wants to use that binary and fail if it’s invalid.
We only look for or build another binary when AI_SERVICES_BIN is not set.
Considering your suggestion, we can add that log message and proceed further.


fmt.Printf("[BOOTSTRAP] Successfully built and verified binary: %s\n", binPath)

return binPath, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small recommendation: If the build partially succeeds but verification fails, the invalid binary remains in testBinDir. Maybe a cleanup logic will do here.

Can be taken up in subsequent PRs

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted

continue
}

parts := strings.Fields(line)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should enable bound checking here, as the code will panic if in any case the output has less than 2 fields

// extractHostIP extracts the host IP from the CLI output using regex.
func extractHostIP(output string) (string, error) {
const minMatchGroups = 2
re := regexp.MustCompile(`http[s]?://([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

This regex signifies: 999.999.999.999

we can use somewhat like this https?://(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}), and validate this ip using net.ParseIP

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted

backendPort string,
uiPort string,
opts CreateOptions,
_ []string,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this if not used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be removing it in coming commits.

// VerifyContainers checks if application pods are healthy and their restart counts are zero.
func VerifyContainers(appName string) error {
fmt.Println("[Podman] verifying containers for app:", appName)
res, _ := common.RunCommand("ai-services", "application", "ps", appName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check for error as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. will fix this

Comment on lines +107 to +123
for _, ctr := range pod.Containers {
ctrRes, err := common.RunCommand("podman", "inspect", ctr.Id)
if err != nil {
return 0, fmt.Errorf("failed to inspect container %s: %w", ctr.Name, err)
}
var ctrData []ContainerInspect
if err := json.Unmarshal([]byte(ctrRes), &ctrData); err != nil {
return 0, fmt.Errorf("failed to parse container inspect %s: %w", ctr.Name, err)
}
if len(ctrData) == 0 {
return 0, fmt.Errorf("no container inspect data for %s", ctr.Name)
}
totalRestarts += ctrData[0].State.RestartCount
}

return totalRestarts, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are making too many calls here.
1 for pod and N for pod containers (n being number of containers)

Suggested approach

  1. Collect all container IDs
  2. Inspect all containers in one call
  3. Parse all container data at once
  4. And counts restarts

Sample

Suggested change
for _, ctr := range pod.Containers {
ctrRes, err := common.RunCommand("podman", "inspect", ctr.Id)
if err != nil {
return 0, fmt.Errorf("failed to inspect container %s: %w", ctr.Name, err)
}
var ctrData []ContainerInspect
if err := json.Unmarshal([]byte(ctrRes), &ctrData); err != nil {
return 0, fmt.Errorf("failed to parse container inspect %s: %w", ctr.Name, err)
}
if len(ctrData) == 0 {
return 0, fmt.Errorf("no container inspect data for %s", ctr.Name)
}
totalRestarts += ctrData[0].State.RestartCount
}
return totalRestarts, nil
}
ctrIDs := make([]string, 0, len(pod.Containers))
for _, ctr := range pod.Containers {
ctrIDs = append(ctrIDs, ctr.Id)
}
args := append([]string{"inspect"}, ctrIDs...)
ctrRes, err := common.RunCommand("podman", args...)
if err != nil {
return 0, fmt.Errorf("failed to inspect containers in pod %s: %w", podName, err)
}
var allContainers []ContainerInspect
if err := json.Unmarshal([]byte(ctrRes), &allContainers); err != nil {
return 0, fmt.Errorf("failed to parse container inspect: %w", err)
}
totalRestarts := 0
for _, ctr := range allContainers {
totalRestarts += ctr.State.RestartCount
}

We are making 2 calls in total here

Let me know if I understood it correctly and is the suggestion apt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. will fix in next commit

AppName string
PodName string
Status string
RestartCount int
Copy link
Contributor

@Shubhamag12 Shubhamag12 Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see RestartCount not being used anywhere. We are feeding it into the structure here, but never reference it in later code. Should we remove it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. i will remove it

Copy link
Member

@mayuka-c mayuka-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are my initial comments. PTAL.
I will be reviewing in multiple rounds as it is quite big.

@@ -0,0 +1,8 @@
package rag

import "log"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see logs being used and in few places fmt. Can we have a single pkg being used?

})
})

}) No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a new line

testServer = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
switch r.Method {
case "GET":
w.Write([]byte(`{"status":"ok"}`))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take care of all the linters

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have checked and resolved linter errors.

"5000", // backend port
"3000", //ui port
cli.CreateOptions{
SkipImageDownload: false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is been deprecated. So prefer to use --image-pull-policy here. Pls refer this PR: #230

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted

cfg,
appName,
"rag",
"ui.port=3000,backend.port=5000",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default ui port and backend port runs at 3000 and 5000 respectively. So I would prefer to specify a diff ports here and make sure its running,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, will be testing it with different ports,

Comment on lines +145 to +151
expectedPodSuffixes := []string{
"vllm-server",
"milvus",
"clean-docs",
"ingest-docs",
"chat-bot",
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could probably be present outside the method

Copy link
Member

@AS-Prajna AS-Prajna Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay , What do you suggest as appropriate place ?

Copy link
Member

@mayuka-c mayuka-c Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably can we have vars dir under e2e and keep it there. There are chances that this slice will be used even for others in future like upgrade and rollbacks

Comment on lines +50 to +62
It("GET works", func() {
client := NewHTTPClient()
resp, err := client.Get("/test")
Expect(err).NotTo(HaveOccurred())
defer func() {
if cerr := resp.Body.Close(); cerr != nil {
fmt.Printf("[WARNING] failed to close response body: %v\n", cerr)
}
}()
body, err := io.ReadAll(resp.Body)
Expect(err).NotTo(HaveOccurred())
Expect(string(body)).To(ContainSubstring("ok"))
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I know what are these API calls for?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, while create the base structure of the e2e I created it. I will remove it in coming PR.

Comment on lines +11 to +12
ginkgo "github.com/onsi/ginkgo/v2"
gomega "github.com/onsi/gomega"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In few places I see implicit imports done using . and in few places I see named imports. Can we have convention followed everywhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. For now I will make the changes by following the existing convention of implicit imports. If necessary, we can shift to named.

@@ -0,0 +1,64 @@
package httpclient
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I know why we have httpClient package?.
I see the BaseURL says "AI_SERVICES_BASE_URL" but we are not running any API server

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, while create the base structure of the e2e I created it. I will remove it in coming PR.


import (
"encoding/json"
"log"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls replace all the standard log pkg with our own internal logger pkg defined here: https://github.com/IBM/project-ai-services/blob/main/ai-services/internal/pkg/logger/logger.go

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do replace it everywhere, so that only 1 logger pkg

}

// CreateAppWithHealthAndRAG creates an application, waits for health checks, and validates RAG endpoints.
func CreateAppWithHealthAndRAG(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this func is tied to RAG.
I feel this might not be the right way as CLI is agnostic to application. So better to have a seperate dir application for each app with CreateApp present under it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly wherever it is application specific can exists under it

Copy link
Author

@Abhishek-Kushwaha1 Abhishek-Kushwaha1 Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have renamed the function and added comment to make it explicit that this helper is RAG-specific and used only by RAG E2E tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants