-
Notifications
You must be signed in to change notification settings - Fork 30
302-Initial changes for adding e2e tests for ai-services testing framework #257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
19b7750 to
b000575
Compare
- Skips downloading of images during application creation if the image with given tag already exist locally. - Introduces a new `--image-pull-policy` flag which takes 3 values `Always`, `IfNotPresent` and `Never`. - If the flag is not set, then it uses `IfNotPresent` by default. - Deprecates `--skip-image-download` flag and moving `--image-pull-policy` flag can be used Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
…ework Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
Signed-off-by: Abhishek-Kushwaha <Abhishek.Kushwaha@ibm.com>
92919de to
e606e6b
Compare
Signed-off-by: Abhishek Kushwaha <Abhishek.Kushwaha@ibm.com>
Shubhamag12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added few comments
will review this PR in multiple iterations
| return bin, nil | ||
| } | ||
|
|
||
| return "", fmt.Errorf("AI_SERVICES_BIN=%s failed verification: %w", bin, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my understanding, if ai-serivces binary is not found in first step, we proceed for next step (to either lookup or build one).
If that is the case, should we return from here? or should we just log that :AI_SERVICES_BIN not set, will check temp bin, and proceed for next step
Please correct me if I am wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Shubhamag12
If AI_SERVICES_BIN is set, we assume the user wants to use that binary and fail if it’s invalid.
We only look for or build another binary when AI_SERVICES_BIN is not set.
Considering your suggestion, we can add that log message and proceed further.
|
|
||
| fmt.Printf("[BOOTSTRAP] Successfully built and verified binary: %s\n", binPath) | ||
|
|
||
| return binPath, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small recommendation: If the build partially succeeds but verification fails, the invalid binary remains in testBinDir. Maybe a cleanup logic will do here.
Can be taken up in subsequent PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted
| continue | ||
| } | ||
|
|
||
| parts := strings.Fields(line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should enable bound checking here, as the code will panic if in any case the output has less than 2 fields
| // extractHostIP extracts the host IP from the CLI output using regex. | ||
| func extractHostIP(output string) (string, error) { | ||
| const minMatchGroups = 2 | ||
| re := regexp.MustCompile(`http[s]?://([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
This regex signifies: 999.999.999.999
we can use somewhat like this https?://(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}), and validate this ip using net.ParseIP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted
| backendPort string, | ||
| uiPort string, | ||
| opts CreateOptions, | ||
| _ []string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove this if not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be removing it in coming commits.
| // VerifyContainers checks if application pods are healthy and their restart counts are zero. | ||
| func VerifyContainers(appName string) error { | ||
| fmt.Println("[Podman] verifying containers for app:", appName) | ||
| res, _ := common.RunCommand("ai-services", "application", "ps", appName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should check for error as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. will fix this
| for _, ctr := range pod.Containers { | ||
| ctrRes, err := common.RunCommand("podman", "inspect", ctr.Id) | ||
| if err != nil { | ||
| return 0, fmt.Errorf("failed to inspect container %s: %w", ctr.Name, err) | ||
| } | ||
| var ctrData []ContainerInspect | ||
| if err := json.Unmarshal([]byte(ctrRes), &ctrData); err != nil { | ||
| return 0, fmt.Errorf("failed to parse container inspect %s: %w", ctr.Name, err) | ||
| } | ||
| if len(ctrData) == 0 { | ||
| return 0, fmt.Errorf("no container inspect data for %s", ctr.Name) | ||
| } | ||
| totalRestarts += ctrData[0].State.RestartCount | ||
| } | ||
|
|
||
| return totalRestarts, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are making too many calls here.
1 for pod and N for pod containers (n being number of containers)
Suggested approach
- Collect all container IDs
- Inspect all containers in one call
- Parse all container data at once
- And counts restarts
Sample
| for _, ctr := range pod.Containers { | |
| ctrRes, err := common.RunCommand("podman", "inspect", ctr.Id) | |
| if err != nil { | |
| return 0, fmt.Errorf("failed to inspect container %s: %w", ctr.Name, err) | |
| } | |
| var ctrData []ContainerInspect | |
| if err := json.Unmarshal([]byte(ctrRes), &ctrData); err != nil { | |
| return 0, fmt.Errorf("failed to parse container inspect %s: %w", ctr.Name, err) | |
| } | |
| if len(ctrData) == 0 { | |
| return 0, fmt.Errorf("no container inspect data for %s", ctr.Name) | |
| } | |
| totalRestarts += ctrData[0].State.RestartCount | |
| } | |
| return totalRestarts, nil | |
| } | |
| ctrIDs := make([]string, 0, len(pod.Containers)) | |
| for _, ctr := range pod.Containers { | |
| ctrIDs = append(ctrIDs, ctr.Id) | |
| } | |
| args := append([]string{"inspect"}, ctrIDs...) | |
| ctrRes, err := common.RunCommand("podman", args...) | |
| if err != nil { | |
| return 0, fmt.Errorf("failed to inspect containers in pod %s: %w", podName, err) | |
| } | |
| var allContainers []ContainerInspect | |
| if err := json.Unmarshal([]byte(ctrRes), &allContainers); err != nil { | |
| return 0, fmt.Errorf("failed to parse container inspect: %w", err) | |
| } | |
| totalRestarts := 0 | |
| for _, ctr := range allContainers { | |
| totalRestarts += ctr.State.RestartCount | |
| } |
We are making 2 calls in total here
Let me know if I understood it correctly and is the suggestion apt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted. will fix in next commit
| AppName string | ||
| PodName string | ||
| Status string | ||
| RestartCount int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see RestartCount not being used anywhere. We are feeding it into the structure here, but never reference it in later code. Should we remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. i will remove it
mayuka-c
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are my initial comments. PTAL.
I will be reviewing in multiple rounds as it is quite big.
| @@ -0,0 +1,8 @@ | |||
| package rag | |||
|
|
|||
| import "log" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see logs being used and in few places fmt. Can we have a single pkg being used?
| }) | ||
| }) | ||
|
|
||
| }) No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a new line
| testServer = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
| switch r.Method { | ||
| case "GET": | ||
| w.Write([]byte(`{"status":"ok"}`)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take care of all the linters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have checked and resolved linter errors.
| "5000", // backend port | ||
| "3000", //ui port | ||
| cli.CreateOptions{ | ||
| SkipImageDownload: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is been deprecated. So prefer to use --image-pull-policy here. Pls refer this PR: #230
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted
| cfg, | ||
| appName, | ||
| "rag", | ||
| "ui.port=3000,backend.port=5000", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default ui port and backend port runs at 3000 and 5000 respectively. So I would prefer to specify a diff ports here and make sure its running,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted, will be testing it with different ports,
| expectedPodSuffixes := []string{ | ||
| "vllm-server", | ||
| "milvus", | ||
| "clean-docs", | ||
| "ingest-docs", | ||
| "chat-bot", | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could probably be present outside the method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay , What do you suggest as appropriate place ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably can we have vars dir under e2e and keep it there. There are chances that this slice will be used even for others in future like upgrade and rollbacks
| It("GET works", func() { | ||
| client := NewHTTPClient() | ||
| resp, err := client.Get("/test") | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| defer func() { | ||
| if cerr := resp.Body.Close(); cerr != nil { | ||
| fmt.Printf("[WARNING] failed to close response body: %v\n", cerr) | ||
| } | ||
| }() | ||
| body, err := io.ReadAll(resp.Body) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| Expect(string(body)).To(ContainSubstring("ok")) | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I know what are these API calls for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, while create the base structure of the e2e I created it. I will remove it in coming PR.
| ginkgo "github.com/onsi/ginkgo/v2" | ||
| gomega "github.com/onsi/gomega" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In few places I see implicit imports done using . and in few places I see named imports. Can we have convention followed everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. For now I will make the changes by following the existing convention of implicit imports. If necessary, we can shift to named.
| @@ -0,0 +1,64 @@ | |||
| package httpclient | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I know why we have httpClient package?.
I see the BaseURL says "AI_SERVICES_BASE_URL" but we are not running any API server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, while create the base structure of the e2e I created it. I will remove it in coming PR.
|
|
||
| import ( | ||
| "encoding/json" | ||
| "log" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pls replace all the standard log pkg with our own internal logger pkg defined here: https://github.com/IBM/project-ai-services/blob/main/ai-services/internal/pkg/logger/logger.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do replace it everywhere, so that only 1 logger pkg
| } | ||
|
|
||
| // CreateAppWithHealthAndRAG creates an application, waits for health checks, and validates RAG endpoints. | ||
| func CreateAppWithHealthAndRAG( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this func is tied to RAG.
I feel this might not be the right way as CLI is agnostic to application. So better to have a seperate dir application for each app with CreateApp present under it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly wherever it is application specific can exists under it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have renamed the function and added comment to make it explicit that this helper is RAG-specific and used only by RAG E2E tests.
ai-services CLI, focusing on the application lifecycle s flow and document ingestion using the ingest-docs pod.XContextand are not silently ignore.Commands covered:
ai-services -help
ai-services bootstrap
ai-services bootstrap configure
ai-services bootstrap validate
ai-services application create
ai-services application ps
ai-services application stop
ai-services application delete