Conversation

ziadhany (Collaborator) commented Jan 8, 2026

ziadhany (Collaborator, Author) commented Jan 8, 2026

INFO 2026-01-26 19:15:30.575619 UTC Pipeline [AlpineLinuxImporterPipeline] starting
INFO 2026-01-26 19:15:30.575748 UTC Step [collect_and_store_advisories] starting
Importing data using alpine_linux_importer_v2
INFO 2026-01-26 22:39:08.084020 UTC Successfully collected 108,252 advisories
INFO 2026-01-26 22:39:08.084139 UTC Step [collect_and_store_advisories] completed in 12218 seconds (3.4 hours)
INFO 2026-01-26 22:39:08.084171 UTC Pipeline completed in 12218 seconds (3.4 hours)

from vulnerabilities.models import AdvisoryV2
from django.db.models import Count

# Any AdvisoryV2 rows that share the same avid would show up here:
duplicates = (
    AdvisoryV2.objects
    .values('avid')
    .annotate(count=Count('id'))
    .filter(count__gt=1)
)
len(duplicates)
Out[2]: 0

AdvisoryV2.objects.count()
Out[3]: 108252

ziadhany (Collaborator, Author) commented Jan 15, 2026

@TG1999 @pombredanne I have a question about the Alpine migration. We currently fetch one URL at a time and process its data without grouping by CVE.

The problem is that each URL reports package versions along with their fixed CVEs. How can we derive a unique identifier for this importer? Is it a good idea to restructure the data into a large mapping, using the CVE as the unique identifier?

Proposed structure:
CVE: [purl_1, purl_2, ...]

Example:
Package: aom

Sources:
https://secdb.alpinelinux.org/v3.22/main.json -> CVEs: "CVE-2021-30473", "CVE-2021-30474", "CVE-2021-30475"
https://secdb.alpinelinux.org/v3.21/main.json -> CVEs: "CVE-2021-30473", "CVE-2021-30474", "CVE-2021-30475"
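
A minimal sketch of that restructuring, assuming the secdb JSON layout shown later in this thread; the group_by_cve helper and the exact purl shape are illustrative assumptions, not the final design:

from collections import defaultdict

def group_by_cve(secdb_data, distroversion):
    """Map each CVE id to the purls of the package versions that fix it."""
    cve_to_purls = defaultdict(list)
    for package in secdb_data.get("packages", []):
        pkg = package.get("pkg", {})
        name = pkg.get("name")
        for fixed_version, cves in pkg.get("secfixes", {}).items():
            for cve in cves:
                cve_to_purls[cve].append(
                    f"pkg:apk/alpine/{name}@{fixed_version}?distroversion={distroversion}"
                )
    return cve_to_purls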

)

for cve in aliases:
    advisory_id = f"{pkg_infos['name']}/{qualifiers['distroversion']}/{cve}"
ziadhany (Collaborator, Author) commented Jan 26, 2026

Example:

apache2/v3.20/2.4.26-r0/CVE-2017-7668
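
A tiny sketch of how that id appears to be assembled; the fixed-version component is inferred from the sample id above, and the variable names are hypothetical:

# Hypothetical reconstruction of the advisory id implied by the example;
# including the fixed version is an assumption based on the sample id.
advisory_id = f"{name}/{distroversion}/{fixed_version}/{cve}"
# e.g. "apache2/v3.20/2.4.26-r0/CVE-2017-7668"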

ziadhany (Collaborator, Author) commented Jan 28, 2026

The debug-mode logs are attached: alpine.zip

ziadhany requested a review from keshav-space on January 28, 2026 at 13:50
keshav-space (Member) left a comment

Thanks @ziadhany, see comments below.

Comment on lines 74 to 121
import logging
from typing import Iterable, List
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def fetch_advisory_directory_links(
    page_response_content: str,
    base_url: str,
    logger: callable = None,
) -> List[str]:
    """
    Return a list of advisory directory links present in `page_response_content` html string
    """
    index_page = BeautifulSoup(page_response_content, features="lxml")
    alpine_versions = [
        link.text
        for link in index_page.find_all("a")
        if link.text.startswith("v") or link.text.startswith("edge")
    ]

    if not alpine_versions:
        if logger:
            logger(
                f"No versions found in {base_url!r}",
                level=logging.DEBUG,
            )
        return []

    advisory_directory_links = [urljoin(base_url, version) for version in alpine_versions]

    return advisory_directory_links


def fetch_advisory_links(
    advisory_directory_page: str,
    advisory_directory_link: str,
    logger: callable = None,
) -> Iterable[str]:
    """
    Yield json file urls present in `advisory_directory_page`
    """
    advisory_directory_page = BeautifulSoup(advisory_directory_page, features="lxml")
    anchor_tags = advisory_directory_page.find_all("a")
    if not anchor_tags:
        if logger:
            logger(
                f"No anchor tags found in {advisory_directory_link!r}",
                level=logging.DEBUG,
            )
        return iter([])
    for anchor_tag in anchor_tags:
        if anchor_tag.text.endswith("json"):
            yield urljoin(advisory_directory_link, anchor_tag.text)
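
For context, a hypothetical way to exercise these two helpers end to end; the use of requests here is an assumption for illustration only:

import requests

base_url = "https://secdb.alpinelinux.org/"
index_html = requests.get(base_url).text

# Walk each version directory (v3.22/, edge/, ...) and print every advisory JSON url.
for directory_link in fetch_advisory_directory_links(index_html, base_url):
    directory_html = requests.get(directory_link).text
    for advisory_url in fetch_advisory_links(directory_html, directory_link):
        print(advisory_url)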
keshav-space (Member) commented

@ziadhany this is a bit brittle. I've created a mirror for the Alpine secdb here: https://github.com/aboutcode-org/aboutcode-mirror-alpine-secdb. Let's use this instead.

ziadhany (Collaborator, Author) replied

Ok, I'll update the code. I didn't notice we have a mirror.

        return (cls.collect_and_store_advisories,)

    def advisories_count(self) -> int:
        return 0
keshav-space (Member) commented

Let's return the count based on the packages key.
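
A literal reading of this suggestion, assuming data is one parsed secdb JSON file:

# Hypothetical: count one advisory per entry under the "packages" key.
advisories_count = len(data.get("packages", []))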

ziadhany (Collaborator, Author) replied

Are you sure about this? The problem is that we create an AdvisoryData entry for every CVE, and a single fixed version can list several unrelated CVEs, for example CVE-2019-3828 and CVE-2020-1733:

https://nvd.nist.gov/vuln/detail/CVE-2019-3828
https://nvd.nist.gov/vuln/detail/CVE-2020-1733

  "packages": [
    {
      "pkg": {
        "name": "ansible",
        "secfixes": {
          "2.6.3-r0": [
            "CVE-2018-10875"
          ],
          "2.7.9-r0": [
            "CVE-2018-16876"
          ],
          "2.8.11-r0": [
            "CVE-2019-3828",
            "CVE-2020-1733",
            "CVE-2020-1740"
          ],

Getting the correct count means we would have to loop over every CVE listed for every package, as in the sketch below.
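
A rough sketch of that per-CVE count, assuming data is one parsed secdb JSON file:

# Count every CVE listed under every fixed version of every package.
count = sum(
    len(cves)
    for package in data.get("packages", [])
    for cves in package.get("pkg", {}).get("secfixes", {}).values()
)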

keshav-space (Member) commented

@ziadhany since we already have all the advisory files locally, we can instead return the count of CVEs from these files.
Perhaps we can return something like this?

sum(len(re.findall(r'\bCVE-\d{4}-\d+\b', a.read_text())) for a in secdb.rglob("*.json"))
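
A self-contained version of that one-liner; the local secdb checkout path is a placeholder assumption:

import re
from pathlib import Path

# Hypothetical path to the locally mirrored secdb files.
secdb = Path("aboutcode-mirror-alpine-secdb")

# Count every CVE id occurrence across all advisory JSON files.
count = sum(
    len(re.findall(r"\bCVE-\d{4}-\d+\b", advisory_file.read_text()))
    for advisory_file in secdb.rglob("*.json")
)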

Commits:

…aseImporterPipelineV2 (Signed-off-by: ziad hany <ziadhany2016@gmail.com>)
Fix duplication on advisory_id (Signed-off-by: ziad hany <ziadhany2016@gmail.com>)