Migrate Alpine importer to advisory V2 #2111
Conversation
@TG1999 @pombredanne I have a question about the Alpine migration. We currently fetch each URL and process the data without grouping by CVE, but each URL reports a package version along with its fixed CVEs. How can we obtain a unique advisory identifier for this importer? Would it be a good idea to restructure the data into a large mapping keyed by CVE?
Proposed structure:
Example:
Sources:
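One possible restructuring, sketched under the assumption that the secdb JSON keeps its `packages`/`pkg`/`secfixes` layout; the function name and the shape of the output are illustrative, not part of the PR:

```python
from collections import defaultdict


def group_secfixes_by_cve(secdb_data: dict) -> dict:
    """Invert a secdb document so each CVE maps to the
    (package name, fixed version) pairs that fix it."""
    by_cve = defaultdict(list)
    for package in secdb_data.get("packages", []):
        pkg = package["pkg"]
        for fixed_version, cves in pkg.get("secfixes", {}).items():
            for cve in cves:
                by_cve[cve].append((pkg["name"], fixed_version))
    return dict(by_cve)
```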
```python
for cve in aliases:
    advisory_id = f"{pkg_infos['name']}/{qualifiers['distroversion']}/{cve}"
```
For example: apache2/v3.20/2.4.26-r0/CVE-2017-7668
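A minimal sketch of the suggested id format, including the fixed version in the path; the helper name is hypothetical:

```python
def build_advisory_id(name: str, distroversion: str, fixed_version: str, cve: str) -> str:
    # Including the fixed version keeps two fixes for the same CVE in the
    # same distro version from colliding, per the example in the review:
    # apache2/v3.20/2.4.26-r0/CVE-2017-7668
    return f"{name}/{distroversion}/{fixed_version}/{cve}"
```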
vulnerabilities/tests/pipelines/v2_importers/test_alpine_linux_importer_pipeline.py
The logs in debug mode:
keshav-space left a comment:
Thanks @ziadhany, see comments below.
```python
import logging
from typing import Iterable, List
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def fetch_advisory_directory_links(
    page_response_content: str,
    base_url: str,
    logger: callable = None,
) -> List[str]:
    """
    Return a list of advisory directory links present in the
    `page_response_content` HTML string.
    """
    index_page = BeautifulSoup(page_response_content, features="lxml")
    alpine_versions = [
        link.text
        for link in index_page.find_all("a")
        if link.text.startswith("v") or link.text.startswith("edge")
    ]

    if not alpine_versions:
        if logger:
            logger(
                f"No versions found in {base_url!r}",
                level=logging.DEBUG,
            )
        return []

    advisory_directory_links = [urljoin(base_url, version) for version in alpine_versions]

    return advisory_directory_links


def fetch_advisory_links(
    advisory_directory_page: str,
    advisory_directory_link: str,
    logger: callable = None,
) -> Iterable[str]:
    """
    Yield JSON file URLs present in `advisory_directory_page`.
    """
    advisory_directory_page = BeautifulSoup(advisory_directory_page, features="lxml")
    anchor_tags = advisory_directory_page.find_all("a")
    if not anchor_tags:
        if logger:
            logger(
                f"No anchor tags found in {advisory_directory_link!r}",
                level=logging.DEBUG,
            )
        return
    for anchor_tag in anchor_tags:
        if anchor_tag.text.endswith("json"):
            yield urljoin(advisory_directory_link, anchor_tag.text)
```
@ziadhany this is a bit brittle. I've created a mirror for the Alpine secdb here: https://github.com/aboutcode-org/aboutcode-mirror-alpine-secdb. Let's use this instead.
Ok, I'll update the code. I didn't notice we had a mirror.
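A sketch of consuming a local clone of the mirror instead of scraping the HTML index; the clone location and the function name are assumptions, not the final implementation:

```python
import json
from pathlib import Path


def iter_secdb_documents(mirror_root: Path):
    """Yield (path, parsed JSON) for every secdb file found in a
    local clone of the aboutcode-mirror-alpine-secdb repository."""
    for path in sorted(mirror_root.rglob("*.json")):
        yield path, json.loads(path.read_text())
```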
```python
        return (cls.collect_and_store_advisories,)

    def advisories_count(self) -> int:
        return 0
```
Let's return count based on packages key.
Are you sure about this? The problem is that we create an AdvisoryData entry for every CVE.
For example, two unrelated CVEs fixed in the same release: CVE-2019-3828, CVE-2020-1733.
https://nvd.nist.gov/vuln/detail/CVE-2019-3828
https://nvd.nist.gov/vuln/detail/CVE-2020-1733
```json
"packages": [
    {
        "pkg": {
            "name": "ansible",
            "secfixes": {
                "2.6.3-r0": [
                    "CVE-2018-10875"
                ],
                "2.7.9-r0": [
                    "CVE-2018-16876"
                ],
                "2.8.11-r0": [
                    "CVE-2019-3828",
                    "CVE-2020-1733",
                    "CVE-2020-1740"
                ],
                ...
```
Getting the correct count would mean looping over every package's aliases.
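That per-alias count can be sketched as follows, assuming the secdb layout shown above; the function name is illustrative:

```python
def count_secfix_cves(secdb_data: dict) -> int:
    """Count one advisory per CVE listed under each package's secfixes."""
    count = 0
    for package in secdb_data.get("packages", []):
        for cves in package["pkg"].get("secfixes", {}).values():
            count += len(cves)
    return count
```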
@ziadhany since we already have all the advisory files locally, we can instead return the count of CVEs from these files.
Perhaps we can return something like this?
```python
sum(len(re.findall(r'\bCVE-\d{4}-\d+\b', a.read_text())) for a in secdb.rglob("*.json"))
```
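Wrapped into a runnable form; the `secdb` path argument is an assumption about where the mirror is cloned, and note that a CVE appearing in several files or versions is counted once per occurrence:

```python
import re
from pathlib import Path

CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d+\b")


def count_cves_in_secdb(secdb: Path) -> int:
    """Count every CVE identifier mentioned across the secdb JSON files.

    Duplicated mentions are each counted, so this is an upper bound on
    the number of distinct advisories.
    """
    return sum(len(CVE_PATTERN.findall(path.read_text())) for path in secdb.rglob("*.json"))
```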
Signed-off-by: ziad hany <ziadhany2016@gmail.com>
…aseImporterPipelineV2 Signed-off-by: ziad hany <ziadhany2016@gmail.com>
Fix duplication on advisory_id Signed-off-by: ziad hany <ziadhany2016@gmail.com>
Force-pushed from 26f912d to 0bb7b03
Issue: