
feat(providers): add Dockerfile/Containerfile provider for image analysis #542

Open. a-oren wants to merge 2 commits into guacsec:main from a-oren:TC-4938


@a-oren a-oren commented Jul 1, 2026

Contributor

Summary

  • Add DockerfileProvider that parses FROM instructions to extract the base image reference and generates a CycloneDX SBOM via syft for component and stack analysis
  • Support multi-stage Dockerfiles (the final FROM is used), suffixed filenames (Dockerfile.dev, Containerfile.prod), and multiple --flag tokens; reject ARG substitution and FROM scratch
  • Normalize Docker Hub image references in ImageRef.getPackageURL() so bare names (node) and library-prefixed names (docker.io/library/node) produce the same PURL, with repository_url docker.io/node

Implements TC-4938
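
For readers who want the FROM handling above in concrete form, here is a minimal sketch of the described rules: last FROM wins, leading --flag tokens are skipped, and ARG substitution, FROM scratch, and a missing FROM are rejected. The class name, method signature, and error messages are assumptions for illustration and are not taken from DockerfileProvider itself.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper; names and messages are illustrative, not the PR's code. */
final class FromLineSketch {

  /** Returns the image token of the last FROM instruction, or throws if it cannot be analyzed. */
  static String parseLastFromImage(List<String> lines) throws IOException {
    String image = null;
    for (String raw : lines) {
      String[] tokens = raw.trim().split("\\s+");
      // FROM is matched case-insensitively; blank lines and other instructions are skipped.
      if (tokens.length < 2 || !tokens[0].equalsIgnoreCase("FROM")) {
        continue;
      }
      int i = 1;
      // Skip any number of leading flags such as --platform=linux/amd64.
      while (i < tokens.length && tokens[i].startsWith("--")) {
        i++;
      }
      if (i < tokens.length) {
        image = tokens[i]; // a later FROM overwrites an earlier one (multi-stage)
      }
    }
    if (image == null) {
      throw new IOException("No FROM instruction found");
    }
    if (image.contains("${")) {
      throw new IOException("ARG substitution in FROM is not supported: " + image);
    }
    if (image.equals("scratch")) {
      throw new IOException("FROM scratch has no base image to analyze");
    }
    return image;
  }
}
```

Anything after the image token, such as `AS build`, is ignored because only the first non-flag token is kept.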

Test plan

  • Ecosystem.resolveProvider returns DockerfileProvider for Dockerfile, Containerfile, and suffixed variants
  • FROM line parsing extracts correct image reference from single-stage Dockerfile
  • FROM line parsing uses last FROM in multi-stage Dockerfile
  • FROM line parsing strips single and multiple --flag tokens
  • FROM line parsing handles image digests (httpd@sha256:...)
  • FROM line parsing is case-insensitive
  • FROM line parsing rejects ARG-substituted FROM targets
  • FROM line parsing rejects FROM scratch
  • readLicenseFromManifest returns null
  • validateLockFile does not throw
  • Non-Dockerfile prefix (Dockerfilesomething) is rejected
  • Docker Hub bare name normalized to docker.io/node in PURL
  • Docker Hub library/ prefix stripped in PURL
  • Both bare and library forms produce same PURL
  • Non-Docker-Hub registries unchanged in PURL
  • Docker Hub user images (docker.io/myuser/myimage) unchanged in PURL
  • All 24 new tests pass, all existing tests unaffected
  • Spotless formatting passes
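
The filename cases in the test plan (Dockerfile, Containerfile, dot-suffixed variants accepted; Dockerfilesomething rejected) can be illustrated with a small check. This is a sketch under assumptions: the method name `isDockerfile` appears in the reviewer's sequence diagram, but the body below is a guess at the rule, not the code in Ecosystem.java.

```java
/** Hypothetical filename check; the body is an assumed reading of the rule in the test plan. */
final class DockerfileNameSketch {

  private static final String[] BASE_NAMES = {"Dockerfile", "Containerfile"};

  static boolean isDockerfile(String filename) {
    for (String base : BASE_NAMES) {
      // Exact match, or the base name followed by a dot suffix (Dockerfile.dev).
      if (filename.equals(base) || filename.startsWith(base + ".")) {
        return true;
      }
    }
    return false;
  }
}
```

Requiring the dot after the base name is what keeps a name like Dockerfilesomething from matching.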

🤖 Generated with Claude Code

Summary by Sourcery

Add support for analyzing Dockerfile and Containerfile manifests by extracting their base image and generating an OCI CycloneDX SBOM, and normalize Docker Hub image references in image package URLs.

New Features:

  • Introduce a DockerfileProvider that parses Dockerfile/Containerfile FROM instructions and produces SBOM content for component and stack analysis.
  • Extend ecosystem type and provider resolution to treat Dockerfile and Containerfile (including suffixed variants) as OCI manifests analyzable via syft.

Enhancements:

  • Normalize Docker Hub image references in ImageRef so bare and library-prefixed forms produce a consistent repository_url in generated package URLs.

Tests:

  • Add comprehensive tests for DockerfileProvider behavior, including FROM parsing edge cases, ecosystem resolution, and provider contract methods.
  • Add tests ensuring Docker Hub normalization and non-Docker Hub registries/user namespaces behave correctly in ImageRef package URLs.
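
The Docker Hub normalization summarized above can be sketched on plain strings. This is illustrative only: the real change lives in ImageRef.getPackageURL() and works on the image object, whereas the class, constants, and method below are assumed names that take an image name with the tag or digest already removed.

```java
import java.util.Locale;

/** Hypothetical string-level sketch of the repository_url normalization. */
final class DockerHubNormalizeSketch {

  private static final String DOCKER_HUB = "docker.io/";
  private static final String DOCKER_HUB_LIBRARY = "docker.io/library/";

  /** Input is an image name without tag or digest, e.g. "node" or "quay.io/org/app". */
  static String normalizeRepository(String nameWithoutTag) {
    String repo = nameWithoutTag.toLowerCase(Locale.ROOT);
    if (!repo.contains("/")) {
      // Bare Docker Hub name: node -> docker.io/node
      return DOCKER_HUB + repo;
    }
    if (repo.startsWith(DOCKER_HUB_LIBRARY)) {
      // Official image with explicit prefix: docker.io/library/node -> docker.io/node
      return DOCKER_HUB + repo.substring(DOCKER_HUB_LIBRARY.length());
    }
    // Other registries and Docker Hub user namespaces are left as they are.
    return repo;
  }
}
```

Forms the PR description does not mention, such as library/node without a registry, fall through unchanged in this sketch; how the actual code treats them is not stated here.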

…ysis

Add DockerfileProvider that parses FROM instructions to extract base
image references and generates CycloneDX SBOMs via syft. Supports
multi-stage builds (uses final FROM), suffixed filenames (Dockerfile.dev),
multiple --flag tokens, and rejects ARG substitution and FROM scratch.

Also normalize Docker Hub image references in ImageRef.getPackageURL()
so bare names (node) and library-prefixed names (docker.io/library/node)
produce the same PURL (docker.io/node), aligning with the JS client.

Implements: TC-4938

Assisted-by: Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sourcery-ai Bot commented Jul 1, 2026

Contributor

Reviewer's Guide

Adds a Dockerfile/Containerfile provider that parses FROM instructions to derive a base image, generates a CycloneDX SBOM via syft, and normalizes Docker Hub image references in ImageRef PURLs, with tests covering multi-stage Dockerfiles, flags, ARG substitution, scratch images, filename resolution, and Docker Hub normalization behavior.

Sequence diagram for DockerfileProvider SBOM generation from Dockerfile

sequenceDiagram
  participant Ecosystem
  participant DockerfileProvider
  participant ImageUtils
  participant ImageRef

  Ecosystem->>Ecosystem: getProvider(manifestPath)
  Ecosystem->>Ecosystem: isDockerfile(filename)
  Ecosystem-->>DockerfileProvider: new DockerfileProvider(manifestPath)

  DockerfileProvider->>DockerfileProvider: provideComponent()
  DockerfileProvider->>DockerfileProvider: generateSbomContent()
  DockerfileProvider->>DockerfileProvider: parseLastFromImage(manifestPath)
  DockerfileProvider-->>ImageUtils: parseImageRef(imageReference)
  ImageUtils-->>DockerfileProvider: ImageRef
  DockerfileProvider->>ImageUtils: generateImageSBOM(imageRef)
  ImageUtils-->>DockerfileProvider: sbomNode
  DockerfileProvider->>DockerfileProvider: objectMapper.writeValueAsBytes(sbomNode)
  DockerfileProvider-->>Ecosystem: Content(CYCLONEDX_MEDIA_TYPE)

File-Level Changes

Change Details Files
Introduce DockerfileProvider and integrate it into the Ecosystem for Dockerfile/Containerfile manifests.
  • Add Ecosystem.Type.DOCKERFILE mapped to the oci type and syft executable short name.
  • Add Ecosystem.resolveProvider logic that detects Dockerfile/Containerfile names (including suffixed variants) and returns DockerfileProvider.
  • Implement DockerfileProvider to parse the last FROM line, validate it, resolve an ImageRef, and generate CycloneDX SBOM content for component and stack analysis.
  • Ensure readLicenseFromManifest returns null and validateLockFile is a no-op for Dockerfile manifests.
  • Add filename helper that only treats Dockerfile/Containerfile and dot-suffixed variants as Dockerfile manifests.
src/main/java/io/github/guacsec/trustifyda/tools/Ecosystem.java
src/main/java/io/github/guacsec/trustifyda/providers/DockerfileProvider.java
Implement robust FROM-line parsing for Dockerfile/Containerfile to support multi-stage builds, flags, digests, and error cases.
  • Parse the Dockerfile as lines, match FROM (case-insensitive), and track the image from the last FROM instruction.
  • Strip leading --flag tokens (including multiple flags) before extracting the image reference token.
  • Reject Dockerfiles that have no FROM instruction with a specific IOException.
  • Reject ARG-substituted FROM targets by detecting ${...} in the image token.
  • Reject FROM scratch since it has no base image to analyze.
src/main/java/io/github/guacsec/trustifyda/providers/DockerfileProvider.java
src/test/resources/tst_manifests/dockerfile/multi_stage/Dockerfile
src/test/resources/tst_manifests/dockerfile/single_stage/Dockerfile
src/test/resources/tst_manifests/dockerfile/arg_substitution/Dockerfile
src/test/resources/tst_manifests/dockerfile/from_scratch/Dockerfile
src/test/resources/tst_manifests/dockerfile/containerfile/Containerfile
src/test/resources/tst_manifests/dockerfile/lowercase_from/Dockerfile
src/test/resources/tst_manifests/dockerfile/multiple_flags/Dockerfile
src/test/resources/tst_manifests/dockerfile/with_digest/Dockerfile
src/test/resources/tst_manifests/dockerfile/with_platform/Dockerfile
src/test/resources/tst_manifests/dockerfile/no_from/Dockerfile
src/test/resources/tst_manifests/dockerfile/suffixed/Dockerfile.dev
Normalize Docker Hub image references in ImageRef.getPackageURL() so bare and library-prefixed forms generate consistent PURLs.
  • Introduce a Docker Hub library prefix constant and derive repository_url from image.getNameWithoutTag and image.getSimpleName.
  • Normalize bare Docker Hub names by mapping simpleName-only repositories to docker.io/simpleName.
  • Normalize docker.io/library/ repositories to docker.io/ while preserving case handling in qualifiers.
  • Ensure non-Docker Hub registries and Docker Hub user namespaces are left unchanged.
  • Lowercase repository_url when adding it as a qualifier when it differs from the simple image name.
src/main/java/io/github/guacsec/trustifyda/image/ImageRef.java
src/test/java/io/github/guacsec/trustifyda/image/ImageRefTest.java
Add test coverage for DockerfileProvider behavior and Ecosystem integration.
  • Add parameterized and unit tests verifying Ecosystem.getProvider returns DockerfileProvider for Dockerfile, Containerfile, and suffixed names.
  • Test FROM parsing for single-stage, multi-stage, platform flags, multiple flags, digests, lowercase FROM, ARG substitution, FROM scratch, and missing FROM cases.
  • Verify non-Dockerfile prefixes like Dockerfilesomething are rejected.
  • Assert DockerfileProvider.readLicenseFromManifest returns null and validateLockFile returns without throwing.
src/test/java/io/github/guacsec/trustifyda/providers/Dockerfile_Provider_Test.java


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high-level feedback:

  • The Dockerfile provider’s FROM parsing currently only checks for ${ to detect ARG substitution; consider using a more robust pattern (e.g. matching ${...} tokens in the image segment) to avoid false positives/negatives if ${ appears in other contexts on the line.
  • In ImageRef.getPackageURL, Docker Hub library/ normalization lowercases the entire repository string, which may unintentionally lose case information for user-controlled segments; consider normalizing only the library/ prefix while preserving the original case of the image name.
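
To make the first suggestion concrete, here is one possible shape for a pattern-based check applied to the image token only. It is a sketch, not the PR's code: the class and method names are hypothetical. It also matches the unbraced `$NAME` form, which Dockerfile syntax accepts alongside `${NAME}` and which a plain `contains("${")` test would not catch.

```java
import java.util.regex.Pattern;

/** Hypothetical detector for variable references in a FROM image token. */
final class ArgSubstitutionSketch {

  // Braced forms such as ${NAME} or ${NAME:-default}, plus the bare $NAME form.
  private static final Pattern VARIABLE =
      Pattern.compile("\\$\\{[^}]+\\}|\\$[A-Za-z_][A-Za-z0-9_]*");

  static boolean hasVariable(String imageToken) {
    return VARIABLE.matcher(imageToken).find();
  }
}
```

Because the check runs on the extracted image token rather than the whole line, a `${` elsewhere on the FROM line does not affect the result.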

@codecov-commenter


Codecov Report

❌ Patch coverage is 72.00000% with 14 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@2db003f).


@@           Coverage Diff           @@
##             main     #542   +/-   ##
=======================================
  Coverage        ?   68.78%           
  Complexity      ?     1018           
=======================================
  Files           ?       66           
  Lines           ?     4302           
  Branches        ?      758           
=======================================
  Hits            ?     2959           
  Misses          ?     1000           
  Partials        ?      343           
Flag Coverage Δ
integration-tests 68.78% <72.00%> (?)



a-oren commented Jul 1, 2026

Contributor Author

[sdlc-workflow/verify-pr] Re: @sourcery-ai[bot] review —

  1. "ARG substitution detection could be more robust": Classified as suggestion — proposes using regex for ${...} detection instead of contains("${"). This is a valid alternative approach but is not documented in CONVENTIONS.md and has no established codebase pattern requiring regex-based detection. The current approach is sufficient for Dockerfile ARG syntax. No sub-task created.

  2. "Docker Hub normalization lowercases entire repository string": Classified as suggestion — observes that repository URLs are lowercased during normalization. The concern about case loss is moot because the repository_url qualifier is always lowercased (line 164) and Docker Hub requires lowercase image names. No sub-task created.


a-oren commented Jul 1, 2026

Contributor Author

Verification Report for TC-4938 (commit bf0257f)

| Check | Result | Details |
| --- | --- | --- |
| Review Feedback | PASS | 2 sourcery-ai suggestions classified; no code change requests |
| Root-Cause Investigation | N/A | No sub-tasks created; nothing to investigate |
| Scope Containment | WARN | All task-specified files present; 14 additional files are tests and fixtures |
| Diff Size | PASS | 441 lines across 16 files; proportionate for new provider with comprehensive test coverage |
| Commit Traceability | PASS | Single commit references TC-4938 via "Implements: TC-4938" |
| Sensitive Patterns | PASS | No secrets, credentials, or sensitive data detected |
| CI Status | PASS | All 45 CI checks pass |
| Acceptance Criteria | PASS | 9 of 9 criteria met |
| Test Quality | WARN | Repetitive Test Detection: some ImageRefTest methods could be parameterized; Test Documentation: PASS; Eval Quality: N/A |
| Test Change Classification | ADDITIVE | All test changes purely additive: new file + new methods appended |
| Verification Commands | N/A | No verification commands specified |

Overall: PASS

All acceptance criteria are satisfied. CI passes across all platforms. The two WARN items are informational:

  • Scope Containment: 14 out-of-scope files are all directly supporting the feature (ImageRef Docker Hub normalization, test code, test fixtures)
  • Test Quality: Minor parameterization opportunity in ImageRefTest Docker Hub normalization tests

This comment was AI-generated by sdlc-workflow/verify-pr v0.11.0.

@a-oren a-oren requested review from Strum355 and ruromero July 1, 2026 11:45
Aligns with the JavaScript client by allowing users to set
TRUSTIFY_DA_RECOMMEND=false to append ?recommend=false to analysis URLs,
disabling Trusted Content recommendations in responses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
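
The behaviour described in this commit message, TRUSTIFY_DA_RECOMMEND=false appending recommend=false to the analysis URL, might look roughly like the following. This is a sketch under assumptions: the class and method names are hypothetical, the setting is passed in as a string so the lookup mechanism stays out of scope, and the handling of URLs that already carry a query string is a guess.

```java
/** Hypothetical helper showing how a recommend=false query parameter could be appended. */
final class RecommendFlagSketch {

  /**
   * @param analysisUrl the analysis endpoint URL
   * @param recommendSetting the value of TRUSTIFY_DA_RECOMMEND, or null when unset
   */
  static String withRecommendFlag(String analysisUrl, String recommendSetting) {
    // Only an explicit "false" changes the URL; unset or any other value keeps it as is.
    if (!"false".equalsIgnoreCase(recommendSetting)) {
      return analysisUrl;
    }
    // Assumption: reuse '&' when the URL already has a query string.
    String separator = analysisUrl.contains("?") ? "&" : "?";
    return analysisUrl + separator + "recommend=false";
  }
}
```

A caller could pass `System.getenv("TRUSTIFY_DA_RECOMMEND")` as the second argument; whether the client reads an environment variable, a system property, or both is not stated in the commit message.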