AboutCode


What it is

AboutCode is an umbrella organisation that maintains an ecosystem of open-source tools for software composition analysis (SCA), licence compliance, and vulnerability tracking. The projects share a coherent design around open data and a set of common primitives for identifying and describing software packages.

The shared primitives are the connective tissue of the stack. PURL (Package URL) is a universal package identifier (pkg:npm/lodash@4.17.21, pkg:pypi/django@4.2.0) that the AboutCode tools use everywhere to refer to packages across ecosystems. SPDX and CycloneDX are the SBOM and licence-data standards the tools produce and consume. ABOUT files are small text files attached to vendored source for tracking licence and provenance metadata.
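To make the PURL shape concrete, here is a minimal Python sketch that splits the two example identifiers above into their parts. It handles only the simple `pkg:type/namespace/name@version` form and drops qualifiers and subpaths; `SimplePurl` and `parse_simple_purl` are hypothetical names for this illustration, not the API of any AboutCode library.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote


@dataclass(frozen=True)
class SimplePurl:
    """Hypothetical container for the parts of a simple PURL."""
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_simple_purl(purl: str) -> SimplePurl:
    """Parse the pkg:type/namespace/name@version subset of a PURL.

    Qualifiers (?key=value) and subpaths (#path) are dropped, not parsed.
    """
    scheme, sep, rest = purl.partition(":")
    if sep != ":" or scheme != "pkg":
        raise ValueError(f"not a package URL: {purl!r}")
    # Strip subpath and qualifiers; this sketch ignores both.
    rest = rest.split("#", 1)[0].split("?", 1)[0].strip("/")
    path, _, version = rest.rpartition("@")
    if not path:
        # No '@' present: rpartition leaves the whole string in the last slot.
        path, version = version, ""
    segments = [unquote(s) for s in path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"missing type or name: {purl!r}")
    pkg_type, *namespace, name = segments
    return SimplePurl(
        type=pkg_type.lower(),
        namespace="/".join(namespace) or None,
        name=name,
        version=unquote(version) or None,
    )


lodash = parse_simple_purl("pkg:npm/lodash@4.17.21")
django = parse_simple_purl("pkg:pypi/django@4.2.0")
print(lodash.type, lodash.name, lodash.version)
print(django.type, django.name, django.version)
```

Because the type, name, and version sit in fixed positions, the same identifier can serve as a lookup key in any tool that understands the format, whatever the ecosystem.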

Tools in the ecosystem

ScanCode Toolkit is the core scanner: a CLI and library that detects copyrights, licences, and package dependencies in a codebase by analysing source files. Output formats include JSON, YAML, CSV, SPDX, and CycloneDX. It is mature (its test suite runs tens of thousands of tests per commit) and serves as the engine behind several other AboutCode projects.

ScanCode.io is a workflow engine wrapping ScanCode Toolkit. It runs scan pipelines as projects, stores their results in a database, and exposes them through a web UI and an API. It is what you reach for when you want ScanCode to run in production rather than as a one-off CLI invocation.

DejaCode is an enterprise licence-compliance system of record. It tracks which packages a project depends on, what licences apply, what vulnerabilities are known, and what usage policies the organisation has set. DejaCode does not scan or detect on its own; it integrates with ScanCode.io for scanning, VulnerableCode for vulnerability data, and PurlDB for package metadata, and it produces SBOM exports in CycloneDX and SPDX.

VulnerableCode is a vulnerability database aggregator. It pulls advisory data from a wide range of upstream and downstream sources (NVD, GitHub Security Advisories, GitLab, npm and PyPI advisories, distribution security trackers for Alpine, Debian, RHEL, SUSE, Ubuntu, Arch, plus project-specific feeds for Mozilla, Postgres, OpenSSL, curl, Apache projects, Rust, Ruby, Elixir, and others) and normalises them into a consistent data model. The architecture distinguishes importers (raw ingestion) from improvers (enrichment passes that add CVSS scores, EPSS predictions, exploit availability, and KEV catalogue references). A public instance runs at public.vulnerablecode.io, and the codebase supports self-hosted instances as well.
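The importer/improver split can be pictured as a two-stage pipeline. The Python sketch below is illustrative only: the `Advisory` record, the `run_pipeline` function, the `EXAMPLE-` identifiers, and the score value are all hypothetical placeholders, and none of this reflects VulnerableCode's actual classes or data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Advisory:
    """One normalised advisory record (hypothetical shape)."""
    vuln_id: str
    purl: str
    summary: str
    enrichment: Dict[str, object] = field(default_factory=dict)


Importer = Callable[[], Iterable[Advisory]]
Improver = Callable[[Advisory], None]


def run_pipeline(
    importers: List[Importer], improvers: List[Improver]
) -> Dict[Tuple[str, str], Advisory]:
    records: Dict[Tuple[str, str], Advisory] = {}
    # Stage 1: importers only ingest. The same advisory seen in several
    # sources collapses onto one (vuln_id, purl) key; the first copy wins.
    for importer in importers:
        for advisory in importer():
            records.setdefault((advisory.vuln_id, advisory.purl), advisory)
    # Stage 2: improvers enrich records that are already stored.
    for advisory in records.values():
        for improve in improvers:
            improve(advisory)
    return records


# Placeholder sources and scores, invented for this example.
def source_a() -> Iterable[Advisory]:
    yield Advisory("EXAMPLE-0001", "pkg:pypi/example-lib@1.0.0", "placeholder advisory")


def source_b() -> Iterable[Advisory]:
    yield Advisory("EXAMPLE-0001", "pkg:pypi/example-lib@1.0.0", "same advisory, second source")
    yield Advisory("EXAMPLE-0002", "pkg:npm/example-pkg@2.3.0", "another placeholder")


PLACEHOLDER_SCORES = {"EXAMPLE-0001": 7.5}


def add_severity(advisory: Advisory) -> None:
    score = PLACEHOLDER_SCORES.get(advisory.vuln_id)
    if score is not None:
        advisory.enrichment["severity_score"] = score


records = run_pipeline([source_a, source_b], [add_severity])
for key, advisory in records.items():
    print(key, advisory.enrichment)
```

Keeping ingestion and enrichment separate means a new enrichment pass can be added, or re-run, without fetching the upstream sources again.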

PurlDB is a package metadata database keyed by PURL. It indexes packages from many registries, recording version history, file contents, licence findings, and other attributes. It functions as a lookup layer for the rest of the stack.

Comparison to OpenVet

The AboutCode stack covers software composition analysis, licence compliance, and vulnerability tracking. OpenVet covers signed audits of package code. The meaningful comparison sits in two slices where the concerns overlap: VulnerableCode’s vulnerability aggregation, and the licence-compliance ambition of DejaCode.

The structural differences, scoped to those overlapping slices:

Reactive vs proactive. VulnerableCode aggregates known vulnerabilities after they have been disclosed and catalogued in some advisory source. OpenVet aims to surface problems before they reach a consumer by having an auditor read the code and either sign claims about it or surface concerns as findings on the audit. The reactive layer is useful as a backstop, but VulnerableCode by design only reports what is already known.

Operating model. AboutCode tools are scanners and aggregators: they parse source code, run detection rules, ingest advisories, and store the results in databases. OpenVet audits are human-authored: an auditor reads the code and signs claims about specific properties. OpenVet itself does not run any checks; it only provides the tooling and the hosting for audits.

Output shape. VulnerableCode emits vulnerability records (CVE IDs, affected versions, fixed versions, severity scores, exploit metadata). ScanCode emits scan findings (copyrights, licence detections, package relationships). DejaCode aggregates these into compliance records and SBOM exports. OpenVet auditors produce structured claims (impl-crypto, uses-network) describing what the package itself does; aggregation across the dependency tree is performed by the OpenVet tooling against the consumer’s lockfile.
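As a rough picture of what aggregating per-package claims against a lockfile could look like, here is a hedged Python sketch. The `aggregate_claims` function, the dictionary shapes, and the example package names are assumptions made for illustration; they do not describe OpenVet's real tooling or data format. Only the claim names `impl-crypto` and `uses-network` come from the text above.

```python
from typing import Dict, Iterable, List, Set, Tuple


def aggregate_claims(
    locked: Iterable[str],
    audits: Dict[str, Set[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group locked packages by claim and list the packages with no audit.

    `locked` holds the package identifiers resolved in a lockfile;
    `audits` maps an identifier to the claims made about that package.
    Both shapes are hypothetical.
    """
    packages_by_claim: Dict[str, List[str]] = {}
    unaudited: List[str] = []
    for purl in locked:
        claims = audits.get(purl)
        if claims is None:
            unaudited.append(purl)
            continue
        for claim in sorted(claims):
            packages_by_claim.setdefault(claim, []).append(purl)
    return packages_by_claim, unaudited


# Placeholder lockfile contents and audits, invented for this example.
locked_packages = [
    "pkg:cargo/example-tls@0.3.0",
    "pkg:cargo/example-http@1.2.0",
    "pkg:cargo/example-util@0.1.0",
]
audit_claims = {
    "pkg:cargo/example-tls@0.3.0": {"impl-crypto", "uses-network"},
    "pkg:cargo/example-http@1.2.0": {"uses-network"},
}

by_claim, missing = aggregate_claims(locked_packages, audit_claims)
print(by_claim)
print(missing)
```

The point of the shape is that each claim is made about one package, while the tree-wide view is derived mechanically from whatever the consumer's lockfile actually resolves.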

Trust model. VulnerableCode and DejaCode are operated as centralised databases (a public instance run by AboutCode, or a self-hosted instance); using them means trusting the operator’s curation and the instance’s integrity. OpenVet’s trust is publisher-rooted: a consumer chooses which logs to trust by URL, and every audit and log commit is cryptographically signed.

Ecosystem. VulnerableCode and ScanCode cover a very wide range of ecosystems: OS distributions, language registries, and many project-specific advisory feeds. OpenVet covers Cargo, npm, PyPI, Go modules, and RubyGems.

Using alongside OpenVet

VulnerableCode and OpenVet overlap directly on vulnerability data. OpenVet aims to re-publish known vulnerabilities as audits in its own data model; VulnerableCode does the same kind of normalisation work upstream, by aggregating advisories from many sources into a single corpus. The natural OpenVet preference is to ingest from upstream sources directly (NVD, GHSA, distribution advisories, rustsec, and similar) rather than through VulnerableCode, because each intermediary in the chain from upstream to audit extends the trust set the consumer is implicitly accepting. VulnerableCode remains useful in its own right for consumers who want a web UI and API onto an aggregated vulnerability corpus.

DejaCode’s licence-compliance role covers territory that OpenVet may address in the future but does not today. A project that needs licence compliance now should reach for DejaCode (or another compliance tool); the shape OpenVet might eventually take for licences is not yet decided.

ScanCode and ScanCode.io are aimed at software composition analysis: detecting copyrights, licences, and package origins by parsing source. This sits in a different problem space from OpenVet audits. OpenVet extracts the dependency graph from package-manager metadata directly, and licence determination on an audited package is the auditor’s own work against the source. ScanCode is the right tool for SCA workflows, but it is not an input to OpenVet’s audit flow.
