Open-source software (OSS), with and without AI/ML components (e.g., code, libraries, pre-trained models), forms the backbone of the ever-growing, complex software supply chain. For example, the widely popular Hugging Face model hub hosts more than 60K pre-trained models (PTMs) for public use in developing new AI software for various end-user and business applications. Enterprises around the globe run their business tasks and processes on software that integrates (AI/ML) components from multiple vendors. Estimates suggest that approximately 90 percent of commercial software products are either OSS components or proprietary packages built with third-party software components. In other words, complex and evolving dependencies intrinsically characterise the modern software product.
Software Supply Chain Vulnerabilities and the Need for SBOMs
Vulnerabilities in these software dependencies define the degree of reliability and security of the end product (alternatively, the software supply chain), with the number of vulnerabilities growing exponentially with the size and complexity of such supply chains. Vulnerabilities in individual software components usually stem from the competitive pressure to move first in the market and lock in customers, which often leads to non-rigorous testing and validation, so that vulnerabilities remain undetected and unpatched. Malicious actors can exploit these vulnerabilities (with considerable ease) in varying capacities, resulting in cyber-attacks with severe (even catastrophic) systemic societal and economic consequences. Examples of such cyber-attacks in the recent past include the SolarWinds, MOVEit, Log4j, Kaseya, and 3CX cases. The recent Microsoft/CrowdStrike case (though reportedly not a cyber incident) is also a glaring example of high systemic risk resulting from software dependencies. Worryingly, the cost of systemic cyber-attacks induced by software vulnerabilities would exceed the aggregate capacity of the global insurance market (Source: Marsh), with a single attack event likely to result in a maximum cyber loss ranging from $2.8 billion to $1 trillion (Source: US Government Accountability Office).
The Software Bill of Materials (SBOM) family, which emerged in the early 2010s and today includes AIBOMs and DataBOMs, consists of records that facilitate the management of software dependencies, with the primary objectives of vulnerability management, enhanced software license compliance, and increased transparency in a software supply chain. After all, transparency yields trust, and trust yields security, which might contribute to the business competitiveness of SBOM-adopting enterprises. SBOMs are commonly reported in the SPDX, CycloneDX, and SWID standard formats. However, it was not until 2021, after the SolarWinds and Log4j cyber-attacks, that the US government formally pushed the adoption of SBOMs by requiring all companies selling software to the US government to provide SBOMs. Currently, beyond the US government, multiple US banks, companies across the Fortune 500, and organisations across Europe, India, and the Asia-Pacific are embracing SBOM programs to achieve software supply chain transparency, trust, resilience, and security, and to mitigate systemic cyber risk. The not-so-good news for a business world largely driven by software supply chains is that SBOM adoption rates are far below benchmark standards. Fewer than approximately 20 percent of business organisations receive SBOMs along with their third-party software components. This is a grossly below-par number if the vision is to improve the resilience of software supply chain networks that are becoming increasingly pervasive in societal applications.
In this article, we offer our viewpoints on some important challenges in scaling SBOM adoption to improve software supply chain security (SSSC) and propose action items to alleviate these challenges.
Challenges and Action Items to Scaling SBOM Adoption
We lay out five important challenges to SBOM adoption at scale in the context of improving software supply chain security (SSSC):
#1 – A Lack of Awareness, Fit, and Expertise
SBOMs are indeed valuable but can present skewed complexity-value tradeoffs to certain consumers (e.g., B2B consumers), many of whom may either be unsure what to do with the plethora (i.e., complexity) of information in an SBOM or not find what they want in the SBOM they have access to. One of the main reasons is that the tools to generate SBOMs (spanning in-house, commercial, and open-source categories) and the formats for representing SBOM data (e.g., SPDX and CycloneDX schemas) are extremely inconsistent and non-interoperable across businesses; do not translate to clear, actionable information (e.g., how does an AI model or piece of code in the SBOM communicate SSSC risk?); and, most importantly, are not backed by a good market of supporting tools that would help consumers make the best use of SBOM information, owing to insufficient demand. In addition, some stakeholders in the software supply chain (e.g., those associated with OSS projects) are currently not aware of SBOMs. Moreover, many software vendors and system software developers struggle to produce SBOMs because they do not have the right tools to generate SBOMs for many practical settings (e.g., embedded C/C++ code, binaries) and/or do not have sufficient knowledge of the security best practices applicable to software supply chains, and consequently might produce ‘irrelevant’ SBOMs.
Action Items Mitigating Challenge – Awareness can be created to some extent by ‘enforcing’ compliance measures on SBOM stakeholders and by improving education about SBOMs in the context of SSSC. There should be SBOMs specific to binaries and to programming languages with no package managers. A similar approach should be adopted for AI/ML frameworks (with appended training datasets), with BOMs formatted to standard schemas for better interoperability. On top of all this, SBOMs should be structured (e.g., in a top-down, tree-like fashion) so that they are easy to browse, search, and walk through, irrespective of the number of data fields.
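As a rough illustration of the tree-like structure suggested above, the following Python sketch walks a small, made-up SBOM from the top-level product down to its transitive dependencies. The record layout (`name`, `version`, `children`) and all component names are assumptions made for this example; real SPDX and CycloneDX documents are structured differently.

```python
from typing import Iterator

# Hypothetical, simplified SBOM: each component lists its direct dependencies
# under "children". This is NOT the SPDX or CycloneDX schema.
sbom = {
    "name": "billing-app", "version": "2.1.0",
    "children": [
        {"name": "web-framework", "version": "4.0.2",
         "children": [
             {"name": "json-parser", "version": "1.9.0", "children": []},
         ]},
        {"name": "sentiment-model", "version": "0.3.1",
         "children": [
             {"name": "training-dataset", "version": "2023-06", "children": []},
         ]},
    ],
}

def walk(component: dict, path: tuple = ()) -> Iterator[tuple]:
    """Yield the path from the root to every component, depth first."""
    here = path + (component["name"],)
    yield here
    for child in component.get("children", []):
        yield from walk(child, here)

def find(root: dict, name: str) -> list:
    """Return every root-to-component path that ends at `name`."""
    return [p for p in walk(root) if p[-1] == name]

# Print the tree with one level of indentation per dependency depth.
for p in walk(sbom):
    print("  " * (len(p) - 1) + p[-1])
```

A search such as `find(sbom, "json-parser")` then answers the practical question of how a given dependency entered the product, whatever the number of data fields attached to each component.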
#2 – Increase in Data Field Complexity
SBOMs should ideally fit the needs of the application at hand. Some fields in every SBOM are commonly agreed upon (e.g., version numbers, license number, software components). However, other fields that should be present to improve SSSC (e.g., a link to a software vulnerability database such as the NVD, AI/ML model provenance, dataset version, dataset biases, hardware part numbers) cannot be made uniform across SBOMs because their applicability is case-based. If added, they also worsen the inherent incompatibility and lack of interoperability between SBOM standards. On the flip side, the addition of non-generic fields increases SBOM complexity and the cost that comes with it.
Action Items Mitigating Challenge – SBOMs should have concise, easy-to-read specifications with fields marked as mandatory (e.g., commonly agreed upon fields) and recommended (e.g., link to vulnerability databases). Fields that are useful only in special cases must be tagged as ‘optional’ so that the general SBOM user can still make use of them where they apply.
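The mandatory/recommended/optional tiering could be checked mechanically along the following lines. This is a minimal sketch: the tier contents and field names (`vulnerability_db_url`, `model_provenance`, and so on) are assumptions chosen to mirror the examples in the text, not fields taken from any published SBOM specification.

```python
# Illustrative tiers only; the field names are assumptions, not a real schema.
MANDATORY = {"name", "version", "license"}
RECOMMENDED = {"vulnerability_db_url"}
OPTIONAL = {"model_provenance", "dataset_version", "hardware_part_number"}

def check_component(component: dict) -> dict:
    """Report which tiered fields are missing from one component record."""
    # A field counts as present only if it has a non-empty value.
    present = {k for k, v in component.items() if v not in (None, "")}
    known = MANDATORY | RECOMMENDED | OPTIONAL
    return {
        "missing_mandatory": sorted(MANDATORY - present),
        "missing_recommended": sorted(RECOMMENDED - present),
        "unknown_fields": sorted(present - known),
    }

def is_valid(component: dict) -> bool:
    """A record is acceptable when no mandatory field is missing."""
    return not check_component(component)["missing_mandatory"]

# Hypothetical record: an optional field is filled in, but 'license' is absent.
example = {"name": "json-parser", "version": "1.9.0", "dataset_version": "2023-06"}
print(check_component(example))
print(is_valid(example))
```

Only missing mandatory fields make a record invalid here; missing recommended fields are reported but tolerated, which keeps the specification strict where it matters without penalising case-specific omissions.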
#3 – SBOM ‘Noise’
SBOMs should ideally contain all the relevant fields (from SBOM generator tools) that an enterprise’s security decision-making team needs to make effective, informed decisions (both for itself and for downstream enterprises in a software supply chain) from SBOM field dependencies and relevant data fields. In contrast, the currently available SBOMs are ‘noisy’, i.e., incomplete and/or inaccurate. By popular consensus among SBOM stakeholders, SBOMs in current practice are of mixed quality, and the full set of direct and transitive software (and related websites) dependency information is often missing, i.e., the SBOM is incomplete. Often, a dependency is removed from repositories such as package managers because it is not maintained or is known to be malicious, without an entry being kept for the malware (like CVEs for vulnerabilities). With no such malware identifiers created in practice, it is challenging for SBOM developers to get security threat insights from dead dependency references. In addition, a dependency that is vulnerable is not necessarily a contributor to insecure software, i.e., it would be inaccurate for an SBOM to say that such a dependency will lead to an exploitable software end product with probability 1. This ‘false’ perception of inaccuracy often disincentivises creators from producing SBOMs, as the correctness requirement becomes very strict. Another source of inaccuracy is the alteration of SBOMs in transit or sharing from one enterprise to another.
Action Items Mitigating Challenge – It is better to have no SBOM than one that is incomplete or inaccurate, because of the confusion it creates for users and in the downstream supply chain. When SBOMs are generated using the right tools, those tools should be integrated with build automation that captures as many dependencies as possible, including those in binaries and AI/ML models. Tools should be developed to boost the verification of SBOMs, or third-party certification processes should be put in place to verify whether the dependency information in an SBOM matches what is obtained from the source code and binaries. One could also think of a third-party or government-centralised database of provenance information for software/project repositories that are no longer hosted, so that SBOM developers can still get information on them.
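One simple form such verification could take is a comparison between the dependencies an SBOM declares and those actually observed during a build. The sketch below assumes both have already been reduced to name-to-version mappings; how those mappings are extracted from a real SBOM or build log is outside its scope, and the component names are invented for illustration.

```python
def compare_dependencies(declared: dict, observed: dict) -> dict:
    """Compare name -> version maps taken from an SBOM and from a build."""
    declared_names, observed_names = set(declared), set(observed)
    return {
        # Present in the build but absent from the SBOM (incompleteness).
        "missing_from_sbom": sorted(observed_names - declared_names),
        # Listed in the SBOM but never seen in the build (possible staleness).
        "not_found_in_build": sorted(declared_names - observed_names),
        # Listed in both, but with different versions (inaccuracy).
        "version_mismatch": sorted(
            name for name in declared_names & observed_names
            if declared[name] != observed[name]
        ),
    }

# Hypothetical inputs for illustration.
sbom_declared = {"json-parser": "1.9.0", "http-client": "2.3.1"}
build_observed = {"json-parser": "1.9.2", "http-client": "2.3.1", "zlib-wrapper": "0.4.0"}

print(compare_dependencies(sbom_declared, build_observed))
```

An empty report in all three categories would be a necessary, though not sufficient, sign that the SBOM is complete and accurate with respect to that build.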
#4 – Mis-classifying Software Vulnerabilities
Correctly classifying the vulnerabilities (positive or negative) arising from SBOM elements that affect the vulnerability of the end software product is an extremely difficult challenge. It is often the case that Common Vulnerabilities and Exposures (CVEs) identified in an upstream component do not impact the end product simply because the affected upstream code is not used during the compilation or run-time of the end product. This results in false positives. Consequently, consumers might perform vulnerability analysis, find hundreds or thousands of vulnerabilities in software components that are non-critical (with respect to run-time or compile-time), and pressurise vendors to upgrade the libraries in these components, diverting the vendors' attention and time from fixing critical vulnerabilities. As another, orthogonal case, there is the practice of developers cutting and pasting code from software libraries (including LLMs in modern AI systems) instead of importing the library from which the code originates. This practice results in false negatives, as it omits known vulnerabilities associated with the code that might spring from other elements of the library that are not imported.
Action Items Mitigating Challenge – Vulnerability-Exploitability eXchange (VEX) formats allow software suppliers to state whether a vulnerability affects the end product. Such tools are not perfect, but they are recommended over a no-check procedure. The ill-practice of copy-pasting should be discouraged through proper education about SBOMs and their impact on downstream software supply chain security.
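To make the false-positive filtering concrete, the following sketch shows how supplier statements of the VEX kind might be used to triage scanner findings. The status names (`not_affected`, `affected`, `fixed`, `under_investigation`) follow those commonly used in VEX documents; the flat dictionary layout, the component names, and the CVE identifiers are placeholders invented for this example rather than a real VEX document format.

```python
# Statuses that still require attention from the consumer.
ACTIONABLE = {"affected", "under_investigation"}

def triage(findings: list, vex_statements: dict) -> list:
    """Keep only the findings that still need attention.

    findings:        list of (component, cve_id) pairs from a scanner
    vex_statements:  {(component, cve_id): status} from the supplier
    A finding with no statement is kept and treated as under investigation.
    """
    kept = []
    for finding in findings:
        status = vex_statements.get(finding, "under_investigation")
        if status in ACTIONABLE:
            kept.append((finding[0], finding[1], status))
    return kept

# Placeholder data; the CVE identifiers below are fictitious.
scanner_findings = [
    ("libfoo", "CVE-0000-0001"),
    ("libbar", "CVE-0000-0002"),
    ("libbaz", "CVE-0000-0003"),
]
supplier_vex = {
    ("libfoo", "CVE-0000-0001"): "not_affected",  # vulnerable code never used
    ("libbar", "CVE-0000-0002"): "affected",
}

print(triage(scanner_findings, supplier_vex))
```

Defaulting unknown findings to `under_investigation` is a deliberately conservative choice: a missing statement is not treated as evidence that the end product is safe.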
#5 – Privacy Issues
Potential privacy issues with threats to intellectual property (IP) exist. However, if the SBOMs are not AIBOMs, DataBOMs, or BOMs related to CPS environments, the threat to IP is insignificant unless the SBOMs are shared publicly, partly because much critical IP is usually not captured in dependencies or data fields. In the case of the increasingly popular AIBOMs, DataBOMs, or BOMs related to CPSs, privacy breaches are possible because the datasets on which these BOMs are built contain personally identifiable information (PII) and proprietary information. Such breaches will help business competitors and bad actors, who can trigger litigation and the exploitation of vulnerabilities against the concerned risk-averse business.
Action Items Mitigating Challenge – Privacy issues can be addressed using privacy-preserving tools such as Differential Privacy (DP) and k-anonymity, though these are yet to be standardised and widely adopted in the enterprise industry.
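As an illustration of one of these tools, the sketch below checks whether a set of dataset records satisfies k-anonymity, i.e., whether every combination of quasi-identifier values is shared by at least k records. The record fields (`age_band`, `region`) and the sample data are assumptions made for the example; the sketch only tests the property and does not perform the generalisation or suppression needed to achieve it.

```python
from collections import Counter

def group_sizes(records: list, quasi_identifiers: list) -> Counter:
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def is_k_anonymous(records: list, quasi_identifiers: list, k: int) -> bool:
    """True when every quasi-identifier combination covers at least k records."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())

# Hypothetical dataset summary that might accompany a DataBOM.
records = [
    {"age_band": "30-39", "region": "East", "note": "a"},
    {"age_band": "30-39", "region": "East", "note": "b"},
    {"age_band": "40-49", "region": "West", "note": "c"},
]

print(is_k_anonymous(records, ["age_band", "region"], k=2))
print(is_k_anonymous(records, ["age_band"], k=1))
```

In this sample the single record in the ("40-49", "West") group would be re-identifiable, so the data would need further generalisation before details about it are shared alongside a BOM.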
[This article has been published with permission from IIM Calcutta. www.iimcal.ac.in Views expressed are personal.]