AI Models and Model Cards

Introduction

As organizations increasingly adopt AI, the need to inventory models and datasets has become critical. Keeping a detailed record of these assets helps ensure transparency, traceability, and accountability in their use, particularly in complex operational environments.

An effective inventory allows teams to track the origins, applications, and limitations of AI models and their datasets. This practice supports better decision-making, mitigates risks, and ensures that AI systems operate responsibly and align with organizational objectives and compliance requirements.

Highlighted fields

| Property | Usage Description |
| --- | --- |
| `ancestors` | Contains information about the component from which the current component is derived. |
| `externalReferences` | A list of references providing additional context or resources for the component. |
| `modelCard` | A section detailing the parameters, analysis, and considerations of the AI model. |
| `modelParameters` | Contains specific details about the model's functionality, task, architecture, datasets, inputs, and outputs. |
| `datasets` | Lists datasets used in the model's training or operation, including their classification and references. |
| `quantitativeAnalysis` | Describes the model's performance metrics and associated confidence intervals. |
| `technicalLimitations` | Describes the known technical limitations of the model. |
| `performanceTradeoffs` | Identifies known tradeoffs in the model's performance or accuracy. |
| `ethicalConsiderations` | Highlights ethical risks associated with the model's use and potential mitigation strategies. |
| `fairnessAssessments` | Evaluates the impact on at-risk groups, detailing benefits, harms, and mitigation strategies. |
| `tags` | Keywords or labels for categorizing the model. |
| `data` | Represents a dataset component, including details about its classification and location. |

This example showcases a "text-to-speech-model" derived from the "base-phoneme-model," with its lineage captured in the pedigree.ancestors section. The training dataset, "Speech Training Data," is represented as a separate component and directly referenced in the model's datasets field, ensuring clear traceability and transparency between the model and its training data.

Examples

{
  "$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json",
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
  "version": 1,
  "components": [
    {
      "bom-ref": "component-t2s-model",
      "type": "machine-learning-model",
      "publisher": "Example Inc.",
      "group": "ExampleGroup",
      "name": "text-to-speech-model",
      "version": "2.0",
      "description": "An advanced text-to-speech model built on a fictional base model for generating realistic speech audio from text.",
      "pedigree": {
        "ancestors": [
          {
            "type": "machine-learning-model",
            "name": "base-phoneme-model",
            "version": "0.9.0",
            "description": "A phoneme prediction model used as the foundation for TTS development.",
            "externalReferences": [
              {
                "type": "model-card",
                "url": "https://example.com/base-phoneme-model.cyclonedx.json"
              },
              {
                "type": "formulation",
                "url": "https://example.com/base-phoneme-model.mbom.cyclonedx.json"
              }
            ]
          }
        ]
      },
      "modelCard": {
        "modelParameters": {
          "approach": {
            "type": "supervised"
          },
          "task": "text-to-speech",
          "architectureFamily": "transformer",
          "modelArchitecture": "audio-instruct-encoder",
          "datasets": [
            {
              "ref": "component-t2s-training-data"
            }
          ],
          "inputs": [{ "format": "string" }],
          "outputs": [{ "format": "audio/aac" }]
        },
        "quantitativeAnalysis": {
          "performanceMetrics": [
            {
              "type": "Word Error Rate",
              "value": "3.2%",
              "slice": "General English",
              "confidenceInterval": {
                "lowerBound": "3.0%",
                "upperBound": "3.5%"
              }
            }
          ]
        },
        "considerations": {
          "users": [
            "Developers building voice assistant applications.",
            "Accessibility tools creators for visually impaired users."
          ],
          "useCases": [
            "Converting text to speech for customer service bots.",
            "Generating audiobook narrations for public domain books."
          ],
          "technicalLimitations": [
            "Model performance degrades significantly with non-English languages.",
            "Struggles with highly ambiguous input phrases requiring context."
          ],
          "performanceTradeoffs": [
            "Optimized for speed over handling complex sentence structures accurately.",
            "May produce less natural prosody in low-resource environments."
          ],
          "ethicalConsiderations": [
            {
              "name": "Potential misuse for creating convincing fake audio to impersonate individuals.",
              "mitigationStrategy": "Limit access to trained models and implement watermarking in outputs."
            },
            {
              "name": "Requires dataset transparency to avoid training on unauthorized copyrighted materials.",
              "mitigationStrategy": "Mandate audits and disclosure of dataset origins before training."
            }
          ],
          "fairnessAssessments": [
            {
              "groupAtRisk": "Non-native English speakers",
              "benefits": "Improved accessibility to spoken content in English for non-native speakers.",
              "harms": "Lower output quality and reduced intelligibility for certain accents.",
              "mitigationStrategy": "Diversify training datasets to include a wide range of accents and dialects."
            },
            {
              "groupAtRisk": "Underrepresented demographic groups in training data",
              "benefits": "Potential for increased representation in applications that use the model.",
              "harms": "Reinforcement of systemic biases present in the original datasets.",
              "mitigationStrategy": "Conduct bias audits and incorporate adversarial training to counteract data biases."
            },
            {
              "groupAtRisk": "Individuals concerned about privacy",
              "benefits": "Encourages transparency in model use, fostering trust in TTS systems.",
              "harms": "Risk of misuse through unintended data memorization from the training dataset.",
              "mitigationStrategy": "Ensure datasets are scrubbed of sensitive information and implement privacy-preserving techniques during training."
            }
          ]
        }
      },
      "tags": [
        "audio:text-to-speech",
        "english",
        "chat"
      ]
    },
    {
      "bom-ref": "component-t2s-training-data",
      "type": "data",
      "publisher": "Example Inc.",
      "group": "ExampleGroup",
      "name": "Speech Training Data",
      "version": "SNAPSHOT",
      "data": [
        {
          "type": "dataset",
          "contents": {
            "url": "https://example.com/speech-training-dataset"
          },
          "classification": "public"
        }
      ]
    }
  ]
}
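Because the model's `modelCard.modelParameters.datasets` entries reference the dataset component by its `bom-ref`, a consumer of this BOM can resolve that link programmatically. The sketch below is a minimal, hypothetical Python consumer (not part of any official CycloneDX tooling) that maps each machine-learning-model component to its pedigree ancestors and its referenced dataset components; the inline `bom` dict is a trimmed copy of the example above.

```python
# Hypothetical BOM consumer: resolves dataset refs and pedigree ancestors.
# The `bom` dict below is a trimmed-down copy of the example document;
# in practice it would come from json.load() on the BOM file.
bom = {
    "components": [
        {
            "bom-ref": "component-t2s-model",
            "type": "machine-learning-model",
            "name": "text-to-speech-model",
            "pedigree": {
                "ancestors": [{"name": "base-phoneme-model", "version": "0.9.0"}]
            },
            "modelCard": {
                "modelParameters": {
                    "datasets": [{"ref": "component-t2s-training-data"}]
                }
            },
        },
        {
            "bom-ref": "component-t2s-training-data",
            "type": "data",
            "name": "Speech Training Data",
        },
    ]
}

def resolve_model_lineage(bom):
    """Map each ML model component to its ancestor names and the
    names of the dataset components its model card references."""
    by_ref = {c.get("bom-ref"): c for c in bom.get("components", [])}
    lineage = {}
    for c in bom.get("components", []):
        if c.get("type") != "machine-learning-model":
            continue
        ancestors = [a["name"] for a in c.get("pedigree", {}).get("ancestors", [])]
        dataset_refs = (
            c.get("modelCard", {}).get("modelParameters", {}).get("datasets", [])
        )
        datasets = [
            by_ref[d["ref"]]["name"] for d in dataset_refs if d.get("ref") in by_ref
        ]
        lineage[c["name"]] = {"ancestors": ancestors, "datasets": datasets}
    return lineage

print(resolve_model_lineage(bom))
# {'text-to-speech-model': {'ancestors': ['base-phoneme-model'],
#                           'datasets': ['Speech Training Data']}}
```

Keeping the dataset as its own component (rather than inlining it in the model card) is what makes this kind of lookup possible: the same `bom-ref` can be shared by several models trained on the same data.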