AI Models and Model Cards

Introduction

As organizations increasingly adopt AI, the need to inventory models and datasets has become critical. Keeping a detailed record of these assets helps ensure transparency, traceability, and accountability in their use, particularly in complex operational environments.

An effective inventory allows teams to track the origins, applications, and limitations of AI models and their datasets. This practice supports better decision-making, mitigates risks, and ensures that AI systems operate responsibly and align with organizational objectives and compliance requirements.

Highlighted fields

| Property | Usage Description |
| --- | --- |
| `ancestors` | Contains information about the component from which the current component is derived. |
| `externalReferences` | A list of references providing additional context or resources for the component. |
| `modelCard` | A section detailing the parameters, analysis, and considerations of the AI model. |
| `modelParameters` | Contains specific details about the model's functionality, task, architecture, datasets, inputs, and outputs. |
| `datasets` | Lists datasets used in the model's training or operation, including their classification and references. |
| `quantitativeAnalysis` | Describes the model's performance metrics and associated confidence intervals. |
| `technicalLimitations` | Describes the known technical limitations of the model. |
| `performanceTradeoffs` | Identifies known tradeoffs in the model's performance or accuracy. |
| `ethicalConsiderations` | Highlights ethical risks associated with the model's use and potential mitigation strategies. |
| `fairnessAssessments` | Evaluates the impact on at-risk groups, detailing benefits, harms, and mitigation strategies. |
| `tags` | Keywords or labels for categorizing the model. |
| `data` | Represents a dataset component, including details about its classification and location. |

This example showcases a "text-to-speech-model" derived from the "base-phoneme-model," with its lineage captured in the pedigree.ancestors section. The training dataset, "Speech Training Data," is represented as a separate component and directly referenced in the model's datasets field, ensuring clear traceability and transparency between the model and its training data.

Examples

{
  "$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json",
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
  "version": 1,
  "components": [
    {
      "bom-ref": "component-t2s-model",
      "type": "machine-learning-model",
      "publisher": "Example Inc.",
      "group": "ExampleGroup",
      "name": "text-to-speech-model",
      "version": "2.0",
      "description": "An advanced text-to-speech model built on a fictional base model for generating realistic speech audio from text.",
      "pedigree": {
        "ancestors": [
          {
            "type": "machine-learning-model",
            "name": "base-phoneme-model",
            "version": "0.9.0",
            "description": "A phoneme prediction model used as the foundation for TTS development.",
            "externalReferences": [
              {
                "type": "model-card",
                "url": "https://example.com/base-phoneme-model.cyclonedx.json"
              },
              {
                "type": "formulation",
                "url": "https://example.com/base-phoneme-model.mbom.cyclonedx.json"
              }
            ]
          }
        ]
      },
      "modelCard": {
        "modelParameters": {
          "approach": {
            "type": "supervised"
          },
          "task": "text-to-speech",
          "architectureFamily": "transformer",
          "modelArchitecture": "audio-instruct-encoder",
          "datasets": [
            {
              "ref": "component-t2s-training-data"
            }
          ],
          "inputs": [{ "format": "string" }],
          "outputs": [{ "format": "audio/aac" }]
        },
        "quantitativeAnalysis": {
          "performanceMetrics": [
            {
              "type": "Word Error Rate",
              "value": "3.2%",
              "slice": "General English",
              "confidenceInterval": {
                "lowerBound": "3.0%",
                "upperBound": "3.5%"
              }
            }
          ]
        },
        "considerations": {
          "users": [
            "Developers building voice assistant applications.",
            "Accessibility tools creators for visually impaired users."
          ],
          "useCases": [
            "Converting text to speech for customer service bots.",
            "Generating audiobook narrations for public domain books."
          ],
          "technicalLimitations": [
            "Model performance degrades significantly with non-English languages.",
            "Struggles with highly ambiguous input phrases requiring context."
          ],
          "performanceTradeoffs": [
            "Optimized for speed over handling complex sentence structures accurately.",
            "May produce less natural prosody in low-resource environments."
          ],
          "ethicalConsiderations": [
            {
              "name": "Potential misuse for creating convincing fake audio to impersonate individuals.",
              "mitigationStrategy": "Limit access to trained models and implement watermarking in outputs."
            },
            {
              "name": "Requires dataset transparency to avoid training on unauthorized copyrighted materials.",
              "mitigationStrategy": "Mandate audits and disclosure of dataset origins before training."
            }
          ],
          "fairnessAssessments": [
            {
              "groupAtRisk": "Non-native English speakers",
              "benefits": "Improved accessibility to spoken content in English for non-native speakers.",
              "harms": "Lower output quality and reduced intelligibility for certain accents.",
              "mitigationStrategy": "Diversify training datasets to include a wide range of accents and dialects."
            },
            {
              "groupAtRisk": "Underrepresented demographic groups in training data",
              "benefits": "Potential for increased representation in applications that use the model.",
              "harms": "Reinforcement of systemic biases present in the original datasets.",
              "mitigationStrategy": "Conduct bias audits and incorporate adversarial training to counteract data biases."
            },
            {
              "groupAtRisk": "Individuals concerned about privacy",
              "benefits": "Encourages transparency in model use, fostering trust in TTS systems.",
              "harms": "Risk of misuse through unintended data memorization from the training dataset.",
              "mitigationStrategy": "Ensure datasets are scrubbed of sensitive information and implement privacy-preserving techniques during training."
            }
          ]
        }
      },
      "tags": [
        "audio:text-to-speech",
        "english",
        "chat"
      ]
    },
    {
      "bom-ref": "component-t2s-training-data",
      "type": "data",
      "publisher": "Example Inc.",
      "group": "ExampleGroup",
      "name": "Speech Training Data",
      "version": "SNAPSHOT",
      "data": [
        {
          "type": "dataset",
          "contents": {
            "url": "https://example.com/speech-training-dataset"
          },
          "classification": "public"
        }
      ]
    }
  ]
}
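Because the model's `modelCard.modelParameters.datasets` entries reference the dataset component by its `bom-ref`, a consumer of this BOM can resolve that link programmatically. The sketch below is a minimal, hypothetical Python consumer (not part of any official CycloneDX tooling) that maps each machine-learning-model component to its pedigree ancestors and its referenced dataset components; the inline `bom` dict is a trimmed copy of the example above.

```python
# Hypothetical BOM consumer: resolves dataset refs and pedigree ancestors.
# The `bom` dict below is a trimmed-down copy of the example document;
# in practice it would come from json.load() on the BOM file.
bom = {
    "components": [
        {
            "bom-ref": "component-t2s-model",
            "type": "machine-learning-model",
            "name": "text-to-speech-model",
            "pedigree": {
                "ancestors": [{"name": "base-phoneme-model", "version": "0.9.0"}]
            },
            "modelCard": {
                "modelParameters": {
                    "datasets": [{"ref": "component-t2s-training-data"}]
                }
            },
        },
        {
            "bom-ref": "component-t2s-training-data",
            "type": "data",
            "name": "Speech Training Data",
        },
    ]
}

def resolve_model_lineage(bom):
    """Map each ML model component to its ancestor names and the
    names of the dataset components its model card references."""
    by_ref = {c.get("bom-ref"): c for c in bom.get("components", [])}
    lineage = {}
    for c in bom.get("components", []):
        if c.get("type") != "machine-learning-model":
            continue
        ancestors = [a["name"] for a in c.get("pedigree", {}).get("ancestors", [])]
        dataset_refs = (
            c.get("modelCard", {}).get("modelParameters", {}).get("datasets", [])
        )
        datasets = [
            by_ref[d["ref"]]["name"] for d in dataset_refs if d.get("ref") in by_ref
        ]
        lineage[c["name"]] = {"ancestors": ancestors, "datasets": datasets}
    return lineage

print(resolve_model_lineage(bom))
# {'text-to-speech-model': {'ancestors': ['base-phoneme-model'],
#                           'datasets': ['Speech Training Data']}}
```

Keeping the dataset as its own component (rather than inlining it in the model card) is what makes this kind of lookup possible: the same `bom-ref` can be shared by several models trained on the same data.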