Tools: User Stories

 

USER STORIES

Over the past year, conversations within the TAI Authenticity working group have revealed that the exponential growth of generative AI is affecting archives, archivists, and their moving image collections. Inspired by the Library of Congress’ user stories for C2PA implementation in Government and Libraries, Archives, and Museums, the TAI Authenticity working group asked AMIA members to submit user stories to help us better understand specific concerns regarding GenAI as they relate to different roles and types of archives within the community. The stories below come from Authenticity Working Group members and from participants at our 2025 AMIA conference. We continue to seek and welcome new submissions. These user stories offer insightful perspectives that help us create tools and resources that will further our mission of ensuring authenticity, transparency, and trust.

 

Yvonne – AV archivist at human rights organization

As a human rights archivist, I want to feel assured that the documentation I am collecting is genuine (i.e. it is what it says it is). In my context, I may be uncertain about authenticity because videos are often captured by anonymous or unknown frontline witnesses, and uploaded to social media and chat apps, where they are often re-shared by others. Manipulated and misleading media proliferates on these platforms, including by authoritarian actors with resources to create misinformation campaigns that exploit mistrust and confusion, which undermines my ability to discern what is true.

Besides wanting to describe content accurately in my archive, I need to show some evidence of authenticity / how I determined authenticity when I provide these materials to users. My users may include legal professionals, fact-finders, human rights advocates, journalists, and communities impacted by human rights violations. Some ways that these users may accept media as authentic might be if there is testimony or corroboration from a trustworthy source, if there is an assessment of authenticity from a trustworthy expert, if there are distinctive markers of authenticity in the object itself, or if the process or system that created the documentation is known to ensure authenticity.

 

Phil – Cataloger and Researcher at a stock footage archive

As a cataloguer and researcher for an archival film company, I consider authenticating moving image collections an important part of maintaining trust with patrons who seek to view and/or license our collections. It is my responsibility to review footage not yet publicly available for viewing and provide the proper descriptive metadata elements. Crucially, this includes providing exact or estimated dates, places, and people of note. Trust is important for a small archival film company. If I know the work I’m doing within the company is based on authentic film or tape-based material, that will provide confidence for any patron who may have questions about, or seeks to license, our material. C2PA, at a technical level, may not be of much use for the collections we already possess, but adhering to baseline standards of authentication would certainly be helpful.

 

Greg – Librarian at a university

As a curator at a university film archive, my mission is to support the work of students and faculty by enabling trusted access to the university’s film collections and providing subject matter expertise to guide their inquiry. Few students have experience with moving images beyond smartphones, social media, and YouTube. The GenAI revolution in synthetic imagery threatens to widen the gap between their lived experiences and the material past, a gap that undermines critical thinking and diminishes an appreciation of our democratic ideals.

Introducing students to the concept of media provenance and showcasing to them the unique opportunities they have at our university to interact with and learn from an extensive, global film record of the 20th century has become for me an imperative. Our film archive can support this work but we need new models—relying on streaming our collections from the web as the primary (perhaps only) mode of access will no longer suffice. The ubiquity of Gen AI content (and soon the facility with which convincing historic imagery can be conjured) urges us to create learning spaces in which quality access to archival media for citation, illustration, and reuse in research projects (even reuse in conjunction with Gen AI) regrounds students and faculty in a relationship between the digital objects they manipulate and the material films from which they are derived.

 

Jennifer – Restoration and Preservation Manager

Working with film and archival collections, I have many concerns about generative AI and making sure our content is authentic. I am interested in non-generative AI tools that can help us restore, preserve, and provide access to more moving image materials.

 

Amber – AV Preservation Librarian

I have several concerns about generative AI within the scope of a preservation librarian. GenAI can bring into question how we authenticate collection materials as genuine. Language within donor agreements and licensing agreements does not yet factor in concerns around generative AI. How Gen AI tools may be used for generating metadata and captions/transcripts for accessibility is a concern. I have ethical questions about uses of materials in the AI context. I am concerned about the technological superiority of a small number of tech companies in connection with authentication tokens like C2PA.

 

Margaret – Film Archivist

As a film archivist working within Special Collections at a large university library, I am concerned with the possibility of inauthentic reuse of our footage without our knowledge or permission.

 

Danica – Archivist

As an archivist working with the stories of everyday people, I am concerned about the notion of “findability” seen as an uncritical positive. Who are materials more “findable” for? I am concerned about what discovery tools will exist in 10-20 years, and where that leaves the right to be forgotten. I am also concerned about what gen AI is doing to the environment.

 

Ilana – Archival Producer

I conduct research for documentary filmmakers, and I’m concerned about accidentally passing on inauthentic content to filmmakers I work for, especially content from stock footage houses.

 

Nicole – Processing Archivist

As a processing archivist working with personal materials, I am concerned about the privacy and ecological impacts of generative AI. In terms of privacy, donors could be concerned because their personal reputations would be at stake. Furthermore, as AI becomes more ubiquitous, I worry that I may be using AI without knowing it.

 


This resource is part of a toolkit created by the Trust in Archives Initiative. © 2026. Licensed under the Creative Commons Attribution–NonCommercial–ShareAlike 4.0 (CC BY-NC-SA 4.0) license.

 

Version 1.0 – April 2026. 

Tools: Strategic Engagement

 

STRATEGIC ENGAGEMENT WITH TECH COMPANIES: AUDIOVISUAL ARCHIVES ENGAGING WITH AI TECH COMPANIES

 

AI Technology Companies (“AI Tech Co”) increasingly seek what archives hold—rich media collections to train their multimodal and large language models (“AI”). Many archives have already had their online collections scraped without permission and are now facing offers from companies eager to secure further access.

To support archives navigating these pressures, we are developing decision-making rubrics to help institutions assess collaboration opportunities with AI Tech Cos.

These tools are designed to guide archives in weighing whether such deals represent sustainable business opportunities that strengthen their futures, or risky bargains that could compromise their long-term interests. This paper is part of that toolkit and is intended to serve as a resource on key considerations and critical questions to ask.

The document covers:

  • What is Generative AI?

  • What types of companies want to obtain your data?

  • 10 key things to consider when looking at AI Tech Companies, and why this is important

  • Five more key areas to think about (introspective)

  • Your internal technical preparation

  • Questions to ask the AI Tech Co

 

What is Generative AI?

  • Generative AI (“GenAI”) refers to the use of AI to create new content, like text, images, music, audio and video.

  • GenAI models are trained on very large datasets from which they learn the patterns and structure and then generate new synthetic content that has similar characteristics.

  • More and more content globally is being created using GenAI tools, and audiovisual archives could be used as training data, should their rights and permissions allow it.

  • However, accuracy, authenticity, and “truth” are a growing matter of concern.

  • GenAI content also raises new legislative and copyright/ownership questions.

  • Audiovisual archives are challenged to implement policies to address these points, but the archive does not work in isolation from its wider organisation and must collaborate with stakeholders to design its policies.

Who wants archive content?

Types of AI Tech Companies:

  • ‘Informed’ aggregators: Companies knowledgeable about the space with industry contacts who connect content libraries and negotiate deals.

  • ‘Digital Locker’ businesses: Organisations collecting content to become intermediaries between content owners and AI Tech Cos / LLMs, handling content delivery for a percentage fee.

  • ‘Jack of all trades’ aggregators: Companies seeking to acquire multiple rights types (FAST, VOD, AI) with limited transparency, typically offering 50/50 revenue sharing arrangements.

  • Bespoke language model builders: A fourth emerging model where businesses build custom language models based on specific content and license these models to various clients, creating subscription-based revenue streams.

 

 

10 Headline Considerations for AV Archives Considering Allowing Content Use in AI Training

  • Intellectual Property Rights: Ensure clear agreements on ownership and usage rights of AI-trained models

  • Content Licensing: Develop specific licensing terms for AI training purposes

  • Data Privacy: Implement measures to protect sensitive information in the content

  • Quality Control: Maintain oversight on how the content is used and represented in AI outputs

  • Ethical Use: Set guidelines to prevent misuse or creation of harmful content

  • Brand Protection: Ensure AI usage doesn’t negatively impact the archive’s reputation, or ‘brand’

  • Legal Compliance: Stay updated on evolving laws regarding AI and copyright, which may differ across jurisdictions/territories

  • Content Valuation: Assess the long-term value of your content in the context of AI training

  • Technology Partnership: Carry out thorough due diligence. How credible are they? Are they in this for the long term? Do they have a good track record of working with others? Do they have robust security and ethical standards?

  • Revenue Sharing: Consider models for profit sharing from AI-generated content, and work with AI Tech Cos that will ensure you genuinely benefit from a financial return.

 

Why Is This Important?

Before moving forward, consider, philosophically, why you are doing this. In what way does this promote the archive’s values?

Archives should carefully weigh these factors to protect their interests while potentially benefiting from the emerging AI landscape. The use of audiovisual and audio archives for AI algorithm training is a rapidly evolving field with significant promise for knowledge discovery, preservation, and access. However, it is equally fraught with legal, ethical, technical, and cultural complexities. Success depends on a commitment to robust governance, transparency, documentation, and human oversight, as well as ongoing dialogue among archivists, technologists, legal experts, and affected communities. Only through such a multidisciplinary and principled approach can the transformative potential of AI be harnessed while safeguarding the integrity, rights, and diversity embodied in audiovisual heritage.


Let’s take a closer look at five key areas for further consideration before choosing to work with AI Tech Companies:

  1. Strategy, Reputation and Risks

  2. Rights and Permissions Pertaining to Your Data/Media

  3. Rewards

  4. Resource Allocation and Potential Cost Impacts

  5. Technology Considerations

 

1. Strategy, Reputation and Risks

  • Why are you doing this? What are the benefits of allowing access to your content?

  • Are there risks associated with allowing access to your content? Does it devalue your content in any way? Are there consequences for your business model?

  • Have you vetted the AI Tech Co and compared them with others?

  • Consider what is valuable to your business versus what is valuable to the AI Tech Co.

  • To what extent can you define the deal terms on offer, or is the AI Tech Co fixed in their offer without space for negotiation?

  • Are you granting access to all or some of your archive? Can your content be segmented to meet an AI Tech Co’s specific needs and interests?

  • If there is a breach/technical problem with an AI company, directly or indirectly impacting your media, how might this impact your reputation and relationships with your underlying rights holders? If this problem puts the AI Tech Co out of business, how will you get your media back?

  • Some AI solutions can generate harmful content depicting the people in the media shared with the AI Tech Co, for example vishing scams. Given the widespread abuse of audio, images, and video to create distorted media, this possibility must be weighed. Have you considered it?

  • Given the level of risk in dealing with an AI Tech Co, choose your partner carefully. If you go with an unknown, small, new business, there is no way to guarantee you will get your data back, or that it will remain secured, if that company is gone tomorrow; this is another way your data/media can end up in the wrong hands. If the AI Tech Co goes out of business suddenly, your contract/agreement with them is no longer enforceable. Consider looking only at larger, reputable organizations.

 

2. Rights and Permissions Pertaining to Your Data/Media

  • How is your data/media being used by the AI Tech Co?

  • What is the right that you are granting to them – how is it defined – and can you grant that right?

  • Does your content have ethical, compliance or data privacy considerations?

  • Does your content have talent (actors, writers, presenters, contributors) attached and how is this handled in any deals?

  • Does your content have audio/music attached and how is this handled in any deals?

  • Who owns the resulting AI outputs if they are based on your content and data? If your data is being used to train the AI engine, make clear to your internal groups that the trained model and its outputs might not be owned by your organization.

  • At the end of the agreement, what happens to the content/data that you have supplied? What protocols are required for data retrieval or deletion should the partnership end?

  • What controls does the AI Tech Co have in place to ensure that they adhere to the grant of rights and that your content/data isn’t misused?

  • What is the established procedure for addressing compliance concerns with content after machine learning implementation? Are there any penalties associated with non-compliance?

  • Any engagement with an AI Tech Co that puts existing licensing, agreements, or copyrights at risk of misuse can potentially expose your organization legally. However, if your data is already publicly available on the open internet, it will be very difficult to track who misused it.

 

3. Rewards

  • What is the AI Tech Co’s compensation model – will you be paid by volume of content, type of content, etc.?

  • How long will you have to wait in your agreement to see a payment for sharing your content?

  • If your content is being used to train the AI, consider the potential for a low profit margin, or no profit at all, since the output may not be owned by your organization.

  • Will you receive payment from the AI Tech Co purely for use of your data? Or will you have to wait until they make a profit? How transparent are they willing to be with this?

  • Does certain content (e.g. by genre) attract a premium?

  • If you have data to accompany your content, does this attract a premium? (data could be: descriptive, technical, rights, offline e.g. papers)

  • If you have “offline content” (rushes, production data, etc) that can only be accessed by going through you, does this attract a premium?

  • Are the upsides purely financial, or can you get anything more from the technology relationship, e.g. support (including financial support) to unlock the required content, data enrichment, or catalogue enhancements?

  • If your content is being used to develop AI tools, would your organisation benefit from use of the improved tools and can your org have access to those tools – and include this in the deal terms?

  • Consider whether you can obtain enhanced data derived from content you provided to the deal for your own use. Establish clear terms for accessing, using, and redistributing these enhanced datasets, including specific exclusivity rights and usage limitations.

4. Resource Allocation and Potential Cost Impacts

  • At the earliest possible stage (before entering into a deal), you will need to assess if you have what an AI Tech Co is potentially looking for (volumes, subject matter, tech spec, etc.) – who will do this work?

  • Will you have resources available to work alongside the AI Tech Co to ensure the correct data is being shared and that operational processes are consistent with the original agreement?

  • Have the staff who will work alongside the AI Tech Co provided an internal assessment of the likely impact of this work?

  • What effort is needed to make an actual deal work? Who are your internal (or external) stakeholders, are they all involved and do they have the time to input to the decision making process?

  • Will you incur any costs to enter into a deal e.g. legal, rights, technical or operational resources? This can include legal oversight, technical assistance in confirming your organization is prepared to interact with the AI Tech Co, and potential technical augmentation to prepare and secure your internal technical systems.

  • Who in your organization would own and approve a deal with an AI Tech Co?

  • Do you need to create or prepare assets and data in order to deliver to the AI Tech Co and, if there are costs to make your content ready, will the AI Tech Co contribute towards helping to unlock the content and make it ready? (see further Technology Considerations below)

  • If you are considering sharing your data, can you watermark the data for that particular sharing? This keeps your data traceable: if data is leaked, you can trace it to a particular vendor. Consider this extra cost added insurance for your data. (However, if some of your data is already publicly available, it will be difficult to trace that data back to the contracted AI Tech Co.)

  • Are you incurring one-time costs, or will you have costs and resource allocation ongoing throughout the life of the deal?
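The traceability point above can be approximated at low cost with a delivery manifest: record a cryptographic hash of every file shared with each vendor, so a surfaced copy can be matched back to a specific delivery. This is an illustrative sketch (the function and field names are hypothetical), not a substitute for true per-vendor invisible watermarking:

```python
import hashlib
from pathlib import Path

def fingerprint_delivery(files, vendor_id):
    """Record a SHA-256 digest for every file shared with a vendor.

    The manifest proves exactly which bytes went to which AI Tech Co.
    Note: identical files sent to two vendors hash identically, so this
    traces deliveries, not vendors; per-vendor watermarks embedded in
    the media itself are the stronger (and costlier) option.
    """
    return {
        "vendor": vendor_id,
        "files": {str(f): hashlib.sha256(Path(f).read_bytes()).hexdigest()
                  for f in files},
    }
```

A manifest like this costs almost nothing to produce at delivery time and can be stored alongside the agreement itself as evidence of what was shared.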

 

5. Technology Considerations

The detailed technical questions included here are intended to support a serious look inward at your organization’s technical capabilities, readiness, and intentions, so that you make careful decisions when planning to work with an AI Tech Co. Before you move a conversation forward with outside AI vendors, seriously consider why this is something you want to pursue. This is not something to rush into: there is no guaranteed financial return, and there is a much higher risk of financial loss, reputational damage, and technical problems if the wrong decisions are made. Many institutions would like to entrust all of the technical work (connectivity, moving the data assets, etc.) to the AI Tech Co and leave the heavy lifting to them, but there is even greater risk in allowing those decisions to happen outside your organization. Do you really want to be first in line for AI capabilities, even if the full business benefits are not yet defined? In a rush to get something new and potentially valuable (or perceived as valuable) for your organization, failing to vet a company thoroughly, or to ask yourselves the right questions, could take your organization into a very negative situation.

 

Internal Technical Questions

Things to ask yourself prior to dealing with an AI Tech Co (or any technical company)


  • How up to date is your technology? Are you aware if the technology that houses your data/media is connected to any other sensitive organisational data like client information or employee data? Understand your technical capabilities. Your company Roadmap should include internal technical preparation prior to engaging your media with an AI Tech Co.

  • Do you manage your own technology around your data/media?  If not, and the technology that houses your media is managed by an outside party, have you shared your intent to interact with an AI Tech Co with that outside Media Management party?

  • What questions have they (the outside company that manages your data/media) provided to you about this?  Do they have restrictions? Have they suggested specific technology design specifications that the AI Tech Co would have to adhere to?

  • Will your outside technology partner that manages your media today have any restrictions or requirements that will incur costs for potential changes? Will they require a new Managed Services Agreement?

  • If you’ve identified the data/media you want to use with the AI Tech Co, is that data/media secure and separate from your other technology or at least isolated from data/media you do not intend to use?

  • Can you include your technical staff, or someone who manages your data/media, who can speak to the AI Tech Co to explain your systems and your technical requirements?

  • If you are not aware of your technology limitations, or whether your media is properly segmented, you will need to identify people (internal staff or external consultants) to assess this. What is the timeline for this review? Based on this review, what is the timeline and cost for preparation with a potential AI Tech Co?

  • Have you researched AI Tech Co’s to interact with? Have any AI Tech Co’s approached you? Have you reached out to any other organisations that may have worked with those AI Tech Co’s? What was their experience? [This also plays into the AI Tech Co’s credibility]

  • What function is the AI Tech Co providing to your organisation? Do you really require AI to make these changes for you? If it’s not about AI training, and you are instead looking at services such as up-resing, resolution refinement, or media restoration, do you already have an existing application that might better suit your organisation until you can be assured that your technical software and hardware are secure and ready to connect with the AI Tech Co for services?

  • Have you priced similar tools that you might be able to purchase and bring into your organisation (and get training on) instead of risking your data being shared with an outside AI Tech Co? (For example, OCI Media Flow by Oracle.)

  • Have you talked to Technical Security specialists to discuss best practices to help you prepare your data and systems for dealing with an outside AI Tech Co?

  • Have you researched the potential risks of allowing the AI Tech Co to train their AI models on your data? For example: allowing an AI Tech Co to use your media for training runs the risk of the AI application creating derivative works that look very similar to the media you provided, which could lead to larger problems with licensing and even potential devaluation of the original media. (It can also lead to harm to your media; see the Risks section above.)

 

 

Questions to ask the AI Tech Co

Something to keep in mind when shopping around for AI Tech Cos is that everything is a sale for them: you are dealing with salespeople first. They will say ‘yes’ to anything to make you comfortable about choosing them and allowing them access to your data. Although they might not be technically capable of some of the changes you inquire about, they will tell you whatever is needed to make you comfortable about doing business with them. Ask them pointed questions; if you are not technical, have your technical staff or a consultant on hand to ask direct questions. Do your homework on the company. Anything you want to enforce about your concerns and the use of your data, you should be able to get in writing in an agreement with that AI Tech Co. Something else to think about: some of the largest data breaches and exposures of internal proprietary data have happened because of incorrect configurations of an AI tool/application (see McDonald’s/Paradox; ChatGPT Conversation History Leak; T-Mobile API Misconfiguration; etc.).

One additional note: If you are considering working with an AI Tech Co (or any technical company) on your archives for services such as up-resing, rather than for training the company’s AI engine, many of these questions will help you plan your goals as well.

 

Capability

  • Can they assure you that their application is not using copyrighted material (where applicable)? Or is such material at least annotated? How do they do this?

  • Have they done PoCs (proofs of concept) with other organisations? Are they willing to share any of the inputs, outputs, and results from those PoCs?

  • Are they able to do a demonstration of their product for you without using your data/media? [This also plays into credibility]

  • If another company manages your data/media for you, share those restrictions with the AI Tech Co. Do they have any issues with these requirements?

  • How exactly will the AI Tech Co’s product work?  Will they connect to your internal systems?

  • How exactly do they propose grabbing your approved data/media and importing it into their AI application? In other words, how will they physically connect their systems to yours to obtain your data?

  • Can they provide a visual design showing a proposal of their systems connecting to your system and explain what will be used?

  • How will they process your data?  Can they share their design for this or is the actual AI processing a black box?

  • Will the AI Tech Co allow you to review and approve the system design they propose on how they plan to connect to your systems (in order to grab the data/media) before implementing it?

  • Can the AI Tech Co ensure that they provide a secure enclave for your data that cannot be co-mingled or shared with the AI Tech Co’s other clients’ data?

 

Performance/Accuracy

  • How do they (the AI Tech Co) measure and evaluate their application’s performance?  How do they monitor this?

  • For Technical Services like Up-Resing:

    How do they handle hallucinations? What does their mitigation plan look like for corrections? What is the likely time frame for those corrections?

    • For example: super-refined resolution can add objects that weren’t there before or create unnatural/incorrect movements, as well as phantom voices, incorrect words, or changed speaker identities.

  • For Technical Services other than AI Training: How do they separate their specific services like image repair or Up-resing from the AI Training?

    • Does the AI Tech Co also provide data repair separate from the AI Training, and can they explain how they keep the data being repaired separate from training data?

  • Can the AI Tech Co provide performance metrics or error rates?

    • For example: For audio restoration or refinement – do they have a word error rate?

  • How do they (the AI Tech Co) handle scalability?

  • Does the AI Tech Co allow you to review and approve the final output of your data/media if you do use their AI services?
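The word error rate mentioned above has a standard definition: the word-level edit distance (substitutions plus insertions plus deletions) between a reference transcript and the vendor’s output, divided by the number of reference words. A minimal sketch for spot-checking a vendor’s claims on your own material:

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that a WER above 1.0 is possible when the hypothesis is much longer than the reference, so ask the vendor which reference transcripts and normalisation rules their quoted figures assume.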

 

Privacy/Risk/Security – and More About Credibility

  • Request a Security Assessment/risk report from the AI Tech Co.

    • Why this matters: if the AI Tech Co is directly connecting to your organisation’s infrastructure/systems to pull/push data/media for this AI process, knowing whether they have had an assessment, or whether there is a risk report on their systems, can help with decisions on moving forward. This is good to understand if they are going to integrate with your systems to intake your data/media, and it also helps you gauge the risk of doing business with the AI Tech Co. Risk = Likelihood x Impact.

  • Has the AI Tech Co indemnified themselves from hallucinations?

    • This question is handy if you are using the AI Tech Co for professional services like “Up-resing”.

  • Does the AI solution/design from the AI Tech Co rely on Third party vendors or other outside dependencies? If so, how are these dependencies secured? Is it possible for them to do a Software Composition Analysis check?

    • A Software Composition Analysis (SCA) check is an automated process carried out by a trusted outside party (a cyber security firm). It checks the system’s components for vulnerabilities and produces a report outlining them, along with a Software Bill of Materials (SBOM) inventorying those components.

  • What best practices does the AI Tech Co follow to monitor and secure the AI supply chain, mitigating risks from third-party vendor components?

  • Is there a fence between your organisation’s assets (data/media) and other clients’ assets being stored and processed by the AI Tech Co within their systems?

  • How does the AI Tech Co restrict the use of your data/media exclusively to purposes you’ve authorised?

  • How does the AI Tech Co ensure that your organisation retains full ownership of all data/media put into their AI system(s)?

  • What assurances can the AI Tech Co provide that your data/media will not be used for training, marketing, or analytics without explicit consent?

    • This question is applicable in various forms regardless of how you engage with the AI Tech Co.  Maybe you will allow them to train their AI Engine but you don’t want your data to be used in demos/advertising.

  • Is the AI Tech Co’s development team off-shore? How will they ensure that all data/media (including inputs, outputs, and, where applicable, training data) remains within the geographical boundaries specified by your organisation?

    • For example: the AI Tech Co may be located in your country and adhere to your data/media privacy laws, but their AI development teams might not be within the same geographical boundaries. If your organisation is located in the United Kingdom and the AI Tech Co is as well, but their AI development teams are located in Belarus, India, or another country not bound by your local privacy laws, how could you enforce the security of your data? Ask the AI Tech Co not to allow offshore development or storage of your original data/media.

  • Is their AI Engine/process within a secured closed loop and not touching the open internet?

    • This is not about user access (user access is covered in the bullets above). This is about the AI Tech Co’s actual AI engine/application and its overall design. If their design is exposed to the open internet, your data/media cannot be secured, regardless of their assurances, and is very much at risk.

  • If the AI Tech Co is directly integrating their systems with your organisation’s systems, ask whether they would be willing to conduct an AI-specific pentest (“Penetration Test”) on the systems once integration is complete.

    • A Penetration Test (“pentest”) is an authorized simulated attack conducted by security personnel to check for vulnerabilities in a network, application, or system. It helps identify areas of concern so they can be corrected to guard against bad actors.

    • Pentesting can be an added expense, but it is necessary to ensure systems and their configurations are as secure as possible. Also determine who will be responsible for the pentesting costs, since it is not uncommon for companies to use outside consultants for this.

  • Has the AI Tech Co had a Pentest done on their AI model before?

Potential Cost/Pricing Model

If you are shopping for AI technical services (restoration, up-resing, etc.) rather than being approached for AI training, the questions below offer a very high-level guide to potential cost models, in addition to the hidden costs listed above.

  • Based on your organisation’s requirements, what does their cost/pricing model look like? Do they have tiered pricing?

  • If you allow them to train their AI Engine with your data, do you get reduced pricing for the services you are purchasing?

  • What is their cost based on? Is it per image or per track? Per transaction? Per minute or per compute hour?

  • What is their timeline for delivery of your requested service? What sort of discounts do you get if the product isn’t delivered on time or doesn’t meet your specifications (and needs corrections)?

  • When checking the AI Tech Co’s references, if those clients used the company for services other than AI training, did they feel comfortable with the costs of the service? Were there any surprise or hidden costs?

 

 

 

 

 


This resource is part of a toolkit created by the Trust in Archives Initiative. © 2026. Licensed under the Creative Commons Attribution–NonCommercial–ShareAlike 4.0 (CC BY-NC-SA 4.0) license.

 

Version 1.0 – April 2026. 

Content Licensing: Prohibited Use

 

CONTENT LICENSING LANGUAGE TEMPLATE:
PROHIBITING USE

The following content licensing language template is intended as an addendum for Archives that license archival material, especially for publication purposes. This license is the result of a collaborative effort by members of the Trust in Archives Initiative (TAI).

Consideration was given to the fact that many standard post-production techniques (including but not limited to color correction, image stabilization, dirt and scratch reduction, etc.) are now being performed by assistive Artificial Intelligence (AI) software. In many cases, Licensees may be unable to easily know whether software used is “AI powered” or not.

The focus of this template is to ensure the preservation, integrity, and authenticity of the archival content by preventing the use of AI software on it. For those Archives that may allow for limited use of AI on their materials, please see the Limited Use template here; TAI has provided both licenses as we recognize that Archives may license materials to projects with dramatically different publishing processes (such as a documentary or a museum catalogue).

Please note: The information contained on this site is provided for informational purposes only, and should not be construed as legal advice. Please have an attorney review before adding to your existing content licenses.

Addendum for Licenses Prohibiting AI Use

Licensor [name of institution] (referred to as “Licensor” below), maintains an archival collection (referred to as “Content” below), which Licensor makes available to licensees such as [name of client] (referred to as “Licensee” below) for use in its program (referred to as “Program” below). The Licensor wishes to make clear that, as a steward and conservator of archival material, maintaining the integrity and authenticity of the Content is the Licensor’s highest priority, and that the Licensee has a duty with respect to the handling of Content. As such, the use of generative tools (e.g., Artificial Intelligence “AI”, or “smart” software) including but not limited to generative software, Large Language Models (LLMs), Small Language Models (SLMs), or Vision Language Models (VLMs) – referred to categorically as “AI software” below – is prohibited on the Content being licensed by Licensee.

Furthermore, the Licensor does not permit licensed Content, or any materials from its Content library, or associated metadata (including, without limitation, caption information, description, summary, or keywords) to be used for machine learning, data mining, or for training AI software, large or small language models, or other AI software tools. Additionally, no permissions are granted to Licensee by Licensor, or to any third party or assignee, to train AI software on the Content as it appears in the Licensee’s Program.

If there is an intent to use AI software on licensed Content, the Licensee must receive prior approval from the Licensor. In the event that the Licensor believes that Licensee has breached this agreement, the Licensor shall provide written notice outlining the alleged breach and the reasonable steps that must be taken to remedy it (within [XX] days).

 

 

 

 

 

 

 



Content Licensing: Limited Use

 

CONTENT LICENSING LANGUAGE TEMPLATE:
LIMITED USE

The following content licensing language template is intended as an addendum for Archives that license archival material, especially for production purposes. This license is the result of a collaborative effort by members of the Trust in Archives Initiative (TAI).

The focus of this template is to ensure the preservation, integrity, and authenticity of the archival content. In cases where Artificial Intelligence (AI) manipulation is used, the template helps ensure that these changes are flagged, and a review is conducted to ensure compliance with standards established by the Archive.

Consideration was given to the fact that many standard post-production techniques (including but not limited to color correction, image stabilization, dirt and scratch reduction, etc.) are now being performed by assistive AI software. In many cases, Licensees may be unable to easily report whether the software used in post-production is “AI powered” or not.

For those Archives that may wish to outright prohibit any use of AI on their materials, please see the Prohibited Use template here; TAI has provided both licenses as we recognize that Archives may license materials to projects with dramatically different publishing processes (such as a documentary or a museum catalogue).

Please note: The information contained on this site is provided for informational purposes only, and should not be taken as legal advice. Please have an attorney review before adding to your existing content licenses.

 

Addendum for Licenses allowing for some AI Use

Licensor [name of institution] (referred to as “Licensor” below), maintains an archival collection (referred to as “Content” below), which Licensor makes available to licensees such as [name of client] (referred to as “Licensee” below) for use in its program (referred to as “Program” below). The Licensor wishes to make clear that, as a steward and conservator of archival material, maintaining the integrity and authenticity of the Content is the Licensor’s highest priority, and that the Licensee has a duty with respect to the handling of Content. This is especially important where the use of generative tools (e.g., Artificial Intelligence “AI”, or “smart” software) including but not limited to generative software, Large Language Models (LLMs), Small Language Models (SLMs), or Vision Language Models (VLMs) – referred to categorically as “AI software” below– is concerned.

The Licensor restricts the use of AI software on Content being licensed by Licensee. Any enhancement of Licensor’s visual Content beyond standard post-production (color correction, scratch and dust reduction, subtitling, etc.) by the Licensee must be disclosed and authorized; any enhancement of Licensor’s audio content beyond standard post-production (equalization, noise reduction, etc.) by the Licensee must be disclosed and authorized. Any AI software assisted enhancement that could affect the integrity of Licensor’s content (for example, adding detail or sharpening based on generative interpretation of visual artifacts; the addition of new, generative content to existing content; the addition of generative frames or handles to content; or the rendering of new speech or other audio based on existing material) is not permitted without written authorization by Licensor. Any enhancement to content in post-production by the Licensee that meets the above parameters must be disclosed, reviewed, and approved in writing by Licensor prior to publishing.

Furthermore, the Licensor does not permit licensed Content, or any materials from its Content library, or associated metadata (including, without limitation, caption information, description, summary, or keywords) to be used for machine learning, data mining, or for training AI software, large or small language models, or other AI software tools. Additionally, no permissions are granted to Licensee by Licensor, or to any third party or assignee, to train AI software on the Content as it appears in the Licensee’s Program.

If AI software is used to alter or enhance Content through mutual agreement by Licensor and Licensee, the Licensee will disclose the use in the credits of the Program consistent with industry standards. (For examples, please refer to the Archival Producers’ Alliance crediting guidance).

 

SCHEDULE A

Licensor will review Licensee’s use of AI Tools on Content within [XX] business days of request. Assets needed for the approval process include:

  • Before and after representations (such as screenshots, .movs, .mp3s) for ease of comparison

  • Name of software used

  • Date of AI manipulation, if known

  • Prompts used, if applicable

 

 

 

 

 

 



Tools: Taxonomies

 

TAXONOMIES

Taxonomies Comparison Grid

The goal of the Trust in Archives Initiative (TAI) Taxonomies subgroup was to survey existing taxonomies and taxonomic guidance for describing AI-generated or AI-edited media (AI media). After the survey, the subgroup compiled a list of recommended taxonomic fields for cataloguing AI media. The IPTC Photo Metadata Working Group’s recommendations, first released for public comment on Aug 1, 2025, were a major jumping-off point. The goal was not to create a technical manual or a format-specific guide but a general guide that reflects emerging standards for describing AI media.

Surveyed taxonomies and taxonomic guidance

Members of the subcommittee sought examples of metadata for AI media by reaching out to their professional networks, including open requests for input from the AMIA email list, and multiple metadata communities on Slack. The standards that came to light span multiple formats and vary in technical specificity.

  • IPTC Photo Metadata Working Group
  • Program for Cooperative Cataloging (PCC) Standing Committee on Standards
  • AI4LAM Speech-to-Text Working Group (Transcript Provenance Metadata Elements)
  • Coalition for Content Provenance and Authenticity (C2PA)
  • Various open linked data ontologies

 

Synthesis of surveyed taxonomies

Members of the subcommittee considered the following questions when synthesizing the surveyed taxonomies:

  • What metadata is unique to AI media? What information should be captured for this media that is not necessary for other media?

    • This encompasses technical, provenance, and ethical concerns.

  • What information is necessary to determine the veracity of AI media?

    • Authentication remains an evolving issue. IPTC and C2PA overlap in important ways, but they are not interchangeable, and adoption varies across vendors, platforms, and workflows. As a result, no single current standard guarantees complete provenance in every case. Cataloguers should therefore treat provenance metadata as partial evidence that must be interpreted in context.

  • Who is doing this cataloguing and when is it happening?

    • The members of TAI come from diverse fields, including non-profit libraries and archives, media production, and stock footage libraries. The person cataloguing the media might be its creator or a downstream repository. The media might have known or unknown provenance. A cataloguer receiving the media may not be able to complete some of the fields recommended by TAI or may need to request specific metadata from the source of the media.

  • Where does the metadata come from?

    • In some cases, the metadata recommended in this guide will not be available even when provenance is known. Much of the metadata described here must be generated at the point of creation and cannot be entirely reproduced retrospectively. Documentation practices vary widely: some model developers publish model cards, system cards, dataset documentation, or provenance manifests, while others provide only partial disclosures or none at all. Cataloguers should therefore expect uneven metadata availability and record both what is known and what could not be verified.

    • Institutions that need to catalogue AI media will need to become advocates for metadata access and transparency. In some cases, there will be no leverage to request transparency from AI companies; in others, the brokering of archival media from institutions to AI companies that need large data sets may be a point of leverage. In this developing field, libraries, archives, media producers, and others can advocate for access to the metadata they need to document AI media.

    • The format of the metadata will vary depending on whether it was recorded at the point of creation or retroactively described. It can be useful in either case, but may lack uniformity.

    • Even when provenance is partly known, AI-related metadata may be incomplete, inconsistently recorded, or unavailable to downstream repositories. Some workflows preserve detailed technical records, while others leave only partial traces. Cataloguers may therefore need to distinguish between metadata captured at the point of creation and metadata reconstructed later from surrounding documentation.

Sources of AI Metadata

  • A model card is a document that accompanies an AI model and describes what the model is, how it was trained or evaluated, its intended uses, known limitations, and relevant ethical or bias considerations. When available, model cards can provide a useful starting point for archival description because they may identify the model, summarize training or evaluation context, and document important constraints on interpretation and reuse. For a widely used introduction, see Mitchell et al., “Model Cards for Model Reporting”: https://arxiv.org/abs/1810.03993

    Model cards are not consistently available, however, and their level of detail varies widely. Some open models are accompanied by substantial documentation, while many commercial systems provide only partial disclosures or no model card at all. For this reason, cataloguers should treat model cards as one possible source of evidence rather than as a complete or uniform record.

  • Some relevant metadata may be embedded in the media itself or preserved in sidecar data, such as IPTC/XMP fields, C2PA manifests, timestamps, or exported workflow files. Other important information, including prompt text, prompt authorship, training-data disclosures, or internal processing notes, often must be requested directly from the creator, production team, or platform.

  • When exact technical metadata is unavailable, cataloguers may need to combine creator-supplied free-text descriptions with controlled vocabulary and local notes. In practice, requests should focus on the records most useful for establishing provenance: model name and version, prompt text, prompt writer, reference media, workflow or processing steps, date generated, and any available model card or provenance manifest. These details may be requested through deposit forms, acquisition agreements, creator questionnaires, or direct follow-up with the depositor or vendor.
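The request practice described above can be sketched as a simple checklist. This is a minimal illustration: the field keys mirror the TAI taxonomy fields in this guide, but the flat dict layout and the `missing_provenance` helper are hypothetical conveniences, not a prescribed TAI schema.

```python
# Sketch of a provenance-request checklist for depositors.
# Keys follow the TAI taxonomy fields; layout is illustrative only.

REQUESTED_FIELDS = [
    "ai_model",          # model name and version
    "ai_text_prompt",    # prompt text
    "ai_prompt_writer",  # who wrote the prompt
    "reference_media",   # input/base media, if any
    "ai_workflow",       # processing steps or exported workflow file
    "date_generated",    # date generated or materially modified
]

def missing_provenance(record: dict) -> list[str]:
    """Return the requested fields a depositor's submission did not supply."""
    return [f for f in REQUESTED_FIELDS if not record.get(f)]

# Example: a depositor supplied only the model and date.
submission = {"ai_model": "CogVideoX-2B", "date_generated": "2025-01-12"}
gaps = missing_provenance(submission)
```

The list of gaps could then drive a follow-up questionnaire to the depositor or vendor, with unanswered fields recorded as unknown rather than left blank.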

TAI AI Taxonomy

AI Model

Definition: The foundational model name and version used to generate this media.
Example: “Wan2.2-T2V-A14B-Diffusers Text-to-Video”; “CogVideoX-2B”
Source: Model cards, system documentation, creator-supplied workflow files, platform job history, API logs, provenance manifests, or direct confirmation from the creator or vendor.

  • Hugging Face Model Cards documentation: https://huggingface.co/docs/hub/en/model-cards
  • Mitchell et al., “Model Cards for Model Reporting”: https://arxiv.org/abs/1810.03993

Training Data

Definition: The collection of data used to train an AI/ML model, shaping its capabilities and biases.
Example: A dataset of newsreels containing war footage.
Source: Model cards, technical documentation, research papers, vendor disclosures, or creator-supplied notes. In many cases, training-data information will be partial, generalized, or unavailable and should be recorded with appropriate caution.

AI Text Prompt Description

Definition: The text instructions or other human-authored inputs provided to an AI system in order to generate, transform, or modify the media.
Example: “Enhance film grain, preserve original texture, upscale to 4K.”
Source: Creator-provided prompt logs, workflow or project files, API request history, platform job exports, or written documentation supplied by the depositor. If the prompt is not embedded in the asset or a provenance manifest, it usually must be requested directly.

AI Prompt Writer Name

Definition: Name of the person who wrote the prompt used for generating this media.
Example: “Kevina Tidwell”; “Unknown prompt writer”
Note: May not always be known or necessary to capture.
Source: Creator-supplied documentation, deposit forms, production records, or direct confirmation from the depositor, when available.

Reference Media

Definition: Media supplied to an AI system as an input, source, or base from which new or modified media was generated.
Example: “Getty Images 2218833057”
Source: The source media itself, asset management records, project files, edit decision lists, workflow inputs, or creator-supplied documentation identifying the base or input media.

AI Workflow

Definition: The documented sequence of AI-driven steps, parameters, and tools applied to archival material. It describes how content was processed, from input to output.
Example: An upscaling workflow using ComfyUI that can be replicated from the final output, with the workflow embedded in a JSON file.
  • See https://dripart.mintlify.app/tutorials/basic/upscale.
  • Preserving Intent in Nonfiction Media: A Responsible Approach to AI Enhancement | Topaz Labs

Source: Exported workflow files (for example, ComfyUI JSON), project files, scripts, API logs, processing notes, standard operating procedures, or creator/vendor documentation describing the steps and settings used.

Date Generated

Definition: The date on which the media was generated, materially modified, or output by an AI-assisted workflow.
Example: 2025-01-12
Source: Embedded file metadata, provenance manifests, platform job history, project files, or creator-supplied production records. If only an approximate date is known, that uncertainty should be recorded in a note.
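To show how the fields above might come together in practice, here is a hypothetical catalogue record that records unknowns explicitly rather than leaving them blank, per the guidance earlier in this section. The flat dict layout, key names, and the "unknown" convention are illustrative assumptions, not a TAI requirement; the example values echo those given for the individual fields.

```python
# A hypothetical catalogue record using the TAI AI Taxonomy fields above.
# Layout and "unknown" convention are illustrative; adapt to your own schema.

record = {
    "ai_model": "CogVideoX-2B",
    "training_data": "unknown (no model card or disclosure located)",
    "ai_text_prompt": "Enhance film grain, preserve original texture, upscale to 4K.",
    "ai_prompt_writer": "Unknown prompt writer",
    "reference_media": "Getty Images 2218833057",
    "ai_workflow": "ComfyUI upscaling workflow (exported JSON retained)",
    "date_generated": "2025-01-12",
    "notes": "Date taken from embedded file metadata; prompt supplied by depositor.",
}

# Flag fields recorded as unknown so the record documents both what is
# known and what could not be verified.
unverified = [k for k, v in record.items() if v.lower().startswith("unknown")]
```

Distinguishing "unknown" from an empty field preserves the fact that the cataloguer looked for the information and could not verify it.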

Next steps

Further work should be done to guide cataloguers on what metadata should be recorded, where to get it, and how to ask for it from AI media creators and AI companies, whether for individual media objects or for collections acquired in bulk.

 

 

 

 



Tools: Authenticity – Due Diligence Questions

 

DUE DILIGENCE QUESTIONS TO ESTABLISH ARCHIVAL AUTHENTICITY

As AI-generated media becomes increasingly commonplace, it is more important than ever for archives to know and be assured of the provenance and authenticity of the records they take into their custody.* There are technological methods being developed to detect AI-generation or modification and to convey or disclose provenance information. However, these mechanisms may remain out of reach for many content producers and smaller archives (at least in the short term) due to financial and personnel limitations. There is still a need for accessible approaches for examining provenance and verifying authenticity in the age of generative AI that do not require specialized tools. To this end, the Authenticity Working Group is developing a simple checklist of questions that archives can ask depositors as part of their due diligence in ascertaining the reliability of the records they accession. Archives can retain the answers to these questions for their records and develop their own systems to share this data with other institutions they work with.

Archival producers and media makers are similarly concerned about the authenticity of the media they obtain from archives for use in documentaries. The second section contains questions that producers can use when working with archives, or that archives can answer for themselves to attest to the authenticity of materials already in their collections.

*Provenance refers to the origins, custody, and ownership history of a record. A record’s provenance is key to its intelligibility and significance as documentary evidence. Authenticity, a closely related concept, refers to the record being genuine and free from tampering. A record’s authenticity enables it to be relied upon as evidence or proof of what it documents.

 

Questions for Archives to Ask Donors to Help Establish Authenticity of Donated Material(s)

  1. Are you the creator? If so, when were the materials created, and have the materials consistently been in your custody since they were created?

  2. If you are not the creator, can you document the provenance of the material (the method of transfer; owner name and life dates; location of ownership; and date of transfer)?

  3. What was the format (analog or digital) in which the material was created? If born digital, please be specific about hardware/software used.

  4. Do other copies, or versions, of this material exist elsewhere?

  5. Have you altered the original material in any way? If so, when, how and why?

  6. Can you provide any additional descriptive or technical information?

  7. To what degree of certainty can you attest to these assertions? (e.g. documentation attesting authenticity)

  8. Were any of these materials created using generative AI? If so, which ones? Are they labeled? Can you describe what was done, what app/device and AI tool/model was used, what source materials were involved (if any), and what prompts were used? How was the generation documented?

 

Creating attestations for archival materials already within your collections

  1. Whose custody has this been in? Has this been stored in-house or out-of-house?

  2. Is the original a physical object (e.g. film, tape) in the possession of the archive?

  3. What year was the material acquired?

  4. How has the material been accessed/altered within the archive for preservation purposes (e.g. digitization, processing, etc.) and with what hardware and software? Has this been documented?

  5. Is there an agreement with the donor/creator of materials showing proof of ownership and acquisition (physical and/or digital copies)?

  6. If the material is a derivative of the original, how was it created? Using what tools?

  7. If digital collections are stored using a cloud service, what service is used? Have there been any security breaches? Can the archive ascertain that the collection was not affected?

  8. What digital integrity and preservation safeguards have been applied to the collection, and for how long?

  9. Is there a plan to retain these and future attestations that is durable and extensible? How can the archive communicate this attestation to users of its collections?

 

 

 

 

 

 



AI Toolkit for Archives

Generative artificial intelligence is rapidly reshaping how audiovisual materials are created, reused, and interpreted. For archives, libraries, and cultural heritage organizations, these changes raise urgent questions about authenticity, rights, access, and the responsible use of archival materials.

This toolkit, developed by the Working Groups of the Trust in Archives Initiative, is designed to help archives navigate this evolving landscape. The toolkit provides practical guidance on issues including authentication and provenance, licensing considerations, working with technology companies, and the development of shared taxonomies to describe AI-generated or AI-altered materials.

Because both AI technologies and archival practices continue to evolve, these tools are intended as a living resource. They will be updated and expanded over time, and feedback from the community is welcomed to help inform future revisions and the development of additional tools.

 



Tools: Content Licensing

 

CONTENT LICENSING LANGUAGE

These templates are designed to ensure the preservation, integrity, and authenticity of the archival content. In cases where Artificial Intelligence (AI) manipulation is used, the template helps ensure that these changes are flagged, and a review is conducted to ensure compliance with standards established by the Archive.

Consideration was given to the fact that many standard post-production techniques (including but not limited to color correction, image stabilization, dirt and scratch reduction, etc.) are now being performed by assistive AI software. In many cases, Licensees may be unable to easily report whether the software used in post-production is “AI powered” or not.

To address a wide range of use cases, the Trust in Archives Initiative (TAI) Content Licensing Working Group has developed two license templates. These are designed to reflect the differing needs of production and publication workflows, ranging from film and documentary projects to print and digital publications.

It’s important to note that as AI technologies evolve rapidly, we are at an inflection point where the interests of those holding collections may not always align with those seeking to reuse or train on these materials. Introducing language that limits use may raise questions and, in some cases, lead to pushback. If you encounter this, please let us know so we can continue refining these tools to remain practical and responsive, and aligned with the needs of the community.

 

For those Archives that may allow for
limited use of AI on their materials

For those Archives that may wish to outright
prohibit any use of AI on their materials

TAI Tools: Authentication

 

DUE DILIGENCE QUESTIONS TO ESTABLISH ARCHIVAL AUTHENTICITY

As AI-generated media becomes increasingly commonplace, it is more important than ever for archives to know and be assured of the provenance and authenticity of the records they take into their custody.* There are technological methods being developed to detect AI-generation or modification and to convey or disclose provenance information. However, these mechanisms may remain out of reach for many content producers and smaller archives (at least in the short term) due to financial and personnel limitations. There is still a need for accessible approaches for examining provenance and verifying authenticity in the age of generative AI that do not require specialized tools. To this end, the Authenticity Working Group is developing a simple checklist of questions that archives can ask depositors as part of their due diligence in ascertaining the reliability of the records they accession. Archives can retain the answers to these questions for their records and develop their own systems to share this data with other institutions they work with.

Archival producers and media makers are similarly concerned about the authenticity of the media they obtain from archives for use in documentaries. The second section contains questions that producers can use when working with archives, or that archives can answer for themselves to attest to the authenticity of materials already in their collections.

 

USER STORIES

Over the past year, conversations within the TAI Authenticity working group have revealed that the exponential growth of generative AI is affecting archives, archivists, and their moving image collections. Inspired by the Library of Congress’ user stories for C2PA implementation in Government and Libraries, Archives, and Museums, the TAI Authenticity working group asked AMIA members to submit user stories to help us better understand specific concerns regarding GenAI as they relate to different roles and types of archives within the community. The stories below come from Authenticity Working Group members and from participants at our 2025 AMIA conference. We continue to seek and welcome new submissions. These user stories offer insightful perspectives that help us create tools and resources that will further our mission of ensuring authenticity, transparency, and trust.

 

 

*Provenance refers to the origins, custody, and ownership history of a record. A record’s provenance is key to its intelligibility and significance as documentary evidence. Authenticity, a closely related concept, refers to the record being genuine and free from tampering. A record’s authenticity enables it to be relied upon as evidence or proof of what it documents.

TAI Tools: Strategic Engagement with Tech Companies

AI Technology Companies (“AI Tech Co”) increasingly seek what archives hold—rich media collections to train their multimodal and large language models (“AI”). Many archives have already had their online collections scraped without permission and are now facing offers from companies eager to secure further access.

To support archives navigating these pressures, we are developing decision-making rubrics to help institutions assess collaboration opportunities with AI Tech Cos.

These tools are designed to guide archives in weighing whether such deals represent sustainable business opportunities that strengthen their futures—or risky bargains that could compromise their long-term best interests. This paper serves as part of that toolkit and is intended as a comprehensive resource on key considerations and critical questions to ask.

The document covers:
• What is Generative AI?
• What type of companies want to obtain your data?
• 10 Key Things to Consider when looking at AI Tech Companies and why this is important
• Five more key areas to think about (introspective)
• Your Internal Technical Preparation
• Questions to ask the AI Tech Co.