TAI Tools: Strategic Engagement with Tech Companies

AI Technology Companies (“AI Tech Co”) increasingly seek what archives hold—rich media collections to train their multimodal and large language models (“AI”). Many archives have already had their online collections scraped without permission and are now facing offers from companies eager to secure further access.
To support archives navigating these pressures, we are developing decision-making rubrics to help institutions assess collaboration opportunities with AI Tech Cos.

These tools are designed to guide archives in weighing whether such deals represent sustainable business opportunities that strengthen their futures, or risky bargains that could compromise their long-term best interests. This paper is part of that toolkit and is intended as a comprehensive resource offering guidance on key considerations and critical questions to ask.

In this paper we cover:
• What is Generative AI?
• What type of companies want to obtain your data?
• 10 Key Things to Consider when looking at AI Tech Companies and why this is important
• Five more key areas to think about (introspective)
• Your Internal Technical Preparation
• Questions to ask the AI Tech Co.

 

Tools

Generative artificial intelligence is rapidly reshaping how audiovisual materials are created, reused, and interpreted. For archives, libraries, and cultural heritage organizations, these changes raise urgent questions about authenticity, rights, access, and the responsible use of archival materials. This set of tools, developed by the Working Groups of the Trust in Archives Initiative, is designed to help archives navigate this evolving landscape. The toolkit provides practical guidance on issues including authentication and provenance, licensing considerations, working with technology companies, and the development of shared taxonomies to describe AI-generated or AI-altered materials.

Because both AI technologies and archival practices continue to evolve, these tools are intended as a living resource. They will be updated and expanded over time, and feedback from the community is welcomed to help inform future revisions and the development of additional tools.

TAI Tools: Authentication

As AI-generated media becomes increasingly commonplace, it is more important than ever for archives to be assured of the provenance and authenticity of the records they take into their custody.* Technological methods are being developed to detect AI generation or modification and to convey or disclose provenance information. However, these mechanisms may remain out of reach for many content producers and smaller archives, at least in the short term, due to financial and personnel limitations. There is still a need for accessible approaches to examining provenance and verifying authenticity in the age of generative AI that do not require specialized tools. To this end, the Authenticity Working Group is developing a simple checklist of questions that archives can ask depositors as part of their due diligence in ascertaining the reliability of the records they accession. Archives can retain the answers to these questions for their records and develop their own systems to share this data with other institutions they work with.

Archival producers and media makers are similarly concerned about the authenticity of the media they obtain from archives for use in documentaries. The second section of the checklist contains questions that producers can use when working with archives, or that archives can answer for themselves to attest to the authenticity of materials already in their collections.

*Provenance refers to the origins, custody, and ownership history of a record. A record’s provenance is key to its intelligibility and significance as documentary evidence. Authenticity, a closely related concept, refers to the record being genuine and free from tampering. A record’s authenticity enables it to be relied upon as evidence or proof of what it documents.

Licensing

As new technologies expand the ability to reuse, transform, and even synthetically create archival materials, clear licensing language has become essential—not only to protect collections, set boundaries for use, and ensure alignment with institutional values, but also to safeguard the authenticity of the materials themselves.

To meet this need, the Licensing Language Working Group is developing adaptable boilerplate language that archives can use when updating license agreements to address generative AI. These tools are intended as a practical framework, helping institutions safeguard authenticity, protect collections, and ensure their policies reflect both mission and values while responding thoughtfully to emerging technological capabilities.

Authentication

As AI-generated media becomes increasingly prevalent, archives face mounting pressure to ensure the provenance and authenticity of the records they steward. While emerging technologies can help detect AI generation or modification and disclose provenance information, many of these mechanisms remain costly, complex, or limited in reliability, particularly for smaller archives with constrained resources.

The Authenticity Working Group is developing a range of tools and recommendations to address these challenges. Our goal is to help archives of all sizes evaluate available options and implement the most effective methods of authentication and attestation for their unique contexts, ensuring that collections remain trusted sources of the historical record.

Taxonomies

With GenAI now pervasive across media creation, archives need a shared language to guide their policies and to frame the implications for collections. Questions arise, for example, about when upscaling stops being a matter of improving quality and instead constitutes the creation of a new work, or how best to describe materials that are AI-generated or AI-altered in consistent and transparent terms.

The Taxonomies Working Group is addressing these challenges by gathering glossaries from across the field, analyzing existing frameworks, and identifying where gaps remain. Our goal is to create a comprehensive, field-wide metaglossary that enables archivists to clearly understand, evaluate, and communicate the range of machine-learning processes and their impact on archival media.

Relationships

Tech companies increasingly seek what archives hold—rich media collections to train their multimodal and large language models. Many archives have already had their online collections scraped without permission and are now facing offers from companies eager to secure further access.

To support archives navigating these pressures, the Strategic Engagement Working Group is developing decision-making rubrics to help institutions assess collaboration opportunities with tech companies. These tools are designed to guide archives in weighing whether such deals represent sustainable business opportunities that strengthen their futures—or risky bargains that could compromise their long-term best interests.