Policy Archives - Creative Commons | https://creativecommons.org/category/policy/ | Tue, 10 Mar 2026 15:36:06 +0000

AI’s Infrastructure Era: Reflections from the AI Impact Summit in Delhi
https://creativecommons.org/2026/03/04/ais-infrastructure-era/ | Wed, 04 Mar 2026 17:24:28 +0000

Last month, we published a preview of what we intended to bring to the AI Impact Summit in Delhi: a focus on data governance, shared infrastructure, and democratic approaches to AI that genuinely advance the public interest rather than replicate existing power imbalances. That piece outlined our core interventions and the principles that have guided our thinking as we grapple with how to ensure openness, agency, and equity in the age of AI.

Since then, the Summit—a major global gathering of policymakers, technologists, civil society leaders, and researchers—unfolded against the backdrop of widespread calls for cooperative frameworks and measurable outcomes. For an excellent summary of the highs and lows of the Summit, take a look at this article by CC Board Member Jeni Tennison.

From CC’s perspective, what became clear in Delhi is that AI governance is shifting. The conversation is moving beyond high-level principles and into harder, more structural questions about infrastructure, stewardship, and power.

[Image: A mural in Delhi showing a cartoon figure in a striped shirt photographing a succulent against a pink background. Photo by Rebecca Ross/Creative Commons, 2026, CC BY 4.0.]

Data as a Leverage Point

Concerns about data capture and extraction abounded at the Summit. But alongside those concerns, a persistent theme emerged: data scarcity.

Participants repeatedly pointed to the lack of high-quality, localized, representative datasets as a fundamental constraint on public interest AI. The call for “really good data” came from startups, researchers, governments, and civil society actors alike—many working to build contextually grounded systems. Without accessible datasets, cultural representation is limited, competition falters, open-source development slows, and meaningful innovation remains concentrated in the hands of those with the most resources.

The gaps are especially pronounced across Global South languages and cultural contexts. Researchers are working to supplement large models with local norms and knowledge to address bias and misrepresentation. This is particularly urgent in sectors such as health, agriculture, climate, and development, where high-quality open datasets could unlock substantial public benefit.

There is a real tension here. High-quality open data is required to power public interest AI. At the same time, without guardrails, open data can be exposed to extraction and misuse. Communities are often presented with a false choice: open their data and risk exploitation, or close their data and risk exclusion from shaping AI systems that affect them. Addressing this tension is essential if governance frameworks are to support both individual agency and shared stewardship. In essence, we need to:

  • Fill existing gaps in shared governance infrastructure through collaborative frameworks and development of globally accessible tools that balance the tension between agency and access;
  • Uphold an understanding of data governance as something that is deeply participatory and democratic, and an absolute necessity for any AI system that becomes part of the public infrastructure, whether privately held or not;
  • Rebalance the power inequities in the current landscape overall, with our focus being on the data layer.

We believe that the path forward is not enclosure. It is stewardship. Governance mechanisms, interoperability standards, and access frameworks will determine who participates in the AI ecosystem and who does not. If we want AI systems that reflect diverse knowledge and lived realities, we must build the infrastructure that makes responsible openness durable.

Openness as a Method for Collaboration 

At the Summit, openness was not framed as a philosophical preference. It was framed as a structural necessity and a baseline condition for equity, competition, collaboration, and democratic accountability.

But the mental models we use to think about open versus closed must evolve. Openness cannot stop at model weights. It must extend across code, data, infrastructure, tooling, standards, and usability. And, crucially, openness and guardrails are not opposites. Responsible governance is not in tension with open systems; it is what makes them sustainable.

In this sense, openness is no longer the ceiling of ambition. It is the floor.

The Implementation Gap

Despite widespread agreement on concentration risks, data bottlenecks, and the speed of AI development, there was palpable exhaustion with principles that lack implementation pathways. Participants pointed to efforts like the Hiroshima AI Process and statements from past Summits as sound in theory but lacking follow-through in practice. What’s missing are durable intermediaries capable of stewarding shared resources and translating shared values into operational systems. 

This is where the conversation becomes especially consequential for Creative Commons.

For more than two decades, CC has built legal and social interoperability at global scale. We have designed data governance frameworks that allow sharing of knowledge to function across jurisdictions and sectors. We have stewarded a commons model that balances openness with structure, enabling participation and mutual benefit through principles like attribution.

While debates about the limits of copyright were not central to most discussions in Delhi, there was significant interest in expanding high-quality open data, strengthening digital public infrastructure, and supporting community-led AI development​​—all areas deeply aligned with our expertise.

AI governance must move from principles to infrastructure. Shared, open digital infrastructure that works across borders is what Creative Commons is known for building. We believe that building the next generation of infrastructure for sharing—which would support the data layer of public interest AI—is not a departure from our mission. It is a timely extension of it and builds on the groundwork we have been laying for the past few years.

An infrastructure like this could include identifying high-impact open dataset initiatives in sectors such as health, agriculture, climate, and education to be opened up and prepared for machine reuse. It would require developing safe and trusted data-sharing models, with nuanced approaches depending on what data are being shared. This isn’t just about legal tools absent the context in which they are used; it is about comprehensive data governance mechanisms that balance openness with accountability and ensure interoperability across jurisdictions. 

Collaborative Construction

As we’ve talked about before, a central challenge in AI governance is avoiding false choices. Overly restrictive guardrails risk enclosing the commons, limiting access to knowledge, and stifling innovation and scientific discovery. Yet the absence of guardrails undermines trust, enables exploitation, and erodes the foundations of openness itself. Creative Commons operates in this critical middle space.

Our interventions at the Summit focused on advancing governance frameworks that protect human agency, cultural context, and trust in information while preserving openness, access, and reuse. An AI ecosystem that serves the public interest must be standardized where possible and contextual where required, especially across diverse linguistic, cultural, and regional settings.

If the Summit made one thing evident, it is that there is readiness for partnership. Policymakers, funders, technologists, and civil society leaders are looking for institutions capable of translating shared values into durable systems.

If We Do Not Intervene

It is worth being explicit about the alternative trajectory.

If sharing of data is only driven by commercial markets and not the public interest, and if data infrastructure consolidates in the hands of a few actors, “sovereignty” risks becoming a commercial product rather than a public capacity. Cultural representation will become extractive rather than participatory. Open models may technically exist, but without access to high-quality datasets, they will struggle to compete. The language of openness could persist while the data infrastructure beneath it quietly closes. What is the value of open weights and open code when the very essence of our cultures and languages isn’t carefully and deliberately shared, through robust open datasets?

The infrastructure phase of AI governance has begun. Creative Commons intends to help build what comes next—in partnership with those who share a commitment to an AI ecosystem that is open, inclusive, and grounded in the public interest. 

A huge thank you to our partners, event organizers, and co-panelists who helped to shape a meaningful engagement for CC during the Summit. We are particularly grateful for the thoughtful welcome provided by CivicDataLab, who ensured balanced dialogue and representation between those attending from elsewhere and those actively engaged on the ground in India. If we chatted during the Summit, we look forward to ongoing discussions. If we didn’t have a chance to connect, our doors are always open—send us a note! 

How to Keep the Internet Human
https://creativecommons.org/2026/02/12/how-to-keep-the-internet-human/ | Thu, 12 Feb 2026 19:16:53 +0000

It is time to update our mental models about open knowledge

I like to say I am a “writer who lawyers”. I begin here because I want to name my biases up front. I am a lawyer, but I come to this work first and foremost as a writer thinking about the conditions that will allow us to continue to share knowledge publicly. And in spite of—or perhaps because of—the fact that I am a lawyer, I have a healthy skepticism about the power of legal terms and conditions. The law will play a role, but the challenge of keeping the internet human will ultimately be navigated by the stories we imagine and tell. 

We need new stories. 

I spent the first 15 years of my legal career working in intellectual property. For most of that time, I was part of the open movement, fighting overly restrictive intellectual property laws to promote access to knowledge. But over time, I began to feel that the message of open licensing no longer resonated with me in the same way, especially in my identity as a writer. Eventually I left the open movement to go into the field of privacy. 

Immersing myself in digital privacy led me to realize why the story of open felt incomplete. We had been undervaluing the role of boundaries around reuse. The tension between the instinct to share and the need for boundaries around reuse is the point. And right now, that tension is completely out of balance. Instead, what exists online is a free-for-all.

[Graphic: Disequilibrium, a broken commons. The pursuit of knowledge leads to the instinct to share, which leads to a free-for-all.]

If you are familiar with the concept of a commons, you know it requires shared rules that govern reuse of resources. Those shared rules represent a mutual commitment by producers and reusers, and they ensure that the cycle leads to collective benefit and begins again. A free-for-all, on the other hand, has no shared rules. As a result, we are losing the instinct to share. 

What happened to the commons? 

It would be easy to blame AI for this situation, but it is not so straightforward. AI is simply speeding up and exacerbating longstanding challenges with open knowledge. As privacy scholar Daniel Solove has written, “AI is continuous with the data collection and use that has been going on throughout the digital age.” 

In preparation for this talk, I went back and reread the brilliant CC Summit keynote “Open As In Dangerous” by Chris Bourg from 2018 and the seminal Paradox of Open report by the Open Future Foundation. For many years, these and countless other voices have been warning us about the vulnerabilities that open knowledge creates. Whether it is the use of CC-licensed photos for facial surveillance technology or the creation of Grokipedia, it is clear that open content is particularly vulnerable to abuse. 

But of course, it is not just open content that is vulnerable. All content online today has essentially been treated as fair game. The free-for-all extends to everything online. 

This has led to a vast renegotiation of what it means to share publicly, still currently underway. We see this in the massive wave of litigation against AI services, the rise of paywalls and commercial licensing deals, the introduction of new technologies to increase control over content in ways that scale back the open web, and the extreme backlash against AI by creators and the general public.

All of this constitutes a threat to open access to knowledge. It is unlikely that the incentives to share can outweigh all of the growing countervailing forces at play: economic, moral, safety-related, and more. We cannot respond by accepting these risks and harms as inherent and inevitable costs of sharing knowledge publicly. 

Changing our mental models

To meet the moment, we need to rethink our most fundamental assumptions about open knowledge. 

The old taxonomies no longer apply. 

For a very long time, we have used categories to help us determine the appropriate rules for sharing knowledge. Open content could be licensed one way, while open data had different parameters. This distinction no longer applies when everything online is used as data by machines. Even the difference between copyrighted material and the public domain is not very useful, since copyrighted works are largely used by machines for the public domain material within them (e.g., facts and ideas). 

Copyright is not the main event.

The original “enemy” of the open movement was copyright, and things were simpler back then. Even the most restrictive open license was more permissive than the default under copyright law, so any boundaries we set around the commons were still fighting the copyright war. Overly restrictive copyright laws still cause problems today, but they are no longer the biggest threat to the commons. In fact, it is copyright’s weakness in the context of machine reuse that is the real challenge. Because copyright does little to protect against unwanted machine reuse, the CC licenses cannot do so either, creating the free-for-all even on CC-licensed content. And importantly, this was by design: the aim was to avoid having the CC licenses impose restrictions on activity that was otherwise allowed under copyright.

We have to stop confusing property with morality.

This is where I depart from my younger self and from many of my peers in the open movement. I think we have let important principles like the notion that facts and ideas should not be privately owned, or the fact that some permissionless reuse plays a critical role in free expression, convince us that the scope of copyright is an ethical line. The logic goes: if no one can own it, then no rules should apply. This leads to an impoverished sense of morality, where the only justification for constraint is property rights. As Robin Wall Kimmerer says, “In that property mindset, how we consume doesn’t really matter because it’s just stuff and the stuff all belongs to us. There is no moral constraint on consumption.” 

The ethics of sharing—which is what open is about—needs to be broader than what we can own. 

Boundaries benefit us all.

Boundaries on reuse are what create the reciprocity that fuels a commons. Without them, there is no assurance that sharing leads to collective benefit, and people lose their instinct to share. But boundaries can also have social value in their own right. Even when sharing in public, people rightfully expect some boundaries around how their works are used, regardless of what copyright law says. This is foundational in the field of privacy, but somehow we lose sight of it when we are sitting in the realm of content sharing. Daniel Solove writes: “People expect some degree of privacy in public, and such expectation is reasonable as well as important for freedom, democracy, and individual wellbeing.” Similarly, we establish boundaries around reuse of knowledge because those protections serve us all. 

Open should not be a purity test. 

The open movement has had incredible success creating global standards, and that standardization is a big part of why. But the emphasis on standardization has led us to hyper-focus on definitions, and this focus is distracting us from the bigger picture. What matters is not open versus closed, or even abundance versus scarcity. We need to focus on values, not prescriptions. Open licensing has always been conditional, and it has always been a spectrum. This means we have to accept that there will be gray areas. What we lose in certainty, we will gain in relevance and moral clarity. As Rebecca Solnit says, “Categories are where thoughts go to die.” 

Where do we go from here? 

All of this leads back to where we began. We have to reconstruct the mutual commitment that keeps the commons cyclical.

[Graphic: Equilibrium, a healthy commons. The pursuit of knowledge leads to the instinct to share, which leads to mutual commitment, which leads to collective benefit, which leads back to the pursuit of knowledge.]

Rebuilding the mutual commitment that comes with sharing knowledge requires us to balance opposing values. On the one hand, we must protect important freedoms of the reusing public. On the other, we must establish boundaries around responsible reuse. The goal is to be as open as possible and as restrictive as necessary. And before we start panicking about slippery slopes, we should remember there is an important limiting principle we can leverage: does the boundary shift power in ways that concentrate it further, or in ways that redistribute it? We can also ask whether there are ways to mitigate a boundary’s effect on access. 

We already have a good sense of the dimensions of boundaries around responsible reuse. They all have roots in the existing CC license suite.

Attribution: While the AI landscape complicates methods and norms for attribution, the principle is more important than ever for informational integrity, authors rights, and transparency. 

Reciprocity: Molly Van Houweling calls this “extractability,” the idea that those extracting facts and ideas from others’ works have a moral responsibility to ensure that knowledge remains extractable by others. This is essentially about crafting a ShareAlike obligation for the age of AI. 

Financial sustainability: This has been a longtime challenge in the open movement, and it is more urgent than ever. It is not about preserving business models, it is about financially sustaining the production of knowledge and culture as public goods. 

Prohibitions on harmful use cases: This dimension may feel less familiar in open licensing, but the sentiment is one we hear regularly. There are simply some use cases or even actors that feel out of bounds for people sharing knowledge because of the harm they cause. 

How do we catalyze a mutual commitment around prosocial boundaries in the current free-for-all environment? Open Future Foundation’s Paul Keller has written: “For any response to succeed in preserving a diverse and sustainable information ecosystem, collective action is required—both bottom-up, through coordinated action by information producers, and top-down, through political will to enable redistribution via fiscal interventions.” There is no single solution, and we need to tackle it from all directions. 

For the bottom-up efforts, we can leverage the tools we have. Norms and social pressure have a role to play, though it is hard to put full faith in voluntary action right now. We can also explore methods for legal control, including both contract and copyright law. As Nilay Patel has said, “Copyright is the only functioning regulation on the internet,” which makes it impossible to avoid considering it as one lever to employ.[1] Finally, there is the strategy of controlling access. This is the most uncomfortable tactic because of the collateral damage it risks, and it requires extreme care. But if AI companies will not pay attention voluntarily, technical controls around access look increasingly necessary. 
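The most familiar of these technical controls is the robots.txt file. The user-agent tokens below are ones AI companies have actually published for their crawlers, but the limitation discussed above applies in full: compliance with robots.txt is entirely voluntary.

```text
# robots.txt: asks (but cannot force) AI training crawlers to stay out
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```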

There are many in the open movement already experimenting with these efforts, including the Mozilla Data Collective, the differentiated access model proposed by Europeana and the Open Future Foundation, the NOODL license, and many more. Creative Commons is also actively thinking about how to build a framework that re-instills mutual commitment into the ecosystem. Many of you have been following along as we experiment with an AI preference signals framework we’ve been calling CC signals. While the path we will take is evolving, the goal is the same. We need to come together to define and sustain the boundaries that serve us all. 

I will end with the words of Ruha Benjamin: “We need to give the voice of the cynical, skeptical grouch that patrols the borders of our imagination a rest.” 

We can imagine a better way. 


[1] While copyright law is ill-equipped to function as a method of control over machine reuse (and rightly so, considering the importance of not treating facts and ideas as private property), copyright law still has a role to play because of the uncertainty around its application on a global scale. Granting copyright permission in exchange for agreement to certain conditions could still be a valuable offer to some reusers. 

 

Where CC Stands on Pay-to-Crawl
https://creativecommons.org/2025/12/12/where-cc-stands-on-pay-to-crawl/ | Fri, 12 Dec 2025 15:47:38 +0000

As we’ve discussed before, the rise of large artificial intelligence (AI) models has fundamentally disrupted the social contract governing machine use of web content. Today, machines don’t just access the web to make it more searchable or to help unlock new insights; they feed algorithms that fundamentally change (and threaten) the web we know. What once functioned as a mostly reciprocal ecosystem now risks becoming extractive by default.

In response, new approaches are emerging to support creators, publishers, and stewards of content to reclaim agency over how their works are used.

Pay-to-crawl is one approach beginning to come into focus. Pay-to-crawl refers to emerging technical systems that websites use to automate compensation when their digital content—such as text, images, and structured data—is accessed by machines. We’ve recently published our interpretation and observations of pay-to-crawl systems in this dedicated issue brief.
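At the protocol level, such a system might work something like the exchange sketched below. The HTTP 402 "Payment Required" status code is real (reserved in the HTTP specification), but the header names and flow here are purely illustrative assumptions, not any adopted standard:

```text
# A crawler requests content without payment credentials
GET /article/123 HTTP/1.1
Host: example.org
User-Agent: ExampleAIBot/1.0

# The site responds with 402 and an asking price
# (X-Crawl-Price is a hypothetical header, not a standard)
HTTP/1.1 402 Payment Required
X-Crawl-Price: 0.001 USD

# The crawler retries with a payment token and is served
GET /article/123 HTTP/1.1
Host: example.org
X-Crawl-Payment: <token>

HTTP/1.1 200 OK
```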

[Image: A bird’s-eye view of an orange sand mine with transport lorries, slightly distorted by digital artifacts. “Distorted Sand Mine” by Lone Thomasky & Bits&Bäume, licensed under CC BY 4.0.]

CC’s Position on Pay-to-Crawl

Implemented responsibly, pay-to-crawl could represent a way for websites to sustain the creation and sharing of their content, and manage substitutive uses, keeping content publicly accessible where it might otherwise not be shared or would disappear behind even more restrictive paywalls.

However, we do have significant reservations.

Pay-to-crawl may represent an appropriate strategy for independent websites seeking to prevent AI crawlers from knocking them offline or to generate supplementary revenue. But elsewhere, pay-to-crawl systems could be cynically exploited by rightsholders to generate excessive profits, at the expense of human access and without necessarily benefiting the original creators.

Pay-to-crawl systems themselves could become new concentrations of power, with the ability to dictate how we experience the web. They could seek to watch and control how content is used in ways that resemble the worst of Digital Rights Management (DRM), turning the web from a medium of sharing and remixing into a tightly monitored content delivery channel.

We’re also concerned that indiscriminate use of pay-to-crawl systems could block off access to content for researchers, nonprofits, cultural heritage institutions, educators, and other actors working in the public interest. Legal rights to access content afforded by exceptions and limitations to copyright law, such as noncommercial research (in the EU) or fair use exemptions (in the US), as well as provisions for translation and accessibility tools, have been carefully negotiated and adjusted over time. These rights could be impeded by the introduction of blunt, poorly designed pay-to-crawl systems.

Proposed Principles for Responsible Pay-to-Crawl 

Pay-to-crawl systems are not neutral infrastructure. It’s vital that these systems are built and used in ways that serve the interests of creators and the commons, rather than simply create barriers to the sharing of knowledge and creativity, and benefit the few.

We’re proposing the following set of principles as a way to guide the development of pay-to-crawl systems in alignment with this vision:

  1. Pay-to-crawl should not become a default setting.
    Pay-to-crawl represents a strategy that may work for some websites, and not all websites share the same underlying concerns. Pay-to-crawl systems should not be deployed as an automatic or assumed setting on behalf of websites by others, such as domain hosts, content delivery networks, and other web service providers.
  2. Pay-to-crawl systems should enable choice and nuance, not blanket rules.
    Pay-to-crawl systems should enable websites to distinguish between—and set variable controls for—different types of content users (such as commercial AI companies, nonprofits, researchers, or even specific organizations), as well as types and purposes of machine use (such as model training, indexing for search, and inference/retrieval). Systems should not affect direct human browsing and use of content, including by restricting translation or accessibility services.
  3. Pay-to-crawl systems should allow for throttling, not just blocking.
    Pay-to-crawl systems should enable websites to manage hosting costs and other impacts of heavy machine traffic without walling off content entirely. For instance, systems could allow websites to throttle traffic driven by ‘agentic browsing’ or ‘inference’ undertaken by large AI models, while permitting other forms of machine access that involve far lower traffic, such as research or archival crawling.
  4. Pay-to-crawl systems should preserve public interest access and legal rights.
    Pay-to-crawl systems should not obstruct access to content for researchers, nonprofits, cultural heritage institutions, educators, and other actors working in the public interest. Nor should these systems block lawful uses of content protected by copyright exceptions and limitations, and other legal rights afforded in the public interest. The act of deciding not to abide by a pay-to-crawl system should not, by itself, convert an otherwise lawful use into an illegal act.
  5. Pay-to-crawl systems should use open, interoperable, and standardized components.
    Pay-to-crawl systems should not become proprietary chokepoints or gatekeepers. We urge particular caution in the use of proprietary components for authentication and payment that might result in websites getting locked into a particular pay-to-crawl system.
  6. Pay-to-crawl systems should enable collective contributions to the commons.
    Pay-to-crawl systems that only enable financial transactions between singular websites and content users risk creating a highly transactional future, where the value of content is atomized. Pay-to-crawl systems should support collective forms of payment, such as to coalitions of creators and publishers, and wider conceptions of what it means to contribute to the digital commons.
  7. Pay-to-crawl systems should avoid surveillance and DRM-like architectures.
    Pay-to-crawl systems must not introduce excessive logging, fingerprinting, or behavioral tracking related to the use of content. Systems should minimize data collection to only what is needed to authenticate users and settle payments, rather than seek to follow content downstream or dictate how it can be used.

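Principle 3 above, throttling rather than blocking, can be illustrated with a standard token-bucket rate limiter. This is a minimal sketch under invented assumptions: the crawler categories and rates are made up for illustration and are not part of any actual pay-to-crawl system.

```python
import time

class CrawlThrottle:
    """Token-bucket rate limiter sketching 'throttle, don't block':
    each crawler category refills tokens at its own rate, so low-traffic
    research or archival crawlers keep access while heavy agentic or
    inference traffic is slowed rather than walled off entirely."""

    def __init__(self, rates):
        # rates: category -> (tokens refilled per second, burst capacity)
        self.rates = rates
        # state: category -> [current tokens, last refill timestamp]
        self.state = {cat: [cap, time.monotonic()] for cat, (_, cap) in rates.items()}

    def allow(self, category):
        """Spend one token; False means 'retry later', not 'access denied'."""
        rate, cap = self.rates[category]
        tokens, last = self.state[category]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst capacity
        tokens = min(cap, tokens + (now - last) * rate)
        if tokens >= 1:
            self.state[category] = [tokens - 1, now]
            return True
        self.state[category] = [tokens, now]
        return False

# Illustrative policy (rates are made up): archival access is barely
# limited, while agentic browsing is reduced to a trickle, not blocked.
throttle = CrawlThrottle({"archival": (100.0, 200), "agentic": (0.5, 5)})
```

The key design point is the return value: a denied request is a temporary "slow down" signal rather than a permanent wall, which keeps content reachable for low-volume public interest uses.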
The Path Forward: Showing Up Where the Future Is Being Decided

We believe now is the moment to engage, to influence, and to infuse pay-to-crawl systems with values that prioritize reciprocity, openness, and the commons.

We welcome feedback and dialogue on the principles outlined here. Your input will help guide our engagement with pay-to-crawl systems and related initiatives moving forward, as well as inform the wider CC community’s understanding of them.

Thank you to Jack Hardinges for his contributions to this post.

We Asked, You Answered: How Your Feedback Shapes CC Signals https://creativecommons.org/2025/08/27/we-asked-you-answered-how-your-feedback-shapes-cc-signals/?utm_source=rss&utm_medium=rss&utm_campaign=we-asked-you-answered-how-your-feedback-shapes-cc-signals Wed, 27 Aug 2025 13:49:54 +0000 https://creativecommons.org/?p=76968 Signals © 2021 by Hugo Parasol is licensed under CC BY-NC-SA 2.0 In June we kicked off a public feedback period on our proposal for CC signals. CC signals is a preference signals framework designed to sustain the commons and ensure the continued sharing of knowledge in the age of AI.  The goal is to…

The post We Asked, You Answered: How Your Feedback Shapes CC Signals appeared first on Creative Commons.

Signals © 2021 by Hugo Parasol is licensed under CC BY-NC-SA 2.0

In June we kicked off a public feedback period on our proposal for CC signals. CC signals is a preference signals framework designed to sustain the commons and ensure the continued sharing of knowledge in the age of AI. 

The goal is to give holders of large datasets a way to set criteria for how their data may be used within AI training models. For example, a dataset holder may wish to require that any AI training that uses their data gives credit back to the original source (e.g. attribution), or that the resulting AI model is open. Like the CC licenses, CC signals build on the idea of ‘some rights reserved’ and the belief that creators and knowledge holders deserve meaningful choices in how their work is used. You can learn more on our website.

Since our kickoff event, we have been listening closely to feedback. We heard from hundreds of creators, librarians, technologists, legal experts, and open advocates. We asked for feedback and you delivered! Your voices – supportive, skeptical, frustrated, or curious – are essential in shaping how CC signals develops. We’d like to summarize what we heard and how this feedback is being incorporated and addressed.

What We Heard

Across the conversations, several themes emerged: 

Concerns that CC is prioritizing AI companies over creators. A recurring concern is that CC signals seem to give legitimacy to AI training without doing enough to protect creators. 

Confusion and disagreement about the CC licenses and AI training. We heard frustration that the CC licenses are not being interpreted or enforced in ways that some creators expected. 

Strong calls for opt-outs. Many wondered why the draft CC signals did not include an opt-out option. 

Politely asking AI developers to give back in exchange for datasets is not enough. We heard doubts that CC signals would work in practice, given the widespread evidence of AI companies ignoring copyright, licenses, and even technical protocols like robots.txt. 

Broader critique of AI’s role in society. There is a spectrum of views on AI across the CC community. Many of you stand firmly at the anti-AI end. For these voices, no technical framework, like CC signals, feels adequate without stronger laws and regulations. 

We haven’t been clear on who this tool is meant to serve and the use cases it is meant to address. Naturally, the needs of an individual creator, like an artist, are quite different from those of actors operating at an institutional or collective level. We heard loud and clear that CC signals, as currently conceived, does not meet the diverse needs of individual creators.

Requests for clarity. Many asked for more details about implementation and interoperability, including our long-term vision for CC signals as part of our broader mission. 

We understand how deeply personal these issues are for many of you, especially artists and creators who feel their work is being taken without consent and are looking for ways to fight back. That frustration is real, and we take it seriously. 

What We’re Doing Next

✔ Improving clarity around CC’s position. We know many of you are worried that CC has “taken sides” or is being influenced by AI companies. We want to be clear: the driving motivation of CC signals is to defend and sustain the commons by developing practical tools for knowledge holders. Going forward, we will aim to clarify our guiding principles and positions in ways that translate to product decisions.

✔ Strengthening messaging and education. We are committed to expanding resources on how the CC licenses and CC signals could interact, examples of how signals could work in practice, and deeper dives into questions of copyright within the AI landscape. If you haven’t already, take a look at our legal primer on understanding the CC licenses and AI training. The better informed the CC community is about AI and the commons at large, the more effective we can be as a community in defending the commons.

✔ Clarifying the use cases for CC signals. This phase of CC signals is designed to serve large and open dataset holders, not the individual creator. Your feedback helped us recognize that this focus was not easy to square with our decision to leverage technical protocols used by anyone with a website. As a result, the target audience for CC signals was not clear. As we decide on next steps in product development, we plan to focus on specific use cases to put our goals and objectives into practice.

✔ Deepening global engagement and inviting stakeholders into product development. We plan to continue conversations with diverse audiences to inform the future of CC signals through an iterative process. The rest of this year will be focused on exploring and testing possible integrations of CC signals with pilot adopters. From this, we hope to extrapolate findings as we explore wider adoption of CC signals in the future.

✔ Maintaining transparency in development. Our GitHub repository will stay open and up to date. We are creating a roadmap that will be shared publicly and will provide consistent updates (either on the blog or via a virtual town hall) on our progress. This feedback loop is not over; it will be built into how CC signals will evolve. 

Looking Ahead

The future of the commons depends on tools that reflect shared values of openness, fairness, and agency. We know many of you remain skeptical. 

CC signals is not final. It is an experiment in building a new layer of choice in an age where the rules are rapidly shifting. We will keep listening, adjusting, and collaborating until we arrive at something that genuinely serves the commons.

Thank you to everyone who took the time to write, question, challenge, and support us. Please stay engaged. Together, we can ensure that Creative Commons continues to stand where it always has: with the community, for the commons.


Why CC Signals: An Update https://creativecommons.org/2025/07/02/why-cc-signals-an-update/?utm_source=rss&utm_medium=rss&utm_campaign=why-cc-signals-an-update Wed, 02 Jul 2025 14:43:26 +0000 https://creativecommons.org/?p=76821 CC Signals – An Update © 2025 by Creative Commons is licensed under CC BY 4.0 Thanks to everyone who attended our CC signals project kickoff last week. We’re receiving plenty of feedback, and we appreciate the insights. We are listening to all of it and hope that you continue to engage with us as…

The post Why CC Signals: An Update appeared first on Creative Commons.

CC Signals – An Update © 2025 by Creative Commons is licensed under CC BY 4.0

Thanks to everyone who attended our CC signals project kickoff last week. We’re receiving plenty of feedback, and we appreciate the insights. We are listening to all of it and hope that you continue to engage with us as we seek to make this framework fit for purpose.

Some of the input focuses on the specifics of the CC signals proposal, offering constructive questions and suggesting ideas for improving CC signals in practice. The most salient feedback, however, touches on something far deeper than the CC signals themselves – the fact that so much about AI seems to be happening to us all, rather than with or for us all, and that the expectations of creators and communities are at risk of being overshadowed by powerful interests.

This sentiment is not a surprise to us. We feel it, too. In fact, it is why we are doing this project. CC’s goal has always been to grow and sustain the thriving commons of knowledge and culture. We want people to be able to share with and learn from each other, without being or feeling exploited. CC signals is an extension of that mission in this evolving AI landscape.

We believe that the current practices of AI companies pose a threat to the future of the commons. Many creators and knowledge communities are feeling betrayed by how AI is being developed and deployed. The result is that people are understandably turning to enclosure. Eventually, we fear that people will no longer want to share publicly at all. 

CC signals are a first step to reduce this damage by giving more agency to those who create and hold content. Unlike the CC licenses, they are explicitly designed to signal expectations even where copyright law is silent or unclear, where it does not apply, and where it varies by jurisdiction. We have listened to creators who want to share their work but also have concerns about exploitation. CC signals provide a way for creators to express those nuances. The CC signals build on top of developing standards for expressing AI usage preferences (e.g., via robots.txt). Creators who want to fully opt out of machine reuse do not need to use a CC signal. CC signals are for those who want to keep sharing, but with some terms attached.
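The robots.txt mechanism referenced above can already be exercised with nothing more than Python's standard library. As a minimal sketch, the snippet below shows how a site might opt specific AI-training crawlers out while leaving everything else open. The user-agent tokens are examples, and, like any preference signal, robots.txt expresses a preference rather than enforcing it.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that allows ordinary crawling but opts out of two
# AI-training crawlers. (User-agent tokens are examples; robots.txt
# expresses preferences, it does not enforce them.)
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("GPTBot", "https://example.org/essay"))         # False
print(parser.can_fetch("SomeSearchBot", "https://example.org/essay"))  # True
```

A well-behaved crawler runs exactly this check before fetching a page; the social-contract question is what happens when crawlers choose not to.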

The challenge we’re all facing in this age of AI is how to protect the integrity and vitality of the commons. The listening we’ve been doing so far, across creator communities and open knowledge networks, has led us here, to CC signals. Our shared commitment is to protect the commons so that it remains a space for human creativity, collaboration, and innovation, and to make clear our expectation that those who draw from it give something in return. 

Our goal is to advocate for reciprocity while upholding our values that knowledge and creativity should not be treated as commodities. 

Our goal is to find a path between a free-for-all and an internet of paywalls.

Copyright will not get us there. Nor should it. And we don’t think the boundaries of copyright tell us everything we need to know about navigating this moment. Just this week, Open Future released a report that calls for going beyond copyright in this debate, on the path to a healthy knowledge commons.

This is the beginning of the conversation, not the end. We are listening. From what we have heard, CC signals, or something like it, is the best practical mechanism to avoid the dual traps of total exploitation or total enclosure, both of which damage the commons. We have shared our current progress because we want to learn how to design it to meet your needs. We invite you to continue sharing feedback so we can shape CC signals together in a way that works for diverse communities.

In the months ahead, we’ll be providing more detail about how CC signals are developing, including key themes we are hearing, along with the questions we are exploring and our next steps.


Introducing CC Signals: A New Social Contract for the Age of AI https://creativecommons.org/2025/06/25/introducing-cc-signals-a-new-social-contract-for-the-age-of-ai/?utm_source=rss&utm_medium=rss&utm_campaign=introducing-cc-signals-a-new-social-contract-for-the-age-of-ai Wed, 25 Jun 2025 13:21:48 +0000 https://creativecommons.org/?p=76675 CC Signals © 2025 by Creative Commons is licensed under CC BY 4.0 Creative Commons (CC) today announces the public kickoff of the CC signals project, a new preference signals framework designed to increase reciprocity and sustain a creative commons in the age of AI. The development of CC signals represents a major step forward…

The post Introducing CC Signals: A New Social Contract for the Age of AI appeared first on Creative Commons.

CC Signals © 2025 by Creative Commons is licensed under CC BY 4.0

Creative Commons (CC) today announces the public kickoff of the CC signals project, a new preference signals framework designed to increase reciprocity and sustain a creative commons in the age of AI. The development of CC signals represents a major step forward in building a more equitable, sustainable AI ecosystem rooted in shared benefits. This step is the culmination of years of consultation and analysis. As we enter this new phase of work, we are actively seeking input from the public. 

As artificial intelligence (AI) transforms how knowledge is created, shared, and reused, we are at a fork in the road that will define the future of access to knowledge and shared creativity. One path leads to data extraction and the erosion of openness; the other leads to a walled-off internet guarded by paywalls. CC signals offer another way, grounded in the nuanced values of the commons expressed by the collective.

Based on the same principles that gave rise to the CC licenses and tens of billions of works openly licensed online, CC signals will allow dataset holders to signal their preferences for how their content can be reused by machines based on a set of limited but meaningful options shaped in the public interest. They are both a technical and legal tool and a social proposition: a call for a new pact between those who share data and those who use it to train AI models.

“CC signals are designed to sustain the commons in the age of AI,” said Anna Tumadóttir, CEO, Creative Commons. “Just as the CC licenses helped build the open web, we believe CC signals will help shape an open AI ecosystem grounded in reciprocity.”

CC signals recognize that change requires systems-level coordination. They are tools that will be built for machine and human readability, and are flexible across legal, technical, and normative contexts. However, at their core, CC signals are anchored in mobilizing the power of the collective. While CC signals may range in enforceability, legally binding in some cases and normative in others, their application will always carry ethical weight that says we give, we take, we give again, and we are all in this together.

“If we are committed to a future where knowledge remains open, we need to collectively insist on a new kind of give-and-take,” said Sarah Hinchliff Pearson, General Counsel, Creative Commons. “A single preference, uniquely expressed, is inconsequential in the machine age. But together, we can demand a different way.”

Now Ready for Feedback 

More information about CC signals and early design decisions are available on the CC website. We are committed to developing CC signals transparently and alongside our partners and community. We are actively seeking public feedback and input over the next few months as we work toward an alpha launch in November 2025. 

Get Involved

Join the discussion & share your feedback

To give feedback on the current CC signals proposal, hop over to the CC signals GitHub repository. You can engage in a few ways: 

  1. Read about the technical implementation of CC signals
  2. Join the discussion to share feedback about the CC signals project
  3. Submit an issue for any suggested direct edits

Attend a CC signals town hall

We invite our community to join us for a brief explanation of the CC signals framework, and then we will open the floor to you to share feedback and ask questions. 

Tuesday, July 15
6–7 PM UTC
Register here.

Tuesday, July 29
1–2 PM UTC
Register here.

Friday, Aug 15
3–4 PM UTC
Register here. 

Support the movement

CC is a nonprofit. Help us build CC signals with a donation.

The age of AI demands new tools, new norms, and new forms of cooperation. With CC signals, we’re building a future where shared knowledge continues to thrive. Join us.


Understanding CC Licenses and AI Training: A Legal Primer https://creativecommons.org/2025/05/15/understanding-cc-licenses-and-ai-training-a-legal-primer/?utm_source=rss&utm_medium=rss&utm_campaign=understanding-cc-licenses-and-ai-training-a-legal-primer Thu, 15 May 2025 17:51:13 +0000 https://creativecommons.org/?p=76580 Whether you are a creator, researcher, or anyone licensing your work with a CC license, you might be wondering how it can be used to train AI. Many AI developers, who wish to comply with the CC license terms, are also seeking guidance.  The application of copyright law to AI training is complex. The CC…

The post Understanding CC Licenses and AI Training: A Legal Primer appeared first on Creative Commons.

Whether you are a creator, researcher, or anyone licensing your work with a CC license, you might be wondering how it can be used to train AI. Many AI developers who wish to comply with the CC license terms are also seeking guidance. 

The application of copyright law to AI training is complex. The CC licenses are copyright licenses, so it follows that applying CC licenses to AI training is just as complex. 

The short answer is: AI training is often permitted by copyright. This means that the CC license conditions have limited application to machine reuse. This also means that using a more restrictive CC license in an effort to prevent AI training is not an effective approach. In fact, restrictive licensing may actually end up preventing the kind of sharing you want (like allowing for translation, for example), while not being effective to block AI training. 

For the long answer, read our new guide that provides a legal analysis and overview of the considerations when using CC-licensed works for AI training. 

👉  For an at-a-glance overview, head over to the Using CC-Licensed Works for AI training webpage

👉  For a more in-depth analysis, check out our handy PDF download

👉 For those who love a visual, take a look at our supplementary flowchart

If the CC licenses have limited application to machine reuse, what agency do creators have in the AI ecosystem? 

This is an important question. As you’ve heard us talk about before, we’re actively developing a CC preference signals framework to help bridge this gap. The framework is designed to offer new choices for stewards of large collections of content to signal their preferences when sharing their works, using scaffolding inspired by the architecture of the CC licenses. This is not mediated through copyright or the CC licenses. It is governed by something that tends to be even more widely adopted: a social contract. Stand by for the release of the paper prototype of the CC preference signals framework at the end of June 2025. 

While you are here, please consider making an annual recurring donation via our Open Infrastructure Circle. This work will require a large amount of resourcing, over many years, to make happen. 


CC @ SXSW: Protecting the Commons in the Age of AI https://creativecommons.org/2025/04/09/cc-sxsw-protecting-the-commons-in-the-age-of-ai/?utm_source=rss&utm_medium=rss&utm_campaign=cc-sxsw-protecting-the-commons-in-the-age-of-ai Wed, 09 Apr 2025 15:18:38 +0000 https://creativecommons.org/?p=76386 SXSW by Creative Commons is licensed under CC BY 4.0 If you’ve been following along on the blog this year, you’ll know that we’ve been thinking a lot about the future of open, particularly in this age of AI. With our 2025-2028 strategy to guide us, we’ve been louder about a renewed call for reciprocity…

The post CC @ SXSW: Protecting the Commons in the Age of AI appeared first on Creative Commons.

SXSW by Creative Commons is licensed under CC BY 4.0

If you’ve been following along on the blog this year, you’ll know that we’ve been thinking a lot about the future of open, particularly in this age of AI. With our 2025-2028 strategy to guide us, we’ve been louder about a renewed call for reciprocity to defend and protect the commons, as well as about the importance of openness in AI and open licensing to avoid an enclosure of the commons. 

Last month, we took some of these conversations on the road and hosted the Open House for an Open Future during SXSW in Austin, TX, as part of a weekend-long Wiki Haus event with our friends at the Wikimedia Foundation. 

During the event, we spoke with Audrey Tang and Cory Doctorow about the future of open, especially as we look towards CC’s 25th anniversary in 2026. In this wide-ranging conversation, a number of themes emerged that capture both where we’ve been over the last 25 years and where we should be focusing for the next 25, including: 

  • The Fight for Technological Self-Determination: Contractual restrictions are increasingly being used to lock down essential technologies, from printer ink to hospital ventilators. The push for openness and economic fairness must go beyond just content-sharing and extend to fighting for the rights of people to repair, modify, and use technology freely.
  • Shifting from Resistance to Building Alternatives: The open movement is not just about opposing corporate restrictions but also about creating viable, open alternatives. Initiatives like Gov Zero show that fostering decentralized, user-controlled platforms can help counteract monopolistic digital ecosystems.
  • The Power of Exit as a Lever for Change: Simply having the option to leave restrictive platforms can influence corporate behavior. Efforts like Free Our Feeds and Bluesky aim to create credible exit strategies that prevent users from being locked into exploitative digital environments.
  • Beyond Copyright: New Frameworks for Openness and Innovation: While Creative Commons began as a response to copyright limitations, the next phase should focus on broader issues like supporting an infrastructure for open sharing, ethical AI development, and open governance models that empower communities rather than just limiting corporate control.
  • Reclaiming the Ethos of Open Source and Free Software: The movement must reconnect with its ethical roots, focusing on freedom to create, share, and innovate—not just openness for the sake of efficiency. This includes resisting corporate capture of “openness” and ensuring technological advances serve public interest rather than private profit.

Since the proliferation of mainstream AI, we’ve been analyzing the limitations of copyright (and, by extension, the CC licenses, since they are built atop copyright law) as the right lens to think about guardrails for AI training. This means we need new tools and approaches in this age of AI that complement open licensing, while also advancing the AI ecosystem toward the public interest. Preference signals are based on the idea that creators and dataset holders should be active participants in deciding how and/or if their content is used for AI training. Our friends at Bluesky, for example, have recently put forth a proposal on User Intents for Data Reuse, which is well worth a read to conceptualize how a preference signals approach could be considered on a social media platform. We’ve also been actively participating in the IETF’s AI Preferences Working Group, since submitting a position paper on the subject in mid-2024.

SXSW by Creative Commons is licensed under CC BY 4.0

As CC gets closer to launching a protocol based on prosocial preference signals—a simple pact between those stewarding the data and those reusing it for generative AI training—we had the opportunity during SXSW to chat with some great thought leaders about this very topic. Our panelists were Aubra Anthony, Senior Fellow, Technology and International Affairs Program at the Carnegie Endowment for International Peace; Zachary J. McDowell, PhD, Assistant Professor, Department of Communication, University of Illinois at Chicago; Lane Becker, President, Wikimedia LLC at the Wikimedia Foundation; and our very own Anna Tumadóttir, CEO, Creative Commons, to explore sharing in the age of AI. A few key takeaways from this conversation included: 

  • Balancing Norms and Legal Frameworks: There is a growing interest in developing normative approaches and civil structures that go beyond traditional legal frameworks to ensure equitable use and transparency.
  • Navigating AI Traffic and Commercial Use: Wikimedia is adapting to the influx of AI-driven bot traffic and exploring how to differentiate between commercial and non-commercial use. The idea of treating commercial traffic differently and finding ways to fundraise off bot traffic is becoming more prominent, raising important questions about sustainability in an open knowledge ecosystem. From CC’s perspective, we’ve found that as our open infrastructures mature they become increasingly taken for granted, a notion that is not conducive to a sustainable open ecosystem.
  • Openness in the Age of AI: There is growing reticence around openness, with creators becoming more cautious about sharing content due to the rise of generative AI (note, this is exactly what our preference signals framework is meant to address, so stay tuned!). We should emphasize the need for open initiatives to adapt to the broader social and economic context, balancing openness with creators’ concerns about protection and sustainability.
  • Making Participation Easy and Understandable: To encourage widespread participation in open knowledge systems and for preference signal adoption, tools will need to be simple and intuitive. Whether through collective benefit models or platform cooperativism, ease of use and clarity are essential to engaging the broader public in contributing to open initiatives.

Did you know that many social justice and public good organizations are unable to participate in influential and culture-making events like SXSW due to a lack of funding? CC is a nonprofit organization, and all of our activities must recover their costs. We’d like to sincerely thank our event sponsor, the John S. and James L. Knight Foundation, for making this event and these conversations possible. If you would like to contribute to our work, consider joining the Open Infrastructure Circle, which will help to fund a framework that makes reciprocity actionable when shared knowledge is used to train generative AI.


Reciprocity in the Age of AI https://creativecommons.org/2025/04/02/reciprocity-in-the-age-of-ai/?utm_source=rss&utm_medium=rss&utm_campaign=reciprocity-in-the-age-of-ai Wed, 02 Apr 2025 16:17:32 +0000 https://creativecommons.org/?p=76373 Reciprocal Roof (Shed) by Ziggy Liloia is licensed under CC BY-NC 2.0 A lot has changed in the past few years, and it is high time for Creative Commons (CC) to be louder about our values. Underpinning our recently released strategic plan is a renewed call for reciprocity. Neutrality serves only the status quo and…

The post Reciprocity in the Age of AI appeared first on Creative Commons.

Reciprocal Roof (Shed) by Ziggy Liloia is licensed under CC BY-NC 2.0

A lot has changed in the past few years, and it is high time for Creative Commons (CC) to be louder about our values. Underpinning our recently released strategic plan is a renewed call for reciprocity. Neutrality serves only the status quo, and there is nothing neutral about fighting for a more equitable world through open practices and sharing knowledge.

Since the inception of CC, there have been two sides to the licenses. There’s the legal side, which describes, in explicit and legally sound terms, what rights are granted for a particular item. But equally, there’s the social side, which is communicated when someone applies the CC icons. The icon acts as identification, a badge, a symbol that we are in this together, and that’s why we are sharing. Whether it’s scientific research, educational materials, or poetry, when it’s marked with a CC license it’s also accompanied by a social agreement which is anchored in reciprocity. This is for all of us.

But, with the mainstream emergence of generative AI, that social agreement has come into question and come under threat, with knock-on consequences for the greater commons. Current approaches to building commercial foundation models lack reciprocity. No one shares photos of ptarmigans to get rich; no one contributes to articles about Huldufólk seeking fame. It is about sharing knowledge. But when that shared knowledge is opaquely ingested, credit is not given, and the crawlers ramp up server activity (and fees) to the degree where the human experience is degraded, folks are demotivated to continue contributing.

The open movement has always fought for shared knowledge to be accessible for everyone and anyone to use, to learn from. We don’t want to slow down scientific discovery. If we can more rapidly learn, discover, and innovate, with the use of new technologies, that’s wonderful. As long as we’re actually in this together.

What we ultimately want, and what we believe we need, is a commons that is strong, resilient, growing, useful (to machines and to humans)—all the good things, frankly. But as our open infrastructures mature they become increasingly taken for granted, and the feeling that “this is for all of us” is replaced with “everyone is entitled to this”. While this sounds the same, it really isn’t. Because with entitlement comes misuse, the social contract breaks, reciprocation evaporates, and ultimately the magic weakens. 

Reciprocity in the age of AI means fostering a mutually beneficial relationship between creators/data stewards and AI model builders. For AI model builders who disproportionately benefit from the commons,  reciprocity is a way of giving back to the commons that is community and context specific. 

(And in case it wasn’t already clear, this piece isn’t about policy or laws, but about centering people). 

This is where our values need to enter the equation: we cannot sit neutrally by and allow “this is for everyone” to mean that grossly disproportionate benefits of the commons accrue to the few. That our shared knowledge pools get siphoned off and kept from us. 

We believe reciprocity must be embedded in the AI ecosystem in order to uphold the social contract behind sharing.  If you benefit from the commons, and (critically) if you are in a position to give back to the commons, you should. Because the commons are for everyone, which means we all need to uphold the value of the commons by contributing in whatever way is appropriate. 

There never has been, nor should there be, a mandatory 1:1 exchange between each individual and the commons. What’s appropriate then, as a way to give back? So many possibilities come to mind, including:

  • Increasing agency as a means to achieve reciprocity by allowing data holders to signal their preferences for AI training 
  • Credit, in the form of attribution, when possible
  • Open infrastructure support
  • Cooperative dataset development
  • Putting model weights or other components into the commons

When we talk about defending the commons, it involves sustaining them, growing them, and making sure that the social contract remains intact for future generations of humans. And for that to happen, it’s time for some reciprocity.

Part of CC being louder about our values is also taking action in the form of a social protocol that is built on preference signals, a simple pact between those stewarding data and those reusing it for generative AI. Like CC licenses, they are aimed at well-meaning actors and designed to establish new social norms around sharing and access based on reciprocity. We’re actively working alongside values-aligned partners to pilot a framework that makes reciprocity actionable when shared knowledge is used to train generative AI. Consider joining the Open Infrastructure Circle to help us move this work forward.


Why Digital Public Goods, including AI, Should Depend on Open Data https://creativecommons.org/2025/01/27/why-digital-public-goods-including-ai-should-depend-on-open-data/?utm_source=rss&utm_medium=rss&utm_campaign=why-digital-public-goods-including-ai-should-depend-on-open-data Mon, 27 Jan 2025 17:34:43 +0000 https://creativecommons.org/?p=75806 Acknowledging that some data should not be shared (for moral, ethical and/or privacy reasons) and some cannot be shared (for legal or other reasons), Creative Commons (CC) thinks there is value in incentivizing the creation, sharing, and use of open data to advance knowledge production. As open communities continue to imagine, design, and build digital…

The post Why Digital Public Goods, including AI, Should Depend on Open Data appeared first on Creative Commons.

]]>
Acknowledging that some data should not be shared (for moral, ethical and/or privacy reasons) and some cannot be shared (for legal or other reasons), Creative Commons (CC) thinks there is value in incentivizing the creation, sharing, and use of open data to advance knowledge production. As open communities continue to imagine, design, and build digital public goods and public infrastructure services for education, science, and culture, these goods and services – whenever possible and appropriate – should produce, share, and/or build upon open data.

Open Data by Auregann is licensed under CC BY-SA 3.0.

Open Data and Digital Public Goods (DPGs)

CC is a member of the Digital Public Goods Alliance (DPGA) and CC’s legal tools have been recognized as digital public goods (DPGs). DPGs are “open-source software, open standards, open data, open AI systems, and open content collections that adhere to privacy and other applicable best practices, do no harm, and are of high relevance for attainment of the United Nations 2030 Sustainable Development Goals (SDGs).” If we want to solve the world’s greatest challenges, governments and other funders will need to invest in, develop, openly license, share, and use DPGs.

Open data is important to DPGs because data is a key driver of economic vitality with demonstrated potential to serve the public good. In the public sector, data informs policymaking and public service delivery, helping to channel scarce resources to those most in need, hold governments accountable, and foster social innovation. In short, data has the potential to improve people's lives. When data is closed or otherwise unavailable, the public does not accrue these benefits.

CC was recently part of a DPGA sub-committee working to preserve the integrity of open data as part of the DPG Standard. This important update to the DPG Standard was introduced to ensure only open datasets and content collections with open licenses are eligible for recognition as DPGs. Under the new requirement, open datasets and content collections must meet the following criteria to be recognized as digital public goods.

  1. Comprehensive Open Licensing:
    1. The entire dataset/content collection must be under an acceptable open license. Mixed-licensed collections will no longer be accepted.
  2. Accessible and Discoverable:
    1. All data sets and content collection DPGs must be openly licensed and easily accessible from a distinct, single location, such as a unique URL.
  3. Permitted Access Restrictions:
    1. Certain access restrictions – such as logins, registrations, API keys, and throttling – are permitted as long as they do not discriminate against users or restrict usage based on geography or any other factors.
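To make the three criteria concrete, here is a minimal validation sketch. The metadata schema is our own illustration for this post, not something the DPG Standard defines, and the license list is an example subset:

```python
# Illustrative only: the DPG Standard defines requirements in prose,
# not this metadata schema; field names here are assumptions.

OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"}  # example subset

def meets_criteria(dataset: dict) -> bool:
    """Check the three requirements: one open license covering the
    whole collection, a single discoverable location, and no
    discriminatory access restrictions."""
    # 1. Comprehensive open licensing: one license, no mixed collections.
    single_open_license = dataset.get("license") in OPEN_LICENSES
    # 2. Accessible and discoverable from a distinct, single location.
    discoverable = bool(dataset.get("url"))
    # 3. Logins, API keys, and throttling are fine; geographic or
    #    otherwise discriminatory restrictions are not.
    non_discriminatory = not dataset.get("geo_restricted", False)
    return single_open_license and discoverable and non_discriminatory

example = {
    "license": "CC-BY-4.0",                # entire collection, one license
    "url": "https://example.org/dataset",  # single distinct location
    "geo_restricted": False,               # login allowed, geo-block not
}
print(meets_criteria(example))  # True
```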

The DPGA writes: “This new requirement is designed to increase trust and confidence in all DPGs by ensuring that users can fully engage with solutions without concerns over intellectual property infringement. Simplifying access and usage aligns with the DPGA’s goal of making DPGs truly open and accessible for widespread adoption… it helps foster an environment and ecosystem where innovation can thrive without legal uncertainties.”

AI and Open Data

As CC examines AI and its potential to be a public good that helps solve global challenges, we believe open data will play a similarly important role.

CC recognizes AI is a rapidly developing space, and we appreciate everyone's diligent work to create definitions, recommendations, guidance, and warnings about AI. After two years of community consultation, the Open Source Initiative released version 1.0 of the Open Source AI Definition (OSAID) on October 28, 2024. This definition is an important step in starting the conversation about what open means for AI systems. However, the OSAID's data sharing requirements remain contentious, particularly around whether and how training data for AI models should be shared.

CC believes that the difficulty of building and releasing open datasets is no reason not to encourage it. In cases where training data should not or cannot be shared, we encourage detailed summaries that explain the contents of the dataset and give instructions for reproducibility, but that data should nonetheless be defined as closed. When data can be made open and shared, it should be.

We agree with Liv Marte Nordhaug, CEO, Digital Public Goods Alliance who said in a recent post: “With regards to AI systems, there is a need to ensure that we don’t inadvertently undermine the open data movement and open data as a category of DPGs by advancing an approach to AI systems that is more permissive than for other categories of DPGs. Maintaining a high bar on training data could potentially result in fewer AI systems meeting the DPG Standard criteria. However, SDG relevance, platform independence, and do-no-harm by design are features that set DPGs apart from other open source solutions—and for those reasons, the inclusion of [AI] training data is needed.”

Next Steps

CC will continue to work with the DPGA and other partners as it develops a standard for what qualifies an AI model as a digital public good. In that arena we will advocate for open datasets and for a tiered approach, so that components of an AI model can be recognized as digital public goods without the entire model needing every component openly shared. Updated recommendations and guidelines that recognize the value of fully open AI systems that use and share open datasets will be an important part of ensuring AI serves the public good.



]]>