{"id":76543,"date":"2025-05-15T17:32:39","date_gmt":"2025-05-15T17:32:39","guid":{"rendered":"https:\/\/creativecommons.org\/?page_id=76543"},"modified":"2025-05-28T14:00:51","modified_gmt":"2025-05-28T14:00:51","slug":"using-cc-licensed-works-for-ai-training-2","status":"publish","type":"page","link":"https:\/\/creativecommons.org\/using-cc-licensed-works-for-ai-training-2\/","title":{"rendered":"Using CC-Licensed Works for AI Training"},"content":{"rendered":"

The application of copyright law to AI training is complex.\u00a0<\/span><\/p>\n

Around the world, there are varying exceptions and limitations to copyright that permit AI training. Jurisdictions also often consider the specific uses of the copyrighted work within AI training and outputs.\u00a0<\/span><\/p>\n

Reminder: CC licenses are copyright licenses.<\/span><\/i><\/p>\n

That means that applying CC licenses to AI training is just as complex.<\/span><\/p>\n

In an effort to untangle some of this complexity, we have prepared the following primer that tackles the issues from two distinct angles.\u00a0<\/span><\/p>\n

    \n
  1. Practical guidance for how to follow the CC licenses for training data, even in situations where copyright may not require it.\u00a0<\/span><\/li>\n
  2. A legalistic approach, analyzing <\/span>when<\/span><\/i> compliance with the CC licenses is legally required.\u00a0<\/span><\/li>\n<\/ol>\n

    This primer may also help creators who apply CC licenses to their works better understand how those works may be used in AI training.<\/span><\/p>\n

    How to Follow the CC Licenses for Training Data*<\/span><\/h2>\n

    *Even in situations where copyright may not require it.\u00a0<\/span><\/p>\n

    Keep in mind: following this guidance is likely to lead to overcompliance with both copyright law and CC license terms. It assumes the most restrictive legal interpretation for those who wish to take a conservative approach and minimize risk.\u00a0<\/span><\/p>\n

    What follows is a quick look at CC-licensed content and AI training. Please review the full analysis<\/a> for more details.<\/span><\/p>\n

    Attribution (BY)<\/b><\/p>\n

    All CC licenses require attribution to the creator(s) of the licensed material. For AI model training, attribution could be a simple link to the source of the dataset used to train the model. Where retrieval-augmented generation (RAG) or other methods make it possible, the ideal is to attribute the specific CC-licensed work behind a particular model output, with a link to its source.<\/span><\/p>\n

    ShareAlike (SA)<\/b><\/p>\n

    CC BY-SA<\/span><\/a> and <\/span>CC BY-NC-SA<\/span><\/a> require that adaptations be shared under the same license. If AI models or outputs are based on ShareAlike content and will be shared publicly, following the ShareAlike condition would require AI developers to release them under the same CC license as the original works.<\/span><\/p>\n

    NonCommercial (NC)<\/b><\/p>\n

    CC BY-NC<\/span><\/a> and <\/span>CC BY-NC-SA<\/span><\/a> give permission for NonCommercial uses only. If AI training data includes NC-licensed works, then following the NC restriction would require that no stage, from copying the data during training to sharing the trained model, be for commercial gain.<\/span><\/p>\n

    NoDerivatives (ND)<\/b><\/p>\n

    CC BY-ND<\/span><\/a> and <\/span>CC BY-NC-ND<\/span><\/a> prohibit creating derivative works. Following the NoDerivatives restriction would require that ND-licensed content not be used as training data.<\/span><\/p>\n

    An Analysis of When Compliance with the CC Licenses Is Legally Required<\/span><\/h2>\n

    I\u2019m an AI developer. Do I Have to Comply With the CC License?<\/span><\/i><\/p>\n

    It depends. CC licenses apply only when copyright permission is required. If an exception or limitation to copyright applies, then the CC license terms don\u2019t apply.<\/span><\/p>\n

    Step 1 – Does copyright apply?<\/span><\/h3>\n

    There are two main aspects to the analysis of whether copyright applies when using CC-licensed works as inputs to training.\u00a0\u00a0<\/span><\/b><\/p>\n