February 1, 2023
ChatGPT is a cutting-edge program created by OpenAI that has the ability to understand and communicate in human language, making it an incredibly powerful tool for users. It is designed to assist you in a wide range of tasks such as answering questions, having a conversation, generating text and much more.
The above paragraph was generated using ChatGPT. It took longer to type in the prompt “Write a short introduction explaining ChatGPT in optimistic terms” than it did for the program to spit out a result. It would be just as easy to generate an article on any subject you can think of, create copy for a website, or hold a conversation.
From a copyright law point of view, this new technology raises questions about prose that closely resembles human-generated copyrighted works: is it subject to copyright protection in the same way? If so, who owns the copyright? And who is liable when AI-generated text infringes existing copyrighted works?
ChatGPT is a powerful, disruptive new technology emerging from the shadows of Silicon Valley, a program that generates convincingly human writing from simple inputs. A product developed by OpenAI, a Delaware-based nonprofit operated from San Francisco, ChatGPT is a consumer-friendly web program that showcases the capabilities of OpenAI’s Generative Pretrained Transformer (GPT) technology.
ChatGPT is an application built on top of GPT-3.5 and was launched as a publicly available prototype in November 2022. Since then, it has quickly attracted attention from the general public, as it is the first exposure many consumers have had to user-friendly, modern language-modeling software.
Language models are probability distributions over sequences of words. In simple terms, they generate text one word at a time, choosing each next word according to how likely it is to follow the words that came before. They do not think or formulate ideas the way humans do, and this is important to remember when considering their position in intellectual property law.
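To make the idea concrete, here is a deliberately tiny sketch in Python. It is not how GPT works internally: it is a hypothetical “bigram” model that estimates the probability of each word given only the previous word, by counting word pairs in a toy corpus, and then generates text by repeatedly picking the most likely next word. Real models such as GPT-3.5 condition on far longer context using a neural network with billions of learned parameters, and typically sample from the distribution rather than always taking the top word. The function names and the corpus are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Estimate P(next word | current word) from word-pair counts (toy assumption)."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    # Normalize each word's follower counts into a probability distribution.
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {w: c / total for w, c in followers.items()}
    return model

def generate(model, start, length):
    """Greedy generation: repeatedly append the single most likely next word."""
    output = [start]
    for _ in range(length):
        followers = model.get(output[-1])
        if not followers:
            break  # no known continuation for this word
        # max() returns the first word seen among ties.
        output.append(max(followers, key=followers.get))
    return " ".join(output)

corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigram_model(corpus)
print(model["the"])              # {'cat': 0.5, 'mat': 0.25, 'sofa': 0.25}
print(generate(model, "the", 4))  # the cat sat on the
```

In this toy corpus “cat” is followed by “sat” and “slept” equally often, and the sketch simply takes the first one it saw. The output is recombined from patterns in the training text rather than composed from ideas, which is the point the paragraph above makes about language models.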
It’s easy to think of a language model like ChatGPT as an individual, and it’s tempting to think that it might be defined legally as such, treated similarly to a natural person, and be able to create unique intellectual property. But the truth is much more complicated.
Language models are created using a technique called machine learning, a branch of computer science that allows computers to improve at a task by learning from data. These models are “trained” by feeding them information, then providing feedback based on how well they produce an output. The text that ChatGPT “writes” is not organically synthesized; it is pieced together using some 175 billion parameters, numerical weights learned by digesting huge amounts of human text.
Around 60% of this text was gathered through Common Crawl, a web-archive service containing petabytes of data freely collected from the web, including copyrighted works. As a result, issues can arise when companies start treating the outputs of AIs as wholly original creations.
Intellectual property law is difficult to apply to the outputs of AI, because the actual creator of the content is not easily identified. Is it the AI itself? ChatGPT might seem to write text organically, but it cannot do so without external input, and it is not a legal person. Is it the prompter? The company that created the AI? The creators of the data used to train it? Clearly, the content comes from somewhere, but it’s hard to pin down a single source.
Is a creation made by a non-human source even subject to copyright at all? For now, the U.S. Copyright Office will not register works created by artificial intelligence, because they lack the human authorship necessary to support a copyright claim. In effect, this policy places such output in the public domain. As for ownership of outputs derived from the training data, we would need to assess the origin of the data, the ownership of the database, and the similarities between the training data and the outputs.
Similar questions arise regarding content created by AI sources that allegedly infringes existing works. Given that these systems generate text and images based on earlier copyrighted creations, should AI-generated content be regarded as infringing those works? Note that whether Common Crawl data may be used for commercial purposes depends on local intellectual property law. In countries without fair use laws, such as Germany, commercial use of Common Crawl data is much more complicated (although that system is not without workarounds). Common Crawl data is collected and distributed in the US under fair use claims, and on that basis it is considered legal for an organization like OpenAI to use in training its models. However, to what extent does the fair use doctrine, where it applies, grant a defence against copyright infringement, and when should we find that the rationales behind it (enabling individuals to express themselves and enriching the world of human creation) are not served by applying it? If we conclude that infringement has occurred, who is the infringer? The prompter? The company operating the AI?
On January 13th, a class action lawsuit (Sarah Andersen et al v. Stability AI Inc. et al) was filed against three companies that use machine learning to generate artwork: Stability AI, Midjourney, and DeviantArt. The plaintiffs, a group of artists, allege that these companies violated IP law by using their protected artwork to train generative software. The complaint argues that “Ultimately, it is merely a complex collage tool”. OpenAI offers a similar program, DALL-E 2, and although OpenAI is not named in the lawsuit, it could face similar litigation.
ChatGPT raises additional legal issues relating to identical outputs and harmful outputs. The requirement of independent creation may prevent two or more individuals who received the same output from ChatGPT from enforcing copyright against each other. Alternatively, the outcome could depend on which output was generated first: the second user might be found to infringe the first user’s rights, and the first user could then also have a claim against ChatGPT and hold it liable. If ChatGPT generates an identical replica of a copyrighted work, or generates defamatory content, this could create liability, which raises the question of who is liable in that case: ChatGPT or its users.
OpenAI is no stranger to IP law issues. Following the release of ChatGPT and the attention it received, OpenAI Inc. roughly doubled the size of its legal team. In early November 2022, before the release of ChatGPT, a class-action lawsuit alleging violations of intellectual property law was filed against GitHub, Microsoft, and OpenAI in DOE 1 et al v. GitHub, Inc. et al. The complaint alleges that these companies ignored and violated software licenses on a wide scale while training language models that help programmers write code quickly.
The products in question, GitHub Copilot and OpenAI Codex, are built using the same technology that powers ChatGPT. According to the class complaint, “Copilot ignores, violates, and removes the Licenses offered by thousands—possibly millions—of software developers, thereby accomplishing software piracy on an unprecedented scale.” As AI becomes more accessible and widespread, it’s important to keep its potential legal implications in mind.
Whether you’re interested in working with ChatGPT or have intellectual property that could be at risk, it’s worth learning more about the nature of these technologies and their place in the legal system.
ChatGPT is a public-facing application of language models, but the true capabilities of AI, and what they mean for IP law, extend far deeper. To try out ChatGPT and OpenAI’s other offerings, visit OpenAI’s website.