This is a summary of Bernd Fiten's (Senior Associate at Timelex) presentation on the legal aspects of Artificial Intelligence (AI) at Incuebrux on 24 May 2023. Given the short time frame of the presentation (10 minutes), this summary contains only a very brief introduction to some legal aspects of AI. The speaker chose to cover three aspects: the proposal for a European AI Act and two legal topics of particular interest in the context of AI, namely copyright and data protection. Other aspects, such as trade secrets and lawyers' professional secrecy, are also relevant, but were not covered in the presentation.
A text for the proposed AI Act, a European regulation, already existed in 2021. However, things have evolved since then, including the arrival of OpenAI’s ChatGPT at the end of 2022. That is why the European Parliament proposed a number of amendments to the text of the AI Act in early May 2023. The new text takes into account specific AI systems, such as ChatGPT, which were previously not covered by the proposed regulation. Given the specific risks posed by this type of AI, that gap led to considerable legal uncertainty. The amendments now include definitions of “foundation models” and “general purpose AI” systems, which aim to cover ChatGPT and similar AI tools.
The purpose of the AI Act is twofold: it aims to mitigate the risks relating to AI, but also to foster innovation in this field. This is important, since AI certainly has many positives, but we have since discovered that it also carries risks. The AI Act is drafted to address those risks, first and foremost by banning certain forms of AI: those posing the greatest societal risks, which society considers unacceptable. Examples include AI systems used for social scoring: evaluating or classifying individuals based on their social behavior, socio-economic status or their known or predicted personality traits.
Looking at what an AI system means under the AI Act, the definition in the original proposal referred to specific technologies. The amendments propose a much broader, technology-neutral definition that also refers to different levels of autonomy.
As already mentioned, there will be a ban on AI systems whose risk is unacceptably high. For other systems whose risk is high (but not unacceptably so), the AI Act will impose specific obligations. These include conformity assessments, registration before placing the AI system on the market, and certain monitoring and notification obligations. To monitor compliance, a supervisory authority will also be established.
In addition, there will also be certain transparency obligations towards the persons interacting with and/or affected by the AI system. The main goal is that the AI system must be explainable (which will not always be obvious) and that it must be clear to users that they are interacting with an AI system.
In terms of copyright law, a legal distinction must be made between input and output.
AI systems usually need a lot of data in the development and training phase. That input may, of course, include copyrighted material, for which you need permission (or a license) from the copyright owner. During the discussion in the European Parliament, it appeared that some members wished to prohibit the use of copyrighted material to train AI models. From a practical perspective, this would make it almost impossible to properly train an AI model. After discussions, a middle ground was chosen through the introduction of a transparency requirement: the amendments provide that the AI model owner must disclose the copyrighted material used for training.
The focus of the presentation was primarily on the rights the user of the AI system may have to its output. Think, for example, of a response by ChatGPT or an image created by DALL-E.
As it stands, the ownership rights to that output may be determined by the terms of use of the AI tool. For example, ChatGPT stipulates that you retain all rights to the output. On the other hand, tools such as Midjourney (for generating images) state that you only acquire a license. The terms and conditions may also depend on whether you have a pro license, and thus whether or not you pay to use the AI tool. Please note that this ownership right is distinct from the actual copyright to the output.
Even if you obtain ownership rights to the output, the question remains whether you can claim any copyright at all on output generated by an AI tool. Here, it is useful to distinguish between output generated solely by AI systems and output merely supported by AI systems. With current AI systems, we are mostly talking about the latter: it takes a prompt from a human to put the AI (e.g., ChatGPT) to work.
Suppose you generate an image and the AI tool grants you all rights to the output. Can you claim copyright to it? First and foremost, the conditions of copyright (i.e., a work, a mode of expression, and originality) must be met. In the context of AI, the last condition (the originality condition) is the most interesting. The first condition, that of a "work", may also be interesting, since the work must have been made by a human being. That seems to exclude output generated solely by an AI from copyright protection. However, we will not go deeper into this in this short exposé.
Thus, to meet the originality condition, the work must constitute the author's own intellectual creation. The work must be an expression of the author’s personality, through free and creative choices made at various points in the creation process. That process has roughly three phases: design, execution, and editing. In each of these phases, the degree of the author's (or user's) expression differs.
Applying this to AI systems, the user’s expression is most dominant when he or she writes the prompt. The AI model then goes to work on that prompt in the execution phase; in this second phase, the AI is dominant. Finally, there is an output, which the user will likely edit or modify. Here the user may again become dominant, but not necessarily so. This leads to the question: who or what had the greatest impact on the expression in the final output?
To determine this, a case-by-case assessment is required to establish whether the AI-supported output can be protected under copyright law. For the avoidance of doubt, the mere fact that AI, or more generally technology, was used to create a work does not necessarily prevent copyright protection. The Court of Justice already confirmed this in the famous Painer case.
To help demonstrate the importance of the user's influence on the final output, a user who wishes to be sure that the output will be protected under copyright law would do well to document the creation process as thoroughly as possible, so that he or she can show where free and creative choices were made.
In terms of data protection, a distinction must be made between roughly two processing activities: on the one hand, data is needed to train the model; on the other hand, the model also generates its own output. It cannot be excluded that this output contains personal data.
The main thing to remember here is that in many (free) AI models, inputs may (and will) be used to improve the model. In other words, if you provide personal data to ChatGPT, that data can be used to improve the model, which constitutes further processing. In addition, it is not impossible that such data may subsequently appear in another user’s output.
If we apply the obligations of the GDPR to these processing activities, many uncertainties arise, because of a tension between how AI models work and how the GDPR wants us to handle personal data. For example, the GDPR contains the principle of data minimization: do not process more data than necessary to achieve the purpose. AI models, however, benefit from being fed large amounts of data, without it being known whether the data fed to them is actually necessary.
That tension also exists when choosing a legal basis. Many AI models are trained on public data from the Internet, for which the consent of the data subjects obviously cannot be sought. In general, this leaves only legitimate interest as a possible legal basis. However, legitimate interest requires a balancing of interests, and it remains to be seen what that balancing would look like. Finally, this field of tension is also reflected in the questions of how data subjects can be informed and how certain data can be removed from the AI model.
That such a field of tension exists was also evidenced by the fact that the Garante, the Italian data protection authority, immediately imposed a temporary ban on the use of ChatGPT in Italy, through an urgent decision of its President. The Garante had several objections, one of the main ones being the legal basis invoked by OpenAI. Although the Garante has since lifted the ban, it is still not fully transparent how OpenAI was able to reassure the authority. We can only observe that an opt-out form is now displayed more prominently on the OpenAI website.
The European Data Protection Board (EDPB) has also established a ChatGPT Task Force to develop recommendations on these new issues. However, there is little news on this, and no timing is known. We will have to wait and see how this evolves and whether the European supervisory authorities will reach a common position.
For more information about the event, visit the INCUEBRUX event page.