Monday, November 07, 2022

Machine learning and rights of use

A while ago, I wrote a bit about AI-based image generators. Such generators were trained using tons of images and can generate images based on text or image prompts. Besides generating images with an understanding of the context of the prompts, they can also imitate learned styles.

Recently, a lawsuit was filed over GitHub Copilot, alleging infringement of rights. GitHub Copilot was trained on the source code of open-source projects hosted on GitHub. You would think that using open-source code should not be a problem, right? Well, that is a misunderstanding of open-source software. Open source does not mean free of conditions. While most open-source projects cost nothing to use, they usually come with licenses that limit the scenarios in which the code can be used. For example, some publicly available code is free only for private, non-commercial use. Other projects require some form of attribution if their code is used in other projects.

The problem is that GitHub Copilot does not provide such attribution. The user is given a piece of code that has been learnt by the AI. However, that piece of code may have been lifted from an open-source project as-is, and the original project may state limitations in its license about how its source code can be reused. Users of GitHub Copilot may thus run the risk of being sued for infringement of rights if they violate those licenses. Worse, they are unlikely to know that they have violated a license until they get sued, because GitHub Copilot tells them neither where the "suggested code" comes from nor which license applies to it.

Will AI image generators be the next to be sued?

Why? Because AI image generators could similarly have been trained on copyrighted images, or on images with a license governing their use and reuse. The end user, however, will not know. So it may well be the case that a user uses a generated image and then gets sued by the original illustrator for violation of rights. Of course, for images, it is a lot harder to assert such rights unless the generated image is an exact copy of the original. Still, we have heard criticism of Chinese animators imitating the style of Japanese anime. With AI image generators, such imitation becomes easier, whether intentional or not. Imagine a game creator using such an AI image generator to create visuals, only to have the copyright owners of Gundam come knocking on their door for copyright infringement because the mechas in the game look like those in some part of the Gundam universe (which is huge, diverse, and difficult for a single person to know unless they are a hardcore fan, but nonetheless all copyrighted).

So let's not train on copyrighted material. Let's just use images that various illustrators have posted on the internet for free use. Well, unfortunately, these illustrators may, as with open-source software, have attached a license governing how their works can be reused. Imitating their works carries the same risk of rights infringement.

What does this mean? Well, I guess the first issue to be addressed is the question of attribution. If something is generated by AI based on the work of a person or entity, that outcome needs to be attributed to the source. Another is for the AI generator to be told what limits to work within. For example, it could generate images based only on images that have been licensed for non-commercial reuse, or suggest code only from projects whose licenses are compatible with the target project.
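To make those two ideas a bit more concrete, here is a minimal Python sketch of what "suggest code only from compatible licenses, with attribution" might look like. Everything in it is hypothetical: the `Snippet` type, the `suggest` function, and the small `COMPATIBLE_SOURCES` table are my own illustration, not how GitHub Copilot or any real tool works, and the table is a simplification rather than legal advice.

```python
from dataclasses import dataclass

# Hypothetical table: license of the target project -> licenses of source
# projects whose code may be reused in it. Illustrative only, not legal advice.
COMPATIBLE_SOURCES = {
    "MIT": {"MIT", "BSD-3-Clause"},
    "GPL-3.0-only": {"MIT", "BSD-3-Clause", "Apache-2.0", "GPL-3.0-only"},
}


@dataclass(frozen=True)
class Snippet:
    code: str
    source_project: str
    license_id: str  # SPDX-style identifier, e.g. "MIT"


def suggest(candidates, target_license):
    """Keep only snippets whose license is allowed for the target project,
    and pair each one with an attribution string."""
    allowed = COMPATIBLE_SOURCES.get(target_license, set())
    return [
        (s.code, f"From {s.source_project} ({s.license_id})")
        for s in candidates
        if s.license_id in allowed
    ]


candidates = [
    Snippet("def add(a, b): return a + b", "project-a", "MIT"),
    Snippet("def sub(a, b): return a - b", "project-b", "GPL-3.0-only"),
]

# Only the MIT snippet is offered to an MIT-licensed target project.
for code, attribution in suggest(candidates, "MIT"):
    print(attribution)
    print(code)
```

The point of the sketch is that the generator has to remember where each suggestion came from and under which license. Without that record, neither the filtering nor the attribution is possible.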

By the way, other forms of AI generators carry the same risk. For example, there are AI generators for text. Give one a prompt, and it will generate a paragraph or even an entire article for you. Unfortunately, you may end up being sued for plagiarism if the AI has, without your knowledge, lifted passages wholesale from some related work. This is the problem with current machine learning techniques: they are based on using AI to learn from existing materials. Most of these sources, however, are not free; they come with limitations on how they can be used, reused, or reproduced. Until we can develop a method for AI to create things without having to learn from tons of existing human works, the risk of infringing the rights of other people will always be there.

Update 22 January 2023: An article about copyright issues of AI art generators.
