Recently, a project called "MALUS" went viral in the software development community. MALUS claims that, by pointing an AI agent at the contents of an open-source project, it can produce analogous software that bypasses the commercial restrictions of "copyleft" copyright licences (such as the GPL or LGPL), because the original source code is never actually copied. MALUS purportedly achieves this via a "Clean Room as a Service".
To understand why this is a problem, we need to understand some history. Back in the 1980s, IBM dominated the PC market, largely because they held the copyright to their proprietary BIOS. Competitors like Compaq wanted to build IBM-compatible clones but couldn't legally copy the BIOS code.
Their solution was a "clean room" design. Compaq used two completely separate and isolated teams. Team A reverse-engineered the IBM BIOS and wrote a list of functional specifications. Team B, consisting of a group of engineers who had absolutely no access to IBM’s original code, used only those specifications to write a brand new, compatible BIOS. Because Team B only recreated the functionality (idea) and never copied the actual code (expression), IBM's copyright wasn't infringed.
Historically, clean-room reverse engineering was painfully slow and astronomically expensive. Today, however, AI-powered reverse engineering appears to have overcome this hurdle: MALUS deploys one large language model as "Team A" and a second as "Team B", and never the twain shall meet.
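As a purely hypothetical sketch (MALUS's actual implementation is not public), the two-team separation might look something like the following, where `complete` stands in for any large-language-model completion call and the `guard` function enforcing the "wall" between the teams is my own naive illustration, not a claim about how MALUS works:

```python
# Purely illustrative sketch of a two-model "clean room" pipeline.
# MALUS's internals are not public; complete() is a hypothetical stand-in
# for any LLM completion call, and guard() is a naive safeguard only.

def team_a_spec(source_code: str, complete) -> str:
    """Team A: reads the original code and writes a functional specification."""
    return complete(
        "Describe in plain English what this program does. "
        "Do not quote any of the code itself:\n" + source_code
    )

def guard(spec: str, source_code: str) -> str:
    """The 'wall' between teams: reject a spec that leaks source lines verbatim."""
    source_lines = {line.strip() for line in source_code.splitlines() if line.strip()}
    for line in spec.splitlines():
        if line.strip() in source_lines:
            raise ValueError("specification leaks original expression")
    return spec

def team_b_implementation(spec: str, complete) -> str:
    """Team B: sees only the vetted specification, never the original code."""
    return complete("Write a new program satisfying this specification:\n" + spec)
```

Even in this toy form, the legal weight rests entirely on the middle step: whether a machine-generated specification truly carries across only the "idea" and none of the protected "expression".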
Copyright subsists in the specific "expression" of an idea, not in the idea itself. The argument, therefore, is that so long as the second language model never receives any of the original source code and works only from the specification created by the first, the original open-source licence, which may require derivative works to be released under the same licence, is bypassed entirely, whilst new, functionally equivalent code is still produced.
From a UK legal standpoint, this targets a fundamental vulnerability in copyright law: the idea–expression dichotomy alluded to above. If an AI agent merely extracts unprotectable "ideas" from protected source code, it operates in a legal grey area that copyright law struggles to handle.
A key question that has not yet been fully answered by the courts is whether, even though the second large language model (Team B) never directly reviews the source code at inference time, the final output is nonetheless a copy of the expression in that source code by virtue of how the second model was trained.
While MALUS appears to be a satirical warning, the potential legal loophole it highlights is very serious. One key consideration in light of MALUS is not to rely solely on copyright: re-evaluate whether patent protection, which unlike copyright can protect the underlying functionality of computer code (the "ideas"), is a more suitable option. Another is to protect your software documentation if you are an open-source publisher: minimise the amount of publicly available documentation that might aid an AI agent in creating a specification of your code.
A final thought: when it comes to copyright and patent law, computer code sits in a bizarre place. Remember that the general concept of "Clean Room as a Service" using AI agents has not actually been tested in court and may, despite MALUS's claims, nonetheless be found to infringe copyright.
This article is not to be construed as legal advice.