From: "Theo de Raadt" <deraadt@openbsd.org>
Subject: Re: [patch] ext4fs rw
To: Christian Schulte <cs@schulte.it>
Cc: Damien Miller <djm@mindrot.org>, Thomas de Grivel <thodg@kmx.io>, tech@openbsd.org
Date: Mon, 23 Mar 2026 00:20:51 -0600

Christian Schulte <cs@schulte.it> wrote:

> On 23.03.2026 at 03:24, Damien Miller wrote:
> > On Sat, 21 Mar 2026, Theo de Raadt wrote:
> > 
> >> I have looked at the diffs.
> >>
> >> There is a claim that University of California holds copyright over large
> >> chunks of code which are new.  These are perhaps mostly copied, but have
> >> been changed in novel ways.  I didn't dig deep enough to decide if the
> >> changes are trivial or complicated, I just looked at the volume.
> >>
> >> There is a different claim that you hold copyright over large chunks of
> >> new code.
> > 
> > This is IMO the essence of the problem here. This isn't using AI as a
> > code review, refactoring or merely mechanical tool, but instead using
> > it in context where it is writing code in (what appears to be) excess
> > of the originator's knowledge and skill. This code in question is highly
> > specific too, it's not like "go draw a stick figure of a person" and
> > more like "go paint Salvador Dali's Metamorphosis of Narcissus".
> > 
> > Who is the copyright holder in this case?
> I am kind of lost in this thread. Regarding the initial request to merge
> support for an ext4 filesystem into base, I searched for documentation
> about that ext4 filesystem in question. I found some GPL licensed wiki
> pages. The majority of available documentation either directly or
> indirectly points at GPL licensed code. In my understanding of the issue
> discussed in this thread this already introduces licensing issues. Even
> if you would write an ext4 filesystem driver from scratch for base, you
> would almost always need to incorporate knowledge carrying an illiberal
> license. I would rate that a show stopper, but I may be misunderstanding
> the issue at hand.

People are allowed to read code which handles a data format, then write
new code which handles that data format.  This is permitted for the purposes
of compatibility.  Look around you, our world is full of reimplemented
codebases for the societally beneficial interest of having compatible file
formats and protocols.

The new code would be based upon the understanding of a human, which is
why it can be put under Copyright, and after various reuse rights are
granted, it can be integrated by code aggregators like OpenBSD.

The very first time we heard about these diffs, there was a claim that the
"author" avoided looking at the original code, but instead had the AI do this
for him.  I have to take that original claim at face value.  An AI cannot
write new code because it is instead just regurgitating a variation of the
old code.  It is not following the human process which is permitted to
create a new work for compatibility and then access the legal structure we
rely on.

But the story has been changing, and the AI did almost nothing.  I'm going
to step away and hope that another human untaints this process.