From: Alexander Bluhm <bluhm@openbsd.org>
Subject: Re: limit softnet threads to number of cpu
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: Theo de Raadt <deraadt@openbsd.org>, mlarkin@nested.page, tech@openbsd.org
Date: Mon, 8 Sep 2025 23:42:42 +0200

On Mon, Sep 08, 2025 at 10:02:44PM +0200, Mark Kettenis wrote:
> > From: "Theo de Raadt" <deraadt@openbsd.org>
> > Date: Mon, 08 Sep 2025 13:15:49 -0600
> > 
> > Mike Larkin <mlarkin@nested.page> wrote:
> > 
> > > I had a long reply typed up about the fact that we probably know the
> > > number of CPUs by the time the first network driver attaches but then
> > > realized that doesn't then solve the question of when the softnet tasks
> > > would be created. It would have to be shimmed in somewhere after cpuX
> > > but before any network driver, and that would be less clean than your
> > > approach below.
> > 
> > I think we had an architecture where the other cpu's attached very,
> > very late.
> > 
> > Was it sgi?
> 
> Not sure about sgi, but yes it used to be all over the place.  We
> moved to an "attach CPUs first/early" model in part because we wanted
> to distribute the interrupts for multiqueue NICs over different CPUs.
> 
> That is somewhat related to the issue being discussed here.  At least
> my understanding is that NIC queues are tied to softnet threads.  I am
> wondering if creating the softnet threads "on demand" when the network
> drivers attach would make sense.

Threads "on demand" is what dhill@ implemented: softnet_count()
set things up the first time it was called.  But this looks very
fragile to me, as it depends on the order in which other code runs.
In particular, it requires that the first call to softnet_count()
happens before any other cpu is started, to avoid races.  If we
really want this logic, we can discuss that separately.
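
A minimal sketch of why a lazily computed count is order dependent.
This is a userland toy, not the code dhill@ wrote; struct lazy_softnet,
cpu_attach() and lazy_softnet_count() are hypothetical names, and the
assumption is that the count is frozen to whatever number of cpus is
known at the first call.

```c
#include <assert.h>

/* Hypothetical state: cpus attached so far and the cached thread count. */
struct lazy_softnet {
	int ncpus;	/* starts at 1, cpu0 is always there */
	int count;	/* 0 means "not decided yet" */
};

/* Hypothetical stand-in for a secondary cpu attaching. */
void
cpu_attach(struct lazy_softnet *s)
{
	s->ncpus++;
}

/*
 * Decide the thread count on first use and keep it afterwards.
 * The unlocked check-then-set is only safe while a single cpu runs.
 */
int
lazy_softnet_count(struct lazy_softnet *s)
{
	if (s->count == 0)
		s->count = s->ncpus;
	return s->count;
}
```

If the first call comes after three cpu_attach() calls the result is 4;
if it comes before them the result stays 1, no matter how many cpus
show up later.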

> But I also think that at some point
> we need a way to rebalance the interrupts over the available CPUs.
> 
> Anyway, while I'm not entirely happy with the diff, I don't think that
> should block it.

My diff would not make architectures with late cpu attach worse.

Order on amd64 is:
- init task queues and threads
- autoconf cpu
- autoconf drivers, use number of cpu for interrupts
- destroy threads in excess of the number of cpu
- start remaining threads

When cpu attaches late:
- init task queues and threads
- autoconf drivers, use 1 cpu for interrupts
- autoconf cpu
- destroy threads in excess of the number of cpu
- start remaining threads

In both cases the number of threads ends up equal to the number of
cpu.  With late cpu attach, as kettenis@ explained, drivers would only
use cpu0 for interrupts.  My diff does not change that.
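
The two orderings above can be sketched as a small userland simulation.
This is not the diff itself; SOFTNET_MAX, struct boot_state and all
function names are hypothetical, and the only assumption is that
threads are created up to a fixed maximum and trimmed after cpu
attach.

```c
#include <assert.h>

#define SOFTNET_MAX	8	/* hypothetical upper bound on threads */

struct boot_state {
	int ncpus;	/* cpus known so far, cpu0 always counts */
	int nthreads;	/* softnet threads that currently exist */
	int intr_cpus;	/* cpus the drivers used for interrupts */
};

/* Simulate boot with early (late == 0) or late (late != 0) cpu attach. */
struct boot_state
boot(int hw_cpus, int late)
{
	struct boot_state s = { 1, 0, 0 };

	s.nthreads = SOFTNET_MAX;	/* init task queues and threads */
	if (!late)
		s.ncpus = hw_cpus;	/* autoconf cpu, early */
	s.intr_cpus = s.ncpus;		/* autoconf drivers */
	if (late)
		s.ncpus = hw_cpus;	/* autoconf cpu, late */
	if (s.nthreads > s.ncpus)
		s.nthreads = s.ncpus;	/* destroy excess threads */
	return s;			/* remaining threads get started */
}
```

With 4 cpus, both boot(4, 0) and boot(4, 1) end with 4 threads; only
intr_cpus differs, 4 in the early case and 1 in the late case.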

bluhm