Download raw body.
octeon: interrupts and barier cleanup
01.04.2026 22:44, Kirill A. Korinsky пишет:
> Visa, tech@,
>
> in my work to bring support queues in cnmac I think I discovered a weird
> issue in interrupts and barrier in octeon.
>
> octciu_intr_establish() has:
>
> 201 int cpuid = cpu_number();
> ...
> 208 #ifdef MULTIPROCESSOR
> 209 /* Span work queue interrupts across CPUs. */
> 210 if (IS_WORKQ_IRQ(irq))
> 211 cpuid = irq % ncpus;
> 212 #endif
> ...
> 222 ih->ih_irq = irq;
>
> my understanding is that octciu_intr_establish() happens on autoconf where
> only CPU0 exists, and on octeon we increase ncpus in cpu hatch, and not
> attach, which means that ncpus is 1 at octciu_intr_establish() time.
>
> What makes cpuid always 1 for that code and schedules all interrupts on
> CPU0.
>
> Next, octciu_intr_barrier():
>
> 290 void
> 291 octciu_intr_barrier(void *_ih)
> 292 {
> 293 struct cpu_info *ci = NULL;
> 294 #ifdef MULTIPROCESSOR
> 295 struct octciu_intrhand *ih = _ih;
>
> 296 if (IS_WORKQ_IRQ(ih->ih_irq))
> 297 ci = get_cpu_info(ih->ih_irq % ncpus);
> 298 #endif
>
> 299 sched_barrier(ci);
> 300 }
>
> because ih->ih_irq is 0 here, it picks cpu0 only.
>
> kn@'s diff https://marc.info/?l=openbsd-bugs&m=177171071006897&w=2
> introduced a regression on my ER-4:
> https://marc.info/?l=openbsd-bugs&m=177177669428825&w=2 and this regression
> can be avoided by removing the block in octciu_intr_establish().
>
> Here is the diff where I remove this block and % ncpus.
>
> This diff keeps interrupts on CPU0 which is, actually, current behaviour and
> unblocks kn@'s diff to enable softnet threads on octeon which unblocks my
> work of making cnmac with RX queue:
> https://marc.info/?l=openbsd-tech&m=177506394121597&w=2
>
> So far I see this diff as safe because it doesn't change current behaviour
> but I may miss some details.
We briefly discussed this off-list already; let's continue with visa on Cc.
Reading a diff now, I think you do change behaviour, see inline comments.
According to octciu.c r1.7 and r1.14, distributing interrupts seemed to have
some effect back then in 2017 and 2019.
I didn't check when or how exactly ncpus[found] semantic changed and thus
changed octciu(4) behaviour, but the current `% ncpus` effectively forcing all
interrupts onto CPU0 due to ncpus=1 seems unintended indeed.
>
> OK to commit it?
>
> Index: sys/arch/octeon/dev/octciu.c
> ===================================================================
> RCS file: /home/cvs/src/sys/arch/octeon/dev/octciu.c,v
> diff -u -p -r1.19 octciu.c
> --- sys/arch/octeon/dev/octciu.c 11 Dec 2022 05:31:05 -0000 1.19
> +++ sys/arch/octeon/dev/octciu.c 1 Apr 2026 19:39:40 -0000
> @@ -250,12 +250,6 @@ octciu_intr_establish(int irq, int level
> panic("%s: illegal irq %d", __func__, irq);
> #endif
>
> -#ifdef MULTIPROCESSOR
> - /* Span work queue interrupts across CPUs. */
> - if (IS_WORKQ_IRQ(irq))
> - cpuid = irq % ncpus;
> -#endif
> -
So -current is equivalent to `cpuid = 0;' due to `ncpus == 1'.
With that hunk removed you get
arch/mips64/include/cpu.h:#define cpu_number() (curcpu()->ci_cpuid)
which may be non-zero, which is not the same.
> flags = (level & IPL_MPSAFE) ? CIH_MPSAFE : 0;
> level &= ~IPL_MPSAFE;
>
> @@ -363,7 +357,7 @@ octciu_intr_barrier(void *_ih)
> struct octciu_intrhand *ih = _ih;
>
> if (IS_WORKQ_IRQ(ih->ih_irq))
> - ci = get_cpu_info(ih->ih_irq % ncpus);
> + ci = get_cpu_info(ih->ih_irq);
`ih_irq >= 16' seems possible according to
arch/octeon/dev/octciu.c:#define IS_WORKQ_IRQ(x) ((unsigned int)(x) < 16)
which can be more than CPUs in the system, i.e. you may get NULL:
arch/mips64/mips64/cpu.c-#ifdef MULTIPROCESSOR
arch/mips64/mips64/cpu.c-struct cpu_info *
arch/mips64/mips64/cpu.c:get_cpu_info(int cpuno)
arch/mips64/mips64/cpu.c-{
arch/mips64/mips64/cpu.c- struct cpu_info *ci;
arch/mips64/mips64/cpu.c- CPU_INFO_ITERATOR cii;
arch/mips64/mips64/cpu.c-
arch/mips64/mips64/cpu.c- CPU_INFO_FOREACH(cii, ci) {
arch/mips64/mips64/cpu.c- if (ci->ci_cpuid == cpuno)
arch/mips64/mips64/cpu.c- return ci;
arch/mips64/mips64/cpu.c- }
arch/mips64/mips64/cpu.c- return NULL;
arch/mips64/mips64/cpu.c-}
which is not reachable here in -current.
> #endif
>
> sched_barrier(ci);
>
>
I presume you get "saved" by then defaulting to the primary (first, I think) CPU:
kern/kern_sched.c-void
kern/kern_sched.c:sched_barrier(struct cpu_info *ci)
kern/kern_sched.c-{
kern/kern_sched.c- struct sched_barrier_state sb;
kern/kern_sched.c- struct task task;
kern/kern_sched.c- CPU_INFO_ITERATOR cii;
kern/kern_sched.c-
kern/kern_sched.c- if (ci == NULL) {
kern/kern_sched.c- CPU_INFO_FOREACH(cii, ci) {
kern/kern_sched.c- if (CPU_IS_PRIMARY(ci))
kern/kern_sched.c- break;
kern/kern_sched.c- }
kern/kern_sched.c- }
kern/kern_sched.c- KASSERT(ci != NULL);
kern/kern_sched.c-
kern/kern_sched.c- if (ci == curcpu())
kern/kern_sched.c- return;
kern/kern_sched.c-
kern/kern_sched.c- sb.ci = ci;
kern/kern_sched.c- cond_init(&sb.cond);
kern/kern_sched.c- task_set(&task, sched_barrier_task, &sb);
kern/kern_sched.c-
kern/kern_sched.c- task_add(systqmp, &task);
kern/kern_sched.c- cond_wait(&sb.cond, "sbar");
kern/kern_sched.c-}
which, again, is not how -current behaves, if I read that correctly.
octeon: interrupts and barier cleanup