From: Claudio Jeker <cjeker@diehard.n-r-g.com>
Subject: Re: avoid lock contention in futex syscalls
To: David Gwynne <david@gwynne.id.au>
Cc: tech@openbsd.org, mpi@openbsd.org
Date: Thu, 1 May 2025 11:20:12 +0200

    On Thu, May 01, 2025 at 02:07:18PM +1000, David Gwynne wrote:
    > this is very similar to the change made to __thrsleep and __thrwakeup
    > in src/sys/kern/kern_synch.c r1.214.
    > 
    > currently all futexes in the system coordinate using a single global
    > lock. if you have heavily threaded code building locks in userland
    > out of futexes, this lock gets hammered. this is true even if
    > userland thinks it's operating on separate locks, it all ends up
    > serialised in the kernel. this can reduce the throughput of these
    > heavily threaded programs.
    > 
    > like the __thrsleep diff, the big change is hashing futex waiters
    > into an array of locks/lists (or buckets) based on their "id" to
    > try and avoid contending on a single lock.
    > 
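
    [to make the bucketing concrete, here is a rough userland sketch of
    the idea. the names, the hash size and the pthread locks are made up
    for illustration and assume the BSD <sys/queue.h> macros; this is
    not the actual kernel diff. the fw_mtx/fw_cv/fw_wakeup fields are
    used by the wait/wake sketches further down.]

        #include <sys/queue.h>
        #include <pthread.h>
        #include <stdint.h>

        #define FUTEX_HASH_SIZE 128     /* power of two, made up */

        struct futex_waiter {
                uint64_t        fw_id;  /* the "id", e.g. the address */
                pthread_mutex_t fw_mtx; /* private lock/cv to sleep on */
                pthread_cond_t  fw_cv;
                int             fw_wakeup;
                TAILQ_ENTRY(futex_waiter) fw_entry;
        };
        TAILQ_HEAD(futex_list, futex_waiter);

        struct futex_bucket {           /* one lock+list pair per bucket */
                pthread_mutex_t   fb_lock;
                struct futex_list fb_list;
        };

        /* assume the locks and lists are initialized at startup */
        static struct futex_bucket futex_hash[FUTEX_HASH_SIZE];

        static struct futex_bucket *
        futex_get_bucket(uint64_t id)
        {
                id ^= id >> 9;          /* spread out nearby addresses */
                return (&futex_hash[id & (FUTEX_HASH_SIZE - 1)]);
        }
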
    > also like the __thrsleep diff, this change tries to avoid having
    > a thread waiting in futex_wait re-take the lock when waking up.
    > futex_wake is holding the bucket lock when waking up sleeping
    > threads, so having the sleeping thread try to take the bucket lock
    > again would immediately put it back to sleep. having futex_wait
    > sleep without the lock means it can return to userland sooner.
    > 
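
    [continuing the sketch: the wait side queues itself under the bucket
    lock, but then sleeps on its own lock and condvar, so the waker
    never forces it back onto the bucket lock. timeouts are elided; this
    models the idea, it is not the kernel code.]

        static void
        futex_wait_sketch(struct futex_waiter *fw)
        {
                struct futex_bucket *fb = futex_get_bucket(fw->fw_id);

                pthread_mutex_lock(&fb->fb_lock);
                TAILQ_INSERT_TAIL(&fb->fb_list, fw, fw_entry);
                pthread_mutex_unlock(&fb->fb_lock);

                /* sleep on our own lock/cv, not the bucket lock */
                pthread_mutex_lock(&fw->fw_mtx);
                while (!fw->fw_wakeup)
                        pthread_cond_wait(&fw->fw_cv, &fw->fw_mtx);
                pthread_mutex_unlock(&fw->fw_mtx);
        }
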
    > a feature of futexes is that multiple threads can wait on the same
    > address and get woken up together. this is currently implemented by
    > allocating a struct to represent this userland address, and then queuing
    > the waiting threads on this struct. while pools aren't slow, they're
    > not free, so this diff removes this struct and queues threads directly.
    > this means the futex wakeups may have to iterate more, but in practice
    > this is amortised by having multiple lists/locks (which results in
    > shorter lists of threads), and avoiding the overhead of the pool
    > operations. my observation is that most futex ops don't share wait
    > addresses, so every futex wait would result in a pool get and put
    > anyway.
    > 
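
    [the wake side of the sketch walks the bucket's list directly,
    skipping waiters that hashed into the same bucket under a different
    id; there is no per-address struct to allocate or look up. again
    illustrative only, building on the definitions above.]

        static int
        futex_wake_sketch(uint64_t id, int n)
        {
                struct futex_bucket *fb = futex_get_bucket(id);
                struct futex_waiter *fw, *nfw;
                int woken = 0;

                pthread_mutex_lock(&fb->fb_lock);
                TAILQ_FOREACH_SAFE(fw, &fb->fb_list, fw_entry, nfw) {
                        if (fw->fw_id != id)    /* bucket is shared */
                                continue;
                        TAILQ_REMOVE(&fb->fb_list, fw, fw_entry);
                        pthread_mutex_lock(&fw->fw_mtx);
                        fw->fw_wakeup = 1;
                        pthread_cond_signal(&fw->fw_cv);
                        pthread_mutex_unlock(&fw->fw_mtx);
                        if (++woken == n)
                                break;
                }
                pthread_mutex_unlock(&fb->fb_lock);
                return (woken);
        }
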
    > another feature of futexes that __thrsleep doesn't have is the ability
    > to move the address threads are sleeping on. this means that threads
    > can move between buckets in the hash, so care must be taken to
    > avoid deadlocks between the locks on each bucket, and when a waiting
    > thread wakes up after a timeout expires it has to be careful to remove
    > itself from the right bucket after such a requeue.
    > 
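
    [the requeue in the sketch takes the two bucket locks in a fixed
    order to dodge the deadlock. the real code also has to revalidate
    which bucket the waiter is on after locking, since a concurrent
    requeue can move it; that is glossed over here.]

        static void
        futex_requeue_sketch(struct futex_waiter *fw, uint64_t newid)
        {
                struct futex_bucket *ofb = futex_get_bucket(fw->fw_id);
                struct futex_bucket *nfb = futex_get_bucket(newid);

                /* always lock the lower-addressed bucket first */
                if (ofb == nfb)
                        pthread_mutex_lock(&ofb->fb_lock);
                else if (ofb < nfb) {
                        pthread_mutex_lock(&ofb->fb_lock);
                        pthread_mutex_lock(&nfb->fb_lock);
                } else {
                        pthread_mutex_lock(&nfb->fb_lock);
                        pthread_mutex_lock(&ofb->fb_lock);
                }

                TAILQ_REMOVE(&ofb->fb_list, fw, fw_entry);
                fw->fw_id = newid;      /* a timed-out waiter rereads this */
                TAILQ_INSERT_TAIL(&nfb->fb_list, fw, fw_entry);

                pthread_mutex_unlock(&ofb->fb_lock);
                if (nfb != ofb)
                        pthread_mutex_unlock(&nfb->fb_lock);
        }
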
    > most users of thrsleep have been migrated to use futexes instead,
    > with a big exception being go because benchmarks showed that
    > __thrsleep was significantly faster. that was true before i changed
    > __thrsleep last year, and is even more true now. i'm hoping this
    > change brings futex performance into the same ballpark as __thrsleep
    > so we can keep working toward removing __thrsleep. unfortunately i
    > don't know enough go to test this myself, so i'm hoping someone
    > *cough*jsing*cough* will try it and let me know.
    > 
    > thanks to phessler for his patience and help testing and reporting
    > issues with this diff by running it through a ton of ports builds.
    > also, because go is such a heavy user of __thrsleep, and because
    > of the lack of issues after i made that change, i'm feeling confident
    > about this diff.
    
    I have been running with this diff, or previous versions of it, for a
    while. In my tests it seems to help and makes futexes much faster, but
    at the same time it shifts the code I'm testing into spinning more on
    the kernel lock. So as usual you win some and you lose some :)
     
    -- 
    :wq Claudio
    
    