From: Mark Kettenis <mark.kettenis@xs4all.nl>
Subject: Re: pthread_mutex_timedlock(3): don't block forever if the timeout is NULL
To: Scott Cheloha <scottcheloha@gmail.com>
Cc: tech@openbsd.org, mpi@openbsd.org, guenther@openbsd.org
Date: Mon, 05 Feb 2024 19:50:01 +0100

> Date: Mon, 5 Feb 2024 11:44:18 -0600
> From: Scott Cheloha <scottcheloha@gmail.com>
> 
> On Mon, Feb 05, 2024 at 04:50:18PM +0100, Martin Pieuchot wrote:
> > On 27/01/24(Sat) 18:19, Scott Cheloha wrote:
> > > On Tue, Jan 09, 2024 at 11:14:08PM -0600, Scott Cheloha wrote:
> > > > Both pthread_mutex_timedlock(3) implementations pass the timeout
> > > > parameter from the caller through to the internal implementation
> > > > function.  In both of those functions, a NULL timeout pointer means
> > > > "no timeout" or "block indefinitely".
> > > > 
> > > > So, we have accidentally extended the pthread_mutex_timedlock(3)
> > > > specification with special, undocumented behavior when the timeout
> > > > pointer is NULL.
> > 
> > Is this a bug?  It sounds sane to me.  Behave just like
> > pthread_mutex_lock(3).
> 
> I agree that it sounds sane from an interface design perspective to
> have NULL mean "no timeout".  At least, that would be consistent with
> many other POSIX interfaces.  However...
> 
> > > > The easy fix is to copy the timeout to the stack before passing it on.
> > > > 
> > > > This change will cause any code that depends upon the current behavior
> > > > to segfault instead of blocking indefinitely.  I think it's unlikely
> > > > such code exists.  It it does exist, it is not portable.
> > 
> > I'm not sure adding an explicit NULL-dereference in a library is the way to
> > go.  If anything it should be better to return an error.
> 
> We're talking about a programmer error here.  I think failing hard and
> faulting in this case would be better than returning an error.  There
> is a real possibility that the programmer will not see the EINVAL and
> the caller will proceed as if they have locked the mutex.
> 
> > But I see no evidence that passing a NULL pointer should behave
> > differently than pthread_mutex_lock(3).
> 
> The behavior is undefined: 
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_timedlock.html
> 
> Snipped to only include text relevant to the timeout:
> 
> > #include <pthread.h>
> > #include <time.h>
> > 
> > int pthread_mutex_timedlock(pthread_mutex_t *restrict mutex,
> >       const struct timespec *restrict abstime);
> > 
> > The pthread_mutex_timedlock() function shall lock the mutex object
> > referenced by mutex. If the mutex is already locked, the calling thread
> > shall block until the mutex becomes available as in the pthread_mutex_lock()
> > function.  If the mutex cannot be locked without waiting for another thread
> > to unlock the mutex, this wait shall be terminated when the specified
> > timeout expires.
> > 
> > The timeout shall expire when the absolute time specified by abstime
> > passes, as measured by the clock on which timeouts are based (that is,
> > when the value of that clock equals or exceeds abstime), or if the
> > absolute time specified by abstime has already been passed at the time
> > of the call.
> > 
> > The timeout shall be based on the CLOCK_REALTIME clock. The resolution
> > of the timeout shall be the resolution of the clock on which it is
> > based. The timespec data type is defined in the <time.h> header.
> > 
> > Under no circumstance shall the function fail with a timeout if the
> > mutex can be locked immediately. The validity of the abstime parameter
> > need not be checked if the mutex can be locked immediately.
> > 
> > [...]
> > 
> > [EINVAL]
> >     The process or thread would have blocked, and the abstime parameter
> >     specified a nanoseconds field value less than zero or greater than
> >     or equal to 1000 million.
> > 
> > [...]
> > 
> > [ETIMEDOUT]
> >     The mutex could not be locked before the specified timeout expired. 
> 
> There is no mention of special behavior when the timeout input is
> NULL, so I don't think we should provide special behavior.  If NULL
> didn't mean "no timeout" in so many other interfaces there wouldn't
> even be anything to discuss.
> 
> The simplest thing is to interpret the standard literally.
> 
> The current behavior looks like an accident.

At the same time your proposed code looks artificial.  Not convinced
this is worth complicating the code for.

> Index: rthread_mutex.c
> ===================================================================
> RCS file: /cvs/src/lib/libc/thread/rthread_mutex.c,v
> diff -u -p -r1.5 rthread_mutex.c
> --- rthread_mutex.c	13 Feb 2019 13:09:32 -0000	1.5
> +++ rthread_mutex.c	5 Feb 2024 17:38:03 -0000
> @@ -211,9 +211,11 @@ pthread_mutex_trylock(pthread_mutex_t *m
>  }
>  
>  int
> -pthread_mutex_timedlock(pthread_mutex_t *mutexp, const struct timespec *abs)
> +pthread_mutex_timedlock(pthread_mutex_t *mutexp, const struct timespec *absp)
>  {
> -	return (_rthread_mutex_timedlock(mutexp, 0, abs, 1));
> +	struct timespec abs = *absp;
> +
> +	return (_rthread_mutex_timedlock(mutexp, 0, &abs, 1));
>  }
>  
>  int
> Index: rthread_sync.c
> ===================================================================
> RCS file: /cvs/src/lib/libc/thread/rthread_sync.c,v
> diff -u -p -r1.6 rthread_sync.c
> --- rthread_sync.c	10 Jan 2024 04:28:43 -0000	1.6
> +++ rthread_sync.c	5 Feb 2024 17:38:03 -0000
> @@ -179,9 +179,11 @@ pthread_mutex_trylock(pthread_mutex_t *p
>  }
>  
>  int
> -pthread_mutex_timedlock(pthread_mutex_t *p, const struct timespec *abstime)
> +pthread_mutex_timedlock(pthread_mutex_t *p, const struct timespec *abstimep)
>  {
> -	return (_rthread_mutex_lock(p, 0, abstime));
> +	struct timespec abstime = *abstimep;
> +
> +	return (_rthread_mutex_lock(p, 0, &abstime));
>  }
>  
>  int
> 
>