From: Martin Pieuchot <mpi@grenadille.net>
Subject: Re: Prevent recursion in bufbackoff() by discarding DMA pages
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: tech@openbsd.org, beck@openbsd.org
Date: Wed, 2 Oct 2024 11:55:09 +0200

On 01/10/24(Tue) 11:52, Mark Kettenis wrote:
> > Date: Mon, 30 Sep 2024 10:00:55 +0200
> > From: Martin Pieuchot <mpi@grenadille.net>
> > 
> > Currently when bufbackoff(), called by the page daemon, ends up reducing
> > the DMA cache it tries to flip every buffer.  Flipping buffers implies
> > allocating new pages which is something we want to avoid in such
> > situation.
> 
> Well, if there is plenty of "high" memory, allocating pages should not
> be a problem.

I noticed the problem when there isn't any more "high" memory available.

When almost all the RAM is used, the page daemon is awaken with a huge
deficit of pages to recover.  This amount is first passed to bufbackoff()
which basically empty its cache at once, starting by the "high" pages.
Because this isn't enough, it then tries to free & flip the DMA buffers
and obviously fail.

> > On top of that, once uvm_pagerealloc_multi() failed, bufbackoff() is
> > called again, creating a recursion.
> 
> Yes, that is bad.
> 
> > To prevent these scenarios and speed-up the page daemon I'd like to
> > commit the diff below which always discard DMA buffers when they are
> > being freed from bufbackoff().
> > 
> > The downside of this approach is that we now discard buffer if only
> > DMA-reachable memory is missing.
> > 
> > ok?
> 
> I wonder if we should do something like the diff below instead.  This
> removes the bufbackoff() call from buf_realloc_pages() and gives full
> control of the UVM_PLA_* flags to the callers.
>
> This makes buf_flip_high() entirely opportunistic.  And buf_flip_dma()
> can still make the buffer cache back off by waking up the pagedeamon.
> 
> Simplifying the code like this fells better than adding more plaster.
> Or am I missing something?  I didn't really test this under memory
> pressure yet...

Well, from where I stand, always discarding is a simplification.  It
makes amd64's OOM behavior closer to what other archs do: don't try
to alloc memory in the page daemon.

> P.S. I wonder if we should simplify buf_alloc_pages() in the same way.

I don't know.  However I know that I'd like to stop freeing pages in
interrupt context (which is expensive).  In other words stop flipping.
So I'd rather not spend time making this code shine and work towards
killing it instead.

> Index: kern/vfs_bio.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/vfs_bio.c,v
> diff -u -p -r1.213 vfs_bio.c
> --- kern/vfs_bio.c	3 Feb 2024 18:51:58 -0000	1.213
> +++ kern/vfs_bio.c	1 Oct 2024 09:34:55 -0000
> @@ -382,7 +382,8 @@ buf_flip_high(struct buf *bp)
>  	splassert(IPL_BIO);
>  
>  	/* Attempt to move the buffer to high memory if we can */
> -	if (buf_realloc_pages(bp, &high_constraint, UVM_PLA_NOWAIT) == 0) {
> +	if (buf_realloc_pages(bp, &high_constraint,
> +	    UVM_PLA_NOWAIT | UVM_PLA_NOWAKE) == 0) {
>  		KASSERT(!ISSET(bp->b_flags, B_DMA));
>  		bcstats.highflips++;
>  		ret = 0;
> Index: kern/vfs_biomem.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/vfs_biomem.c,v
> diff -u -p -r1.51 vfs_biomem.c
> --- kern/vfs_biomem.c	24 Oct 2021 00:02:25 -0000	1.51
> +++ kern/vfs_biomem.c	1 Oct 2024 09:34:55 -0000
> @@ -340,20 +340,8 @@ buf_realloc_pages(struct buf *bp, struct
>  		pmap_update(pmap_kernel());
>  	}
>  
> -	do {
> -		r = uvm_pagerealloc_multi(bp->b_pobj, bp->b_poffs,
> -		    bp->b_bufsize, UVM_PLA_NOWAIT | UVM_PLA_NOWAKE, where);
> -		if (r == 0)
> -			break;
> -	} while	((bufbackoff(where, atop(bp->b_bufsize)) == 0));
> -
> -	/*
> -	 * bufbackoff() failed, so there's no more we can do without
> -	 * waiting.  If allowed do, make that attempt.
> -	 */
> -	if (r != 0 && (flags & UVM_PLA_WAITOK))
> -		r = uvm_pagerealloc_multi(bp->b_pobj, bp->b_poffs,
> -		    bp->b_bufsize, flags, where);
> +	r = uvm_pagerealloc_multi(bp->b_pobj, bp->b_poffs,
> +	    bp->b_bufsize, flags, where);
>  
>  	/*
>  	 * If the allocation has succeeded, we may be somewhere different.