
From: Marc Espie <marc.espie.openbsd@gmail.com>
Subject: the little bug that wasn't
To: tech@openbsd.org
Date: Wed, 4 Dec 2024 11:10:59 +0100

This is a somewhat longer write-up about the fun I had two days ago.

The ports tree had a problem. For quite a few months now, something
didn't seem right: you started pkg_add -u, you let it proceed through
a few hundred packages, you tried to start an application, and
you got an error message, like this one from two days ago:

$ gnuplot
nausicaa$ gnuplot
ld.so: gnuplot: can't load library 'libharfbuzz.so.18.10'
Killed 
$ 

Invariably it had to do with the minor version number not being exactly
right, but no-one took the time to look any further.

Until two days ago.

I thought I had it figured out: this had to be a bug in ld.so, and it
probably had to do with the handling of minor version numbers.

So I looked. ld.so being a bit "low level", it's not necessarily easy to
debug directly, but it's got a DL_DEB() macro that allows you to do printf
debugging (I know, don't laugh, but it's often easier than trying to figure
out something smarter). So I looked, and yeah, gnuplot did set up the right
hint, and yeah, the test for the right library seemed okay, so why didn't it
work?

Then it hit me: the cache. ld.so was totally fine, but there's this cache
updated by ldconfig(8), and that was the part that was getting out of whack.

So why didn't we notice this before?

It used to be that we ran outside executables all the time, and the pkg
tools have got a "just in time" mechanism to handle that (the oldest
versions of the tools did @exec ldconfig -R manually). It has two parts:
some part that records that we changed libraries, and some part that runs
ldconfig, if libraries have changed, prior to running an external program.
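
To make the two parts concrete, here is a minimal sketch in Python. It is
not the pkg tools' code (those are Perl); the class and function names are
made up for illustration, and the actual ldconfig call is replaced by a
callable so the sketch runs anywhere.

```python
class LdconfigState:
    """Hypothetical stand-in for a lazily refreshed ldconfig cache."""

    def __init__(self, refresh):
        self._refresh = refresh  # callable that would run ldconfig
        self._dirty = False

    def mark_changed(self):
        # Part 1: remember that shared libraries were added or removed.
        self._dirty = True

    def ensure(self):
        # Part 2: refresh the cache, but only if something changed.
        if self._dirty:
            self._refresh()
            self._dirty = False


def run_external(ldconfig, command, runner):
    # Every external program goes through here, so the cache is
    # guaranteed to be current before the program starts.
    ldconfig.ensure()
    return runner(command)


calls = []
state = LdconfigState(lambda: calls.append("ldconfig"))
run_external(state, "some-tool", calls.append)  # nothing changed: no refresh
state.mark_changed()
state.mark_changed()                            # two changes, one refresh
run_external(state, "some-tool", calls.append)
print(calls)  # ['some-tool', 'ldconfig', 'some-tool']
```

The point of the design is that the refresh is only ever triggered by
running an external program. If nothing external runs, nothing refreshes.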

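But then we got @tags: a mechanism to run commands like update-desktop-database
just once, right before cleaning up shared data at the end of the pkg_add
run. With far fewer external programs run in the middle of an update, there
is much less to trigger ldconfig before the end.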

So... turns out that pkg_add's behavior was perfect for itself, but defied
users' expectations (including mine) that commands would be usable right
after being updated.

The "fix" (quality of life improvement really) was one line:
	$state->ldconfig->ensure;
right after the end of each individual update.
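
A toy model of why that one line matters, again in Python rather than the
tools' Perl. Everything here is an assumption made for illustration: the
cache is modelled as a plain set, a library is "loadable" only if the cache
knows about it, and libexample.so.1.0 is a made-up name.

```python
def simulate(updates, ensure_after_each):
    """Return (libraries not loadable right after their update,
    whether the cache is consistent once the run has finished)."""
    on_disk = set()  # libraries actually installed
    cache = set()    # what the ldconfig cache knows about
    dirty = False
    stale = []
    for library in updates:
        on_disk.add(library)
        dirty = True
        if ensure_after_each and dirty:
            # The one-line fix: refresh at the end of each update.
            cache = set(on_disk)
            dirty = False
        # The user tries the freshly updated program right now.
        if library not in cache:
            stale.append(library)
    if dirty:
        # End of the whole run: the cache gets refreshed in any case.
        cache = set(on_disk)
    return stale, cache == on_disk


updates = ["libharfbuzz.so.18.10", "libexample.so.1.0"]
print(simulate(updates, ensure_after_each=False))
print(simulate(updates, ensure_after_each=True))
```

Without the per-update refresh, both libraries are unusable until the run
ends, yet the final state is consistent, which is why pkg_add itself never
saw a problem. With it, nothing is stale at any point.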

(and so it was a long-standing issue: @tags happened somewhere around 2019,
and some astute users probably started noticing the issue around 2020).


It's now the second time in 15 years that ldconfig has gotten me to look
in the wrong location.

Sneaky bastard.

-- 
	Marc