From: Alexander Bluhm
Subject: Re: sysctl: unlock KERN_CONSBUF and KERN_MSGBUF
To: Vitaliy Makkoveev
Cc: tech@openbsd.org
Date: Tue, 6 May 2025 18:56:00 +0200

On Mon, May 05, 2025 at 08:19:23PM +0300, Vitaliy Makkoveev wrote:
> On Mon, May 05, 2025 at 06:02:47PM +0200, Alexander Bluhm wrote:
> > On Sat, May 03, 2025 at 01:55:30AM +0300, Vitaliy Makkoveev wrote:
> > > We only need to keep the buffer's header consistent.  Store it in
> > > a local variable and deliver the data in two steps.
> > 
> > I think you have to cover the cases if oldp == NULL or newp == NULL.
> > sysctl_rdstruct() has
> > 	if (newp)
> > 		return (EPERM);
> > and
> > 	if (oldp)
> > 		error = copyout(sp, oldp, len);
> 
> Thanks for pointing that out.  I kept the order of the error paths as
> it was in sysctl_rdstruct().
> 
> > > Do we really need to fill `ump' with nulls and do a manual field
> > > by field copy?  The msgbuf header is all longs, it wouldn't be
> > > padded.
> > 
> > I would write:
> > 	memset(&ump, 0, sizeof(ump));
> > 	ump = *mp;
> > Then the compiler can figure out what to do and we are safe from
> > leaking memory to userland.  In this particular case all fields
> > being long should be safe, but setting them to 0 is more robust.
> > Copying kernel stack to userland is a common pitfall.
> 
> I prefer to do this copying manually, as we do in other similar
> places.  The proposed "memset(); ump = *mp;" is not obvious to me and
> could be not obvious to others.  The memset() plus manual copying
> leaves no doubt.

I looked into the amd64 assembler output and saw that my solution
copies char msg_bufc[1] as 8 bytes.  I would not have expected the
compiler to include the padding.  So your code is better.
> Index: sys/kern/kern_sysctl.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
> retrieving revision 1.466
> diff -u -p -r1.466 kern_sysctl.c
> --- sys/kern/kern_sysctl.c	4 May 2025 13:42:07 -0000	1.466
> +++ sys/kern/kern_sysctl.c	5 May 2025 17:13:22 -0000
> @@ -574,10 +574,51 @@ kern_sysctl(int *name, u_int namelen, vo
>  		return (sysctl_rdstruct(oldp, oldlenp, newp,
>  		    &mbs, sizeof(mbs)));
>  	}
> -	case KERN_MSGBUFSIZE:
> -	case KERN_CONSBUFSIZE: {
> -		struct msgbuf *mp;
> -		mp = (name[0] == KERN_MSGBUFSIZE) ? msgbufp : consbufp;
> +	case KERN_CONSBUF:
> +		if ((error = suser(p)))
> +			return (error);
> +		/* FALLTHROUGH */
> +	case KERN_MSGBUF: {
> +		extern struct mutex log_mtx;
> +		const size_t hlen = offsetof(struct msgbuf, msg_bufc);
> +		struct msgbuf ump, *mp = (name[0] == KERN_MSGBUF) ?
> +		    msgbufp : consbufp;
> +
> +		/*
> +		 * deal with cases where the message buffer has
> +		 * become corrupted.
> +		 */
> +		if (!mp || mp->msg_magic != MSG_MAGIC)
> +			return (ENXIO);
> +		if (oldp && ((hlen + mp->msg_bufs) > *oldlenp))
> +			return (ENOMEM);
> +		if (newp)
> +			return (EPERM);
> +		if (oldp == NULL)
> +			return (0);

	if (newp)
		return (EPERM);
	if (oldp == NULL)
		return (0);
	if ((hlen + mp->msg_bufs) > *oldlenp)
		return (ENOMEM);

I would prefer this order as it checks oldp only once.

> +
> +		memset(&ump, 0, sizeof(ump));
> +		ump.msg_magic = mp->msg_magic;
> +		ump.msg_bufs = mp->msg_bufs;
> +		mtx_enter(&log_mtx);
> +		ump.msg_bufx = mp->msg_bufx;
> +		ump.msg_bufr = mp->msg_bufr;
> +		ump.msg_bufd = mp->msg_bufd;
> +		mtx_leave(&log_mtx);

Could you put the memset() and all the assignments into the mutex
block?  Then the compiler could optimize away the memset of fields
that are set immediately afterwards.  Although our clang is dumb and
does double stores, maybe other compilers behave smarter.

> +
> +		/* copy header... */
> +		if ((error = copyout(&ump, oldp, hlen)))
> +			return (error);
> +		/* ...and the data. */
> +		error = copyout(mp->msg_bufc, oldp + hlen, mp->msg_bufs);
> +
> +		return (error);
> +	}
> +	case KERN_CONSBUFSIZE:
> +	case KERN_MSGBUFSIZE: {
> +		struct msgbuf *mp = (name[0] == KERN_MSGBUFSIZE) ?
> +		    msgbufp : consbufp;
> +

This case does not look very MP-safe.  Although userland should only
be interested in msg_bufs, and that is an atomic long read, we provide
the whole structure without protection.

Could we merge that into the previous cases?  Then we don't have to
duplicate the copying with the mutex.

	size_t bufsize = (name[0] == KERN_CONSBUF ||
	    name[0] == KERN_MSGBUF) ? mp->msg_bufs : 0;
	...
	if ((hlen + bufsize) > *oldlenp)
	...
	error = copyout(mp->msg_bufc, oldp + hlen, bufsize);

>  		/*
>  		 * deal with cases where the message buffer has
>  		 * become corrupted.
> @@ -668,22 +709,6 @@ kern_sysctl_locked(int *name, u_int name
>  		if (newp && !error)
>  			domainnamelen = newlen;
>  		return (error);
> -	case KERN_CONSBUF:
> -		if ((error = suser(p)))
> -			return (error);
> -		/* FALLTHROUGH */
> -	case KERN_MSGBUF: {
> -		struct msgbuf *mp;
> -		mp = (name[0] == KERN_MSGBUF) ? msgbufp : consbufp;
> -		/*
> -		 * deal with cases where the message buffer has
> -		 * become corrupted.
> -		 */
> -		if (!mp || mp->msg_magic != MSG_MAGIC)
> -			return (ENXIO);
> -		return (sysctl_rdstruct(oldp, oldlenp, newp, mp,
> -		    mp->msg_bufs + offsetof(struct msgbuf, msg_bufc)));
> -	}
>  	case KERN_CPTIME:
>  	{
>  		CPU_INFO_ITERATOR cii;