From: Visa Hankala
Subject: Re: EVFILT_USER and kevent(2)
To: tech@openbsd.org
Date: Tue, 6 May 2025 03:38:57 +0000

On Sat, Apr 30, 2022 at 01:51:16PM +0000, Visa Hankala wrote:
> It has been asked in the past if OpenBSD's kevent(2) should implement
> user event filters, also known as EVFILT_USER. This filter type
> originates from FreeBSD but is now available also on DragonFly BSD,
> NetBSD, and macOS.
>
> Below is an implementation of EVFILT_USER. The logic should be fairly
> straightforward. However, the filter type needs a special case in
> kqueue_register() to allow triggering a previously registered user
> event without using EV_ADD.
>
> The code limits the number of user events. Otherwise the user could
> allocate copious amounts of kernel memory. The limit is per process
> so that programs will not interfere with each other. The current limit
> is arbitrary and might need adjusting later. Hopefully a sysctl knob
> will not be necessary.
>
> I am in two minds about EVFILT_USER. On the one hand, having it on
> OpenBSD might help with ports. On the other hand, it makes the kernel
> perform a task that userspace can already handle using existing
> interfaces.

I wonder if there is more interest in EVFILT_USER now. Here is a
refreshed version of the patch:

Index: lib/libc/sys/kqueue.2
===================================================================
RCS file: src/lib/libc/sys/kqueue.2,v
retrieving revision 1.51
diff -u -p -r1.51 kqueue.2
--- lib/libc/sys/kqueue.2	20 Aug 2023 19:52:40 -0000	1.51
+++ lib/libc/sys/kqueue.2	6 May 2025 03:20:39 -0000
@@ -561,6 +561,44 @@ e.g. an HDMI cable has been plugged in t
 On return,
 .Fa fflags
 contains the events which triggered the filter.
+.It Dv EVFILT_USER
+Establishes a user event identified by
+.Va ident
+which is not associated with any kernel mechanism but is triggered by
+user level code.
+The lower 24 bits of the
+.Va fflags
+may be used for user-defined flags and manipulated using the following:
+.Bl -tag -width XXNOTE_FFLAGSMASK
+.It Dv NOTE_FFNOP
+Ignore the input
+.Va fflags .
+.It Dv NOTE_FFAND
+Bitwise AND
+.Va fflags .
+.It Dv NOTE_FFOR
+Bitwise OR
+.Va fflags .
+.It Dv NOTE_FFCOPY
+Copy
+.Va fflags .
+.It Dv NOTE_FFCTRLMASK
+Control mask for
+.Va fflags .
+.It Dv NOTE_FFLAGSMASK
+User-defined flag mask for
+.Va fflags .
+.El
+.Pp
+A user event is triggered for output with the following:
+.Bl -tag -width XXNOTE_FFLAGSMASK
+.It Dv NOTE_TRIGGER
+Cause the event to be triggered.
+.El
+.Pp
+On return,
+.Va fflags
+contains the user-defined flags in the lower 24 bits.
 .El
 .Sh RETURN VALUES
 .Fn kqueue
Index: regress/sys/kern/kqueue/Makefile
===================================================================
RCS file: src/regress/sys/kern/kqueue/Makefile,v
retrieving revision 1.32
diff -u -p -r1.32 Makefile
--- regress/sys/kern/kqueue/Makefile	20 Aug 2023 15:19:34 -0000	1.32
+++ regress/sys/kern/kqueue/Makefile	6 May 2025 03:20:40 -0000
@@ -4,7 +4,8 @@ PROG=	kqueue-test
 CFLAGS+=-Wall
 SRCS=	kqueue-pipe.c kqueue-fork.c main.c kqueue-process.c kqueue-random.c \
 	kqueue-pty.c kqueue-tun.c kqueue-signal.c kqueue-fdpass.c \
-	kqueue-exec.c kqueue-flock.c kqueue-timer.c kqueue-regress.c
+	kqueue-exec.c kqueue-flock.c kqueue-timer.c kqueue-regress.c \
+	kqueue-user.c
 LDADD=	-levent -lutil
 DPADD=	${LIBEVENT} ${LIBUTIL}
 
@@ -52,6 +53,8 @@ kq-regress-5: ${PROG}
 	./${PROG} -R5
 kq-regress-6: ${PROG}
 	./${PROG} -R6
+kq-user: ${PROG}
+	./${PROG} -u
 
 TESTS+=	kq-exec
 TESTS+=	kq-fdpass
@@ -73,6 +76,7 @@ TESTS+=	kq-reset-timer
 TESTS+=	kq-signal
 TESTS+=	kq-timer
 TESTS+=	kq-tun
+TESTS+=	kq-user
 
 REGRESS_TARGETS=${TESTS}
 REGRESS_ROOT_TARGETS=kq-pty-1
Index: regress/sys/kern/kqueue/kqueue-user.c
===================================================================
RCS file: regress/sys/kern/kqueue/kqueue-user.c
diff -N regress/sys/kern/kqueue/kqueue-user.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ regress/sys/kern/kqueue/kqueue-user.c	6 May 2025 03:20:40 -0000
@@ -0,0 +1,189 @@
+/*	$OpenBSD$	*/
+
+/*
+ * Copyright (c) 2022 Visa Hankala
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include "main.h"
+
+int
+do_user(void)
+{
+	const struct timespec ts = { 0, 10000 };
+	struct kevent kev[2];
+	int dummy, dummy2, i, kq, n;
+
+	ASS((kq = kqueue()) >= 0,
+	    warn("kqueue"));
+
+	/* Set up an event. */
+	EV_SET(&kev[0], 1, EVFILT_USER, EV_ADD, ~0U & ~NOTE_TRIGGER, 0, NULL);
+	ASS(kevent(kq, kev, 1, NULL, 0, NULL) == 0,
+	    warn("kevent"));
+
+	n = kevent(kq, NULL, 0, kev, 2, &ts);
+	ASSX(n == 0);
+
+	/*
+	 * Activate the event.
+	 * Fields `data' and `udata' do not get updated without EV_ADD.
+	 */
+	EV_SET(&kev[0], 1, EVFILT_USER, 0, NOTE_TRIGGER | NOTE_FFNOP,
+	    123, &dummy);
+	n = kevent(kq, kev, 1, NULL, 0, NULL);
+	ASSX(n == 0);
+
+	/* Check active events. */
+	n = kevent(kq, NULL, 0, kev, 2, &ts);
+	ASSX(n == 1);
+	ASSX(kev[0].ident == 1);
+	ASSX(kev[0].fflags == NOTE_FFLAGSMASK);
+	ASSX(kev[0].data == 0);
+	ASSX(kev[0].udata == NULL);
+
+	/* Activate the event. Update `data' and `udata'. */
+	EV_SET(&kev[0], 1, EVFILT_USER, EV_ADD, NOTE_TRIGGER | NOTE_FFNOP,
+	    123, &dummy);
+	n = kevent(kq, kev, 1, NULL, 0, NULL);
+	ASSX(n == 0);
+
+	/* Check active events. */
+	n = kevent(kq, NULL, 0, kev, 2, &ts);
+	ASSX(n == 1);
+	ASSX(kev[0].ident == 1);
+	ASSX(kev[0].fflags == NOTE_FFLAGSMASK);
+	ASSX(kev[0].data == 123);
+	ASSX(kev[0].udata == &dummy);
+
+	/* Set up another event. */
+	EV_SET(&kev[0], 2, EVFILT_USER, EV_ADD, NOTE_TRIGGER, 654, &dummy2);
+	n = kevent(kq, kev, 1, NULL, 0, NULL);
+	ASSX(n == 0);
+
+	/* Check active events. This assumes a specific output order. */
+	n = kevent(kq, NULL, 0, kev, 2, &ts);
+	ASSX(n == 2);
+	ASSX(kev[0].ident == 1);
+	ASSX(kev[0].fflags == NOTE_FFLAGSMASK);
+	ASSX(kev[0].data == 123);
+	ASSX(kev[0].udata == &dummy);
+	ASSX(kev[1].ident == 2);
+	ASSX(kev[1].fflags == 0);
+	ASSX(kev[1].data == 654);
+	ASSX(kev[1].udata == &dummy2);
+
+	/* Clear the first event. */
+	EV_SET(&kev[0], 1, EVFILT_USER, EV_CLEAR, 0, 0, NULL);
+	n = kevent(kq, kev, 1, NULL, 0, NULL);
+	ASSX(n == 0);
+
+	n = kevent(kq, NULL, 0, kev, 2, &ts);
+	ASSX(n == 1);
+	ASSX(kev[0].ident == 2);
+	ASSX(kev[0].fflags == 0);
+	ASSX(kev[0].data == 654);
+	ASSX(kev[0].udata == &dummy2);
+
+	/* Delete the second event. */
+	EV_SET(&kev[0], 2, EVFILT_USER, EV_DELETE, 0, 0, NULL);
+	n = kevent(kq, kev, 1, NULL, 0, NULL);
+	ASSX(n == 0);
+
+	n = kevent(kq, NULL, 0, kev, 2, &ts);
+	ASSX(n == 0);
+
+	/* Test self-clearing event. */
+	EV_SET(&kev[0], 2, EVFILT_USER, EV_ADD | EV_CLEAR, 0x11, 42, &dummy);
+	n = kevent(kq, kev, 1, kev, 2, &ts);
+	ASSX(n == 0);
+
+	EV_SET(&kev[0], 2, EVFILT_USER, 0, NOTE_TRIGGER | 0x3, 24, &dummy2);
+	n = kevent(kq, kev, 1, kev, 2, &ts);
+	ASSX(n == 1);
+	ASSX(kev[0].ident == 2);
+	ASSX(kev[0].fflags == 0x11);
+	ASSX(kev[0].data == 42);
+	ASSX(kev[0].udata == &dummy);
+
+	n = kevent(kq, NULL, 0, kev, 2, &ts);
+	ASSX(n == 0);
+
+	EV_SET(&kev[0], 2, EVFILT_USER, 0, NOTE_TRIGGER | 0x3, 9, &dummy2);
+	n = kevent(kq, kev, 1, kev, 2, &ts);
+	ASSX(n == 1);
+	ASSX(kev[0].ident == 2);
+	ASSX(kev[0].fflags == 0);
+	ASSX(kev[0].data == 0);
+	ASSX(kev[0].udata == &dummy);
+
+	EV_SET(&kev[0], 2, EVFILT_USER, EV_DELETE, 0, 0, NULL);
+	n = kevent(kq, kev, 1, kev, 2, &ts);
+	ASSX(n == 0);
+
+	/* Change fflags. */
+	EV_SET(&kev[0], 1, EVFILT_USER, 0, NOTE_FFCOPY | 0x00aa00, 0, NULL);
+	n = kevent(kq, kev, 1, kev, 2, &ts);
+	ASSX(n == 0);
+	EV_SET(&kev[0], 1, EVFILT_USER, 0, NOTE_FFOR | 0xff00ff, 0, NULL);
+	n = kevent(kq, kev, 1, kev, 2, &ts);
+	ASSX(n == 0);
+	EV_SET(&kev[0], 1, EVFILT_USER, 0, NOTE_TRIGGER | NOTE_FFAND | 0x0ffff0,
+	    0, NULL);
+	n = kevent(kq, kev, 1, kev, 2, &ts);
+	ASSX(n == 1);
+	ASSX(kev[0].ident == 1);
+	ASSX(kev[0].fflags == 0x0faaf0);
+	ASSX(kev[0].data == 0);
+	ASSX(kev[0].udata == &dummy);
+
+	/* Test event limit. */
+	for (i = 0;; i++) {
+		EV_SET(&kev[0], i, EVFILT_USER, EV_ADD, 0, 0, NULL);
+		n = kevent(kq, kev, 1, NULL, 0, NULL);
+		if (n == -1) {
+			ASSX(errno == ENOMEM);
+			break;
+		}
+		ASSX(n == 0);
+	}
+	ASSX(i < 1000000);
+
+	/* Delete one event, ... */
+	EV_SET(&kev[0], 0, EVFILT_USER, EV_DELETE, 0, 0, NULL);
+	n = kevent(kq, kev, 1, NULL, 0, NULL);
+	ASSX(n == 0);
+
+	/* ... after which adding should succeed. */
+	EV_SET(&kev[0], 0, EVFILT_USER, EV_ADD, 0, 0, NULL);
+	n = kevent(kq, kev, 1, NULL, 0, NULL);
+	ASSX(n == 0);
+
+	EV_SET(&kev[0], i, EVFILT_USER, EV_ADD, 0, 0, NULL);
+	n = kevent(kq, kev, 1, NULL, 0, NULL);
+	ASSX(n == -1);
+	ASSX(errno == ENOMEM);
+
+	close(kq);
+
+	return (0);
+}
Index: regress/sys/kern/kqueue/main.c
===================================================================
RCS file: src/regress/sys/kern/kqueue/main.c,v
retrieving revision 1.16
diff -u -p -r1.16 main.c
--- regress/sys/kern/kqueue/main.c	20 Aug 2023 15:19:34 -0000	1.16
+++ regress/sys/kern/kqueue/main.c	6 May 2025 03:20:40 -0000
@@ -17,7 +17,7 @@ main(int argc, char **argv)
 	int n, ret, c;
 
 	ret = 0;
-	while ((c = getopt(argc, argv, "efFiIjlpPrR:stT:")) != -1) {
+	while ((c = getopt(argc, argv, "efFiIjlpPrR:stT:u")) != -1) {
 		switch (c) {
 		case 'e':
 			ret |= do_exec(argv[0]);
@@ -63,8 +63,11 @@ main(int argc, char **argv)
 			n = strtonum(optarg, 1, INT_MAX, NULL);
 			ret |= do_pty(n);
 			break;
+		case 'u':
+			ret |= do_user();
+			break;
 		default:
-			fprintf(stderr, "usage: %s -[fFiIlpPrstT] [-R n]\n",
+			fprintf(stderr, "usage: %s -[fFiIlpPrstTu] [-R n]\n",
 			    __progname);
 			exit(1);
 		}
Index: regress/sys/kern/kqueue/main.h
===================================================================
RCS file: src/regress/sys/kern/kqueue/main.h,v
retrieving revision 1.7
diff -u -p -r1.7 main.h
--- regress/sys/kern/kqueue/main.h	20 Aug 2023 15:19:34 -0000	1.7
+++ regress/sys/kern/kqueue/main.h	6 May 2025 03:20:40 -0000
@@ -29,3 +29,4 @@ int do_reset_timer(void);
 int do_signal(void);
 int do_timer(void);
 int do_tun(void);
+int do_user(void);
Index: sys/kern/kern_descrip.c
===================================================================
RCS file: src/sys/kern/kern_descrip.c,v
retrieving revision 1.210
diff -u -p -r1.210 kern_descrip.c
--- sys/kern/kern_descrip.c	30 Dec 2024 02:46:00 -0000	1.210
+++ sys/kern/kern_descrip.c	6 May 2025 03:20:41 -0000
@@ -39,6 +39,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1201,6 +1202,7 @@ fdfree(struct proc *p)
 		vrele(fdp->fd_cdir);
 	if (fdp->fd_rdir)
 		vrele(fdp->fd_rdir);
+	KASSERT(atomic_load_int(&fdp->fd_nuserevents) == 0);
 	pool_put(&fdesc_pool, fdp);
 }
 
Index: sys/kern/kern_event.c
===================================================================
RCS file: src/sys/kern/kern_event.c,v
retrieving revision 1.201
diff -u -p -r1.201 kern_event.c
--- sys/kern/kern_event.c	10 Feb 2025 16:45:46 -0000	1.201
+++ sys/kern/kern_event.c	6 May 2025 03:20:41 -0000
@@ -30,6 +30,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -135,6 +136,10 @@ int	filt_timerattach(struct knote *kn);
 void	filt_timerdetach(struct knote *kn);
 int	filt_timermodify(struct kevent *kev, struct knote *kn);
 int	filt_timerprocess(struct knote *kn, struct kevent *kev);
+int	filt_userattach(struct knote *kn);
+void	filt_userdetach(struct knote *kn);
+int	filt_usermodify(struct kevent *kev, struct knote *kn);
+int	filt_userprocess(struct knote *kn, struct kevent *kev);
 void	filt_seltruedetach(struct knote *kn);
 
 const struct filterops kqread_filtops = {
@@ -180,12 +185,22 @@ const struct filterops timer_filtops = {
 	.f_process	= filt_timerprocess,
 };
 
+const struct filterops user_filtops = {
+	.f_flags	= FILTEROP_MPSAFE,
+	.f_attach	= filt_userattach,
+	.f_detach	= filt_userdetach,
+	.f_event	= NULL,
+	.f_modify	= filt_usermodify,
+	.f_process	= filt_userprocess,
+};
+
 struct	pool knote_pool;
 struct	pool kqueue_pool;
 struct	mutex kqueue_klist_lock = MUTEX_INITIALIZER(IPL_MPFLOOR);
 struct	rwlock kqueue_ps_list_lock = RWLOCK_INITIALIZER("kqpsl");
 int kq_ntimeouts = 0;
 int kq_timeoutmax = (4 * 1024);
+unsigned int kq_usereventsmax = 1024;	/* per process */
 
 #define KN_HASH(val, mask)	(((val) ^ (val >> 8)) & (mask))
 
@@ -202,6 +217,7 @@ const struct filterops *const sysfilt_op
 	&timer_filtops,			/* EVFILT_TIMER */
 	&file_filtops,			/* EVFILT_DEVICE */
 	&file_filtops,			/* EVFILT_EXCEPT */
+	&user_filtops,			/* EVFILT_USER */
 };
 
 void
@@ -731,6 +747,91 @@ filt_timerprocess(struct knote *kn, stru
 	return (active);
 }
 
+int
+filt_userattach(struct knote *kn)
+{
+	struct filedesc *fdp = kn->kn_kq->kq_fdp;
+	u_int nuserevents;
+
+	nuserevents = atomic_inc_int_nv(&fdp->fd_nuserevents);
+	if (nuserevents > atomic_load_int(&kq_usereventsmax)) {
+		atomic_dec_int(&fdp->fd_nuserevents);
+		return (ENOMEM);
+	}
+
+	kn->kn_ptr.p_useract = ((kn->kn_sfflags & NOTE_TRIGGER) != 0);
+	kn->kn_fflags = kn->kn_sfflags & NOTE_FFLAGSMASK;
+	kn->kn_data = kn->kn_sdata;
+
+	return (0);
+}
+
+void
+filt_userdetach(struct knote *kn)
+{
+	struct filedesc *fdp = kn->kn_kq->kq_fdp;
+
+	atomic_dec_int(&fdp->fd_nuserevents);
+}
+
+int
+filt_usermodify(struct kevent *kev, struct knote *kn)
+{
+	unsigned int ffctrl, fflags;
+
+	if (kev->fflags & NOTE_TRIGGER)
+		kn->kn_ptr.p_useract = 1;
+
+	ffctrl = kev->fflags & NOTE_FFCTRLMASK;
+	fflags = kev->fflags & NOTE_FFLAGSMASK;
+	switch (ffctrl) {
+	case NOTE_FFNOP:
+		break;
+	case NOTE_FFAND:
+		kn->kn_fflags &= fflags;
+		break;
+	case NOTE_FFOR:
+		kn->kn_fflags |= fflags;
+		break;
+	case NOTE_FFCOPY:
+		kn->kn_fflags = fflags;
+		break;
+	default:
+		/* ignored, should not happen */
+		break;
+	}
+
+	if (kev->flags & EV_ADD) {
+		kn->kn_data = kev->data;
+		kn->kn_udata = kev->udata;
+	}
+
+	/* Allow clearing of an activated event. */
+	if (kev->flags & EV_CLEAR) {
+		kn->kn_ptr.p_useract = 0;
+		kn->kn_data = 0;
+	}
+
+	return (kn->kn_ptr.p_useract);
+}
+
+int
+filt_userprocess(struct knote *kn, struct kevent *kev)
+{
+	int active;
+
+	active = kn->kn_ptr.p_useract;
+	if (active && kev != NULL) {
+		*kev = kn->kn_kevent;
+		if (kn->kn_flags & EV_CLEAR) {
+			kn->kn_ptr.p_useract = 0;
+			kn->kn_fflags = 0;
+			kn->kn_data = 0;
+		}
+	}
+
+	return (active);
+}
 
 /*
  * filt_seltrue:
@@ -1411,6 +1512,17 @@ again:
 		filter_detach(kn);
 		knote_drop(kn, p);
 		goto done;
+	} else if (kn->kn_fop == &user_filtops) {
+		/* Call f_modify to allow NOTE_TRIGGER without EV_ADD. */
+		mtx_leave(&kq->kq_lock);
+		active = filter_modify(kev, kn);
+		mtx_enter(&kq->kq_lock);
+		if (active)
+			knote_activate(kn);
+		if (kev->flags & EV_ERROR) {
+			error = kev->data;
+			goto release;
+		}
 	}
 
 	if ((kev->flags & EV_DISABLE) && ((kn->kn_status & KN_DISABLED) == 0))
Index: sys/sys/event.h
===================================================================
RCS file: src/sys/sys/event.h,v
retrieving revision 1.73
diff -u -p -r1.73 event.h
--- sys/sys/event.h	6 Aug 2024 08:44:54 -0000	1.73
+++ sys/sys/event.h	6 May 2025 03:20:41 -0000
@@ -40,8 +40,9 @@
 #define EVFILT_TIMER		(-7)	/* timers */
 #define EVFILT_DEVICE		(-8)	/* devices */
 #define EVFILT_EXCEPT		(-9)	/* exceptional conditions */
+#define EVFILT_USER		(-10)	/* user event */
 
-#define EVFILT_SYSCOUNT		9
+#define EVFILT_SYSCOUNT		10
 
 #define EV_SET(kevp, a, b, c, d, e, f) do {	\
 	struct kevent *__kevp = (kevp);		\
@@ -130,6 +131,19 @@ struct kevent {
 #define NOTE_ABSTIME	0x00000010	/* timeout is absolute */
 
 /*
+ * data/hint flags for EVFILT_USER, shared with userspace
+ */
+#define NOTE_FFNOP	0x00000000	/* ignore input fflags */
+#define NOTE_FFAND	0x40000000	/* AND fflags */
+#define NOTE_FFOR	0x80000000	/* OR fflags */
+#define NOTE_FFCOPY	0xc0000000	/* copy fflags */
+
+#define NOTE_FFCTRLMASK	0xc0000000	/* masks for operations */
+#define NOTE_FFLAGSMASK	0x00ffffff
+
+#define NOTE_TRIGGER	0x01000000	/* trigger the event */
+
+/*
  * This is currently visible to userland to work around broken
  * programs which pull in or .
  */
@@ -244,6 +258,7 @@ struct knote {
 	union {
 		struct file	*p_fp;		/* file data pointer */
 		struct process	*p_process;	/* process pointer */
+		int		p_useract;	/* user event active */
 	} kn_ptr;
 	const struct filterops	*kn_fop;
 	void			*kn_hook;	/* [o] */
Index: sys/sys/filedesc.h
===================================================================
RCS file: src/sys/sys/filedesc.h,v
retrieving revision 1.46
diff -u -p -r1.46 filedesc.h
--- sys/sys/filedesc.h	12 May 2022 13:33:09 -0000	1.46
+++ sys/sys/filedesc.h	6 May 2025 03:20:41 -0000
@@ -87,6 +87,7 @@ struct filedesc {
 	LIST_HEAD(, kqueue) fd_kqlist;	/* [f] kqueues attached to this
 					 * filedesc */
 	int fd_flags;			/* [a] flags on this filedesc */
+	u_int fd_nuserevents;		/* [a] number of kqueue user events */
 };
 
 /*
Index: usr.bin/kdump/mksubr
===================================================================
RCS file: src/usr.bin/kdump/mksubr,v
retrieving revision 1.40
diff -u -p -r1.40 mksubr
--- usr.bin/kdump/mksubr	13 Aug 2023 08:29:28 -0000	1.40
+++ usr.bin/kdump/mksubr	6 May 2025 03:20:41 -0000
@@ -583,6 +583,27 @@ cat <<_EOF_
 			or = 1;
 		if_print_or(fflags, NOTE_ABSTIME, or);
 		break;
+	case EVFILT_USER:
+		if (fflags & NOTE_FFCTRLMASK) {
+			switch (fflags & NOTE_FFCTRLMASK) {
+			case NOTE_FFAND:
+				printf("NOTE_FFAND");
+				break;
+			case NOTE_FFOR:
+				printf("NOTE_FFOR");
+				break;
+			case NOTE_FFCOPY:
+				printf("NOTE_FFCOPY");
+				break;
+			}
+			or = 1;
+		}
+		if_print_or(fflags, NOTE_TRIGGER, or);
+		if (fflags & NOTE_FFLAGSMASK) {
+			printf("%s%#x", or ? "|" : "",
+			    fflags & NOTE_FFLAGSMASK);
+		}
+		break;
 	}
 	printf(">");
 }
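For reviewers checking the fflags handling in filt_userattach() and filt_usermodify(), it may help to model the control-bit encoding in plain userspace C. This is a sketch only and is not part of the patch. The helper names user_fflags_attach(), user_fflags_apply(), user_fflags_triggers() and check_regress_sequence() are hypothetical. The constants are copied from the sys/event.h hunk above and redefined locally, so the model should build on systems that have no EVFILT_USER at all.

```c
#include <assert.h>

/*
 * Copied from the sys/event.h hunk of the patch.  These are redefined
 * here instead of including <sys/event.h>, so that the model builds
 * anywhere.
 */
#define NOTE_FFNOP	0x00000000U	/* ignore input fflags */
#define NOTE_FFAND	0x40000000U	/* AND fflags */
#define NOTE_FFOR	0x80000000U	/* OR fflags */
#define NOTE_FFCOPY	0xc0000000U	/* copy fflags */
#define NOTE_FFCTRLMASK	0xc0000000U	/* masks for operations */
#define NOTE_FFLAGSMASK	0x00ffffffU
#define NOTE_TRIGGER	0x01000000U	/* trigger the event */

/* Hypothetical: stored fflags on first EV_ADD, as in filt_userattach(). */
unsigned int
user_fflags_attach(unsigned int input)
{
	/* Control bits and NOTE_TRIGGER are dropped; 24 user bits remain. */
	return (input & NOTE_FFLAGSMASK);
}

/* Hypothetical: stored fflags after a later kevent() on the same knote. */
unsigned int
user_fflags_apply(unsigned int cur, unsigned int input)
{
	unsigned int fflags = input & NOTE_FFLAGSMASK;

	/* The two control bits select one of four operations. */
	switch (input & NOTE_FFCTRLMASK) {
	case NOTE_FFAND:
		return (cur & fflags);
	case NOTE_FFOR:
		return (cur | fflags);
	case NOTE_FFCOPY:
		return (fflags);
	case NOTE_FFNOP:
	default:
		return (cur);
	}
}

/* Hypothetical: whether this input marks the event active. */
int
user_fflags_triggers(unsigned int input)
{
	return ((input & NOTE_TRIGGER) != 0);
}

/* Replays the "Change fflags" steps from kqueue-user.c on the model. */
void
check_regress_sequence(void)
{
	unsigned int fl = 0;

	fl = user_fflags_apply(fl, NOTE_FFCOPY | 0x00aa00);
	assert(fl == 0x00aa00);
	fl = user_fflags_apply(fl, NOTE_FFOR | 0xff00ff);
	assert(fl == 0xffaaff);
	fl = user_fflags_apply(fl, NOTE_TRIGGER | NOTE_FFAND | 0x0ffff0);
	assert(fl == 0x0faaf0);
}
```

The final value, 0x0faaf0, matches what the regress test expects in kev[0].fflags. Because NOTE_TRIGGER (bit 24) lies outside NOTE_FFLAGSMASK, it never leaks into the stored flags. The test's `~0U & ~NOTE_TRIGGER` registration therefore stores NOTE_FFLAGSMASK without activating the event. The model deliberately ignores activation state, `data`, `udata` and EV_CLEAR.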