From: Claudio Jeker Subject: Re: cache route at pf state To: Alexander Bluhm Cc: tech@openbsd.org Date: Mon, 22 Jul 2024 10:15:40 +0200 On Sun, Jul 21, 2024 at 11:41:21AM +0200, Alexander Bluhm wrote: > On Sun, Jul 21, 2024 at 09:48:15AM +0200, Claudio Jeker wrote: > > On Sat, Jul 20, 2024 at 07:20:32PM +0200, Alexander Bluhm wrote: > > > Hi, > > > > > > When forwarding packets, rtable_match takes quite some time. > > > > > > http://bluhm.genua.de/perform/results/2024-07-19T07:25:47Z/2024-07-19T00%3A00%3A00Z/btrace/ssh_perform%40lt13_iperf3_-6_-cfdd7%3Ae83e%3A66bc%3A0346%3A%3A36_-P10_-t10-btrace-kstack.0.svg?s=rtable_match > > > > > > This can be avoided by caching the route at the pf state. > > > > > > Comments, tests? > > > > I think this is a bad idea. > > Caching this into a pf state will have negative consequences in various > > scenarios. This will cache route entries for possible a long time for > > sessions that are idle. A remote actor may use this to attack the local > > system. > > Of course pf states have different timeouts than ARP or ND6. Or > are you more concerned about dynamic BGP routes that get referenced > by states? Both. In most cases BGP routers will probably avoid states since routing at that level is asymetric. > I think the risk is limited. If someone can fill up your state > table, the additional route reference is not that evil. I can have million of states. Having million of extra route objects hanging around for 24h is not great. > State timeouts for traffic from only one side are rather short. > When using ARP or ND6 attacks to non existing neigbors both routes > and states will time out. But yes, timeout may be longer for states. > As counter measure we could activate cache only for states that > have seen bidirectinal traffic. With IPv6 you can create as many bidirectional states as you like until the operator notices and blackholes your /64. > For a router not forwarding to direct neigbors I would expect that > the state table gets full before the routing table. Note that only > references from unused states to stale routes are a problem. For > a dynamic router with statefull filtering I would expect that a DoS > uses all states. Then too many routes do not matter. I don't know if this is true. I had never tried this out. I did hit various problems when systems are getting stressed out by an attacker. So I'm just generally sceptical about making the pf state octopus even bigger. It has tentacles into far too many bits. > I don't see the immediate threat, but that does not prove there is > none. > > > In my opinion I would prefer if you looked into the reason why > > rtable_match() takes so much time. Unless you run with large routing > > tables the lookups should be cheap. Is there a lock or mutex that causes > > this? > > It is a lockless SRP ART table. As I understand it, mpi@ has > optimized the lookup when he implemented it. It is not a MP problem, > route lookup is just another slow thing in comparison to other > packet processing. I doubt this is the case. Something goes very wrong here. Your tests have a minimal routing table with a couple of dozen routes (for IPv6). Lookups should be ultra cheap in that case. Looking at the trace it seems that SRP is actually rather expensive. Not sure why (maybe it is the refcnt). There was some work done to switch art to SMR but rtable_walk makes this complicated. > Any better ideas where to cache the route? You neither like per > CPU cache nor pf state cache. I just think that fixing the root cause and make lookups so cheap that an extra cache will not be benefitial is a better option. Especially art table lookups should be substantially faster than pf state lookups so something is not quite right here. I'm not against per-CPU caches but only if they really move the needle (outside of some constructed test case). Caching routes comes with some overhead so at what point will the cache actually do more harm? > Cache at internet PCB works fine for me and covers most of my use > cases. My impression is, most people use OpenBSD for forwarding > and filtering with pf. So I wanted to improve this. I am still > interested in numbers how much route cache at states speeds up real > life scenarios. I'm also interested in those numbers. > bluhm > > > > Index: net/if_bridge.c > > > =================================================================== > > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_bridge.c,v > > > diff -u -p -r1.370 if_bridge.c > > > --- net/if_bridge.c 14 Apr 2024 20:46:27 -0000 1.370 > > > +++ net/if_bridge.c 20 Jul 2024 14:52:40 -0000 > > > @@ -1619,7 +1619,7 @@ bridge_ipsec(struct ifnet *ifp, struct e > > > #if NPF > 0 > > > if ((encif = enc_getif(tdb->tdb_rdomain, > > > tdb->tdb_tap)) == NULL || > > > - pf_test(af, dir, encif, &m) != PF_PASS) { > > > + pf_test(af, dir, encif, &m, NULL) != PF_PASS) { > > > m_freem(m); > > > tdb_unref(tdb); > > > return (1); > > > @@ -1725,7 +1725,7 @@ bridge_ip(struct ifnet *brifp, int dir, > > > #endif /* IPSEC */ > > > #if NPF > 0 > > > /* Finally, we get to filter the packet! */ > > > - if (pf_test(AF_INET, dir, ifp, &m) != PF_PASS) > > > + if (pf_test(AF_INET, dir, ifp, &m, NULL) != PF_PASS) > > > goto dropit; > > > if (m == NULL) > > > goto dropit; > > > @@ -1768,7 +1768,7 @@ bridge_ip(struct ifnet *brifp, int dir, > > > #endif /* IPSEC */ > > > > > > #if NPF > 0 > > > - if (pf_test(AF_INET6, dir, ifp, &m) != PF_PASS) > > > + if (pf_test(AF_INET6, dir, ifp, &m, NULL) != PF_PASS) > > > goto dropit; > > > if (m == NULL) > > > return (NULL); > > > Index: net/if_tpmr.c > > > =================================================================== > > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_tpmr.c,v > > > diff -u -p -r1.35 if_tpmr.c > > > --- net/if_tpmr.c 23 Dec 2023 10:52:54 -0000 1.35 > > > +++ net/if_tpmr.c 20 Jul 2024 14:52:40 -0000 > > > @@ -295,7 +295,7 @@ tpmr_pf(struct ifnet *ifp0, int dir, str > > > return (NULL); > > > } > > > > > > - if (pf_test(fam->af, dir, ifp0, &m) != PF_PASS) { > > > + if (pf_test(fam->af, dir, ifp0, &m, NULL) != PF_PASS) { > > > m_freem(m); > > > return (NULL); > > > } > > > Index: net/if_veb.c > > > =================================================================== > > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_veb.c,v > > > diff -u -p -r1.35 if_veb.c > > > --- net/if_veb.c 13 Feb 2024 12:22:09 -0000 1.35 > > > +++ net/if_veb.c 20 Jul 2024 14:52:40 -0000 > > > @@ -644,7 +644,7 @@ veb_pf(struct ifnet *ifp0, int dir, stru > > > if (m == NULL) > > > return (NULL); > > > > > > - if (pf_test(fam->af, dir, ifp0, &m) != PF_PASS) { > > > + if (pf_test(fam->af, dir, ifp0, &m, NULL) != PF_PASS) { > > > m_freem(m); > > > return (NULL); > > > } > > > @@ -839,7 +839,7 @@ veb_ipsec_proto_out(struct mbuf *m, sa_f > > > #if NPF > 0 > > > encifp = enc_getif(tdb->tdb_rdomain, tdb->tdb_tap); > > > if (encifp != NULL) { > > > - if (pf_test(af, PF_OUT, encifp, &m) != PF_PASS) { > > > + if (pf_test(af, PF_OUT, encifp, &m, NULL) != PF_PASS) { > > > m_freem(m); > > > return (NULL); > > > } > > > Index: net/pf.c > > > =================================================================== > > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/pf.c,v > > > diff -u -p -r1.1203 pf.c > > > --- net/pf.c 14 Jul 2024 18:53:39 -0000 1.1203 > > > +++ net/pf.c 20 Jul 2024 15:13:53 -0000 > > > @@ -113,6 +113,7 @@ struct pf_queuehead *pf_queues_inactive; > > > struct pf_status pf_status; > > > > > > struct mutex pf_inp_mtx = MUTEX_INITIALIZER(IPL_SOFTNET); > > > +struct mutex pf_st_ro_mtx = MUTEX_INITIALIZER(IPL_SOFTNET); > > > > > > int pf_hdr_limit = 20; /* arbitrary limit, tune in ddb */ > > > > > > @@ -6638,7 +6639,7 @@ pf_route(struct pf_pdesc *pd, struct pf_ > > > ip->ip_src = ifatoia(rt->rt_ifa)->ia_addr.sin_addr; > > > > > > if (st->rt != PF_DUPTO && pd->dir == PF_IN) { > > > - if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS) > > > + if (pf_test(AF_INET, PF_OUT, ifp, &m0, NULL) != PF_PASS) > > > goto bad; > > > else if (m0 == NULL) > > > goto done; > > > @@ -6758,7 +6759,7 @@ pf_route6(struct pf_pdesc *pd, struct pf > > > ip6->ip6_src = ifatoia6(rt->rt_ifa)->ia_addr.sin6_addr; > > > > > > if (st->rt != PF_DUPTO && pd->dir == PF_IN) { > > > - if (pf_test(AF_INET6, PF_OUT, ifp, &m0) != PF_PASS) > > > + if (pf_test(AF_INET6, PF_OUT, ifp, &m0, NULL) != PF_PASS) > > > goto bad; > > > else if (m0 == NULL) > > > goto done; > > > @@ -7533,7 +7534,8 @@ pf_counters_inc(int action, struct pf_pd > > > } > > > > > > int > > > -pf_test(sa_family_t af, int fwdir, struct ifnet *ifp, struct mbuf **m0) > > > +pf_test(sa_family_t af, int fwdir, struct ifnet *ifp, struct mbuf **m0, > > > + struct route *ro) > > > { > > > #if NCARP > 0 > > > struct ifnet *ifp0; > > > @@ -7761,7 +7763,7 @@ pf_test(sa_family_t af, int fwdir, struc > > > pf_syncookie_validate(&pd)) { > > > struct mbuf *msyn = pf_syncookie_recreate_syn(&pd); > > > if (msyn) { > > > - action = pf_test(af, fwdir, ifp, &msyn); > > > + action = pf_test(af, fwdir, ifp, &msyn, NULL); > > > m_freem(msyn); > > > if (action == PF_PASS || action == PF_AFRT) { > > > PF_STATE_ENTER_READ(); > > > @@ -8039,12 +8041,47 @@ done: > > > action = pf_refragment6(&pd.m, mtag, NULL, NULL, NULL); > > > } > > > #endif /* INET6 */ > > > + /* update interface index for pflow(4) */ > > > if (st && action != PF_DROP) { > > > if (!st->if_index_in && dir == PF_IN) > > > st->if_index_in = ifp->if_index; > > > else if (!st->if_index_out && dir == PF_OUT) > > > st->if_index_out = ifp->if_index; > > > } > > > + /* the pf state keeps cache of route */ > > > + if (ro && st && pd.m && action == PF_PASS && pf_ouraddr(pd.m) != 1) { > > > + /* check cache without lock, worst case is additional lookup */ > > > + if (memcmp(ro, &st->route, sizeof(*ro)) != 0) { > > > + mtx_enter(&pf_st_ro_mtx); > > > + if (st->route.ro_rt != NULL) { > > > + rtfree(ro->ro_rt); > > > + *ro = st->route; > > > + rtref(ro->ro_rt); > > > + } > > > + mtx_leave(&pf_st_ro_mtx); > > > + } > > > + switch (pd.naf) { > > > + case AF_INET: > > > + route_mpath(ro, &pd.ndaddr.v4, &pd.nsaddr.v4, > > > + pd.m->m_pkthdr.ph_rtableid); > > > + break; > > > +#ifdef INET6 > > > + case AF_INET6: > > > + route6_mpath(ro, &pd.ndaddr.v6, &pd.nsaddr.v6, > > > + pd.m->m_pkthdr.ph_rtableid); > > > + break; > > > +#endif /* INET6 */ > > > + } > > > + if (ro->ro_rt != NULL) { > > > + if (memcmp(&st->route, ro, sizeof(*ro)) != 0) { > > > + mtx_enter(&pf_st_ro_mtx); > > > + rtfree(st->route.ro_rt); > > > + st->route = *ro; > > > + rtref(st->route.ro_rt); > > > + mtx_leave(&pf_st_ro_mtx); > > > + } > > > + } > > > + } > > > > > > out: > > > *m0 = pd.m; > > > @@ -8327,6 +8364,7 @@ pf_state_unref(struct pf_state *st) > > > > > > pf_state_key_unref(st->key[PF_SK_WIRE]); > > > pf_state_key_unref(st->key[PF_SK_STACK]); > > > + rtfree(st->route.ro_rt); > > > > > > pool_put(&pf_state_pl, st); > > > } > > > Index: net/pfvar.h > > > =================================================================== > > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/pfvar.h,v > > > diff -u -p -r1.538 pfvar.h > > > --- net/pfvar.h 13 May 2024 01:15:53 -0000 1.538 > > > +++ net/pfvar.h 20 Jul 2024 14:52:40 -0000 > > > @@ -1640,7 +1640,8 @@ int pf_setup_pdesc(struct pf_pdesc * > > > int, struct pfi_kif *, struct mbuf *, > > > u_short *); > > > > > > -int pf_test(sa_family_t, int, struct ifnet *, struct mbuf **); > > > +int pf_test(sa_family_t, int, struct ifnet *, struct mbuf **, > > > + struct route *); > > > > > > void pf_poolmask(struct pf_addr *, struct pf_addr*, > > > struct pf_addr *, struct pf_addr *, sa_family_t); > > > Index: net/pfvar_priv.h > > > =================================================================== > > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/pfvar_priv.h,v > > > diff -u -p -r1.37 pfvar_priv.h > > > --- net/pfvar_priv.h 21 Jun 2024 12:51:29 -0000 1.37 > > > +++ net/pfvar_priv.h 20 Jul 2024 14:52:40 -0000 > > > @@ -97,6 +97,7 @@ RBT_PROTOTYPE(pf_state_tree, pf_state_ke > > > * S pfsync > > > * L pf_state_list > > > * g pf_purge gc > > > + * R pf_st_ro_mtx > > > */ > > > > > > struct pf_state { > > > @@ -120,6 +121,7 @@ struct pf_state { > > > struct pf_sn_head src_nodes; /* [I] */ > > > struct pf_state_key *key[2]; /* [I] stack and wire */ > > > struct pfi_kif *kif; /* [I] */ > > > + struct route route; /* [R] */ > > > struct mutex mtx; > > > pf_refcnt_t refcnt; > > > u_int64_t packets[2]; > > > Index: netinet/ip_input.c > > > =================================================================== > > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_input.c,v > > > diff -u -p -r1.400 ip_input.c > > > --- netinet/ip_input.c 19 Jul 2024 16:58:31 -0000 1.400 > > > +++ netinet/ip_input.c 20 Jul 2024 14:52:40 -0000 > > > @@ -464,7 +464,7 @@ ip_input_if(struct mbuf **mp, int *offp, > > > * Packet filter > > > */ > > > odst = ip->ip_dst; > > > - if (pf_test(AF_INET, PF_IN, ifp, mp) != PF_PASS) > > > + if (pf_test(AF_INET, PF_IN, ifp, mp, &ro) != PF_PASS) > > > goto bad; > > > m = *mp; > > > if (m == NULL) > > > Index: netinet/ip_output.c > > > =================================================================== > > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_output.c,v > > > diff -u -p -r1.401 ip_output.c > > > --- netinet/ip_output.c 2 Jul 2024 18:33:47 -0000 1.401 > > > +++ netinet/ip_output.c 20 Jul 2024 14:52:40 -0000 > > > @@ -403,7 +403,7 @@ sendit: > > > */ > > > #if NPF > 0 > > > if (pf_test(AF_INET, (flags & IP_FORWARDING) ? PF_FWD : PF_OUT, > > > - ifp, &m) != PF_PASS) { > > > + ifp, &m, NULL) != PF_PASS) { > > > error = EACCES; > > > goto bad; > > > } > > > @@ -584,7 +584,8 @@ ip_output_ipsec_send(struct tdb *tdb, st > > > * Packet filter > > > */ > > > if ((encif = enc_getif(tdb->tdb_rdomain, tdb->tdb_tap)) == NULL || > > > - pf_test(AF_INET, fwd ? PF_FWD : PF_OUT, encif, &m) != PF_PASS) { > > > + pf_test(AF_INET, fwd ? PF_FWD : PF_OUT, encif, &m, NULL) > > > + != PF_PASS) { > > > m_freem(m); > > > return EACCES; > > > } > > > Index: netinet/ipsec_input.c > > > =================================================================== > > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ipsec_input.c,v > > > diff -u -p -r1.206 ipsec_input.c > > > --- netinet/ipsec_input.c 16 Sep 2023 09:33:27 -0000 1.206 > > > +++ netinet/ipsec_input.c 20 Jul 2024 14:52:40 -0000 > > > @@ -580,7 +580,7 @@ ipsec_common_input_cb(struct mbuf **mp, > > > if ((ifp = if_get(m->m_pkthdr.ph_ifidx)) == NULL) { > > > goto baddone; > > > } > > > - if (pf_test(af, PF_IN, ifp, mp) != PF_PASS) { > > > + if (pf_test(af, PF_IN, ifp, mp, NULL) != PF_PASS) { > > > if_put(ifp); > > > goto baddone; > > > } > > > Index: netinet6/ip6_forward.c > > > =================================================================== > > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/ip6_forward.c,v > > > diff -u -p -r1.124 ip6_forward.c > > > --- netinet6/ip6_forward.c 19 Jul 2024 16:58:32 -0000 1.124 > > > +++ netinet6/ip6_forward.c 20 Jul 2024 14:52:40 -0000 > > > @@ -322,7 +322,7 @@ reroute: > > > ip6->ip6_dst.s6_addr16[1] = 0; > > > > > > #if NPF > 0 > > > - if (pf_test(AF_INET6, PF_FWD, ifp, &m) != PF_PASS) { > > > + if (pf_test(AF_INET6, PF_FWD, ifp, &m, NULL) != PF_PASS) { > > > m_freem(m); > > > goto senderr; > > > } > > > Index: netinet6/ip6_input.c > > > =================================================================== > > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/ip6_input.c,v > > > diff -u -p -r1.266 ip6_input.c > > > --- netinet6/ip6_input.c 19 Jul 2024 16:58:32 -0000 1.266 > > > +++ netinet6/ip6_input.c 20 Jul 2024 14:52:40 -0000 > > > @@ -405,7 +405,7 @@ ip6_input_if(struct mbuf **mp, int *offp > > > * Packet filter > > > */ > > > odst = ip6->ip6_dst; > > > - if (pf_test(AF_INET6, PF_IN, ifp, mp) != PF_PASS) > > > + if (pf_test(AF_INET6, PF_IN, ifp, mp, &ro) != PF_PASS) > > > goto bad; > > > m = *mp; > > > if (m == NULL) > > > Index: netinet6/ip6_output.c > > > =================================================================== > > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/ip6_output.c,v > > > diff -u -p -r1.292 ip6_output.c > > > --- netinet6/ip6_output.c 4 Jul 2024 12:50:08 -0000 1.292 > > > +++ netinet6/ip6_output.c 20 Jul 2024 14:52:40 -0000 > > > @@ -619,7 +619,7 @@ reroute: > > > } > > > > > > #if NPF > 0 > > > - if (pf_test(AF_INET6, PF_OUT, ifp, &m) != PF_PASS) { > > > + if (pf_test(AF_INET6, PF_OUT, ifp, &m, NULL) != PF_PASS) { > > > error = EACCES; > > > m_freem(m); > > > goto done; > > > @@ -2840,7 +2840,8 @@ ip6_output_ipsec_send(struct tdb *tdb, s > > > * Packet filter > > > */ > > > if ((encif = enc_getif(tdb->tdb_rdomain, tdb->tdb_tap)) == NULL || > > > - pf_test(AF_INET6, fwd ? PF_FWD : PF_OUT, encif, &m) != PF_PASS) { > > > + pf_test(AF_INET6, fwd ? PF_FWD : PF_OUT, encif, &m, NULL) > > > + != PF_PASS) { > > > m_freem(m); > > > return EACCES; > > > } > > > > > > > -- > > :wq Claudio > -- :wq Claudio