From: Alexander Bluhm Subject: Re: cache route at pf state To: tech@openbsd.org Date: Sun, 21 Jul 2024 11:41:21 +0200 On Sun, Jul 21, 2024 at 09:48:15AM +0200, Claudio Jeker wrote: > On Sat, Jul 20, 2024 at 07:20:32PM +0200, Alexander Bluhm wrote: > > Hi, > > > > When forwarding packets, rtable_match takes quite some time. > > > > http://bluhm.genua.de/perform/results/2024-07-19T07:25:47Z/2024-07-19T00%3A00%3A00Z/btrace/ssh_perform%40lt13_iperf3_-6_-cfdd7%3Ae83e%3A66bc%3A0346%3A%3A36_-P10_-t10-btrace-kstack.0.svg?s=rtable_match > > > > This can be avoided by caching the route at the pf state. > > > > Comments, tests? > > I think this is a bad idea. > Caching this into a pf state will have negative consequences in various > scenarios. This will cache route entries for possible a long time for > sessions that are idle. A remote actor may use this to attack the local > system. Of course pf states have different timeouts than ARP or ND6. Or are you more concerned about dynamic BGP routes that get referenced by states? I think the risk is limited. If someone can fill up your state table, the additional route reference is not that evil. State timeouts for traffic from only one side are rather short. When using ARP or ND6 attacks to non existing neigbors both routes and states will time out. But yes, timeout may be longer for states. As counter measure we could activate cache only for states that have seen bidirectinal traffic. For a router not forwarding to direct neigbors I would expect that the state table gets full before the routing table. Note that only references from unused states to stale routes are a problem. For a dynamic router with statefull filtering I would expect that a DoS uses all states. Then too many routes do not matter. I don't see the immediate threat, but that does not prove there is none. > In my opinion I would prefer if you looked into the reason why > rtable_match() takes so much time. Unless you run with large routing > tables the lookups should be cheap. Is there a lock or mutex that causes > this? It is a lockless SRP ART table. As I understand it, mpi@ has optimized the lookup when he implemented it. It is not a MP problem, route lookup is just another slow thing in comparison to other packet processing. Any better ideas where to cache the route? You neither like per CPU cache nor pf state cache. Cache at internet PCB works fine for me and covers most of my use cases. My impression is, most people use OpenBSD for forwarding and filtering with pf. So I wanted to improve this. I am still interested in numbers how much route cache at states speeds up real life scenarios. bluhm > > Index: net/if_bridge.c > > =================================================================== > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_bridge.c,v > > diff -u -p -r1.370 if_bridge.c > > --- net/if_bridge.c 14 Apr 2024 20:46:27 -0000 1.370 > > +++ net/if_bridge.c 20 Jul 2024 14:52:40 -0000 > > @@ -1619,7 +1619,7 @@ bridge_ipsec(struct ifnet *ifp, struct e > > #if NPF > 0 > > if ((encif = enc_getif(tdb->tdb_rdomain, > > tdb->tdb_tap)) == NULL || > > - pf_test(af, dir, encif, &m) != PF_PASS) { > > + pf_test(af, dir, encif, &m, NULL) != PF_PASS) { > > m_freem(m); > > tdb_unref(tdb); > > return (1); > > @@ -1725,7 +1725,7 @@ bridge_ip(struct ifnet *brifp, int dir, > > #endif /* IPSEC */ > > #if NPF > 0 > > /* Finally, we get to filter the packet! */ > > - if (pf_test(AF_INET, dir, ifp, &m) != PF_PASS) > > + if (pf_test(AF_INET, dir, ifp, &m, NULL) != PF_PASS) > > goto dropit; > > if (m == NULL) > > goto dropit; > > @@ -1768,7 +1768,7 @@ bridge_ip(struct ifnet *brifp, int dir, > > #endif /* IPSEC */ > > > > #if NPF > 0 > > - if (pf_test(AF_INET6, dir, ifp, &m) != PF_PASS) > > + if (pf_test(AF_INET6, dir, ifp, &m, NULL) != PF_PASS) > > goto dropit; > > if (m == NULL) > > return (NULL); > > Index: net/if_tpmr.c > > =================================================================== > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_tpmr.c,v > > diff -u -p -r1.35 if_tpmr.c > > --- net/if_tpmr.c 23 Dec 2023 10:52:54 -0000 1.35 > > +++ net/if_tpmr.c 20 Jul 2024 14:52:40 -0000 > > @@ -295,7 +295,7 @@ tpmr_pf(struct ifnet *ifp0, int dir, str > > return (NULL); > > } > > > > - if (pf_test(fam->af, dir, ifp0, &m) != PF_PASS) { > > + if (pf_test(fam->af, dir, ifp0, &m, NULL) != PF_PASS) { > > m_freem(m); > > return (NULL); > > } > > Index: net/if_veb.c > > =================================================================== > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_veb.c,v > > diff -u -p -r1.35 if_veb.c > > --- net/if_veb.c 13 Feb 2024 12:22:09 -0000 1.35 > > +++ net/if_veb.c 20 Jul 2024 14:52:40 -0000 > > @@ -644,7 +644,7 @@ veb_pf(struct ifnet *ifp0, int dir, stru > > if (m == NULL) > > return (NULL); > > > > - if (pf_test(fam->af, dir, ifp0, &m) != PF_PASS) { > > + if (pf_test(fam->af, dir, ifp0, &m, NULL) != PF_PASS) { > > m_freem(m); > > return (NULL); > > } > > @@ -839,7 +839,7 @@ veb_ipsec_proto_out(struct mbuf *m, sa_f > > #if NPF > 0 > > encifp = enc_getif(tdb->tdb_rdomain, tdb->tdb_tap); > > if (encifp != NULL) { > > - if (pf_test(af, PF_OUT, encifp, &m) != PF_PASS) { > > + if (pf_test(af, PF_OUT, encifp, &m, NULL) != PF_PASS) { > > m_freem(m); > > return (NULL); > > } > > Index: net/pf.c > > =================================================================== > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/pf.c,v > > diff -u -p -r1.1203 pf.c > > --- net/pf.c 14 Jul 2024 18:53:39 -0000 1.1203 > > +++ net/pf.c 20 Jul 2024 15:13:53 -0000 > > @@ -113,6 +113,7 @@ struct pf_queuehead *pf_queues_inactive; > > struct pf_status pf_status; > > > > struct mutex pf_inp_mtx = MUTEX_INITIALIZER(IPL_SOFTNET); > > +struct mutex pf_st_ro_mtx = MUTEX_INITIALIZER(IPL_SOFTNET); > > > > int pf_hdr_limit = 20; /* arbitrary limit, tune in ddb */ > > > > @@ -6638,7 +6639,7 @@ pf_route(struct pf_pdesc *pd, struct pf_ > > ip->ip_src = ifatoia(rt->rt_ifa)->ia_addr.sin_addr; > > > > if (st->rt != PF_DUPTO && pd->dir == PF_IN) { > > - if (pf_test(AF_INET, PF_OUT, ifp, &m0) != PF_PASS) > > + if (pf_test(AF_INET, PF_OUT, ifp, &m0, NULL) != PF_PASS) > > goto bad; > > else if (m0 == NULL) > > goto done; > > @@ -6758,7 +6759,7 @@ pf_route6(struct pf_pdesc *pd, struct pf > > ip6->ip6_src = ifatoia6(rt->rt_ifa)->ia_addr.sin6_addr; > > > > if (st->rt != PF_DUPTO && pd->dir == PF_IN) { > > - if (pf_test(AF_INET6, PF_OUT, ifp, &m0) != PF_PASS) > > + if (pf_test(AF_INET6, PF_OUT, ifp, &m0, NULL) != PF_PASS) > > goto bad; > > else if (m0 == NULL) > > goto done; > > @@ -7533,7 +7534,8 @@ pf_counters_inc(int action, struct pf_pd > > } > > > > int > > -pf_test(sa_family_t af, int fwdir, struct ifnet *ifp, struct mbuf **m0) > > +pf_test(sa_family_t af, int fwdir, struct ifnet *ifp, struct mbuf **m0, > > + struct route *ro) > > { > > #if NCARP > 0 > > struct ifnet *ifp0; > > @@ -7761,7 +7763,7 @@ pf_test(sa_family_t af, int fwdir, struc > > pf_syncookie_validate(&pd)) { > > struct mbuf *msyn = pf_syncookie_recreate_syn(&pd); > > if (msyn) { > > - action = pf_test(af, fwdir, ifp, &msyn); > > + action = pf_test(af, fwdir, ifp, &msyn, NULL); > > m_freem(msyn); > > if (action == PF_PASS || action == PF_AFRT) { > > PF_STATE_ENTER_READ(); > > @@ -8039,12 +8041,47 @@ done: > > action = pf_refragment6(&pd.m, mtag, NULL, NULL, NULL); > > } > > #endif /* INET6 */ > > + /* update interface index for pflow(4) */ > > if (st && action != PF_DROP) { > > if (!st->if_index_in && dir == PF_IN) > > st->if_index_in = ifp->if_index; > > else if (!st->if_index_out && dir == PF_OUT) > > st->if_index_out = ifp->if_index; > > } > > + /* the pf state keeps cache of route */ > > + if (ro && st && pd.m && action == PF_PASS && pf_ouraddr(pd.m) != 1) { > > + /* check cache without lock, worst case is additional lookup */ > > + if (memcmp(ro, &st->route, sizeof(*ro)) != 0) { > > + mtx_enter(&pf_st_ro_mtx); > > + if (st->route.ro_rt != NULL) { > > + rtfree(ro->ro_rt); > > + *ro = st->route; > > + rtref(ro->ro_rt); > > + } > > + mtx_leave(&pf_st_ro_mtx); > > + } > > + switch (pd.naf) { > > + case AF_INET: > > + route_mpath(ro, &pd.ndaddr.v4, &pd.nsaddr.v4, > > + pd.m->m_pkthdr.ph_rtableid); > > + break; > > +#ifdef INET6 > > + case AF_INET6: > > + route6_mpath(ro, &pd.ndaddr.v6, &pd.nsaddr.v6, > > + pd.m->m_pkthdr.ph_rtableid); > > + break; > > +#endif /* INET6 */ > > + } > > + if (ro->ro_rt != NULL) { > > + if (memcmp(&st->route, ro, sizeof(*ro)) != 0) { > > + mtx_enter(&pf_st_ro_mtx); > > + rtfree(st->route.ro_rt); > > + st->route = *ro; > > + rtref(st->route.ro_rt); > > + mtx_leave(&pf_st_ro_mtx); > > + } > > + } > > + } > > > > out: > > *m0 = pd.m; > > @@ -8327,6 +8364,7 @@ pf_state_unref(struct pf_state *st) > > > > pf_state_key_unref(st->key[PF_SK_WIRE]); > > pf_state_key_unref(st->key[PF_SK_STACK]); > > + rtfree(st->route.ro_rt); > > > > pool_put(&pf_state_pl, st); > > } > > Index: net/pfvar.h > > =================================================================== > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/pfvar.h,v > > diff -u -p -r1.538 pfvar.h > > --- net/pfvar.h 13 May 2024 01:15:53 -0000 1.538 > > +++ net/pfvar.h 20 Jul 2024 14:52:40 -0000 > > @@ -1640,7 +1640,8 @@ int pf_setup_pdesc(struct pf_pdesc * > > int, struct pfi_kif *, struct mbuf *, > > u_short *); > > > > -int pf_test(sa_family_t, int, struct ifnet *, struct mbuf **); > > +int pf_test(sa_family_t, int, struct ifnet *, struct mbuf **, > > + struct route *); > > > > void pf_poolmask(struct pf_addr *, struct pf_addr*, > > struct pf_addr *, struct pf_addr *, sa_family_t); > > Index: net/pfvar_priv.h > > =================================================================== > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/pfvar_priv.h,v > > diff -u -p -r1.37 pfvar_priv.h > > --- net/pfvar_priv.h 21 Jun 2024 12:51:29 -0000 1.37 > > +++ net/pfvar_priv.h 20 Jul 2024 14:52:40 -0000 > > @@ -97,6 +97,7 @@ RBT_PROTOTYPE(pf_state_tree, pf_state_ke > > * S pfsync > > * L pf_state_list > > * g pf_purge gc > > + * R pf_st_ro_mtx > > */ > > > > struct pf_state { > > @@ -120,6 +121,7 @@ struct pf_state { > > struct pf_sn_head src_nodes; /* [I] */ > > struct pf_state_key *key[2]; /* [I] stack and wire */ > > struct pfi_kif *kif; /* [I] */ > > + struct route route; /* [R] */ > > struct mutex mtx; > > pf_refcnt_t refcnt; > > u_int64_t packets[2]; > > Index: netinet/ip_input.c > > =================================================================== > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_input.c,v > > diff -u -p -r1.400 ip_input.c > > --- netinet/ip_input.c 19 Jul 2024 16:58:31 -0000 1.400 > > +++ netinet/ip_input.c 20 Jul 2024 14:52:40 -0000 > > @@ -464,7 +464,7 @@ ip_input_if(struct mbuf **mp, int *offp, > > * Packet filter > > */ > > odst = ip->ip_dst; > > - if (pf_test(AF_INET, PF_IN, ifp, mp) != PF_PASS) > > + if (pf_test(AF_INET, PF_IN, ifp, mp, &ro) != PF_PASS) > > goto bad; > > m = *mp; > > if (m == NULL) > > Index: netinet/ip_output.c > > =================================================================== > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_output.c,v > > diff -u -p -r1.401 ip_output.c > > --- netinet/ip_output.c 2 Jul 2024 18:33:47 -0000 1.401 > > +++ netinet/ip_output.c 20 Jul 2024 14:52:40 -0000 > > @@ -403,7 +403,7 @@ sendit: > > */ > > #if NPF > 0 > > if (pf_test(AF_INET, (flags & IP_FORWARDING) ? PF_FWD : PF_OUT, > > - ifp, &m) != PF_PASS) { > > + ifp, &m, NULL) != PF_PASS) { > > error = EACCES; > > goto bad; > > } > > @@ -584,7 +584,8 @@ ip_output_ipsec_send(struct tdb *tdb, st > > * Packet filter > > */ > > if ((encif = enc_getif(tdb->tdb_rdomain, tdb->tdb_tap)) == NULL || > > - pf_test(AF_INET, fwd ? PF_FWD : PF_OUT, encif, &m) != PF_PASS) { > > + pf_test(AF_INET, fwd ? PF_FWD : PF_OUT, encif, &m, NULL) > > + != PF_PASS) { > > m_freem(m); > > return EACCES; > > } > > Index: netinet/ipsec_input.c > > =================================================================== > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ipsec_input.c,v > > diff -u -p -r1.206 ipsec_input.c > > --- netinet/ipsec_input.c 16 Sep 2023 09:33:27 -0000 1.206 > > +++ netinet/ipsec_input.c 20 Jul 2024 14:52:40 -0000 > > @@ -580,7 +580,7 @@ ipsec_common_input_cb(struct mbuf **mp, > > if ((ifp = if_get(m->m_pkthdr.ph_ifidx)) == NULL) { > > goto baddone; > > } > > - if (pf_test(af, PF_IN, ifp, mp) != PF_PASS) { > > + if (pf_test(af, PF_IN, ifp, mp, NULL) != PF_PASS) { > > if_put(ifp); > > goto baddone; > > } > > Index: netinet6/ip6_forward.c > > =================================================================== > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/ip6_forward.c,v > > diff -u -p -r1.124 ip6_forward.c > > --- netinet6/ip6_forward.c 19 Jul 2024 16:58:32 -0000 1.124 > > +++ netinet6/ip6_forward.c 20 Jul 2024 14:52:40 -0000 > > @@ -322,7 +322,7 @@ reroute: > > ip6->ip6_dst.s6_addr16[1] = 0; > > > > #if NPF > 0 > > - if (pf_test(AF_INET6, PF_FWD, ifp, &m) != PF_PASS) { > > + if (pf_test(AF_INET6, PF_FWD, ifp, &m, NULL) != PF_PASS) { > > m_freem(m); > > goto senderr; > > } > > Index: netinet6/ip6_input.c > > =================================================================== > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/ip6_input.c,v > > diff -u -p -r1.266 ip6_input.c > > --- netinet6/ip6_input.c 19 Jul 2024 16:58:32 -0000 1.266 > > +++ netinet6/ip6_input.c 20 Jul 2024 14:52:40 -0000 > > @@ -405,7 +405,7 @@ ip6_input_if(struct mbuf **mp, int *offp > > * Packet filter > > */ > > odst = ip6->ip6_dst; > > - if (pf_test(AF_INET6, PF_IN, ifp, mp) != PF_PASS) > > + if (pf_test(AF_INET6, PF_IN, ifp, mp, &ro) != PF_PASS) > > goto bad; > > m = *mp; > > if (m == NULL) > > Index: netinet6/ip6_output.c > > =================================================================== > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/ip6_output.c,v > > diff -u -p -r1.292 ip6_output.c > > --- netinet6/ip6_output.c 4 Jul 2024 12:50:08 -0000 1.292 > > +++ netinet6/ip6_output.c 20 Jul 2024 14:52:40 -0000 > > @@ -619,7 +619,7 @@ reroute: > > } > > > > #if NPF > 0 > > - if (pf_test(AF_INET6, PF_OUT, ifp, &m) != PF_PASS) { > > + if (pf_test(AF_INET6, PF_OUT, ifp, &m, NULL) != PF_PASS) { > > error = EACCES; > > m_freem(m); > > goto done; > > @@ -2840,7 +2840,8 @@ ip6_output_ipsec_send(struct tdb *tdb, s > > * Packet filter > > */ > > if ((encif = enc_getif(tdb->tdb_rdomain, tdb->tdb_tap)) == NULL || > > - pf_test(AF_INET6, fwd ? PF_FWD : PF_OUT, encif, &m) != PF_PASS) { > > + pf_test(AF_INET6, fwd ? PF_FWD : PF_OUT, encif, &m, NULL) > > + != PF_PASS) { > > m_freem(m); > > return EACCES; > > } > > > > -- > :wq Claudio