From: Alexandr Nedvedicky Subject: pfsync does not update dst peer's state To: tech@openbsd.org Date: Sat, 18 May 2024 20:10:21 +0200 Hello, there was a report on bugs [1] back in Jan. There were few more reports off-list. Matthieu@ provided detailed information from his pfsync boxes. He is using two peering firewalls. He noticed there was excessive amount of pfsync traffic exchanged between peers. Poking further we discovered there were lot's of TCP states with peer end points in: ESTABLISHED : SYN_SENT FIN_WAIT_2 : SYN_SENT * : SYN_SENT on the secondary firewall peer. All those 'malformed' states were received from primary firewall peer. Looking at how pfsync is processing state update message from its peer we soon find a function pfsync_upd_tcp() which reads as follows: 2872 static int 2873 pfsync_upd_tcp(struct pf_state *st, const struct pfsync_state_peer *src, 2874 const struct pfsync_state_peer *dst) 2875 { 2876 int sync = 0; 2877 2878 /* 2879 * The state should never go backwards except 2880 * for syn-proxy states. Neither should the 2881 * sequence window slide backwards. 2882 */ 2883 if ((st->src.state > src->state && 2884 (st->src.state < PF_TCPS_PROXY_SRC || 2885 src->state >= PF_TCPS_PROXY_SRC)) || 2886 2887 (st->src.state == src->state && 2888 SEQ_GT(st->src.seqlo, ntohl(src->seqlo)))) 2889 sync++; 2890 else 2891 pf_state_peer_ntoh(src, &st->src); 2892 2893 if ((st->dst.state > dst->state) || 2894 2895 (st->dst.state >= TCPS_SYN_SENT && 2896 SEQ_GT(st->dst.seqlo, ntohl(dst->seqlo)))) 2897 sync++; 2898 else 2899 pf_state_peer_ntoh(dst, &st->dst); 2900 2901 return (sync); 2902 } lines 2883 - 2891 handle source TCP peer in state. lines 2893 - 2899 update TCP state for destination peer which is the one stuck in SYN_SENT. The condition at line 2895 looks odd. It should be the same check we do for source connection at line 2887. To see what's going on there we need to better understand a context here. the st->src is TCP stage for client we keep on secondary firewall, 'src' comes from update we got from primary firewall peer. st->dst is TCP state for server side connection we keep on secondary firewall, 'dst' comes from firewall peer. The condition here should just prevent peering firewall (secondary) firewall to move state backwards (dst.state > dst->state), We also don't want to update state with old sequence number (SEQ_GT(dst.seqlo, dst->seqlo). In other cases we want to call pf_state_peer_ntoh() to update state. The check 'dst.state >= TCPS_SYN_SENT' is wrong here as it prevents pfsync to call pf_state_peer_ntoh() which updates the state. The check should be in fact same as we have for src peer: st->dst.state == dst->state we do check sequence numbers if there is a tie on state. the onliner tweak is bellow. I still don't know the exact mechanism how pfsync-update storm gets triggered. fixing pfsync_upd_tcp() seems to fix the cause of update storm. OK to commit ? thanks and regards sashan [1] https://marc.info/?l=openbsd-bugs&m=170661373220327&w=2 --------8<---------------8<---------------8<------------------8<-------- diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c index 4ad51489509..0a3fcbfb9f6 100644 --- a/sys/net/if_pfsync.c +++ b/sys/net/if_pfsync.c @@ -2892,7 +2892,7 @@ pfsync_upd_tcp(struct pf_state *st, const struct pfsync_state_peer *src, if ((st->dst.state > dst->state) || - (st->dst.state >= TCPS_SYN_SENT && + (st->dst.state == dst->state && SEQ_GT(st->dst.seqlo, ntohl(dst->seqlo)))) sync++; else