Index | Thread | Search

From:
Alexandr Nedvedicky <sashan@fastmail.net>
Subject:
pfsync does not update dst peer's state
To:
tech@openbsd.org
Date:
Sat, 18 May 2024 20:10:21 +0200

Download raw body.

Thread
Hello,

there was a report on bugs [1] back in Jan. There were few
more reports off-list. Matthieu@ provided detailed information
from his pfsync boxes. He is using two peering firewalls.
He noticed there was excessive amount of pfsync traffic
exchanged between peers. Poking further we discovered
there were lot's of TCP states with peer end points in:
        ESTABLISHED : SYN_SENT
        FIN_WAIT_2  : SYN_SENT
        *           : SYN_SENT
on the secondary firewall peer. All those 'malformed' states were received
from primary firewall peer.

Looking at how pfsync is processing state update message from its peer
we soon find a function pfsync_upd_tcp() which reads as follows:

2872 static int
2873 pfsync_upd_tcp(struct pf_state *st, const struct pfsync_state_peer *src,
2874     const struct pfsync_state_peer *dst)
2875 {
2876         int sync = 0;
2877
2878         /*
2879          * The state should never go backwards except
2880          * for syn-proxy states.  Neither should the
2881          * sequence window slide backwards.
2882          */
2883         if ((st->src.state > src->state &&
2884             (st->src.state < PF_TCPS_PROXY_SRC ||
2885              src->state >= PF_TCPS_PROXY_SRC)) ||
2886
2887             (st->src.state == src->state &&
2888              SEQ_GT(st->src.seqlo, ntohl(src->seqlo))))
2889                 sync++;
2890         else
2891                 pf_state_peer_ntoh(src, &st->src);
2892
2893         if ((st->dst.state > dst->state) ||
2894
2895             (st->dst.state >= TCPS_SYN_SENT &&
2896              SEQ_GT(st->dst.seqlo, ntohl(dst->seqlo))))
2897                 sync++;
2898         else
2899                 pf_state_peer_ntoh(dst, &st->dst);
2900
2901         return (sync);
2902 }

lines 2883 - 2891 handle source TCP peer in state.  lines 2893 - 2899
update TCP state for destination peer which is the one stuck in SYN_SENT.
The condition at line 2895 looks odd. It should be the same check
we do for source connection at line 2887.

To see what's going on there we need to better understand a context here.
the st->src is TCP stage for client we keep on secondary firewall, 'src'
comes from update we got from primary firewall peer.

st->dst is TCP state for server side connection we keep on secondary firewall,
'dst' comes from firewall peer.

The condition here should just prevent peering firewall (secondary)
firewall to move state backwards (dst.state > dst->state),
We also don't want to update state with old sequence number
(SEQ_GT(dst.seqlo, dst->seqlo).  In other cases we want to call
pf_state_peer_ntoh() to update state.

The check 'dst.state >= TCPS_SYN_SENT' is wrong here as it prevents
pfsync to call pf_state_peer_ntoh() which updates the state.
The check should be in fact same as we have for src peer:
    st->dst.state == dst->state
we do check sequence numbers if there is a tie on state.

the onliner tweak is bellow. I still don't know the exact
mechanism how pfsync-update storm gets triggered. fixing
pfsync_upd_tcp() seems to fix the cause of update storm.
                         

OK to commit ?

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-bugs&m=170661373220327&w=2

--------8<---------------8<---------------8<------------------8<--------
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index 4ad51489509..0a3fcbfb9f6 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -2892,7 +2892,7 @@ pfsync_upd_tcp(struct pf_state *st, const struct pfsync_state_peer *src,
 
 	if ((st->dst.state > dst->state) ||
 
-	    (st->dst.state >= TCPS_SYN_SENT &&
+	    (st->dst.state == dst->state &&
 	     SEQ_GT(st->dst.seqlo, ntohl(dst->seqlo))))
 		sync++;
 	else