From: Ryan Vogt Subject: Re: add source link-layer address option to rad(8) To: tech@openbsd.org Date: Wed, 15 May 2024 19:33:28 -0700 On Wed, May 15, 2024 at 09:14:14PM +0200, Florian Obser wrote: > On 2024-05-11 12:34 -07, Ryan Vogt wrote: Thank you again for providing another round of such comprehensive feedback. I've placed a new version of the patch at the bottom, with the race condition fixed using your code, and I've inlined some more discussion. > > That "I" flag is what I'm looking at. With the default route being in > > interface scope, you can do things like: > > > > $ ping6 fdaa:...:1%en0 > > > > but not: > > > > $ ping6 fdaa:...:1 > > > > Interesting, I have no idea what they are doing here. The netstat man > page is not explaining what I is, it and it's not a thing FreeBSD has > either. At least it's not documented in their netstat man page. This one's a little buried in the macOS man pages. From macOS 14.5's netstat(1) man page, regarding the "I" flag: I RTF_IFSCOPE Route is associated with an interface scope And from macOS' route(8) man page: Such property allows for the presence of multiple route entries with the same destination, where each route is associated with a unique interface. On my functional, working-as-intended macOS client, I have 7 entries in my Internet6 routing table with a destination of "default" (netstat -rn), but each is associated with a different interface. My understanding of how macOS chooses which interface to communicate over (wireless, ethernet, tunnel, etc.) is that it takes one of those route entries out of interface scope, making it the global "default" route. So: $ ping6 ipv6.google.com will use my non-interface-scope "default" route, but: $ ping6 ipv6.google.com%en0 lets me specify an interface (whether its corresponding "default" route is in interface scope or not). If there's no source link-layer address option in rad(8)'s RA messages, all 7 "default" Internet6 routing table entries on my macOS client stay in interface scope (%en0, and %utun0 through %utun5). With the source link-layer address option included in rad(8)'s RA messages, though, my %en0 "default" routing table entry (corresponding to my wireless interface) comes out of interface scope and acts as the global "default" route. > This kinda maybe looks like a bug in macOS / iOS where they are not > doing neighbor discovery? I like the "kinda maybe" qualifier, because that's my conclusion as well -- with the same qualifier attached. Even in the absence of a source link-layer address option, I'm still seeing some evidence of neighbour discovery between the macOS client (neighbour solicitation messages) and the OpenBSD gateway (solicited neighbour advertisement messages). So, I don't know if maybe the macOS behaviour (in the absence of a source link-layer address option) is an intended behaviour, given how macOS prioritizes its multiple default routes that co-exist on multiple interfaces? That's meandering conjecture on my part, though. I just don't know enough about the intricacies of how interface scope is supposed to work to be confident in declaring that this behaviour is a bug in macOS/iOS. ...just, kinda maybe probably a bug? > But regardless, rad(8) really should send a > source link-layer address, otherwise every client has to do one more > round trip. Very nice find! Thank you! Here's a question I've been pondering, then. Should the source link- layer address option be a setting in rad.conf(5) at all? It feels weird to be playing devil's advocate regarding my own proposed patch, but it might be worth asking before going forward. You mentioned it was an oversight in rad(8) that the source link-layer address option wasn't implemented initially. You also mentioned that you don't think OpenBSD can support the inbound load sharing described in RFC 4861 section 4.2 -- and, that inbound load sharing is the only reason I can see in the RFC for omitting the source link-layer address option in router advertisement messages. You also pointed out how, when the source link-layer address option is included in RA messages, it saves all clients from having to do an additional round trip. So, is there even a compelling case to be made for being able to exclude the source link-layer address option with a setting in rad.conf(5)? Or, should the source link-layer address option just always be included by rad(8) in its RA messages, and we forego the option to turn it off in rad.conf(5)? > > A few quick notes: > > > > 1. In build_packet(), I've placed the struct pointers > > nd_opt_source_link_addr, nd_opt_mtu, and nd_opt_prefix_info in the > > order they're presented in RFC 4861 section 4.2. > > sure, style(9) has size: > When declaring variables in functions, declare them sorted by size > (largest to smallest), > > I usually can't be arsed to work out which struct is larger, unless I > know it off the top of my head. I need to read more carefully. I could change the location of the nd_opt_source_link_addr declaration in rad/frontend.c's build_packet() to follow style(9), but it's not quite clear where the declaration would go. For example, the nd_opt_mtu and the nd_opt_prefix_info are already in backwards order, relative to the guideline in style(9), so I'd have to start re-ordering other variables to be able to slot the nd_opt_source_link_addr into a size- sorted location. As it is, the existing variables (pre-patch) have a sane, how-are- they-used order to them. If it's okay, I'm inclined to follow that same logical style that's already being used in rad/frontend.c's build_packet(), and leave that part of the proposed patch unchanged from the previous version? > > 2. I'm uncertain why we're doing a strcmp() against the name of the > > interface while iterating over ifap, instead of comparing against > > the interface index. But, since that's how it's done in > > slaacd/frontend.c, where both the name and the index are also > > available, I'm assuming there's a reason we use the name instead of > > the index that I just don't know about. So, I switched to using a > > strcmp() against the interface name instead of comparing against > > the index. > > I was under the impression that an interface index could be > recycled when you unplug a usb device and plug in a different one. But > that does not seem to be true. if_index seems to be monotonically > increasing. I think it's best to leave this for now as is and maybe > change all "may" daemons to use if_index. This probably shows up in > dhcpleased, slaacd and unwind as well. Thanks for the explanation. I've left that the same as it was in the previous version of the proposed patch. > > + if (ra_iface_conf->ra_options.source_link_addr) { > > + ndopt_source_link_addr = (struct nd_opt_source_link_addr *)p; > > + ndopt_source_link_addr->nd_opt_source_link_addr_type = > > + ND_OPT_SOURCE_LINKADDR; > > + ndopt_source_link_addr->nd_opt_source_link_addr_len = 1; > > + if (getifaddrs(&ifap) != 0) { > > + ifap = NULL; > > + log_warn("getifaddrs"); > > + } > > + for (ifa = ifap; ifa; ifa = ifa->ifa_next) { > > + [...snip...] > > + } > > + if (ifap != NULL) > > + freeifaddrs(ifap); > > + p += sizeof(*ndopt_source_link_addr); > > if you fail to find a mac address, for example because you are racing > against an interface being unplugged, you will (try to) send a source > link addr option with a mac address of all zero. > > I think the easiest way to avoid that is doing something like this: > > if (ifap != NULL) { > freeifaddrs(ifap); > p += sizeof(*ndopt_source_link_addr); > } else > len -= sizeof(*ndopt_source_link_addr); Nice catch! I've updated the proposed patch below with this change (that's the only change from the previous version). diff --git a/frontend.c b/frontend.c index 6748e43732d..5370cb3359b 100644 --- a/frontend.c +++ b/frontend.c @@ -56,6 +56,7 @@ #include #include +#include #include #include @@ -120,6 +121,12 @@ struct nd_opt_pref64 { u_int8_t nd_opt_pref64[12]; }; +struct nd_opt_source_link_addr { + u_int8_t nd_opt_source_link_addr_type; + u_int8_t nd_opt_source_link_addr_len; + struct ether_addr nd_opt_source_link_addr_hw_addr; +}; + TAILQ_HEAD(, ra_iface) ra_interfaces; __dead void frontend_shutdown(void); @@ -1099,6 +1106,7 @@ void build_packet(struct ra_iface *ra_iface) { struct nd_router_advert *ra; + struct nd_opt_source_link_addr *ndopt_source_link_addr; struct nd_opt_mtu *ndopt_mtu; struct nd_opt_prefix_info *ndopt_pi; struct ra_iface_conf *ra_iface_conf; @@ -1110,6 +1118,8 @@ build_packet(struct ra_iface *ra_iface) struct ra_rdnss_conf *ra_rdnss; struct ra_dnssl_conf *ra_dnssl; struct ra_pref64_conf *pref64; + struct ifaddrs *ifap, *ifa; + struct sockaddr_dl *sdl; size_t len, label_len; uint8_t *p, buf[RA_MAX_SIZE]; char *label_start, *label_end; @@ -1119,6 +1129,8 @@ build_packet(struct ra_iface *ra_iface) ra_options_conf = &ra_iface_conf->ra_options; len = sizeof(*ra); + if (ra_iface_conf->ra_options.source_link_addr) + len += sizeof(*ndopt_source_link_addr); if (ra_options_conf->mtu > 0) len += sizeof(*ndopt_mtu); len += sizeof(*ndopt_pi) * ra_iface->prefix_count; @@ -1170,6 +1182,37 @@ build_packet(struct ra_iface *ra_iface) ra->nd_ra_retransmit = htonl(ra_options_conf->retrans_timer); p += sizeof(*ra); + if (ra_iface_conf->ra_options.source_link_addr) { + ndopt_source_link_addr = (struct nd_opt_source_link_addr *)p; + ndopt_source_link_addr->nd_opt_source_link_addr_type = + ND_OPT_SOURCE_LINKADDR; + ndopt_source_link_addr->nd_opt_source_link_addr_len = 1; + if (getifaddrs(&ifap) != 0) { + ifap = NULL; + log_warn("getifaddrs"); + } + for (ifa = ifap; ifa; ifa = ifa->ifa_next) { + if (ifa->ifa_addr == NULL || + ifa->ifa_addr->sa_family != AF_LINK) + continue; + if (strcmp(ra_iface->name, ifa->ifa_name) != 0) + continue; + sdl = (struct sockaddr_dl *)ifa->ifa_addr; + if (sdl->sdl_type != IFT_ETHER || + sdl->sdl_alen != ETHER_ADDR_LEN) + continue; + memcpy(&ndopt_source_link_addr-> + nd_opt_source_link_addr_hw_addr, + LLADDR(sdl), ETHER_ADDR_LEN); + break; + } + if (ifap != NULL) { + freeifaddrs(ifap); + p += sizeof(*ndopt_source_link_addr); + } else + len -= sizeof(*ndopt_source_link_addr); + } + if (ra_options_conf->mtu > 0) { ndopt_mtu = (struct nd_opt_mtu *)p; ndopt_mtu->nd_opt_mtu_type = ND_OPT_MTU; diff --git a/parse.y b/parse.y index 66cc0c0ab64..b9d369bde17 100644 --- a/parse.y +++ b/parse.y @@ -122,6 +122,7 @@ typedef struct { %token CONFIGURATION OTHER LIFETIME REACHABLE TIME RETRANS TIMER %token AUTO PREFIX VALID PREFERENCE PREFERRED LIFETIME ONLINK AUTONOMOUS %token ADDRESS_CONFIGURATION DNS NAMESERVER SEARCH MTU NAT64 HIGH MEDIUM LOW +%token SOURCE LINK_LAYER %token STRING %token NUMBER @@ -226,6 +227,9 @@ ra_opt_block : DEFAULT ROUTER yesno { | RETRANS TIMER NUMBER { ra_options->retrans_timer = $3; } + | SOURCE LINK_LAYER ADDRESS yesno { + ra_options->source_link_addr = $4; + } | MTU NUMBER { ra_options->mtu = $2; } @@ -523,6 +527,7 @@ lookup(char *s) {"interface", RA_IFACE}, {"lifetime", LIFETIME}, {"limit", LIMIT}, + {"link-layer", LINK_LAYER}, {"low", LOW}, {"managed", MANAGED}, {"medium", MEDIUM}, @@ -539,6 +544,7 @@ lookup(char *s) {"retrans", RETRANS}, {"router", ROUTER}, {"search", SEARCH}, + {"source", SOURCE}, {"time", TIME}, {"timer", TIMER}, {"valid", VALID}, diff --git a/printconf.c b/printconf.c index 184a5df2dc1..2edfb06edd4 100644 --- a/printconf.c +++ b/printconf.c @@ -78,6 +78,8 @@ print_ra_options(const char *indent, const struct ra_options_conf *ra_options) printf("%srouter lifetime %d\n", indent, ra_options->router_lifetime); printf("%sreachable time %u\n", indent, ra_options->reachable_time); printf("%sretrans timer %u\n", indent, ra_options->retrans_timer); + printf("%ssource link-layer address %s\n", indent, + yesno(ra_options->source_link_addr)); if (ra_options->mtu > 0) printf("%smtu %u\n", indent, ra_options->mtu); diff --git a/rad.c b/rad.c index b283e58fbf1..9093932c563 100644 --- a/rad.c +++ b/rad.c @@ -757,6 +757,7 @@ config_new_empty(void) xconf->ra_options.router_lifetime = ADV_DEFAULT_LIFETIME; xconf->ra_options.reachable_time = 0; xconf->ra_options.retrans_timer = 0; + xconf->ra_options.source_link_addr = 1; xconf->ra_options.mtu = 0; xconf->ra_options.rdns_lifetime = DEFAULT_RDNS_LIFETIME; SIMPLEQ_INIT(&xconf->ra_options.ra_rdnss_list); diff --git a/rad.conf.5 b/rad.conf.5 index f82ccfd216e..a2589cda88c 100644 --- a/rad.conf.5 +++ b/rad.conf.5 @@ -127,6 +127,10 @@ The default is medium. .\" XXX .\" .It Ic retrans timer Ar number .\" XXX +.It Ic source link-layer address Pq Ic yes Ns | Ns Ic no +Add a source link-layer address option to router advertisement messages, to +communicate the link-layer address of the sending interface. +The default is yes. .El .Sh INTERFACES A list of interfaces or interface groups to send advertisements on: diff --git a/rad.h b/rad.h index 787e78c7c4a..61187dba273 100644 --- a/rad.h +++ b/rad.h @@ -97,6 +97,7 @@ struct ra_options_conf { int router_lifetime; /* default router lifetime */ uint32_t reachable_time; uint32_t retrans_timer; + int source_link_addr; /* source link-layer address */ uint32_t mtu; uint32_t rdns_lifetime; SIMPLEQ_HEAD(, ra_rdnss_conf) ra_rdnss_list;