
From: Marcus Glocker <marcus@nazgul.ch>
Subject: Re: qwz: enable WPA2 association on WCN7850
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: tech@openbsd.org, kirill@korins.ky, stsp@stsp.name, mail@patrick-wildt.de
Date: Sun, 26 Apr 2026 19:34:58 +0200

On Sun, Apr 26, 2026 at 01:45:10PM +0200, Mark Kettenis wrote:

> > Date: Sat, 25 Apr 2026 23:56:07 +0200
> > From: Marcus Glocker <marcus@nazgul.ch>
> > 
> > Bring the qwz driver up to a working WPA2 client connection on the
> > Qualcomm WCN7850 chip.  Tested on the Samsung Galaxy Book4 Edge.
> > 
> > Major changes:
> > 
> > 1. Fix the RX path.
> >    Wire up the WCN7850 descriptor accesses that were unset; override
> >    the descriptor size to match what the FW actually writes (512 bytes
> >    instead of struct sizeof 472); add the first-line filters that drop
> >    FW-injected garbage frames before net80211 mistakes them for fake
> >    auth/deauth.
> > 
> > 2. Fix the TX path.
> >    Port Linux's WiFi7 "TX bank" infrastructure: a per-VDEV register
> >    that holds encap/encrypt/search settings the descriptor used to
> >    carry inline.  Rewrite the TX descriptor builder for the WiFi7 wire
> >    format.  Fix an encrypt_type default that was making the FW try to
> >    WEP-encrypt plain-text EAPOL frames.
> > 
> > 3. Fix MSI interrupt routing.
> >    Correct the DP IRQ group's MSI vector calculation, and free the
> >    vector DP group 0 needs (was being held by an unused pktlog
> >    interrupt).  Without these, RX completions never fired regardless
> >    of how correct the rest of the path was.
> 
> That code still looks a bit dodgy to me.  The equivalent code in
> qwx(4) looks a bit dodgy too though.  I'll have a look over there
> first to see if I can make it a bit less dodgy.  No reason not to move
> forward with qwz(4).
> 
> > 4. Make the WPA2 4-way handshake complete.
> >    Move WMI_PEER_AUTHORIZE to fire after key install, not before; the
> >    old order told the FW crypto was up while plain-text EAPOL was still
> >    in flight, crashing the FW.  Mask the AID to its 14-bit value before
> >    handing it to the FW.  Add the missing REO queue setup for non-QoS
> >    frames, which is where EAPOL lives.
> > 
> > 5. Add non-coherent DMA cache sync on RX and TX.
> >    Without explicit flushes the CPU and FW see different bytes for
> >    the same buffer.  This was the root cause of "garbage RX frames":
> >    they were always real EAPOL Msg 1 frames torn by stale CPU cache
> >    lines.
> 
> DMA on the galaxybook should be coherent.  But bus_dmamap_sync() still
> issues a barrier instruction which might be needed to make sure reads
> by the CPU aren't issued in the wrong order.  So those extra
> bus_dmamap_sync() calls are needed.
> 
> > 
> > 6. Update register/descriptor defines from ath11k to ath12k WiFi7.
> >    The TX descriptor wire format changed completely between
> >    generations: bit positions, field set, even the number of 32-bit
> >    words.  Partial updates wouldn't have worked.
> > 
> > 7. Cleanup.
> >    Remove some debug printfs and the diagnostic counters added during
> >    the bring-up to verify the path was working.
> > 
> > Known limitations:
> > 
> >   - Firmware occasionally crashes after sustained traffic; driver
> >     recovers via the existing RDDM path in if_qwz_pci.c without a
> >     system reboot.  Root-causing this is the next follow-up.
> >   - One PN-replay loop in qwz_dp_peer_rx_pn_replay_config doesn't
> >     iterate the non-QoS TID slot.  Cosmetic for normal use; will
> >     land as a separate small commit.
> > 
> > Further testing, feedback, OKs, welcome.
> 
> Doesn't seem to get me much further on the vivobook.  Tried it first
> with a FritzBox.  After an "ifconfig qwz0 up" I see these messages:
> 
> qwz_pull_reg_chan_list_ext_update_ev: not implemented
> qwz0: failed to extract regulatory info from received event
> qwz_pull_reg_chan_list_ext_update_ev: not implemented
> qwz0: failed to extract regulatory info from received event
> 
> This isn't different from before.  I suppose this is something we can
> ignore for now.
> 
> After doing an "ifconfig qwz0 nwid xxx wpakey yyy" I get a few more of
> these and then:
> 
> qwz0: fatal firmware error
> qwz0: fatal firmware error
> qwz0: fatal firmware error
> 
> I think that is where it resets and tries again, where it becomes:
> 
> qwz0: fatal firmware error
> qwz0: fatal firmware error
> qwz0: failed to send wlan mode request, err = 1
> qwz0: qmi failed to send wlan mode off: 1
> 
> And after a few more resets it becomes:
> 
> qwz0: tx credits timeout
> qwz0: failed to send WMI_PDEV_SET_PARAM cmd
> qwz0: failed to enable MESH MCAST ENABLE for pdev 0: 35
> 
> After that the device appears to be dead.  An attempt to revive with
> "ifconfig qwz0 down; ifconfig qwz0 up" results in:
> 
> ifconfig: qwz0: SIOCSIFFLAGS: Resource temporarily unavailable
> 
> and more:
> 
> qwz0: tx credits timeout
> qwz0: failed to send WMI_PDEV_SET_PARAM cmd
> qwz0: failed to enable MESH MCAST ENABLE for pdev 0: 35
> 
> I did see it reach the "status: active" state at some point, but that
> didn't last for more than a few seconds.
> 
> I also tried with my athn(4) OpenBSD access point.  There it does
> reach the "status: active" state and stays in that state.  If I then
> do "ifconfig qwz0 autoconf", I get a few firmware errors:
> 
> qwz0: fatal firmware error
> qwz0: fatal firmware error
> qwz0: fatal firmware error
> 
> But it recovers and remains associated.  After a while I also see:
> 
> qwz0: fatal firmware error
> qwz0: fatal firmware error
> qwz0: fatal firmware error
> qwz0: peer delete unmap timeout
> qwz0: unable to delete BSS peer: 35
> qwz0: failed to send wlan mode request, err = 1
> qwz0: qmi failed to send wlan mode off: 1
> 
> but it stays associated.  My DHCP server sees the DHCP request and
> offers a lease, but the offered address never gets configured.
> 
> So a little bit of progress.  I don't see a reason not to commit this.

Thanks for the testing and feedback.  While doing more testing in the
meantime, I see similar behavior here.  I can only make a more or less
stable association with my iPhone hotspot.  With my home AP the
association only lasts for a few seconds, and then breaks again with
the firmware error.

I think the goal of the next commit should be to get at least a stable
association working.  Then, if still required, we can go ahead and make
the TX/RX path stable enough to get a DHCP lease.  Without a halfway
stable association, I think it makes no sense to commit the next diff.

I'll see if I can find out more.
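
For readers following item 4 of the quoted change list ("Mask the AID
to its 14-bit value"), here is a minimal sketch of what such masking
could look like.  It is not the code from the diff: the macro and
function names are hypothetical, and it assumes the AID arrives with
its two top bits set, as in the on-air association ID field, so only
the low 14 bits carry the value.

```c
#include <stdint.h>

/* Hypothetical name: the low 14 bits of the 16-bit field hold the AID. */
#define SKETCH_AID_MASK	0x3fff

/*
 * Hypothetical helper: strip the two top bits before the value is
 * handed on (for example, to firmware).
 */
static uint16_t
sketch_mask_aid(uint16_t raw_aid)
{
	return raw_aid & SKETCH_AID_MASK;
}
```

With this helper a raw field of 0xc001 yields AID 1, while a value
that already fits in 14 bits passes through unchanged.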