From: Patrick Wildt Subject: Re: smmu(4) on QC Laptops To: Mark Kettenis Cc: tech@openbsd.org, tobhe@openbsd.org, mlarkin@openbsd.org Date: Fri, 1 Aug 2025 21:21:37 +0200 On Sat, May 24, 2025 at 07:21:59AM +0200, Patrick Wildt wrote: > On Fri, May 23, 2025 at 07:58:29PM +0200, Mark Kettenis wrote: > > > Date: Fri, 23 May 2025 08:46:16 +0200 > > > From: Patrick Wildt > > > Cc: kettenis@openbsd.org, tobhe@openbsd.org, mlarkin@openbsd.org > > > Content-Type: text/plain; charset=us-ascii > > > Content-Disposition: inline > > > > > > Hi, > > > > > > Nearly three years ago we disabled smmu(4) for QC laptops because they > > > just rebooted instantly. I've finally gone to the bottom of why this > > > happens. > > > > > > On servers we usually boot up with most peripherals not doing any kind > > > of DMA. A serial console is kind of the only thing we depend on, which > > > usually doesn't use DMA for simplicity. Everything else, like NVMe or > > > USB will get turned on later. For those use cases it's been perfectly > > > fine to just turn the SMMU on with strict filtering, because only once > > > the driver configures puts in a job DMA transactions will occur. > > > > > > On the QC laptops the framebuffer goes through an SMMU. That's one of > > > the reasons we need to keep the streams alive and actively put them into > > > a bypass context, which we do! As soon as a driver attaches, we move > > > the stream over into a strict translation context. > > > > > > Unfortunately the changes I did were not completely sufficient. > > > > > > (1) The IOMMU enforcement happens before the driver gets a chance to > > > stop current work. > > > > > > simplebus(4) creates an IOMMU-bound DMA tag before it attempts to attach > > > the children. This leads do two interesting things: > > > > > > (a) If a simplebus(4) has an IOMMU assigned, it is passed an IOMMU-bound > > > DMA tag. All children will get subjected to it. > > > > > > smmu1: establishing sid 0x423 > > > simplebus1 at simplebus0: "geniqup" > > > "serial" at simplebus1 not configured > > > smmu1: establishing sid 0x123 > > > simplebus2 at simplebus0: "geniqup" > > > > > > (b) If there's no driver to be found, the IOMMU-bound DMA tag is created > > > anyway. > > > > > > smmu1: establishing sid 0x1c00 > > > smmu1: took over 8/1c00/2 for sid 1c00 > > > "display-subsystem" at simplebus0 not configured > > > > > > The good thing is that (a) is not a problem in this case as we do not > > > have any active transactions such mappings, as far as I can see on the > > > X1E. But the framebuffer is behind the display-subsystem which is now > > > subjected to strict IOMMU configuration, making us need to fix (b). > > > > > > One idea I have is that we keep all the setup code where it is, at > > > IOMMU-bound DMA tag creation time, but only move the stream to the > > > strict context block once someone(TM) creates a DMA map using this > > > tag. > > > > > > Is that a reasonable thing to do? I don't think anyone would be doing > > > DMA without creating a DMA map first, right? > > > > Correct. > > > > I'm not sure how this is going to work if we want to write a driver > > for the display controller. Presumably we'd create IOMMU mappings for > > the initial framebuffer. But to do so, we'd need to create a DMA map > > first, and when we do that... game over? > > > > So maybe we need to do the takeover when we do the first DMA map load, > > after we've entered the mappings into the page table? But that only > > works if we have a single mapping for the framebuffer. > > I was hoping that we'd have code that first turns off the transactions, > making us "go blind" until we re-enable it with the new address. But > yes, that would require the disable sequence to happen before the map > is created. We could do it on load, which is kind of a hot-path. I > was hoping for now we put it into map create and tackle this once or > if we get around to writing this driver. > > > On the Apple systems, the framebuffer is also behind an IOMMU. There > > we have some code that creates some initial mappings based on > > information from the device tree. > > > > > (2) Re-use of SMRs doesn't take mask in account. > > > > > > In (1b) you can see that the display subsystem is using sid 0x1c00 with > > > mask 0x2, matching to the following SMR: > > > > > > smmu1: SMR[8] = 0x1c00/0x2 > > > > > > We'd like to re-use that if possible, but because of the strict mask > > > checking that wasn't possible. So I'd like to take the mask into > > > consideration when "taking over" streams. This is only done on QC! > > > > Sorry, you'll have to explain to me what what the SMRs do... > > Think work slightly different depending on SMMU version/feature flags, > but for this machine it works an IOMMU setup like this: > > You configure a context block. The context block is basically the IOMMU > group that owns the pagetable. To assign devices to a context block, > you setup a "Stream Mapping Register", e.g.: > > * Allocate + setup context block 3 (e.g. IOMMU group 3) > * This is the actual pagetable and translation setup > * Allocate a stream map entry (can be any, e.g. 5) > * The array has a fixed length, we need to use one slot to > point device to context block. > * Write(SMR(5), DeviceId + Mask + Valid) > * For each DMA transaction, the SMMUs stream mapping function will > iterate the stream map registers to find an enty where: > StreamId & Mask == DeviceId & Mask > * In this example, after writing SMR(5), slot 5 should match > * Write(S2CR(5) = 3) > * This points stream map 5 to context block 3 > * In this example, after the HW's lookup function returns 5, it will > look into S2CR(5) to find which context block it belongs to > > On this laptop, the SMRs basically tell you what devices are currently > enabled to pass through the IOMMU. We update the SMRs to point all > currenetly allowed devices to a "bypass domain", so that existing > traffic goes into a 1:1 translation. > > One we want attach a driver and want to make a device use stricter IOMMU > mappings, we setup a context block and then switch the device from the > bypass IOMMU group/context block to the strict IOMMU group/context > block. > > In reality we're not actually changing the SMR (that's just the filter > rule), we're updating the S2CR to point the stream to a different group. > > But if we don't "take over" the existing SMR mapping, the code will > allocate a new SMR register, and then the IOMMU will complain about a > match conflict because an incoming transaction matches two SMR entries. > > Let me know if I worded this too confusingly! > > > > Please give this a go on machines where `dmesg | grep ^smmu' gives > > > output, and especially on QC machines with X1E, or SC8280XP/ThinkPad > > > x13s. Please send me some dmesgs, as I have added some debug code > > > that I'd like to see run on other machines. :) > > > > Here's the Ampere eMAG machine. ACPI so some of your changes don't > > matter, but it still works with your diff. > > > > Thanks! :) > > Cheers, > Patrick Updated diff that sets the SMR register not just to the stream id but also the mask we kept saved. diff --git a/sys/arch/arm64/conf/GENERIC b/sys/arch/arm64/conf/GENERIC index adee39bb278..ac593d042ab 100644 --- a/sys/arch/arm64/conf/GENERIC +++ b/sys/arch/arm64/conf/GENERIC @@ -93,7 +93,7 @@ pci* at pciecam? sdhc* at fdt? sdmmc* at sdhc? bwfm* at sdmmc? # Broadcom FullMAC -smmu* at fdt? +smmu* at fdt? early 1 xhci* at fdt? ccp* at fdt? # AMD Cryptographic Co-processor ipmi* at fdt? diff --git a/sys/arch/arm64/dev/smmu.c b/sys/arch/arm64/dev/smmu.c index edf5dd26de8..76ef21ae365 100644 --- a/sys/arch/arm64/dev/smmu.c +++ b/sys/arch/arm64/dev/smmu.c @@ -614,11 +614,11 @@ smmu_domain_create(struct smmu_softc *sc, uint32_t sid) /* Take over QCOM SMRs */ if (sc->sc_is_qcom && sc->sc_smr[i] != NULL && sc->sc_smr[i]->ss_dom == NULL && - sc->sc_smr[i]->ss_id == sid && - sc->sc_smr[i]->ss_mask == 0) { - free(sc->sc_smr[i], M_DEVBUF, - sizeof(struct smmu_smr)); - sc->sc_smr[i] = NULL; + !((sc->sc_smr[i]->ss_id ^ sid) & + ~sc->sc_smr[i]->ss_mask)) { + sc->sc_smr[i]->ss_dom = dom; + dom->sd_smr_idx = i; + break; } if (sc->sc_smr[i] != NULL) continue; @@ -771,6 +771,28 @@ smmu_domain_create(struct smmu_softc *sc, uint32_t sid) reg |= SMMU_CB_SCTLR_ASIDPNE; smmu_cb_write_4(sc, dom->sd_cb_idx, SMMU_CB_SCTLR, reg); + snprintf(dom->sd_exname, sizeof(dom->sd_exname), "%s:%x", + sc->sc_dev.dv_xname, sid); + dom->sd_iovamap = extent_create(dom->sd_exname, 0, + (1LL << iovabits) - 1, M_DEVBUF, NULL, 0, EX_WAITOK | + EX_NOCOALESCE); + + /* Reserve first page (to catch NULL access) */ + extent_alloc_region(dom->sd_iovamap, 0, PAGE_SIZE, EX_WAITOK); + + SIMPLEQ_INSERT_TAIL(&sc->sc_domains, dom, sd_list); + return dom; +} + +static void +smmu_domain_enable(struct smmu_domain *dom) +{ + struct smmu_softc *sc = dom->sd_sc; + uint32_t reg; + + if (dom->sd_enabled) + return; + /* Point stream to context block */ reg = SMMU_S2CR_TYPE_TRANS | dom->sd_cb_idx; if (sc->sc_has_exids && sc->sc_smr) @@ -779,23 +801,14 @@ smmu_domain_create(struct smmu_softc *sc, uint32_t sid) /* Map stream idx to S2CR idx */ if (sc->sc_smr) { - reg = sid; + reg = sc->sc_smr[dom->sd_smr_idx]->ss_id << SMMU_SMR_ID_SHIFT | + sc->sc_smr[dom->sd_smr_idx]->ss_mask << SMMU_SMR_MASK_SHIFT; if (!sc->sc_has_exids) reg |= SMMU_SMR_VALID; smmu_gr0_write_4(sc, SMMU_SMR(dom->sd_smr_idx), reg); } - snprintf(dom->sd_exname, sizeof(dom->sd_exname), "%s:%x", - sc->sc_dev.dv_xname, sid); - dom->sd_iovamap = extent_create(dom->sd_exname, 0, - (1LL << iovabits) - 1, M_DEVBUF, NULL, 0, EX_WAITOK | - EX_NOCOALESCE); - - /* Reserve first page (to catch NULL access) */ - extent_alloc_region(dom->sd_iovamap, 0, PAGE_SIZE, EX_WAITOK); - - SIMPLEQ_INSERT_TAIL(&sc->sc_domains, dom, sd_list); - return dom; + dom->sd_enabled = 1; } void @@ -1299,6 +1312,8 @@ smmu_dmamap_create(bus_dma_tag_t t, bus_size_t size, int nsegments, u_long dva, len; int error; + smmu_domain_enable(dom); + error = sc->sc_dmat->_dmamap_create(sc->sc_dmat, size, nsegments, maxsegsz, boundary, flags, &map); if (error) diff --git a/sys/arch/arm64/dev/smmu_fdt.c b/sys/arch/arm64/dev/smmu_fdt.c index 2c645da3f37..ac9ad9a9089 100644 --- a/sys/arch/arm64/dev/smmu_fdt.c +++ b/sys/arch/arm64/dev/smmu_fdt.c @@ -89,11 +89,6 @@ smmu_fdt_attach(struct device *parent, struct device *self, void *aux) if (OF_getproplen(faa->fa_node, "dma-coherent") == 0) sc->sc_coherent = 1; - if (sc->sc_is_qcom) { - printf(": disabled\n"); - return; - } - if (smmu_attach(sc) != 0) return; diff --git a/sys/arch/arm64/dev/smmuvar.h b/sys/arch/arm64/dev/smmuvar.h index c01e10518ee..03373167089 100644 --- a/sys/arch/arm64/dev/smmuvar.h +++ b/sys/arch/arm64/dev/smmuvar.h @@ -18,6 +18,7 @@ struct smmu_softc; struct smmu_domain { struct smmu_softc *sd_sc; + uint32_t sd_enabled; uint32_t sd_sid; bus_dma_tag_t sd_dmat; int sd_cb_idx;