From: Mark Kettenis
Subject: Re: smmu(4): always use WBWA for pagetable access
To: Patrick Wildt
Cc: tech@openbsd.org, kettenis@openbsd.org
Date: Thu, 28 Aug 2025 12:10:38 +0200

> Date: Wed, 27 Aug 2025 14:02:15 +0200
> From: Patrick Wildt
>
> Hi,
>
> SMMUv2's TCR IRGNx/ORGNx/SHx attributes are related to memory associated
> with translation table walks. Since our translation tables are always
> WBWA, there's no need to check the coherent flag. The coherent flag is
> only relevant to figure out whether or not we have to flush the caches
> for changes to the translation table.

That doesn't make sense to me. What is the difference between "memory
associated with translation table walks" and the memory of the
translation tables themselves? I think we have the following
possibilities:

1. The SMMU integration is cache-coherent. We allocate the
   translation tables as Normal (Cacheable) and set the WBWA flags.

2a. The SMMU integration is not cache-coherent. We still allocate the
    translation tables as Normal (Cacheable), but we'll have to flush
    the caches when we modify them. Setting the WBWA flag should
    have no effect (but we probably shouldn't set it).

2b. The SMMU integration is not cache-coherent. We allocate the
    translation tables as Normal-NC (Non-Cacheable) such that we don't
    have to flush them whenever we make changes. Setting the WBWA
    flag should have no effect, but we definitely shouldn't set it.

3. The SMMU integration is cache-coherent, but we somehow think it
   isn't. If we allocate our translation tables as Normal-NC, we
   definitely are in trouble if we set the WBWA flag. If we allocate
   them as Normal and do the flushing, we could also be in trouble if
   we set the WBWA flag and the SMMU writes to the translation
   tables: those writes may land in the cache, and we might lose them
   when we invalidate the cache lines.

> This should be a no-op as I believe that most or all of the machines we
> support with an SMMUv2 have a DMA coherent tag.

That is probably true.
I'd consider hardware where the SMMU doesn't participate in the cache
coherency protocol broken. But I wouldn't be surprised if such
hardware exists.

> Please give this a run on machines where "dmesg | grep ^smmu" shows some
> output.
>
> Cheers,
> Patrick
>
> diff --git a/sys/arch/arm64/dev/smmu.c b/sys/arch/arm64/dev/smmu.c
> index 2f81a568069..f57796fdb97 100644
> --- a/sys/arch/arm64/dev/smmu.c
> +++ b/sys/arch/arm64/dev/smmu.c
> @@ -743,7 +743,9 @@ smmu_v2_domain_create(struct smmu_domain *dom)
>  	if (iovabits >= 40)
>  		dom->sd_4level = 1;
>  
> -	reg = SMMU_CB_TCR_TG0_4KB | SMMU_CB_TCR_T0SZ(64 - iovabits);
> +	reg = SMMU_CB_TCR_TG0_4KB | SMMU_CB_TCR_T0SZ(64 - iovabits) |
> +	    SMMU_CB_TCR_IRGN0_WBWA | SMMU_CB_TCR_ORGN0_WBWA |
> +	    SMMU_CB_TCR_SH0_ISH;
>  	if (dom->sd_stage == 1) {
>  		reg |= SMMU_CB_TCR_EPD1;
>  	} else {
> @@ -772,12 +774,6 @@ smmu_v2_domain_create(struct smmu_domain *dom)
>  			break;
>  		}
>  	}
> -	if (sc->sc_coherent)
> -		reg |= SMMU_CB_TCR_IRGN0_WBWA | SMMU_CB_TCR_ORGN0_WBWA |
> -		    SMMU_CB_TCR_SH0_ISH;
> -	else
> -		reg |= SMMU_CB_TCR_IRGN0_NC | SMMU_CB_TCR_ORGN0_NC |
> -		    SMMU_CB_TCR_SH0_OSH;
>  	smmu_cb_write_4(sc, dom->sd_cb_idx, SMMU_CB_TCR, reg);
>  
>  	if (dom->sd_4level) {
>
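
For reference, the cases enumerated in the reply (1, 2a, 2b, 3) can be
written down as a small decision table. This is only a sketch: the enum
and function names below are hypothetical and do not exist in smmu.c,
and "believed_coherent" stands in for whatever the driver concludes
from the DMA coherent tag. It encodes the conservative reading of the
list, where walks are marked WBWA only when the SMMU is treated as
coherent.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of these exist in smmu.c. */
enum pt_mem { PT_NORMAL_WB, PT_NORMAL_NC };	/* how the CPU maps the tables */
enum walk_attr { WALK_WBWA_ISH, WALK_NC_OSH };	/* IRGN0/ORGN0/SH0 choice */

/*
 * The CPU has to flush after editing the tables only when the SMMU is
 * treated as non-coherent and the tables are cacheable (case 2a).
 */
static bool
pt_need_flush(bool believed_coherent, enum pt_mem mem)
{
	return !believed_coherent && mem == PT_NORMAL_WB;
}

/*
 * Conservative choice following the list: mark walks WBWA only when
 * the SMMU is treated as coherent (case 1), otherwise NC (2a, 2b).
 */
static enum walk_attr
pt_walk_attr(bool believed_coherent)
{
	return believed_coherent ? WALK_WBWA_ISH : WALK_NC_OSH;
}

/*
 * Combinations the list advises against: walks marked WBWA while the
 * tables are Normal-NC, or while software is flushing them.  Harmless
 * in 2a/2b if the SMMU really is non-coherent, but a problem in case 3.
 */
static bool
pt_wbwa_discouraged(bool believed_coherent, enum pt_mem mem,
    enum walk_attr attr)
{
	if (attr != WALK_WBWA_ISH)
		return false;
	return mem == PT_NORMAL_NC || pt_need_flush(believed_coherent, mem);
}
```

With the proposed diff, the walk attribute would always be
WALK_WBWA_ISH, so pt_wbwa_discouraged() would return true for every
configuration where the SMMU is not treated as coherent.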