Index | Thread | Search

From:
Marcus Glocker <marcus@nazgul.ch>
Subject:
Re: smmu(4) on QC Laptops
To:
Patrick Wildt <patrick@blueri.se>
Cc:
Mark Kettenis <mark.kettenis@xs4all.nl>, tech@openbsd.org, tobhe@openbsd.org, mlarkin@openbsd.org
Date:
Sat, 2 Aug 2025 16:24:01 +0200

Download raw body.

Thread
On Fri, Aug 01, 2025 at 10:21:21PM +0200, Marcus Glocker wrote:

> On Fri, Aug 01, 2025 at 09:21:37PM +0200, Patrick Wildt wrote:
> 
> > On Sat, May 24, 2025 at 07:21:59AM +0200, Patrick Wildt wrote:
> > > On Fri, May 23, 2025 at 07:58:29PM +0200, Mark Kettenis wrote:
> > > > > Date: Fri, 23 May 2025 08:46:16 +0200
> > > > > From: Patrick Wildt <patrick@blueri.se>
> > > > > Cc: kettenis@openbsd.org, tobhe@openbsd.org, mlarkin@openbsd.org
> > > > > Content-Type: text/plain; charset=us-ascii
> > > > > Content-Disposition: inline
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > Nearly three years ago we disabled smmu(4) for QC laptops because they
> > > > > just rebooted instantly.  I've finally gone to the bottom of why this
> > > > > happens.
> > > > > 
> > > > > On servers we usually boot up with most peripherals not doing any kind
> > > > > of DMA.  A serial console is kind of the only thing we depend on, which
> > > > > usually doesn't use DMA for simplicity.  Everything else, like NVMe or
> > > > > USB will get turned on later.  For those use cases it's been perfectly
> > > > > fine to just turn the SMMU on with strict filtering, because only once
> > > > > the driver configures puts in a job DMA transactions will occur.
> > > > > 
> > > > > On the QC laptops the framebuffer goes through an SMMU.  That's one of
> > > > > the reasons we need to keep the streams alive and actively put them into
> > > > > a bypass context, which we do!  As soon as a driver attaches, we move
> > > > > the stream over into a strict translation context.
> > > > > 
> > > > > Unfortunately the changes I did were not completely sufficient.
> > > > > 
> > > > > (1) The IOMMU enforcement happens before the driver gets a chance to
> > > > >     stop current work.
> > > > > 
> > > > > simplebus(4) creates an IOMMU-bound DMA tag before it attempts to attach
> > > > > the children.  This leads do two interesting things:
> > > > > 
> > > > > (a) If a simplebus(4) has an IOMMU assigned, it is passed an IOMMU-bound
> > > > >     DMA tag.  All children will get subjected to it.
> > > > > 
> > > > > smmu1: establishing sid 0x423
> > > > > simplebus1 at simplebus0: "geniqup"
> > > > > "serial" at simplebus1 not configured
> > > > > smmu1: establishing sid 0x123
> > > > > simplebus2 at simplebus0: "geniqup"
> > > > > 
> > > > > (b) If there's no driver to be found, the IOMMU-bound DMA tag is created
> > > > >     anyway.
> > > > > 
> > > > > smmu1: establishing sid 0x1c00
> > > > > smmu1: took over 8/1c00/2 for sid 1c00
> > > > > "display-subsystem" at simplebus0 not configured
> > > > > 
> > > > > The good thing is that (a) is not a problem in this case as we do not
> > > > > have any active transactions such mappings, as far as I can see on the
> > > > > X1E.  But the framebuffer is behind the display-subsystem which is now
> > > > > subjected to strict IOMMU configuration, making us need to fix (b).
> > > > > 
> > > > > One idea I have is that we keep all the setup code where it is, at
> > > > > IOMMU-bound DMA tag creation time, but only move the stream to the
> > > > > strict context block once someone(TM) creates a DMA map using this
> > > > > tag.
> > > > > 
> > > > > Is that a reasonable thing to do?  I don't think anyone would be doing
> > > > > DMA without creating a DMA map first, right?
> > > > 
> > > > Correct.
> > > > 
> > > > I'm not sure how this is going to work if we want to write a driver
> > > > for the display controller.  Presumably we'd create IOMMU mappings for
> > > > the initial framebuffer.  But to do so, we'd need to create a DMA map
> > > > first, and when we do that...  game over?
> > > > 
> > > > So maybe we need to do the takeover when we do the first DMA map load,
> > > > after we've entered the mappings into the page table?  But that only
> > > > works if we have a single mapping for the framebuffer.
> > > 
> > > I was hoping that we'd have code that first turns off the transactions,
> > > making us "go blind" until we re-enable it with the new address.  But
> > > yes, that would require the disable sequence to happen before the map
> > > is created.  We could do it on load, which is kind of a hot-path.  I
> > > was hoping for now we put it into map create and tackle this once or
> > > if we get around to writing this driver.
> > > 
> > > > On the Apple systems, the framebuffer is also behind an IOMMU.  There
> > > > we have some code that creates some initial mappings based on
> > > > information from the device tree.
> > > > 
> > > > > (2) Re-use of SMRs doesn't take mask in account.
> > > > > 
> > > > > In (1b) you can see that the display subsystem is using sid 0x1c00 with
> > > > > mask 0x2, matching to the following SMR:
> > > > > 
> > > > > smmu1: SMR[8] = 0x1c00/0x2
> > > > > 
> > > > > We'd like to re-use that if possible, but because of the strict mask
> > > > > checking that wasn't possible.  So I'd like to take the mask into
> > > > > consideration when "taking over" streams.  This is only done on QC!
> > > > 
> > > > Sorry, you'll have to explain to me what what the SMRs do...
> > > 
> > > Think work slightly different depending on SMMU version/feature flags,
> > > but for this machine it works an IOMMU setup like this:
> > > 
> > > You configure a context block.  The context block is basically the IOMMU
> > > group that owns the pagetable.  To assign devices to a context block,
> > > you setup a "Stream Mapping Register", e.g.:
> > > 
> > >  * Allocate + setup context block 3 (e.g. IOMMU group 3)
> > >    * This is the actual pagetable and translation setup
> > >  * Allocate a stream map entry (can be any, e.g. 5)
> > >    * The array has a fixed length, we need to use one slot to
> > >      point device to context block.
> > >  * Write(SMR(5), DeviceId + Mask + Valid)
> > >    * For each DMA transaction, the SMMUs stream mapping function will
> > >      iterate the stream map registers to find an enty where:
> > >      StreamId & Mask == DeviceId & Mask
> > >    * In this example, after writing SMR(5), slot 5 should match
> > >  * Write(S2CR(5) = 3)
> > >    * This points stream map 5 to context block 3
> > >    * In this example, after the HW's lookup function returns 5, it will
> > >      look into S2CR(5) to find which context block it belongs to
> > > 
> > > On this laptop, the SMRs basically tell you what devices are currently
> > > enabled to pass through the IOMMU.  We update the SMRs to point all
> > > currenetly allowed devices to a "bypass domain", so that existing
> > > traffic goes into a 1:1 translation.
> > > 
> > > One we want attach a driver and want to make a device use stricter IOMMU
> > > mappings, we setup a context block and then switch the device from the
> > > bypass IOMMU group/context block to the strict IOMMU group/context
> > > block.
> > > 
> > > In reality we're not actually changing the SMR (that's just the filter
> > > rule), we're updating the S2CR to point the stream to a different group.
> > > 
> > > But if we don't "take over" the existing SMR mapping, the code will
> > > allocate a new SMR register, and then the IOMMU will complain about a
> > > match conflict because an incoming transaction matches two SMR entries.
> > > 
> > > Let me know if I worded this too confusingly!
> > > 
> > > > > Please give this a go on machines where `dmesg | grep ^smmu' gives
> > > > > output, and especially on QC machines with X1E, or SC8280XP/ThinkPad
> > > > > x13s.  Please send me some dmesgs, as I have added some debug code
> > > > > that I'd like to see run on other machines. :)
> > > > 
> > > > Here's the Ampere eMAG machine.  ACPI so some of your changes don't
> > > > matter, but it still works with your diff.
> > > > 
> > > 
> > > Thanks! :)
> > > 
> > > Cheers,
> > > Patrick
> > 
> > Updated diff that sets the SMR register not just to the stream id but
> > also the mask we kept saved.
> 
> Following the fdt dmesgs without, and with your patch, from my
> Samsung Galaxy Book4 Edge.

I'm not sure what has changed in the meantime, but when I retry your
patch with the latest kernel sources, I'm getting a panic now during
boot triggered by the ufshci(4) DMA allocation:

https://nazgul.ch/pub/smmu_panic.jpg

I haven't investigated further yet, just noticed it now.