revert m_defrag header align
On Fri, Feb 23, 2024 at 04:54:59PM +0100, Alexander Bluhm wrote:
> On Fri, Feb 23, 2024 at 12:32:13PM +0100, Claudio Jeker wrote:
> > On Fri, Feb 23, 2024 at 11:55:43AM +0100, Alexander Bluhm wrote:
> > > Hi,
> > >
> > > I think we should revert the diff that aligns the mbuf to IP header
> > > in m_defrag(). The cause of the change was that ether_extract_headers()
> > > crashed in em(4) when it was reading the TCP header for offloading on
> > > sparc64. This has been fixed by using memcpy().
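
A sketch for illustration, not the actual ether_extract_headers() code:
on a strict-alignment arch like sparc64 the fix is to memcpy() the
possibly misaligned header into an aligned local instead of casting the
mbuf data pointer; m and the offset hoff below are assumptions.

	struct tcphdr th;

	/* safe on strict-alignment archs: no direct dereference */
	memcpy(&th, mtod(m, caddr_t) + hoff, sizeof(th));
	hlen = th.th_off << 2;	/* header fields now readable */
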
> > >
> > > Now viommu_dvmamap_load_raw() on sparc64 calls iommu_iomap_insert_page()
> > > twice as often with overlapping pages. m_defrag() is intended as a
> > > last resort to make DMA transfers to the hardware. Therefore page
> > > alignment is more important than IP header alignment.
> > >
> > > ok?
> >
> > This makes little sense. We offset a 1500 byte packet by 2 bytes inside
> > an 8k page. This should not result in extra page use. Something else is off
> > here.
>
> My printf debugging tells a different story. Debugging diff
> attached, so you can see what I was looking at.
>
> Note that I had to add an i > map->_dm_segcnt check to avoid loading
> too many segments into DMA. This prevents the "iomap
> insert error: 12 for pa 0xded4e6000" errors I was seeing before.
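
For illustration, a minimal sketch of such a bail-out in the load loop;
i and map follow the debug output below, returning EFBIG is an
assumption:

	/* sketch: give up once the map's segment limit is hit */
	if (i > map->_dm_segcnt) {
		map->dm_nsegs = 0;
		return (EFBIG);
	}
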
>
> ixl supports only 8 segments. The driver first tries to load the
> 1500 byte packets combined in 2k clusters and fails. This mbuf
> chain in the TCP output buffer is a result of socket splicing small
> packets. TSO creates a 31922 byte packet consisting of about 22
> clusters. The first attempt to load it into DMA fails.
>
> _bus_dmamap_load_mbuf: j 9, i 9, _dm_segcnt 8, pktlen 31922
>
> Then m_defrag creates one big cluster that contains the 31922 byte
> packet. ixl supports up to 8 DMA segments of 16k each, at least
> that is what the driver says. The cluster is split into 3 segments.
>
> _bus_dmamap_load_mbuf: ds_addr 0x004eba0002, ds_len 8192, j 1, i 3, _dm_segcnt 8, pktlen 31922
> _bus_dmamap_load_mbuf: ds_addr 0x004eba2002, ds_len 8192, j 1, i 3, _dm_segcnt 8, pktlen 31922
> _bus_dmamap_load_mbuf: ds_addr 0x004eba4002, ds_len 15538, j 1, i 3, _dm_segcnt 8, pktlen 31922
>
> Note that the segments start 2 bytes after the page boundary due to
> header alignment of the mbuf. That means each segment covers two
> pages; the page holding the remaining 2 bytes overlaps and is loaded
> twice.
>
> viommu_dvmamap_load_raw: a 0x004eba0000, aend 0x004eba4000
> viommu_dvmamap_load_raw: a 0x004eba2000, aend 0x004eba4000
> viommu_dvmamap_load_raw: a 0x004eba2000, aend 0x004eba6000
> viommu_dvmamap_load_raw: a 0x004eba4000, aend 0x004eba6000
> viommu_dvmamap_load_raw: a 0x004eba4000, aend 0x004eba8000
> viommu_dvmamap_load_raw: a 0x004eba6000, aend 0x004eba8000
>
> With m_defrag header alignment reverted, we are back to 4 pages
> instead of 6. As the restrictions of ixl DMA for 64k TSO are tight,
> we should not waste resources here. Header alignment after m_defrag
> does not make sense for DMA.
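
A back-of-the-envelope check, as a sketch in C; pages_spanned() is a
hypothetical helper, NBPG is 8192 on sparc64:

	/* pages touched by a DMA segment of len bytes at pa */
	int
	pages_spanned(paddr_t pa, long len)
	{
		return (((pa & PGOFSET) + len + PGOFSET) >> PGSHIFT);
	}

With the segments from above: aligned gives 1 + 1 + 2 = 4 pages, the
2 byte offset gives 2 + 2 + 2 = 6 page loads.
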
>
> Unfortunately after fixing the two bugs, large TSO on ixl still has
> problems. Memory layout after socket splicing is the worst test
> case. Jan has tested ixl on a different sparc64 that uses iommu
> instead of viommu. There everything works. em and ix TSO also
> work on viommu. Your m_defrag alignment diff had no regressions,
> as it works for em and ix, and ixl on viommu was broken before.
>
> The critical corner case is TSO + ixl + viommu sparc64.
>
> To make progress in debugging I have to
> 1. revert m_defrag header align
> 2. limit _bus_dmamap_load_mbuf to map->_dm_segcnt
> 3. find the remaining bugs
>
The problem is that _bus_dmamap_load_mbuf() has a bug.
	while (m) {
		vaddr_t vaddr = mtod(m, vaddr_t);
		long buflen = (long)m->m_len;
		len += buflen;
		while (buflen > 0 && i < MAX_DMA_SEGS) {
			paddr_t pa;
			long incr;
			incr = min(buflen, NBPG);
In our case this is a 64k mbuf cluster holding 31922 bytes. So buflen
is bigger than NBPG (8k) and incr is set to 8192.
Later on the code does:
			buflen -= incr;
			vaddr += incr;
This does not work if vaddr is not at the start of a page.
Changing the min() to
	incr = min(buflen, PAGE_SIZE - ((u_long)vaddr & PGOFSET));
may fix this.
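
A minimal sketch of the inner loop with that change applied; the
segment bookkeeping and pmap lookup are elided:

	while (buflen > 0 && i < MAX_DMA_SEGS) {
		long incr;

		/*
		 * Never step past the end of the current page.  After
		 * the first short step vaddr is page aligned, so each
		 * further incr covers exactly one page.
		 */
		incr = min(buflen, PAGE_SIZE - ((u_long)vaddr & PGOFSET));
		/* ... look up pa and store map->dm_segs[i] ... */
		buflen -= incr;
		vaddr += incr;
	}
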
On the other hand most other archs implement _bus_dmamap_load_mbuf()
as a wrapper around _dmamap_load_buffer(), so I wonder if sparc64
should do the same.
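
For comparison, a rough sketch of that shape, modeled on what amd64
does; the _bus_dmamap_load_buffer() helper and its signature are
assumptions here, not existing sparc64 code:

	int
	_bus_dmamap_load_mbuf(bus_dma_tag_t t, bus_dmamap_t map,
	    struct mbuf *m0, int flags)
	{
		paddr_t lastaddr = 0;
		int seg = 0, first = 1, error = 0;
		struct mbuf *m;

		for (m = m0; m != NULL && error == 0; m = m->m_next) {
			if (m->m_len == 0)
				continue;
			error = _bus_dmamap_load_buffer(t, map, m->m_data,
			    m->m_len, NULL, flags, &lastaddr, &seg, first);
			first = 0;
		}
		if (error == 0) {
			map->dm_mapsize = m0->m_pkthdr.len;
			map->dm_nsegs = seg + 1;
		}
		return (error);
	}
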
Also the uio function in sparc64 has the same logic bug and the same issue.
I hate that we have so many versions of _bus_dmamap_load_mbuf() in our
tree since it makes it so much harder to adjust that code.
--
:wq Claudio