
From: Mateusz Guzik <mjguzik@gmail.com>
Subject: Re: per-CPU page caches for page faults
To: Martin Pieuchot <mpi@openbsd.org>
Cc: tech@openbsd.org
Date: Thu, 21 Mar 2024 11:52:17 +0100

On Mon, Mar 18, 2024 at 07:13:43PM +0000, Martin Pieuchot wrote:
> What is the idea behind this diff?  With a significant number of CPUs (16
> or more) grabbing a global mutex for every page allocation & free creates
> a lot of contention resulting in many CPU cycles wasted in system (kernel)
> time.  The idea of this diff is to add another layer on top of the global
> allocator to allocate and free pages in batch.  Note that, in this diff,
> this cache is only used for page faults.
> 
> +/*
> + * uvm_pcpu_getpage: allocate a page from the current CPU's cache
> + */
> +struct vm_page *
> +uvm_pcpu_getpage(int flags)
> +{
> +	struct uvm_percpu *upc = &curcpu()->ci_uvm;
> +	struct vm_page *pg;
> +
> +	if (upc->upc_count == 0) {
> +		atomic_inc_int(&uvmexp.pcpmiss);
> +		if (uvm_pcpu_fillcache())
> +			return NULL;
> +	} else {
> +		atomic_inc_int(&uvmexp.pcphit);
> +	}
> +
> +	atomic_dec_int(&uvmexp.percpucaches);
> +	upc->upc_count--;
> +	pg = upc->upc_pages[upc->upc_count];
> +
> +	if (flags & UVM_PLA_ZERO)
> +		uvm_pagezero(pg);
> +
> +	return pg;
> +}
> +

First, two minor remarks:

1. maintaining stats in a global struct avoidably hurts single-threaded
performance (the atomics) and scalability (the cacheline bounces). the
hand-rolled mechanism should keep its own cpu-local stats instead; see
the first sketch after these remarks.

2. uvm_pagezero on amd64 is implemented with non-temporal stores because
of the dedicated kernel thread for background page zeroing. while I
consider the existence of such a thread to be long obsolete, I'm going
to ignore that aspect here. the key point is that nt stores right before
direct use of the page only result in cache misses later on (even more
so if you have a LIFO allocation policy and the page was mostly in L3).
instead a uvm_pagezero_onfault or similar could be added to zero the
page "normally"; see the second sketch after these remarks.
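For remark 1, a minimal sketch of what I mean, assuming hypothetical
upc_pcphit/upc_pcpmiss fields were added to struct uvm_percpu (they are
not in the diff) and aggregation only happens when someone reads the
counters (CPU_INFO_FOREACH or whatever fits):

/*
 * Sketch only: hit/miss counters kept per CPU, bumped with plain
 * stores on the fault path.  upc_pcphit/upc_pcpmiss are hypothetical
 * fields; the global uvmexp counters would be derived on demand.
 */
struct vm_page *
uvm_pcpu_getpage(int flags)
{
	struct uvm_percpu *upc = &curcpu()->ci_uvm;
	struct vm_page *pg;

	if (upc->upc_count == 0) {
		upc->upc_pcpmiss++;	/* cpu-local, no atomic */
		if (uvm_pcpu_fillcache())
			return NULL;
	} else {
		upc->upc_pcphit++;	/* cpu-local, no atomic */
	}

	upc->upc_count--;
	pg = upc->upc_pages[upc->upc_count];

	if (flags & UVM_PLA_ZERO)
		uvm_pagezero(pg);

	return pg;
}

/* Aggregate only when asked for, e.g. from a sysctl handler. */
uint64_t
uvm_pcpu_misses(void)
{
	CPU_INFO_ITERATOR cii;
	struct cpu_info *ci;
	uint64_t total = 0;

	CPU_INFO_FOREACH(cii, ci)
		total += ci->ci_uvm.upc_pcpmiss;
	return total;
}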
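For remark 2, a rough sketch of the kind of helper I mean. kva_of_page()
is a made-up placeholder for however the platform gets a kernel-virtual
address for the page (e.g. the direct map); it is not an existing UVM
interface:

/*
 * Sketch only: zero a page with regular cached stores so the lines are
 * warm for the fault handler that is about to touch them, instead of
 * the non-temporal path used for background zeroing.
 */
void
uvm_pagezero_onfault(struct vm_page *pg)
{
	void *va = kva_of_page(pg);	/* placeholder direct-map lookup */

	memset(va, 0, PAGE_SIZE);
}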

All that aside, the real question is why a hand-rolled mechanism in the
first place? I see your pool allocator already has a per-CPU caching
layer, and based on that I would expect this caching to be implemented
on top of it.

If the pool allocator has significant shortcomings, they should probably
get addressed instead of rolling a dedicated mechanism. A rough sketch
of the usage pattern I have in mind is below.
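This is only a sketch of how pool(9)'s per-CPU caching is normally
enabled and used; whether UVM fault pages can actually be funneled
through a pool like this (rather than handed out as struct vm_page) is
exactly the open question, and the pool name and wrappers are made up:

/*
 * Sketch only: lean on pool(9)'s existing per-CPU caches instead of a
 * hand-rolled layer.
 */
struct pool uvm_fault_page_pool;

void
uvm_fault_pool_init(void)
{
	pool_init(&uvm_fault_page_pool, PAGE_SIZE, PAGE_SIZE, IPL_VM, 0,
	    "faultpg", NULL);
	pool_cache_init(&uvm_fault_page_pool);	/* enable per-CPU caching */
}

void *
uvm_fault_page_get(void)
{
	return pool_get(&uvm_fault_page_pool, PR_NOWAIT | PR_ZERO);
}

void
uvm_fault_page_put(void *pg)
{
	pool_put(&uvm_fault_page_pool, pg);
}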