From: j@bitminer.ca
Subject: proposed www pages with advice on cpu extensions
To: Tech <tech@openbsd.org>
Date: Sun, 14 Jul 2024 18:03:36 -0400

Hi @tech,

After the extended discussion on elf_aux_info, I wrote down what I
learned in the form of changes to the porting specialtopics.html file.

Comments, corrections?

(The upcoming emulation of the arm64 MIDR is not included.)
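
For illustration only (this is not part of the patch), here is a minimal sketch of the elf_aux_info(3) check that the new section describes, using arm64 and the Neon/ASIMD feature. It assumes that OpenBSD declares elf_aux_info() and AT_HWCAP in <sys/auxv.h> and defines HWCAP_ASIMD in <machine/elf.h>. Please check those headers against the release you build on. The Linux branch uses getauxval(3) for comparison. Every other platform reports the feature as absent, so the caller keeps its portable code path.

```c
#if defined(__aarch64__) && defined(__OpenBSD__)
#include <sys/auxv.h>     /* elf_aux_info(3), AT_HWCAP */
#include <machine/elf.h>  /* HWCAP_ASIMD: header location is an assumption */
#elif defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>     /* getauxval(3), AT_HWCAP; glibc also defines HWCAP_ASIMD */
#endif

/*
 * Hypothetical helper: return 1 if the running kernel reports ASIMD (Neon)
 * as usable, else 0.  The answer comes from the kernel, not from the
 * compile-time environment, so a binary built elsewhere still decides
 * correctly on the machine it runs on.
 */
static int
cpu_has_asimd(void)
{
#if defined(__aarch64__) && defined(__OpenBSD__)
	unsigned long hwcap = 0;

	/* elf_aux_info() returns 0 on success, an errno value otherwise. */
	if (elf_aux_info(AT_HWCAP, &hwcap, (int)sizeof(hwcap)) != 0)
		return 0;
	return (hwcap & HWCAP_ASIMD) != 0;
#elif defined(__aarch64__) && defined(__linux__)
	return (getauxval(AT_HWCAP) & HWCAP_ASIMD) != 0;
#else
	/* Unknown platform: assume absent and use the portable code. */
	return 0;
#endif
}
```

A caller would test cpu_has_asimd() once at startup and then choose either the Neon routine or the plain C routine.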


J


Index: index.html
===================================================================
RCS file: /cvs/www/faq/ports/index.html,v
diff -u -p -r1.37 index.html
--- index.html	15 Jul 2020 21:52:04 -0000	1.37
+++ index.html	14 Jul 2024 21:46:26 -0000
@@ -72,6 +72,7 @@ Porter's Handbook
    <li><a href="specialtopics.html#Audio"     >Audio Applications</a>
    <li><a href="specialtopics.html#Mandoc"    >Manual Pages</a>
    <li><a href="specialtopics.html#RcScripts" >rc.d(8) Scripts</a>
+  <li><a href="specialtopics.html#Optimizing">Optimizing with CPU Extensions</a>
  </ul>

  <h3><a href="testing.html">Port Testing Guide</a></h3>
Index: specialtopics.html
===================================================================
RCS file: /cvs/www/faq/ports/specialtopics.html,v
diff -u -p -r1.90 specialtopics.html
--- specialtopics.html	4 Nov 2022 23:46:22 -0000	1.90
+++ specialtopics.html	14 Jul 2024 21:46:27 -0000
@@ -30,6 +30,7 @@ Ports - Special Porting Topics
    <li><a href="#Audio"     >Audio Applications</a>
    <li><a href="#Mandoc"    >Manual Pages</a>
    <li><a href="#RcScripts" >rc.d(8) Scripts</a>
+  <li><a href="#Optimizing">Optimizing with CPU Extensions</a>
  </ul>

  <hr>
@@ -1237,7 +1238,7 @@ daemon="${TRUEPREFIX}/sbin/munin-node"
  pexp="/usr/bin/perl -wT ${daemon}${daemon_flags:+ ${daemon_flags}}"

  rc_pre() {
-        install -d -o _munin /var/run/munin
+	install -d -o _munin /var/run/munin
  }

  rc_cmd $1
@@ -1246,3 +1247,111 @@ rc_cmd $1
  A <a href="https://cvsweb.openbsd.org/ports/infrastructure/templates/rc.template?rev=HEAD">
  template script</a> can also be found in the templates directory of your
  ports tree.
+
+<h2 id="Optimizing">Optimizing with CPU Extensions</h2>
+
+This section provides information on selecting CPU instruction
+extensions, or CPU features, for ports.
+
+<p>
+Ports that do intensive numerical or text processing may benefit from
+CPU features available on the amd64, arm64, and powerpc64 architectures.
+
+However, ports are built and run on many user and OpenBSD developer
+machines, and not all of them have these features.
+The following guidelines help ports operate reliably across machines
+of the same architecture with different CPU features.
+
+<h3>CPU features</h3>
+
+Most CPU architectures are decades old and have gained numeric
+performance features over time, such as AVX (amd64), Neon/ASIMD (arm64),
+or VSX (powerpc64).
+Other features exist for the kernel: some help it defend against CPU
+defects such as Spectre, and others tell it that the hardware has been
+fixed to avoid those defects.
+
+<p>
+Generally, CPU features intended for kernels are not of interest
+to user code.
+On the other hand, some CPU features for user code require kernel
+support; without it, results are unpredictable or the program aborts.
+AVX, Neon/ASIMD and VSX, listed above, are examples of such
+features.
+
+<p>
+Kernel support for features varies with the OpenBSD release
+and the CPU model.
+The elf_aux_info(3) function reports which features are
+available to user code.
+
+<h3>Ports with CPU features enabled</h3>
+
+Any port must work on the most basic CPU that can build or run OpenBSD.
+Some users have amd64 machines that are decades old and expect ports
+to work on them.
+Therefore performance-enhancing features, if used, must be optional
+and enabled automatically only where present.
+
+<p>
+For users, "to work" means the program operates reliably and passes
+the software's built-in tests or the porter's own tests.
+
+<p>
+Enabling optional CPU features means those tests have to be repeated with
+the features enabled and disabled.
+Switching between the "on" and "off" paths at run time must also be tested.
+This can double or triple the test effort, or more.
+
+<p>
+Additionally, optional CPU features raise the risk of compiler bugs
+that produce incorrect results.
+
+<p>
+Here is a checklist of considerations:
+
+<ul>
+<li>Will it make a difference?
+A machine used largely for network routing
+does not benefit from AVX-512 numeric performance.
+But if a package has an active user base that needs the speed, this
+may justify using the features.
+<li>How many people will use it?
+A photo processing program is more popular on OpenBSD than a physics
+code (excluding game engines, of course).
+A game engine might benefit enough to be visibly faster.
+<li>How does the code currently check for CPU features?
+On amd64, code typically detects features with the unprivileged CPUID instruction.
+On arm64 and powerpc64, feature detection needs privileged instructions,
+so it requires kernel support.
+Does the code use getauxval(3) (Linux) or elf_aux_info(3) to detect features?
+If so, it may already be portable or need only minimal patches.
+<li>Does the code parse /proc/cpuinfo (Linux) or dmesg(8)?
+This should be avoided, because a CPU feature reported there as present
+may still lack kernel support.
+<li>Does the code test at run time whether the features are present on
+the machine it is currently running on?
+Code that assumes the run-time machine matches the build machine
+may abort, for example with SIGILL (illegal instruction).
+<li>Does the code have sufficient built-in tests to usefully compare
+results with and without the features?
+Do they work on OpenBSD?
+<li>Do you have suitable hardware, on at least one architecture,
+to perform the tests?
+<li>Can the features easily be switched on and off, both for testing
+and for debugging when users report problems?
+</ul>
+
+<p>
+Some current ports that select code paths based on CPU features include:
+
+<ul>
+<li>graphics/opencolorio supports AVX and Neon and enables these
+features.
+<li>math/py-numpy detects amd64 features itself and relies on getauxval(3)
+on other platforms.
+The NPY_DISABLE_CPU_FEATURES environment variable limits which features are used.
+<li>sysutils/ugrep has these features (AVX, Neon) turned off because it
+does not select them at run time.
+</ul>
+
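
This second sketch is also not part of the patch. It illustrates three checklist items: selecting a code path at run time, an off switch for testing and debugging, and a test that compares both paths. It assumes GCC or Clang on amd64. There, __builtin_cpu_supports("avx2") relies on CPUID and, to my knowledge, also uses XGETBV to check that the kernel has enabled AVX register state. The target("avx2") attribute lets only one function use AVX2, so the rest of the program still runs on the baseline CPU. It is worth verifying that a given OpenBSD toolchain provides the runtime support behind the built-in. The names sum_base, sum_avx2, pick_sum and check_paths_agree, and the EXAMPLE_DISABLE_AVX2 environment variable, are hypothetical. The sums are integers, so both paths must return exactly the same value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Baseline path: must work on every CPU the port supports. */
static uint64_t
sum_base(const uint32_t *v, size_t n)
{
	uint64_t s = 0;

	for (size_t i = 0; i < n; i++)
		s += v[i];
	return s;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
/* Same loop; only this function may be compiled with AVX2 instructions. */
__attribute__((target("avx2")))
static uint64_t
sum_avx2(const uint32_t *v, size_t n)
{
	uint64_t s = 0;

	for (size_t i = 0; i < n; i++)
		s += v[i];
	return s;
}
#endif

/*
 * Pick an implementation at run time.  Setting the hypothetical variable
 * EXAMPLE_DISABLE_AVX2 forces the baseline path for testing or debugging.
 */
static sum_fn
pick_sum(void)
{
	if (getenv("EXAMPLE_DISABLE_AVX2") != NULL)
		return sum_base;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		return sum_avx2;
#endif
	return sum_base;
}

/* Both paths must agree exactly on the same input. */
static void
check_paths_agree(void)
{
	uint32_t v[1000];

	for (size_t i = 0; i < 1000; i++)
		v[i] = (uint32_t)(i * 2654435761u);

	uint64_t expect = sum_base(v, 1000);

	assert(pick_sum()(v, 1000) == expect);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		assert(sum_avx2(v, 1000) == expect);
#endif
}
```

In a real port, pick_sum() would be called once and its result cached. On an AVX2 machine, running the port's tests once normally and once with the variable set covers both paths.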