
From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
Subject: Re: nvme(4) sensors
To: Jonathan Matthew <jonathan@d14n.org>
Cc: Mark Kettenis <mark.kettenis@xs4all.nl>, tech@openbsd.org
Date: Fri, 12 Jul 2024 14:39:03 +0200

On Thu, Jul 11, 2024 at 04:03:32PM +0200, Mark Kettenis wrote:
> > Date: Thu, 11 Jul 2024 15:49:27 +0200
> > From: Jonathan Matthew <jonathan@d14n.org>
> > 
> > This adds a basic set of sensors for nvme(4) showing device temperature
> > and overall health.
> 
> Diff doesn't compile on arm64.  Adding #include <sys/sensors.h> to
> dev/ic/nvmevar.h fixes that.
> 
> > It looks like this:
> > 
> > $ sysctl hw.sensors.nvme0
> > hw.sensors.nvme0.temp0=42.85 degC, OK
> > hw.sensors.nvme0.percent0=0.00% (endurance used), OK
> > hw.sensors.nvme0.percent1=100.00% (available spare), OK
> > 
> > If the temperature exceeds the device's threshold, temp0 status changes
> > to critical, and if the available spare capacity falls below the device's
> > threshold, percent1 status changes to critical.
> > 
> > The nvme features used here have been mandatory since version 1.0 of
> > the specification, so it's reasonable to just assume they're available.
> 
> 
> 
> hw.sensors.nvme0.temp0=38.85 degC, OK
> hw.sensors.nvme0.percent0=0.00% (endurance used), OK
> hw.sensors.nvme0.percent1=100.00% (available spare), OK

bioctl:
nvme0: CT1000P3SSD8, P9CR311, 23484544C5D5
nvme0: Max i/o 131072 bytes, Persisent Event Log, Volatile Write Cache, Sanitize 0xa0000002<BlockErase>
nvme0: Features 0x2<NOPSPM>
nvme0: Admin commands 0x17<DST,FORMAT,FWCD,SECSR>, NVM commands 0xd7<SCMP,SDMGMT,SF,SV,SWU,TS>
nvme0: NVMe 1.4, NVM I/O command set, Enabled, Ready
Volume      Status               Size Device
    nvme0 0 Online      1000204886016 sd0     CONCAT
          0: Formats *512 (Better), 4096 (Best)
          0: Features 0x8<UIDREUSE>
          0 Online      1000204886016 1:1.0   Namespace 1 <>

Sensors:
hw.sensors.nvme0.temp0=37.85 degC, OK
hw.sensors.nvme0.percent0=0.00% (endurance used), OK
hw.sensors.nvme0.percent1=100.00% (available spare), OK

> > Do the sensor names make sense?  Is refreshing them once per minute enough?
> 
> Probably.  Maybe if we want the hardware to self-protect we'd need to
> poll more often.  Or use an interrupt if the hardware supports it.
> 
> ok kettenis@ with the include issue fixed.

Also ok jca@

-- 
jca