Description
Recently, OpenBLAS added support for VORTEXM4. I attempted to build OpenBLAS with support for both VORTEX and VORTEXM4 in order to achieve optimal performance across Apple M-series chips.
I built the library using CMake with the following options:
-DDYNAMIC_ARCH=ON
-DDYNAMIC_LIST="VORTEX;VORTEXM4"
-DTARGET=VORTEX
Expected behavior
- The common code should be compiled for VORTEX.
- On Apple M1/M2/M3 systems, the VORTEX code path should be selected at runtime.
- On Apple M4/M5 systems, the VORTEXM4 code path should be selected at runtime.
Observed behavior (macOS / Apple Silicon)
To verify runtime dispatch, I set OPENBLAS_VERBOSE=2 and ran tests on two systems:
sysctl "machdep.cpu.brand_string"
machdep.cpu.brand_string: Apple M2 Max
Core: armv8
sysctl "machdep.cpu.brand_string"
machdep.cpu.brand_string: Apple M4 Pro
Core: armv8
In both cases, OpenBLAS reported ARMV8 rather than selecting VORTEX or VORTEXM4. Performance was also lower than expected.
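The selected core can also be queried from the loaded library instead of reading the verbose log. This is a minimal sketch, assuming the shared library exports `openblas_get_corename` and `openblas_get_config` under their unprefixed names and can be found as `openblas`. Builds that use a symbol prefix or suffix need the names adjusted. The helper name `openblas_core_info` is hypothetical.

```python
import ctypes
import ctypes.util


def openblas_core_info(lib_path=None):
    """Return (corename, config) reported by an OpenBLAS shared library, or None."""
    # Assumption: the library is discoverable as "openblas"; otherwise pass lib_path.
    path = lib_path or ctypes.util.find_library("openblas")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        get_core = lib.openblas_get_corename
        get_config = lib.openblas_get_config
    except (OSError, AttributeError):
        # Library could not be loaded, or the symbols are exported under other names.
        return None
    # Both functions take no arguments and return a C string.
    get_core.restype = ctypes.c_char_p
    get_config.restype = ctypes.c_char_p
    return get_core().decode(), get_config().decode()
```

Calling `openblas_core_info("/path/to/libopenblas.dylib")` on each machine returns the core name the dispatcher chose, which can be compared with the `Core:` line printed under `OPENBLAS_VERBOSE=2`.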
For comparison, I checked how runtime dispatch behaves on other platforms.
Observed behavior (AArch64)
I then tested on several AArch64 systems using the following settings in Makefile.rule:
TARGET=ARMV8
DYNAMIC_ARCH=1
OpenBLAS did not report any specific core type at runtime.
Observed behavior (x86_64 Linux)
Makefile.rule settings:
TARGET=NEHALEM
DYNAMIC_ARCH=1
I observed the following:
Model name: AMD EPYC 7301 16-Core Processor
Core: Zen
Model name: AMD Ryzen 5 5500
Core: Zen
Model name: AMD Ryzen Threadripper PRO 7995WX 96-Cores
Core: Cooperlake
Model name: AMD Ryzen 9 9950X3D 16-Core Processor
Core: Cooperlake
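The Zen kernels were selected on the CPUs without AVX512 (EPYC 7301 and Ryzen 5 5500), and the Cooperlake kernels on the CPUs with AVX512 (Threadripper PRO 7995WX and Ryzen 9 9950X3D, i.e. Zen 4 and Zen 5).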
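The AVX512 correlation can be cross-checked against the CPU flags that Linux reports. This is a minimal sketch, assuming `/proc/cpuinfo`-style input. The helper name `avx512_flags` is hypothetical, and the sketch only lists flags; it does not reproduce the selection logic inside OpenBLAS.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX512-related flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # The first "flags" line is enough: all cores normally report the same set.
        if key.strip() == "flags":
            return sorted(f for f in value.split() if f.startswith("avx512"))
    return []
```

On a Linux machine, passing the contents of `/proc/cpuinfo` to `avx512_flags` shows whether flags such as `avx512f` or `avx512_bf16` are present, which helps explain why an AVX512 code path was picked on a given CPU.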
Observed behavior (Windows on ARM)
I also attempted to build on Windows on ARM with:
-DDYNAMIC_ARCH=ON
-DDYNAMIC_LIST="NEOVERSEN1;CORTEXX1"
-DTARGET=ARMV8
This resulted in a compilation error:
C:\OpenBLAS-0.3.32\driver\others\dynamic_arm64.c(41,10): fatal error: 'strings.h' file not found
   41 | #include <strings.h>
      |          ^~~~~~~~~~~
Summary
- Runtime CPU detection does not appear to select VORTEX/VORTEXM4 on Apple Silicon.
- AArch64 builds do not report a detected core type.
- The Cooperlake code path is used on Zen 4/5 CPUs.
- The Windows on ARM build fails due to a missing <strings.h>.
Any guidance on whether this behavior is expected (or if I am misconfiguring the build) would be appreciated.