DYNAMIC_ARCH woes #5713

@dnoan

Description

Recently, OpenBLAS added support for VORTEXM4. I attempted to build OpenBLAS with support for both VORTEX and VORTEXM4 in order to achieve optimal performance across Apple M-series chips.

I built the library using CMake with the following options:

-DDYNAMIC_ARCH=ON  
-DDYNAMIC_LIST="VORTEX;VORTEXM4"  
-DTARGET=VORTEX

Expected behavior

  • The common code should be compiled for VORTEX.
  • On Apple M1/M2/M3 systems, the VORTEX code path should be selected at runtime.
  • On Apple M4/M5 systems, the VORTEXM4 code path should be selected at runtime.

Observed behavior (macOS / Apple Silicon)

To verify runtime dispatch, I set OPENBLAS_VERBOSE=2 and ran tests on two systems:

sysctl "machdep.cpu.brand_string"
machdep.cpu.brand_string: Apple M2 Max
Core: armv8

sysctl "machdep.cpu.brand_string"
machdep.cpu.brand_string: Apple M4 Pro
Core: armv8

In both cases, OpenBLAS reported ARMV8 rather than selecting VORTEX or VORTEXM4. Performance was also lower than expected.

For comparison, I then checked how runtime dispatch behaves on other platforms.

Observed behavior (AArch64)

I then tested on several AArch64 systems using the following settings in Makefile.rule:

TARGET=ARMV8
DYNAMIC_ARCH=1

OpenBLAS did not report any specific core type at runtime.

Observed behavior (x86_64 Linux)

Makefile.rule settings:

TARGET=NEHALEM
DYNAMIC_ARCH=1

I observed the following:

Model name: AMD EPYC 7301 16-Core Processor
Core: Zen
Model name: AMD Ryzen 5 5500
Core: Zen
Model name: AMD Ryzen Threadripper PRO 7995WX 96-Cores
Core: Cooperlake
Model name: AMD Ryzen 9 9950X3D 16-Core Processor
Core: Cooperlake

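The Zen code path was selected on the CPUs without AVX512 (EPYC 7301, Ryzen 5 5500), and the Cooperlake code path on the CPUs with AVX512 (Threadripper PRO 7995WX, Ryzen 9 9950X3D).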
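To confirm that the split really follows AVX512 availability, the feature can be queried directly on each machine. The following is a minimal sketch, not OpenBLAS's own detection logic: the function name `cpu_has_avx512f` is hypothetical, and it relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which are only available for x86 targets.

```c
/* Hypothetical helper, not part of OpenBLAS.
   Returns 1 if the CPU reports AVX512F, 0 if it does not,
   and -1 if the check is unavailable in this build
   (non-x86 target or a compiler without the builtins). */
int cpu_has_avx512f(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();  /* initialise the compiler's CPU feature data */
    return __builtin_cpu_supports("avx512f") ? 1 : 0;
#else
    return -1;
#endif
}
```

Printing this value next to the `Core:` line from OPENBLAS_VERBOSE=2 on each system would show whether the Zen/Cooperlake choice matches the AVX512F flag.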

Observed behavior (Windows on ARM)

I also attempted to build on Windows on ARM with:

-DDYNAMIC_ARCH=ON  
-DDYNAMIC_LIST="NEOVERSEN1;CORTEXX1"  
-DTARGET=ARMV8

This resulted in a compilation error:

C:\OpenBLAS-0.3.32\driver\others\dynamic_arm64.c(41,10): fatal error: 'strings.h' file not found
   41 | #include <strings.h>
      |          ^~~~~~~~~~~
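`<strings.h>` is a POSIX header that MSVC-compatible toolchains do not provide. One possible workaround is a small conditional shim, sketched below. It assumes the header is only needed for case-insensitive string comparison (`strcasecmp` / `strncasecmp`), which has not been verified against dynamic_arm64.c. The helper `coretype_matches` is hypothetical and is there only to exercise the shim.

```c
#include <string.h>

#if defined(_MSC_VER)
/* MSVC-compatible runtimes (including clang-cl) have no <strings.h>;
   _stricmp and _strnicmp from <string.h> are the closest equivalents. */
#define strcasecmp  _stricmp
#define strncasecmp _strnicmp
#else
#include <strings.h>   /* POSIX home of strcasecmp / strncasecmp */
#endif

/* Hypothetical helper: case-insensitive comparison of a requested core
   name against a known one. Returns 1 on a match, 0 otherwise. */
int coretype_matches(const char *requested, const char *known)
{
    return strcasecmp(requested, known) == 0;
}
```

With a guard of this kind, the same source would compile on both POSIX systems and Windows on ARM, provided no other POSIX-only functions from `<strings.h>` are used.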

Summary

  • Runtime CPU detection does not appear to select VORTEX/VORTEXM4 on Apple Silicon.
  • AArch64 builds do not report a detected core type.
  • The Cooperlake code path is selected on Zen 4/5 CPUs.
  • Windows on ARM build fails due to missing <strings.h>.

Any guidance on whether this behavior is expected (or if I am misconfiguring the build) would be appreciated.
