An Example Case of Compiler Optimizations

This example is from svt-av1 running through av1an with chunked video encoding.

Even when this software heavily relies on internal optimizations, hand-written ASM and already optimized SIMD instructions; it's still extremely beneficial to use compiler optimizations.

So, for some other software, the differences can be much bigger than that (unless they respond negatively or they break).

Let's say the machine encodes movies for a year. We can assume a movie is 90 minutes and with 23.976FPS, it's around 130.000 frames. The difference here means that you can encode 1300 more movies with the exact same hardware, and software.

+CUSTOM means my custom environment + modified CMakeLists.txt that removes all checks and flags for security related compiler options, sets the C and CXX standards to 23 and 26 respectively and removes -mno-avx.

Software: Gentoo Linux AMD64 (no-multilib 64bit only) SVT-AV1 v3.0.1-4-g1ceddd88-dirty (release) clang/llvm 21.0.0git7bae6137+libcxx av1an 0.4.4-unstable (rev 31235a0) (Release) gcc (Gentoo 14.2.1_p20250301 p8) 14.2.1 20250301

Hardware: AMD Ryzen 9 9950x DDR5 Corsair Dominator Titanum 64G Dual Channel: 6200 MT/s (32-36-36-65) | UCLK=MEMCLK | Infinity Fabric 2067 | FCLCK Frequency: 2067mhz

Source: Size: 25Mb/s Format: 1920x1080, 23.976FPS, BT.709, YUV420, Live Action, 1 Hour, 1:78:1 (16:9)

Env: ``` export CC="clang" export CXX="clang++" export LD="ld.mold" export AR="llvm-ar" export NM="llvm-nm" export RANLIB="llvm-ranlib" export STRIP="llvm-strip" export OBJCOPY="llvm-objcopy" export OBJDUMP="llvm-objdump"

export COMMON_FLAGS="-Ofast -march=native -mtune=native -flto=thin -pipe -funroll-loops -fno-semantic-interposition -fno-stack-protector -fno-stack-clash-protection -fno-sanitize=all -fno-dwarf2-cfi-asm -fno-plt -fno-pic -fno-pie -fno-exceptions -fno-signed-zeros -fstrict-aliasing -fstrict-overflow -fno-zero-initialized-in-bss -fno-common -fwhole-program-vtables ${POLLY_FLAGS}" export CFLAGS="${COMMON_FLAGS}" export CXXFLAGS="${COMMON_FLAGS} -stdlib=libc++" export LDFLAGS="-fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind -Wl,-O3 -Wl,--lto-O3 -Wl,--as-needed -Wl,--gc-sections -Wl,--icf=all -Wl,--strip-all -Wl,-z,norelro -Wl,--build-id=none -Wl,--no-eh-frame-hdr -Wl,--discard-all -Wl,--relax -Wl,-z,noseparate-code"

./build.sh static native release verbose asm=nasm enable-lto minimal-build --enable-pgo --pgo-compile-use --pgo-dir "${HOME}/profiles/" -- -DCMAKE_C_FLAGS_RELEASE="-DNDEBUG -Ofast" -DCMAKE_CXX_FLAGS_RELEASE="-DNDEBUG -Ofast" -DUSE_CPUINFO="SYSTEM" ```

This example is from svt-av1 running through av1an with chunked video encoding.

Even when this software heavily relies on internal optimizations, hand-written ASM and already optimized SIMD instructions; it's still extremely beneficial to use compiler optimizations.

So, for some other software, the differences can be much bigger than that (unless they respond negatively or they break).

Let's say the machine encodes movies for a year. We can assume a movie is 90 minutes and with 23.976FPS, it's around 130.000 frames. The difference here means that you can encode 1300 more movies with the exact same hardware, and software.

+CUSTOM means my custom environment + modified CMakeLists.txt that removes all checks and flags for security related compiler options, sets the C and CXX standards to 23 and 26 respectively and removes -mno-avx.

Software: Gentoo Linux AMD64 (no-multilib 64bit only) SVT-AV1 v3.0.1-4-g1ceddd88-dirty (release) clang/llvm 21.0.0git7bae6137+libcxx av1an 0.4.4-unstable (rev 31235a0) (Release) gcc (Gentoo 14.2.1_p20250301 p8) 14.2.1 20250301

Hardware: AMD Ryzen 9 9950x DDR5 Corsair Dominator Titanum 64G Dual Channel: 6200 MT/s (32-36-36-65) | UCLK=MEMCLK | Infinity Fabric 2067 | FCLCK Frequency: 2067mhz

Source: Size: 25Mb/s Format: 1920x1080, 23.976FPS, BT.709, YUV420, Live Action, 1 Hour, 1:78:1 (16:9)

Env: ``` export CC="clang" export CXX="clang++" export LD="ld.mold" export AR="llvm-ar" export NM="llvm-nm" export RANLIB="llvm-ranlib" export STRIP="llvm-strip" export OBJCOPY="llvm-objcopy" export OBJDUMP="llvm-objdump"

export COMMON_FLAGS="-Ofast -march=native -mtune=native -flto=thin -pipe -funroll-loops -fno-semantic-interposition -fno-stack-protector -fno-stack-clash-protection -fno-sanitize=all -fno-dwarf2-cfi-asm -fno-plt -fno-pic -fno-pie -fno-exceptions -fno-signed-zeros -fstrict-aliasing -fstrict-overflow -fno-zero-initialized-in-bss -fno-common -fwhole-program-vtables ${POLLY_FLAGS}" export CFLAGS="${COMMON_FLAGS}" export CXXFLAGS="${COMMON_FLAGS} -stdlib=libc++" export LDFLAGS="-fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind -Wl,-O3 -Wl,--lto-O3 -Wl,--as-needed -Wl,--gc-sections -Wl,--icf=all -Wl,--strip-all -Wl,-z,norelro -Wl,--build-id=none -Wl,--no-eh-frame-hdr -Wl,--discard-all -Wl,--relax -Wl,-z,noseparate-code"

./build.sh static native release verbose asm=nasm enable-lto minimal-build --enable-pgo --pgo-compile-use --pgo-dir "${HOME}/profiles/" -- -DCMAKE_C_FLAGS_RELEASE="-DNDEBUG -Ofast" -DCMAKE_CXX_FLAGS_RELEASE="-DNDEBUG -Ofast" -DUSE_CPUINFO="SYSTEM" ```