Architecture & Methods — Singularity

How the simulator is built. Code organization, the backend abstraction layer, the Metal and Vulkan compute pipelines, the CUDA stretch backend, shader sharing across MSL/HLSL/CUDA, the build system, CI/CD across two platforms, distribution, and testing strategy. The physics of what each kernel computes lives in PHYSICS.md; this doc covers the engineering of how it computes it across three GPU APIs and one C++ codebase.


1. High-level architecture

                    ┌──────────────────────────────────────┐
                    │       singularity (cross-platform)   │
                    │                                      │
                    │  ┌────────────────────────────────┐  │
                    │  │ App shell (SDL3 + Dear ImGui)  │  │
                    │  │ - window, input, settings UI   │  │
                    │  └────────────────┬───────────────┘  │
                    │                   │                  │
                    │  ┌────────────────┼────────────────┐ │
                    │  │ Renderer (selects backend)      │ │
                    │  │  ┌────────────────────────────┐ │ │
                    │  │  │   abstract RenderBackend   │ │ │
                    │  │  └────────────────────────────┘ │ │
                    │  │            │       │            │ │
                    │  │      ┌─────┼───┐  ┌┼──────────┐ │ │
                    │  │      │ Metal   │  │ Vulkan    │ │ │
                    │  │      │ Backend │  │ Backend   │ │ │
                    │  │      └─────────┘  └───────────┘ │ │
                    │  └─────────────────────────────────┘ │
                    │                   │                  │
                    │  ┌────────────────┼────────────────┐ │
                    │  │   shared C++ physics core       │ │
                    │  │   - State, Camera, Settings     │ │
                    │  │   - geodesic_rhs() (in headers) │ │
                    │  │   - Christoffel symbols         │ │
                    │  └─────────────────────────────────┘ │
                    └──────────────────────────────────────┘

      ┌─────────────────┐                     ┌──────────────────────┐
      │ singularity_cli │                     │ singularity_cuda_cli │
      │ (headless,      │                     │ (NVIDIA-only,        │
      │  drives backend │                     │  offline 4K/8K       │
      │  for Python     │                     │  renderer using      │
      │  verification)  │                     │  CUDA backend)       │
      └─────────────────┘                     └──────────────────────┘
              ▲                                          ▲
              │                                          │
      ┌───────┴──────────────────────────────────────────┴───────┐
      │ verification/ (Python)                                   │
      │ - SymPy: symbolic Christoffel derivation                 │
      │ - SciPy: independent reference integrator                │
      │ - pytest: drives all tests, including                    │
      │   golden-image diffs and backend-equivalence             │
      └──────────────────────────────────────────────────────────┘

      ┌──────────────────────────────────────────────────────────┐
      │ web/ (Next.js + TypeScript + WebGPU)                     │
      │ - imports physics core compiled to WASM via Emscripten   │
      │ - WGSL kernels dispatched by TS host                     │
      └──────────────────────────────────────────────────────────┘

Key property of this layout: the physics math lives in one place — shared C++ headers under core/include/physics/. Every backend (Metal, Vulkan, CUDA, WebGPU) calls those headers from its own kernel boilerplate. A physics bug fixed in schwarzschild.hpp is fixed everywhere simultaneously. The backend-equivalence test (verification/test_backend_equivalence.py) catches any drift if it appears.

2. Repository layout

singularity/
├── README.md
├── docs/
│   ├── PRD.md
│   ├── PHYSICS.md
│   ├── ARCHITECTURE.md
│   └── TODO.md
├── core/                       # platform-agnostic C++ core
│   ├── include/
│   │   ├── physics/            # SHARED across all backends
│   │   │   ├── state.hpp       # State vector
│   │   │   ├── schwarzschild.hpp
│   │   │   ├── kerr.hpp
│   │   │   └── integrator.hpp
│   │   ├── camera.hpp
│   │   ├── settings.hpp
│   │   └── scene.hpp
│   └── src/
│       └── (cpu-side helpers, settings JSON, etc.)
├── render/
│   ├── include/
│   │   └── render_backend.hpp  # The abstraction
│   ├── metal/                  # Metal implementation
│   │   ├── metal_backend.hpp
│   │   ├── metal_backend.mm    # Objective-C++ for ARC interop
│   │   └── shaders/
│   │       ├── geodesic_kernel.metal
│   │       ├── disc_intersection.metal
│   │       └── blit.metal
│   ├── vulkan/                 # Vulkan implementation
│   │   ├── vulkan_backend.hpp
│   │   ├── vulkan_backend.cpp
│   │   └── shaders/
│   │       ├── geodesic_kernel.hlsl   # → DXC → SPIR-V
│   │       ├── disc_intersection.hlsl
│   │       └── blit.hlsl
│   └── cuda/                   # CUDA stretch implementation
│       ├── cuda_backend.cu
│       └── kernels/
│           └── geodesic_kernel.cu
├── shared_shader/              # SHARED math used by every backend's shaders
│   ├── shader_compat.h         # platform macros (DEVICE, INLINE, etc.)
│   ├── geodesic_math.h         # the actual physics, callable from MSL/HLSL/CUDA
│   └── color_math.h
├── app/                        # cross-platform app shell
│   ├── main.cpp
│   ├── app_shell.cpp           # SDL3 window, ImGui setup
│   └── settings_ui.cpp
├── cli/                        # headless binary for verification
│   └── main.cpp
├── cuda_cli/                   # offline renderer (stretch)
│   └── main.cpp
├── tests/                      # C++ unit tests (Catch2)
│   ├── test_camera.cpp
│   ├── test_integrator.cpp
│   ├── test_schwarzschild.cpp
│   └── test_kerr.cpp
├── verification/               # Python physics verification
│   ├── christoffel_sympy.py
│   ├── test_photon_sphere.py
│   ├── test_deflection.py
│   ├── test_isco.py
│   ├── test_redshift.py
│   ├── test_golden_images.py
│   ├── test_backend_equivalence.py
│   ├── golden/                 # Reference PNGs (per backend)
│   │   ├── metal/
│   │   └── vulkan/
│   └── conftest.py
├── web/                        # Next.js docs site + WebGPU demo
│   ├── app/
│   ├── content/                # MDX, mirrors PHYSICS/ARCHITECTURE
│   ├── components/Demo/
│   └── public/wasm/            # Built-from-core WASM artifacts
├── third_party/
│   ├── metal-cpp/              # Apple
│   ├── Vulkan-Hpp/             # Khronos
│   ├── SDL/                    # SDL3
│   ├── imgui/                  # Dear ImGui
│   ├── catch2/
│   └── stb/
├── CMakeLists.txt              # cross-platform, conditional backend selection
├── BUILDING.md                 # platform-specific build instructions
├── .clang-format
├── .clang-tidy
└── .github/workflows/
    ├── ci.yml                  # matrix: macos-14 + windows-2022
    ├── release.yml             # tag → signed .dmg + .msi
    └── docs.yml                # web/ → Vercel

3. The backend abstraction

The interface is small on purpose. Over-abstracting is the failure mode.

// render/include/render_backend.hpp
#pragma once
#include <cstdint>
#include <memory>
#include <vector>

namespace singularity {

struct WindowHandle {
    void* native_window;        // NSWindow* on Mac, HWND on Windows
    void* native_view;          // CAMetalLayer*, NSView* on Mac; ignored on Win
};

struct RenderConfig {
    uint32_t width;
    uint32_t height;
    bool vsync_enabled;
};

struct Scene {
    enum class MetricType { Schwarzschild, Kerr };
    MetricType metric;
    float mass_solar;
    float spin_a_over_M;        // 0 for Schwarzschild
    float disc_inner_M;
    float disc_outer_M;
    bool disc_doppler_on;
    bool disc_redshift_on;
    bool disc_texture_on;
    bool show_overlay;
};

struct CameraState {
    float position[3];
    float basis[9];             // 3x3 orientation matrix, row-major
    float fov_y_radians;
};

struct ImageData {
    std::vector<uint8_t> pixels_rgba;
    uint32_t width;
    uint32_t height;
};

class RenderBackend {
public:
    virtual ~RenderBackend() = default;

    // Lifecycle
    virtual bool initialize(WindowHandle window, RenderConfig config) = 0;
    virtual void shutdown() = 0;
    virtual void resize(uint32_t width, uint32_t height) = 0;

    // Per-frame
    virtual void render_frame(const Scene& scene, const CameraState& camera) = 0;

    // Optional: for verification + screenshot export
    virtual ImageData capture_frame() = 0;

    // Backend identification
    virtual const char* name() const = 0;  // "Metal", "Vulkan", "CUDA"
};

// Factory selects based on platform + build config
std::unique_ptr<RenderBackend> create_default_backend();

} // namespace singularity
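One way the factory could pick a backend is sketched below. This is a minimal illustration, not the project's implementation: the interface is trimmed to `name()`, and `BackendKind`, `create_backend`, `default_backend_kind`, and the two stub classes are hypothetical names introduced here. The fallback to Vulkan on non-Apple platforms is also an assumption.

```cpp
#include <cassert>
#include <memory>

// Trimmed stand-in for the RenderBackend interface above; only name() is kept.
class RenderBackend {
public:
    virtual ~RenderBackend() = default;
    virtual const char* name() const = 0;
};

// Hypothetical stubs standing in for the real Metal and Vulkan backends.
class MetalBackendStub final : public RenderBackend {
public:
    const char* name() const override { return "Metal"; }
};
class VulkanBackendStub final : public RenderBackend {
public:
    const char* name() const override { return "Vulkan"; }
};

enum class BackendKind { Metal, Vulkan };

// Explicit selection, useful for tests and for a --backend command-line flag.
inline std::unique_ptr<RenderBackend> create_backend(BackendKind kind) {
    switch (kind) {
        case BackendKind::Metal:  return std::make_unique<MetalBackendStub>();
        case BackendKind::Vulkan: return std::make_unique<VulkanBackendStub>();
    }
    assert(false && "unhandled BackendKind");
    return nullptr;
}

// Assumed platform default: Metal on Apple platforms, Vulkan everywhere else.
inline BackendKind default_backend_kind() {
#if defined(__APPLE__)
    return BackendKind::Metal;
#else
    return BackendKind::Vulkan;
#endif
}

inline std::unique_ptr<RenderBackend> create_default_backend() {
    return create_backend(default_backend_kind());
}
```

Keeping the explicit `create_backend` next to the default lets the headless CLI request a specific backend without touching platform macros.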

What this interface deliberately omits:

What’s inside each backend:

4. Sharing physics math across MSL, HLSL, and CUDA

4.1 The compatibility header

// shared_shader/shader_compat.h
//
// Defines DEVICE, INLINE, and other macros so the same math headers compile
// in MSL, HLSL, CUDA, and even host C++.

#if defined(__METAL_VERSION__)
    // Metal Shading Language (a C++14 dialect)
    #define DEVICE
    #define INLINE inline
    #define CONSTANT constant
    #include <metal_stdlib>
    using namespace metal;
    typedef float3 vec3;
    typedef float4 vec4;
#elif defined(__HLSL_VERSION) || defined(_HLSL)
    // HLSL (used for Vulkan via DXC → SPIR-V)
    #define DEVICE
    #define INLINE inline
    #define CONSTANT static const
    typedef float3 vec3;
    typedef float4 vec4;
#elif defined(__CUDACC__)
    // CUDA C++
    #define DEVICE __device__
    #define INLINE __forceinline__
    #define CONSTANT __constant__
    #include <cuda_runtime.h>
    typedef float3 vec3;
    typedef float4 vec4;
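#else
    // Host C++ (so we can unit-test the math on CPU)
    #define DEVICE
    #define INLINE inline
    #define CONSTANT constexpr
    #include <cmath>                  // sin, cos used by the shared math
    using std::sin;
    using std::cos;
    #include "physics/vec_types.hpp"  // typedefs vec3, vec4; resolved via the core/include path
#endif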

4.2 The shared math header

// shared_shader/geodesic_math.h
#include "shader_compat.h"

struct State {
    float t, r, theta, phi;
    float ut, ur, utheta, uphi;
};

DEVICE INLINE State geodesic_rhs_schwarzschild(State s, float rs) {
    float f = 1.0f - rs / s.r;
    float r2 = s.r * s.r;
    float sin_t = sin(s.theta);
    float cos_t = cos(s.theta);

    State d;
    d.t = s.ut;
    d.r = s.ur;
    d.theta = s.utheta;
    d.phi = s.uphi;

    // Acceleration terms — Christoffel symbols per PHYSICS.md §3
    d.ut = -(rs / (r2 * f)) * s.ut * s.ur;
    d.ur = -(rs * f / (2.0f * r2)) * s.ut * s.ut
           + (rs / (2.0f * r2 * f)) * s.ur * s.ur
           + s.r * f * (s.utheta * s.utheta + sin_t * sin_t * s.uphi * s.uphi);
    d.utheta = -2.0f * s.ur * s.utheta / s.r
               + sin_t * cos_t * s.uphi * s.uphi;
    d.uphi = -2.0f * s.ur * s.uphi / s.r
             - 2.0f * (cos_t / sin_t) * s.utheta * s.uphi;
    return d;
}

DEVICE INLINE State rk4_step(State y, float h, float rs) {
    State k1 = geodesic_rhs_schwarzschild(y, rs);
    State k2 = geodesic_rhs_schwarzschild(state_add(y, state_scale(k1, 0.5f * h)), rs);
    State k3 = geodesic_rhs_schwarzschild(state_add(y, state_scale(k2, 0.5f * h)), rs);
    State k4 = geodesic_rhs_schwarzschild(state_add(y, state_scale(k3, h)), rs);
    State sum = state_add(state_add(k1, state_scale(k2, 2.0f)),
                          state_add(state_scale(k3, 2.0f), k4));
    return state_add(y, state_scale(sum, h / 6.0f));
}

This one file is included from:

A bug fixed once is fixed everywhere.
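The RK4 step above calls `state_add` and `state_scale`, which the excerpt does not define. The sketch below supplies component-wise versions (assumed definitions) and uses them for a host-side sanity check: along a Schwarzschild geodesic the quantity E = (1 - rs/r) u^t is conserved, so its drift over a few RK4 steps should stay tiny. The function `energy_drift`, the initial conditions, and the step size are illustrative choices, not values from the project.

```cpp
#include <cassert>
#include <cmath>

struct State {
    float t, r, theta, phi;      // position
    float ut, ur, utheta, uphi;  // four-velocity
};

// Component-wise helpers that rk4_step relies on (assumed definitions).
inline State state_add(State a, State b) {
    return {a.t + b.t, a.r + b.r, a.theta + b.theta, a.phi + b.phi,
            a.ut + b.ut, a.ur + b.ur, a.utheta + b.utheta, a.uphi + b.uphi};
}
inline State state_scale(State s, float k) {
    return {s.t * k, s.r * k, s.theta * k, s.phi * k,
            s.ut * k, s.ur * k, s.utheta * k, s.uphi * k};
}

// Right-hand side and RK4 step, copied from geodesic_math.h above.
inline State geodesic_rhs_schwarzschild(State s, float rs) {
    float f = 1.0f - rs / s.r;
    float r2 = s.r * s.r;
    float sin_t = std::sin(s.theta);
    float cos_t = std::cos(s.theta);
    State d;
    d.t = s.ut; d.r = s.ur; d.theta = s.utheta; d.phi = s.uphi;
    d.ut = -(rs / (r2 * f)) * s.ut * s.ur;
    d.ur = -(rs * f / (2.0f * r2)) * s.ut * s.ut
           + (rs / (2.0f * r2 * f)) * s.ur * s.ur
           + s.r * f * (s.utheta * s.utheta + sin_t * sin_t * s.uphi * s.uphi);
    d.utheta = -2.0f * s.ur * s.utheta / s.r + sin_t * cos_t * s.uphi * s.uphi;
    d.uphi = -2.0f * s.ur * s.uphi / s.r
             - 2.0f * (cos_t / sin_t) * s.utheta * s.uphi;
    return d;
}

inline State rk4_step(State y, float h, float rs) {
    State k1 = geodesic_rhs_schwarzschild(y, rs);
    State k2 = geodesic_rhs_schwarzschild(state_add(y, state_scale(k1, 0.5f * h)), rs);
    State k3 = geodesic_rhs_schwarzschild(state_add(y, state_scale(k2, 0.5f * h)), rs);
    State k4 = geodesic_rhs_schwarzschild(state_add(y, state_scale(k3, h)), rs);
    State sum = state_add(state_add(k1, state_scale(k2, 2.0f)),
                          state_add(state_scale(k3, 2.0f), k4));
    return state_add(y, state_scale(sum, h / 6.0f));
}

// Conserved along Schwarzschild geodesics: E = (1 - rs/r) u^t.
inline float energy(State s, float rs) { return (1.0f - rs / s.r) * s.ut; }

// Relative drift in E after `steps` RK4 steps of an infalling equatorial null ray.
inline float energy_drift(int steps, float h) {
    assert(steps > 0 && h > 0.0f);
    const float rs = 1.0f;
    State s{0.0f, 10.0f, 1.5707964f, 0.0f, 0.0f, -1.0f, 0.0f, 0.02f};
    float f = 1.0f - rs / s.r;
    // The null condition fixes u^t from the spatial components (utheta is 0 here).
    s.ut = std::sqrt((s.ur * s.ur / f + s.r * s.r * s.uphi * s.uphi) / f);
    float e0 = energy(s, rs);
    for (int i = 0; i < steps; ++i) s = rk4_step(s, h, rs);
    return std::fabs(energy(s, rs) - e0) / e0;
}
```

A check like this fits the Catch2 unit level: it needs no GPU and fails quickly if a sign in the right-hand side is flipped.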

5. The Metal backend

5.1 Pipeline state objects

One MTLDevice, one MTLCommandQueue, two pipeline states:

Both PSOs are created at initialize() and reused every frame.

5.2 Per-frame command buffer

auto cmdBuf = commandQueue->commandBuffer();

auto computeEnc = cmdBuf->computeCommandEncoder();
computeEnc->setComputePipelineState(geodesicPipeline);
computeEnc->setTexture(skyboxTexture, 0);
computeEnc->setTexture(outputTexture, 1);
computeEnc->setBuffer(uniformsBuffer[currentFrame % 3], 0, 0);  // triple-buffered
computeEnc->dispatchThreads(MTL::Size(width, height, 1),
                             MTL::Size(32, 32, 1));            // M-series sweet spot; 1024 threads must not exceed
                                                               // geodesicPipeline->maxTotalThreadsPerThreadgroup()
computeEnc->endEncoding();

auto rpd = view->currentRenderPassDescriptor();
auto renderEnc = cmdBuf->renderCommandEncoder(rpd);
renderEnc->setRenderPipelineState(blitPipeline);
renderEnc->setFragmentTexture(outputTexture, 0);
renderEnc->drawPrimitives(MTL::PrimitiveTypeTriangle, 0, 3);
ImGuiOverlay::render(renderEnc);   // ImGui draws into the same render pass
renderEnc->endEncoding();

cmdBuf->presentDrawable(view->currentDrawable());
cmdBuf->commit();

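Triple-buffering the uniforms lets the CPU write the parameters for the next frames while the GPU is still reading an earlier frame's copy, so neither side stalls on a shared buffer. This holds only if the CPU is never allowed to run more than three frames ahead of the GPU.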
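The bookkeeping behind a three-slot uniform ring can be sketched independently of any GPU API. The class below is an illustration under stated assumptions: `UniformRing`, `Uniforms`, and `kFramesInFlight` are hypothetical names, and the real backend would call `release()` from a command-buffer completion callback (Metal offers `addCompletedHandler` for this) rather than from test code.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uint32_t kFramesInFlight = 3;

// Hypothetical per-frame parameter block; the real layout is backend-specific.
struct Uniforms {
    float camera_position[3];
    float fov_y_radians;
};

class UniformRing {
public:
    // Render thread only. Returns the next slot to fill, or nullptr when all
    // three slots are still owned by the GPU and the caller has to wait.
    Uniforms* try_acquire() {
        if (in_flight_.load(std::memory_order_acquire) >= kFramesInFlight) {
            return nullptr;
        }
        in_flight_.fetch_add(1, std::memory_order_relaxed);
        return &slots_[next_frame_++ % kFramesInFlight];
    }

    // Called once the GPU has finished a frame; may run on another thread.
    void release() {
        uint32_t before = in_flight_.fetch_sub(1, std::memory_order_release);
        assert(before > 0 && "release() without a matching try_acquire()");
        (void)before;
    }

private:
    std::array<Uniforms, kFramesInFlight> slots_{};
    std::atomic<uint32_t> in_flight_{0};
    uint64_t next_frame_ = 0;  // touched by the render thread only
};
```

Because only the render thread increments the counter, a plain load followed by `fetch_add` is enough here; a concurrent `release()` can only make more slots available.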

5.3 Window interop via SDL3

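SDL_Window* window = SDL_CreateWindow("Singularity", 1280, 720,
                                      SDL_WINDOW_METAL | SDL_WINDOW_RESIZABLE);
SDL_MetalView metal_view = SDL_Metal_CreateView(window);
// SDL_Metal_GetLayer returns void*. It stays opaque here because WindowHandle stores
// void*, and an ARC translation unit would need a __bridge cast to name CAMetalLayer*.
void* layer = SDL_Metal_GetLayer(metal_view);
metalBackend->initialize({window, layer}, config);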

6. The Vulkan backend

6.1 What’s different

Vulkan is more verbose than Metal. The same Metal setup expands to:

Roughly 2-3× the code of the Metal backend, but it follows a well-known recipe (Sascha Willems’s Vulkan samples are the canonical reference).

6.2 HLSL → SPIR-V → Vulkan

Shaders authored in HLSL. Compiled at build time:

dxc -T cs_6_0 -E main -spirv \
    -fspv-target-env=vulkan1.3 \
    -I shared_shader/ \
    geodesic_kernel.hlsl -Fo geodesic_kernel.spv

shared_shader/shader_compat.h provides the DEVICE/INLINE/vec3 aliases that let the same geodesic_math.h compile in both MSL and HLSL.

6.3 Per-frame command buffer

vk::CommandBuffer cmd = currentFrame.commandBuffer;
cmd.begin({});

// Compute pass
cmd.bindPipeline(vk::PipelineBindPoint::eCompute, geodesicPipeline);
cmd.bindDescriptorSets(vk::PipelineBindPoint::eCompute, geodesicLayout,
                       0, currentFrame.descriptorSet, {});
cmd.dispatch((width + 31) / 32, (height + 31) / 32, 1);

// Barrier: compute output → fragment shader read
vk::ImageMemoryBarrier2 barrier{ ... };
cmd.pipelineBarrier2({ ..., barrier });

// Render pass for blit + ImGui (dynamic rendering is core in Vulkan 1.3)
cmd.beginRendering({...});
cmd.bindPipeline(vk::PipelineBindPoint::eGraphics, blitPipeline);
cmd.draw(3, 1, 0, 0);
ImGuiOverlay::render(cmd);
cmd.endRendering();

cmd.end();
graphicsQueue.submit2(...);
graphicsQueue.presentKHR(...);   // present is a queue operation; assumes this queue supports it

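Vulkan-Hpp's owning wrappers (the vk::raii namespace, or the vk::Unique* handle types) destroy their objects automatically and are exception-safe, which removes most of the manual vkDestroy* bookkeeping of raw Vulkan. Plain handles such as the vk::CommandBuffer above are non-owning.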
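The `(width + 31) / 32` expressions in the dispatch above are a rounded-up division by the tile size. A small helper makes the intent explicit; `kTileSize` and `group_count` are hypothetical names, and the value 32 is assumed to match the thread-group size declared in the compute shader.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to match the thread-group size declared in the compute shader.
constexpr uint32_t kTileSize = 32;

// Number of thread groups needed to cover `pixels` along one axis (rounds up).
inline uint32_t group_count(uint32_t pixels, uint32_t tile = kTileSize) {
    assert(tile > 0);
    return (pixels + tile - 1) / tile;
}

// Threads launched beyond the image edge along one axis.
inline uint32_t overhang(uint32_t pixels, uint32_t tile = kTileSize) {
    return group_count(pixels, tile) * tile - pixels;
}
```

At 1280×720 this gives 40×23 groups, and the last row of groups overhangs the image by 16 pixels, so the kernel needs the same kind of bounds check the CUDA kernel performs before it writes to the output.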

7. The CUDA backend (Phase 8 stretch)

7.1 Why CUDA gets its own backend

CUDA is offline only in our use — high-resolution stills and video for the docs site, leveraging the 3090’s compute headroom. It’s not used for the interactive desktop apps because:

So singularity_cuda_cli is a separate binary that takes a JSON scene config and outputs PNG frames or an FFmpeg-encoded MP4.

7.2 CUDA kernel structure

__global__ void geodesic_kernel(
    cudaTextureObject_t skybox,
    cudaTextureObject_t disc_lut,
    Uniforms u,
    uchar4* output,
    int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

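    float4 accum = make_float4(0, 0, 0, 0);
    for (int s = 0; s < u.samples_per_pixel; ++s) {
        float2 jitter = halton_2d(s);
        Ray r = generate_ray(u.camera, x, y, width, height, jitter);
        // CUDA defines no arithmetic operators for float4, so accumulate per component.
        float4 c = integrate_ray(r, u, skybox, disc_lut);
        accum.x += c.x; accum.y += c.y; accum.z += c.z; accum.w += c.w;
    }
    float inv_spp = 1.0f / float(u.samples_per_pixel);
    accum = make_float4(accum.x * inv_spp, accum.y * inv_spp,
                        accum.z * inv_spp, accum.w * inv_spp);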
    output[y * width + x] = tonemap_to_srgb(accum);
}

Same geodesic_math.h shared with Metal and Vulkan. The kernel boilerplate adds antialiasing (Halton sequence supersampling) and removes the real-time constraint.
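The kernel calls `halton_2d`, which is not shown. A common construction, sketched here as host C++ under the assumption that bases 2 and 3 are used, is the radical inverse of the sample index in each base. `Float2` stands in for CUDA's `float2`, and starting at index 1 (so sample 0 is not the corner (0, 0)) is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

struct Float2 { float x, y; };  // stand-in for CUDA's float2

// Radical inverse: mirror the base-`base` digits of `index` about the radix point.
inline float radical_inverse(uint32_t index, uint32_t base) {
    assert(base >= 2);
    const float inv_base = 1.0f / float(base);
    float scale = inv_base;
    float result = 0.0f;
    while (index > 0) {
        result += float(index % base) * scale;
        index /= base;
        scale *= inv_base;
    }
    return result;
}

// 2D Halton point for a sample number, using bases 2 and 3.
// Index 0 would map to (0, 0), so the sequence is started at 1.
inline Float2 halton_2d(uint32_t sample) {
    return {radical_inverse(sample + 1, 2), radical_inverse(sample + 1, 3)};
}
```

The first few points are (1/2, 1/3), (1/4, 2/3), (3/4, 1/9), which already cover the pixel more evenly than the same number of uniform random samples tends to.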

8. Camera & ray generation

8.1 Pinhole model (CPU side, in core/)

struct Ray {
    float3 origin;
    float3 direction;
};

Ray generate_ray(const CameraState& cam, uint32_t x, uint32_t y,
                 uint32_t w, uint32_t h, float2 jitter = {0.5f, 0.5f})
{
    float aspect = float(w) / float(h);
    float fov_scale = tan(cam.fov_y_radians * 0.5f);

    float2 ndc = (float2{float(x) + jitter.x, float(y) + jitter.y}
                  / float2{float(w), float(h)}) * 2.0f - 1.0f;
    ndc.x *= aspect * fov_scale;
    ndc.y *= fov_scale;

    float3 dir_cam = normalize(float3{ndc.x, ndc.y, -1.0f});
    float3 dir_world = mul(cam.basis, dir_cam);
    return Ray{cam.position, dir_world};
}

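With the aliases from shader_compat.h, the same function body is meant to compile in MSL, HLSL, CUDA, and host C++. HLSL has no reference parameters or brace-initialized temporaries, so that build needs CameraState passed by value and float2(...)-style constructors.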
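For unit testing on the CPU, the pinhole model can be written without any shader types. The version below is a sketch: `Vec3`, `mul_row_major`, and the split `jitter_x`/`jitter_y` parameters are substitutions made here so the block stands alone, while the arithmetic follows the function above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

struct CameraState {
    float position[3];
    float basis[9];  // 3x3 orientation matrix, row-major
    float fov_y_radians;
};

struct Ray { Vec3 origin; Vec3 direction; };

inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// Row-major 3x3 matrix times column vector.
inline Vec3 mul_row_major(const float m[9], Vec3 v) {
    return {m[0] * v.x + m[1] * v.y + m[2] * v.z,
            m[3] * v.x + m[4] * v.y + m[5] * v.z,
            m[6] * v.x + m[7] * v.y + m[8] * v.z};
}

inline Ray generate_ray(const CameraState& cam, uint32_t x, uint32_t y,
                        uint32_t w, uint32_t h,
                        float jitter_x = 0.5f, float jitter_y = 0.5f) {
    assert(w > 0 && h > 0);
    float aspect = float(w) / float(h);
    float fov_scale = std::tan(cam.fov_y_radians * 0.5f);

    // Pixel centre (plus jitter) mapped to [-1, 1] on both axes.
    float ndc_x = (float(x) + jitter_x) / float(w) * 2.0f - 1.0f;
    float ndc_y = (float(y) + jitter_y) / float(h) * 2.0f - 1.0f;

    // The camera looks down -z in its own frame.
    Vec3 dir_cam = normalize({ndc_x * aspect * fov_scale, ndc_y * fov_scale, -1.0f});
    Vec3 origin = {cam.position[0], cam.position[1], cam.position[2]};
    return Ray{origin, mul_row_major(cam.basis, dir_cam)};
}
```

With an identity basis, the centre pixel of an odd-sized image maps to the direction (0, 0, -1), which makes a convenient first test case.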

8.2 Conversion to spacetime initial state

Cartesian (position, direction) → Boyer-Lindquist (t, r, θ, φ) with t = 0. Four-velocity components set so that g_μν u^μ u^ν = 0 (solve for u^t).

Reference: JMO §3.1.
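In the Schwarzschild limit (a = 0, signature -+++, c = 1) the null condition reads -f (u^t)^2 + (u^r)^2 / f + r^2 ((u^θ)^2 + sin^2θ (u^φ)^2) = 0 with f = 1 - rs/r, which can be solved for u^t directly. The sketch below does only that step; the function names are hypothetical, and the Kerr case needs the full Boyer-Lindquist metric with its t-φ cross term.

```cpp
#include <cassert>
#include <cmath>

// Solve g_mu_nu u^mu u^nu = 0 for the future-directed u^t (Schwarzschild limit).
inline float solve_ut_null(float r, float theta, float ur, float utheta,
                           float uphi, float rs) {
    assert(r > rs && "initial position must lie outside the horizon");
    float f = 1.0f - rs / r;
    float s = std::sin(theta);
    float spatial = ur * ur / f + r * r * (utheta * utheta + s * s * uphi * uphi);
    return std::sqrt(spatial / f);
}

// Evaluates g_mu_nu u^mu u^nu; it should be ~0 for a photon.
inline float null_norm(float r, float theta, float ut, float ur, float utheta,
                       float uphi, float rs) {
    float f = 1.0f - rs / r;
    float s = std::sin(theta);
    return -f * ut * ut + ur * ur / f
           + r * r * (utheta * utheta + s * s * uphi * uphi);
}
```

Setting rs = 0 recovers flat space, where a purely radial ray with u^r = 1 gets u^t = 1.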

8.3 Orbital camera controls

CPU-side state: azimuth, elevation, distance. SDL3 mouse drag updates azimuth/elevation; scroll updates distance. Camera basis reconstructed each frame from these three scalars and the BH center.
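Rebuilding the camera from the three scalars could look like the sketch below. Several conventions are assumed here rather than taken from the project: the BH centre is the origin, world up is +z (matching the polar axis of θ), the basis columns are the camera's right, up, and back vectors, and elevation is clamped short of the poles. `OrbitPose` and `orbit_pose` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}

struct OrbitPose {
    float position[3];
    float basis[9];  // row-major; columns are camera right, up, back in world space
};

// Camera orbiting the origin, looking at it, with world up along +z.
inline OrbitPose orbit_pose(float azimuth, float elevation, float distance) {
    assert(distance > 0.0f);
    const float kMaxElevation = 1.55f;  // just under pi/2 so `right` stays well defined
    elevation = std::fmax(-kMaxElevation, std::fmin(kMaxElevation, elevation));

    // Unit vector from the BH centre to the camera; the camera looks along -back.
    Vec3 back = {std::cos(elevation) * std::cos(azimuth),
                 std::cos(elevation) * std::sin(azimuth),
                 std::sin(elevation)};
    Vec3 right = normalize(cross({0.0f, 0.0f, 1.0f}, back));
    Vec3 up = cross(back, right);

    OrbitPose p;
    p.position[0] = distance * back.x;
    p.position[1] = distance * back.y;
    p.position[2] = distance * back.z;
    p.basis[0] = right.x; p.basis[1] = up.x; p.basis[2] = back.x;
    p.basis[3] = right.y; p.basis[4] = up.y; p.basis[5] = back.y;
    p.basis[6] = right.z; p.basis[7] = up.z; p.basis[8] = back.z;
    return p;
}
```

With this layout, multiplying the basis by the camera-space direction (0, 0, -1) yields -back, so the centre ray always points at the BH.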

9. Build system — CMake

9.1 Top-level structure

cmake_minimum_required(VERSION 3.27)
project(singularity LANGUAGES CXX)

option(SINGULARITY_BACKEND_METAL "Build Metal backend" ${APPLE})
option(SINGULARITY_BACKEND_VULKAN "Build Vulkan backend" ${WIN32})
option(SINGULARITY_BACKEND_CUDA "Build CUDA backend" OFF)
option(SINGULARITY_BUILD_WEB "Build WASM target via Emscripten" OFF)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

add_subdirectory(third_party/SDL)
add_subdirectory(third_party/imgui)

add_library(singularity_core STATIC
    core/src/camera.cpp
    core/src/settings.cpp
    core/src/scene.cpp
)
target_include_directories(singularity_core PUBLIC core/include shared_shader)

if(SINGULARITY_BACKEND_METAL)
    enable_language(OBJCXX)
    add_subdirectory(render/metal)
endif()

if(SINGULARITY_BACKEND_VULKAN)
    find_package(Vulkan REQUIRED)
    add_subdirectory(render/vulkan)
endif()

if(SINGULARITY_BACKEND_CUDA)
    enable_language(CUDA)
    add_subdirectory(render/cuda)
endif()

add_executable(singularity app/main.cpp app/app_shell.cpp app/settings_ui.cpp)
target_link_libraries(singularity PRIVATE singularity_core SDL3::SDL3 imgui)

if(SINGULARITY_BACKEND_METAL)
    target_link_libraries(singularity PRIVATE singularity_render_metal)
endif()
if(SINGULARITY_BACKEND_VULKAN)
    target_link_libraries(singularity PRIVATE singularity_render_vulkan)
endif()

9.2 Shader compilation

Per-platform custom commands. For Metal:

# render/metal/CMakeLists.txt
set(METAL_SHADERS
    shaders/geodesic_kernel.metal
    shaders/disc_intersection.metal
    shaders/blit.metal
)

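foreach(SHADER ${METAL_SHADERS})
    get_filename_component(NAME ${SHADER} NAME_WE)
    add_custom_command(
        OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.air
        # Custom commands run in the binary dir, so pass an absolute source path.
        COMMAND xcrun -sdk macosx metal -c ${CMAKE_CURRENT_SOURCE_DIR}/${SHADER}
                -o ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.air
                -I ${CMAKE_SOURCE_DIR}/shared_shader
        DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${SHADER}
    )
    list(APPEND AIR_FILES ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.air)
endforeach()

add_custom_command(
    OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/default.metallib
    COMMAND xcrun -sdk macosx metallib ${AIR_FILES}
            -o ${CMAKE_CURRENT_BINARY_DIR}/default.metallib
    DEPENDS ${AIR_FILES}
)

# Nothing builds the outputs above unless a target depends on them.
# The target name here is illustrative.
add_custom_target(singularity_metal_shaders ALL
    DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/default.metallib)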

For Vulkan:

# render/vulkan/CMakeLists.txt
foreach(SHADER ${VULKAN_SHADERS})
    get_filename_component(NAME ${SHADER} NAME_WE)
    add_custom_command(
        OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.spv
        # cs_6_0 is the compute profile; a graphics shader such as blit.hlsl
        # needs vs_6_0 / ps_6_0 invocations of its own.
        COMMAND ${DXC_EXECUTABLE} -T cs_6_0 -E main -spirv
                -fspv-target-env=vulkan1.3
                -I ${CMAKE_SOURCE_DIR}/shared_shader
                ${CMAKE_CURRENT_SOURCE_DIR}/${SHADER}
                -Fo ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.spv
        DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${SHADER}
    )
endforeach()

9.3 Compiler flags

-std=c++20
-Wall -Wextra -Wpedantic
-Wno-c++98-compat
-O3 -ffast-math       # release
-O0 -g -fsanitize=address,undefined  # debug

-ffast-math is a deliberate choice: we are not doing scientific-grade physics with NaN propagation. The verification harness catches any algorithm-level numerical issue; -ffast-math lets the compiler vectorize aggressively.

On Windows / MSVC, the equivalents are /std:c++20 /W4 /O2 /fp:fast.

10. CI/CD — GitHub Actions

10.1 Matrix CI

# .github/workflows/ci.yml
on: [push, pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - clang-format --dry-run --Werror
      - clang-tidy on all .cpp/.hpp
      - black + ruff on verification/

  build-test:
    strategy:
      matrix:
        os: [macos-14, windows-2022]
        include:
          - os: macos-14
            backend: METAL      # upper-case to match the SINGULARITY_BACKEND_* option names
          - os: windows-2022
            backend: VULKAN
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Configure
        run: cmake -B build -DSINGULARITY_BACKEND_${{ matrix.backend }}=ON
      - name: Build
        run: cmake --build build --config Release
      - name: Catch2 tests
        run: ctest --test-dir build --output-on-failure
      - name: Python verification
        run: |
          pip install -r verification/requirements.txt
          pytest verification/

  build-app:
    needs: [lint, build-test]
    strategy:
      matrix:
        os: [macos-14, windows-2022]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Build .app / .exe
        run: cmake -B build && cmake --build build --config Release
      - uses: actions/upload-artifact@v4
        with:
          name: singularity-${{ matrix.os }}
          path: build/Release/

10.2 Release pipeline

release.yml fires on tag push (v*):

10.3 Docs deploy

docs.yml on changes to web/ or docs/:

11. Distribution

11.1 macOS — .dmg

Signed with Developer ID Application certificate, hardened runtime enabled, notarized through Apple’s notary service, stapled. Workflow documented in docs/RELEASE.md.

Entitlements:

<key>com.apple.security.cs.disable-library-validation</key><true/>
<key>com.apple.security.cs.allow-jit</key><false/>
<key>com.apple.security.network.client</key><true/>

11.2 Windows — .msi

Built with WiX Toolset. Without EV cert, the installer triggers a Windows SmartScreen warning (“Microsoft Defender SmartScreen prevented an unrecognized app…”) — users must click “More info” → “Run anyway.” Documented in download instructions.

EV cert is $400/yr from Sectigo or DigiCert. Skipped at v1.0 unless user feedback demands it.

11.3 Update mechanism

v1.0: none. Manual download from GitHub releases page. Sparkle (macOS) and WinSparkle (Windows) considered for v2.0.

12. Testing strategy

Four test levels, each catching a different class of bug:

| Level | Tool | Catches | Runs in CI |
|---|---|---|---|
| Unit | Catch2 | Logic bugs in pure-C++ helpers (camera math, settings parsing, integrator state arithmetic) | ✓ both platforms |
| Property / verification | Python + SymPy + SciPy | Physics bugs (wrong Christoffel sign, drifting E, wrong photon sphere radius) | ✓ both platforms |
| Visual regression | Golden images via perceptual hash | Wrong tone mapping, color space, accretion disc orientation | ✓ both platforms (per-backend goldens) |
| Backend equivalence | Cross-backend perceptual hash | Drift between Metal and Vulkan implementations | ✓ when both backends compiled |

12.1 Why a Python harness

The C++ kernel is fast but opaque. Re-deriving the Christoffel symbols in SymPy and asserting they match the hand-coded versions catches algebra errors that no amount of unit testing in C++ would find — because the C++ is the suspect. The harness uses SciPy’s solve_ivp (adaptive Dormand-Prince) as an independent reference integrator; if the C++ RK4 and SciPy DOPRI5 disagree by more than tolerance, one of them is wrong.

12.2 Backend-equivalence test

Same scene, render on Metal, render on Vulkan, compare perceptual hashes. Tolerance set to allow benign GPU floating-point variation (~4 hash bits) but catch real drift (>10).

This test is the load-bearing piece of the cross-platform claim: it asserts the abstraction worked.
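The comparison rule can be made concrete with a Hamming distance between two hashes. The project's harness is Python; the sketch below is in C++ only to stay in the codebase's main language. It assumes a 64-bit perceptual hash, and the `Verdict` enum, the function names, and the treatment of the 5-10 bit band as "inspect manually" are all illustrative rather than taken from the test.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

enum class Verdict { Equivalent, Inspect, Drift };

// Number of differing bits between two 64-bit perceptual hashes.
inline int hamming_distance(uint64_t a, uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}

// Thresholds follow the text: about 4 bits is benign, more than 10 is real drift.
// The band in between is reported separately so a person can look at the images.
inline Verdict compare_hashes(uint64_t metal_hash, uint64_t vulkan_hash,
                              int benign_bits = 4, int drift_bits = 10) {
    assert(benign_bits < drift_bits);
    int distance = hamming_distance(metal_hash, vulkan_hash);
    if (distance <= benign_bits) return Verdict::Equivalent;
    if (distance > drift_bits) return Verdict::Drift;
    return Verdict::Inspect;
}
```

Keeping the two thresholds as parameters makes it easy to tighten them once real per-backend variation has been measured.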

12.3 Golden images

Ten 256×256 PNGs of canonical scenes per backend. Stored in verification/golden/{metal,vulkan}/. Regenerated only when physics changes deliberately:

  1. Run python verification/regenerate_golden.py --backend metal --backend vulkan
  2. Manually inspect every regenerated image
  3. Commit with a message starting [golden] to make the change auditable

13. Performance budget

13.1 Mac (M2 base) at 1280×720, 60 FPS = 16.7ms total

| Pass | Budget | Notes |
|---|---|---|
| Compute (geodesic_kernel) | 12ms | The whole game |
| Render (blit + tone map) | 1ms | Trivial |
| ImGui overlay | 0.5ms | Negligible |
| Drawable present | 0.5ms | OS overhead |

Default Mac resolution is 1280×720 at 30 FPS (33ms budget) — gives 26ms for compute, generous. 60 FPS achieved on M3 Pro+.

13.2 Windows (RTX 3070) at 1920×1080, 60 FPS = 16.7ms

NVIDIA’s compute throughput gives us roughly 5× the Mac’s headroom at the same resolution. Default Win resolution is 1920×1080 at 60 FPS, with 4K offered as an option.

13.3 CUDA offline — no real-time constraint

256 samples per pixel at 8K (8192×8192, ~67M pixels) is ~17 billion rays per frame, at ~5000 RK4 steps each. On a 3090, the estimate is roughly 20-60 seconds per frame for Schwarzschild and 60-180 seconds per frame for Kerr. Acceptable for offline.

14. Open technical decisions

These need answers before the relevant phase begins:

  1. Tone-mapping curve. ACES Filmic vs Reinhard. Phase 7 decision; default to Reinhard until then.
  2. Skybox texture format. Equirectangular (one PNG, easier authoring) vs cubemap (six PNGs, faster sampling). Phase 2 decision.
  3. Half-precision integration? half storage in MSL/HLSL would halve register pressure but risks precision loss near the horizon. Test in Phase 6.
  4. WASM scope. Phase 1’s 2D model is the obvious target. The 3D Schwarzschild kernel could also run in the browser via WebGPU compute — decide at Phase 7.
  5. VMA on Vulkan or roll our own allocator? Use VMA. The “I wrote my own GPU allocator” bullet is not worth the time cost.
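For item 1, the interim Reinhard default is small enough to sketch in full. The version below applies the basic per-channel operator x / (1 + x) followed by the standard sRGB encoding; whether the project tone-maps per channel or on luminance is not stated, so per-channel is an assumption, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Basic Reinhard operator: maps [0, inf) into [0, 1).
inline float reinhard(float hdr) {
    assert(hdr >= 0.0f);
    return hdr / (1.0f + hdr);
}

// Linear to sRGB transfer function (piecewise: linear toe, then a 1/2.4 power).
inline float linear_to_srgb(float c) {
    return c <= 0.0031308f ? 12.92f * c
                           : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}

// One HDR channel value to an 8-bit display value.
inline uint8_t tonemap_channel(float hdr) {
    float ldr = linear_to_srgb(reinhard(hdr));
    ldr = std::fmin(std::fmax(ldr, 0.0f), 1.0f);
    return static_cast<uint8_t>(std::lround(ldr * 255.0f));
}
```

Swapping in ACES later would only replace `reinhard`; the sRGB encoding and the 8-bit quantization stay the same.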

15. References