Architecture & Methods — Singularity
How the simulator is built. Code organization, the backend abstraction layer, the Metal and Vulkan compute pipelines, the CUDA stretch backend, shader sharing across MSL/HLSL/CUDA, the build system, CI/CD across two platforms, distribution, and testing strategy. The physics of what each kernel computes lives in PHYSICS.md; this doc covers the engineering of how it computes it across three GPU APIs and one C++ codebase.
1. High-level architecture
┌──────────────────────────────────────┐
│     singularity (cross-platform)     │
│                                      │
│  ┌────────────────────────────────┐  │
│  │ App shell (SDL3 + Dear ImGui)  │  │
│  │ - window, input, settings UI   │  │
│  └────────────────┬───────────────┘  │
│                   │                  │
│ ┌─────────────────┴────────────────┐ │
│ │ Renderer (selects backend)       │ │
│ │  ┌────────────────────────────┐  │ │
│ │  │ abstract RenderBackend     │  │ │
│ │  └──────┬─────────────┬───────┘  │ │
│ │    ┌────┴────┐  ┌─────┴─────┐    │ │
│ │    │ Metal   │  │ Vulkan    │    │ │
│ │    │ Backend │  │ Backend   │    │ │
│ │    └─────────┘  └───────────┘    │ │
│ └─────────────────┬────────────────┘ │
│                   │                  │
│ ┌─────────────────┴────────────────┐ │
│ │ shared C++ physics core          │ │
│ │ - State, Camera, Settings        │ │
│ │ - geodesic_rhs() (in headers)    │ │
│ │ - Christoffel symbols            │ │
│ └──────────────────────────────────┘ │
└──────────────────────────────────────┘
┌─────────────────┐                 ┌──────────────────────┐
│ singularity_cli │                 │ singularity_cuda_cli │
│ (headless,      │                 │ (NVIDIA-only,        │
│  drives backend │                 │  offline 4K/8K       │
│  for Python     │                 │  renderer using      │
│  verification)  │                 │  CUDA backend)       │
└─────────────────┘                 └──────────────────────┘
         ▲                                     ▲
         │                                     │
┌────────┴─────────────────────────────────────┴───────────┐
│ verification/ (Python)                                   │
│ - SymPy: symbolic Christoffel derivation                 │
│ - SciPy: independent reference integrator                │
│ - pytest: drives all tests, including                    │
│   golden-image diffs and backend-equivalence             │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ web/ (Next.js + TypeScript + WebGPU)                     │
│ - imports physics core compiled to WASM via Emscripten   │
│ - WGSL kernels dispatched by TS host                     │
└──────────────────────────────────────────────────────────┘
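Key property of this layout: the physics math is written once, in shared headers (core/include/physics/, plus the shader-callable headers under shared_shader/ described in section 4). The native backends (Metal, Vulkan, CUDA) include those headers from their own kernel boilerplate, and the web demo reaches the core through the WASM build. Its WGSL kernels cannot include C++ headers, so any math duplicated there needs its own check. A physics bug fixed in the shared headers is fixed in every backend that includes them. The backend-equivalence test (verification/test_backend_equivalence.py) catches any drift if it appears.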
2. Repository layout
singularity/
├── README.md
├── docs/
│ ├── PRD.md
│ ├── PHYSICS.md
│ ├── ARCHITECTURE.md
│ └── TODO.md
├── core/ # platform-agnostic C++ core
│ ├── include/
│ │ ├── physics/ # SHARED across all backends
│ │ │ ├── state.hpp # State vector
│ │ │ ├── schwarzschild.hpp
│ │ │ ├── kerr.hpp
│ │ │ └── integrator.hpp
│ │ ├── camera.hpp
│ │ ├── settings.hpp
│ │ └── scene.hpp
│ └── src/
│ └── (cpu-side helpers, settings JSON, etc.)
├── render/
│ ├── include/
│ │ └── render_backend.hpp # The abstraction
│ ├── metal/ # Metal implementation
│ │ ├── metal_backend.hpp
│ │ ├── metal_backend.mm # Objective-C++ for ARC interop
│ │ └── shaders/
│ │ ├── geodesic_kernel.metal
│ │ ├── disc_intersection.metal
│ │ └── blit.metal
│ ├── vulkan/ # Vulkan implementation
│ │ ├── vulkan_backend.hpp
│ │ ├── vulkan_backend.cpp
│ │ └── shaders/
│ │ ├── geodesic_kernel.hlsl # → DXC → SPIR-V
│ │ ├── disc_intersection.hlsl
│ │ └── blit.hlsl
│ └── cuda/ # CUDA stretch implementation
│ ├── cuda_backend.cu
│ └── kernels/
│ └── geodesic_kernel.cu
├── shared_shader/ # SHARED math used by every backend's shaders
│ ├── shader_compat.h # platform macros (DEVICE, INLINE, etc.)
│ ├── geodesic_math.h # the actual physics, callable from MSL/HLSL/CUDA
│ └── color_math.h
├── app/ # cross-platform app shell
│ ├── main.cpp
│ ├── app_shell.cpp # SDL3 window, ImGui setup
│ └── settings_ui.cpp
├── cli/ # headless binary for verification
│ └── main.cpp
├── cuda_cli/ # offline renderer (stretch)
│ └── main.cpp
├── tests/ # C++ unit tests (Catch2)
│ ├── test_camera.cpp
│ ├── test_integrator.cpp
│ ├── test_schwarzschild.cpp
│ └── test_kerr.cpp
├── verification/ # Python physics verification
│ ├── christoffel_sympy.py
│ ├── test_photon_sphere.py
│ ├── test_deflection.py
│ ├── test_isco.py
│ ├── test_redshift.py
│ ├── test_golden_images.py
│ ├── test_backend_equivalence.py
│ ├── golden/ # Reference PNGs (per backend)
│ │ ├── metal/
│ │ └── vulkan/
│ └── conftest.py
├── web/ # Next.js docs site + WebGPU demo
│ ├── app/
│ ├── content/ # MDX, mirrors PHYSICS/ARCHITECTURE
│ ├── components/Demo/
│ └── public/wasm/ # Built-from-core WASM artifacts
├── third_party/
│ ├── metal-cpp/ # Apple
│ ├── Vulkan-Hpp/ # Khronos
│ ├── SDL/ # SDL3
│ ├── imgui/ # Dear ImGui
│ ├── catch2/
│ └── stb/
├── CMakeLists.txt # cross-platform, conditional backend selection
├── BUILDING.md # platform-specific build instructions
├── .clang-format
├── .clang-tidy
└── .github/workflows/
    ├── ci.yml # matrix: macos-14 + windows-2022
    ├── release.yml # tag → signed .dmg + .msi
    └── docs.yml # web/ → Vercel
3. The backend abstraction
The interface is small on purpose. Over-abstracting is the failure mode.
// render/include/render_backend.hpp
#pragma once
#include <cstdint>
#include <memory>
#include <vector>

namespace singularity {

struct WindowHandle {
    void* native_window;  // NSWindow* on Mac, HWND on Windows
    void* native_view;    // CAMetalLayer*, NSView* on Mac; ignored on Win
};

struct RenderConfig {
    uint32_t width;
    uint32_t height;
    bool vsync_enabled;
};

struct Scene {
    enum class MetricType { Schwarzschild, Kerr };
    MetricType metric;
    float mass_solar;
    float spin_a_over_M;  // 0 for Schwarzschild
    float disc_inner_M;
    float disc_outer_M;
    bool disc_doppler_on;
    bool disc_redshift_on;
    bool disc_texture_on;
    bool show_overlay;
};

struct CameraState {
    float position[3];
    float basis[9];  // 3x3 orientation matrix, row-major
    float fov_y_radians;
};

struct ImageData {
    std::vector<uint8_t> pixels_rgba;
    uint32_t width;
    uint32_t height;
};

class RenderBackend {
public:
    virtual ~RenderBackend() = default;

    // Lifecycle
    virtual bool initialize(WindowHandle window, RenderConfig config) = 0;
    virtual void shutdown() = 0;
    virtual void resize(uint32_t width, uint32_t height) = 0;

    // Per-frame
    virtual void render_frame(const Scene& scene, const CameraState& camera) = 0;

    // Readback for verification + screenshot export.
    // Pure virtual, so every backend has to implement it.
    virtual ImageData capture_frame() = 0;

    // Backend identification
    virtual const char* name() const = 0;  // "Metal", "Vulkan", "CUDA"
};

// Factory selects based on platform + build config
std::unique_ptr<RenderBackend> create_default_backend();

}  // namespace singularity
What this interface deliberately omits:
- No create_texture/create_buffer/create_pipeline: those live inside each backend’s implementation. Exposing them would force the abstraction to know about API-specific concepts (descriptor sets, MTLBuffer types, etc.) and the abstraction would leak.
- No “draw a triangle” or “bind shader”: too low-level. The abstraction is at the level of “render a black hole frame,” not “submit a draw call.”
- No Vulkan-style explicit synchronization: each backend handles its own internal command buffer/queue lifecycle.
What’s inside each backend:
- Each backend creates its own device, command queue/list, pipeline state objects, ImGui integration, swapchain.
- The “scene” is small enough (<100 bytes) to upload as a uniform every frame.
- Texture loading (skybox, accretion-disc LUT) happens at initialize() time using a backend-specific path, but the file loaded is the same PNG.
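create_default_backend() is the one place that knows which backends a build contains. A minimal sketch of the selection logic follows, assuming it keys off the platform the same way the CMake option defaults in section 9 do (Metal on Apple, Vulkan on Windows). The names default_backend_name and is_known_backend are hypothetical, and the real factory presumably returns a std::unique_ptr<RenderBackend> and may key off build-config defines rather than raw platform macros.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: reports which backend a default build would pick.
// __APPLE__ and _WIN32 are the compilers' own predefined platform macros.
inline const char* default_backend_name() {
#if defined(__APPLE__)
    return "Metal";
#elif defined(_WIN32)
    return "Vulkan";
#else
    return "none";  // no interactive backend is defined for this platform
#endif
}

// True when the name is one of the values this sketch can return.
inline bool is_known_backend(const char* name) {
    assert(name != nullptr);
    return std::strcmp(name, "Metal") == 0 ||
           std::strcmp(name, "Vulkan") == 0 ||
           std::strcmp(name, "none") == 0;
}
```

Keeping the choice in one function means the app shell and the CLI never mention a concrete backend type; they only see the interface.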
4. Sharing physics math across MSL, HLSL, and CUDA
4.1 The compatibility header
// shared_shader/shader_compat.h
//
// Defines DEVICE, INLINE, and other macros so the same math headers compile
// in MSL, HLSL, CUDA, and even host C++.
#if defined(__METAL_VERSION__)
// Metal Shading Language (a C++14 dialect)
#define DEVICE
#define INLINE inline
#define CONSTANT constant
#include <metal_stdlib>
using namespace metal;
typedef float3 vec3;
typedef float4 vec4;
#elif defined(__HLSL_VERSION) || defined(_HLSL)
// HLSL (used for Vulkan via DXC → SPIR-V)
#define DEVICE
#define INLINE inline
#define CONSTANT static const
typedef float3 vec3;
typedef float4 vec4;
#elif defined(__CUDACC__)
// CUDA C++
#define DEVICE __device__
#define INLINE __forceinline__
#define CONSTANT __constant__
#include <cuda_runtime.h>
typedef float3 vec3;
typedef float4 vec4;
#else
// Host C++ (so we can unit-test the math on CPU)
#define DEVICE
#define INLINE inline
#define CONSTANT constexpr
#include "core/include/physics/vec_types.hpp" // typedefs vec3, vec4
#endif
4.2 The shared math header
// shared_shader/geodesic_math.h
#include "shader_compat.h"
struct State {
float t, r, theta, phi;
float ut, ur, utheta, uphi;
};
DEVICE INLINE State geodesic_rhs_schwarzschild(State s, float rs) {
float f = 1.0f - rs / s.r;
float r2 = s.r * s.r;
float sin_t = sin(s.theta);
float cos_t = cos(s.theta);
State d;
d.t = s.ut;
d.r = s.ur;
d.theta = s.utheta;
d.phi = s.uphi;
// Acceleration terms — Christoffel symbols per PHYSICS.md §3
d.ut = -(rs / (r2 * f)) * s.ut * s.ur;
d.ur = -(rs * f / (2.0f * r2)) * s.ut * s.ut
+ (rs / (2.0f * r2 * f)) * s.ur * s.ur
+ s.r * f * (s.utheta * s.utheta + sin_t * sin_t * s.uphi * s.uphi);
d.utheta = -2.0f * s.ur * s.utheta / s.r
+ sin_t * cos_t * s.uphi * s.uphi;
d.uphi = -2.0f * s.ur * s.uphi / s.r
- 2.0f * (cos_t / sin_t) * s.utheta * s.uphi;
return d;
}
DEVICE INLINE State rk4_step(State y, float h, float rs) {
State k1 = geodesic_rhs_schwarzschild(y, rs);
State k2 = geodesic_rhs_schwarzschild(state_add(y, state_scale(k1, 0.5f * h)), rs);
State k3 = geodesic_rhs_schwarzschild(state_add(y, state_scale(k2, 0.5f * h)), rs);
State k4 = geodesic_rhs_schwarzschild(state_add(y, state_scale(k3, h)), rs);
State sum = state_add(state_add(k1, state_scale(k2, 2.0f)),
state_add(state_scale(k3, 2.0f), k4));
return state_add(y, state_scale(sum, h / 6.0f));
}
This one file is included from:
- render/metal/shaders/geodesic_kernel.metal
- render/vulkan/shaders/geodesic_kernel.hlsl
- render/cuda/kernels/geodesic_kernel.cu
- tests/test_schwarzschild.cpp (host CPU build for unit-testing the math directly)
A bug fixed once is fixed everywhere.
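rk4_step above calls state_add and state_scale, which are not shown. The sketch below is a host-only build of the same math with those two helpers filled in as plain component-wise operations (an assumption about their bodies; only their names come from the listing). It also adds two hypothetical helpers, null_norm and max_null_drift, which track how far g_μν u^μ u^ν drifts from zero along a photon path. That is a cheap check that the right-hand side and the integrator agree with each other.

```cpp
#include <cassert>
#include <cmath>

struct State {
    float t, r, theta, phi;
    float ut, ur, utheta, uphi;
};

// Assumed helper: component-wise sum of two states.
inline State state_add(State a, State b) {
    return {a.t + b.t, a.r + b.r, a.theta + b.theta, a.phi + b.phi,
            a.ut + b.ut, a.ur + b.ur, a.utheta + b.utheta, a.uphi + b.uphi};
}

// Assumed helper: every component multiplied by k.
inline State state_scale(State s, float k) {
    return {s.t * k, s.r * k, s.theta * k, s.phi * k,
            s.ut * k, s.ur * k, s.utheta * k, s.uphi * k};
}

// Host copy of the right-hand side from geodesic_math.h.
inline State geodesic_rhs_schwarzschild(State s, float rs) {
    float f = 1.0f - rs / s.r;
    float r2 = s.r * s.r;
    float sin_t = std::sin(s.theta);
    float cos_t = std::cos(s.theta);
    State d;
    d.t = s.ut;  d.r = s.ur;  d.theta = s.utheta;  d.phi = s.uphi;
    d.ut = -(rs / (r2 * f)) * s.ut * s.ur;
    d.ur = -(rs * f / (2.0f * r2)) * s.ut * s.ut
         + (rs / (2.0f * r2 * f)) * s.ur * s.ur
         + s.r * f * (s.utheta * s.utheta + sin_t * sin_t * s.uphi * s.uphi);
    d.utheta = -2.0f * s.ur * s.utheta / s.r + sin_t * cos_t * s.uphi * s.uphi;
    d.uphi = -2.0f * s.ur * s.uphi / s.r
           - 2.0f * (cos_t / sin_t) * s.utheta * s.uphi;
    return d;
}

inline State rk4_step(State y, float h, float rs) {
    State k1 = geodesic_rhs_schwarzschild(y, rs);
    State k2 = geodesic_rhs_schwarzschild(state_add(y, state_scale(k1, 0.5f * h)), rs);
    State k3 = geodesic_rhs_schwarzschild(state_add(y, state_scale(k2, 0.5f * h)), rs);
    State k4 = geodesic_rhs_schwarzschild(state_add(y, state_scale(k3, h)), rs);
    State sum = state_add(state_add(k1, state_scale(k2, 2.0f)),
                          state_add(state_scale(k3, 2.0f), k4));
    return state_add(y, state_scale(sum, h / 6.0f));
}

// g_{mu nu} u^mu u^nu for Schwarzschild; exactly 0 on a null geodesic.
inline float null_norm(State s, float rs) {
    float f = 1.0f - rs / s.r;
    float sin_t = std::sin(s.theta);
    return -f * s.ut * s.ut + s.ur * s.ur / f
         + s.r * s.r * (s.utheta * s.utheta + sin_t * sin_t * s.uphi * s.uphi);
}

// Equatorial, inward-moving photon at r = 10 rs; u^t set from the null condition.
inline State example_photon(float rs) {
    State s{0.0f, 10.0f * rs, 1.5707964f, 0.0f, 0.0f, -0.5f, 0.0f, 0.05f / rs};
    float f = 1.0f - rs / s.r;
    float sin_t = std::sin(s.theta);
    s.ut = std::sqrt((s.ur * s.ur / f + s.r * s.r * sin_t * sin_t * s.uphi * s.uphi) / f);
    return s;
}

// Largest |null_norm| seen over `steps` RK4 steps of size h.
inline float max_null_drift(State s, float rs, float h, int steps) {
    float worst = std::fabs(null_norm(s, rs));
    for (int i = 0; i < steps; ++i) {
        s = rk4_step(s, h, rs);
        worst = std::fmax(worst, std::fabs(null_norm(s, rs)));
    }
    return worst;
}
```

A Catch2 test could assert that max_null_drift(example_photon(1.0f), 1.0f, 0.05f, 100) stays small; a sign error in any acceleration term usually shows up as rapid growth in this quantity.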
5. The Metal backend
5.1 Pipeline state objects
One MTLDevice, one MTLCommandQueue, two pipeline states:
- geodesicPipeline: MTLComputePipelineState for geodesic_kernel. Binds: skybox texture (read), accretion-disc LUT, scene-uniforms buffer, output color texture (write).
- blitPipeline: MTLRenderPipelineState for blit_vertex + blit_fragment. Tone-maps the compute output to the drawable. Single fullscreen triangle.
Both PSOs are created at initialize() and reused every frame.
5.2 Per-frame command buffer
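auto cmdBuf = commandQueue->commandBuffer();

// Compute pass: one geodesic per pixel, written into outputTexture.
auto computeEnc = cmdBuf->computeCommandEncoder();
computeEnc->setComputePipelineState(geodesicPipeline);
computeEnc->setTexture(skyboxTexture, 0);
computeEnc->setTexture(outputTexture, 1);
// The accretion-disc LUT listed in 5.1 is bound the same way; omitted here.
computeEnc->setBuffer(uniformsBuffer[currentFrame % 3], 0, 0); // triple-buffered
// 32x32 = 1024 threads; must not exceed geodesicPipeline->maxTotalThreadsPerThreadgroup().
computeEnc->dispatchThreads(MTL::Size(width, height, 1),
                            MTL::Size(32, 32, 1)); // M-series sweet spot
computeEnc->endEncoding();

// Render pass: blit + tone-map the compute output, then ImGui on top.
// `view` stands for whatever owns the drawable (an MTK::View-style object here;
// with the SDL3 layer from 5.3 the drawable would come from the CAMetalLayer instead).
auto rpd = view->currentRenderPassDescriptor();
auto renderEnc = cmdBuf->renderCommandEncoder(rpd);
renderEnc->setRenderPipelineState(blitPipeline);
renderEnc->setFragmentTexture(outputTexture, 0);
// Explicit NS::UInteger keeps the literal 0 from matching the indirect-buffer overload.
renderEnc->drawPrimitives(MTL::PrimitiveTypeTriangle, NS::UInteger(0), NS::UInteger(3));
ImGuiOverlay::render(renderEnc); // ImGui draws into the same render pass
renderEnc->endEncoding();

cmdBuf->presentDrawable(view->currentDrawable());
cmdBuf->commit();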
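Triple-buffered uniforms prevent CPU/GPU contention: the CPU writes the buffer for the next frame while the GPU may still be reading the buffers of the previous two. This only holds if the CPU is kept from running more than three frames ahead of the GPU.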
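The indexing `currentFrame % 3` only protects a buffer if the CPU is stopped once three frames are in flight. A minimal sketch of that bookkeeping follows. The class name UniformRing and its methods are hypothetical, and it is single-threaded for clarity: in a real Metal backend on_frame_completed would run from the command buffer's completion handler on another thread, so the counter would need a semaphore or atomic.

```cpp
#include <cassert>
#include <cstdint>

// Tracks which of kFramesInFlight uniform slots the CPU may write next.
class UniformRing {
public:
    static constexpr uint32_t kFramesInFlight = 3;

    // Returns true and sets `slot` if a buffer is free. Returns false when the
    // CPU is already three frames ahead and must wait for a GPU completion.
    bool try_begin_frame(uint32_t& slot) {
        if (in_flight_ == kFramesInFlight) return false;
        slot = next_frame_ % kFramesInFlight;
        ++next_frame_;
        ++in_flight_;
        return true;
    }

    // Called once per finished frame; frames on one queue complete in order,
    // so the slot being freed is always the oldest one.
    void on_frame_completed() {
        assert(in_flight_ > 0);
        --in_flight_;
    }

private:
    uint32_t next_frame_ = 0;
    uint32_t in_flight_ = 0;
};
```

With three slots the fourth try_begin_frame fails until one completion arrives, after which slot 0 is handed out again.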
5.3 Window interop via SDL3
SDL_Window* window = SDL_CreateWindow("Singularity", 1280, 720,
                                      SDL_WINDOW_METAL | SDL_WINDOW_RESIZABLE);
SDL_MetalView metal_view = SDL_Metal_CreateView(window);
// SDL hands back the CAMetalLayer as void*. Keep it opaque here and let the
// Metal backend cast it back (a __bridge cast when compiled with ARC).
void* layer = SDL_Metal_GetLayer(metal_view);
metalBackend->initialize({window, layer}, config);
6. The Vulkan backend
6.1 What’s different
Vulkan is more verbose than Metal. The same Metal setup expands to:
- Instance + physical device selection + queue family selection
- Logical device + compute queue + graphics queue (often the same)
- Swapchain creation with explicit VkSurfaceKHR from SDL3
- Descriptor set layouts + descriptor pools + descriptor sets
- Pipeline layout + compute pipeline + graphics pipeline
- Per-frame: command pool + command buffer + fences + semaphores
- Explicit memory allocation (use VMA — Vulkan Memory Allocator — to make this sane)
Roughly 2-3× the code of the Metal backend, but it follows a well-known recipe (Sascha Willems’s Vulkan samples are the canonical reference).
6.2 HLSL → SPIR-V → Vulkan
Shaders authored in HLSL. Compiled at build time:
dxc -T cs_6_0 -E main -spirv \
-fspv-target-env=vulkan1.3 \
-I shared_shader/ \
geodesic_kernel.hlsl -Fo geodesic_kernel.spv
shared_shader/shader_compat.h provides the DEVICE/INLINE/vec3 aliases that let the same geodesic_math.h compile in both MSL and HLSL.
6.3 Per-frame command buffer
vk::CommandBuffer cmd = currentFrame.commandBuffer;
cmd.begin({});

// Compute pass
cmd.bindPipeline(vk::PipelineBindPoint::eCompute, geodesicPipeline);
cmd.bindDescriptorSets(vk::PipelineBindPoint::eCompute, geodesicLayout,
                       0, currentFrame.descriptorSet, {});
// Rounds up to whole 32x32 workgroups; the shader must bounds-check edge threads.
cmd.dispatch((width + 31) / 32, (height + 31) / 32, 1);

// Barrier: compute output → fragment shader read
vk::ImageMemoryBarrier2 barrier{ ... };
cmd.pipelineBarrier2({ ..., barrier });

// Render pass for blit + ImGui (dynamic rendering, core in Vulkan 1.3,
// so the non-KHR names match pipelineBarrier2 and submit2)
cmd.beginRendering({...});
cmd.bindPipeline(vk::PipelineBindPoint::eGraphics, blitPipeline);
cmd.draw(3, 1, 0, 0);
ImGuiOverlay::render(cmd);
cmd.endRendering();
cmd.end();

graphicsQueue.submit2(...);
// Presentation is a queue operation in Vulkan, not a swapchain method.
graphicsQueue.presentKHR(...);
Vulkan-Hpp's RAII wrappers (the vk::raii namespace, or the vk::Unique* handles) give exception-safe ownership, removing most of the manual vkDestroy* bookkeeping of raw Vulkan.
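The dispatch size in 6.3, `(width + 31) / 32`, is a ceiling division: it rounds the image up to whole workgroups. A small sketch of that arithmetic follows, assuming the compute shader declares a 32×32 workgroup (implied by the listing but not stated). The name group_count is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of workgroups needed to cover `pixels` along one axis.
// `tile` must match the workgroup size declared in the shader.
inline uint32_t group_count(uint32_t pixels, uint32_t tile = 32) {
    assert(tile > 0);
    return (pixels + tile - 1) / tile;
}

// Threads launched past the image edge along one axis. These are the ones
// the shader's bounds check has to turn away.
inline uint32_t overshoot(uint32_t pixels, uint32_t tile = 32) {
    return group_count(pixels, tile) * tile - pixels;
}
```

At 1280×720 this gives 40×23 groups, and the last row of groups overshoots the image by 16 rows of threads, which is why the kernel needs an early return for out-of-range coordinates.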
7. The CUDA backend (Phase 8 stretch)
7.1 Why CUDA gets its own backend
CUDA is offline only in our use — high-resolution stills and video for the docs site, leveraging the 3090’s compute headroom. It’s not used for the interactive desktop apps because:
- CUDA-graphics interop is messy (CUDA ↔ DirectX or CUDA ↔ Vulkan), more complexity than the project needs.
- The whole value of a CUDA backend is unbounded compute time per frame — at 256 samples per pixel and 8K resolution, even a 3090 takes seconds per frame.
So singularity_cuda_cli is a separate binary that takes a JSON scene config and outputs PNG frames or an FFmpeg-encoded MP4.
7.2 CUDA kernel structure
__global__ void geodesic_kernel(
    cudaTextureObject_t skybox,
    cudaTextureObject_t disc_lut,
    Uniforms u,
    uchar4* output,
    int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // grid is rounded up past the image edge

    // float4 += and /= are not built into CUDA; this assumes operator overloads
    // such as those in the CUDA samples' helper_math.h.
    float4 accum = make_float4(0, 0, 0, 0);
    for (int s = 0; s < u.samples_per_pixel; ++s) {
        float2 jitter = halton_2d(s);  // sub-pixel offset for sample s
        Ray r = generate_ray(u.camera, x, y, width, height, jitter);
        accum += integrate_ray(r, u, skybox, disc_lut);
    }
    accum /= float(u.samples_per_pixel);
    output[y * width + x] = tonemap_to_srgb(accum);
}
Same geodesic_math.h shared with Metal and Vulkan. The kernel boilerplate adds antialiasing (Halton sequence supersampling) and removes the real-time constraint.
8. Camera & ray generation
8.1 Pinhole model (CPU side, in core/)
struct Ray {
float3 origin;
float3 direction;
};
Ray generate_ray(const CameraState& cam, uint32_t x, uint32_t y,
uint32_t w, uint32_t h, float2 jitter = {0.5f, 0.5f})
{
float aspect = float(w) / float(h);
float fov_scale = tan(cam.fov_y_radians * 0.5f);
float2 ndc = (float2{float(x) + jitter.x, float(y) + jitter.y}
/ float2{float(w), float(h)}) * 2.0f - 1.0f;
ndc.x *= aspect * fov_scale;
ndc.y *= fov_scale;
float3 dir_cam = normalize(float3{ndc.x, ndc.y, -1.0f});
float3 dir_world = mul(cam.basis, dir_cam);
return Ray{cam.position, dir_world};
}
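The same logic is meant to be shared across MSL, HLSL, CUDA, and host C++. As written it is C++: it takes the camera by reference, which HLSL does not support, and it treats cam.position and cam.basis as vector and matrix types while CameraState stores them as float arrays. The shader builds therefore need the shader_compat.h treatment from section 4 before this compiles everywhere.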
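For the host side, the pinhole model can be written against plain structs so it is unit-testable without any GPU types. This is a sketch under assumptions: Vec3, CameraSketch, RaySketch, and generate_ray_host are hypothetical stand-ins, and the basis is taken to be a row-major camera-to-world matrix as the CameraState comment suggests. It keeps the listing's convention that y grows with the pixel row, so whether the image needs a vertical flip depends on how the basis is built.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

struct CameraSketch {
    Vec3 position;
    float basis[9];       // row-major 3x3, camera-to-world (assumed)
    float fov_y_radians;
};

struct RaySketch { Vec3 origin; Vec3 direction; };

inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}

// Row-major matrix times column vector.
inline Vec3 mul_row_major(const float m[9], Vec3 v) {
    return {m[0] * v.x + m[1] * v.y + m[2] * v.z,
            m[3] * v.x + m[4] * v.y + m[5] * v.z,
            m[6] * v.x + m[7] * v.y + m[8] * v.z};
}

// (jx, jy) in [0, 1] picks the sample position inside the pixel; 0.5 is the center.
inline RaySketch generate_ray_host(const CameraSketch& cam, uint32_t x, uint32_t y,
                                   uint32_t w, uint32_t h,
                                   float jx = 0.5f, float jy = 0.5f) {
    assert(w > 0 && h > 0);
    float aspect = float(w) / float(h);
    float fov_scale = std::tan(cam.fov_y_radians * 0.5f);
    // Map the pixel to [-1, 1] on both axes, then scale by the field of view.
    float ndc_x = ((float(x) + jx) / float(w)) * 2.0f - 1.0f;
    float ndc_y = ((float(y) + jy) / float(h)) * 2.0f - 1.0f;
    Vec3 dir_cam = normalize(Vec3{ndc_x * aspect * fov_scale, ndc_y * fov_scale, -1.0f});
    return {cam.position, mul_row_major(cam.basis, dir_cam)};
}
```

With an identity basis, the center of the image maps to the camera's -z axis, which is a convenient first assertion for tests/test_camera.cpp.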
8.2 Conversion to spacetime initial state
Cartesian (position, direction) → Boyer-Lindquist (t, r, θ, φ) with t = 0. Four-velocity components set so that g_μν u^μ u^ν = 0 (solve for u^t).
Reference: JMO §3.1.
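For the Schwarzschild case the null condition can be solved for u^t in closed form. With f = 1 − rs/r, the condition −f (u^t)² + (u^r)²/f + r² ((u^θ)² + sin²θ (u^φ)²) = 0 gives u^t directly. The sketch below assumes the spatial components are already expressed in the coordinate basis and picks the positive root. The function name solve_ut_null is hypothetical, and Kerr needs a different formula because of its g_tφ cross term.

```cpp
#include <cassert>
#include <cmath>

// Positive root of the Schwarzschild null condition for u^t.
// Only valid outside the horizon, where f > 0.
inline float solve_ut_null(float r, float theta,
                           float ur, float utheta, float uphi, float rs) {
    assert(r > rs);
    float f = 1.0f - rs / r;
    float sin_t = std::sin(theta);
    float spatial = ur * ur / f
                  + r * r * (utheta * utheta + sin_t * sin_t * uphi * uphi);
    return std::sqrt(spatial / f);
}
```

For a purely radial photon this reduces to u^t = |u^r| / f, which makes a convenient unit test.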
8.3 Orbital camera controls
CPU-side state: azimuth, elevation, distance. SDL3 mouse drag updates azimuth/elevation; scroll updates distance. Camera basis reconstructed each frame from these three scalars and the BH center.
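Rebuilding the basis from the three scalars is a standard look-at construction. The sketch below assumes a y-up world, a camera that looks down its own -z axis (matching the ray generation in 8.1), and a row-major camera-to-world basis. OrbitState, CameraPose, and orbit_to_pose are hypothetical names, and the elevation clamp is an added safeguard rather than something the text specifies.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}

struct OrbitState { float azimuth, elevation, distance; };  // radians, radians, scene units
struct CameraPose { Vec3 position; float basis[9]; };        // basis: row-major camera-to-world

inline CameraPose orbit_to_pose(OrbitState o, Vec3 center) {
    // Keep elevation off the poles so forward never lines up with world up.
    const float limit = 1.55f;  // just under pi/2
    float el = std::fmax(-limit, std::fmin(limit, o.elevation));
    Vec3 pos = {center.x + o.distance * std::cos(el) * std::sin(o.azimuth),
                center.y + o.distance * std::sin(el),
                center.z + o.distance * std::cos(el) * std::cos(o.azimuth)};
    Vec3 forward = normalize(sub(center, pos));
    Vec3 right = normalize(cross(forward, Vec3{0.0f, 1.0f, 0.0f}));
    Vec3 up = cross(right, forward);
    // Columns are right, up, and -forward, so camera-space -z maps to forward.
    return {pos, {right.x, up.x, -forward.x,
                  right.y, up.y, -forward.y,
                  right.z, up.z, -forward.z}};
}
```

At azimuth 0 and elevation 0 the camera sits on the +z axis looking at the center, and the basis comes out as the identity.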
9. Build system — CMake
9.1 Top-level structure
cmake_minimum_required(VERSION 3.27)
project(singularity LANGUAGES CXX)
option(SINGULARITY_BACKEND_METAL "Build Metal backend" ${APPLE})
option(SINGULARITY_BACKEND_VULKAN "Build Vulkan backend" ${WIN32})
option(SINGULARITY_BACKEND_CUDA "Build CUDA backend" OFF)
option(SINGULARITY_BUILD_WEB "Build WASM target via Emscripten" OFF)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
add_subdirectory(third_party/SDL)
add_subdirectory(third_party/imgui)
add_library(singularity_core STATIC
core/src/camera.cpp
core/src/settings.cpp
core/src/scene.cpp
)
target_include_directories(singularity_core PUBLIC core/include shared_shader)
if(SINGULARITY_BACKEND_METAL)
enable_language(OBJCXX)
add_subdirectory(render/metal)
endif()
if(SINGULARITY_BACKEND_VULKAN)
find_package(Vulkan REQUIRED)
add_subdirectory(render/vulkan)
endif()
if(SINGULARITY_BACKEND_CUDA)
enable_language(CUDA)
add_subdirectory(render/cuda)
endif()
add_executable(singularity app/main.cpp app/app_shell.cpp app/settings_ui.cpp)
target_link_libraries(singularity PRIVATE singularity_core SDL3::SDL3 imgui)
if(SINGULARITY_BACKEND_METAL)
target_link_libraries(singularity PRIVATE singularity_render_metal)
endif()
if(SINGULARITY_BACKEND_VULKAN)
target_link_libraries(singularity PRIVATE singularity_render_vulkan)
endif()
9.2 Shader compilation
Per-platform custom commands. For Metal:
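# render/metal/CMakeLists.txt
set(METAL_SHADERS
    shaders/geodesic_kernel.metal
    shaders/disc_intersection.metal
    shaders/blit.metal
)
# Shared headers are listed as dependencies so editing the math triggers a rebuild.
file(GLOB SHARED_SHADER_HEADERS ${CMAKE_SOURCE_DIR}/shared_shader/*.h)
foreach(SHADER ${METAL_SHADERS})
    get_filename_component(NAME ${SHADER} NAME_WE)
    add_custom_command(
        OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.air
        # Custom commands run in the binary dir, so the source path is made absolute.
        COMMAND xcrun -sdk macosx metal -c ${CMAKE_CURRENT_SOURCE_DIR}/${SHADER}
                -o ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.air
                -I ${CMAKE_SOURCE_DIR}/shared_shader
        DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${SHADER} ${SHARED_SHADER_HEADERS}
    )
    list(APPEND AIR_FILES ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.air)
endforeach()
add_custom_command(
    OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/default.metallib
    COMMAND xcrun -sdk macosx metallib ${AIR_FILES}
            -o ${CMAKE_CURRENT_BINARY_DIR}/default.metallib
    DEPENDS ${AIR_FILES}
)
# Nothing builds the metallib unless some target depends on it.
add_custom_target(metal_shaders ALL
    DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/default.metallib)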
For Vulkan:
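# render/vulkan/CMakeLists.txt
# VULKAN_SHADERS is assumed to list the compute shaders only. blit.hlsl holds
# vertex + fragment stages and needs its own dxc invocations with vs_6_0 / ps_6_0
# profiles and matching entry points.
file(GLOB SHARED_SHADER_HEADERS ${CMAKE_SOURCE_DIR}/shared_shader/*.h)
foreach(SHADER ${VULKAN_SHADERS})
    get_filename_component(NAME ${SHADER} NAME_WE)
    add_custom_command(
        OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.spv
        COMMAND ${DXC_EXECUTABLE} -T cs_6_0 -E main -spirv
                -fspv-target-env=vulkan1.3
                -I ${CMAKE_SOURCE_DIR}/shared_shader
                ${CMAKE_CURRENT_SOURCE_DIR}/${SHADER}
                -Fo ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.spv
        DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${SHADER} ${SHARED_SHADER_HEADERS}
    )
    list(APPEND SPV_FILES ${CMAKE_CURRENT_BINARY_DIR}/${NAME}.spv)
endforeach()
# Nothing builds the .spv files unless some target depends on them.
add_custom_target(vulkan_shaders ALL DEPENDS ${SPV_FILES})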
9.3 Compiler flags
-std=c++20
-Wall -Wextra -Wpedantic
-Wno-c++98-compat
-O3 -ffast-math # release
-O0 -g -fsanitize=address,undefined # debug
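-ffast-math is a deliberate choice: we are not doing scientific-grade physics with NaN propagation. The verification harness catches any algorithm-level numerical issue; -ffast-math lets the compiler vectorize aggressively. Two consequences to keep in mind: -ffast-math implies -ffinite-math-only, so host code must not rely on isnan/isinf checks to detect a blown-up integration step, and these flags apply to host C++ only, since the shader compilers have their own math modes.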
On Windows / MSVC, the equivalents are /std:c++20 /W4 /O2 /fp:fast.
10. CI/CD — GitHub Actions
10.1 Matrix CI
# .github/workflows/ci.yml
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: clang-format
        run: clang-format --dry-run --Werror $(git ls-files '*.cpp' '*.hpp')
      # clang-tidy on all .cpp/.hpp (needs a configured build dir for
      # compile_commands.json; that step is omitted here)
      - name: black + ruff on verification/
        run: |
          pip install black ruff
          black --check verification/
          ruff check verification/
  build-test:
    strategy:
      matrix:
        os: [macos-14, windows-2022]
        include:
          # Upper-case so the value matches the CMake option names exactly.
          - os: macos-14
            backend: METAL
          - os: windows-2022
            backend: VULKAN
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Configure
        run: cmake -B build -DSINGULARITY_BACKEND_${{ matrix.backend }}=ON
      - name: Build
        run: cmake --build build --config Release
      - name: Catch2 tests
        run: ctest --test-dir build --output-on-failure
      - name: Python verification
        run: |
          pip install -r verification/requirements.txt
          pytest verification/
  build-app:
    needs: [lint, build-test]
    strategy:
      matrix:
        os: [macos-14, windows-2022]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Build .app / .exe
        run: cmake -B build && cmake --build build --config Release
      - uses: actions/upload-artifact@v4
        with:
          name: singularity-${{ matrix.os }}
          path: build/Release/
10.2 Release pipeline
release.yml fires on tag push (v*):
- macOS job: import Apple Developer cert from secrets, codesign --options runtime, xcrun notarytool submit --wait, xcrun stapler staple, create-dmg to build .dmg.
- Windows job: build with /Release, package with WiX into .msi. If EV cert available, signtool sign /tr http://timestamp.digicert.com /td sha256 /fd sha256 /n "Mal" Singularity.msi. If not, ship unsigned and document SmartScreen warning workaround.
- Both upload to GitHub Releases.
10.3 Docs deploy
docs.yml on changes to web/ or docs/:
- Build WASM: emcmake cmake -B build-wasm -DSINGULARITY_BUILD_WEB=ON && cmake --build build-wasm
- Copy WASM + JS glue to web/public/wasm/
- npm run build in web/
- Vercel auto-deploys from main; preview from PR branches
11. Distribution
11.1 macOS — .dmg
Signed with Developer ID Application certificate, hardened runtime enabled, notarized through Apple’s notary service, stapled. Workflow documented in docs/RELEASE.md.
Entitlements:
<key>com.apple.security.cs.disable-library-validation</key><true/>
<key>com.apple.security.cs.allow-jit</key><false/>
<key>com.apple.security.network.client</key><true/>
11.2 Windows — .msi
Built with WiX Toolset. Without EV cert, the installer triggers a Windows SmartScreen warning (“Microsoft Defender SmartScreen prevented an unrecognized app…”) — users must click “More info” → “Run anyway.” Documented in download instructions.
EV cert is $400/yr from Sectigo or DigiCert. Skipped at v1.0 unless user feedback demands it.
11.3 Update mechanism
v1.0: none. Manual download from GitHub releases page. Sparkle (macOS) and WinSparkle (Windows) considered for v2.0.
12. Testing strategy
Four test levels, each catching a different class of bug:
| Level | Tool | Catches | Runs in CI |
|---|---|---|---|
| Unit | Catch2 | Logic bugs in pure-C++ helpers (camera math, settings parsing, integrator state arithmetic) | ✓ both platforms |
| Property / verification | Python + SymPy + SciPy | Physics bugs (wrong Christoffel sign, drifting E, wrong photon sphere radius) | ✓ both platforms |
| Visual regression | Golden images via perceptual hash | Wrong tone mapping, color space, accretion disc orientation | ✓ both platforms (per-backend goldens) |
| Backend equivalence | Cross-backend perceptual hash | Drift between Metal and Vulkan implementations | ✓ when both backends compiled |
12.1 Why a Python harness
The C++ kernel is fast but opaque. Re-deriving the Christoffel symbols in SymPy and asserting they match the hand-coded versions catches algebra errors that no amount of unit testing in C++ would find — because the C++ is the suspect. The harness uses SciPy’s solve_ivp (adaptive Dormand-Prince) as an independent reference integrator; if the C++ RK4 and SciPy DOPRI5 disagree by more than tolerance, one of them is wrong.
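The same "derive it a second way" idea can also be run on the C++ side as a smoke test. The sketch below swaps SymPy's symbolic derivation for a central finite difference, which is a different and weaker technique. For a static diagonal metric, Γ^r_tt = −½ g^rr ∂_r g_tt, and that is compared against the closed form rs·f/(2r²) used as the coefficient of (u^t)² in d.ur. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Schwarzschild metric pieces needed for Gamma^r_tt, with f = 1 - rs/r.
inline double g_tt(double r, double rs) { return -(1.0 - rs / r); }
inline double g_rr_inverse(double r, double rs) { return 1.0 - rs / r; }

// Hand-coded closed form, matching the (u^t)^2 coefficient in d.ur.
inline double gamma_r_tt_closed(double r, double rs) {
    double f = 1.0 - rs / r;
    return rs * f / (2.0 * r * r);
}

// From the definition, Gamma^r_tt = -1/2 g^{rr} d(g_tt)/dr,
// with the derivative taken as a central difference.
inline double gamma_r_tt_numeric(double r, double rs, double eps = 1e-5) {
    assert(r - eps > rs);  // stay outside the horizon
    double dg_dr = (g_tt(r + eps, rs) - g_tt(r - eps, rs)) / (2.0 * eps);
    return -0.5 * g_rr_inverse(r, rs) * dg_dr;
}
```

A finite difference only confirms the value at sampled radii, so it complements the SymPy check rather than replacing it; its advantage is that it runs inside the Catch2 suite with no Python dependency.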
12.2 Backend-equivalence test
Same scene, render on Metal, render on Vulkan, compare perceptual hashes. Tolerance set to allow benign GPU floating-point variation (~4 hash bits) but catch real drift (>10).
This test is the load-bearing piece of the cross-platform claim: it asserts the abstraction worked.
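The comparison itself is a Hamming distance between two hashes. A minimal sketch follows, assuming a 64-bit perceptual hash (the hash width is not stated above) and treating the gap between the two thresholds as a borderline band that a human should look at. HashVerdict, hamming_distance, and compare_hashes are hypothetical names.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

enum class HashVerdict { Match, Borderline, Drift };

// Number of bit positions where the two hashes differ.
inline int hamming_distance(uint64_t a, uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}

// Up to benign_bits differing bits is treated as floating-point noise;
// more than drift_bits is treated as a real divergence between backends.
inline HashVerdict compare_hashes(uint64_t metal_hash, uint64_t vulkan_hash,
                                  int benign_bits = 4, int drift_bits = 10) {
    assert(benign_bits <= drift_bits);
    int d = hamming_distance(metal_hash, vulkan_hash);
    if (d <= benign_bits) return HashVerdict::Match;
    if (d > drift_bits) return HashVerdict::Drift;
    return HashVerdict::Borderline;
}
```

How the borderline band is handled (fail, warn, or retry at higher resolution) is a policy decision the thresholds above leave open.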
12.3 Golden images
Ten 256×256 PNGs of canonical scenes per backend. Stored in verification/golden/{metal,vulkan}/. Regenerated only when physics changes deliberately:
- Run python verification/regenerate_golden.py --backend metal --backend vulkan
- Manually inspect every regenerated image
- Commit with a message starting [golden] to make the change auditable
13. Performance budget
13.1 Mac (M2 base) at 1280×720, 60 FPS = 16.7ms total
| Pass | Budget | Notes |
|---|---|---|
| Compute (geodesic_kernel) | 12ms | The whole game |
| Render (blit + tone map) | 1ms | Trivial |
| ImGui overlay | 0.5ms | Negligible |
| Drawable present | 0.5ms | OS overhead |
The passes above sum to 14ms, leaving about 2.7ms of slack in the 60 FPS budget. Default Mac resolution is 1280×720 at 30 FPS (33ms budget), which gives 26ms for compute: generous. 60 FPS achieved on M3 Pro+.
13.2 Windows (RTX 3070) at 1920×1080, 60 FPS = 16.7ms
NVIDIA’s compute throughput gives us roughly 5× the Mac’s headroom at the same resolution. Default Win resolution is 1920×1080 at 60 FPS, with 4K offered as an option.
13.3 CUDA offline — no real-time constraint
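At 8K UHD (7680×4320 ≈ 33.2M pixels), 256 samples per pixel works out to roughly 8.5 billion rays per frame, each integrated for up to ~5000 RK4 steps. (A count of ~67M rays would correspond to about 2 samples per pixel at that resolution.) The working estimate on a 3090 is 20-60 seconds per frame for Schwarzschild and 60-180 seconds per frame for Kerr; treat both as estimates rather than measurements, and expect render time to scale linearly with sample count. Acceptable for offline.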
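The ray count is just pixels times samples per pixel, and the step count is rays times steps per ray. A sketch of that arithmetic follows, assuming "8K" means 7680×4320 (UHD) and that every ray runs the full step budget, which makes it an upper bound. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rays per frame = pixels * samples per pixel.
inline std::uint64_t rays_per_frame(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t samples_per_pixel) {
    assert(width > 0 && height > 0 && samples_per_pixel > 0);
    return width * height * samples_per_pixel;
}

// Upper bound on RK4 steps per frame if every ray uses its full step budget.
inline std::uint64_t max_rk4_steps_per_frame(std::uint64_t rays,
                                             std::uint64_t steps_per_ray) {
    return rays * steps_per_ray;
}
```

rays_per_frame(7680, 4320, 256) evaluates to 8,493,465,600, and at 5000 steps per ray the bound is about 4.2 × 10^13 RK4 steps per frame. 64-bit integers are needed here because the ray count alone overflows 32 bits.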
14. Open technical decisions
These need answers before the relevant phase begins:
- Tone-mapping curve. ACES Filmic vs Reinhard. Phase 7 decision; default to Reinhard until then.
- Skybox texture format. Equirectangular (one PNG, easier authoring) vs cubemap (six PNGs, faster sampling). Phase 2 decision.
- Half-precision integration? half storage in MSL/HLSL would halve register pressure but risks precision loss near the horizon. Test in Phase 6.
- WASM scope. Phase 1’s 2D model is the obvious target. The 3D Schwarzschild kernel could also run in the browser via WebGPU compute; decide at Phase 7.
- VMA on Vulkan or roll our own allocator? Use VMA. The “I wrote my own GPU allocator” bullet is not worth the time cost.
15. References
- Apple, Metal Shading Language Specification v3.1
- Apple, LearnMetalCPP samples
- Khronos, Vulkan-Hpp documentation
- Sascha Willems, Vulkan-Samples
- NVIDIA, CUDA C++ Programming Guide
- Hairer, Nørsett, Wanner, Solving Ordinary Differential Equations I, Springer (1993)
- James, von Tunzelmann, Franklin, Thorne (JMO), Gravitational lensing by spinning black holes in astrophysics, and in the movie Interstellar, Class. Quantum Grav. 32 065001 (2015)
- Kip Thorne, The Science of Interstellar, W.W. Norton (2014)