
Overview

Oboromi emulates the Nintendo Switch 2’s GPU using a translation-based approach. The system decodes NVIDIA SM86 (Ampere architecture) shader instructions and translates them to SPIR-V for execution on Vulkan-compatible GPUs.
SM86 (compute capability 8.6) is the shader model of NVIDIA’s GA10x Ampere GPUs. The Switch 2 likely uses a custom Tegra GPU based on this architecture.

Architecture Pipeline

GPU State Management

State Structure

The State struct holds pointers to shared and global memory, the program counter, and the Vulkan context:
// core/src/gpu/mod.rs:36
pub struct State {
    pub shared_memory: *mut u8,
    pub global_memory: *mut u8,
    pub pc: u64,
    pub vk: VkState,
}

Vulkan Integration

// core/src/gpu/mod.rs:7
pub struct VkState {
    pub entry: ash::Entry,
    pub instance: ash::Instance,
}

impl VkState {
    pub fn init(&mut self) -> ash::prelude::VkResult<()> {
        self.entry = unsafe { ash::Entry::load().unwrap() };
        self.instance = unsafe {
            self.entry.create_instance(&vk::InstanceCreateInfo {
                p_application_info: &vk::ApplicationInfo {
                    api_version: vk::make_api_version(0, 1, 0, 0),
                    ..Default::default()
                },
                ..Default::default()
            }, None)?
        };
        Ok(())
    }
}
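
The api_version field is a packed 32-bit integer. As a minimal sketch, the layout produced by vk::make_api_version (following the Vulkan specification's VK_MAKE_API_VERSION macro) can be reproduced in plain Rust. The two helper functions below are illustrative stand-ins, not part of Oboromi or ash:

```rust
/// Packs a Vulkan API version the way VK_MAKE_API_VERSION does:
/// variant in bits 31..29, major in 28..22, minor in 21..12, patch in 11..0.
const fn make_api_version(variant: u32, major: u32, minor: u32, patch: u32) -> u32 {
    (variant << 29) | (major << 22) | (minor << 12) | patch
}

/// Splits a packed version back into (variant, major, minor, patch).
const fn split_api_version(v: u32) -> (u32, u32, u32, u32) {
    (v >> 29, (v >> 22) & 0x7f, (v >> 12) & 0x3ff, v & 0xfff)
}
```

With this layout, the instance above requests version 1.0.0, which packs to 1 << 22.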

SM86 Instruction Decoder

Decoder Architecture

The SM86 decoder maintains a virtual register file and emits SPIR-V instructions:
// core/src/gpu/sm86.rs:185
pub struct Decoder<'a> {
    pub ir: &'a mut spirv::Emitter,
    type_void: u32,
    type_ptr_u32: u32,
    
    // Type declarations for various bit widths and vector sizes
    type_u8: [u32; 5],
    type_u16: [u32; 5],
    type_u32: [u32; 5],
    type_u64: [u32; 5],
    type_s8: [u32; 5],
    type_s16: [u32; 5],
    type_s32: [u32; 5],
    type_s64: [u32; 5],
    type_f16: [u32; 5],
    type_f32: [u32; 5],
    type_f64: [u32; 5],
    type_bool: [u32; 5],
    
    // Abstract state machine
    regs: [u32; MAX_REG_COUNT],
}

Decoder Initialization

// core/src/gpu/sm86.rs:207
impl<'a> Decoder<'a> {
    pub fn init(&mut self) {
        self.type_void = self.ir.emit_type_void();
        
        // Declare scalar types
        self.type_u8[1] = self.ir.emit_type_int(8, 0);
        self.type_u16[1] = self.ir.emit_type_int(16, 0);
        self.type_u32[1] = self.ir.emit_type_int(32, 0);
        self.type_u64[1] = self.ir.emit_type_int(64, 0);
        self.type_f32[1] = self.ir.emit_type_float(32);
        // ... (additional type setup)

        // Define generic pointers (storage class 7 = Function)
        self.type_ptr_u32 = self.ir.emit_type_pointer(7, self.type_u32[1]);

        // Define registers as function-scope variables
        for r in self.regs.iter_mut() {
            *r = self.ir.emit_variable(self.type_ptr_u32, 7);
        }
    }
}
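
The five-slot type arrays are only partly shown above. One plausible reading, and it is an assumption, is that slot 1 holds the scalar type and slots 2 to 4 hold vectors with that many components, leaving slot 0 unused. The sketch below illustrates that reading with a hypothetical MockEmitter that only hands out result IDs; it is not the real spirv::Emitter:

```rust
/// Hypothetical stand-in for `spirv::Emitter`: it only allocates result IDs
/// and logs a textual form of each declaration.
#[derive(Default)]
struct MockEmitter {
    next_id: u32,
    log: Vec<String>,
}

impl MockEmitter {
    fn fresh(&mut self, what: String) -> u32 {
        self.next_id += 1;
        self.log.push(format!("%{} = {}", self.next_id, what));
        self.next_id
    }

    fn emit_type_int(&mut self, width: u32, signed: u32) -> u32 {
        self.fresh(format!("OpTypeInt {} {}", width, signed))
    }

    fn emit_type_vector(&mut self, component: u32, count: u32) -> u32 {
        self.fresh(format!("OpTypeVector %{} {}", component, count))
    }
}

/// Fills a `[u32; 5]` table under the assumed convention: slot 1 is the
/// scalar, slots 2..=4 are vectors of that width, slot 0 stays 0 (unused).
fn declare_u32_table(ir: &mut MockEmitter) -> [u32; 5] {
    let mut table = [0u32; 5];
    table[1] = ir.emit_type_int(32, 0);
    for n in 2..=4 {
        table[n] = ir.emit_type_vector(table[1], n as u32);
    }
    table
}
```

Indexing by component count would let a handler pick a type with an expression like type_u32[width] without a lookup table, which may be why the arrays have five entries.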

Register File

The decoder models 254 general-purpose registers (R0-R253, sized by MAX_REG_COUNT) plus the special RZ register (R255), which always reads as zero and discards writes.
// core/src/gpu/sm86.rs:241
fn load_reg(&mut self, reg: usize) -> u32 {
    if reg == 255 {
        // RZ (Zero Register)
        return self.ir.emit_constant_typed(self.type_u32[1], 0u32);
    }
    assert!(reg < self.regs.len(), "Register index out of bounds");
    let ptr = self.regs[reg];
    self.ir.emit_load(self.type_u32[1], ptr)
}

fn store_reg(&mut self, reg: usize, val: u32) {
    if reg == 255 {
        // Write to RZ is ignored
        return;
    }
    let ptr = self.regs[reg];
    self.ir.emit_store(ptr, val);
}
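
The RZ rules are easier to see at the value level, without SPIR-V emission. The following is a small model of the same semantics, assuming a plain array of u32 values; RegisterFile and its methods are hypothetical names used only for illustration:

```rust
const MAX_REG_COUNT: usize = 254; // mirrors the constant quoted in this document
const RZ: usize = 255;

/// Value-level model of the register file (not the SPIR-V-emitting decoder).
struct RegisterFile {
    regs: [u32; MAX_REG_COUNT],
}

impl RegisterFile {
    fn new() -> Self {
        Self { regs: [0; MAX_REG_COUNT] }
    }

    /// RZ reads as zero; any other index outside R0..R253 yields None.
    fn load(&self, reg: usize) -> Option<u32> {
        if reg == RZ {
            return Some(0);
        }
        self.regs.get(reg).copied()
    }

    /// Writes to RZ are silently dropped; returns false for an invalid index.
    fn store(&mut self, reg: usize, val: u32) -> bool {
        if reg == RZ {
            return true;
        }
        match self.regs.get_mut(reg) {
            Some(slot) => {
                *slot = val;
                true
            }
            None => false,
        }
    }
}
```

Unlike the decoder's assert-based checks, this model reports an invalid index through its return value, which is convenient when using it as a reference in tests.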

Instruction Formats

Instruction Encoding

SM86 instructions are 128-bit (16 bytes) encoded values:
// Instructions take a u128 parameter representing the binary encoding
pub fn al2p(&mut self, inst: u128) {
    let pg = (((inst >> 12) & 0x7) << 0);           // Predicate guard
    let rd = (((inst >> 16) & 0xff) << 0) as usize; // Destination register
    let ra = (((inst >> 24) & 0xff) << 0) as usize; // Source register A
    let ra_offset = (((inst >> 40) & 0x7ff) << 0);  // Immediate offset
    // ... decode additional fields
}
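
Each field in the snippet above is a shift followed by a mask. A generic helper makes the bit positions explicit; this is a sketch in which field, Al2pFields, and decode_al2p_fields are hypothetical names, while the bit offsets and widths are the ones quoted above:

```rust
/// Extracts `width` bits starting at bit `lo` from a 128-bit instruction word.
fn field(inst: u128, lo: u32, width: u32) -> u128 {
    debug_assert!(width >= 1 && lo + width <= 128);
    let mask = if width == 128 { u128::MAX } else { (1u128 << width) - 1 };
    (inst >> lo) & mask
}

/// Decoded operand fields, at the bit positions quoted in this document.
#[derive(Debug, PartialEq)]
struct Al2pFields {
    pg: u8,
    rd: usize,
    ra: usize,
    ra_offset: u32,
}

fn decode_al2p_fields(inst: u128) -> Al2pFields {
    Al2pFields {
        pg: field(inst, 12, 3) as u8,          // same as mask 0x7
        rd: field(inst, 16, 8) as usize,       // same as mask 0xff
        ra: field(inst, 24, 8) as usize,       // same as mask 0xff
        ra_offset: field(inst, 40, 11) as u32, // same as mask 0x7ff
    }
}
```

Expressing a field as (offset, width) rather than (shift, mask) keeps the two numbers from drifting apart when encodings are transcribed by hand.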

Example: AL2P (Attribute Logical to Physical)

In the current implementation, AL2P writes the sum of source register Ra and an 11-bit immediate offset to destination register Rd:
// core/src/gpu/sm86.rs:265
pub fn al2p(&mut self, inst: u128) {
    let rd = (((inst >> 16) & 0xff) << 0) as usize;
    let ra = (((inst >> 24) & 0xff) << 0) as usize;
    let ra_offset = (((inst >> 40) & 0x7ff) << 0) as usize;
    let bop = (((inst >> 74) & 0x3) << 0) as usize;
    
    assert!(ra < MAX_REG_COUNT || ra == 255); // valid indices are R0..R253, or RZ
    assert!(bop == BitSize::B32 as usize);
    
    let base = self.load_reg(ra);
    let offset = self.ir.emit_constant_typed(self.type_u32[1], ra_offset as u32);
    let dst_val = self.ir.emit_iadd(self.type_u32[1], base, offset);
    self.store_reg(rd, dst_val);
}
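
A value-level reference for this handler is useful when checking the emitted SPIR-V against expected results. The sketch below assumes the same field positions as the handler above and 32-bit wrapping addition, which matches OpIAdd. Both function names are hypothetical, the opcode bits are left at zero, and the numeric value of BitSize::B32 is not shown in this document, so bop is passed through as a raw number:

```rust
/// Packs the operand fields read by the `al2p` handler into a 128-bit word.
/// Opcode bits are left zero; this is only meant for feeding tests.
fn encode_al2p_operands(rd: u8, ra: u8, ra_offset: u16, bop: u8) -> u128 {
    assert!(ra_offset <= 0x7ff && bop <= 0x3);
    ((rd as u128) << 16)
        | ((ra as u128) << 24)
        | ((ra_offset as u128) << 40)
        | ((bop as u128) << 74)
}

/// Reference semantics: rd = ra + ra_offset with 32-bit wrap-around.
/// R255 is RZ: it reads as zero and ignores writes. Index 254 is invalid
/// and panics on the slice access, much like the decoder's assertions.
fn al2p_reference(regs: &mut [u32; 254], inst: u128) {
    let rd = ((inst >> 16) & 0xff) as usize;
    let ra = ((inst >> 24) & 0xff) as usize;
    let offset = ((inst >> 40) & 0x7ff) as u32;
    let base = if ra == 255 { 0 } else { regs[ra] };
    if rd != 255 {
        regs[rd] = base.wrapping_add(offset);
    }
}
```

A test can encode an instruction once, run it through both the decoder and this reference, and compare the resulting register values.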

Supported Instructions

The decoder defines 254+ SM86 instructions including:

Memory Instructions

  • ALD - Attribute Load
  • AST - Attribute Store
  • ATOM - Atomic Operation
  • ATOMG - Global Atomic
  • ATOMS - Shared Atomic

Arithmetic Instructions

  • FADD, FADD32I - Floating-point addition
  • FMUL, FMUL32I - Floating-point multiplication
  • FFMA, FFMA32I - Fused multiply-add
  • DADD, DMUL, DFMA - Double-precision operations

Control Flow

  • BRA - Branch
  • BRX - Branch indexed
  • CALL - Function call
  • EXIT - Shader exit
  • BREAK - Loop break

Conversion

  • F2F - Float-to-float conversion
  • F2I, F2IP - Float-to-integer
  • I2F, I2FP - Integer-to-float

Texture Operations

  • TEX - Texture fetch
  • TLD, TLD4 - Texture load (TLD4 gathers four texels)
  • SUTP - Surface store

Most instruction implementations currently contain todo!() placeholders. The full instruction set is being implemented incrementally.

SPIR-V Translation

SPIR-V Emitter

The spirv::Emitter provides a safe Rust API for building SPIR-V modules:
// core/src/gpu/spirv.rs:172
pub struct Emitter {
    words: Vec<u32>,
    next_id: u32,
    bound_idx: usize,
}
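
The three fields map directly onto the SPIR-V binary format: a flat stream of 32-bit words, a counter for result IDs, and the position of the header's ID bound so it can be patched at the end. The following is a minimal sketch of that idea. MiniEmitter is a hypothetical stand-in and its method bodies are assumptions, not the code in spirv.rs; the word layout and opcode numbers come from the SPIR-V specification:

```rust
/// Minimal stand-in for the emitter, using the same three fields.
struct MiniEmitter {
    words: Vec<u32>,
    next_id: u32,
    bound_idx: usize,
}

impl MiniEmitter {
    fn new() -> Self {
        Self { words: Vec::new(), next_id: 1, bound_idx: 0 }
    }

    /// Writes the five-word SPIR-V header; the ID bound is patched later.
    fn emit_header(&mut self) {
        self.words.push(0x0723_0203); // magic number
        self.words.push(0x0001_0000); // version 1.0
        self.words.push(0); // generator (0 = unregistered)
        self.bound_idx = self.words.len();
        self.words.push(0); // bound placeholder
        self.words.push(0); // reserved schema word
    }

    /// Every instruction starts with (word_count << 16) | opcode.
    fn emit(&mut self, opcode: u16, operands: &[u32]) {
        let word_count = (operands.len() + 1) as u32;
        self.words.push((word_count << 16) | opcode as u32);
        self.words.extend_from_slice(operands);
    }

    fn alloc_id(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        id
    }

    /// OpTypeVoid (opcode 19) takes a single operand: its result ID.
    fn emit_type_void(&mut self) -> u32 {
        let id = self.alloc_id();
        self.emit(19, &[id]);
        id
    }

    /// The bound must be greater than every ID used in the module.
    fn finalize(&mut self) {
        self.words[self.bound_idx] = self.next_id;
    }
}
```

Recording bound_idx instead of hard-coding index 3 keeps finalize correct even if the header is not the first thing written into the buffer.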

Module Structure

A complete SPIR-V shader follows this structure:
  1. Header - Magic number, version, ID bound
  2. Capabilities - Required SPIR-V features
  3. Extensions - Optional extensions
  4. Memory Model - Addressing and memory semantics
  5. Entry Points - Shader entry functions
  6. Execution Modes - Workgroup size, etc.
  7. Debug Info - Names and source locations
  8. Annotations - Decorations (bindings, locations)
  9. Type Declarations - All types used in shader
  10. Constants - Constant values
  11. Global Variables - Uniforms, inputs, outputs
  12. Function Definitions - Shader code
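
The ordering above can be checked mechanically. The sketch below models items 2 through 12 as an enum whose declaration order is the required layout order; Section and first_out_of_order are hypothetical names. Note that the SPIR-V specification lets type, constant, and global variable declarations interleave, so this check is stricter than the specification requires:

```rust
/// Sections in the order listed above; deriving `Ord` makes the declaration
/// order the required layout order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Section {
    Capabilities,
    Extensions,
    MemoryModel,
    EntryPoints,
    ExecutionModes,
    DebugInfo,
    Annotations,
    Types,
    Constants,
    GlobalVariables,
    Functions,
}

/// Returns the index of the first section that appears after a later one,
/// or None when the sequence respects the layout order.
fn first_out_of_order(sections: &[Section]) -> Option<usize> {
    sections.windows(2).position(|w| w[1] < w[0]).map(|i| i + 1)
}
```

Repeating a section is allowed by this check, since a module usually contains many instructions per section.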

Example: Building a Function

let mut emitter = spirv::Emitter::new();
emitter.emit_header();
emitter.emit_capability(spirv::capability::SHADER);
emitter.emit_memory_model(0, 1); // Logical, GLSL450

// Define types
let void_ty = emitter.emit_type_void();
let fn_ty = emitter.emit_type_function(void_ty, &[]);

// Create entry point function
let main_fn = emitter.emit_function(void_ty, 0, fn_ty);
let entry_label = emitter.emit_label();

// ... shader logic ...

emitter.emit_return();
emitter.emit_function_end();

emitter.finalize();

Supported SPIR-V Operations

The emitter supports 100+ SPIR-V instructions:
  • Arithmetic: iadd, fadd, imul, fmul, fdiv, etc.
  • Logical: logical_and, logical_or, select
  • Comparison: iequal, ford_less_than, sgreater_than
  • Bitwise: shift_left_logical, bitwise_and, bit_reverse
  • Memory: load, store, access_chain
  • Control: branch, branch_conditional, phi
  • Image: image_sample, image_read, image_write
  • Atomic: atomic_iadd, atomic_exchange, atomic_compare_exchange

Texture and Surface Formats

The GPU module defines extensive format enumerations:

Surface Formats

// core/src/gpu/sm86.rs:56
enum SurfaceFormat {
    RGBA32_FLOAT = 0x00c0,
    RGBA32_SINT = 0x00c1,
    RGBA16_UNORM = 0x00c6,
    RGBA8_UNORM = 0x00d5,
    BGRA8_UNORM = 0x00cf,
    R32_FLOAT = 0x00e5,
    // ... 50+ formats
}

Image Formats (for SUTP)

// core/src/gpu/sm86.rs:135
enum ImageFormat {
    RGBA32_FLOAT = 0x02,
    RGBA16_FLOAT = 0x0c,
    RGBA8_UNORM = 0x18,
    RG32_FLOAT = 0x0d,
    R32_FLOAT = 0x29,
    // ... specialized formats
}

Shader Constants

Hardware limits are defined as constants:
// core/src/gpu/sm86.rs:7
static MAX_REG_COUNT: usize = 254;
static MAX_UNIFORM_REG_COUNT: usize = 63;
static MAX_CONST_BANK: usize = 17;
static ALLOW_F16_PARTIAL_WRITES: usize = 1;

Vulkan Backend

The Vulkan backend (via the ash crate) handles:
  • Instance Creation: Vulkan 1.0+ initialization
  • Device Selection: Picking suitable GPU
  • Pipeline Creation: Compiling SPIR-V shaders
  • Command Submission: Recording and executing GPU work

Future Pipeline

SM86 Shader → Decoder → SPIR-V IR → Vulkan Pipeline → GPU Execution
                            ↓
                  Shader Specialization
                (constant folding, etc.)

Performance Optimizations

Planned Optimizations

  1. Shader Caching: Cache translated SPIR-V to avoid re-translation
  2. Specialization Constants: Use SPIR-V spec constants for dynamic values
  3. Dead Code Elimination: Remove unused registers and instructions
  4. Register Allocation: Optimize SPIR-V register usage
  5. Instruction Combining: Merge common patterns (e.g., MAD → FMA)
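
Shader caching, the first item above, could be as simple as a map from raw shader bytes to translated SPIR-V words. The sketch below is one hypothetical shape for such a cache, not a description of existing Oboromi code. ShaderCache and get_or_translate are illustrative names, and the translation step is passed in as a closure:

```rust
use std::collections::HashMap;

/// Hypothetical cache from SM86 shader bytes to translated SPIR-V words.
#[derive(Default)]
struct ShaderCache {
    entries: HashMap<Vec<u8>, Vec<u32>>,
    misses: u64,
}

impl ShaderCache {
    /// Returns cached SPIR-V, running `translate` only on the first request
    /// for a given byte sequence.
    fn get_or_translate<F>(&mut self, code: &[u8], translate: F) -> &[u32]
    where
        F: FnOnce(&[u8]) -> Vec<u32>,
    {
        if !self.entries.contains_key(code) {
            self.misses += 1;
            let spirv = translate(code);
            self.entries.insert(code.to_vec(), spirv);
        }
        &self.entries[code]
    }
}
```

Keying on the full byte sequence avoids false hits from hash collisions at the cost of memory; a production cache would more likely key on a strong hash and persist entries to disk.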

Debugging Support

SPIR-V Validation

// core/src/gpu/spirv.rs:1089
pub fn validate(&self) -> Result<(), &'static str> {
    if self.words.len() < 5 {
        return Err("Module too short for valid header");
    }
    if self.words[0] != 0x07230203 {
        return Err("Invalid SPIR-V magic number");
    }
    // Walk instructions and verify structure
    // ...
}
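
The instruction walk is elided in the excerpt above. One way it could look, assuming only the SPIR-V rule that the high 16 bits of an instruction's first word hold its total word count, is sketched here as a free function over a word slice. The function name, the last two error messages, and the returned instruction count are assumptions made for illustration:

```rust
/// Checks the header, then walks instructions by their word counts.
/// Returns the number of instructions found.
fn validate_words(words: &[u32]) -> Result<usize, &'static str> {
    if words.len() < 5 {
        return Err("Module too short for valid header");
    }
    if words[0] != 0x0723_0203 {
        return Err("Invalid SPIR-V magic number");
    }
    let mut idx = 5; // first instruction follows the five header words
    let mut count = 0;
    while idx < words.len() {
        let word_count = (words[idx] >> 16) as usize;
        if word_count == 0 {
            return Err("Instruction with zero word count");
        }
        if idx + word_count > words.len() {
            return Err("Instruction runs past end of module");
        }
        idx += word_count;
        count += 1;
    }
    Ok(count)
}
```

This only checks framing. Semantic rules such as ID definitions and section order are left to spirv-val.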

Binary Export

// core/src/gpu/spirv.rs:1122
pub fn to_bytes(&self) -> Vec<u8> {
    let mut out = Vec::with_capacity(self.words.len() * 4);
    for &w in &self.words {
        out.extend_from_slice(&w.to_le_bytes());
    }
    out
}

Exported SPIR-V can be validated with spirv-val and disassembled with spirv-dis.
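
For round-trip tests it helps to have the inverse of to_bytes as well. The sketch below uses free functions over slices so it stands alone; both function names are hypothetical, and the little-endian choice matches the export code above:

```rust
/// Serializes words as little-endian bytes (same approach as `to_bytes`).
fn words_to_bytes(words: &[u32]) -> Vec<u8> {
    words.iter().flat_map(|w| w.to_le_bytes()).collect()
}

/// Parses little-endian bytes back into words; rejects a trailing partial word.
fn bytes_to_words(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Because the magic number is stored little-endian, a dumped file begins with the bytes 03 02 23 07, which is a quick sanity check in a hex viewer.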

Testing

GPU tests verify decoder and emitter functionality:
// core/src/gpu/test.rs
#[test]
fn test_sm86_decoder() {
    let mut ir = spirv::Emitter::new();
    let mut decoder = sm86::Decoder { ir: &mut ir, /* ... */ };
    decoder.init();
    
    // Test instruction decoding
    let inst: u128 = /* ... encoded instruction ... */;
    decoder.al2p(inst);
    
    ir.finalize();
    assert!(ir.validate().is_ok());
}

Future Enhancements

  1. Complete Instruction Set: Implement all 254+ SM86 instructions
  2. Geometry Shaders: Support for geometry and tessellation stages
  3. Compute Shaders: Full compute pipeline with shared memory
  4. Ray Tracing: RTX operations, if the Switch 2 supports RT cores
  5. Performance Counters: GPU profiling and metrics