
Overview

Oboromi implements a unified memory architecture where all 8 CPU cores and the GPU share a single 12GB address space. This mirrors the Nintendo Switch 2’s rumored memory configuration.
The Switch 2 is expected to have 12GB of LPDDR5 RAM shared between CPU and GPU, similar to the original Switch’s unified memory design.

Memory Layout

Address Space

┌────────────────────────────────────────────────┐
│  0x0000_0000_0000                              │  ← Memory Base
├────────────────────────────────────────────────┤
│                                                │
│           General Purpose Memory               │
│                                                │
│              (~12GB - 8MB)                     │
│                                                │
│  • Heap allocations                            │
│  • Program code                                │
│  • Global/static data                          │
│  • GPU resources                               │
│                                                │
├────────────────────────────────────────────────┤
│  0x0002_FF80_0000                              │  ← Core 7 Stack
│  Core 7 Stack (1MB)                            │
├────────────────────────────────────────────────┤
│  0x0002_FF90_0000                              │  ← Core 6 Stack
│  Core 6 Stack (1MB)                            │
├────────────────────────────────────────────────┤
│  0x0002_FFA0_0000                              │  ← Core 5 Stack
│  Core 5 Stack (1MB)                            │
├────────────────────────────────────────────────┤
│  0x0002_FFB0_0000                              │  ← Core 4 Stack
│  Core 4 Stack (1MB)                            │
├────────────────────────────────────────────────┤
│  0x0002_FFC0_0000                              │  ← Core 3 Stack
│  Core 3 Stack (1MB)                            │
├────────────────────────────────────────────────┤
│  0x0002_FFD0_0000                              │  ← Core 2 Stack
│  Core 2 Stack (1MB)                            │
├────────────────────────────────────────────────┤
│  0x0002_FFE0_0000                              │  ← Core 1 Stack
│  Core 1 Stack (1MB)                            │
├────────────────────────────────────────────────┤
│  0x0002_FFF0_0000                              │  ← Core 0 Stack
│  Core 0 Stack (1MB)                            │
└────────────────────────────────────────────────┘
   0x0003_0000_0000                                  ← End (12GB)

Memory Configuration

Constants

// core/src/cpu/cpu_manager.rs:4
pub const CORE_COUNT: usize = 8;
pub const MEMORY_SIZE: u64 = 12 * 1024 * 1024 * 1024; // 12GB
pub const MEMORY_BASE: u64 = 0x0;

Architecture Requirement

Oboromi requires a 64-bit host architecture to address the full 12GB memory space. The build will fail on 32-bit systems.
// core/src/cpu/cpu_manager.rs:6
#[cfg(not(target_pointer_width = "64"))]
compile_error!("oboromi requires a 64-bit architecture to emulate 12GB of RAM.");

Memory Allocation

Initialization

The entire 12GB buffer is allocated at startup:
// core/src/cpu/cpu_manager.rs:21
let shared_memory = Pin::new(vec![0u8; MEMORY_SIZE as usize].into_boxed_slice());
let memory_ptr = shared_memory.as_ptr() as *mut u8;
Virtual Memory: Modern operating systems use lazy allocation. The 12GB allocation only reserves address space initially. Physical RAM is committed page-by-page as memory is written.

Memory Pinning

pub struct CpuManager {
    pub cores: Vec<UnicornCPU>,
    // Pin prevents reallocation from invalidating pointers
    pub shared_memory: Pin<Box<[u8]>>,
}
The memory is pinned using Pin<Box<[u8]>> to guarantee:
  1. The buffer never moves in memory
  2. Pointers passed to Unicorn remain valid
  3. No reallocation can occur

Shared Memory Mapping

Per-Core Mapping

Each CPU core maps the same physical buffer:
// core/src/cpu/unicorn_interface.rs:53
unsafe {
    emu.mem_map_ptr(
        0x0,                              // Virtual address
        memory_size,                       // Size (12GB)
        Prot::ALL,                        // Read/Write/Execute
        memory_ptr as *mut std::ffi::c_void  // Host buffer
    )
    .ok()?;
}
This creates a unified address space where:
  • Address 0x1000 on Core 0 points to the same byte as address 0x1000 on Core 7
  • All cores see memory modifications immediately
  • No explicit synchronization needed for memory reads/writes

GPU Memory Sharing

The GPU state also references the shared memory:
// core/src/gpu/mod.rs:40
pub struct State {
    pub shared_memory: *mut u8,   // Same pointer as CPU cores
    pub global_memory: *mut u8,   // Separate GPU-specific memory
    pub pc: u64,
    pub vk: VkState,
}

Stack Allocation

Per-Core Stacks

Each core receives 1MB of dedicated stack space at the top of the address space:
// core/src/cpu/unicorn_interface.rs:62
// Initialize stack pointer to end of memory, offset by core ID
// Give each core 1MB of stack space at the top of memory
let stack_top = memory_size - (core_id as u64 * 0x100000);
let _ = emu.reg_write(RegisterARM64::SP, stack_top);

Stack Addresses

Core   Stack Top Address   Stack Bottom Address   Size
0      0x3_0000_0000       0x2_FFF0_0000          1MB
1      0x2_FFF0_0000       0x2_FFE0_0000          1MB
2      0x2_FFE0_0000       0x2_FFD0_0000          1MB
3      0x2_FFD0_0000       0x2_FFC0_0000          1MB
4      0x2_FFC0_0000       0x2_FFB0_0000          1MB
5      0x2_FFB0_0000       0x2_FFA0_0000          1MB
6      0x2_FFA0_0000       0x2_FF90_0000          1MB
7      0x2_FF90_0000       0x2_FF80_0000          1MB
Each stack grows downward from its top address; the per-core 1MB offsets keep the stacks from colliding during concurrent execution.

Memory Access Patterns

Direct Memory Operations

The CPU wrapper provides typed memory access:

32-bit Operations

// core/src/cpu/unicorn_interface.rs:225
pub fn write_u32(&self, vaddr: u64, value: u32) {
    let mut emu = self.emu.lock().unwrap();
    let bytes = value.to_le_bytes();
    let _ = emu.mem_write(vaddr, &bytes);
}

pub fn read_u32(&self, vaddr: u64) -> u32 {
    let emu = self.emu.lock().unwrap();
    let mut bytes = [0u8; 4];
    if emu.mem_read(vaddr, &mut bytes).is_ok() {
        u32::from_le_bytes(bytes)
    } else {
        0
    }
}

64-bit Operations

// core/src/cpu/unicorn_interface.rs:243
pub fn write_u64(&self, vaddr: u64, value: u64) {
    let mut emu = self.emu.lock().unwrap();
    let bytes = value.to_le_bytes();
    let _ = emu.mem_write(vaddr, &bytes);
}

pub fn read_u64(&self, vaddr: u64) -> u64 {
    let emu = self.emu.lock().unwrap();
    let mut bytes = [0u8; 8];
    if emu.mem_read(vaddr, &mut bytes).is_ok() {
        u64::from_le_bytes(bytes)
    } else {
        0
    }
}

Endianness

All memory operations use little-endian byte order to match ARMv8:
let bytes = value.to_le_bytes();  // Convert to little-endian
u32::from_le_bytes(bytes)         // Parse from little-endian

Memory Coherency

Automatic Coherency

Since all cores map the same host memory buffer, memory coherency is automatic. No cache simulation or coherency protocol is needed at the emulation level.

Test: Shared Memory Verification

// core/src/tests/multicore_test.rs:15
#[test]
fn test_shared_memory_access() {
    let manager = CpuManager::new();
    
    let core0 = manager.get_core(0).expect("Core 0 missing");
    let core1 = manager.get_core(1).expect("Core 1 missing");

    // Write value using Core 0
    let test_addr = 0x1000;
    let test_val = 0xDEADBEEF;
    core0.write_u32(test_addr, test_val);

    // Read value using Core 1 - sees the write immediately
    let read_val = core1.read_u32(test_addr);

    assert_eq!(read_val, test_val);
}
This test confirms that writes from one core are immediately visible to all other cores.

Memory Permissions

Unicorn Memory Protection

The entire address space is mapped with full permissions:
// Prot::ALL = Read | Write | Execute
emu.mem_map_ptr(0x0, memory_size, Prot::ALL, memory_ptr)
Currently, there is no memory protection or segmentation. Future versions may implement:
  • Read-only code sections
  • No-execute data pages
  • Privileged/user mode separation

Performance Characteristics

Memory Allocation Cost

Operation               Time
Initial allocation      ~1-5ms (OS dependent)
First-touch per page    ~1-2μs (page fault)
Subsequent access       ~50-100ns (cache hit)

Virtual Memory Benefits

  1. Low Initial Memory: Only allocates pages as written
  2. Fast Startup: No need to zero 12GB upfront
  3. Overcommit Friendly: OS can overcommit if host RAM < 12GB

Memory Bandwidth

Unicorn memory accesses go through:
Guest Read/Write → Unicorn API → Host Memory
Typical latency: 100-500ns per operation (includes Mutex lock overhead).

Memory Debugging

Initialization Verification

// core/src/tests/multicore_test.rs:6
#[test]
fn test_multicore_initialization() {
    let manager = CpuManager::new();
    
    assert_eq!(manager.cores.len(), 8);
    assert_eq!(manager.shared_memory.len() as u64, MEMORY_SIZE);
}

Memory Dumps

For debugging, you can dump memory regions:
// Not in current code, but useful for development
pub fn dump_memory(core: &UnicornCPU, addr: u64, size: usize) {
    let mut buffer = vec![0u8; size];
    let emu = core.emu.lock().unwrap();
    if emu.mem_read(addr, &mut buffer).is_ok() {
        for (i, chunk) in buffer.chunks(16).enumerate() {
            print!("{:08x}: ", addr + (i * 16) as u64);
            for byte in chunk {
                print!("{:02x} ", byte);
            }
            println!();
        }
    }
}

GPU Memory Integration

The GPU shares the same memory space:
// GPU can read/write CPU-visible memory
pub struct State {
    pub shared_memory: *mut u8,  // Points to same buffer as CPUs
    pub global_memory: *mut u8,  // GPU-specific allocations
    // ...
}
This allows:
  • Zero-copy data transfer between CPU and GPU
  • Unified addressing for textures, buffers, etc.
  • Direct GPU writes visible to CPU

Future Enhancements

Memory Management Unit (MMU)

Planned features:
  1. Virtual Address Translation: Separate virtual and physical address spaces
  2. Page Tables: Multi-level page table emulation
  3. TLB Simulation: Translation Lookaside Buffer for performance
  4. Memory Protection: Read-only pages, execute-disable bits

Memory Ordering

  1. Barriers: Implement ARMv8 memory barriers (DMB, DSB, ISB)
  2. Atomic Operations: LL/SC and atomic instructions
  3. Acquire/Release Semantics: Proper memory ordering guarantees

Memory Types

  1. Cacheable vs Uncacheable: Different memory regions
  2. Device Memory: Memory-mapped I/O
  3. Strongly-Ordered: For peripheral access

Memory Best Practices

For Emulator Developers

When working with shared memory:
  1. Always check memory bounds before access
  2. Use typed accessors (read_u32, write_u64) for alignment
  3. Be aware of endianness (ARM is little-endian)
  4. Don’t assume zero-initialization in the middle of memory

Memory Access Example

// Good: Typed access that keeps alignment and endianness correct
let value = core.read_u32(addr);
if value != 0 {
    core.write_u32(addr + 4, value + 1);
}

// Avoid: Raw pointer arithmetic
// let ptr = unsafe { memory_ptr.add(addr as usize) };
// unsafe { *(ptr as *mut u32) = value; }