Overview
Oboromi implements a unified memory architecture in which all 8 CPU cores and the GPU share a single 12GB address space. This mirrors the Nintendo Switch 2’s rumored configuration: 12GB of LPDDR5 RAM shared between CPU and GPU, echoing the original Switch’s unified memory design.
Memory Layout
Address Space
┌────────────────────────────────────────────────┐
│ 0x0000_0000_0000 │ ← Memory Base
├────────────────────────────────────────────────┤
│ │
│ General Purpose Memory │
│ │
│ (12GB - 8MB) │
│ │
│ • Heap allocations │
│ • Program code │
│ • Global/static data │
│ • GPU resources │
│ │
├────────────────────────────────────────────────┤
│ 0x0002_FF80_0000 │ ← Core 7 Stack
│ Core 7 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FF90_0000 │ ← Core 6 Stack
│ Core 6 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FFA0_0000 │ ← Core 5 Stack
│ Core 5 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FFB0_0000 │ ← Core 4 Stack
│ Core 4 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FFC0_0000 │ ← Core 3 Stack
│ Core 3 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FFD0_0000 │ ← Core 2 Stack
│ Core 2 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FFE0_0000 │ ← Core 1 Stack
│ Core 1 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FFF0_0000 │ ← Core 0 Stack
│ Core 0 Stack (1MB) │
└────────────────────────────────────────────────┘
0x0003_0000_0000 ← End (12GB)
Memory Configuration
Constants
// core/src/cpu/cpu_manager.rs:4
pub const CORE_COUNT: usize = 8;
pub const MEMORY_SIZE: u64 = 12 * 1024 * 1024 * 1024; // 12GB
pub const MEMORY_BASE: u64 = 0x0;
Architecture Requirement
Oboromi requires a 64-bit host architecture to address the full 12GB memory space. The build will fail on 32-bit systems.
// core/src/cpu/cpu_manager.rs:6
#[cfg(not(target_pointer_width = "64"))]
compile_error!("oboromi requires a 64-bit architecture to emulate 12GB of RAM.");
Memory Allocation
Initialization
The entire 12GB buffer is allocated at startup:
// core/src/cpu/cpu_manager.rs:21
let shared_memory = Pin::new(vec![0u8; MEMORY_SIZE as usize].into_boxed_slice());
let memory_ptr = shared_memory.as_ptr() as *mut u8;
Virtual Memory: Modern operating systems use lazy allocation. The 12GB allocation only reserves address space initially. Physical RAM is committed page-by-page as memory is written.
Memory Pinning
pub struct CpuManager {
    pub cores: Vec<UnicornCPU>,
    // Pin prevents reallocation from invalidating pointers
    pub shared_memory: Pin<Box<[u8]>>,
}
The memory is pinned using Pin<Box<[u8]>> to guarantee:
- The buffer never moves in memory
- Pointers passed to Unicorn remain valid
- No reallocation can occur
Shared Memory Mapping
Per-Core Mapping
Each CPU core maps the same physical buffer:
// core/src/cpu/unicorn_interface.rs:53
unsafe {
    emu.mem_map_ptr(
        0x0,                                 // Virtual address
        memory_size,                         // Size (12GB)
        Prot::ALL,                           // Read/Write/Execute
        memory_ptr as *mut std::ffi::c_void, // Host buffer
    )
    .ok()?;
}
This creates a unified address space where:
- Address 0x1000 on Core 0 points to the same byte as address 0x1000 on Core 7
- All cores see memory modifications immediately
- No explicit synchronization needed for memory reads/writes
GPU Memory Sharing
The GPU state also references the shared memory:
// core/src/gpu/mod.rs:40
pub struct State {
    pub shared_memory: *mut u8, // Same pointer as CPU cores
    pub global_memory: *mut u8, // Separate GPU-specific memory
    pub pc: u64,
    pub vk: VkState,
}
Stack Allocation
Per-Core Stacks
Each core receives 1MB of dedicated stack space at the top of the address space:
// core/src/cpu/unicorn_interface.rs:62
// Initialize stack pointer to end of memory, offset by core ID
// Give each core 1MB of stack space at the top of memory
let stack_top = memory_size - (core_id as u64 * 0x100000);
let _ = emu.reg_write(RegisterARM64::SP, stack_top);
Stack Addresses
| Core | Stack Top Address | Stack Bottom Address | Size |
|---|---|---|---|
| 0 | 0x3_0000_0000 | 0x2_FFF0_0000 | 1MB |
| 1 | 0x2_FFF0_0000 | 0x2_FFE0_0000 | 1MB |
| 2 | 0x2_FFE0_0000 | 0x2_FFD0_0000 | 1MB |
| 3 | 0x2_FFD0_0000 | 0x2_FFC0_0000 | 1MB |
| 4 | 0x2_FFC0_0000 | 0x2_FFB0_0000 | 1MB |
| 5 | 0x2_FFB0_0000 | 0x2_FFA0_0000 | 1MB |
| 6 | 0x2_FFA0_0000 | 0x2_FF90_0000 | 1MB |
| 7 | 0x2_FF90_0000 | 0x2_FF80_0000 | 1MB |
Stack grows downward from the top address. Core-specific offsets prevent stack collisions during concurrent execution.
Memory Access Patterns
Direct Memory Operations
The CPU wrapper provides typed memory access:
32-bit Operations
// core/src/cpu/unicorn_interface.rs:225
pub fn write_u32(&self, vaddr: u64, value: u32) {
    let mut emu = self.emu.lock().unwrap();
    let bytes = value.to_le_bytes();
    let _ = emu.mem_write(vaddr, &bytes);
}

pub fn read_u32(&self, vaddr: u64) -> u32 {
    let emu = self.emu.lock().unwrap();
    let mut bytes = [0u8; 4];
    if emu.mem_read(vaddr, &mut bytes).is_ok() {
        u32::from_le_bytes(bytes)
    } else {
        // Failed reads return 0; callers cannot distinguish
        // this from a stored zero.
        0
    }
}
64-bit Operations
// core/src/cpu/unicorn_interface.rs:243
pub fn write_u64(&self, vaddr: u64, value: u64) {
    let mut emu = self.emu.lock().unwrap();
    let bytes = value.to_le_bytes();
    let _ = emu.mem_write(vaddr, &bytes);
}

pub fn read_u64(&self, vaddr: u64) -> u64 {
    let emu = self.emu.lock().unwrap();
    let mut bytes = [0u8; 8];
    if emu.mem_read(vaddr, &mut bytes).is_ok() {
        u64::from_le_bytes(bytes)
    } else {
        0
    }
}
Endianness
All memory operations use little-endian byte order to match ARMv8:
let bytes = value.to_le_bytes(); // Convert to little-endian
u32::from_le_bytes(bytes) // Parse from little-endian
Memory Coherency
Automatic Coherency
Since all cores map the same host memory buffer, memory coherency is automatic. No cache simulation or coherency protocol is needed at the emulation level.
Test: Shared Memory Verification
// core/src/tests/multicore_test.rs:15
#[test]
fn test_shared_memory_access() {
    let manager = CpuManager::new();
    let core0 = manager.get_core(0).expect("Core 0 missing");
    let core1 = manager.get_core(1).expect("Core 1 missing");

    // Write value using Core 0
    let test_addr = 0x1000;
    let test_val = 0xDEADBEEF;
    core0.write_u32(test_addr, test_val);

    // Read value using Core 1 - sees the write immediately
    let read_val = core1.read_u32(test_addr);
    assert_eq!(read_val, test_val);
}
This test confirms that writes from one core are immediately visible to all other cores.
Memory Permissions
Unicorn Memory Protection
The entire address space is mapped with full permissions:
// Prot::ALL = Read | Write | Execute
emu.mem_map_ptr(0x0, memory_size, Prot::ALL, memory_ptr)
Currently, there is no memory protection or segmentation. Future versions may implement:
- Read-only code sections
- No-execute data pages
- Privileged/user mode separation
Memory Allocation Cost
| Operation | Time |
|---|---|
| Initial allocation | ~1-5ms (OS dependent) |
| First-touch per page | ~1-2μs (page fault) |
| Subsequent access | ~50-100ns (cache hit) |
Virtual Memory Benefits
- Low Initial Memory: Only allocates pages as written
- Fast Startup: No need to zero 12GB upfront
- Overcommit Friendly: OS can overcommit if host RAM < 12GB
Memory Bandwidth
Unicorn memory accesses go through:
Guest Read/Write → Unicorn API → Host Memory
Typical latency: 100-500ns per operation (includes Mutex lock overhead).
Memory Debugging
Initialization Verification
// core/src/tests/multicore_test.rs:6
#[test]
fn test_multicore_initialization() {
    let manager = CpuManager::new();
    assert_eq!(manager.cores.len(), 8);
    assert_eq!(manager.shared_memory.len() as u64, MEMORY_SIZE);
}
Memory Dumps
For debugging, you can dump memory regions:
// Not in current code, but useful for development
pub fn dump_memory(core: &UnicornCPU, addr: u64, size: usize) {
    let mut buffer = vec![0u8; size];
    let emu = core.emu.lock().unwrap();
    if emu.mem_read(addr, &mut buffer).is_ok() {
        for (i, chunk) in buffer.chunks(16).enumerate() {
            print!("{:08x}: ", addr + (i * 16) as u64);
            for byte in chunk {
                print!("{:02x} ", byte);
            }
            println!();
        }
    }
}
GPU Memory Integration
The GPU shares the same memory space:
// GPU can read/write CPU-visible memory
pub struct State {
    pub shared_memory: *mut u8, // Points to same buffer as CPUs
    pub global_memory: *mut u8, // GPU-specific allocations
    // ...
}
This allows:
- Zero-copy data transfer between CPU and GPU
- Unified addressing for textures, buffers, etc.
- Direct GPU writes visible to CPU
Future Enhancements
Memory Management Unit (MMU)
Planned features:
- Virtual Address Translation: Separate virtual and physical address spaces
- Page Tables: Multi-level page table emulation
- TLB Simulation: Translation Lookaside Buffer for performance
- Memory Protection: Read-only pages, execute-disable bits
Memory Ordering
- Barriers: Implement ARMv8 memory barriers (DMB, DSB, ISB)
- Atomic Operations: LL/SC and atomic instructions
- Acquire/Release Semantics: Proper memory ordering guarantees
Memory Types
- Cacheable vs Uncacheable: Different memory regions
- Device Memory: Memory-mapped I/O
- Strongly-Ordered: For peripheral access
Memory Best Practices
For Emulator Developers
When working with shared memory:
- Always check memory bounds before access
- Use typed accessors (read_u32, write_u64) for alignment
- Be aware of endianness (ARM is little-endian)
- Don’t assume memory is still zeroed once guest code has run (the buffer is zero-initialized only at startup)
Memory Access Example
// Good: Typed access with error handling
let value = core.read_u32(addr);
if value != 0 {
    core.write_u32(addr + 4, value + 1);
}

// Avoid: Raw pointer arithmetic
// let ptr = unsafe { memory_ptr.add(addr as usize) };
// unsafe { *(ptr as *mut u32) = value; }