Overview
Oboromi implements a unified memory architecture in which all 8 CPU cores and the GPU share a single 12GB address space. This mirrors the Nintendo Switch 2’s rumored configuration: 12GB of LPDDR5 RAM shared between CPU and GPU, echoing the original Switch’s unified memory design.
Memory Layout
Address Space
┌────────────────────────────────────────────────┐
│ 0x0000_0000_0000 │ ← Memory Base
├────────────────────────────────────────────────┤
│ │
│ General Purpose Memory │
│ │
│ (12GB - 8MB) │
│ │
│ • Heap allocations │
│ • Program code │
│ • Global/static data │
│ • GPU resources │
│ │
├────────────────────────────────────────────────┤
│ 0x0002_FF80_0000 │ ← Core 7 Stack
│ Core 7 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FF90_0000 │ ← Core 6 Stack
│ Core 6 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FFA0_0000 │ ← Core 5 Stack
│ Core 5 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FFB0_0000 │ ← Core 4 Stack
│ Core 4 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FFC0_0000 │ ← Core 3 Stack
│ Core 3 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FFD0_0000 │ ← Core 2 Stack
│ Core 2 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FFE0_0000 │ ← Core 1 Stack
│ Core 1 Stack (1MB) │
├────────────────────────────────────────────────┤
│ 0x0002_FFF0_0000 │ ← Core 0 Stack
│ Core 0 Stack (1MB) │
└────────────────────────────────────────────────┘
0x0003_0000_0000 ← End (12GB)
Memory Configuration
Constants
// core/src/cpu/cpu_manager.rs:4
pub const CORE_COUNT: usize = 8;
pub const MEMORY_SIZE: u64 = 12 * 1024 * 1024 * 1024; // 12GB
pub const MEMORY_BASE: u64 = 0x0;
Architecture Requirement
Oboromi requires a 64-bit host architecture to address the full 12GB memory space. The build will fail on 32-bit systems.
// core/src/cpu/cpu_manager.rs:6
#[cfg(not(target_pointer_width = "64"))]
compile_error!("oboromi requires a 64-bit architecture to emulate 12GB of RAM.");
Memory Allocation
Initialization
The entire 12GB buffer is allocated at startup:
// core/src/cpu/cpu_manager.rs:21
let shared_memory = Pin::new(vec![0u8; MEMORY_SIZE as usize].into_boxed_slice());
let memory_ptr = shared_memory.as_ptr() as *mut u8;
Virtual Memory: Modern operating systems use lazy allocation. The 12GB allocation only reserves address space initially. Physical RAM is committed page-by-page as memory is written.
Memory Pinning
pub struct CpuManager {
    pub cores: Vec<UnicornCPU>,
    // Pin prevents reallocation from invalidating pointers
    pub shared_memory: Pin<Box<[u8]>>,
}
The memory is pinned using Pin<Box<[u8]>> to guarantee:
- The buffer never moves in memory
- Pointers passed to Unicorn remain valid
- No reallocation can occur
Shared Memory Mapping
Per-Core Mapping
Each CPU core maps the same physical buffer:
// core/src/cpu/unicorn_interface.rs:53
unsafe {
    emu.mem_map_ptr(
        0x0,                                 // Virtual address
        memory_size,                         // Size (12GB)
        Prot::ALL,                           // Read/Write/Execute
        memory_ptr as *mut std::ffi::c_void, // Host buffer
    )
    .ok()?;
}
This creates a unified address space where:
- Address 0x1000 on Core 0 points to the same byte as address 0x1000 on Core 7
- All cores see memory modifications immediately
- No explicit synchronization needed for memory reads/writes
GPU Memory Sharing
The GPU state also references the shared memory:
// core/src/gpu/mod.rs:40
pub struct State {
    pub shared_memory: *mut u8, // Same pointer as CPU cores
    pub global_memory: *mut u8, // Separate GPU-specific memory
    pub pc: u64,
    pub vk: VkState,
}
Stack Allocation
Per-Core Stacks
Each core receives 1MB of dedicated stack space at the top of the address space:
// core/src/cpu/unicorn_interface.rs:62
// Initialize stack pointer to end of memory, offset by core ID
// Give each core 1MB of stack space at the top of memory
let stack_top = memory_size - (core_id as u64 * 0x100000);
let _ = emu.reg_write(RegisterARM64::SP, stack_top);
Stack Addresses
| Core | Stack Top Address | Stack Bottom Address | Size |
|---|---|---|---|
| 0 | 0x3_0000_0000 | 0x2_FFF0_0000 | 1MB |
| 1 | 0x2_FFF0_0000 | 0x2_FFE0_0000 | 1MB |
| 2 | 0x2_FFE0_0000 | 0x2_FFD0_0000 | 1MB |
| 3 | 0x2_FFD0_0000 | 0x2_FFC0_0000 | 1MB |
| 4 | 0x2_FFC0_0000 | 0x2_FFB0_0000 | 1MB |
| 5 | 0x2_FFB0_0000 | 0x2_FFA0_0000 | 1MB |
| 6 | 0x2_FFA0_0000 | 0x2_FF90_0000 | 1MB |
| 7 | 0x2_FF90_0000 | 0x2_FF80_0000 | 1MB |
Stack grows downward from the top address. Core-specific offsets prevent stack collisions during concurrent execution.
Memory Access Patterns
Direct Memory Operations
The CPU wrapper provides typed memory access:
32-bit Operations
// core/src/cpu/unicorn_interface.rs:225
pub fn write_u32(&self, vaddr: u64, value: u32) {
    let mut emu = self.emu.lock().unwrap();
    let bytes = value.to_le_bytes();
    let _ = emu.mem_write(vaddr, &bytes);
}

pub fn read_u32(&self, vaddr: u64) -> u32 {
    let emu = self.emu.lock().unwrap();
    let mut bytes = [0u8; 4];
    if emu.mem_read(vaddr, &mut bytes).is_ok() {
        u32::from_le_bytes(bytes)
    } else {
        // Failed reads return 0; callers cannot distinguish
        // this from a stored zero.
        0
    }
}
64-bit Operations
// core/src/cpu/unicorn_interface.rs:243
pub fn write_u64(&self, vaddr: u64, value: u64) {
    let mut emu = self.emu.lock().unwrap();
    let bytes = value.to_le_bytes();
    let _ = emu.mem_write(vaddr, &bytes);
}

pub fn read_u64(&self, vaddr: u64) -> u64 {
    let emu = self.emu.lock().unwrap();
    let mut bytes = [0u8; 8];
    if emu.mem_read(vaddr, &mut bytes).is_ok() {
        u64::from_le_bytes(bytes)
    } else {
        0
    }
}
Endianness
All memory operations use little-endian byte order to match ARMv8:
let bytes = value.to_le_bytes(); // Convert to little-endian
u32::from_le_bytes(bytes) // Parse from little-endian
Memory Coherency
Automatic Coherency
Since all cores map the same host memory buffer, memory coherency is automatic. No cache simulation or coherency protocol is needed at the emulation level.
Test: Shared Memory Verification
// core/src/tests/multicore_test.rs:15
#[test]
fn test_shared_memory_access() {
    let manager = CpuManager::new();
    let core0 = manager.get_core(0).expect("Core 0 missing");
    let core1 = manager.get_core(1).expect("Core 1 missing");

    // Write value using Core 0
    let test_addr = 0x1000;
    let test_val = 0xDEADBEEF;
    core0.write_u32(test_addr, test_val);

    // Read value using Core 1 - sees the write immediately
    let read_val = core1.read_u32(test_addr);
    assert_eq!(read_val, test_val);
}
This test confirms that writes from one core are immediately visible to all other cores.
Memory Permissions
Unicorn Memory Protection
The entire address space is mapped with full permissions:
// Prot::ALL = Read | Write | Execute
emu.mem_map_ptr(0x0, memory_size, Prot::ALL, memory_ptr)
Currently, there is no memory protection or segmentation. Future versions may implement:
- Read-only code sections
- No-execute data pages
- Privileged/user mode separation
Memory Allocation Cost
| Operation | Time |
|---|---|
| Initial allocation | ~1-5ms (OS dependent) |
| First-touch per page | ~1-2μs (page fault) |
| Subsequent access | ~50-100ns (cache hit) |
Virtual Memory Benefits
- Low Initial Memory: Only allocates pages as written
- Fast Startup: No need to zero 12GB upfront
- Overcommit Friendly: OS can overcommit if host RAM < 12GB
Memory Bandwidth
Unicorn memory accesses go through:
Guest Read/Write → Unicorn API → Host Memory
Typical latency: 100-500ns per operation (includes Mutex lock overhead).
Memory Debugging
Initialization Verification
// core/src/tests/multicore_test.rs:6
#[test]
fn test_multicore_initialization() {
    let manager = CpuManager::new();
    assert_eq!(manager.cores.len(), 8);
    assert_eq!(manager.shared_memory.len() as u64, MEMORY_SIZE);
}
Memory Dumps
For debugging, you can dump memory regions:
// Not in current code, but useful for development
pub fn dump_memory(core: &UnicornCPU, addr: u64, size: usize) {
    let mut buffer = vec![0u8; size];
    let emu = core.emu.lock().unwrap();
    if emu.mem_read(addr, &mut buffer).is_ok() {
        for (i, chunk) in buffer.chunks(16).enumerate() {
            print!("{:08x}: ", addr + (i * 16) as u64);
            for byte in chunk {
                print!("{:02x} ", byte);
            }
            println!();
        }
    }
}
GPU Memory Integration
The GPU shares the same memory space:
// GPU can read/write CPU-visible memory
pub struct State {
    pub shared_memory: *mut u8, // Points to same buffer as CPUs
    pub global_memory: *mut u8, // GPU-specific allocations
    // ...
}
This allows:
- Zero-copy data transfer between CPU and GPU
- Unified addressing for textures, buffers, etc.
- Direct GPU writes visible to CPU
Future Enhancements
Memory Management Unit (MMU)
Planned features:
- Virtual Address Translation: Separate virtual and physical address spaces
- Page Tables: Multi-level page table emulation
- TLB Simulation: Translation Lookaside Buffer for performance
- Memory Protection: Read-only pages, execute-disable bits
Memory Ordering
- Barriers: Implement ARMv8 memory barriers (DMB, DSB, ISB)
- Atomic Operations: LL/SC and atomic instructions
- Acquire/Release Semantics: Proper memory ordering guarantees
Memory Types
- Cacheable vs Uncacheable: Different memory regions
- Device Memory: Memory-mapped I/O
- Strongly-Ordered: For peripheral access
Memory Best Practices
For Emulator Developers
When working with shared memory:
- Always check memory bounds before access
- Use typed accessors (read_u32, write_u64) for alignment
- Be aware of endianness (ARM is little-endian)
- Don’t assume memory is still zeroed once guest code has run (the buffer is zero-initialized only at startup)
Memory Access Example
// Good: Typed access with error handling
let value = core.read_u32(addr);
if value != 0 {
    core.write_u32(addr + 4, value + 1);
}

// Avoid: Raw pointer arithmetic
// let ptr = unsafe { memory_ptr.add(addr as usize) };
// unsafe { *(ptr as *mut u32) = value; }