ARM Instruction Decoding #
Decodes 32-bit ARM AArch64 instruction encodings into the Instruction type.
This implements functionality similar to HOL Light's decode function and
ARM_MK_EXEC_RULE, allowing programs to be loaded from byte sequences as in
s2n-bignum.
Main Definitions #
extractBits- Extract a bit field from a 32-bit wordgetBit- Extract a single bitdecodeReg- Convert 5-bit register field to Reg typedecode- Decode a 32-bit instruction encoding to Option InstructiondecodeBytes- Decode a byte list (little-endian) into instruction listProgram.fromBytes- Create a Program from byte sequence
References #
Source: s2n-bignum/arm/proofs/decode.ml (HOL Light decode implementation) Source: s2n-bignum/arm/tutorial/simple.ml:22-37 (byte list representation)
Implementation Notes #
This MVP implementation supports:
- ADD/SUB (register, no shift, 64-bit only)
Future extensions will add:
- ADDS/SUBS/ADCS/SBCS (variants with flags)
- Logical operations (AND/ORR/EOR)
- MOVZ/MOV
- LSL/LSR
- LDR/STR (memory operations)
- Branches (B/BL/Bcond/CBZ/CBNZ)
Bit Extraction Helpers #
ARM instructions are 32-bit little-endian values. Bits are numbered from LSB (bit 0) to MSB (bit 31).
Extract bits [high:low] (inclusive) from a 32-bit word.
Example: extractBits 0x8b000022 4 0 extracts bits 4-0, yielding 0x2 (register Rd).
Equations
Instances For
Register Decoding #
ARM uses 5-bit register fields that can encode:
- 0-30: General purpose registers X0-X30
- 31: SP (stack pointer) or XZR (zero register), context-dependent
For the MVP, we treat register field 31 as SP uniformly.
Convert a 5-bit register field to Reg type.
Register encoding:
- 0-30 → X0-X30 (general purpose registers)
- 31 → SP (stack pointer)
Note: ARM uses register 31 as either SP or XZR (zero register) depending on context. This MVP implementation always interprets it as SP.
Equations
- Bignum.Arm.decodeReg bits = if h : bits.toNat < 31 then Bignum.Arm.Reg.X ⟨bits.toNat, h⟩ else Bignum.Arm.Reg.SP
Instances For
Instruction Decoder #
The decoder pattern-matches on instruction encoding bits following the ARM Architecture Reference Manual (ARM ARM) specifications.
Each instruction type has a specific bit pattern:
ADD/SUB (register):
31 30 29 28-24 23-22 21 20-16 15-10 9-5 4-0
sf=1 op S=0 01011 00 0 Rm=5bits 000000 Rn Rd
sf=1: 64-bit operation (we only support 64-bit)
op=0: ADD, op=1: SUB
S=0: Don't set flags (plain ADD/SUB, not ADDS/SUBS)
shift=00, sam=000000: No shift applied
Source: ARM ARM section C3.5.1 (Data-processing - register) HOL Light: s2n-bignum/arm/proofs/decode.ml:165-350
Decode a 32-bit ARM instruction encoding to an Instruction.
Returns some instruction for valid encodings, none for:
- Invalid/unrecognized instruction patterns
- Instructions not yet implemented in this MVP
- Instructions with unsupported operand combinations
Examples:
#eval decode 0x8b000022 -- some (ADD X2 X1 X0)
#eval decode 0xcb010042 -- some (SUB X2 X2 X1)
#eval decode 0x00000000 -- none (invalid)
Equations
- One or more equations did not get rendered due to their size.
Instances For
Byte List Decoding #
ARM instructions are stored in memory as 4-byte sequences in little-endian format. The decodeBytes function converts a byte list into a list of Instructions.
Little-endian layout:
Bytes: [0x22, 0x00, 0x00, 0x8b] → Word: 0x8b000022
byte0 byte1 byte2 byte3 MSB ← LSB
Decode a list of bytes (little-endian) into a list of Instructions.
The function processes bytes 4 at a time, converting each 4-byte sequence into a 32-bit word (little-endian), then decoding the instruction.
Stops at:
- End of byte list
- Invalid instruction (returns instructions decoded so far)
- Incomplete instruction (< 4 bytes remaining)
Example:
#eval decodeBytes [0x22, 0x00, 0x00, 0x8b, 0x42, 0x00, 0x01, 0xcb]
-- Expected: [ADD X2 X1 X0, SUB X2 X2 X1]
Equations
- Bignum.Arm.decodeBytes bytes = Bignum.Arm.decodeBytes.go bytes []
Instances For
Equations
- One or more equations did not get rendered due to their size.
- Bignum.Arm.decodeBytes.go bs acc = acc
Instances For
Create a Program from a byte sequence.
This is the primary interface for loading programs from byte representations, matching the HOL Light style used in s2n-bignum.
Example usage (from Simple.lean):
def simple_program_bytes : List UInt8 :=
[0x22, 0x00, 0x00, 0x8b, -- ADD X2, X1, X0
0x42, 0x00, 0x01, 0xcb] -- SUB X2, X2, X1
def simple_program (pc : Nat) : Program :=
Program.fromBytes (Word64.ofNat pc) simple_program_bytes
Equations
- Bignum.Arm.Program.fromBytes base_addr bytes = { base_addr := base_addr, instructions := Bignum.Arm.decodeBytes bytes }
Instances For
Tests and Validation #
These tests verify the decoder against known instruction encodings from the Simple.lean tutorial.
Memory-Based Decoding #
The arm_decode function reads an instruction from memory at a given PC.
This corresponds to HOL Light's fetch-decode-execute model where programs
live in memory.
Decode an instruction from memory at the given PC.
This corresponds to HOL Light's arm_decode function which fetches and decodes
an instruction from the machine state's memory.
The function:
- Reads 4 bytes from memory at the PC (little-endian)
- Combines them into a 32-bit instruction word
- Decodes the word into an
Instruction
Returns none if:
- Any of the 4 bytes at PC are uninitialized
- The instruction encoding is invalid or unsupported
Equations
- One or more equations did not get rendered due to their size.
Instances For
If bytes are loaded at an address, read4Bytes succeeds.
TODO: Complete the proof. The key steps:
- Extract the 4 bytes from h_loaded
- Show they match the memory reads in read4Bytes
Helper Lemmas for Proofs #
These lemmas help bridge the gap between decoded byte sequences and manually constructed programs, making proofs more maintainable.
Program.fromBytes decodes bytes into the instruction list.