Documentation

Bignum.Arm.Machine.Decode

ARM Instruction Decoding #

Decodes 32-bit ARM AArch64 instruction encodings into the Instruction type.

This implements functionality similar to HOL Light's decode function and ARM_MK_EXEC_RULE, allowing programs to be loaded from byte sequences as in s2n-bignum.

Main Definitions #

References #

Source: s2n-bignum/arm/proofs/decode.ml (HOL Light decode implementation) Source: s2n-bignum/arm/tutorial/simple.ml:22-37 (byte list representation)

Implementation Notes #

This MVP implementation supports:

Future extensions will add:

Bit Extraction Helpers #

ARM instructions are 32-bit little-endian values. Bits are numbered from LSB (bit 0) to MSB (bit 31).

def Bignum.Arm.extractBits (w : UInt32) (high low : ) :

Extract bits [high:low] (inclusive) from a 32-bit word.

Example: extractBits 0x8b000022 4 0 extracts bits 4-0, yielding 0x2 (register Rd).

Equations
Instances For
    def Bignum.Arm.getBit (w : UInt32) (n : ) :

    Extract a single bit at position n.

    Example: getBit 0x8b000022 31 extracts the sf bit (64-bit operation), yielding true.

    Equations
    Instances For

      Register Decoding #

      ARM uses 5-bit register fields that can encode:

      For the MVP, we treat register field 31 as SP uniformly.

      Convert a 5-bit register field to Reg type.

      Register encoding:

      • 0-30 → X0-X30 (general purpose registers)
      • 31 → SP (stack pointer)

      Note: ARM uses register 31 as either SP or XZR (zero register) depending on context. This MVP implementation always interprets it as SP.

      Equations
      Instances For

        Instruction Decoder #

        The decoder pattern-matches on instruction encoding bits following the ARM Architecture Reference Manual (ARM ARM) specifications.

        Each instruction type has a specific bit pattern:

        ADD/SUB (register):
          31    30  29  28-24   23-22 21 20-16     15-10      9-5    4-0
          sf=1  op  S=0 01011   00    0  Rm=5bits  000000     Rn     Rd
        
          sf=1: 64-bit operation (we only support 64-bit)
          op=0: ADD, op=1: SUB
          S=0: Don't set flags (plain ADD/SUB, not ADDS/SUBS)
          shift=00, sam=000000: No shift applied
        

        Source: ARM ARM section C3.5.1 (Data-processing - register) HOL Light: s2n-bignum/arm/proofs/decode.ml:165-350

        Decode a 32-bit ARM instruction encoding to an Instruction.

        Returns some instruction for valid encodings, none for:

        • Invalid/unrecognized instruction patterns
        • Instructions not yet implemented in this MVP
        • Instructions with unsupported operand combinations

        Examples:

        #eval decode 0x8b000022  -- some (ADD X2 X1 X0)
        #eval decode 0xcb010042  -- some (SUB X2 X2 X1)
        #eval decode 0x00000000  -- none (invalid)
        
        Equations
        • One or more equations did not get rendered due to their size.
        Instances For

          Byte List Decoding #

          ARM instructions are stored in memory as 4-byte sequences in little-endian format. The decodeBytes function converts a byte list into a list of Instructions.

          Little-endian layout:

          Bytes: [0x22, 0x00, 0x00, 0x8b] → Word: 0x8b000022
                 byte0  byte1  byte2  byte3        MSB ← LSB
          

          Decode a list of bytes (little-endian) into a list of Instructions.

          The function processes bytes 4 at a time, converting each 4-byte sequence into a 32-bit word (little-endian), then decoding the instruction.

          Stops at:

          • End of byte list
          • Invalid instruction (returns instructions decoded so far)
          • Incomplete instruction (< 4 bytes remaining)

          Example:

          #eval decodeBytes [0x22, 0x00, 0x00, 0x8b, 0x42, 0x00, 0x01, 0xcb]
          -- Expected: [ADD X2 X1 X0, SUB X2 X2 X1]
          
          Equations
          Instances For
            Equations
            Instances For

              Create a Program from a byte sequence.

              This is the primary interface for loading programs from byte representations, matching the HOL Light style used in s2n-bignum.

              Example usage (from Simple.lean):

              def simple_program_bytes : List UInt8 :=
                [0x22, 0x00, 0x00, 0x8b,  -- ADD X2, X1, X0
                 0x42, 0x00, 0x01, 0xcb]  -- SUB X2, X2, X1
              
              def simple_program (pc : Nat) : Program :=
                Program.fromBytes (Word64.ofNat pc) simple_program_bytes
              
              Equations
              Instances For

                Tests and Validation #

                These tests verify the decoder against known instruction encodings from the Simple.lean tutorial.

                Memory-Based Decoding #

                The arm_decode function reads an instruction from memory at a given PC. This corresponds to HOL Light's fetch-decode-execute model where programs live in memory.

                Read 4 bytes from memory starting at the given address. Returns none if any byte is uninitialized.

                Equations
                • One or more equations did not get rendered due to their size.
                Instances For

                  Decode an instruction from memory at the given PC.

                  This corresponds to HOL Light's arm_decode function which fetches and decodes an instruction from the machine state's memory.

                  The function:

                  1. Reads 4 bytes from memory at the PC (little-endian)
                  2. Combines them into a 32-bit instruction word
                  3. Decodes the word into an Instruction

                  Returns none if:

                  • Any of the 4 bytes at PC are uninitialized
                  • The instruction encoding is invalid or unsupported
                  Equations
                  • One or more equations did not get rendered due to their size.
                  Instances For
                    theorem Bignum.Arm.arm_decode_of_bytes_loaded (s : ArmState) (pc : Word64) (bytes : List UInt8) (h_loaded : bytes_loaded s.mem pc bytes) (h_len : bytes.length 4) (b0 b1 b2 b3 : UInt8) (h0 : bytes[0] = b0) (h1 : bytes[1] = b1) (h2 : bytes[2] = b2) (h3 : bytes[3] = b3) :
                    read4Bytes s pc = some (b0, b1, b2, b3)

                    If bytes are loaded at an address, read4Bytes succeeds.

                    TODO: Complete the proof. The key steps:

                    1. Extract the 4 bytes from h_loaded
                    2. Show they match the memory reads in read4Bytes

                    Helper Lemmas for Proofs #

                    These lemmas help bridge the gap between decoded byte sequences and manually constructed programs, making proofs more maintainable.

                    theorem Bignum.Arm.Program.fromBytes_base_addr (addr : Word64) (bytes : List UInt8) :
                    (fromBytes addr bytes).base_addr = addr

                    Program.fromBytes preserves the base address.

                    Program.fromBytes decodes bytes into the instruction list.