Appearance
Abstract
This EIP proposes the addition of new optimized modular addition, subtraction and multiplication opcodes to the EVM. These support odd moduli up to 4096 bits in size.
Motivation
Benefits of the changes proposed in this EIP:
- enables elliptic curve arithmetic operations on various curves including BLS12-381 to be implemented as EVM contracts
- For operations on values up to 256bits in size, reduces gas cost per operation by 90-95% compared to the current
MULMOD
andADDMOD
opcodes. - for all cases where modexp precompile is useful, it could now be implemented as an EVM contract.
- enables substantial cost reductions for algebraic hash functions (e.g. MiMC/Poseidon), zkp verification in the EVM.
Specification
Overview
During contract execution, a contract calls a setup instruction SETUPX
, sourcing a modulus from a specified memory offset/size and computing several parameters used to speed up modular multiplication (referred to as "Montgomery" parameters). A zeroed memory space (whose size is a stack parameter passed to SETUPX
) is allocated separate from EVM memory.
The modulus, computed parameters and memory space are associated with the current call frame state and referred to as the active modulus state. If SETUPX
is called again to switch to a different modulus, the memory space and Montgomery parameters of the previous active modulus state remain allocated (the memory spaces of active/previously-active modulus state are separate).
New store and load opcodes STOREX
/LOADX
are used to copy multiples values to/from EVM memory and the memory space of the active modulus state.
Arithmetic is performed with ADDMODX
/SUBMODX
/MULMODX
opcodes which take and return no stack items, require a 3-byte immediate value appended to the opcode.
The immediate is interpreted as 3 1-byte values z
, x
, y
which are indexes to the array of EVMMAX values that comprise the memory space of the active modulus state.
An arithmetic operation is performed on inputs at index x
/y
placing the result in index z
.
Conventions
x === y % m
:x % m == y % m
pow(x, -1, m)
: The modular multiplicative inverse ofx
with respect to modulusm
.- Opcode definition syntax is formatted as
mneumonic {immediate - type} {immediate2 - type} ...: stack_arg_1, stack_arg_2, ...
where immediates are listed in the order that they proceed the opcode and stack arguments are ordered starting at the top of the stack. - In the provided pseudocode, it is assumed that opcode gas charging logic is executed prior to execution logic.
- Any exception thrown should immediately end the current execution frame and return to the caller.
Constants
Name | Value | Description |
---|---|---|
STOREX_BASE_GAS | 3 | base gas cost for STOREX opcode |
LOADX_BASE_GAS | 3 | base gas cost for LOADX opcode |
SETUPX_BASE_GAS | 3 | base gas cost for SETUPX opcode |
EVMMAX_MAX_MEM | 65,536 bytes | maximum amount of EVMMAX memory that can be used in a call frame |
MAX_MOD_SIZE | 4096 bits | tentative modulus size limit (can probably be removed because EVMMAX_MAX_MEM_SIZE effectively caps the modulus size) |
MULMODX_SUBQUADRATIC_START | 50 | modulus size in in multiples of 8 bytes where we switch to subquadratic mulmont cost model |
SYSTEM_WORD_SIZE_BITS | varies depending on the system | word size in bits of a client's CPU |
Context Variables
Name | Type | Meaning |
---|---|---|
evmmax_state | EVMMAXState | a variable representing ephemeral state which exists for the duration of the current call and in the scope of the current call frame |
evm_memory | bytes | EVM memory for the current call context |
expand_evm_memory | func(size_words: int) | expands EVM memory by size_words * 32 bytes |
cost_evm_memory_expansion | func(new_size_evm_words: int) -> int | EVM memory expansion cost function, modified according to this EIP |
evm_stack | object | Allows access to the stack via pop() and peek(n) which return int stack elements |
contract_code | bytes | code of the currently-executing contract |
pc | int | EVM program counter |
class EVMMAXState():
def __init__(self):
# ModState currently being used
self.active_mod_state = None
# a lookup of mod_id (int) -> ModState
self.mods = {}
class ModState():
def __init__(self, mod: int, num_vals_used: int, mod: int, r: int, r_squared: int, mod_inv_full=None, mod_inv=None):
self.mod = mod
# size (expressed in multiples of 8 bytes) needed to represent mod
self.val_size_multiplier = math.ceil(len(hex(mod)[2:]) / (2 * 8))
self.num_vals_used = num_vals_used
self.mod_inv = mod_inv
self.mod_inv_full = mod_inv_full
self.r = r
self.r_squared = r_squared
# a memory space of size num_vals_used * val_size_multiplier
self.values = [0] * self.num_vals_used
Helpers
# -----------------------------------------------------------------------------
# gas-charging helpers
def cost_precompute_mont(val_size_multiplier: int) -> int:
PRECOMPUTE_MONT_LO_GAS_A = ?
PRECOMPUTE_MONT_LO_GAS_B = ?
PRECOMPUTE_MONT_HI_GAS_A = ?
PRECOMPUTE_MONT_HI_GAS_B = ?
cost = 0
if val_size_multiplier < MULMODX_SUBQUADRATIC_START:
cost = math.ceil(PRECOMPUTE_MONT_LO_GAS_A * val_size_multiplier + \
PRECOMPUTE_MONT_LO_GAS_B)
else:
cost = math.ceil(PRECOMPUTE_MONT_HI_GAS_A * val_size_multiplier + \
PRECOMPUTE_MONT_HI_GAS_B)
return cost
def cost_addmodx(val_size_multiplier: int) -> int:
ADDMODX_GAS_A = 0.20
ADDMODX_GAS_B = 0.15
cost = 0
if val_size_multiplier == 6:
cost = 1
else:
cost = round(ADDMODX_GAS_A * limb_count + ADDMODX_GAS_B)
if cost == 0:
cost = 1
return cost
def cost_mulmodx(val_size_multiplier: int) -> int:
MULMODX_LO_GAS_A = 0.090
MULMODX_LO_GAS_B = 0
MULMODX_LO_GAS_C = 0.24
MULMODX_HI_GAS_A = 0
MULMODX_HI_GAS_B = 10.0
MULMODX_HI_GAS_C = -270.0
cost = 0
if val_size_multiplier == 6:
cost = 2
elif val_size_multiplier < MULMODX_SUBQUADRATIC_START:
cost = math.ceil(MULMODX_LO_GAS_A * (val_size_multiplier ** 2) + \
MULMODX_LO_GAS_B * val_size_multiplier + \
MULMODX_LO_GAS_C)
else:
cost = math.ceil(MULMODX_HI_GAS_A * val_size_multiplier ** 2 + \
MULMODX_HI_GAS_B * val_size_multiplier + \
MULMODX_HI_GAS_C)
if cost == 0:
cost = 1
return cost
# -----------------------------------------------------------------------------
# bigint helpers
# a bigint is a unsigned number represented as a list of unsigned system words in descending order of significance
# split a double-width value into hi/low words
def hi_lo(double_width_val: int) -> (int, int):
base = 2**SYSTEM_WORD_SIZE_BITS
assert double_width_val < base**SYSTEM_WORD_SIZE_BITS, "val must fit in two words"
return (double_width_val >> SYSTEM_WORD_SIZE_BITS) % base, double_width_val % base
def bigint_to_int(x: [int]) -> int:
res = 0
for i in reversed(range(len(x))):
res += x[i] * 2**(SYSTEM_WORD_BITS * (len(x) - i - 1))
return res
def int_to_bigint(x: int, word_count: int):
res = [0] * word_count
for i in range(word_count):
res[word_count - i - 1] = x & (2**SYSTEM_WORD_BITS - 1)
x >>= SYSTEM_WORD_BITS
return res
# return x - y (omitting borrow-out)
def bigint_sub(x: [int], y: [int]) -> [int]:
num_words = len(x)
res = [0] * num_words
c = 0
for i in reversed(range(num_words)):
c, res[i] = sub_with_borrow(x[i], y[i], c)
return res
# return x >= y
def bigint_gte(x: [int], y: [int]) -> bool:
for (x_word, y_word) in list(zip(x,y)):
if x_word > y_word:
return True
elif x_word < y_word:
return False
# x == y
return True
# CIOS Montgomery multiplication algorithm
#
# input:
# * x, y, mod - bigint inputs of `val_size_multiplier` length. the most significant limb of the modulus cannot be zero.
# * mod_inv - pow(-mod, -1, 2**SYSTEM_WORD_SIZE_BITS)
# requires:
# * x < mod and y < mod
# * mod_int % 2 != 0
# * mod[0] != 0
# returns:
# (x * y * pow(2**(SYSTEM_WORD_SIZE_BITS * val_size_multiplier), -1, mod)) % mod represented as a bigint
# note: references to x_int/y_int/mod_int/t_int refer to the python int representation of the corresponding bigint variable
def mulmont_quadratic(x: [int], y: [int], mod: [int], modinv: int) -> [int]:
assert len(x) == len(y) and len(y) == len(mod), "{}, {}, {}".format(x, y, mod)
assert mod[0] != 0, "modulus must occupy all words"
word_count = len(mod)
t = [0] * (word_count + 2)
for i in reversed(range(word_count)):
# first inner-loop: t <- t + x_int * y[i]
c = 0
for j in reversed(range(word_count)):
c, t[j + 2] = hi_lo(t[j + 2] + x[j] * y[i] + c)
t[0], t[1] = hi_lo(t[1] + c)
m = (modinv * t[-1]) % BASE
c, _ = hi_lo(m * mod[-1] + t[-1])
# second inner-loop:
# 1. t_int <- t_int + modinv * mod_int * t[-1]
# 2. t_int <- t_int // (2**SYSTEM_WORD_SIZE)
# note:
# after step 1:
# * modinv * mod_int * t[-1] === -1 % (2**SYSTEM_WORD_SIZE_BITS)
# * t_int === (t_int + (-1) t_int) % (2**SYSTEM_WORD_SIZE_BITS) === 0 % (2**SYSTEM_WORD_SIZE_BITS)
# so the shift in step 2 is a word-sized right shift.
# Steps 1 and 2 are combined and the shift is implict.
for j in reversed(range(1, word_count)):
c, t[j + 2] = hi_lo(t[j + 1] + mod[j - 1] * m + c)
hi, t[2] = hi_lo(t[1] + c)
t[1] = t[0] + hi
# t_int = (t_int + t_int * mod_int * pow(-(2**(SYSTEM_WORD_SIZE_BITS*len(mod))), -1, mod_int)) // (2 ** (len(mod) * SYSTEM_WORD_SIZE_BITS))
# 0 < t_int < 2 * mod_int
t = t[1:]
if t[0] != 0:
# result occupies len(mod) + 1 words so it must be greater than modulus
return bigint_sub(t, [0] + mod)[1:]
elif bigint_gte(t[1:], mod):
return bigint_sub(t[1:], mod)
else:
return t[1:]
# subquadratic mulmont: same general algorithm as mulmont_quadratic with the assumption
# that any multiplications will be performed using Karatsuba subquadratic multiplication algorithm
# input:
# x, y, mod (int) - x < mod and y < mod
# mod (int) - an odd modulus
# R (int) - a power of two, and greater than mod
# mod_inv (int) - pow(-mod, -1, R)
# output:
# (x * y * pow(R, -1, mod)) % mod
#
def mulmont_subquadratic(x: int, y: int, mod: int, mod_inv_full: int, R: int) -> int:
T = x * y
m = ((T % R) * mod_inv_full) % R
T = T + m * mod
T /= R
if T >= mod:
T -= mod
return T
def mulmont(mod_state: ModState, x: int, y: int) -> int:
if mod_state.val_size_multiplier >= MULMODX_SUBQUADRATIC_START:
return mulmont_subquadratic(x, y, mod_state.mod, mod_state.mod_inv)
else:
x_bigint = int_to_bigint(x, (mod_state.val_size_multiplier * 64) // SYSTEM_WORD_SIZE_BITS)
y_bigint = int_to_bigint(y, (mod_state.val_size_multiplier * 64) // SYSTEM_WORD_SIZE_BITS)
mod_bigint = int_to_bigint(mod_state.mod)
return bigint_to_int(mulmont_quadratic(x_bigint, y_bigint, mod_bigint, mod_state.mod_inv_full, mod_state.r))
New Opcodes
Mneumonic | Opcode | Immediate size (bytes) | Stack in | Stack out |
---|---|---|---|---|
SETUPX | 0x21 | 0 | 4 | 0 |
ADDMODX | 0x22 | 3 | 0 | 0 |
SUBMODX | 0x23 | 3 | 0 | 0 |
MULMODX | 0x24 | 3 | 0 | 0 |
LOADX | 0x25 | 0 | 3 | 0 |
STOREX | 0x26 | 0 | 3 | 0 |
SETUPX
SETUPX : mod_id, mod_offset, mod_size, vals_used
Gas Charging
mod_id = evm.stack.peek(0)
mod_offset = evm_stack.peek(1)
mod_size = evm_stack.peek(2)
vals_used = evm_stack.peek(3)
cost = SETUPX_BASE_GAS
if mod_id in evmmax_state.mods:
# the modulus state keyed by mod_id was already active in this call-frame.
# no additional charge beyond SETUPX_BASE_GAS
return
if vals_used > 256:
raise Exception("cannot use more than 256 values for a given mod_id")
if mod_offset + mod_size > len(evm_memory):
raise Exception("cannot load a modulus that would extend beyond the bounds of EVM memory")
val_size_multiplier = math.ceil(mod_size / 8)
cost += cost_precompute_mont(val_size_multiplier)
cost += cost_evm_memory_expansion(math.ceil((num_vals_used * val_size_multiplier * 8) / 32))
Execution
mod_id = stack.pop()
mod_offset = stack.pop()
mod_size = stack.pop()
vals_used = stack.pop()
mod_inv = None
if mod_id in evmmax_state.mods[mod_id]:
# this mod state was previously used in this call frame.
# the associated montgomery parameters and memory space are already allocated.
# mark mod_id as the current active modulus state
evmmax_state.active_mod_state = evmmax_state.mods[mod_id]
return
val_size_multiplier = math.ceil(mod_size / 8)
mod = int.from_bytes(evm_memory[mod_offset:mod_offset+val_size], byteorder='big')
if mod == 0 or mod % 2 == 0:
raise Exception("modulus must be nonzero and odd")
if val_size_multiplier >= MULMODX_SUBQUADRATIC_START:
mod_inv_full = pow(-r, -1, mod)
else:
mod_inv = pow(-mod, -1, 2**SYSTEM_WORD_SIZE_BITS)
r = 2**(SYSTEM_WORD_SIZE_BITS * val_size_multiplier)
r_squared = r**2 % mod
mod_state = ModState(mod, val_size, r, r_squared, mod_inv_full=mod_inv_full, mod_inv=mod_inv)
evmmax_state.mods[mod_id] = mod_state
evmmax_state.active_mod_state = mod_state
LOADX
LOADX: dst_offset, val_idx, num_vals
Description
Load EVMMAX values in the current active modulus state to EVM memory.
Gas Charging
cost = LOADX_BASE_GAS
dst_offset = evm_stack.peek(0)
val_idx = evm_stack.peek(1)
num_vals = evm_stack.peek(2)
val_size_multiplier = evmmax_state.active_mod_state.val_size_multiplier
if dst_offset + num_vals * val_size_multiplier > len(evm_memory):
cost += cost_evm_mem_expansion(evm_memory, (dst_offset + num_vals * val_size_multiplier) - len(evm_memory))
cost += cost_mulmodx(val_size_multiplier) * mod_state.num_vals
Execution
dst_offset = evm_stack.pop()
val_idx = evm_stack.pop()
num_vals = evm_stack.pop()
if num_vals == 0:
return
mod_state = evmmax_state.active_mod_state
if mod_state == None:
raise Exception("no modulus set")
if val_idx + num_vals > len(mod_state.vals):
raise Exception("attempt to load beyond allocated values")
if dst_offset + num_vals * mod_state.val_size_multiplier > len(evm_memory):
expand_evm_memory(evm_memory, (dst_offset + num_vals * mod_state.val_size_multiplier * 8) - len(evm_memory))
cur_dst_offset = dst_offset
for i in range(num_vals):
mont_val = mod_state.vals[start_val + i]
# convert the value to canonical form
val = mulmont(mod_state, mont_val, 1)
evm_memory[cur_dst_offset:cur_dst_offset + mod_state.val_size_multiplier] = val.to_bytes(mod_state.val_size_multiplier * 8, byteorder='big')
cur_dst_offset += mod_state.val_size_multiplier * 8
STOREX
STOREX: dst_val, offset, num_vals
Description
Store values from EVM memory into EVMMAX memory space of the current active modulus state, validating that they are reduced by the modulus.
Gas Charging
dst_val = evm_stack.peek(0)
offset = evm_stack.peek(1)
num_vals = evm_stack.peek(2)
val_size_multiplier = evmmax_state.active_mod_state.val_size_multiplier
cost = STOREX_BASE_COST + num_vals * cost_mulmodx(val_size_multiplier)
Execution
dst_val = evm_stack.pop()
offset = evm_stack.pop()
num_vals = evm_stack.pop()
if num_vals == 0:
return
mod_state = evmmax_state.active_mod_state
if mod_state == None:
raise Exception("no modulus set")
if dst_val + num_vals > len(mod_state.vals):
raise Exception("attempt to copy to destination beyond allocated values")
if offset + num_vals * mod_state.val_size_multiplier * 8 > len(evm_memory):
raise Exception("source of copy would extend beyond allocated memory")
cur_src_offset = offset
r = 2** (mod_state.val_size_multiplier * SYSTEM_WORD_SIZE_BITS) % mod_state.mod
r_squared = r ** 2 % mod_state.mod
for i in range(num_vals):
val = int.from_bytes(evm_memory[cur_src_offset:cur_src_offset + mod_state.val_size_multiplier * 8], byteorder='big')
if val >= mod_state.modulus:
raise Exception("values cannot be greater than the modulus")
# convert the value to Montgomery form
mont_val = mulmont(mod_state, val, mod_state.r_squared)
mod_state.vals[dst_val + i] = mont_val
cur_offset += mod_state.val_size_multiplier * 8
ADDMODX
ADDMODX {z_offset - byte}, {x_offset - byte}, {y_offset - byte}:
Description
Compute the modular addition of two EVMMAX values, storing the result in an output.
Gas Charging
val_size_multiplier = evmmax_state.active_mod_state.val_size_multiplier
cost = cost_addmodx(val_size_multiplier)
Execution
mod_state = evmmax_state.active_modulus
if mod_state == None:
raise Exception("no mod state set")
z_offset = int(contract_code[pc+1:pc+2])
x_offset = int(contract_code[pc+2:pc+3])
y_offset = int(contract_code[pc+3:pc+4])
if x_offset >= mod_state.num_vals_used or y_offset >= mod_state.num_vals_used or z_offset >= mod_state.num_vals_used:
raise Exception("out of bounds value reference")
mod_state.values[z_offset] = (mod_state.values[x_offset] + mod_state.values[y_offset]) % mod_state.mod
SUBMODX
SUBMODX {z_offset - byte}, {x_offset - byte}, {y_offset - byte}:
Description
Compute the modular subtraction of two EVMMAX values in the current active modulus state, storing the result in an output.
Gas Charging
Same as ADDMODX
.
Execution
mod_state = evmmax_state.active_modulus
if mod_state == None:
raise Exception("no mod state set")
z_offset = int(contract_code[pc+1:pc+2])
x_offset = int(contract_code[pc+2:pc+3])
y_offset = int(contract_code[pc+3:pc+4])
if x_offset >= mod_state.num_vals_used or y_offset >= mod_state.num_vals_used or z_offset >= mod_state.num_vals_used:
raise Exception("out of bounds value reference")
mod_state.values[z_offset] = (mod_state.values[x_offset] - mod_state.values[y_offset]) % mod_state.mod
MULMODX
MULMODX {z_offset - byte}, {x_offset - byte}, {y_offset - byte}:
Description
Compute the Montgomery modular multiplication of two EVMMAX values in the current active modulus state, storing the result in an output.
Gas Charging
val_size_multiplier = evmmax_state.active_mod_state.val_size_multiplier
cost = cost_mulmodx(val_size_multiplier)
Execution
mod_state = evmmax_state.active_modulus
if mod_state == None:
raise Exception("no mod state set")
z_offset = int(contract_code[pc+1:pc+2])
x_offset = int(contract_code[pc+2:pc+3])
y_offset = int(contract_code[pc+3:pc+4])
if x_offset >= mod_state.num_vals_used or y_offset >= mod_state.num_vals_used or z_offset >= mod_state.num_vals_used:
raise Exception("out of bounds value reference")
mod_state.values[z_offset] = mulmont(mod_state, mod_state.values[x_offset], mod_state.values[y_offset])
Changes to Contract Execution
EVM Memory Expansion Cost Function
Any EVM operation which expands memory x
bytes will charge to expand memory to cur_evm_mem_size + x + evmmax_mem_size
bytes where evmmax_mem_size
is the size of all allocated EVMMAX values in the current call context (the sum of the values used by each mod_id
that has been previously/currently set with SETUPX
).
Jumpdest Analysis
Jumpdest analysis is modified to disallow jumps into immediate data for ADDMDOX
/SUBMODX
/MULMODX
.
Rationale
Montgomery Modular Multiplication
EVMMAX values are stored internally in Montgomery form. Expressing values in Montgomery form enables the use of Montgomery reduction in modular multiplication which gives a substantial performance gain versus naive modular multiplication.
Modular addition and subtraction on Montgomery form values is computed the same as normal.
Memory Alignment for EVMMAX Values
LOADX
/STOREX
move 64bit-aligned big-endian values to/from the memory space of the active modulus state. SETUPX
memory expansion pricing is tuned to assume that values will be stored in a as 64bit-aligned values in their EVMMAX memory space.
This choice is made to keep EVMMAX memory aligned to ensure performance.
Gas Costs
Gas models assume a rate of 1 gas per 25ns of execution time.
ADDMODX/SUBMODX/MULMODX
ADDMODX
and SUBMODX
can each be implemented using a single extended-precision addition, and single extended precision subtraction. This justifies a linear cost model.
MULMODX
runtime scales quadratically with input size. After a certain threshold, the quadratic complexity of mulmont_quadratic
dominates and it becomes more performant to use mulmont_subquadratic
. Thus, there is a segmented cost model to reflect different asymptotic behavior between quadratic/subquadratic mulmont
.
ADDMODX
/SUBMODX
/MULMODX
pricing includes the cost of arithmetic and latency of accessing input values from CPU cache.
The price model assumes that the implementation will be generic for most bitwidths with the exception of 321-384bits which is priced aggressively.
LOADX/STOREX
These perform conversion to/from Montgomery and canonical forms for each value copied (a single mulmont
per value converted). The overhead of memory loading/copying is covered by cost_mulmontx
.
SETUPX
Backwards Compatibility
Jumpdest analysis changes in ths EIP could potentially break existing contracts where a jump destination occurs in the 3 bytes proceeding a 0x22
/0x23
/0x24
. This is unlikely to affect many existing contracts. Further analysis of deployed contract bytecode can determine with certainty, which (if any) contracts could be broken.
Security Considerations
Copyright
Copyright and related rights waived via CC0.