Security Analysis, ppad-secp256k1.

Abstract

A formalization and analysis of the security model established by the ppad-secp256k1 library.

Overview

ppad-secp256k1 is a Haskell library supporting primitives and digital signature schemes on the elliptic curve secp256k1, in the spirit of the well-known libsecp256k1 or noble-secp256k1 libraries. Its focus is on achieving the maximum security, speed, and ergonomics of use possible in the context of a garbage-collected language with an optimizing compiler.

This analysis builds on those performed for ppad-fixed, ppad-sha{256,512}, and ppad-hmac-drbg. Please refer to those for additional explanatory details on methodology, etc.

Threat Model

This analysis assumes the same adversary as was supposed for ppad-fixed, viz. one capable of:

invoking library functions with chosen inputs an arbitrary number of times,
measuring execution time with nanosecond precision,
observing cache state, and
observing memory allocation counts and sizes.

As a purely software implementation, a trusted execution environment at the physical layer is assumed; considered out-of-scope are adversaries who can perform attacks over power or electromagnetic emanation-related side channels, speculative execution attacks, and so on.

Of note: the final capability granted to the adversary in the analysis for ppad-hmac-drbg, namely the ability to read arbitrary post-hoc heap state, is withheld here. That capability was granted specifically in that analysis to make guarantees about the effectiveness of wiping DRBG internal state, but an adversary who could read anything that was ever allocated on the heap would trivially be able to recover secret key material in this setting, since the 256-bit secret keys used here are, in general, boxed and heap-allocated.

Security Property

The security property to be demonstrated here is the same as was defended in the analysis for ppad-fixed, adapted to the secp256k1 setting: namely that elliptic curve arithmetic, digital signature construction, and Diffie-Hellman operations run in constant time unless marked otherwise.

(N.b., in some situations, e.g. in signature verification, only public data is operated on, and so the use of variable-time EC scalar multiplication is perfectly acceptable.)

Again, this property can be stated formally: the observable execution trace t(f, x) procured by an attacker when evaluating a sensitive function f on inputs x, where the trace consists of the sequence of instructions executed, memory addresses accessed, and the size and count of allocations made, must be indistinguishable with respect to secret inputs. For secret inputs s0 and s1, and any public input p, the claim is, up to renaming of base pointers and stack addresses, and events of negligible probability, that:

  t(f, (s0, p)) == t(f, (s1, p)).

Analysis

The analysis performed here is specifically for ppad-secp256k1 v0.5.4, compiled with optimizations using the LLVM backend on GHC 9.10.3 and LLVM 19.1.7, targeting aarch64-darwin.

To be examined are the elliptic curve primitives, signature construction functions, and elliptic curve Diffie-Hellman function, first by way of representation and source examination, and then by LLM-assisted STG and assembly analysis. Empirical benchmarks and allocation measurements will then be shown to be unable to refute the constant-time claim.

Note that whilst the security property to be defended is stated formally, this analysis purposefully argues for it as more of an “exceptionally rigorous sanity check” than an exhaustive proof, which would require considerably more effort to construct.

Elliptic Curve Primitives

Points on secp256k1 are represented in terms of both affine and projective coordinates, where individual coordinates in both cases are 256-bit Montgomery-form words provided by ppad-fixed. Each is boxed and heap-allocated:

  -- curve point, affine coordinates
  data Affine = Affine !C.Montgomery !C.Montgomery
    deriving stock (Show, Generic)

  -- curve point, projective coordinates
  data Projective = Projective {
      px :: !C.Montgomery
    , py :: !C.Montgomery
    , pz :: !C.Montgomery
    }
    deriving stock (Show, Generic)

The functions in the library’s exported API only work on these boxed point representations, but, to avoid heap allocation, most internal computations are carried out on raw unboxed tuples:

  -- Unboxed Wider/Montgomery synonym.
  type Limb4 = (# Limb, Limb, Limb, Limb #)

  -- Unboxed Projective synonym.
  type Proj = (# Limb4, Limb4, Limb4 #)

At the algorithmic level, elliptic curve primitives are carefully written so as not to depend on inputs, unless marked otherwise. Point addition is carried out in terms of “Algorithm 7” of Renes et al., 2015, which runs in constant time, and scalar multiplication is implemented using a straightforward constant-time double-and-add algorithm. A constant-time 'select_proj#' function is the CT workhorse used in the latter, itself wrapping constant-time utilities supplied by ppad-fixed:

  select_proj# :: Proj -> Proj -> CT.Choice -> Proj
  select_proj# (# ax, ay, az #) (# bx, by, bz #) c =
    (# W.select# ax bx c, W.select# ay by c, W.select# az bz c #)
  {-# INLINE select_proj# #-}

The library also supports wNAF windowed scalar multiplication, which involves using a precomputed “context” containing a number of multiples of the secp256k1 generator point as a lookup table. Its implementation, which is relatively complex, requires particular care, as we require that memory access patterns not depend on secrets. It makes use of the following low-level indexing utility that ensures that lookups in the provided ByteArray context are performed in constant time, and with input-independent access:

  -- Constant-time table lookup within a window.
  --
  -- Unconditionally scans all entries from 'base' to 'base + size - 1',
  -- selecting the one where 'index' equals 'target'.
  ct_index_proj#
    :: ByteArray
    -> Exts.Word#  -- ^ base index
    -> Exts.Word#  -- ^ size of window
    -> Exts.Word#  -- ^ target index
    -> Proj
  ct_index_proj# arr base size target = loop 0## (# Z, Z, Z #) where
    loop i acc
      | Exts.isTrue# (i `Exts.geWord#` size) = acc
      | otherwise =
          let !idx  = Exts.plusWord# base i
              !pt   = index_proj# arr (Exts.word2Int# idx)
              !eq   = CT.from_word_eq# idx target
              !nacc = select_proj# acc pt eq
          in  loop (Exts.plusWord# i 1##) nacc
  {-# INLINE ct_index_proj# #-}

In the same way as was described in the analysis for ppad-fixed, the STG and ASM can be audited to verify that no variable time patterns or instructions are present in the IR or compile targets. The compiled STG and ASM for this library are both very large and complex (the STG is 100k lines, and the ASM over 200k), and so are productively examined via an LLM with manual spot checks. A pass made via Opus 4.5 produces the following, for example:

  Function: add_proj
  STG Pattern: No secret-dependent branching
  ASM Pattern: Arithmetic only
  Status: CT
  ────────────────────────────────────────
  Function: double
  STG Pattern: No secret-dependent branching
  ASM Pattern: Arithmetic only
  Status: CT
  ────────────────────────────────────────
  Function: neg
  STG Pattern: No secret-dependent branching
  ASM Pattern: Arithmetic only
  Status: CT
  ────────────────────────────────────────
  Function: mul
  STG Pattern: xor#/and#/xor# branchless select
  ASM Pattern: csel, cset, csinc
  Status: CT
  ────────────────────────────────────────
  Function: mul_wnaf
  STG Pattern: Branchless CT table lookup via ct_index_proj#
  ASM Pattern: Masked indexing
  Status: CT

The ‘mul’ STG, for example, is approximately 5500 lines of very heavily inlined code, but is easily seen to match the claim, consisting almost entirely of expressions like the following:

  [..]
  case plusWord2# [s469 sat] of wild781 {
  (#,#) _ s470 ->
  case mul# s464 s466 s468 s470 s429 s431 s433 s435 of y3b {
  (#,,,#) ipv174 ipv175 ipv176 ipv177 ->
  case plusWord2# [ipv168 ipv174] of wild782 {
  (#,#) c471 s471 ->
  case plusWord2# [ipv169 ipv175] of wild783 {
  (#,#) c472 s472 ->
  case plusWord2# [s472 c471] of wild784 {
  (#,#) c473 s473 ->
  case plusWord2# [ipv170 ipv176] of wild785 {
  (#,#) c474 s474 ->
  case or# [c472 c473] of sat {
  __DEFAULT ->
  case plusWord2# [s474 sat] of wild786 {
  (#,#) c475 s475 ->
  case plusWord2# [ipv171 ipv177] of wild787 {
  (#,#) c476 s476 ->
  case or# [c474 c475] of sat {
  [..]

Its CT selection machinery in terms of bitwise XOR/AND operations is also clearly visible, though large and inlined:

  [..]
  case or# [sat sat] of sat {
  __DEFAULT ->
  case xor# [s351 us] of sat {
  __DEFAULT ->
  case and# [ipv182 sat] of sat {
  __DEFAULT ->
  case xor# [s351 sat] of sat {
  __DEFAULT ->
  case xor# [s349 us] of sat {
  __DEFAULT ->
  case and# [ipv182 sat] of sat {
  __DEFAULT ->
  [..]

The same patterns are visible in the aarch64 assembly, where branchless selection occurs via csel/cset/eor/and sequences. On the other hand, Opus 4.5 is able to catch that variable-time scalar multiplication, for example, does not run in constant time based on its ASM, pointing out the presence of the ‘cbz’ (for compare and branch on zero) and ‘tbz’ (for test bit and branch if zero) instructions in the following loop body of 'mul_vartime':

  __blk_cGeg$def:                         ; @"_blk_cGeg$def"
  ; %bb.0:                                ; %nGQY
    stp x22, x20, [sp]
    ldp x8, x9, [x20, #192]
    str x8, [sp, #88]
    ldp x10, x11, [x20, #208]
    orr x10, x10, x11
    orr x8, x8, x9
    orr x8, x8, x10
    cbz x8, LBB905_3
  ; %bb.1:                                ; %cGex
    ldr x11, [x20, #96]
    ldr x8, [sp, #8]
    ldp x9, x10, [x8, #104]
    stp x9, x11, [sp, #72]
    ldr x9, [x8, #120]
    stp x9, x10, [sp, #56]
    ldrb  w9, [sp, #88]
    tbz w9, #0, LBB905_4

Opus 4.5 also audits the 'select_proj' primitive as running in constant time:

  Function: select_proj
  STG Pattern: Calls W.select#
  ASM Pattern: Branchless via ppad-fixed
  Status: CT

Signature Algorithms

ppad-secp256k1 supports both BIP340 Schnorr signatures and RFC6979 ECDSA (the latter with support for both unrestricted and low-S signatures). Hashing functionality is provided by ppad-sha256, and DRBG functionality (for deterministic nonce generation) is provided by ppad-hmac-drbg.

Schnorr signature creation is implemented via several wrappers over the following core:

  _sign_schnorr
    :: (Wider -> Maybe Projective)  -- partially-applied multiplication function
    -> Wider                        -- secret key
    -> BS.ByteString                -- message
    -> BS.ByteString                -- 32 bytes of auxiliary random data
    -> Maybe BS.ByteString
  _sign_schnorr _mul _SECRET m a = do
    p <- _mul _SECRET
    let Affine (C.retr -> x_p) (C.retr -> y_p) = affine p
        s       = S.to _SECRET
        d       = S.select s (negate s) (W.odd y_p)
        bytes_d = unroll32 (S.retr d)
        bytes_p = unroll32 x_p
        t       = xor bytes_d (hash_aux a)
        rand    = hash_nonce (t <> bytes_p <> m)
        k'      = S.to (unsafe_roll32 rand)
    guard (not (S.eq_vartime k' 0)) -- negligible probability
    pt <- _mul (S.retr k')
    let Affine (C.retr -> x_r) (C.retr -> y_r) = affine pt
        k         = S.select k' (negate k') (W.odd y_r)
        bytes_r   = unroll32 x_r
        rand'     = hash_challenge (bytes_r <> bytes_p <> m)
        e         = S.to (unsafe_roll32 rand')
        bytes_ked = unroll32 (S.retr (k + e * d))
        sig       = bytes_r <> bytes_ked
    pure $! sig
  {-# INLINE _sign_schnorr #-}

Note that secret keys are represented as four-limb, 256-bit ‘Wider’ values from ppad-fixed. The implementation is straightforward, following the BIP340 spec, and uses the selection primitive ‘S.select’ from ppad-fixed for constant-time operations. The ‘unroll32’ helper used to convert to a bytestring representation is implemented by way of a fixed, unrolled loop that runs in constant time, and 'unsafe_roll32' performs the inverse operation in the same fashion. Note that a variable-time equality check is performed only for a negligible-probability degenerate case that does not occur in practice (equivalent to producing an all-zero 256-bit hash).

ECDSA signature creation similarly proceeds like so:

  _sign_ecdsa
    :: (Wider -> Maybe Projective) -- partially-applied multiplication function
    -> SigType
    -> HashFlag
    -> Wider
    -> BS.ByteString
    -> Maybe ECDSA
  _sign_ecdsa _mul ty hf _SECRET m = runST $ do
      -- RFC6979 sec 3.3a
      let entropy = int2octets _SECRET
          nonce   = bits2octets h
      drbg <- DRBG.new entropy nonce mempty
      -- RFC6979 sec 2.4
      sign_loop drbg
    where
      d  = S.to _SECRET
      hm = S.to (bits2int h)
      h  = case hf of
        Hash -> SHA256.hash m
        NoHash -> m

      sign_loop g = do
        k <- gen_k g
        let mpair = do
              kg <- _mul k
              let Affine (S.to . C.retr -> r) _ = affine kg
                  ki = S.inv (S.to k)
                  s  = (hm + d * r) * ki
              pure $! (S.retr r, S.retr s)
        case mpair of
          Nothing -> do
            DRBG.wipe g
            pure Nothing
          Just (r, s)
            | W.eq_vartime r 0 -> sign_loop g -- negligible probability
            | otherwise -> do
                DRBG.wipe g
                let !sig = Just $! ECDSA r s
                pure $ case ty of
                  Unrestricted -> sig
                  LowS -> fmap low sig
  {-# INLINE _sign_ecdsa #-}

  -- RFC6979 sec 3.3b
  gen_k :: DRBG.DRBG s -> ST s Wider
  gen_k g = loop g where
    loop drbg = do
      bytes <- DRBG.gen drbg mempty (fi _CURVE_Q_BYTES)
      case bytes of
        Left {}  -> error "ppad-secp256k1: internal error (please report a bug!)"
        Right bs -> do
          let can = bits2int bs
          case W.cmp_vartime can _CURVE_Q of
            LT -> pure can
            _  -> loop drbg -- 2 ^ -128 probability
  {-# INLINE gen_k #-}

This is standard RFC6979 ECDSA, with timing guarantees provided primarily by way of ppad-fixed’s constant-time field arithmetic and the elliptic curve primitives described previously. The RFC6979 helpers ‘bits2int’, ‘bits2octets’ and ‘int2octets’ are all implemented as fixed-schedule unrolled loops, and note again that variable-time operations are only used in negligible-probability degenerate cases that do not occur in practice.

Opus 4.5’s audit finds both the Schnorr and ECDSA signature creation functions to be constant time by way of STG/ASM analysis:

  Function: sign_schnorr
  STG Pattern: Uses mul internally
  ASM Pattern: Inherits CT
  Status: CT
  ────────────────────────────────────────
  Function: sign_schnorr'
  STG Pattern: Uses mul_wnaf internally
  ASM Pattern: Inherits CT
  Status: CT
  ────────────────────────────────────────
  Function: sign_ecdsa
  STG Pattern: Uses mul internally
  ASM Pattern: Inherits CT
  Status: CT
  ────────────────────────────────────────
  Function: sign_ecdsa'
  STG Pattern: Uses mul_wnaf internally
  ASM Pattern: Inherits CT
  Status: CT

Note that the apostrophe-suffixed versions use 'mul_wnaf' internally, requiring a precomputed context to be passed. Signature verification functions are not required to run in constant time, operating only on public inputs:

  Function: verify_schnorr
  Pattern: Uses mul_vartime for public point
  Notes: Non-secret data
  ────────────────────────────────────────
  Function: verify_schnorr'
  Pattern: Uses mul_wnaf for G, mul_vartime for pubkey
  Notes: Non-secret
  ────────────────────────────────────────
  Function: verify_ecdsa
  Pattern: Uses mul_vartime for public point
  Notes: Non-secret data
  ────────────────────────────────────────
  Function: verify_ecdsa'
  Pattern: Uses mul_wnaf for G, mul_vartime for pubkey
  Notes: Non-secret

Elliptic Curve Diffie-Hellman (ECDH)

The ECDH implementation is straightforward, its constant-time properties established primarily by the use of the CT scalar multiplication function:

  ecdh
    :: Projective          -- ^ public key
    -> Wider               -- ^ secret key
    -> Maybe BS.ByteString -- ^ shared secret
  ecdh pub _SECRET = do
    pt@(P _ _ (C.Montgomery -> z)) <- mul pub _SECRET
    let !(Affine (C.retr -> x) _) = affine pt
        !result = SHA256.hash (unroll32 x)
    if CT.decide (C.eq z 0) then Nothing else Just result

It clearly has constant time semantics for nondegenerate inputs, and Opus 4.5 audits it as running in constant-time by way of STG and ASM analysis for good measure:

  Function: ecdh
  STG Pattern: Uses mul internally
  ASM Pattern: Inherits CT
  Status: CT

Empirical

ppad-secp256k1 has a Criterion benchmark suite that produces statistical execution time benchmarks on varying-sized inputs, such that variation in wall-clock time should be readily attributable to noise. For brevity, only derive_pub/wnaf, which is a synonym for mul_wnaf, is shown of the elliptic curve primitives:

  benchmarking derive_pub/wnaf, sk = 2
  time                 29.67 μs   (29.64 μs .. 29.71 μs)
                       1.000 R²   (1.000 R² .. 1.000 R²)
  mean                 29.68 μs   (29.64 μs .. 29.71 μs)
  std dev              121.1 ns   (83.80 ns .. 177.2 ns)

  benchmarking derive_pub/wnaf, sk = 2 ^ 255 - 19
  time                 29.68 μs   (29.65 μs .. 29.72 μs)
                       1.000 R²   (1.000 R² .. 1.000 R²)
  mean                 29.71 μs   (29.68 μs .. 29.75 μs)
  std dev              106.7 ns   (71.74 ns .. 174.7 ns)

  benchmarking schnorr/sign_schnorr' (small)
  time                 76.27 μs   (76.21 μs .. 76.32 μs)
                       1.000 R²   (1.000 R² .. 1.000 R²)
  mean                 76.44 μs   (76.40 μs .. 76.50 μs)
  std dev              162.3 ns   (123.1 ns .. 246.7 ns)

  benchmarking schnorr/sign_schnorr' (large)
  time                 76.35 μs   (76.31 μs .. 76.38 μs)
                       1.000 R²   (1.000 R² .. 1.000 R²)
  mean                 76.37 μs   (76.35 μs .. 76.40 μs)
  std dev              84.10 ns   (67.03 ns .. 112.7 ns)

  benchmarking ecdsa/sign_ecdsa' (small)
  time                 43.95 μs   (43.89 μs .. 44.02 μs)
                       1.000 R²   (1.000 R² .. 1.000 R²)
  mean                 44.04 μs   (43.96 μs .. 44.15 μs)
  std dev              310.2 ns   (259.9 ns .. 384.6 ns)

  benchmarking ecdsa/sign_ecdsa' (large)
  time                 44.02 μs   (44.00 μs .. 44.03 μs)
                       1.000 R²   (1.000 R² .. 1.000 R²)
  mean                 43.99 μs   (43.97 μs .. 44.01 μs)
  std dev              58.86 ns   (46.66 ns .. 72.16 ns)

  benchmarking ecdh/ecdh (small)
  time                 143.6 μs   (143.4 μs .. 143.7 μs)
                       1.000 R²   (1.000 R² .. 1.000 R²)
  mean                 143.7 μs   (143.3 μs .. 144.6 μs)
  std dev              2.022 μs   (846.9 ns .. 3.402 μs)

  benchmarking ecdh/ecdh (large)
  time                 143.8 μs   (143.7 μs .. 143.9 μs)
                       1.000 R²   (1.000 R² .. 1.000 R²)
  mean                 143.8 μs   (143.7 μs .. 143.9 μs)
  std dev              385.2 ns   (265.9 ns .. 544.5 ns)

A ‘weigh’ benchmark suite similarly measures heap allocation performed by each function, and verifies that allocation is held constant across input sizes for all sensitive operations:

  derive_pub

    Case                     Allocated  GCs
    wnaf, sk = 2                   312    0
    wnaf, sk = 2 ^ 255 - 19        312    0

  schnorr

    Case                   Allocated  GCs
    sign_schnorr' (small)     14,416    0
    sign_schnorr' (large)     14,416    0

  ecdsa

    Case                   Allocated  GCs
    sign_ecdsa' (small)      1,560    0
    sign_ecdsa' (large)      1,560    0

  ecdh

    Case          Allocated  GCs
    ecdh (small)        616    0
    ecdh (large)        616    0

Correctness

ppad-secp256k1’s test suite exercises the Schnorr signature implementation against the official BIP340 vectors, and the ECDSA and ECDH implementations against the relevant Wycheproof vectors (the former also being tested against noble-secp256k1’s vectors).

The implementations are highly-optimized, so the full test suite executes very quickly:

  All 3757 tests passed (0.62s)
  Test suite secp256k1-tests: PASS

Conclusion

By our appraisal, all sensitive functions in this library will run in constant time on aarch64-darwin when compiled with optimizations with GHC 9.10.3 and LLVM 19.1.7, unless marked otherwise.

Appendix: Auditing Prompt

The following prompt was used to have Opus 4.5 audit the library via a Claude Code slash command:

  Perform a constant-time audit of the Haskell library at: $ARGUMENTS

  ## Steps

  1. **Identify exports**: Find all exported functions from the
     library's public modules.

  2. **Build with diagnostics**:
     - Enter a Nix shell for the target library by running `nix develop`
       in its subdirectory.
     - `cabal clean && cabal -f+llvm build
        --ghc-options="-ddump-stg-final -ddump-llvm -ddump-to-file
        -dsuppress-all -dno-suppress-type-signatures -dsuppress-uniques"`
     - Locate the `.dump-stg-final` and `.dump-llvm` files in the
       dist-newstyle subdirectory

  3. **Generate assembly**:
     - Strip GHC headers from the `.dump-llvm` files with e.g. `sed -i
       's/^====.*$//g'`
     - Run results through `llc -O3 -mcpu=native -filetype=asm`

  4. **Analyze each exported function**:

     **STG**: Check for input-dependent branching:
     - Single-branch `case` = sequencing (OK)
     - Multi-branch `case` on loop counter = OK
     - Multi-branch `case` on input data = FAIL
     - `let` bindings = allocation (note but OK if constant)

     **ASM (aarch64)**: Flag variable-time patterns:
     - `b.eq`, `b.ne`, `b.hs`, etc. on secret data = FAIL
     - `cbz`, `cbnz`, `tbnz`, `tbz` on secret data = FAIL
     - `adds`/`cset`/`orr`/`and`/`bic`/`sbfx` = OK (branchless)

     **ASM (aarch64)**: Flag secret-dependent indexing patterns:
      - e.g. `stp x25, x26, [sp, #40]` = OK (public stack pointer plus
        fixed immediate offset)
      - e.g. `ldr x0, [x1, x2, lsl #3]` = FAIL (if x2 is secret-derived,
        implies cache timing leak)

  5. **Generate report**: For each function, report CT status and any
     issues found.

  Functions suffixed `_vartime` are expected to be variable-time; note
  but don't flag.

(/)

(/substratum)

(/censor)

(/about)

(/articles)

(/email)

(/github)