Security Analysis, ppad-hmac-drbg.

Abstract

A formalization and analysis of the security model established by the ppad-hmac-drbg library.

Overview

ppad-hmac-drbg implements a deterministic random bit generator (DRBG) per NIST SP-800-90A, using HMAC-SHA256 and HMAC-SHA512 facilities provided by the ppad-sha256 and ppad-sha512 libraries.

Threat Model

This analysis assumes an adversary capable of:

  1. invoking library functions with chosen inputs an arbitrary number of times,
  2. measuring execution time with nanosecond precision,
  3. observing cache state,
  4. observing memory allocation counts and sizes, and
  5. post-hoc reading of the heap, via e.g. core dumps, heap snapshots, or swap.

We assume an adversary who cannot reliably read registers or the stack, and who cannot capture live execution state at will. As a purely software implementation, a trusted execution environment at the physical layer is assumed; considered out-of-scope are adversaries who can perform attacks over power or electromagnetic emanation-related side channels, speculative execution attacks, and so on.

Security Property

The security properties to be demonstrated here are that:

DRBG internal state is secret, in that its exposure to an adversary allows calculation of all future values until the generator is reseeded, and is also long-lived, necessitating its storage on the heap. Since it’s stored on the heap, any heap snapshot that contains the DRBG state will necessarily reveal it.

What is thus to be guaranteed is that there is only one singular DRBG state resident anywhere in the heap at any time. No other intermediate copies of the state – either of its current value, or any past value – must ever exist, such that a post-hoc snapshot of the heap reveals only the state that existed at the time of that snapshot.

This invariant implies that wiping the state on the heap truly annihilates that state on the heap; it does not leave other, non-annihilated states resident anywhere else.

Analysis

The analysis performed here is specifically for ppad-hmac-drbg v0.3.0, compiled under GHC 9.10.3 with optimizations, and is restricted to the source code and Core levels, as seems sufficient for demonstration of the listed security properties.

State Representation

ppad-hmac-drbg’s DRBG state is represented as a pinned, mutable primitive array of unboxed Word32 or Word64 values, depending on the HMAC algorithm used:

  -- HMAC-SHA256
  newtype DRBG s = DRBG (MutablePrimArray s Word32)

  -- HMAC-SHA512
  newtype DRBG s = DRBG (MutablePrimArray s Word64)

The ‘DRBG’ wrapper is a newtype, implying zero runtime cost or allocation (it exists only during type checking, and is erased prior to runtime). The ‘MutablePrimArray’ itself is provided by the core primitive package; it is parameterized by a primitive state token, in practice usually provided by one of the IO (as ‘RealWorld’) or ST (as an abstract type variable) monads, but in general by any ‘PrimMonad’.

Represented as such at the type level, a state value can be instantiated only by the following ‘new’ function, here for a HMAC-SHA256 DRBG:

  new
    :: PrimMonad m
    => BS.ByteString    -- ^ entropy
    -> BS.ByteString    -- ^ nonce
    -> BS.ByteString    -- ^ personalization string
    -> m (DRBG (PrimState m))
  new entropy nonce ps = do
    drbg <- PA.newPinnedPrimArray 34
    init_counter drbg
    PA.setPrimArray drbg 02 08 (0x00000000 :: Word32)
    PA.setPrimArray drbg 10 08 (0x01010101 :: Word32)
    PA.setPrimArray drbg 18 16 (0x00000000 :: Word32)
    update drbg (entropy <> nonce <> ps)
    pure $! DRBG drbg
  {-# INLINABLE new #-}

DRBG state is thus returned in a primitive state monad, ‘m’, having the appropriate primitive state token type, ‘PrimState m’. The ‘MutablePrimArray’ backing the DRBG itself is allocated on the heap, but its payload is a contiguous block of unboxed Word32 values that are not scanned during garbage collection. It is also pinned, meaning that it is guaranteed never to be moved by the garbage collector.

In this HMAC-SHA256 variant, the DRBG state contains 34 elements, namely:

Reads from the state are always made via fixed offsets directly to register or stack-allocated variables, and are inlined into their calling functions to avoid boxing. A typical read of the ‘k’ and ‘v’ values for a HMAC-SHA512 DRBG looks like this:

  !(GHC.Word.W64# k00) <- PA.readPrimArray drbg 01
  !(GHC.Word.W64# k01) <- PA.readPrimArray drbg 02
  !(GHC.Word.W64# k02) <- PA.readPrimArray drbg 03
  !(GHC.Word.W64# k03) <- PA.readPrimArray drbg 04
  !(GHC.Word.W64# k04) <- PA.readPrimArray drbg 05
  !(GHC.Word.W64# k05) <- PA.readPrimArray drbg 06
  !(GHC.Word.W64# k06) <- PA.readPrimArray drbg 07
  !(GHC.Word.W64# k07) <- PA.readPrimArray drbg 08
  !(GHC.Word.W64# v00) <- PA.readPrimArray drbg 09
  !(GHC.Word.W64# v01) <- PA.readPrimArray drbg 10
  !(GHC.Word.W64# v02) <- PA.readPrimArray drbg 11
  !(GHC.Word.W64# v03) <- PA.readPrimArray drbg 12
  !(GHC.Word.W64# v04) <- PA.readPrimArray drbg 13
  !(GHC.Word.W64# v05) <- PA.readPrimArray drbg 14
  !(GHC.Word.W64# v06) <- PA.readPrimArray drbg 15
  !(GHC.Word.W64# v07) <- PA.readPrimArray drbg 16
  let !k0 = Registers (# k00, k01, k02, k03, k04, k05, k06, k07 #)
      !v0 = Registers (# v00, v01, v02, v03, v04, v05, v06, v07 #)

Note that ‘Registers’ is an unlifted newtype wrapping an unboxed tuple, and so is not allocated on the heap. ‘GHC.Word.W64#’ here similarly represents a pattern binding used only to deconstruct, rather than construct, a value, and so the reads themselves don’t allocate.

This can all be verified by examination of a Core dump. GHC translates the reads above to:

  case readWord64Array# ww 1# (eta `cast` <Co:4> :: ...) of { (# ipv, ipv1 #) ->
  case readWord64Array# ww 2# ipv of { (# ipv2, ipv3 #) ->
  case readWord64Array# ww 3# ipv2 of { (# ipv4, ipv5 #) ->
  case readWord64Array# ww 4# ipv4 of { (# ipv6, ipv7 #) ->
  case readWord64Array# ww 5# ipv6 of { (# ipv8, ipv9 #) ->
  case readWord64Array# ww 6# ipv8 of { (# ipv10, ipv11 #) ->
  case readWord64Array# ww 7# ipv10 of { (# ipv12, ipv13 #) ->
  case readWord64Array# ww 8# ipv12 of { (# ipv14, ipv15 #) ->
  case readWord64Array# ww 9# ipv14 of { (# ipv16, ipv17 #) ->
  case readWord64Array# ww 10# ipv16 of { (# ipv18, ipv19 #) ->
  case readWord64Array# ww 11# ipv18 of { (# ipv20, ipv21 #) ->
  case readWord64Array# ww 12# ipv20 of { (# ipv22, ipv23 #) ->
  case readWord64Array# ww 13# ipv22 of { (# ipv24, ipv25 #) ->
  case readWord64Array# ww 14# ipv24 of { (# ipv26, ipv27 #) ->
  case readWord64Array# ww 15# ipv26 of { (# ipv28, ipv29 #) ->
  case readWord64Array# ww 16# ipv28 of { (# ipv30, ipv31 #) ->

where each ‘readWord64Array#’ call, ‘readWord64Array#’ being a GHC primop, returns a primitive state token and unboxed Word64# pair:

  readWord64Array#
    :: MutableByteArray# d
    -> Int#
    -> State# d
    -> (# State# d, Word64# #)

The ‘k0’ and ‘v0’ bindings are omitted entirely, the values instead being inlined at the appropriate call sites, e.g. below (note the ‘ipvx’ bindings corresponding to unboxed Word64 values from above):

  case $w_hmac_rsb
         kp
         sp1
         ((# ipv1, ipv3, ipv5, ipv7, ipv9, ipv11, ipv13, ipv15 #)
          `cast` <Co:2> :: ...)
         ((# ipv17, ipv19, ipv21, ipv23, ipv25, ipv27, ipv29, ipv31 #)
          `cast` <Co:2> :: ...)
         0#Word8
         wild
         (ipv30 `cast` <Co:17> :: ...)

The crux is that the heap-allocated DRBG state representation is singular, and reading it does not create additional copies on the heap. An adversary who can observe access patterns to the mutable primitive array sees only sequential reads at fixed offsets; there is no value-dependent access pattern that leaks information about the state.

State Update

The DRBG state, as a mutable primitive array, is always updated in a primitive state monad, typically IO or ST, in which updates are both well-ordered and destructive. There are precisely two forms of update used:

‘setPrimArray’, which fills a slice of a mutable primitive array with a particular value, is used only when initializing and wiping the DRBG state, and ultimately boils down to use of a GHC primop for a raw memory write (either ‘writeWord32Array#’ or ‘writeWord64Array#’ depending on which DRBG is used). ‘writePrimArray’ works in exactly the same manner, and is used primarily to update the ‘v’ component of the state, e.g.:

  write_v
    :: PrimMonad m
    => PA.MutablePrimArray (PrimState m) Word64
    -> Registers
    -> m ()
  write_v drbg (R v0 v1 v2 v3 v4 v5 v6 v7) = do
    PA.writePrimArray drbg 09 (GHC.Word.W64# v0)
    PA.writePrimArray drbg 10 (GHC.Word.W64# v1)
    PA.writePrimArray drbg 11 (GHC.Word.W64# v2)
    PA.writePrimArray drbg 12 (GHC.Word.W64# v3)
    PA.writePrimArray drbg 13 (GHC.Word.W64# v4)
    PA.writePrimArray drbg 14 (GHC.Word.W64# v5)
    PA.writePrimArray drbg 15 (GHC.Word.W64# v6)
    PA.writePrimArray drbg 16 (GHC.Word.W64# v7)
  {-# INLINE write_v #-}

These updates are again made at fixed offsets and, due to PrimMonad guarantees, are made destructively, in order. There are no secret-dependent access patterns.

Note also that the boxed expressions of the form ‘GHC.Word.W64# vx’ do not allocate due to GHC’s worker/wrapper transformation; the “box” exists only in the type system, but never at runtime. The above is compiled to expressions like the following in Core (here corresponding to the one writing to index 9):

  = \ acc v ww1 ->
      case v `cast` <Co:1> :: ...
      of wild11
      { (# w00, w01, w02, w03, w04,
           w05, w06, w07 #) ->

      [..]

      (case eta
            `cast` <Co:3> :: ...
       of
       { MutablePrimArray arr# ->
       primitive
         $dPrimMonad
         (\ s# ->
            case writeWord64Array#
                   arr#
                   9#
                   w00
                   s#
            of s'#
            { __DEFAULT ->
            (# s'#, () #)
            })
       })

Note that ‘w00’ is an unboxed Word64 value, ultimately having been passed in an unboxed tuple to an inlined 'write_v' function.

Other updates are performed by specialized HMAC helpers. Consider this excerpt from the ‘update’ function, which is responsible for updating DRBG state:

  let !k0 = Registers (# k00, k01, k02, k03, k04, k05, k06, k07 #)
      !v0 = Registers (# v00, v01, v02, v03, v04, v05, v06, v07 #)
      !kp = PA.mutablePrimArrayContents drbg `FP.plusPtr` 08  -- 1 * 8
      !vp = PA.mutablePrimArrayContents drbg `FP.plusPtr` 72  -- 9 * 8
      !sp = PA.mutablePrimArrayContents drbg `FP.plusPtr` 136 -- 17 * 8
  Prim.unsafeIOToPrim $ SHA512._hmac_rsb kp sp k0 v0 0x00 provided_data
  !(GHC.Word.W64# k10) <- PA.readPrimArray drbg 01
  !(GHC.Word.W64# k11) <- PA.readPrimArray drbg 02
  !(GHC.Word.W64# k12) <- PA.readPrimArray drbg 03
  !(GHC.Word.W64# k13) <- PA.readPrimArray drbg 04
  !(GHC.Word.W64# k14) <- PA.readPrimArray drbg 05
  !(GHC.Word.W64# k15) <- PA.readPrimArray drbg 06
  !(GHC.Word.W64# k16) <- PA.readPrimArray drbg 07
  !(GHC.Word.W64# k17) <- PA.readPrimArray drbg 08
  let !k1 = Registers (# k10, k11, k12, k13, k14, k15, k16, k17 #)
  Prim.unsafeIOToPrim $ SHA512._hmac_rr vp sp k1 v0

Here pointers to the DRBG state components are retrieved and passed directly to the '_hmac_rsb' and '_hmac_rr' functions from ppad-sha512. These functions have been carefully written so that they 1) do not accept heap-allocated arguments, e.g. bytestrings, and 2) write their computed HMACs directly to the supplied pointers. At no point during their operation do they allocate intermediate copies of the state components on the heap.

Note first the basic ‘hmac’ function from ppad-sha512:

  hmac :: BS.ByteString -> BS.ByteString -> MAC
  hmac k m
    | Arm.sha512_arm_available = MAC (Arm.hmac k m)
    | otherwise = MAC (cat (_hmac (prep_key k) m))
  {-# INLINABLE hmac #-}

This is unsuitable for our purposes as it accepts the two HMAC arguments in the form of heap-allocated bytestrings; use of this function would allocate secret DRBG state components, scattering them indiscriminately across the garbage-controlled heap.

The '_hmac_rr' helper, on the other hand, looks like this:

  _hmac_rr
    :: Ptr Word64 -- ^ register state
    -> Ptr Word64 -- ^ block state
    -> Registers  -- ^ key
    -> Registers  -- ^ message
    -> IO ()
  _hmac_rr rp bp k m = do
    let !key   = pad_registers k
        !block = pad_registers_with_length m
    _hmac_bb rp bp key block
  {-# INLINABLE _hmac_rr #-}

  _hmac_bb
    :: Ptr Word64  -- ^ register state
    -> Ptr Word64  -- ^ block state
    -> Block       -- ^ padded key
    -> Block       -- ^ padded message
    -> IO ()
  _hmac_bb rp bp k m = do
    poke_registers rp (iv ())
    update rp bp (xor k (Exts.wordToWord64# 0x3636363636363636##))
    update rp bp m
    let !inner = pad_registers_with_length (peek_registers rp)
    poke_registers rp (iv ())
    update rp bp (xor k (Exts.wordToWord64# 0x5C5C5C5C5C5C5C5C##))
    update rp bp inner
  {-# INLINABLE _hmac_bb #-}

The ‘rr’ is an abbreviation for ‘Registers, Registers’, indicating that the arguments are passed as ‘Registers’, which again are unlifted newtypes wrapping unboxed tuples. They are padded to ‘Blocks’, which are also register or stack-allocated, without allocating, and are then passed to '_hmac_bb' (where ‘bb’ abbreviates ‘Block, Block’). Inside '_hmac_bb', all updates are performed destructively by raw memory writes, and the peek_registers helper reads directly to a ‘Registers’ value via GHC primops:

  peek_registers
    :: Ptr Word64
    -> Registers
  peek_registers (GHC.Ptr.Ptr addr) = R
    (Exts.indexWord64OffAddr# addr 0#)
    (Exts.indexWord64OffAddr# addr 1#)
    (Exts.indexWord64OffAddr# addr 2#)
    (Exts.indexWord64OffAddr# addr 3#)
    (Exts.indexWord64OffAddr# addr 4#)
    (Exts.indexWord64OffAddr# addr 5#)
    (Exts.indexWord64OffAddr# addr 6#)
    (Exts.indexWord64OffAddr# addr 7#)
  {-# INLINE peek_registers #-}

The '_hmac_rsb' function, where ‘rsb’ abbreviates ‘Registers, Separator, Bytes’ requires special care. This function accepts both sensitive DRBG state in ‘Registers’ and a user-supplied (heap-allocated) bytestring as input. The DRBG state and bytestring arguments must be combined, padded, and assembled into blocks for HMAC computation without at any point allocating the DRBG state on the heap. Its implementation is considerably longer than the excerpts shown above, but proceeds along identical principles.

As described in the security analysis for ppad-sha{256,512}, the HMAC implementations themselves perform only data-independent reads from the state, and thus leak no material information about it to an adversary who can observe cache.

State Annihilation

A ‘wipe’ function is included to explicitly zero out (or, more properly, reset to an initial state) the DRBG when one is finished with it:

  wipe
    :: PrimMonad m
    => DRBG (PrimState m)
    -> m ()
  wipe (DRBG drbg) = do
    init_counter drbg
    PA.setPrimArray drbg 01 08 (0x0000000000000000 :: Word64) -- init k
    PA.setPrimArray drbg 09 08 (0x0101010101010101 :: Word64) -- init v
    PA.setPrimArray drbg 17 16 (0x0000000000000000 :: Word64) -- init scratch
  {-# INLINE wipe #-}

As DRBG state has a single location on the heap, this operation annihilates it reliably.

ppad-hmac-drbg provides only an explicit wipe mechanism; there is nothing like e.g. the memory package’s ‘ScrubbedBytes’ that wipes heap-allocated memory when it is freed by the garbage collector. The rationale for this is that DRBG state is typically long-lived, so an operation that annihilates the state only during garbage collection may not actually wipe it at a useful time. A user may have finished using the DRBG to generate bytes, but references to it may still persist; explicit wiping ensures that the state is annihilated exactly when intended.

Correctness

ppad-hmac-drbg’s test suite exercises the HMAC-SHA256 and HMAC-SHA512 DRBG’s against NIST CAVP (Cryptographic Algorithm Validation Program) vectors (240 each). It’s worth noting that the implementations are highly optimized, so the full test suite executes very quickly:

  All 480 tests passed (0.01s)
  Test suite hmac-drbg-tests: PASS

Conclusion

By our appraisal the implementations of the HMAC-SHA256 and HMAC-SHA512 DRBGs in ppad-hmac-drbg are correct, and the state representation and update mechanisms employed make it impossible for an adversary to learn secret DRBG state by way of capabilities 1-4.

By invoking library functions with chosen inputs, an adversary can drive the DRBG into many states and observe timing & allocation patterns, but no operation reveals the ‘k’ and ‘v’ components directly. Similarly while the adversary can distinguish certain branches taken by the code (e.g. whether or not the user supplied additional entropy when generating bytes), this leaks nothing about the DRBG state itself.

The library allocates on the heap only when accumulating bytes to return to the user, which are returned as a strict bytestring. The adversary can thus determine the length of any generated bytes, and can infer the number of internal gen_loop iterations performed by way of the number of allocations observed.

The actual bytes generated by the DRBG are from ephemeral, iterative updates made to the ‘v’ component, which are serialized; an adversary with capability (5) who can read anything ever allocated on the heap can thus recover the bytes generated from a DRBG, but never its internal state beyond the contents of the DRBG’s mutable primitive buffer itself. The internal DRBG state is revealed only in a heap snapshot that contains it; if wiped, only that wiped state is revealed.