# Key reconciliation protocol application to error correction in silicon PUF responses

Brice Colombier\*, Lilian Bossuet\*, David Hély+

\*Laboratoire Hubert Curien Saint-Étienne — France \*LCIS, Grenoble Institute of Technology Valence — France

May 31, 2016

Journée Sécurité Numérique du GDR SoC-SiP : 11ème édition La génération d'aléa dans le matériel : TRNG & PUF











<sup>&</sup>lt;sup>1</sup>http://www.univ-st-etienne.fr/salware/



**Different** responses to the **same** challenge.

#### **Principle:**

Extract entropy from **process variations**.

#### Aim:

Provide a unique, per-device ID, thanks to the **inter-device** uniqueness.

#### **Problem:**

PUF responses to the **same** challenge **change** over time.

This variation depends on multiple parameters:

- PUF architecture,
- Process node,
- Aging,
- Temperature,
- Environment...
- → It prevents the PUF response from being used as a **key**.

#### **Solution:**

Correct the PUF response.



## Requirements for the error correction module:

- Low area,
- High correction probability.

## Several error-correcting code implementations exist:

| Article | Construction and code(s)                    | Logic resourd<br>Xilinx<br>Spartan 3 | ces (Xilinx Slices<br>Xilinx<br>Spartan 6 |
|---------|---------------------------------------------|--------------------------------------|-------------------------------------------|
| 2       | Concatenated:<br>Repetition and BCH         |                                      | 221                                       |
| 3       | Reed-Muller                                 |                                      | 179                                       |
| 4       | ВСН                                         |                                      | >59                                       |
| 5       | Concatenated:<br>Repetition and Reed-Muller | 168                                  |                                           |

<sup>&</sup>lt;sup>2</sup>R. Maes et al. "PUFKY: A Fully Functional PUF-Based Cryptographic Key Generator". *CHES*, 2012.

<sup>&</sup>lt;sup>3</sup>M. Hiller et al. "Low-Area Reed Decoding in a Generalized Concatenated Code Construction for PUFs". *ISVI SI*, 2015

<sup>&</sup>lt;sup>4</sup>A. V. Herrewege et al. "Reverse Fuzzy Extractors: Enabling Lightweight Mutual Authentication for PUF-Enabled RFIDs". *FC*. 2012.

<sup>&</sup>lt;sup>5</sup>C. Bösch et al. "Efficient Helper Data Key Extractor on FPGAs". CHES. 2008.

#### Several error-correcting code implementations exist:

| Article   | Construction and code(s)                    | Logic resources (Xilinx Slices) |                     |
|-----------|---------------------------------------------|---------------------------------|---------------------|
|           |                                             | Xilinx<br>Spartan 3             | Xilinx<br>Spartan 6 |
| 2         | Concatenated:<br>Repetition and BCH         |                                 | 221                 |
| 3         | Reed-Muller                                 |                                 | 179                 |
| 4         | ВСН                                         |                                 | >59                 |
| 5         | Concatenated:<br>Repetition and Reed-Muller | 168                             |                     |
| This work | CASCADE protocol                            | 69                              | 19                  |

<sup>&</sup>lt;sup>2</sup>R. Maes et al. "PUFKY: A Fully Functional PUF-Based Cryptographic Key Generator". *CHES*, 2012.

<sup>&</sup>lt;sup>3</sup>M. Hiller et al. "Low-Area Reed Decoding in a Generalized Concatenated Code Construction for PUFs" /SV/SI 2015

<sup>&</sup>lt;sup>4</sup>A. V. Herrewege et al. "Reverse Fuzzy Extractors: Enabling Lightweight Mutual Authentication for PUF-Enabled RFIDs". *FC*. 2012.

<sup>&</sup>lt;sup>5</sup>C. Bösch et al. "Efficient Helper Data Key Extractor on FPGAs". CHES. 2008.

CASCADE introduced in 1993 by Brassard and Salvail<sup>6</sup>



The final key is **shorter** than the original message.

<sup>&</sup>lt;sup>6</sup>G. Brassard et al. "Secret-Key Reconciliation by Public Discussion". *EUROCRYPT*. 1993.

#### CASCADE introduced in 1993 by Brassard and Salvail<sup>6</sup>



The final key is **shorter** than the original message.

This could be used to derive keys from slightly different PUF responses.

<sup>&</sup>lt;sup>6</sup>G. Brassard et al. "Secret-Key Reconciliation by Public Discussion". *EUROCRYPT*. 1993.

Works on **parts** of the responses that have a **different parity**.



Allows to correct **one error**.

#### **Algorithm 1:** BINARY

```
Input: r_0, r_t, n_{passes}

for i = 1 to n_{passes} do

Shuffle r_0 and r_t using a public permutation \sigma_i

Split r_0 and r_t in blocks of size s_b

forall blocks do

Compute the relative parity P_r(b_{0,i},b_{t,i}) // Detection if P_r(b_{0,i},b_{t,i}) = 1 then

CONFIRM(b_{0,i},b_{t,i}) // Correction
```

Double the block size  $s_b = 2 * s_b$ 

Un-shuffle  $r_0$  and  $r_t$  with inverse permutations  $\sigma_1^{-1}$ ,  $\sigma_2^{-1}$ , ...,  $\sigma_{n_{passes}}^{-1}$  return  $r_0$ ,  $r_t$ 

Example: 32-bit responses, 5 errors.



## Two ways of leaking information:

- Relative parity computations,
  - 1 bit.
- CONFIRM executions on an *n*-bit block.
  - $log_2(n)$  bits.

Two ways of leaking information:

- Relative parity computations,
  - 1 bit.
- CONFIRM executions on an *n*-bit block.
  - $log_2(n)$  bits.

# **Example:**

**128-bit** response,  $\varepsilon = 0.05 \rightarrow 7$  errors.

- 1st pass: 8-bit blocks, 4 errors corrected.
- 2<sup>nd</sup> pass: 16-bit blocks, 3 errors corrected.

Leakage:  $\frac{128}{8} + 4 \times log_2(8) + \frac{128}{16} + 3 \times log_2(16) = 48$  bits.

The final effective length of the response is 128 - 48 = 80 bits.

Two ways of leaking information:

- Relative parity computations,
  - 1 bit.
- CONFIRM executions on an *n*-bit block.
  - $log_2(n)$  bits.

## **Example:**

**128-bit** response,  $\varepsilon = 0.05 \rightarrow 7$  errors.

- 1st pass: 8-bit blocks, 4 errors corrected.
- 2<sup>nd</sup> pass: 16-bit blocks, 3 errors corrected.

Leakage:  $\frac{128}{8} + 4 \times log_2(8) + \frac{128}{16} + 3 \times log_2(16) = 48$  bits.

The final effective length of the response is 128 - 48 = 80 bits.

#### How can it be improved?

After a pass, all the blocks have an **even** relative parity.

After a pass, all the blocks have an **even** relative parity.

 $\rightarrow$  if an error is corrected on a bit from this block in a subsequent pass, then its relative parity becomes **odd** again.

After a pass, all the blocks have an **even** relative parity.

- $\rightarrow$  if an error is corrected on a bit from this block in a subsequent pass, then its relative parity becomes **odd** again.
  - → **one more** error from this block can be corrected.

After a pass, all the blocks have an **even** relative parity.

- $\rightarrow$  if an error is corrected on a bit from this block in a subsequent pass, then its relative parity becomes **odd** again.
  - → **one more** error from this block can be corrected.

# **Example:**

12 14 4 7 9 0 13 5

Parity check does not detect these errors.

If, in a subsequent pass, the error 9 is corrected:

 $\rightarrow$  The block can be **processed again** to correct error 13.

#### Required:

Two lists storing blocks according to their relative parity.

- Correcting an error at index *i* makes blocks containing index *i* move from one list to the other
  - → (their relative parity changed).
- Error correction is carried out until there are no more blocks of odd parity.
- At the end of each pass, the blocks are added to the list of blocks of **even relative parity**.



relative parity:  $\varnothing$  Blocks of odd relative

Blocks of even

Blocks of odd relative parity:

7



8

Blocks of odd relative parity:





Blocks of odd relative parity:





Blocks of odd relative parity:





Blocks of odd relative parity:



Blocks of even relative parity:

1 2 3 4 5 6 7

9 10 11 12 13 14 15

10 8 11 3 15 6 1

9 | 0 | 13 | 5

Blocks of odd relative parity:



Blocks of even relative parity:

1 2 3 4 5 6 7

9 10 11 12 13 14 15

10 8 11 3 15 6 1

9 | 0 | 13 | 5

Blocks of odd relative parity:





Blocks of odd relative parity:

8 9 10 11 12 13 14 15



0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 2 | 10 | 8 | 11 | 3 | 15 | 6 | 1 12 | 14 | 4 | 7 | 9 | 0 | 13 | 5

Blocks of odd relative parity:

8 9 10 11 12 13 14 15



0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 2 | 10 | 8 | 11 | 3 | 15 | 6 | 1

12 14 4 7 9 0 13 5

Blocks of odd relative parity:

8 9 10 11 12 13 14 15



5 6 9 10 11

8 | 11 | 3 | 15 |

Blocks of odd relative parity:

12 13 14 15

9 13 5







 0
 1
 2
 3
 4
 5
 6
 7

 8
 9
 10
 11
 12
 13
 14
 15

 2
 10
 8
 11
 3
 15
 6
 1

 12
 14
 4
 7
 9
 0
 13
 5

Blocks of odd relative parity:



For the **same number of passes**, the CASCADE protocol allows to correct **more errors** than BINARY.

→ The information leakage is **lower**.

What is the lower bound on the information leakage?

It is related to the conditional entropy  $H(r_t|r_0) = nh(\varepsilon)$  where:  $\varepsilon$  is the error rate and n is the response length.

$$h(\varepsilon) = -\varepsilon . log_2(\varepsilon) - (1 - \varepsilon) . log_2(1 - \varepsilon)$$

The best length we can expect for the final response is then:

$$n - nh(\varepsilon) = n(1 - h(\varepsilon))$$

#### **Examples:**

With a 128-bit response and a 5% error rate: 91 bits. With a 128-bit response and a 10% error rate: 67 bits.

 $<sup>^{7}\</sup>text{J.}$  Martinez-Mateo et al. "Demystifying the Information Reconciliation Protocol CASCADE". (2015).

How to set the CASCADE parameters?

- Initial block size: depends on the error rate.
- **Number of passes**: depends on the required correction success rate.
- Block size multiplier: x2 at each pass.

How to set the CASCADE parameters?

- Initial block size: depends on the error rate.
- **Number of passes**: depends on the required correction success rate.
- Block size multiplier: x2 at each pass.



How to set the CASCADE parameters?

- Initial block size: depends on the error rate.
- **Number of passes**: depends on the required correction success rate.
- Block size multiplier: x2 at each pass.



#### Solution

Add extra passes without increasing the block size.

### Several realistic PUF references:

- RO PUF in FPGA  $\varepsilon = 0.9\%^8$ .
- TERO PUF in FPGA  $\varepsilon = 1.8\%^9$ .
- SRAM PUF in ASIC  $\varepsilon = 5.5\%^{10}$ .

256-bit responses, aim for 128-bit security

Simulation carried out on 2500 000 responses.

<sup>&</sup>lt;sup>8</sup>A. Maiti et al. "A large scale characterization of RO-PUF". . HOST. 2010.

<sup>&</sup>lt;sup>9</sup>C. Marchand et al. "Enhanced TERO-PUF Implementations and Characterization on FPGAs". *International Symposium on FPGAs*. ACM, 2016.

<sup>&</sup>lt;sup>10</sup>M. Claes et al. "Comparison of SRAM and FF-PUF in 65nm Technology". Nordic Conference on Secure IT Systems. 2011.













From an n-bit response, if t bits are leaked, it is possible to obtain an (n-t)-bit secret key.



A **hash function** can be used for privacy amplification<sup>11</sup>.

<sup>&</sup>lt;sup>11</sup>R. Impagliazzo, L.A. Levin and M. Luby, *Pseudo-random Generation from one-way functions*, **21st Annual Symposium on Theory of Computing**, 1989.

Only **parity computations** are embedded.

All other computations can be done on the server.



#### **Requirements:**

- Multiplexer to select the bits to XOR,
- One XOR gate,
- One D flip-flop.

- Compute the parity of an *n*-bit block: **n cycles**.
- Correct one error in an *n*-bit block:  $\sum_{i=1}^{log_2(n)} \frac{n}{2^i} = \mathbf{n} \mathbf{1} \text{ cycles}.$

- Compute the parity of an *n*-bit block: **n cycles**.
- Correct one error in an *n*-bit block:  $\sum_{i=1}^{log_2(n)} \frac{n}{2^i} = \mathbf{n} \mathbf{1} \text{ cycles}.$

# **Example:**

256-bit response,  $\varepsilon = 2\%$ , 20 passes,  $k_1 = 8$  bits:

- Best case (3%): 1 error, corrected AEAP.
  - → Latency: **5 127** clock cycles.
- Worst case (0.05%): 14 errors, corrected ALAP.
  - → Latency: 8690 clock cycles.



## Requirements:

- Circular shift register to select the bits to XOR,
- One counter,
- One XOR gate,
- Two D flip-flops.

- Compute the parity of an *n*-bit block:  $\frac{n^2}{2}$  cycles.
- Correct one error in an *n*-bit block:  $\frac{n(n-1)}{2}$  cycles.

- Compute the parity of an *n*-bit block:  $\frac{n^2}{2}$  cycles.
- Correct one error in an *n*-bit block:  $\frac{n(n-1)}{2}$  cycles.

# **Example:**

256-bit response,  $\varepsilon = 2\%$ , 20 passes,  $k_1 = 8$  bits:

- Best case (3%): 1 error, corrected AEAP.
  - → Latency: **656 256** clock cycles.
- Worst case (0.05%): 14 errors, corrected ALAP.
  - → Latency: 1112320 clock cycles.

- Compute the parity of an *n*-bit block:  $\frac{n^2}{2}$  cycles.
- Correct one error in an *n*-bit block:  $\frac{n(n-1)}{2}$  cycles.

## **Example:**

256-bit response,  $\varepsilon = 2\%$ , 20 passes,  $k_1 = 8$  bits:

- Best case (3%): 1 error, corrected AEAP.
  - → Latency: **656 256** clock cycles.
- Worst case (0.05%): 14 errors, corrected ALAP.
  - → Latency: 1112320 clock cycles.

Trade-off: area/latency.

# IP core activation procedure:

|              | Server                           |                                   | Device i                                  |
|--------------|----------------------------------|-----------------------------------|-------------------------------------------|
| at $t = 0$   | Generates challenge $c_i$        | $\stackrel{c_i}{\longrightarrow}$ |                                           |
| enrolment    |                                  | <i>r</i> <sub>0</sub> ←           | $r_0 \leftarrow PUF(c_i)$                 |
|              | Stores r <sub>0</sub>            |                                   |                                           |
| at $t = t_1$ |                                  | $\stackrel{c_i}{\longrightarrow}$ | Requests activation                       |
| activation   | r <sub>0</sub>                   | CASCADE                           | $r_{t_1} \leftarrow PUF(c_i)$ $r_{t_1}$   |
|              | $K \leftarrow PA(r_{t_1})$       | Privacy<br>amplification          | $K \leftarrow PA(r_{t_1})$                |
|              | Encrypts <i>UW</i> with <i>K</i> | $[UW]_K$                          |                                           |
|              |                                  |                                   | Decrypts <i>UW</i> Activates by unlocking |

Conclusion 30/3

# Compared to existing methods:

- → few on-chip logic resources,
- → can reach very low failure rates,
- $\,\rightarrow\,$  very tunable depending on the expected error-rate

Conclusion 30/30

### Compared to existing methods:

- → few on-chip logic resources,
- → can reach very low failure rates,
- → very tunable depending on the expected error-rate

#### **DONE/TO-DO:**

- ✓ Software model,
- ✓ Implementation in VHDL,
- × Tests with a real PUF: TERO-PUF
- × Integration in the overall module.

Conclusion 30/3

#### Compared to existing methods:

- → few on-chip logic resources,
- → can reach very low failure rates,
- → very tunable depending on the expected error-rate

#### DONE/TO-DO:

- ✓ Software model,
- ✓ Implementation in VHDL,
- × Tests with a real PUF: TERO-PUF
- × Integration in the overall module.

# — Questions? —