In this section, we present in more detail an adaptive template attack technique, which is called an Online Template Attack (OTA) and is initially proposed by Batina et al. in [65]. This technique resides between horizontal and template attacks. The attacker is able to recover a complete scalar after obtaining only one power trace of a scalar multiplication from the device under attack. This attack is characterized as online, because templates are created after the acquisition of the target trace. While the same terminology is used, OTA is not a typical template attack; i.e., no preprocessing template-building phase is necessary. OTA functions by acquiring one target trace from the device under attack and comparing patterns of certain operations from this trace with templates obtained from the attacker’s device that runs the same implementation. Pattern matching is performed at suitable points in the algorithm, where key-bit related assignments take place by using an automated module based on the Pearson correlation coefficient.

The attacker needs only very limited control over the device used to generate the online template traces. The main assumption is that the attacker can choose the input point to a scalar multiplication, an assumption that trivially holds even without any modification to the template device in the context of ephemeral Diffie-Hellman. It also holds in the context of ECDSA, if the attacker can modify the implementation on the template device or can modify internal values of the computation. This is no different than for previous template attacks against ECDSA.

OTA is defined as a side-channel attack with the following conditions:

1. The attacker obtains only one power trace of the cryptographic algorithm involving the targeted secret data. This trace is called the target trace. The device, from which the target trace is obtained, is the target device. The fact that only one target trace is necessary for the attack, makes it possible to attack scalar multiplication algorithms with ephemeral scalar and with randomized scalar.

2. The attacker is generating template traces after having obtained the target trace. These traces are called (online) template traces.

3. The attacker obtains the template traces on the target device or a similar device^{[1]}with very limited control over it, i.e., access to the device to run several executions with chosen public inputs. The attacker does not rely on the assumption that the secret data are the same for all template traces.

4. At least one assignment in the exponentiation algorithm is made depending on the value of particular scalar bit(s), but there are no branches with key-dependent computations. Since the doubling operation is attacked, this key-dependent assignment should be during doubling. As a counterexample, it is noted that the binary right-to-left add-always algorithm for Lucas recurrences [32] is resistant to the proposed attack, because the result of the doubling is stored in a non-key- dependent variable.

The attack methodology is as follows:

• Acquire a full target trace from the device under attack, during the execution of a scalar multiplication.

• Locate the doubling and addition operations performed in each round.

• Find multiples of mV, where m e Z, m < k and k is the scalar. These points are used to create the template traces.

The methodology offers a generic attack framework, which does not require any previous knowledge of the leakage model nor a specific type of curve. It is applicable to various forms of curves (Weierstrass, Edwards and Montgomery curves), scalar multiplication algorithms and implementations. Contrary to the doubling attack [37], OTA can be launched against right-to-left algorithms and the Montgomery ladder.

The basic idea, as depicted in Fig. 3.4 consists of comparing the traces for the inputs V (target trace) and 2V (online template trace) while executing scalar multiplication and then finding similar patterns between them, based on the hypothesis on a bit for a given operation. The target trace is obtained only once. For every bit of the scalar, an online template trace with input kV, k e Z is obtained, where k is chosen as a function of a hypothesis on this bit.

In the original paper, pattern matching is performed using the Pearson correlation coefficient, p(X, Y), which measures the linear relationship between two vari-

Fig. 3.4 Schematic representation of OTA [66]

ables X and Y. For power traces, the correlation coefficient shows the relationship between two points of the trace, which indicates the Hamming-weight leakage of key- dependent assignments during the execution of a cryptographic algorithm. Extension to other distinguishers from machine learning is performed in [62].

The template matching corresponds to a list of correlation coefficients that show the relationship between all samples from the template trace to the same consecutive amount of samples in the target trace. If the hypothesis on the given key bit is correct, then the pattern match between the template traces at the targeted operation will be high (in experiments it reached 99 %).

In this way, the first i bits of the key can be recovered. Knowledge of the first i bits provides the attacker with complete knowledge of the internal state of the algorithm just before the (i + 1)th bit is processed. Since at least one operation in the loop depends on this bit, a hypothesis can be made about the (i + 1)th bit, a template trace based on this hypothesis is computed, and this trace is correlated with the target trace at the relevant predetermined point of the algorithm.

OTA on scalar multiplication algorithms The core idea and feasibility of the attack is demonstrated through an example based on the double-and-add-always algorithm described in Algorithm 1. Table3.1 shows two executions of the algorithm for two different scalars k = 100 and k = 110. The first execution of the loop always starts by doubling the input point P, for all values of k .It is assumed that k_{x} __{1} = 1. Depending on the second-most significant key bit k_{x}__{2}, the output of the first iteration of the algorithm will be either 2P or 3P. For any point P it is, therefore, possible to get a power trace for the operation 2P, i.e., the attacker lets the algorithm execute the first two double-and-add iterations. In the proposed setup, the authors could zoom into the level of one doubling, which will be the template trace. Then, the attacker can

Table 3.1 Two executions of the double-and-add-always algorithm [65]

k = 100

k = 110

Ro = P

R0 = P

R0 = 2 P, R1 = 3 P, return 2 P

R_{0} = 2 P, R_{1} = 3 P, return 3 P

R_{0} = 4 P, R_{1} = 5 P, return 4 P

R_{0} = 6P, R_{1} = 7P, return 6P

perform the same procedure with 2P as the input point to obtain the online template trace that he wants to compare with the target trace. If it is assumed, that the second- most significant bit of k is 0, then he compares the 2P template with the output of the doubling in the first iteration. Otherwise, he compares it with the online template trace for 3P.

Assuming that the first (i - 1) bits of k are known, he can derive the i th bit by computing the two possible states of R after this bit has been treated and recover the key iteratively. Note that only the assignment in the ith iteration depends on the key bit ki, but none of the computations do, so it is necessary to compare the trace of the doubling operation in the (i + 1)th iteration with the original target trace. To decide whether the i th bit of k is zero or one, the trace that the doubling operation in the (i + 1)th iteration would give for k_{i}+_{1} = 0 is compared with the target trace. For completeness, he can compare the target trace with a trace obtained for k_{+1} = 1 and verify that it has a lower pattern match percentage; in this case, the performed attack needs two template traces per key bit. However, if during the acquisition phase the noise level is low and the signal is of good quality, an efficient attack can be performed with only the target trace and a single trace for the hypothetical value of R_{H1}.

Attacking the right-to-left double-and-add-always algorithm of [32] can be done in a similar way, since it is a type of key-dependent assignment OTA. The attacker targets the doubling operation and notes that the input point will be doubled either in the first (if k0 = 0) or in the second iteration of the loop (if k0 = 1). If k is fixed he can easily decide between the two by inputting different points, since if k0 = 1 he will see the common operation 2 O. If k is not fixed, he simply measures the first two iterations and again uses the operation 2 O if the template generator should use the first or second iteration. Once he is able to obtain clear traces, the attack itself follows the general description of an OTA.

Montgomery Ladder The main observation that makes OTA attacks applicable to the Montgomery ladder is that at least one of the computations, namely the doubling in the main loop, directly depends on the key bit ki . For example, if it is assumed that the first three bits of the key are 100, then the output of the first iteration will be R_{0} = 2P. If it is assumed that the first bits are 110, then the output of the first iteration will be R_{0} = 3P. Therefore, if the attacker compares the pattern of the output of the first iteration of Algorithm 3 with scalar k = 100, he will observe a higher correlation with the pattern of R_{0} = 2P than with the pattern of R_{0} = 3P. This is demonstrated in the working example of Table 3.2.

Table 3.2 Two executions of the Montgomery ladder [65]

k = 100

k = 110

R_{0} = P, R_{1} = 2 P

R_{0} = P, R_{1} = 2 P

b = 1 R_{1} = 3 P, R_{0} = 2 P

b = 0 R_{0} = 3 P, R_{1} = 4 P

b = 1 R_{1} = 5 P, R_{0} = 4 P

b = 1 R_{1} = 7 P, R_{0} = 6 P

Side-channel atomicity

Simple atomic algorithms do not offer any protection against online template attacks, because the regularity of point operations does not prevent mounting this sort of attack. The point 2P, as an output of the third iteration of Algorithm 4, will produce a power trace with a pattern that is very similar to the trace that would have the point 2P as an input. Therefore, the attack will be the similar to the one described for the binary left-to-right double-and-add-always algorithm; the only difference is that instead of the output of the second iteration of the algorithm, the attacker has to focus on the pattern of the third iteration. In general, when an attacker forms a hypothesis about a certain number of bits of k, the hypothesis will include the point in time where R_{0} will contain the predicted value. This means that he would have to acquire a larger target trace to allow all hypotheses to be tested.

Practical results The feasibility and efficiency of OTA is shown in [65] with practical attacks on the double-and-add-always scalar multiplication running on the ATmega163 microcontroller [67] in a smart card. The scalar multiplication algorithm is based on the curve arithmetic of the Ed25519 implementation presented in [68], which is available online at http://cryptojedi.org/crypto/#avrnacl. The elliptic curve used in Ed25519 is the twisted Edwards curve E : —x^{2} + y^{2}= 1 + dx^{1} y^{2 }with d = -(121665/121666) and base point

V = (15112221349535400772501151409588531511454012693041857206046113283949847762202, 46316835694926478169428394003475163141307993866256225615783033603165251855960).

For more details on Ed25519 and this specific curve, see [69, 70]. The whole underlying field and curve arithmetic is the same as in [68]. This means in particular that points are internally represented in extended coordinates as proposed in [71]. In this coordinate system, a point P = (x, y) is represented as (X : Y : Z : T) with x = X/ Z, y = Y/ Z, and x ? y = T/Z.

Experimental results of OTA with extended projective coordinates of 256 bits, extended projective coordinates with reduced 255-bit input and input points with affine compressed coordinates are presented in [65]. The attack targets the output of the doubling operation and then performs pattern matching based on the Pearson correlation coefficient.

The correct key bit guess (k_{2} = 0) gives 97 % correlation of the target trace with the template trace for 2P. On the other hand, the correlation of the target trace with the template trace for 3P is at most 83 %. These high correlation results hold when one key bit is attacked. For every key bit, the pattern matching will give peaks as in Fig. 3.5. Attacking five bits with one acquisition gives lower numbers for pattern matching for both the correct and the wrong scalar guess, mainly due to the noise that is higher for longer acquisitions. However, the difference between correct and wrong assumptions is still remarkable; correct bit assumptions have 8488 % matching patterns, while the correlation for the wrong assumptions drops to 50-72 %. To determine the value of one bit, it is thus necessary to compute only one template trace, and decide on the value of the targeted bit depending on whether the

Fig. 3.5 Pattern Matching 2P (blue) to target and 3P to target (brown) [65]

correlation is above or below a certain threshold (in this case, the threshold can be set to 80 %.

Error-Detection and Correction The idea of Online Template Attacks was extended by Dugardin et al. in [66] with an adaptive template attack on scalar multiplication. The authors propose a generic method to distinguish matching templates and using two templates per key bit, they manage to detect and correct errors for wrong bit assumptions. This fact increases the success rate of the attack significantly compared to the original OTA, reaching 99.8 % when 100 average template traces are used. They also take advantage of the horizontal and vertical leakage, which occurs in the broadly used software implementation of mbedTLS during the modular multiplication of large numbers (256-bit elements).

Classification Algorithms for Template Attacks The fact that the template-building phase in OTA is not necessary, significantly simplifies the process of retrieving the key, leaving the overhead of the attack in the template-matching phase. The templatematching technique used for both OTA papers [65, 66] is based on the Pearson correlation coefficient. In [62], more efficient techniques from the field of Machine Learning are used as distinguishers, and the proposed attack reaches a success rate of 100 % with only 20 template traces per key bit. This work is the first step toward a framework for “automating” the template-matching phase. The attack can be classified as a form of OTA having the same attack model and assumptions. The proposed classification techniques from the field of Machine Learning (?-Nearest Neighbor, Naive Bayes, SVM) provide an efficient and simplified way to match templates during a Template Attack with very high success rates. A practical application of this attack is demonstrated on the scalar multiplication algorithm for the Brainpool curve BP256r1 implemented in mbedTLS (formerly PolarSSL, version 1.3.7).

[1] Similar device means the same type of microcontroller running the same algorithm.