-
Views
-
Cite
Cite
Xuefeng Zhao, Secure and Efficient Masking of Lightweight Ciphers in Software and Hardware, The Computer Journal, Volume 67, Issue 2, February 2024, Pages 581–603, https://doi.org/10.1093/comjnl/bxad002
- Share Icon Share
Abstract
Masking is a well used and widely deployed countermeasure against side channel attacks, both in software and hardware. With masking comes at a great cost, search has focused on how to lower a performance penalty or find efficient masking implementation. In particular, our contribution is 2-fold: for software masking, we first find bitsliced implementations of Sbox with Multiplicative Complexity 4 and Multiplicative Depth 2, then adapt the common shares approach introduced by Coron et al. at CHES 2016 to make many cross-products |$a_{i}\cdot b_{j}$| can be reuse for parallel ISW-based 32-bit nonlinear operations. Therefore, we improve the efficiency of 2|$\times b/4/32$| parallel high-order masking of ISW scheme for RECTANGLE, TANGRAM and KNOT on 32-bit ARM embedded microprocessor, with roughly a 13%-34% speed-up, at cost of |$(1+d) \times 32$|-bit randomness. For hardware masking, 4 bit cubic Sboxes with quadratic decomposition length 2, including RECTANGLE, TANGRAM, KNOT and LWC third-round candidates, can be implemented with a 3-share and 4-share threshold implementation (TI) by decomposing cubic permutations |$S$| as a composition of sub-permutations having lower algebraic degrees. We use two decomposition form: one composition of two quadratic permutations |$G$| and |$F$|, |$S = F\circ G$|, is for efficiency; the other composition of some linear permutations |$A_i$| and one quadratic permutation |$G$|, |$S=A_3 \circ G \circ A_2 \circ G \circ A_1 $|, is for reducing the area requirements. For |$S = F\circ G$|, we introduce a new approach of searching through all possible quadratic permutations |$G$| with 2|$^{25.71}$|, which is effcient than 2|$^{26.23}$| in Poschmann et al. at J. Cryptol 2011. For |$S=A_3 \circ G \circ A_2 \circ G \circ A_1 $|, our approach of finding |$A_i$| with complexity 2|$^{27.71} $|, which is effcient than the method introduced by Moradi et al. at ASIACRYPT 2016. In addition, we proposes a new decomposition that |$S=G \circ A_2 \circ G \circ A_1 $|. We can find the fastest and the smallest hard-ware decomposition implementation of 4-bit permutations for TI with 3 and 4 shares.