联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Algorithm 算法作业Algorithm 算法作业

日期:2019-04-10 10:07

Post-quantum RSA

This submission is from the following team, listed in alphabetical order:

Auxiliary submitters: There are no auxiliary submitters. The principal submitter is the

team listed above.

Inventors/developers: The inventors/developers of this submission are the same as the

principal submitter. Relevant prior work is credited below where appropriate.

Owner: Same as submitter.

Signature: ×. See also printed version of “Statement by Each Submitter”.

Document generated with the help of pqskeleton version 20171123.

1

Contents

1 Introduction 5

2 General algorithm specification (part of 2.B.1) 5

2.1 Parameter space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Secret key and public key . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.3 Signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.4 KEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.5 Public-key encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 List of parameter sets (part of 2.B.1) 9

3.1 Parameter set encrypt/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.2 Parameter set encrypt/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.3 Parameter set encrypt/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.4 Parameter set encrypt/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.5 Parameter set kem/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.6 Parameter set kem/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.7 Parameter set kem/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.8 Parameter set kem/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.9 Parameter set sign/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.10 Parameter set sign/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.11 Parameter set sign/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.12 Parameter set sign/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

4 Design rationale (part of 2.B.1) 11

5 Detailed performance analysis (2.B.2) 11

5.1 Description of platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

5.2 Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

5.3 Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2

5.4 How parameters affect performance . . . . . . . . . . . . . . . . . . . . . . . 13

5.5 Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

6 Expected strength (2.B.4) in general 14

6.1 Security definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

6.2 Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

7 Expected strength (2.B.4) for each parameter set 14

7.1 Parameter set encrypt/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . 14

7.2 Parameter set encrypt/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . 14

7.3 Parameter set encrypt/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . 14

7.4 Parameter set encrypt/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . 14

7.5 Parameter set kem/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

7.6 Parameter set kem/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

7.7 Parameter set kem/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

7.8 Parameter set kem/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

7.9 Parameter set sign/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

7.10 Parameter set sign/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

7.11 Parameter set sign/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

7.12 Parameter set sign/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

8 Analysis of known attacks (2.B.5) 15

8.1 Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

8.2 Factorization when factors are small . . . . . . . . . . . . . . . . . . . . . . . 18

8.3 Attacks without factorization . . . . . . . . . . . . . . . . . . . . . . . . . . 19

9 Advantages and limitations (2.B.6) 20

References 20

A Statements 21

A.1 Statement by Each Submitter . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3

A.2 Statement by Patent (and Patent Application) Owner(s) . . . . . . . . . . . 25

A.3 Statement by Reference/Optimized Implementations’ Owner(s) . . . . . . . 26

4

1 Introduction

Join us as we dance madly on the lip of the volcano!

—John Oliver hypothesizing Apple’s view

of the difficulty of securing the iPhone

https://www.youtube.com/watch?v=zsjZ2r9Ygzw

Integer factorization is a security disaster. There is a long list of proposed RSA key sizes that

have been shown vulnerable to attack. And yet RSA remains standardized and remarkably

popular. Users switch to larger RSA key sizes and believe that they will be safe.

Is it clear that quantum attacks should be handled in a different way? More importantly, is

it clear that quantum attacks will be handled in a different way?

Post-quantum RSA (pqRSA) uses extremely large RSA keys to stop Shor’s algorithm. It

uses many relatively small secret primes, and small encryption/verification exponents, so

that computations with such large keys are affordable for the legitimate users. It still uses

secret primes large enough to resist ECM and quantum versions of ECM.

Perhaps there are better quantum algorithms for factorization, especially when the factors

are relatively small. This has not been adequately studied—but how many post-quantum

proposals have been adequately studied? Many other post-quantum proposals are more

efficient than pqRSA—but is efficiency a sign of strength, or a sign of weakness?

It is not difficult to envision many RSA users gradually slouching their way into becoming

pqRSA users. The cryptographic community should not ignore this possibility: it should

figure out whether the possibility is secure.

2 General algorithm specification (part of 2.B.1)

2.1 Parameter space

This pqRSA submission provides signatures, key encapsulation, and public-key encryption.

Each operation has two parameters: K, a power of 2; and B, a positive multiple of 8.

There are actually two different options for public-key encryption:

Use a generic conversion from the key-encapsulation mechanism into a public-key encryption

mechanism. For example, use the KEM to send a 32-byte session key, and

then use the session key with AES-256-GCM to encrypt and authenticate the message.

NIST has indicated that it will apply this conversion automatically.

Use the public-key encryption mechanism specified below. This is more complicated

but saves space.

5

2.2 Secret key and public key

Alice’s secret key consists of K distinct primes. Each prime is between 2B?1 and 2B, and is

congruent to 5 modulo 6. Specifically, Alice accumulates a list of K primes by repeating the

following steps:

Generate a B-bit integer as a (B/8)-byte random string interpreted in little-endian

form.

Set bits 0 and B ? 1, obtaining an odd integer p between 2B?1 and 2B.

If p mod 3 = 2, p is prime, and p is not in the list, then add p to the list.

If K is a significant fraction of the number of primes congruent to 5 modulo 6 between 2B?1

and 2B, then the “not in the list” condition significantly slows down this procedure. If K is

larger than the number of primes then the procedure does not terminate. For the parameter

sets in this submission, the “in the list” condition has negligible chance of occurring, and

the test can safely be skipped.

There is an extensive literature on primality testing, with various tradeoffs between simplicity,

speed, conjectured error probability, and provability. Users are expected to follow NIST

standards on this topic.

Our reference implementation simply uses GMP’s mpz_probab_prime_p function. It is easy

to artificially construct non-primes p that have a noticeable chance of passing this test, but

a random non-prime p becomes increasingly unlikely to pass this test as B grows. All of the

parameter sets below have B ≥ 512, and we conjecture that the chance of Alice’s key being

invalid is below 2128.

Alice multiplies these K primes to obtain her public key N, an integer between 2K(B1) and

2KB. The number of bytes needed for Alice’s N is called A; i.e., A is the unique integer such

that 256A1 ≤ N < 256A. Note that K(B  1)/8 < A ≤ KB/8.

The secret key is then encoded as a 3K(B/8)-byte string, namely the concatenation of the

following strings:

For each p: B/8 bytes representing p in little-endian form.

For each p: B/8 bytes representing (N/p)?1 mod p in little-endian form. (This is used

inside various standard methods to compute cube roots.)

K(B/8) bytes representing N in little-endian form.

The public key is also encoded as K(B/8) bytes representing N in little-endian form.

6

2.3 Signatures

Alice signs a message M as follows:

Generate a random 32-byte string R.

Compute Y = HA1(R, M). Here the hash function HA1 is A 1 bytes of output of

SHAKE256. Recall that A is the number of bytes in N.

Compute the integer Y represented by Y in little-endian form.

Compute the cube root X of Y modulo N.

Encode X in little-endian form as a K(B/8)-byte string X.

The signature is X followed by R. The signed message is the signature followed by M.

Bob verifies an alleged signature of a message M0 as follows:

Parse the alleged signature as a K(B/8)-byte string X0 followed by a 32-byte string Compute Y 0 = HA1(R0, M0).

Compute the integer X0 represented by X0 in little-endian form. Fail if X0 ≥ N.

Compute the integer Y 0 represented by Y 0 in little-endian form.

Check whether Y 0 = (X0)3 mod N.

2.4 KEM

Bob exchanges a secret session key with Alice as follows, given Alice’s public key N:

Generate a string of K(B/8) uniform random bytes.

Clear the last K(B/8) ? (A ? 1) bytes, obtaining a string r. Recall that A is the

number of bytes in N.

Compute the session key H32(r), where H32 means 32 bytes of output of SHAKE256.

Compute the integer r represented by r in little-endian form.

Compute C = r3 mod N.

Encode C in little-endian form as a K(B/8)-byte string C.

Send C as a ciphertext.

7

Alice decapsulates C0 as follows:

Fail if C0 does not have length K(B/8).

Compute the integer C0 represented by C0 in little-endian form. Fail if C0 ≥ N.

Compute the cube root r0 of C0 modulo N.

Encode r0 in little-endian form as a K(B/8)-byte string r0

Compute the session key H32(r0).

2.5 Public-key encryption

The following encryption mechanism assumes that K(B ? 1) ≥ 776. This implies N ≥ 2776,

so A ≥ 98; recall that A is the number of bytes in N. Define α = A ? 65; then α ≥ 33.

Bob sends a secret `-byte message m to Alice as follows, given Alice’s public key N:

If ` < α: Define x0 as the α-byte string (m, 1, 0, . . . , 0). There are α ? 1 ?  copies of

byte 0.

If ` ≥ α: Define m0 as the first α ? 33 bytes of m, and define m1 as the remaining

`  (α 33) bytes of m. Generate a uniform random 32-byte string k, and define x0

as the α-byte string (m0, k, 2).

Generate a uniform random 32-byte string r.

Compute the α-byte string x1 = x0⊕Hα(r, 0). Here ⊕ means xor; r, 0 means r followed

by byte 0; and Hα means α bytes of output of SHAKE256.

Compute the 32-byte string x2 = H32(x0, r, 1). Here H32 means 32 bytes of output of

SHAKE256.

Compute the 32-byte string x3 = r ⊕ H32(x1, x2, 2).

Compute the (A ? 1)-byte string x = (x1, x2, x3).

Compute the integer x represented by x in little-endian form.

Compute C = x3 mod N.

Encode C in little-endian form as a K(B/8)-byte string C.

If ` < α: The ciphertext is C.

If ` ≥ α: The ciphertext is C followed by the AES-256-GCM ciphertext for m1 under

key k with nonce 0.

8

Alice decrypts by reversing the above steps:

Define C0 as the first K(B/8) bytes of ciphertext. Fail if the ciphertext has fewer than

K(B/8) bytes.

Define C0 as the corresponding integer. Fail if C0 ≥ N.

Compute the cube root x0 of C0 modulo N. Fail if x0 ≥ 256A?1.

Encode x0 in little-endian form as an (A ? 1)-byte string x.

Parse x as (x1, x2, x 00 03) where x1, x2, x 00 0 have α, 32, 32 bytes respectively.

Define r0 as the 32-byte string x03 ⊕ H32(x1, x2, 2). 00

Define x0

0 as the α-byte string x01 ⊕ Hα(r0, 0).

Compute the 32-byte string H32(x0, r 0 0, 1). Fail if this string is not x0

2. If the ciphertext has exactly K(B/8) bytes: Parse x0

0 as a plaintext m0 followed by

byte 1 and some number of copies of byte 0. Fail if this parsing fails.

If the ciphertext has more than K(B/8) bytes: Parse x0, 2) where k0 has 32

bytes and m0

0 has α 33 bytes. Fail if this parsing fails. Use AES-256-GCM to verify

and decrypt the remaining bytes of ciphertext, obtaining m0

1; fail if this fails. Define a

plaintext m0 as (m00 , m0 ). 1

Note that, beyond the usual importance of constant-time computations for security, it is

particularly important to hide the differences between an x0 ≥ 2A?2 failure and an

H32(1, r0 0) failure. 0 , x

3 List of parameter sets (part of 2.B.1)

3.1 Parameter set encrypt/pqrsa15

PKE with K = 512 and B = 512.

3.2 Parameter set encrypt/pqrsa20

PKE with K = 16384 and B = 512.

3.3 Parameter set encrypt/pqrsa25

PKE with K = 262144 and B = 1024.

9

3.4 Parameter set encrypt/pqrsa30

PKE with K = 8388608 and B = 1024.

3.5 Parameter set kem/pqrsa15

KEM with K = 512 and B = 512.

3.6 Parameter set kem/pqrsa20

KEM with K = 16384 and B = 512.

3.7 Parameter set kem/pqrsa25

KEM with K = 262144 and B = 1024.

3.8 Parameter set kem/pqrsa30

KEM with K = 8388608 and B = 1024.

3.9 Parameter set sign/pqrsa15

Signatures with K = 512 and B = 512.

3.10 Parameter set sign/pqrsa20

Signatures with K = 16384 and B = 512.

3.11 Parameter set sign/pqrsa25

Signatures with K = 262144 and B = 1024.

3.12 Parameter set sign/pqrsa30

Signatures with K = 8388608 and B = 1024.

10

4 Design rationale (part of 2.B.1)

Shoup’s “Simple RSA”, also known as “RSA-KEM”, takes r as a uniform random integer

modulo N. We instead take uniform random r from a power-of-2 range, simplifying the

generation process, and more specifically a power-of-256 range, further simplifying the generation

process. The size of the range is at least N/256, so an algorithm to compute our r

given rE mod N has probability at least 1/256 of computing Shoup’s r given rE mod N.

We reuse the same range for x in public-key encryption, and for Y in signatures.

In the original RSA paper [9], the encryption/verification exponent E was a random number

with as many bits as N. Rabin in [8] suggested instead using a small constant E, and said

that E = 2 is “several hundred times faster.” A complication of E = 2 is that each square

has 2K different square roots mod N; E = 3 is about twice as slow for encryption but avoids

this complication. The subsequent literature has focused mainly on E = 3 and E = 65537.

There are attacks that compute various types of structured Eth roots more quickly than

factoring. Some of these attacks are specific to very small E, and historically this has led

to some preference for E = 65537 over E = 3. We instead treat the attacks as a reason to

never take Eth powers of structured inputs. There is then no known problem taking E = 3.

Shoup has also pointed out that the connection between “RSA-OAEP+” and computing

Eth roots is very tight for E = 3, but becomes looser as E grows. See [11]. This leads to

the following conclusions:

If computing 3rd roots is harder than computing 65537th roots, then breaking RSAOAEP+

for E = 3 is harder than breaking RSA-OAEP+ for E = 65537.

If computing 3rd roots is the same hardness as computing 65537th roots, then breaking

RSA-OAEP+ for E = 3 is at least as hard as breaking RSA-OAEP+ for E = 65537.

If computing 3rd roots is easier than computing 65537th roots, then breaking RSAOAEP+

for E = 3 could still be harder than breaking RSA-OAEP+ for E = 65537.

The public-key encryption system described above is intended to be an example of RSAOAEP+,

although the details need to be checked carefully.

5 Detailed performance analysis (2.B.2)

5.1 Description of platform

The following measurements were collected on one (otherwise idle) core of a computer named

samba. The CPU on samba is an Intel Xeon E3-1220 v5 (Skylake) running at 3 GHz. Turbo

Boost is disabled. samba has 64GB of RAM and runs Ubuntu 16.04, with gcc 5.4.0.

11

NIST says that the “NIST PQC Reference Platform” is “an Intel x64 running Windows

or Linux and supporting the GCC compiler.” samba is an Intel x64 running Linux and

supporting the GCC compiler. Beware, however, that different Intel CPUs have different

cycle counts.

5.2 Time

The following measurements are for kem. encrypt and sign have essentially the same performance

as kem. There is a slight slowdown for the extra hashing in encrypt, and a measurable

slowdown for long messages.

Measurements were collected by the program shown in Figure 1, compiled with

gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv. Various measurements

were also checked (with no obvious discrepancies) against results of

./do-part from supercop-20170904, with the compiler list reduced to just

gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv, with reduced

values of LOOPS and TIMINGS, and with timeouts (killafter) extended.

pqrsa15: About 3.5 billion cycles (3483292516) for key generation; 17 million cycles

(17492210 17410534 17382984 17361047 17358893) for encapsulation; and 122 million cycles

(122127482 122462936 122079316 122135561 122018320) for decapsulation.

Compared to the 32768-byte key size, these are around 110000 cycles per byte, 530 cycles

per byte, and 3700 cycles per byte respectively. These figures are useful in understanding

how well pqRSA scales to larger key sizes.

pqrsa20: About 120 billion cycles (119047642299) for key generation; 1.1 billion cycles

(1071561548 1077606577 1076427117 1076293353 1076391885) for encapsulation; and 6.1

billion cycles (6123286512 6116529230 6119549109 6118574953 6118670863) for decapsulation.

Compared to the 1048576-byte key size, these are around 110000 cycles per byte, 1000 cycles

per byte, and 5800 cycles per byte respectively.

pqrsa25: About 18 trillion cycles (18177137014865) for key generation; 46 billion cycles

(46248174238 46222427697 46191747583 46242537978 46191225425) for encapsulation; and

520 billion cycles (519581069107 519569878218 519608774814 519612644254 519558611912)

for decapsulation.

Compared to the 33554432-byte key size, these are around 540000 cycles per byte, 1400

cycles per byte, and 15000 cycles per byte respectively. Note that primes are bigger here,

1024 bits instead of 512 bits.

pqrsa30: About 590 quadrillion cycles (586593568853135) for key generation; 1.8 quadrillion

cycles (1821539719905 1822281625179 1822120731196 1819092856762 1821360421361) for

encapsulation; and 22 quadrillion cycles (22294539298463 22296758019988 22296538833629

22300640944170 22296717126117) for decapsulation.

12

Compared to the 1073741824-byte key size, these are around 550000 cycles per byte, 1700

cycles per byte, and 21000 cycles per byte respectively.

5.3 Space

Sizes are straightforwardly calculated from parameters (and confirmed in various experiments).

Specifically, keys are 215 bytes for pqrsa15, 220 bytes for pqrsa20, 225 bytes for

pqrsa25, and 230 bytes for pqrsa30. KEM ciphertexts have the same size as keys. PKE ciphertexts

have the same size as keys if the transmitted messages are short enough. Signatures

are 32 bytes longer.

5.4 How parameters affect performance

Encryption and signature verification involve a small number of modular multiplications.

Decryption and signature generation are slower, and key generation is even slower, but if B

is chosen sensibly then these slowdowns are by factors logarithmic in the number of bits in

the modulus.

For comparison, Shor’s algorithm involves a quantum modular exponentiation, which is a

large number of modular multiplications. See Section 8. The gap in costs grows with the

number of bits in the modulus. The growth is essentially linear, giving the legitimate user a

rapidly increasing advantage over the attacker as the user’s computer power increases.

5.5 Optimizations

Compared to the KEM, the PKE is more complicated but allows compression of encrypted

messages. If messages are close to the key size, or longer, then almost the entire traffic is

used for message contents.

Similarly, a slightly more complicated scheme for public-key signatures allows pqRSA signed

messages to be compressed. This submission skips this option for simplicity.

Checking primality of many independent uniform random integers is faster than checking

primality of each integer separately. This speedup is not in the reference software that we

are submitting, but it is already implemented in our experimental software.

See [3] for further discussion of various speedup techniques.

13

6 Expected strength (2.B.4) in general

6.1 Security definitions

The KEM and PKE are designed for IND-CCA2 security. The signature system is designed

for EUF-CMA security. See Section 7 for quantitative estimates of the security of specific

parameter sets.

6.2 Rationale

See Section 8 for an analysis of known attacks. This analysis also presents the rationale for

these security estimates.

7 Expected strength (2.B.4) for each parameter set

7.1 Parameter set encrypt/pqrsa15

Scaled-down version provided as a target for cryptanalysis.

7.2 Parameter set encrypt/pqrsa20

Scaled-down version provided as a target for cryptanalysis.

7.3 Parameter set encrypt/pqrsa25

Scaled-down version provided as a target for cryptanalysis.

7.4 Parameter set encrypt/pqrsa30

Category 2, assuming depth limit 264.

7.5 Parameter set kem/pqrsa15

Scaled-down version provided as a target for cryptanalysis.

14

7.6 Parameter set kem/pqrsa20

Scaled-down version provided as a target for cryptanalysis.

7.7 Parameter set kem/pqrsa25

Scaled-down version provided as a target for cryptanalysis.

7.8 Parameter set kem/pqrsa30

Category 2, assuming depth limit 264.

7.9 Parameter set sign/pqrsa15

Scaled-down version provided as a target for cryptanalysis.

7.10 Parameter set sign/pqrsa20

Scaled-down version provided as a target for cryptanalysis.

7.11 Parameter set sign/pqrsa25

Scaled-down version provided as a target for cryptanalysis.

7.12 Parameter set sign/pqrsa30

Category 2, assuming depth limit 264.

8 Analysis of known attacks (2.B.5)

8.1 Factorization

2048-bit RSA keys are widely standardized and deployed. This key size is typically estimated

to provide more than 2100 security against the number-field sieve, and even higher security

against other known non-quantum methods for integer factorization. However, (1) precise

15

estimates vary; (2) it is not clear whether the security level is maintained against multiuser

attacks; and, most importantly, (3) post-quantum cryptography also considers quantum

algorithms. In particular, Shor’s quantum algorithm is believed to pose a serious threat to

2048-bit RSA keys.

pqRSA uses much larger RSA keys, slowing down Shor’s algorithm so as to reach any desired

security level. The number-field sieve scales much more poorly than Shor’s algorithm, so we

focus on the performance of Shor’s algorithm.

Naive analysis. The main bottleneck in Shor’s algorithm is computing an n-bit quantity

ae mod N. Here N is the public key; n is the number of bits in N; a is an integer, which

can safely be taken to be small; and e is a superposition of 2n-bit integers.

e Shor computes a mod N as follows: compute a, a2 mod N, a4 mod N, a8 mod N, etc.;

multiply these consecutively into a superposition of products, conditioned upon the bits

of e. For reversibility, each multiplication is followed by a corresponding multiplication

by 1/a mod N, 1/a2 mod N, 1/a4 mod N, 1/a8 mod N, etc. Overall there are about 8n

multiplications here, half of which are in superposition.

Each step, multiplying two n-bit integers modulo N, becomes increasingly expensive as n

grows. H¨aner–Roetteler–Svore [7] report approximately 32n2 lg n Toffoli gates (not counting

CNOT gates) for a reversible n-bit modular multiplication, and thus 64n3 lg n Toffoli gates

overall for Shor’s algorithm, using a total of 2n+2 qubits. For n = 233 this is approximately

2110 Toffoli gates using approximately 234 qubits.

If each Toffoli gate has comparable cost to 236 non-quantum gates then the total cost is

comparable to 2146 non-quantum gates, i.e., the cost of finding a SHA3-256 collision, the

definition of NIST’s “Category 2”.

Higher security: communication costs. The naive analysis above counts only the cost

of computation and not the cost of communication. This is important because the algorithm

is constantly communicating data across large distances.

Communication of a qubit—or merely a bit—costs energy proportional to the distance communicated.

Concretely, Intel states that at 22nm the energy cost of simply moving 8 bytes

of data is 11.20 pJ “per 5 mm” moved, and that this is “more difficult to scale down” than

computation cost; see [6, page 9]. Replacing wires with a different technology might save a

constant factor but does not eliminate the scaling difficulties: e.g., lasers spread out linearly

over distance. Even with quantum teleportation, there is a cost of the initial setup, namely

pushing EPR pairs from one place to another; the cost per bit transmitted again increases

linearly with the distance.

1/2+o(1) Storing 2n qubits, or merely 2n bits, involves distances at least n on any realistic

two-dimensional architecture. Architectures are two-dimensional because this allows energy

to arrive (and depart) through the third dimension:

16

Billions of transistors are spread across a two-dimensional chip, with only a few layers

in the third dimension, because otherwise nobody knows how to get the energy in and

out.

At a larger scale, nodes in a computer cluster are spread across two dimensions, with

only a few layers in the third dimension, because otherwise nobody knows how to get

the energy in and out.

There is a massive literature on real two-dimensional computations, and the costs of ac- 1/2 cessing n bits of memory are consistently at least n (times the feature sizes etc., which

are independent of n). The occasional papers on three-dimensional computations (e.g.,

[10]) do not seriously address the energy issues, and we have not found literature report- 1/2 ing experiments that can be plausibly interpreted as beating n . Similarly, every serious

quantum-computing proposal is limited to two dimensions.

Higher security: limits on latency. The naive analysis also makes the absurd assumption

that attackers can wait for roughly 2100 serial qubit operations.

NIST sensibly suggests that submissions consider attacks “restricted to a fixed running time,

or circuit depth.” NIST observes that 240 sequential qubit operations is “the approximate

number of gates that presently envisioned quantum computing architectures are expected

to serially perform in a year”, and that 264 sequential bit operations is “the approximate

number of gates that current classical computing architectures can perform serially in a

decade”.

Integer multiplication might seem trivial to parallelize with enough hardware: all n bits of

one input can be multiplied by all n bits of the other input in parallel; adding the resulting

n2 bits also allows massive parallelism; final carries can also be done in parallel, or skipped

with redundant representations of integers. However, this parallelism severely increases

communication costs. Specifically, handling n2 bits in parallel means that distances increase

1/2 1/2 to n, losing another factor n in communication costs beyond the factor n discussed

above.

Furthermore, the higher-level loop in Shor’s algorithm is quite difficult to parallelize. One

can use many more qubits to parallelize the multiplications by a, a2 mod N, a4 mod N,

a8 mod N, etc., but this does nothing to parallelize the initial computation of a, a2 mod N,

a4 mod N, a8 mod N, etc.

Knowing the factors of N allows parallel exponentiation, as pointed out by von zur Gathen

[13] and much later Zeugmann [14], but the problem facing the attacker is to figure out

those factors in the first place. Adleman and Kompella [1] suggested a parallel modularexponentiation

algorithm that is essentially a sieving-type discrete-logarithm algorithm run

in parallel, but this involves an intolerable amount of computation once n is moderately large.

Bernstein and Sorenson [4] slightly reduced the latency of modular exponentiation using a

polynomial number of processors in a simplified model of computation, but this incurs huge

costs in memory consumption (and in the total number of operations), presumably increasing

17

communication latency in realistic models of computation.

For comparison, NIST appears to estimate that checking a guess for an AES-128 key takes

about 215 bit operations. These operations allow considerable parallelization, so a keysearch

core carrying out a sequence of 248 key guesses will fit comfortably within the 264

latency limit. A cluster of 280 such cores will find the AES-128 key within the same latency

limit. Distributing the target to 280 cores in the first place is a nontrivial communication

problem but will still fit within the same latency limit. Similar comments apply to NIST’s

“Category 2”, finding a SHA-256 collision: parallel collision search [12] involves negligible

communication costs even with massive parallelization.

The same reasonable latency limit does not appear to allow a search for an AES-256 key

with noticeable success probability: for high success probability one needs more than 2200

cores, but there is not enough time to communicate the target to so many cores. Does

NIST’s “Category 5” implicitly assume a higher latency limit? Or does it implicitly rely

upon unrealistic assumptions about communication costs? The lack of clear definitions

of NIST’s model of computation makes it difficult to evaluate whether pqRSA fits

Categories 3–5 with gigabyte keys. The security estimates above have thus been limited to

Category 2.

Lower security: improved algorithms. Attackers will take every possible opportunity

to save logarithmic factors and constant factors: for example, there are various techniques to

use somewhat shorter exponents in Shor’s algorithm. More importantly, the fastest known

n-bit multiplication methods take only n(log n)1+o(1) bit operations.

On the other hand, all integer-multiplication methods on realistic architectures are constrained

by the Brent–Kung area-time theorem [5], which states that a chip

containing A

parallel cores cannot compute n-bit integer multiplication in time below Θ(n/ A). Asymptotically,

all known factorization algorithms cost energy at least n2.5+o(1) and have latency

1.5+o(1) at least n .

Concretely, can an attacker break a gigabyte key within a reasonable latency limit (say a

year), while at the same time having the energy costs of non-quantum computation, nonquantum

communication, quantum computation, and quantum communication all below the

energy cost of finding collisions in SHA3-256? The literature does not demonstrate this; even

if it is possible, it is not a risk for the foreseeable future; and users can use larger keys to

243 eliminate the risk. We have successfully generated a 1-terabyte pqRSA key (n = with

K = and B = 212), demonstrating feasibility of pqRSA for parameters that leave a

substantial security margin.

8.2 Factorization when factors are small

There are various factorization algorithms, such as trial division and the elliptic-curve method

(ECM), that are faster than the number-field sieve at finding small factors. These methods

are even faster than Shor’s method when factors are sufficiently small.

18

231

√2+o(1) ECM finds a prime divisor p of N using L multiplications modulo N, where L = √ √ exp( log p log

log p). Optimizations summarized in [2, Table 10.1] indicate that the

2+o(1)

is as low as 0.9 2 for p around 50 bits, but not noticeably lower; as p increases, the 2+o(1)

√ √ √2 ≈ 2125 is forced to converge to 2. If p ≈ 2512 then L0.9 2 ≈ 284; if p ≈ 21024 then L0.9 .

These multiplications are divided into L1/

√2+o(1) scalar multiplications. Each scalar multiplication

is a series of L1/

√2+o(1) multiplications modulo N, similar to the modular exponentiation

inside Shor’s algorithm. One can adjust ECM parameters to reduce latency, but this

increases the total number of operations: if each scalar multiplication is limited to Lc+o(1)

multiplications modulo N then the attacker needs L1/2c+o(1) scalar multiplications. See [2,

Figure 10.1] for a visualization of this effect.

A series of 234 multiplications modulo a 233-bit public key N is challenging to fit within

acceptable latency limits; see the analysis of Shor’s algorithm in Section 8.1. If each scalar

multiplication in ECM is limited to 234 multiplications then the number of parallel scalar

multiplications must be more, presumably much more, than

250, involving more than 283 bits, for p ≈ 2512; and

291, involving more than 2124 bits, for p ≈ 21024.

Communicating the target to all of these parallel machines involves further latency problems,

as in Section 8.1.

It seems likely that the total number of operations would already reach Category 2 for

p ≈ 2512. We leave a security margin here, without much cost in performance, by instead

taking p ≈ 21024. The total number of operations is then beyond Category 2, and it is not

clear that it is physically possible to achieve the required parallelism within the latency limit.

If latency were not an issue then Grover’s method would be applicable for sufficiently large

inputs: “GEECM” in [3] carries out a series of just L1/4c+o(1) scalar multiplications, each

being a series of Lc+o(1) multiplications modulo N. However, as NIST has commented, it is

“quite likely” that Grover’s method “will provide no advantage to an adversary wishing to

perform a cryptanalytic attack that can be completed in a matter of years, or even decades.”

Furthermore, since the underlying function inside ECM is already at the limit of latency,

the only way to apply Grover’s method is to further reduce c, further increasing the total

number of scalar multiplications.

It is important to study whether there are better quantum algorithms to find small factors,

but the current situation is that p ≈ 21024 has a comfortable security margin beyond all

known algorithms.

8.3 Attacks without factorization

The fastest algorithm known to compute the cube root of a “random-looking” integer modulo

N, given the integer and N, is to factor N into primes.

19

There are much faster algorithms when the root is created with some structure, but we avoid

such structure. We clear the top byte of the root, but, as noted above, this cannot improve

the success probability of any root-finding algorithm by a factor larger than 256.

An attack against the KEM that works for all functions H can be converted into a root-

finding algorithm with similar speed and similar success probability. Of course, this does

not eliminate the possibility of an attack that works better for a particular choice of H.

Similar comments apply to the signature system. Similar comments should apply to the

PKE system, which is based on Shoup’s “OAEP+”, although it is difficult to find a full

proof in the literature. All of these conversions should be checked carefully.

9 Advantages and limitations (2.B.6)

pqRSA is, as one of our PQCrypto 2017 referees put it, “not cheap”. The cost of pqRSA

is particularly noticeable in scenarios requiring forward secrecy. However, pqRSA has many

compensating advantages in simplicity, flexibility, and familiarity.

Unlike most post-quantum proposals, pqRSA provides encryption and signatures in a single

cryptographic primitive. pqRSA also provides more advanced cryptographic functions such

as blind signatures.

pqRSA provides much higher pre-quantum security levels than most post-quantum proposals.

pqRSA also provides much higher pre-quantum confidence than most post-quantum

proposals. Any fast algorithm to completely factor a pqRSA key can be converted into a fast

algorithm to factor traditional RSA keys: for example, the attacker takes a 2048-bit product

of two 1024-bit primes, multiplies by many more 1024-bit primes to obtain a pqrsa30 key,

and factors the result.

Finally, pqRSA benefits from the community’s extensive experience with RSA, allowing

uncommon levels of reuse of existing software and standards.

References

[1] Leonard M. Adleman and Kireeti Kompella. Using smoothness to achieve parallelism

(abstract). In Janos Simon, editor, Proceedings of the 20th Annual ACM Symposium

on Theory of Computing, May 2-4, 1988, Chicago, Illinois, USA, pages 528–538. ACM,

1988.

[2] Daniel J. Bernstein, Peter Birkner, Tanja Lange, and Christiane Peters. ECM using

Edwards curves. Mathematics of Computation, 82:1139–1179, 2013.

[3] Daniel J. Bernstein, Nadia Heninger, Paul Lou, and Luke Valenta. Post-quantum RSA.

In Tanja Lange and Tsuyoshi Takagi, editors, Post-quantum cryptography—8th inter-

20

national workshop, PQCrypto 2017, Utrecht, the Netherlands, June 26–28, 2017, proceedings,

volume 10346 of Lecture Notes in Computer Science, pages 311–329, 2017.

[4] Daniel J. Bernstein and Jonathan P. Sorenson. Modular exponentiation via the explicit

Chinese remainder theorem. Math. Comput., 76(257):443–454, 2007.

[5] Richard P. Brent and H. T. Kung. The area-time complexity of binary multiplication.

J. ACM, 28(3):521–534, 1981.

[6] Dave Dunning. Fabrics—why we love them and why we hate them (talk

slides), 2015. http://www.openfabrics.org/images/eventpresos/workshops2015/

DevWorkshop/Tuesday/tuesday_10.pdf.

[7] Thomas H¨aner, Martin Roetteler, and Krysta M. Svore. Factoring using 2n + 2 qubits

with Toffoli based modular multiplication. Quantum Information and Computation, 17,

2017. https://arxiv.org/abs/1611.07995.

[8] Michael O. Rabin. Digitalized signatures and public-key functions as intractable as factorization.

Technical Report MIT/LCS/TR-212, Massachusetts Institute of Technology,

January 1979.

[9] Ronald L. Rivest, Adi Shamir, and Leonard M. Adleman. A method for obtaining

digital signatures and public-key cryptosystems. Commun. ACM, 21(2):120–126, 1978.

[10] Arnold L. Rosenberg. Three-dimensional VLSI: A case study. J. ACM, 30(3):397–416,

1983.

[11] Victor Shoup. OAEP reconsidered. J. Cryptology, 15(4):223–249, 2002. https://

eprint.iacr.org/2000/060.

[12] Paul C. van Oorschot and Michael J. Wiener. Parallel collision search with cryptanalytic

applications. J. Cryptology, 12(1):1–28, 1999.

[13] Joachim von zur Gathen. Computing powers in parallel. SIAM J. Comput., 16(5):930–

945, 1987.

[14] Thomas Zeugmann. Highly parallel computations modulo a number having only small

prime factors. Inf. Comput., 96(1):95–114, 1992.

A Statements

These statements “must be mailed to Dustin Moody, Information Technology Laboratory,

Attention: Post-Quantum Cryptographic Algorithm Submissions, 100 Bureau Drive – Stop

8930, National Institute of Standards and Technology, Gaithersburg, MD 20899-8930, or can

be given to NIST at the first PQC Standardization Conference (see Section 5.C).”

21

First blank in submitter statement: full name. Second blank: full postal address. Third,

fourth, and fifth blanks: name of cryptosystem. Sixth and seventh blanks: describe and

enumerate or state “none” if applicable.

First blank in patent statement: full name. Second blank: full postal address. Third blank:

enumerate. Fourth blank: name of cryptosystem.

First blank in implementor statement: full name. Second blank: full postal address. Third

blank: full name of the owner.

22

A.1 Statement by Each Submitter

I, , of , do

hereby declare that the cryptosystem, reference implementation, or optimized implementations

that I have submitted, known as , is my own original

work, or if submitted jointly with others, is the original work of the joint submitters. I

further declare that (check one):

I do not hold and do not intend to hold any patent or patent application with a claim

which may cover the cryptosystem, reference implementation, or optimized implementations

that I have submitted, known as OR (check one

or both of the following):

– to the best of my knowledge, the practice of the cryptosystem, reference implementation,

or optimized implementations that I have submitted, known as

may be covered by the following U.S. and/or foreign patents:

– I do hereby declare that, to the best of my knowledge, the following pending

U.S. and/or foreign patent applications may cover the practice of my submitted

cryptosystem, reference implementation or optimized implementations:

I do hereby acknowledge and agree that my submitted cryptosystem will be provided to the

public for review and will be evaluated by NIST, and that it might not be selected for standardization

by NIST. I further acknowledge that I will not receive financial or other compensation

from the U.S. Government for my submission. I certify that, to the best of my knowledge,

I have fully disclosed all patents and patent applications which may cover my cryptosystem,

reference implementation or optimized implementations. I also acknowledge and agree that

the U.S. Government may, during the public review and the evaluation process, and, if my

submitted cryptosystem is selected for standardization, during the lifetime of the standard,

modify my submitted cryptosystem’s specifications (e.g., to protect against a newly discovered

vulnerability).

I acknowledge that NIST will announce any selected cryptosystem(s) and proceed to publish

the draft standards for public comment.

I do hereby agree to provide the statements required by Sections 2.D.2 and 2.D.3, below, for

any patent or patent application identified to cover the practice of my cryptosystem, reference

implementation or optimized implementations and the right to use such implementations for

the purposes of the public review and evaluation process.

I acknowledge that, during the post-quantum algorithm evaluation process, NIST may remove

my cryptosystem from consideration for standardization. If my cryptosystem (or the derived

cryptosystem) is removed from consideration for standardization or withdrawn from consideration

by all submitter(s) and owner(s), I understand that rights granted and assurances made

under Sections 2.D.1, 2.D.2 and 2.D.3, including use rights of the reference and optimized

implementations, may be withdrawn by the submitter(s) and owner(s), as appropriate.

23

Signed:

Title:

Date:

Place:

24

A.2 Statement by Patent (and Patent Application) Owner(s)

If there are any patents (or patent applications) identified by the submitter, including those

held by the submitter, the following statement must be signed by each and every owner, or

each owner’s authorized representative, of each patent and patent application identified.

I, , of ,

am the owner or authorized representative of the owner (print

full name, if different than the signer) of the following patent(s)

and/or patent application(s):

and do hereby commit and agree to grant to any interested party on a worldwide basis, if

the cryptosystem known as is selected for standardization, in consideration

of its evaluation and selection by NIST, a non-exclusive license for the purpose of

implementing the standard (check one):

without compensation and under reasonable terms and conditions that are demonstrably

free of any unfair discrimination, OR

under reasonable terms and conditions that are demonstrably free of any unfair discrimination.

I further do hereby commit and agree to license such party on the same basis with respect

to any other patent application or patent hereafter granted to me, or owned or controlled by

me, that is or may be necessary for the purpose of implementing the standard.

I further do hereby commit and agree that I will include, in any documents transferring

ownership of each patent and patent application, provisions to ensure that the commitments

and assurances made by me are binding on the transferee and any future transferee.

I further do hereby commit and agree that these commitments and assurances are intended by

me to be binding on successors-in-interest of each patent and patent application, regardless

of whether such provisions are included in the relevant transfer documents.

I further do hereby grant to the U.S. Government, during the public review and the evaluation

process, and during the lifetime of the standard, a nonexclusive, nontransferrable, irrevocable,

paid-up worldwide license solely for the purpose of modifying my submitted cryptosystem’s

specifications (e.g., to protect against a newly discovered vulnerability) for incorporation into

the standard.

Signed:

Title:

Date:

Place:

25

A.3 Statement by Reference/Optimized Implementations’

Owner(s)

The following must also be included:

I, , , am the

owner or authorized representative of the owner of the submitted

reference implementation and optimized implementations and hereby grant the U.S.

Government and any interested party the right to reproduce, prepare derivative works based

upon, distribute copies of, and display such implementations for the purposes of the postquantum

algorithm public review and evaluation process, and implementation if the corresponding

cryptosystem is selected for standardization and as a standard, notwithstanding that

the implementations may be copyrighted or copyrightable.

Signed:

Title:

Date:

Place:

26

#include <stdio.h>

#include <stdlib.h>

#include "cpucycles.h"

#include "crypto_kem.h"

#define TIMINGS 5

long long t[TIMINGS + 1];

void printtimings(const char *s)

{

long long i;

long long j;

long long above;

long long below;

printf("%lld %s",(long long) crypto_kem_PUBLICKEYBYTES,s);

for (i = 0;i < TIMINGS;++i) t[i] = t[i + 1] - t[i];

for (j = 0;j < TIMINGS;++j) {

above = below = 0;

for (i = 0;i < TIMINGS;++i) if (t[i] < t[j]) ++below;

for (i = 0;i < TIMINGS;++i) if (t[i] > t[j]) ++above;

if (below <= TIMINGS/2 && above <= TIMINGS/2) {

printf(" %lld (%lf)",t[j],t[j] / (1.0 * crypto_kem_PUBLICKEYBYTES));

break;

}

}

for (i = 0;i < TIMINGS;++i) printf(" %lld",t[i]);

printf("\n");

}

Figure 1: A benchmarking program with limited double-checking. See Figure 2 for second

half.

27

int main()

{

unsigned char *sk = malloc(crypto_kem_SECRETKEYBYTES);

unsigned char *pk = malloc(crypto_kem_PUBLICKEYBYTES);

unsigned char *c = malloc(crypto_kem_CIPHERTEXTBYTES);

unsigned char *k = malloc(crypto_kem_BYTES);

long long i;

if (!sk) abort();

if (!pk) abort();

if (!c) abort();

if (!k) abort();

t[0] = cpucycles();

crypto_kem_keypair(pk,sk);

t[1] = cpucycles();

t[1] -= t[0];

printf("%lld keypair %lld (%lf)\n",(long long) crypto_kem_PUBLICKEYBYTES

,t[1],t[1] / (1.0 * crypto_kem_PUBLICKEYBYTES));

for (i = 0;i < TIMINGS + 1;++i) {

t[i] = cpucycles();

crypto_kem_enc(c,k,pk);

}

printtimings("enc");

for (i = 0;i < TIMINGS + 1;++i) {

t[i] = cpucycles();

crypto_kem_dec(k,c,sk);

}

printtimings("dec");

return 0;

}

Figure 2: A benchmarking program with limited double-checking. See Figure 1 for first half.

28


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp