Comment by 201984

3 days ago

In the context of encrypting 32 or 64 bit IDs, where there is no nonce, that'd be equivalent to XOR encryption and much weaker than TFA's small block ciphers.

10 comments

201984

adrian_b 3 days ago

If you really want to encrypt and decrypt 32-bit numbers without having any nonces available, the fastest way on non-microcontroller CPUs remains using the AES instructions.

You can exploit the fact that the core of AES consists of 32-bit invertible mixing functions. In order to extend AES to 128-bit, a byte permutation is used, which mixes the bytes of the 32-bit words.

The AES instructions are such, that you can cancel the byte permutation. In this case, you can use the AES instructions to encrypt separately four 32-bit words, instead of one 128-bit block.

Similarly by canceling the standard byte permutation and replacing it with separate permutations on the 2 halves, you can make the AES instructions independently encrypt two 64-bit words.

These AES modifications remain faster than any software cipher.

How to cancel the internal permutation and replace it with external shuffle instructions was already described in the Intel white paper published in 2010, at the launch of Westmere, the first CPU with AES instructions.

201984 3 days ago
Are you certain using AES is still faster? Let's say for a 32-bit block size and 64-bit key.
From https://en.wikipedia.org/wiki/Speck_(cipher), that Speck combination would use 22 rounds, and using the instruction timings for Zen 5 from https://instlatx64.github.io/InstLatx64/AuthenticAMD/Authent..., it looks like each round would take at most 3 cycles. (Dependency chain for each round is 3 instructions long, ror+add+xor). 22*3 = ~66 cycles.
Using AES with a pshufb to take out the ShiftRows step would be 2 cycles for the pshufb and 4 cycles for each aesenc, and at 10 rounds, you have ~60 cycles.
It's quite close, and to say which one wins, we'd need to actually benchmark it. One is not clearly much faster than the other.
- adrian_b 2 days ago
  
  Standard AES-128 has a throughput of around 16 bytes per 8 clock cycles or even less in recent CPUs, because they can do 2 or 4 AES instructions per clock cycle (in the modes of operation that are not limited by latency).
  AES-128 can be easily modified to independently encrypting four 32-bit words per execution, instead of one 128-bit block, by cancelling the byte permutation that extends the AES mixing function from 32-bit to 128-bit. this would increase the throughput at least twice, depending on whether PSHUFB is done concurrently or not.
  You have given the latencies of the instructions, not their throughput. When you use AES in such a way that you are limited by latency, that is normally wrong. The cryptographic libraries have multi-buffer functions, which compute e.g. 8 AES values, so that they are not limited by latencies.
  Regarding the parent article, if you want an unpredictable identifier for a record, you should not do this by encrypting some value with the intent of decrypting it in the future. Instead of this, you should use as identifier an unpredictable random number. Such identifiers can be generated with AES in batches, at maximum throughput, and stored until they are needed for assignment to a record.
  If you need in your record some information like time of creation or a monotonically increasing number, which you consider private, such information should be put in distinct fields, that you do not give externally, instead of attempting to encrypt them in a record identifier, which would need to be decrypted to access such information.
  
  1 reply →
- botusaurus 3 days ago
  
  maybe the reason they are so close is that the AES microcode is inplementing exactly those operations
  
  1 reply →

fluoridation 3 days ago

Would it, though? Either way you're operating in ECB mode with 2^32 or 2^64 values. Why is one more secure than the other?

EDIT: What I mean is you can do cypher = truncate(plain ^ AES(zero_extend(plain))).

201984 3 days ago
>EDIT: What I mean is you can do cypher = truncate(plain ^ AES(zero_extend(plain))).
How would you decrypt that though? You truncated 3/4ths of the AES output needed to decrypt it.
I thought you were suggesting this:
ciphertext = truncate(AES(key) ^ plaintext)
And in this case, since AES(key) does not depend on the plaintext, it would just be XOR by a constant.
- adrian_b 2 days ago
  
  The first examples in the parent article do not require decryption. They only require unpredictable random numbers that are unique.
  If uniqueness is needed for a 32-bit number or a 64-bit number, then in AES-128 the byte permutation can be modified, to reduce the block size accordingly.
  For the other examples with record identifiers, I am not sure whether the author meant for them to ever be decrypted. If decryption was intended, I disagree that this is the right solution. If an opaque record identifier is desired, it should be an unpredictable random number, which can be generated at maximum speed with AES. There is no need to ever decrypt such an identifier.
  If other private information is needed, like a record counter, it should be put in separate fields, that are not provided to external entities, instead of encrypting it inside the identifier. Encrypting such private information would prevent its use in indexing anyway.
- fluoridation 3 days ago
  
  You're right, my bad. I guess if you have strict size requirements it does make sense to use small block sizes.