2WF80 -- Introduction to cryptology - Winter 2019

Contents Announcements Exams Literature Pictures and slides Old exams

Tanja Lange
Coding Theory and Cryptology
Eindhoven Institute for the Protection of Information
Department of Mathematics and Computer Science
Room MF 6.104B
Technische Universiteit Eindhoven
P.O. Box 513
5600 MB Eindhoven
Netherlands

Phone: +31 (0) 40 247 4764

The easiest ways to reach me wherever I am:
e-mail:tanja@hyperelliptic.org

This page belongs to course 2WF80 - Introduction to cryptology. This course is offered at TU/e as part of the bachelor's elective package 'Security'. The official page is here.

Contents
Classical systems (Caesar cipher, Vigenère, Playfair, rotor machines), shift register sequences, DES, RC4, RSA, Diffie-Hellman key exchange, cryptanalysis by using statistics, factorization, attacks on WEP (aircrack).

Some words up front: Crypto is an exciting area of research. Learning crypto makes you more aware of the limitations of security and privacy which might make you feel less secure but that's just a more accurate impression of reality and it a good step to improve your security.
Here is a nice link collection of software to help you stay secure https://prism-break.org/en/ and private https://www.privacytools.io/.

Announcements

If you study mathemtics, you should have participated in "2WF50 - Algebra" and "2WF70 - Algorithmic algebra and number theory".
If you study computer science or any other program you should have participated in "2DBI00 - Linear Algebra and Applications", "2IT50 or 2IT80 - Discrete structures", and "2WF90 - Algebra for security" before taking this course.
If not you can find some material in the Literature section but note that you are on your own for learning this.

All lectures take place Mondays 10:45 - 12:30 in AUD 1 and Thursdays 13:45 - 17:30. in AUD 1. There is a holiday break between Christmas and New year so that there are no lectures between 23 Dec and 03 Jan. There will also be no lectures on 06 Jan and 09 Jan but on 13 Jan and 16 Jan.

The teaching assistants for this course are:

Literature and software

It is not necessary to purchase a book to follow the course.

For some background on algebra see

Some nice books on crypto (but going beyond what we need for this course) are For easy prototyping of cryptoimplementations I like the computer algebra system Sage. It is based on python and you can use it online or install it on your computer (in a virtual box in case you're running windows).
For encrypting your homeworks you should use GPG/PGP. If you're running linux then GnuPG is easy to install. If you're using windows I recommend using GPG4win; if you're using MAC-OS you can use GPG Suite. We are OK with having only the attachment encrypted, but for proper encryption of your email you might want to look into Enigmail which works well with Thunderbird.

Examination

30% of the grade is determined by homeworks. There will be four sets of homework during the quarter. You may hand in your homework in groups of 2 or 3. To make sure that you get used to crypto we require solutions to be sent encrypted with GPG/PGP. Each participant must have communicated with me at least once using GPG/PGP. You can find my public key for tanja@hyperelliptic.org on the key servers and on my homepage here. You can find the keys for the TAs linked above or also on the key servers.

The exam took place on 27 January 2020, 13:30 - 16:30. The retake was scheduled for April 14, 18:00 - 21:00 but will be replaced by oral exams because of Covid-19. The exam accounts for the remaining 70% of the grade. You may use a simple (non-programmable) calculator but no cell-phones or other devices containing calculator applications. You may not use books or your class notes.
You should have received an email assigning you to one of the exam rooms. Please make sure to go to your exam room.

Here is a test exam. Note that the CRT exercise would have somewhat smaller keys for the exam. See below for old exams.

Class notes

This section will fill in gradually after the lectures. I'll provide short summaries and links to pictures of the blackboards. The homeworks will be posted here as well.

11 Nov 2019
Substitution cipher, Caesar cipher, Viginere, start of Playfair system. Some statements about the number of possible keys for these schemes.
Pictures of black boards are here.

14 Nov 2019
Here is the exercise sheet for block 5 and 6: exercise-1.pdf. See also the raw data if paste fails.

For most of the exercises the solution is obvious when you have it. There are many more pages on the web with tools for cryptanalysis of classical ciphers, e.g. https://www.guballa.de/vigenere-solver, http://www.braingle.com/brainteasers/codes/index.php, http://www.cryptool-online.org.

In the lecture we discussed how to break the Hill cipher given some plaintext-ciphertext pairs and in particular repeated the extended Euclidean algorithm.

We discussed a bit about rotor machines and cryptanalysis. I showed an Enigma webpage where you can click around a bit. We currently have one rotor machines from the Cryptomuseum on show in the MetaForum (near the evelvator on floor 6) and take a look at their website (and museum if you get a chance).

We discussed the column transposition cipher (see pictures). One-time pad with bits, and problems with reuse, in particular with formatted messages.

Finally, we discussed stream ciphers. These are much more practical than the OTP in that the key is much shorter. To encrypt a message, expand the key into a stream of pseudo-random bits and xor those to the message, i.e., treat the stream-cipher output as the one-time pad. To encrypt multiple message it becomes necessary to remember how many bits have been used and either stay in that state or forward by that many positions the next time one uses the cipher. This is impractical. Initialization Vectors (IVs) deal with that problem in that they move the beginning of the stream to a random postion. The IV is then sent in clear along with the ciphertext, so that the receiving end can compute the same starting position. More on this on Monday.
Pictures of black boards are here.

18 Nov 2019
Feedback shift registers and how to use them for encryption; state and period, pre-period(=tail), ultimately periodic sequences, If the sequence is periodic with period r and si = si+l for all i then r divides l.
Linear feedback shift registers (LFSRs), want c0=1 (else there is just a delay in output). . I didn't show this but one can run the LFSR backwards and thus the is periodic, not just ultimately periodic.
The zero state produces the zero sequence, thus the max period is 2^n-1, relation to matrix multiplication. I stated the characteristic polynomial of the matrix, the proof will be on the exercise sheet on Thursday.
Note that we did the last part only for characteristic 2, in general we need to pay attention to signs and P(x)=det(xI-C).
Somebody asked me about literature. There is not a single book that covers all to the extend that I like. For LFSR you can find what we do (and more) in the above-listed books by Lidl/Niederreiter and by van Tilborg.
Pictures of black boards are here.

21 Nov 2019
Here is the exercise sheet for block 5 and 6: exercise-2.pdf.

Explanation of public-key cryptography for encryption and signatures. To submit your homeworks you will need to use PGP/GnuPG. See above for software suggestions. Note that the sender needs the public key of the receipient in order to send a message. This means that you need to send your public key to the TAs so that they can reply to you.

Quick primer on matrices, covering eigenvalues, eigenvectors, the characterisitic polynomial P(x) and that P(C)=O.

Defintion of irreducible polynomials, the definition of order of a polynomial is on the exercise sheet. We summarized the results of exercise 1 to derive some conjectures on the periods and factorization patterns (see blackboard pictures). Because P is C's characteristic polynomial; Order of C matches order of its characteristic polynomial because P(C)=0 means C and x play the same role, i.e. we compute the powers of C modulo P.

We captured some more conjectures from the examples: An irreducible polynomial generates a sequence with period dividing 2^k-1. The sequence {si+ti} has period the lcm of the periods of {si} and {ti}. We will get back to these on Monday.

Further reading on finite fields and LFSRs, is in Lidl/Niederreiter (see literature section), van Tilborg (see literature section). or David Kohel's lecture notes.

You should recognize factors x, x+1, x^2+x+1, x^3+x+1, and x^3+x^2+1 as factors of polynomials. From 2WF50 or 2WF90 you should know Rabin's irreducibility test, else look it up.

Pictures of black boards are here.

The first homework is due on 28 November 2019 at 13:30. Here is the first homework sheet. Please remember to submit your homework by encrypted, signed email to the TAs. Don't forget to include your public key and those of your team mates.

25 Nov 2019
Proof that the maximal period equals the order of C, the matrix used for updating the state. We have ord(C)=ord(P) as x plays the same role as C. An irreducible polynomial is called primitive if it generates a sequence of period 2^k-1, i.e. the maximum possible. If P(x) is irreducible, its order divides 2^k-1. We showed this by using the Rabin irreducibility test. This means that for computing the order of an irreducible polynomial of degree d we only need to test for divisors of 2^d -1. If P(x) is not primitive but irreducible then all nonzero states lead to the same period; this only means the same length, they split into different subsequences.

I explained how to use an LFSR as a stream cipher and why we want a large period. However, for hardware reasons it might be better to go for sums of LFSRs with smaller state.

I didn't proof but stated as a result: The characteristic polynomial of the sequence {si+ti} is the lcm of the characteristic polynomials of the two sequences. This means, we can now analyze LFSRs by analyzing the irreducible factors of their characteristic polynomials. To get good examples, make sure that the factors are all primitive, then the largest period is as large as it can get, namely (2^m -1)*(2^n-1) for degrees m and n. We did not study how to handle repeated factors but from the examples on Thursday this does not look like a good idea.

One can recover the state of an LFSR from k output bits and then compute all future (and past) outputs. If the polynomial P is unkown we need to solve a linear system in the c_i, which needs 2k consecutive outputs. If k is unknown, attack for k=1, k=2, ... For each of them compute the candidate P and check for consistency with the following outputs to get confirmation or contradictions.

LFSRs are used in practice because they are small and efficient, but they need a non-linear output filter. As an example I mentioned Grain which is one of the finalists of the eStream competition on stream ciphers. Grain uses an LFSR together with a NFSR and an output function.

As a bad example I mentioned A5/2 which is a stream cipher used in GSM encryption. I mentioned a bit of the weird history of it but forgot to mention that the design was secret. One of the first postings on it with some details on the history (note that the original link does not work now) an attack idea for A5/1 by Ross Anderson is from 1994, but lots of details were missing. The full algorithm descriptions of A5/1 and its purposefully weakened sibling A5/2 were reverse engineered and posted in 1999 by Marc Briceno, Ian Goldberg, and David Wagner. The same group also showed a devastating attack on A5/2, allowing for real-time decryption. Sadly enough, the A5 algorithms allow downgrade attacks, so this is a problem for any phone which has code for it, which is most until recently. Also A5/1 does not offfer 2^54 security (54 bits is the effective key length) but only 2^24 (with some precomputation/space). However, A5/2 is broken even worse, in 2^16 computations, with efficient code online, e.g A5/2 Hack Tool.

A nice overview of lightweight ciphers, including more modern and less broken ones is given by Alex Biryukov and Leo Perrin.

Pictures of black boards will be linked here once I get the pictures. are here.

28 Nov 2019
Here is the exercise sheet for block 5 and 6: exercise-3.pdf. Stan was nice and wrote an implementation in pure python.

RC4 was a secret design, available only as black box implementation. Soon after it was leaked as "arcfour" weaknesses were found. Some are so obvious that we found them in the exercise session. RC4 has a strong bias towards 0 in the second byte. We saw this experimentally and also gave the explanation. The first byte has a strong bias towards the first key byte unless the first key byte it zero. You can also show this by tracing the state vector S.

RC4 does not provide for refreshing the key stream, so one needs to remember the last values for j and i. WEP needs a place to put some per connection data and uses key bits for that, so we get the known biases plus known key bits plus known plaintext/ciphertext pairs. Aircrack uses these to break WEP encryption. (Sorry, I wrote WPA, which is the newer protocol; this should have been WEP).
I showed some slides on more biases of RC4 by Daniel J. Bernstein. They are available here.

Better streamciphers exist, e.g. the final portfolio from the eSTREAM competition has held up well.

Cryptographic hash functions need to provide preimage resistance, second preimage resistance, and collision resistance.

Pictures of black boards are here.

The second homework is due on 05 December 2019 at 13:30. Here is the second homework sheet. Please remember to submit your homework by encrypted, signed email to the TAs. Don't forget to include your public key and those of your team mates. Do not submit as a singleton, the minimum group size is 2.
Addendum on 30 Nov: there was a typo in exercise 1. It has been fixed and marked in red on the exercise sheet. The correct ciphertext is 10100100100010111010.

02 Dec 2019
If the output of the hash function has n bits then finding a collision takes on average 2n/2 trials (use the birthday paradox to see this) and finding a preimage or second preimage takes on average 2n trials.

We covered design of hash functions using the Merkle-Damgaard construction and how this enables length-extension attacks. We used this as a feature in computing more SHA-1 iterations per second when searching for near collisions. See our writeup for more details. But there are also situtations when this property is not welcome. One can deal with it by insisting on a fixed pading in the last block and checking for that.

Stream ciphers are susceptible to attacks flipping bits in the ciphertext, which cause the same bits to flip in the plaintext. A fingerprint protects against accidental bit filps, but a proper Message Authentication Codes (MACs) need to resist adversarially chosen changes. Communicating parties A and B need an authentication key along with the encryption key. The easiest version of a MAC is to use a hash function to compute cryptographic checksum over the authentication key and the ciphertext. We want to have checksum on the ciphertext for easy and quick rejection of forged packets = Encrypt, then MAC. If you would use the simple MAC you would run into trouble with length-extension attacks. There are many MACs either as standalone constructions ore integrated into a block cipher + mode as in AES-GCM.

Short summary of hash functions: MD4 is completely broken; for MD5 it's esasy to find collisions, first SHA-1 collisions were computed in 2017 (see https://shattered.io/). SHA-256, SHA-512 and SHA-3 (and the other SHA-3 finalists) are likely to be OK.

Block cipers. ECB (electronic code book) mode encrypts each block separately, this means that identical blocks encrypt the same way. A famous example of how weak this is is the ECB penguin. This means it is necessary to change the encryption so that identical plaintexts do not lead to identical ciphertexts. We'll cover more resaonable modes in the next lecture.

Pictures of black boards are here.

05 Dec 2019
Here is the exercise sheet for block 5 and 6: exercise-4.pdf.

Some details on DES, including how to decrypt a Feistel cipher. S-box is non-linear part; the NSA strengthened the original design (Lucifer) agains differential attacks. In the exercises we saw that small changes in the input lead to big changes in the output. 56 bits for the key is not secure enough! First brute force attack was done with "DES Cracker" for 250k USD. In 2006 a team from Bochum and Kiel built COPACOBANA which can break DES in a week for 8980 EUR (plus some grad-student time).

Quick comment on PRESENT and that it uses a single S-box and on the current standard AES.
Discussion of brute force attacks on DES. Only 2^56 trials for complete key search. 2-DES is only marginally harder to break than DES, taking 2^57 with a divide-and-conquer approach. Still common use is 3-DES, sometimes with k1=k3. Full 3-DES needs 2^112 steps to break, there are some more weaknesses in 2-key 3-DES.

More reasonable modes than ECB are CBC, CFB, OFB, and CTR. These modes ensure that identical plaintext blocks do not lead to identical ciphertext blocks. I commented on how easy it is to de/encrypt locally, by that I mean how much effort is needed to de/encrypt block i. In OFB you only need data C_i but also i+1 times the encryption of IV which makes this not local. For CTR mode I only wrote the counter on the board. Normally the input to Enc is split into a random nonce (number used once) followed by a counter, so e.g. 64 bit nonce N and 64 counter. The nonce ensures that the keystream does not repeat when the same key is used again. The nonce and position are sent in clear.

When you use symmetric crypto, make sure to include a MAC!

Pictures of black boards are here.

The third homework is due on 12 December 2019 at 13:30. Here is the third homework sheet. Please remember to submit your homework by encrypted, signed email to the TAs. Don't forget to include your public key and those of your team mates. Do not submit as a singleton, the minimum group size is 2.
You will get the last homework next week but will have more time for it.

09 Dec 2019
Lecture given by Jonathan Levin
General explanation of public key cryptography (encryption and signatures) and the motivation of why public-key encryption is necessary for key management.
Everybody can encrypt to a public key but only the private key can decrypt. Public-key signatures have a public verification key and a private signing key. Anybody can verify the signature (given the message and the public key) but only the private key can make valid signatures.

Public-key encryption requires 3 algorithms: Key generation, encryption, and decryption. Signatures also require 3 algorithms: Key generation, signing, and verification.

RSA encryption: public key for RSA is (n,e), private key is (n,d), where n=pq for two different primes p and q, φ(n)=(p-1)(q-1) and d is the inverse of e modulo φ(n). We showed how to encrypt and decrypt and why this works. Attention, this is schoolbook RSA, do not use this in practice.
If you don't remember how to compute modulo an integer or what φ(n) is, now is a good moment to catch up on this.

Exponentiation by square and multiply, this takes l squarings and as many multiplications as the exponent ehas bits set to 1; examples. In RSA we can choose small and sparse e but d must be large/random.

There are several problems with schoolbook RSA: RSA is homomorphic, which defeats some security notions (see next lecture when we motivate why having a decryption oracle is a reasonable assumption).
Furthermore, we can recover a message that is sent to multiple people, if they all use the same small exponent e.

Pictures of black boards are here.

12 Dec 2019
Here is the exercise sheet for block 5 and 6: exercise-5.pdf.

Lecture given by Jonathan Levin
Short recap of RSA sigantures and encryption.
Attack on RSA using gcd computation (problem for any way of using key, and this happened for real, see https://factorable.net/) for an internet-wide scan and several factored keys. Tanja was involved in finding a similar case in the RSA keys of Taiwanese Citizen Cards, see https://smartfacts.cr.yp.to/.
More issues with schoolbook RSA: can decrypt linearly related messages (full details on the board), or messages related via a known function.
The bottom line is that one should not use schoolbook RSA.

Security notions and attack definitions for encryption and signatures:
These formalize what we consider an attack and what powers the attacker has.
In a full break the secret key is recovered. For signatures, the attack goal is to forge signatures, this could mean to generate any valid (m,s) pair (existential forgery) or to generate valid (m,s) for a meaningful message m (universal forgery).
For encryption the goal is to recover plaintext from ciphertext; we typically request that a scheme is so strong that the attacker learns no information about the plaintext from the ciphertext; this is called semantic security.

The abilities of the attacker vary; for signatures it might be a key-only attack (KOA), a known message attack (KMA), or a chosen message attack (CMA). In the latter two cases the attacker sees valid signature pairs (m,s); in CMA the attacker can choose for which messages he sees sigantures. The homomorphic property gives existential forgery under KMA: (m1*m2, s1*s2) is a valid signature different from the observed ones. This works as long as the attacker sees at least one message (he can use m1=m2). It also gives universal forgey under CMA: in order to create a signature on m the attacker asks to see a signature on m*2^e. He receives (m^d*2,s) and then obtains a valid signature on m as (m,s/2).

For encryption the attacker may do a chosen-plaintext attack (CPA) or a chosen-ciphertext attack (CCA). The typical security requirement is ciphertext indistinguishability: the attacker chooses two messages m0 and m1 and is then presented with a ciphertext c which encrypts one of m0 and m1. The attacker wins if he correctly guesses which, i.e., if he can distinguish the ciphertexts. To deal with a 50% chance of guessing, the advantage of the attacker is defined as the extra chance on top of the 50%. See the pictures for the full formal definition.
There are two versions of CCA security: in CCA-I the attacker gets to request decryptions of arbitrary ciphertexts until he sees c; in CCA-II the attacker can request decryptions of ciphertexts c' (not equal to c) also after receiving c.
Because schoolbook RSA is deterministic, it is not even CPA secure: the attacker can simply encrypt m0 and m1 himself and compare the results to c..

A security requirement besides indistinguishability is one-wayness -- it should not be possible to decrypt unless one has the key. The same scenarios (CPA, CCA-I, CCA-II) as for IND apply. Schoolbook RSA is not is not OW CCA-II secure, because of the homormophic property: to decrypt c the attacker can ask for the decryption of c'=c*2^d, obtain m' and get m = m'/2.

To make RSA a randomized encryption one uses some padding; however this is also erro prone. PKCS v1.5 is a negative example which is broken using Bleichenbacher's attack. Take a look at https://robotattack.org/ for a recent use of Bleichenbacher's attack in practice. You should be able to understand details of the full paper Return Of Bleichenbacher's Oracle Threat. RSA-OAEP is a better padding scheme.

Pictures of black boards are here.

The fourth and last homework is due on 09 January 2020 at 13:30. Here is the homework sheet. At this point you can solve the first half of the exercises, you will have all material necessary by next Thursday.
Please remember to submit your homework by encrypted, signed email. Don't forget to include your public keys.

16 Dec 2019
Differences between MACs and signatures.

Factorization methods: trial division, factoring numbers of the form p*nextprime(p+1), Fermat factorization, p-1 method.

Namedropping of other factorization methods, see also http://facthacks.cr.yp.to/ for descriptions and code snippets. I mentioned that Nadia Heninger put up some nice code for factorization as a service.

Diffie-Hellman key exchange in different groups, including some insecure ones. A good choice is to use the intergers modulo a large prime or elliptic curve crypto (not covered in this class).
CDHP, DDHP, DLP, relations between these problems. Very quick run through Baby-Step Giant-Step algorithm with an example in F_53. I have copied the Pari-GP session.

BSGS is an algorithm to compute discrete logarithms in a cyclic group with generator g, i.e. given g and h_A =g^a it computes a.
Put m=floor(√n), where n is the order of g. BSGS computes all powers of the generator from g^0=1 up to g^(m-1), these are the baby steps (incrementing by 1 in the exponent). Then it computes k=g^(-m) and checks for each h_A * k^j for j = 0,... whether it is in the list of baby steps; these are the giant steps (moving by m in the exponent). If a match happens, we have g^i=h_A * k^j = g^a*g^(-mj), thus a=i+mj.

Pictures of black boards are here.

19 Dec 2019
Here is the exercise sheet for block 5 and 6: exercise-6.pdf.

Recap of order, definitions of F_(p^n) and Z/(p^n), how to check that an element is a generator/has maximal order.

Repettition of BSGS with more details and cost analysis. Any system based on DLP has at most squareroot of the group oder hardness of the DLP. For elliptic-curve groups that's also the best known attack complexity while there are faster attacks on finite-field DLP which reduce the complexity to that of RSA numbers of the same size.

In 2016 I wrote some slides for this lecture. You might find them interesting as a different way to explain BSGS.

We worked out how we can test whether the DL is even or odd and how to turn this into an attack on DDH that succeeds with non-negligible probability. A way out is to work in subgroups of F_p which have prime order. If you stay on for 2MMC10 you will encounter the Pohlig-Hellman attack, an attack that reduces the security to that of the largest prime-order subgroup, so restricting to it does not give up any security.

DH has a problem with active man-in-the-middle attacks. Eve can establish communications with A and B and pass messages between them so that each of them is convinced they are securely talking to the other party while Eve gets all plaintext. This does require Eve to stay in the middle of the conversation and decrypt and re-encrypt all messages.

Semi-static DH has A have a fixed key which B knows to be authentic. B picks fresh random values d for each neaw interaction to get key freshness. If both A and B should be authenticated they can use the triple-DH handshake to combine long-term and short-term keys.

Pictures of black boards are here.

Enjoy your holidays. If you want to do some crypto take a look at the old exams (below). Email me if you have questions or think you have solutions to old exams (= send me scans of your solutions and I'll send comments back).

Remember, there will be no lectures on 06 and 9 Jan.

13 Jan 2020
Needham-Schroeder authentication protocol and why it doesn't actually prove to B that he is talking to A. Triple DH or DH+ signatures achieves authentication and key freshness.

Explanation of KE (Key Encasulation mechanism) and how that works for DH and is the more modern security notiion. A KEM is used together with a DEM (Data Encapsulation Mechanism to send messages after the key is obtained. Semi-static DH is an example of a KEM

ElGamal encryption (for historical purposes), this is not CCA-II secure: asking for (g^r,2c) can be used to decrypt (g^r,c). The homomorphism might be what you want, but otherwise you should really not use this but use static-DH instead.

ElGamal signatures and why they work.

Shamir secret sharing: allows to share a secret in a t-out-of-n fashion so that any set of t people can recover it; works >y picig a raondom degree-(t-1) plynomial with constant term as the secret and giving each user a share (i,f(i)) for non-repeating and nonzero i. Recoering the secret works by simple Lagrange interpolation.

Note that the secret never needs to be re-computed -- for applications in RSA or DH the shares can be applied individually and then only the per-message secrets be combined. Also note that there is no need to ever have the secret -- it can be generated from t shares; these shares are then re-shared in a t-out-of-n fashion.

Pictures of black boards are here.

16 Jan 2020
Here is the exercise sheet for block 5 and 6: exercise-7.pdf.

We spent the lecture hours on exam prep (see pictures): E.g. how to deal with factorization of polynomials, order of polynomials, periods of LFSRs. The summary is that you can put in time or brain. Please double check your results of factoring! Example of LFSR, example of BSGS.

Pictures of black boards are here.

Old Exams

This course was given for the first time in Q2 of 2014. Here are my exams so far