Cypher User Manual


Background

Types of Ciphers

There are two broad categories of ciphers: substitution ciphers and transposition ciphers. Substitution ciphers replace each character in the original plaintext message with the corresponding character in the cipher alphabet, often according to some protocol derived from a predetermined key. In a substitution cipher, the cryptographer is not concerned with changing the position of the characters being encoded, only their values. Transposition ciphers, on the other hand, retain a character's true form but instead change its position. The subject of substitution ciphers is addressed first.

Monoalphabetic Substitution Ciphers

In a monoalphabetic substitution cipher, each character in the plaintext alphabet is replaced by a single character in the cipher alphabet. If the language of the plaintext message is known (and it is assumed to be English in this software), then the frequency statistics of each letter in the plaintext message are also known from analyzing samples of that language. In English, the most frequently occurring letter is e at 12.3%, followed by t (9.6%) and then a (8.0%); a complete reference to the order statistics of the English language is included in the software. After analyzing the order statistics of a message encrypted by a monoalphabetic substitution cipher, mapping the most frequently occurring letters in the ciphertext to the corresponding most frequently occurring letters in the plaintext language often yields most of the correct substitutions. The incorrect assignments can be corrected by visually (or algorithmically) finding patterns within the text, often by recognizing commonly occurring words such as "and" or "the". Once a few letters are deciphered, the rest will follow as a natural consequence of the order statistics and of new recognizable words within the partially decrypted message.

If the message does not yield readily to a single-character frequency analysis, the frequencies of digrams (two-character combinations) and trigrams (three-character combinations) can also be analyzed and compared to the known statistics of the English language.
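
As a rough illustration of the single-letter frequency analysis described above, the following Python sketch counts the letters in a ciphertext and pairs them, by rank, with an assumed English frequency order. The function names and the ENGLISH_ORDER string are hypothetical and are not taken from the Cypher software; only the first three letters of the order (e, t, a) come from this manual, and the rest is a commonly quoted approximation.

```python
from collections import Counter
import string

# Assumed ranking of English letters, most frequent first.
# Only "e", "t", "a" are stated in the manual; the rest is approximate.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def letter_frequencies(text):
    """Return (letter, percent) pairs, most frequent letter first."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    total = len(letters)
    counts = Counter(letters)
    # most_common() sorts by count in descending order
    return [(letter, 100.0 * n / total) for letter, n in counts.most_common()]

def first_guess_mapping(ciphertext):
    """Map each ciphertext letter to the English letter of the same rank."""
    ranked = [letter for letter, _ in letter_frequencies(ciphertext)]
    return dict(zip(ranked, ENGLISH_ORDER))
```

The mapping is only a first guess. On short messages the ranks of neighbouring letters are easily swapped, which is why the pattern-matching step described above is still needed.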

One common form of the monoalphabetic substitution cipher is a keyword-based relative of the Caesar shift, in which the encrypter chooses a keyword or key phrase (simply referred to as the "key") that is easy to remember and readily yields the cipher alphabet. As an example, consider the plaintext alphabet to be "abcdefghijklmnopqrstuvwxyz" and let the key be the word "cypher". Then the cipher alphabet would be "cypherabdfgijklmnoqstuvwxz": simply the key (with letter repetitions removed) followed by the rest of the normal alphabet (omitting the letters already used in the key). This type of cipher is easily solved by examining what the software calls a "shift histogram", or a histogram of the cipher alphabet made with the assumption that the cipher and plaintext alphabets are the same. Since the ordered histogram of letters in the English language is known, it can be compared to the results of the ciphertext analysis, and peaks and troughs in the graphs can be matched to yield a reasonable guess at the key and thus the cipher alphabet.
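
The construction of a cipher alphabet from a key can be sketched in a few lines of Python. This is a minimal illustration under the rules stated above, not the code used by the Cypher software; the function names are hypothetical. Note that each letter of the key, such as h in "cypher", appears only once in the resulting alphabet.

```python
import string

ALPHABET = string.ascii_lowercase

def keyword_alphabet(key):
    """Key letters (repeats dropped) followed by the unused letters in order."""
    seen = []
    for c in key.lower() + ALPHABET:
        # Skip spaces and punctuation in key phrases, and any repeated letter
        if c in ALPHABET and c not in seen:
            seen.append(c)
    return "".join(seen)

def encrypt(plaintext, key):
    """Substitute each plaintext letter using the keyword alphabet."""
    table = str.maketrans(ALPHABET, keyword_alphabet(key))
    return plaintext.lower().translate(table)

def decrypt(ciphertext, key):
    """Invert the substitution made by encrypt()."""
    table = str.maketrans(keyword_alphabet(key), ALPHABET)
    return ciphertext.lower().translate(table)

print(keyword_alphabet("cypher"))  # cypherabdfgijklmnoqstuvwxz
```

Characters outside a to z, such as spaces and digits, pass through unchanged in this sketch.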

Polyalphabetic Substitution Ciphers

A polyalphabetic substitution cipher is similar to a monoalphabetic one, the difference being that multiple cipher alphabets are used instead of a single one. The simplest type of polyalphabetic cipher, the Vigenere cipher, can use as many as twenty-six distinct cipher alphabets, each a simple rotation of the normally ordered alphabet. For example, one alphabet might be "abcdefghijklmnopqrstuvwxyz", another "bcdefghijklmnopqrstuvwxyza", another "cdefghijklmnopqrstuvwxyzab" and so on. The Vigenere encryption works on the basis of a keyword, whose letters determine which cipher alphabets are used to encode a message and in what order.

As a simple example, to encrypt a message using the keyword "cypher", start by translating the first character of the message according to the cipher alphabet corresponding to the first letter of the keyword, c, which begins "cdef". If the plaintext character is b then it becomes d according to this cipher alphabet. To encrypt the second letter of the plaintext, use the cipher alphabet corresponding to the second letter of the keyword, y, which begins "yzabc". If the plaintext character is b again, this time it becomes z according to the second cipher alphabet. This process continues to the last letter of the keyword, and then the cycle is restarted, reusing the first letter of the keyword, then the second, and so on.
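
The procedure above can be sketched in Python as follows. This is an illustrative sketch, not the Cypher software's implementation; the function name and the choice to leave non-letters untouched (without advancing the keyword) are assumptions.

```python
import string

ALPHABET = string.ascii_lowercase

def vigenere(text, key, decrypt=False):
    """Shift each letter by the rotation named by the current key letter."""
    key = key.lower()
    out = []
    i = 0  # index of the next key letter to use
    for ch in text.lower():
        if ch not in ALPHABET:
            out.append(ch)  # assumption: non-letters are copied unchanged
            continue
        shift = ALPHABET.index(key[i % len(key)])  # c -> 2, y -> 24, ...
        if decrypt:
            shift = -shift
        out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        i += 1  # the keyword cycles once its last letter is used
    return "".join(out)

print(vigenere("bb", "cypher"))  # dz
```

With the keyword "cypher", the plaintext "bb" becomes "dz", which agrees with the worked example: the first b is shifted by c and the second by y.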

The solution to the Vigenere cipher takes advantage of the fact that the cipher alphabets repeat according to the length of the keyword. By guessing various lengths of the keyword, performing analyses only on ciphertext produced by the same cipher alphabet (i.e., if the keyword is assumed to be 5 letters long, include every 5th letter in the analysis), and scanning the results for a distribution similar to that of the English language, individual cipher alphabets can be deciphered and then combined to find the keyword and decode the remainder of the encrypted message. Alternatively, or jointly, finding repetitions of letter combinations in the encrypted message (e.g., "XYZ XYZ") may indicate a repeated word encrypted with the same rotation. Statistically, if the message is long enough then there is a significant chance that this will occur. Using the spacing between the letter repetitions, the length of the keyword can be constrained to be a factor of that spacing interval, and this hypothesis can then be tested. The fact that the cipher alphabets are simple rotations of the normal alphabet further simplifies this task.
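
The two ideas above, measuring the spacing between repeated letter combinations and analyzing every k-th letter separately, can be sketched in Python. The function names, the default trigram length, and the upper bound of 20 on the key length are assumptions made for illustration; they are not taken from the Cypher software.

```python
from collections import defaultdict

def repeat_spacings(ciphertext, n=3):
    """Distances between successive occurrences of each repeated n-gram."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - n + 1):
        positions[ciphertext[i:i + n]].append(i)
    spacings = []
    for pos in positions.values():
        spacings.extend(b - a for a, b in zip(pos, pos[1:]))
    return spacings

def candidate_key_lengths(spacings, max_len=20):
    """Key lengths from 2 to max_len, ranked by how many spacings they divide."""
    scores = {k: sum(1 for s in spacings if s % k == 0)
              for k in range(2, max_len + 1)}
    return sorted((k for k in scores if scores[k] > 0),
                  key=lambda k: -scores[k])

def column(ciphertext, key_len, offset):
    """Letters enciphered with the same alphabet, for a guessed key length."""
    return ciphertext[offset::key_len]
```

Each column returned for a plausible key length can then be treated as a simple rotation cipher and compared against the English letter histogram. The sketch assumes spaces and punctuation have already been stripped from the ciphertext, since they would otherwise distort the spacings.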

If the cipher alphabets are not simple rotations but instead are random arrangements of the normal alphabet, the encrypted message may not succumb to this type of analysis. Historically, decoding machines were paired with human ingenuity to find alternate solutions to these types of codes, such as those used in the German Enigma machine in World War II, but in today's world of super-fast computers an exhaustive search of keys is a real possibility for cracking these codes.

Transposition Ciphers

A transposition cipher rearranges the positions of the characters but not their true values, so that a frequency analysis of a message encrypted by transposition appears "normal". There are many ways to accomplish this, including rotating columns and/or rows of the message, rotating the entire message left or right, or rotating along diagonals. These types of ciphers are fairly straightforward to break by treating a block of text as characters in a matrix and performing row or column shifts and swaps until a recognizable message is derived. One can recognize that a message has been encrypted by a transposition cipher if it has the same frequency statistics as a given language (assumed to be English) but the message text still appears garbled. In this case we rotate rows or columns or perform other basic matrix operations to recover the original message. A special consideration of transposition ciphers is that the message must act as a block of text, and so might require reformatting or buffering of some sort (such as adding extra characters at the end of the message) in order to give it a readily transposable form.

Other Types of Ciphers

There are many other ways to complicate ciphers and make them resistant to decryption by automated algorithms. As an example, a cryptographer could choose a larger cipher alphabet than required for a one-to-one mapping with the plaintext alphabet, and assign more frequently occurring letters in the plaintext alphabet correspondingly more symbols in the cipher alphabet, so that analyzing letter frequencies of the end result yields no useful information (as all frequencies will be roughly equivalent). Another way to deviously encrypt a message is by using a key-text, such as the Declaration of Independence or a well-known poem: the words in the key-text are numbered, and each letter in the plaintext is replaced by the number of a word in the key-text that begins with that letter.
Similarly, all pairs of letters could be numbered randomly and replaced by their corresponding numbers to form the ciphertext. It is important to emphasize that any or all of these types of ciphers may be used to encrypt a message, and a clever cryptographer can quickly confound preset software algorithms.

Public-Key Encryption

This type of encryption is not addressed by the Cypher software, but is the most commonly used and secure form of encryption at this time. Public-key encryption assigns the receiver of the message both a public and a private key; the public one is available for any potential sender to encrypt a message with, and the private one is kept by the receiver. Because these keys are related mathematically and based on modular arithmetic and very large prime numbers, knowing the public key gives little information on the private key needed to decrypt the message, and a brute force approach is not feasible.

Suggestions for Further Reading

Simon Singh's The Code Book is a very readable and up-to-date account of the history of cryptography, and as a matter of fact is the basis for most of the algorithms used in this software. Other references might include