Decryption by pattern analysis

You are given an encrypted string, encrypted using a very simple substitution cipher.

Problem

You do not know the key, but you do know that the plaintext is English and that the most frequent letters in English are etaoinshrdlucmfwypvbgkqjxz, in that order. The only allowed characters are uppercase letters and spaces. You can start with basic single-letter analysis and then move on to more complex multi-letter analysis: for example, U almost always follows Q, and only certain letters can appear twice in a row.
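
As a starting point, the single-letter analysis described above can be sketched in a few lines of Python 3. This is only a first guess, not a solver. It assumes that the most frequent ciphertext symbol stands for the space (an assumption, not something the problem states), and the function names are made up for illustration.

```python
from collections import Counter

# Frequency order from the problem statement, with the space placed first.
# Putting the space first is an assumption: in typical English text the
# word separator is more common than any single letter.
ENGLISH_ORDER = " ETAOINSHRDLUCMFWYPVBGKQJXZ"

def frequency_guess(cipher):
    """Map the n-th most frequent cipher symbol to the n-th entry of ENGLISH_ORDER."""
    ranked = [symbol for symbol, _ in Counter(cipher).most_common()]
    guess = dict(zip(ranked, ENGLISH_ORDER))
    return "".join(guess[symbol] for symbol in cipher)

def doubled_symbols(cipher):
    """Return the set of symbols that occur twice in a row somewhere in the text."""
    return {a for a, b in zip(cipher, cipher[1:]) if a == b}
```

On short messages a pure frequency guess is usually far from the real plaintext, so it is best treated as a seed for further analysis. The doubled symbols narrow things down further, since only some clear symbols plausibly repeat.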

Examples

clear : SUBMARINE TO ATTACK THE DOVER WAREHOUSE AND PORT ON TUESDAY SUNRISE
cipher: ZOQ DUPAEYSRYDSSDXVYSHEYNRBEUYLDUEHROZEYDANYKRUSYRAYSOEZNDMYZOAUPZE

clear : THE QUICK BROWN FOX BEING QUITE FAST JUMPED OVER THE LAZY DOG QUITE NICELY
cipher: TNAEPDHIGEMZQJLEVQBEMAHL EPDHTAEVXWTEODYUASEQKAZETNAERXFCESQ EPDHTAELHIARC

clear : BUFFALO BUFFALO BUFFALO BUFFALO BUFFALO BUFFALO BUFFALO
cipher: HV  WRPDHV  WRPDHV  WRPDHV  WRPDHV  WRPDHV  WRPDHV  WRP
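
Note that in these examples the space is substituted like any other symbol: in the first pair, spaces are encrypted as Y and M is encrypted as a space. A small Python 3 sketch, using a hypothetical helper name, recovers the cipher-to-clear table from a known pair and checks that it is one-to-one:

```python
def derive_key(clear, cipher):
    """Build the cipher-to-clear table from a known pair, or raise ValueError."""
    if len(clear) != len(cipher):
        raise ValueError("texts differ in length")
    key = {}
    for plain_symbol, cipher_symbol in zip(clear, cipher):
        # setdefault stores the first pairing and returns it on later visits
        if key.setdefault(cipher_symbol, plain_symbol) != plain_symbol:
            raise ValueError("cipher symbol %r maps to two clear symbols" % cipher_symbol)
    if len(set(key.values())) != len(key):
        raise ValueError("two cipher symbols map to the same clear symbol")
    return key

clear = "SUBMARINE TO ATTACK THE DOVER WAREHOUSE AND PORT ON TUESDAY SUNRISE"
cipher = "ZOQ DUPAEYSRYDSSDXVYSHEYNRBEUYLDUEHROZEYDANYKRUSYRAYSOEZNDMYZOAUPZE"
key = derive_key(clear, cipher)
decoded = "".join(key[symbol] for symbol in cipher)
```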

Challenges

See if you can decrypt the text in each of these ciphers:

SVNXIFCXYCFSXKVVZXIHXHERDXEIYRAKXZCOFSWHCZXHERDXBNRHCXZR RONQHXORWECFHCUH
SOFPTGFIFBOKJPHLBFPKHZUGLSOJPLIPKBPKHZUGLSOJPMOLEOPWFSFGJLBFIPMOLEOPXULBSIPLBP KBPBPWLIJFBILUBKHPGKISFG
TMBWFYAQFAZYCUOYJOBOHATMCYNIAOQW Q JAXOYCOCYCHAACOCYCAHGOVYLAOEGOTMBWFYAOBFF ACOBHOKBZYKOYCHAUWBHAXOQW XITHJOV WOXWYLYCU
FTRMKRGVRFMHSZVRWHRSFMFLMBNGKMGTHGBRSMKROKLSHSZMHKMMMMMRVVLVMPRKKOZRMFVDSGOFRW

I have the substitution matrices and the cleartext for each one, but I will only reveal them if it becomes too difficult or nobody figures it out.

The solution that successfully decrypts the most messages wins. If two solutions are equally good, the winner will be decided by vote count.

code-challenge cryptography

— Thomas O

What defines »most elegant«? I think that is the same thing Chris already objected to in 99 Bottles. It is a subjective criterion that is fairly hard to judge.

— Joey

@Joey Most upvotes? Let the community decide.

— Thomas O

Re "most upvotes": I am not happy to see this become a popularity-contest post, especially since the post is otherwise excellent; see meta.codegolf.stackexchange.com/questions/110/… for my thoughts on the whole issue.

— Chris Jester-Young

What does "elegant" mean here? The best big-O performance?

— Gnibbler

@Bass5098, no. It's just a difficult ciphertext that has been tainted to make it more resistant to frequency analysis.

— Thomas O

Answers:

Python

I worked out all the secret phrases, but I won't post them here. Run the code if you care.

The code works by picking a space character, enumerating all possible substitutions for each word, and then searching for compatible substitutions. It also allows a few words to fall outside the lexicon, to cope with misspellings in the cleartext :)
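
The enumeration step can be illustrated in isolation. The following Python 3 sketch is not the answer's own code. For one cipher word, it lists every lexicon word with the same letter pattern together with the substitution it implies. The tiny lexicon and the function name are placeholders.

```python
def candidate_maps(cipher_word, lexicon):
    """Return every one-to-one map that turns cipher_word into a lexicon word."""
    maps = []
    for candidate in lexicon:
        if len(candidate) != len(cipher_word):
            continue
        mapping = {}
        used = set()  # clear letters already assigned to some cipher letter
        for c, p in zip(cipher_word, candidate):
            if c in mapping:
                if mapping[c] != p:   # a repeated cipher letter must decode the same way
                    break
            elif p in used:           # two cipher letters cannot share a clear letter
                break
            else:
                mapping[c] = p
                used.add(p)
        else:
            maps.append(mapping)      # inner loop finished without a conflict
    return maps

# Toy lexicon for illustration only.
lexicon = ["THE", "AND", "ALL", "SEE", "TOO", "DOG"]
print(candidate_maps("XYY", lexicon))
```

With this toy lexicon, only ALL, SEE and TOO share the pattern of XYY, so three candidate maps come back. Short words with common patterns produce many candidates, which is why the search needs to combine several words.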

I used a large lexicon (~500,000 words) from http://wordlist.sourceforge.net/.

import sys,re

# get input
message = sys.argv[1]

# read in lexicon of words
# download scowl version 7.1
# mk-list english 95 > wordlist
lexicon = set()
roman_only = re.compile('^[A-Z]*$')
for word in open('wordlist').read().upper().split():
  word=word.replace("'",'')
  if roman_only.match(word): lexicon.add(word)

histogram={}
for c in message: histogram[c]=0
for c in message: histogram[c]+=1
frequency_order = map(lambda x:x[1], sorted([(f,c) for c,f in histogram.items()])[::-1])

# returns true if the two maps are compatible.
# they are compatible if the mappings agree wherever they are defined,
# and no two different args map to the same value.
def mergeable_maps(map1, map2):
  agreements = 0
  for c in map1:
    if c in map2:
      if map1[c] != map2[c]: return False
      agreements += 1
  return len(set(map1.values() + map2.values())) == len(map1) + len(map2) - agreements

def merge_maps(map1, map2):
  m = {}
  for (c,d) in map1.items(): m[c]=d
  for (c,d) in map2.items(): m[c]=d
  return m

def search(map, word_maps, outside_lexicon_allowance, words_outside_lexicon):
  cleartext = ''.join(map[x] if x in map else '?' for x in message)
  #print 'trying', cleartext

  # pick a word to try next
  best_word = None
  best_score = 1e9
  for (word,subs) in word_maps.items():
    if word in words_outside_lexicon: continue
    compatible_subs=0
    for sub in subs:
      if mergeable_maps(map, sub): compatible_subs += 1
    unassigned_chars = 0
    for c in word:
      if c not in map: unassigned_chars += 1  #TODO: duplicates?
    if compatible_subs == 0: score = 0
    elif unassigned_chars == 0: score = 1e9
    else: score = 1.0 * compatible_subs / unassigned_chars   # TODO: tweak?
    if score < best_score:
      best_score = score
      best_word = word
  if not best_word:  # no words with unset characters, except possibly the outside lexicon ones
    print cleartext,[''.join(map[x] if x in map else '?' for x in word) for word in words_outside_lexicon]
    return True

  # use all compatible maps for the chosen word
  r = False
  for sub in word_maps[best_word]:
    if not mergeable_maps(map, sub): continue
    r |= search(merge_maps(map, sub), word_maps, outside_lexicon_allowance, words_outside_lexicon)

  # maybe this word is outside our lexicon
  if outside_lexicon_allowance > 0:
    r |= search(map, word_maps, outside_lexicon_allowance - 1, words_outside_lexicon + [best_word])
  return r

for outside_lexicon_allowance in xrange(3):
  # assign the space character first
  for space in frequency_order:
    words = [w for w in message.split(space) if w != '']
    if any(len(w) > 20 for w in words): continue  # obviously bad spaces

    # find all valid substitution maps for each word
    word_maps={}
    for word in words:
      n = len(word)
      maps = []
      for c in lexicon:
        if len(c) != n: continue
        m = {}
        ok = 1
        for i in xrange(n):
          if word[i] in m:                      # repeat letter
            if m[word[i]] != c[i]: ok=0; break  # repeat letters map to same thing
          elif c[i] in m.values(): ok=0; break  # different letters map to different things
          else: m[word[i]]=c[i]
        if ok: maps.append(m)
      word_maps[word]=maps

    # look for a solution
    if search({space:' '}, word_maps, outside_lexicon_allowance, []): sys.exit(0)

print 'I give up.'
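
The listing above targets Python 2 (print statements, xrange, and dict.values() returning lists that can be concatenated). As a hedged sketch of how the compatibility test behind mergeable_maps might look in Python 3, here is an equivalent check. It is a rewrite for illustration, with a made-up name, not the author's code.

```python
def mergeable(map1, map2):
    """True if the two partial substitution maps can be combined into one."""
    merged = dict(map1)
    for cipher_symbol, clear_symbol in map2.items():
        # Both maps must agree wherever they overlap.
        if merged.setdefault(cipher_symbol, clear_symbol) != clear_symbol:
            return False
    # No two different cipher symbols may decode to the same clear symbol.
    return len(set(merged.values())) == len(merged)
```

When the check passes, the combined map is simply {**map1, **map2}, which plays the role of merge_maps.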

— Keith Randall

PHP (Incomplete)

This is an incomplete PHP solution. It works by using the letter-frequency information from the question, plus a dictionary of words matched with regular expressions built from the most reliable letters in each word.

At present the dictionary is quite small, but I expect the results would improve with appropriate expansion. I have considered allowing partial matches, but with the current dictionary this degrades the results rather than improving them.

Even with the current small dictionary, I think I can say quite safely what the fourth message encodes.

#!/usr/bin/php
<?php

    if(isset($argv[1])) {

        $cipher = $argv[1];

        // Dictionary
        $words = explode("/", "the/to/on/and/in/is/secret/message");
        $guess = explode("/", "..e/t./o./a../i./.s/.e..et/.ess..e");

        $az = str_split("_etaoinshrdlucmfwypvbgkqjxz");

        // Build table of symbol counts, most frequent first
        $table = array_count_values(str_split($cipher));
        arsort($table);

        // Do default guesses
        $result = str_replace("_", " ", str_replace(array_keys($table), $az, $cipher));

        // Apply dictionary
        $cw = count($words);
        for($i=0; $i<$cw*2; $i++) {
            $tokens = explode(" ", $result);
            foreach($tokens as $t) {
                if(preg_match("/^" . $guess[$i%$cw] . "$/", $t)) {
                    $result = deenc($words[$i%$cw], $t, $result);
                    echo $t . ' -> ' . $words[$i%$cw] . "\n";
                    break;
                }
            }
        }

        // Show best guess
        echo $result . "\n";

    } else {

        echo "Usage: " . $argv[0] . " [cipher text]\n";

    }

    // Quick (non-destructive) replace tool
    function deenc($word, $enc, $string) {
        $string = str_replace(str_split($enc), str_split(strtoupper($word)), $string);
        $string = str_replace(str_split($word), str_split($enc), $string);
        return strtolower($string);
    }

?>
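
For comparison with the Python answer, the dictionary step of this approach can be sketched in Python 3. It is a simplified illustration with a hypothetical function name, not a port of the PHP. The first token of the current guess that matches a pattern is assumed to be the corresponding dictionary word, and the letters that differ are swapped throughout the text.

```python
import re

def apply_word(text, pattern, word):
    """Swap letters so that the first token matching pattern reads as word."""
    for token in text.split(" "):
        if re.fullmatch(pattern, token):
            swap = {}
            for guessed, wanted in zip(token, word):
                if guessed != wanted:
                    # Exchange the two letters everywhere, so the guess
                    # stays a one-to-one substitution.
                    swap[guessed] = wanted
                    swap[wanted] = guessed
            return "".join(swap.get(ch, ch) for ch in text)
    return text  # no token matched; leave the guess unchanged

print(apply_word("tie hat", r"..e", "the"))
```

Here the token "tie" matches the pattern for "the", so i and h trade places in the whole text. The sketch does not handle a token that needs several overlapping swaps, so treat it as a rough outline only.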

— jtjacques

Try using /usr/share/dict/words if you are on a system that has it.

— Keith Randall