Fastest Python code to find a set of winning words in this game

This is a word game from a set of activity cards for children. Below the rules is code to find the best triplet of words using /usr/share/dict/words. I thought it was an interesting optimization problem, and I wonder whether people can find improvements.

Rules

1. Choose one letter from each of the sets below.
2. Choose a word using the letters chosen (and any others).
3. Score the word.
   - Each occurrence of a chosen letter scores the number shown with its set (repeats included).
   - AEIOU count 0.
   - All other letters are -2.
4. Repeat steps 1-3 above (no reusing letters in step 1) twice more.
5. The final score is the sum of the three word scores.

Sets

LTN, RDS, GBM, CHP, FWV, YKJ, QXZ
(set 1 scores 1 point, set 2 scores 2 points, etc.)
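To make the scoring rules concrete, here is a minimal sketch of scoring a single word once one letter has been picked from each set. The names `score_word`, `SETS` and `VOWELS` are illustrative and do not come from the original post, and the sketch assumes uppercase words containing only A-Z.

```python
# Letter sets from the question; set k is worth k points.
SETS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']
VOWELS = set('AEIOU')


def score_word(word, chosen):
    """Score one word; `chosen` holds one letter per set, in set order."""
    if len(chosen) != len(SETS) or any(c not in s for c, s in zip(chosen, SETS)):
        raise ValueError('need exactly one letter from each set')
    # Map each chosen letter to the points of the set it came from.
    value = {letter: points for points, letter in enumerate(chosen, start=1)}
    total = 0
    for letter in word.upper():
        if letter in VOWELS:
            continue                      # vowels count 0
        total += value.get(letter, -2)    # unchosen consonants cost 2 each
    return total


print(score_word('RAZZMATAZZ', 'TRMCFYZ'))
```

Choosing T, R, M and Z for this word makes every consonant in it count positively; choosing none of its letters makes every consonant cost 2.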

Code:

from itertools import permutations
import numpy as np

points = {'LTN' : 1,
          'RDS' : 2,
          'GBM' : 3,
          'CHP' : 4,
          'FWV' : 5,
          'YKJ' : 6,
          'QXZ' : 7}

def tonum(word):
    word_array = np.zeros(26, dtype=np.int)
    for l in word:
        word_array[ord(l) - ord('A')] += 1
    return word_array.reshape((26, 1))

def to_score_array(letters):
    score_array = np.zeros(26, dtype=np.int) - 2
    for v in 'AEIOU':
        score_array[ord(v) - ord('A')] = 0
    for idx, l in enumerate(letters):
        score_array[ord(l) - ord('A')] = idx + 1
    return np.matrix(score_array.reshape(1, 26))

def find_best_words():
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    wlist = [l for l in wlist if len(l) > 4]
    orig = [l for l in wlist]
    for rep in 'AEIOU':
        wlist = [l.replace(rep, '') for l in wlist]
    wlist = np.hstack([tonum(w) for w in wlist])

    best = 0
    ct = 0
    bestwords = ()
    for c1 in ['LTN']:
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                vals = [to_score_array(''.join(s)) for s in zip(c1, c2, c3, c4, c5, c6, c7)]
                                ct += 1
                                print ct, 6**6
                                scores1 = (vals[0] * wlist).A.flatten()
                                scores2 = (vals[1] * wlist).A.flatten()
                                scores3 = (vals[2] * wlist).A.flatten()
                                m1 = max(scores1)
                                m2 = max(scores2)
                                m3 = max(scores3)
                                if m1 + m2 + m3 > best:
                                    print orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()], m1 + m2 + m3
                                    best = m1 + m2 + m3
                                    bestwords = (orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()])
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

The matrix version is what I came up with after writing one in pure Python (using dictionaries and scoring each word independently), and another in numpy that used indexing rather than matrix multiplication.
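The idea behind the matrix version can be sketched without numpy: each word becomes a vector of 26 letter counts, each choice of letters becomes a vector of 26 per-letter scores, and a word's score is their dot product. The function names below are illustrative, not from the post, and the sketch is written for Python 3 with plain lists, so it shows the formulation rather than the speed.

```python
from string import ascii_uppercase

INDEX = {letter: i for i, letter in enumerate(ascii_uppercase)}


def count_vector(word):
    """26 letter counts for an uppercase A-Z word."""
    counts = [0] * 26
    for letter in word:
        counts[INDEX[letter]] += 1
    return counts


def score_vector(chosen):
    """26 per-letter scores; `chosen` lists one letter per set, in set order."""
    scores = [-2] * 26
    for vowel in 'AEIOU':
        scores[INDEX[vowel]] = 0
    for points, letter in enumerate(chosen, start=1):
        scores[INDEX[letter]] = points
    return scores


def score_all(words, chosen):
    """Dot product of every word's count vector with one score vector."""
    sv = score_vector(chosen)
    return [sum(c * s for c, s in zip(count_vector(w), sv)) for w in words]


print(score_all(['RAZZMATAZZ', 'KNICKKNACK'], 'TRMCFYZ'))
```

In the numpy code above, the count vectors are stacked once into a 26-by-N matrix, so each choice of letters costs a single matrix product over the whole word list.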

The next optimization would be to remove the vowels from the scoring entirely (and use a modified ord() function), but I wonder if there are even faster approaches.
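One way to read that optimization: since vowels always score 0, they can be dropped from both vectors, leaving 21 entries instead of 26. The sketch below swaps `ord()` for a lookup table over the consonants. This is an assumption about what "a modified ord() function" might look like, and all names are hypothetical.

```python
CONSONANTS = 'BCDFGHJKLMNPQRSTVWXYZ'
CONSONANT_INDEX = {c: i for i, c in enumerate(CONSONANTS)}


def consonant_counts(word):
    """21 consonant counts; vowels are skipped because they score 0."""
    counts = [0] * len(CONSONANTS)
    for letter in word:
        i = CONSONANT_INDEX.get(letter)
        if i is not None:
            counts[i] += 1
    return counts


def consonant_scores(chosen):
    """21 per-consonant scores; `chosen` lists one letter per set, in set order."""
    scores = [-2] * len(CONSONANTS)
    for points, letter in enumerate(chosen, start=1):
        scores[CONSONANT_INDEX[letter]] = points
    return scores


def score(word, chosen):
    counts = consonant_counts(word)
    return sum(c * s for c, s in zip(counts, consonant_scores(chosen)))


print(len(CONSONANTS), score('RAZZMATAZZ', 'TRMCFYZ'))
```

The scores are identical to the 26-entry version; only the vector length, and so the work per product, shrinks.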

EDIT: added timeit.timeit code

EDIT: I'm adding a bounty, which I'll give to whichever improvement I like most (or possibly to multiple answers, but I'll have to accrue some more reputation if that's the case).

fastest-code python optimization

— thouis

By the way, I wrote the code to give my eight-year-old three words to memorize for when he played against his mother. Now I know what xylopyrography means.

This is a fun question. I think you could increase the odds of getting answers if you provide the following: (1) A link to an online word list so everyone works with the same data set. (2) Put your solution in a single function. (3) Run that function using the timeit module to show timings. (4) Make sure to put the loading of the dictionary data outside the function so that we are not testing disk speed. People can then use the existing code as a framework for comparing their solutions.

I will rewrite it to use timeit, but for fair comparisons I would have to use my own machine (which I'm happy to do for people posting solutions). A word list should be available on most systems, but if not, there are several here: wordlist.sourceforge.net

Fair comparisons can be had if each user times their own solution and any other posted solutions on their own machine. There will be some differences across platforms, but in general this method works.

Hm, in that case I wonder whether this is the right site. I think SO would have been a better fit.

— Joey

Answers:

Using Keith's idea of precomputing the best possible score for each word, I managed to reduce the execution time to about 0.7 seconds on my computer (using a list of 75,288 words).

The trick is to go through the combinations of words to be played instead of all combinations of letters picked. We can ignore all but a few combinations of words (203 using my word list) because they cannot score higher than what we have already found. Nearly all of the execution time is spent precomputing the word scores.

Python 2.7:

import collections
import itertools


WORDS_SOURCE = '../word lists/wordsinf.txt'

WORDS_PER_ROUND = 3
LETTER_GROUP_STRS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']
LETTER_GROUPS = [list(group) for group in LETTER_GROUP_STRS]
GROUP_POINTS = [(group, i+1) for i, group in enumerate(LETTER_GROUPS)]
POINTS_IF_NOT_CHOSEN = -2


def best_word_score(word):
    """Return the best possible score for a given word."""

    word_score = 0

    # Score the letters that are in groups, choosing the best letter for each
    # group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts_sum = 0
        max_letter_count = 0
        for letter in group:
            if letter in word:
                count = word.count(letter)
                letter_counts_sum += count
                if count > max_letter_count:
                    max_letter_count = count
        if letter_counts_sum:
            word_score += points_if_chosen * max_letter_count
            total_not_chosen += letter_counts_sum - max_letter_count
    word_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return word_score

def best_total_score(words):
    """Return the best score possible for a given list of words.

    It is fine if the number of words provided is not WORDS_PER_ROUND. Only the
    words provided are scored."""

    num_words = len(words)
    total_score = 0

    # Score the letters that are in groups, choosing the best permutation of
    # letters for each group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts = []
        # Structure:  letter_counts[word_index][letter] = count
        letter_counts_sum = 0
        for word in words:
            this_word_letter_counts = {}
            for letter in group:
                count = word.count(letter)
                this_word_letter_counts[letter] = count
                letter_counts_sum += count
            letter_counts.append(this_word_letter_counts)

        max_chosen = None
        for letters in itertools.permutations(group, num_words):
            num_chosen = 0
            for word_index, letter in enumerate(letters):
                num_chosen += letter_counts[word_index][letter]
            if num_chosen > max_chosen:
                max_chosen = num_chosen

        total_score += points_if_chosen * max_chosen
        total_not_chosen += letter_counts_sum - max_chosen
    total_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return total_score


def get_words():
    """Return the list of valid words."""
    with open(WORDS_SOURCE, 'r') as source:
        return [line.rstrip().upper() for line in source]

def get_words_by_score():
    """Return a dictionary mapping each score to a list of words.

    The key is the best possible score for each word in the corresponding
    list."""

    words = get_words()
    words_by_score = collections.defaultdict(list)
    for word in words:
        words_by_score[best_word_score(word)].append(word)
    return words_by_score


def get_winning_words():
    """Return a list of words for an optimal play."""

    # A word's position is a tuple of its score's index and the index of the
    # word within the list of words with this score.
    # 
    # word played: A word in the context of a combination of words to be played
    # word chosen: A word in the context of the list it was picked from

    words_by_score = get_words_by_score()
    num_word_scores = len(words_by_score)
    word_scores = sorted(words_by_score, reverse=True)
    words_by_position = []
    # Structure:  words_by_position[score_index][word_index] = word
    num_words_for_scores = []
    for score in word_scores:
        words = words_by_score[score]
        words_by_position.append(words)
        num_words_for_scores.append(len(words))

    # Go through the combinations of words in lexicographic order by word
    # position to find the best combination.
    best_score = None
    positions = [(0, 0)] * WORDS_PER_ROUND
    words = [words_by_position[0][0]] * WORDS_PER_ROUND
    scores_before_words = []
    for i in xrange(WORDS_PER_ROUND):
        scores_before_words.append(best_total_score(words[:i]))
    while True:
        # Keep track of the best possible combination of words so far.
        score = best_total_score(words)
        if score > best_score:
            best_score = score
            best_words = words[:]

        # Go to the next combination of words that could get a new best score.
        for word_played_index in reversed(xrange(WORDS_PER_ROUND)):
            # Go to the next valid word position.
            score_index, word_chosen_index = positions[word_played_index]
            word_chosen_index += 1
            if word_chosen_index == num_words_for_scores[score_index]:
                score_index += 1
                if score_index == num_word_scores:
                    continue
                word_chosen_index = 0

            # Check whether the new combination of words could possibly get a
            # new best score.
            num_words_changed = WORDS_PER_ROUND - word_played_index
            score_before_this_word = scores_before_words[word_played_index]
            further_points_limit = word_scores[score_index] * num_words_changed
            score_limit = score_before_this_word + further_points_limit
            if score_limit <= best_score:
                continue

            # Update to the new combination of words.
            position = score_index, word_chosen_index
            positions[word_played_index:] = [position] * num_words_changed
            word = words_by_position[score_index][word_chosen_index]
            words[word_played_index:] = [word] * num_words_changed
            for i in xrange(word_played_index+1, WORDS_PER_ROUND):
                scores_before_words[i] = best_total_score(words[:i])
            break
        else:
            # None of the remaining combinations of words can get a new best
            # score.
            break

    return best_words


def main():
    winning_words = get_winning_words()
    print winning_words
    print best_total_score(winning_words)

if __name__ == '__main__':
    main()

This returns the solution ['KNICKKNACK', 'RAZZMATAZZ', 'POLYSYLLABLES'] with a score of 95. With the words from Keith's solution added to the word list, I get the same result as him. With thouis's "xylopyrography" added, I get ['XYLOPYROGRAPHY', 'KNICKKNACKS', 'RAZZMATAZZ'] with a score of 105.
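The two totals quoted above can be checked independently with a compact scorer. This is a sketch written for Python 3, not flornquake's code: `best_total` is an illustrative name, and it follows the same idea as `best_total_score`, trying every way of handing distinct letters of each set to the words.

```python
from collections import Counter
from itertools import permutations

SETS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']


def best_total(words):
    """Best total score for up to three words, picking letters optimally."""
    counts = [Counter(w) for w in words]
    total = 0
    for points, letters in enumerate(SETS, start=1):
        # Occurrences of this set's letters across all the words.
        in_words = sum(c[l] for c in counts for l in letters)
        # Give each word a different letter of the set; keep the best split.
        chosen = max(
            sum(c[l] for c, l in zip(counts, pick))
            for pick in permutations(letters, len(words))
        )
        # Chosen occurrences earn the set's points, the rest cost 2 each.
        total += points * chosen - 2 * (in_words - chosen)
    return total


print(best_total(['KNICKKNACK', 'RAZZMATAZZ', 'POLYSYLLABLES']))
print(best_total(['XYLOPYROGRAPHY', 'KNICKKNACKS', 'RAZZMATAZZ']))
```

Each set can be optimized on its own because the letter picked from one set never affects how another set's letters score.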

— flornquake

Here's an idea: you can avoid checking lots of words by noticing that most words have awful scores. Say you have found a pretty good play that gets you 50 points. Then any play with more than 50 points must contain a word of at least ceil(51/3) = 17 points. So any word that cannot possibly generate 17 points can be ignored.
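The threshold arithmetic can be written down directly. The sketch below is illustrative only: `min_word_score` and `worth_checking` are hypothetical names, and the best-case scores in the example dictionary are hand-computed values for three sample words.

```python
def min_word_score(best, words_left=3):
    """Smallest score the best remaining word needs for the total to beat `best`."""
    needed = best + 1                   # the total must strictly exceed `best`
    return -(-needed // words_left)     # integer ceiling of needed / words_left


def worth_checking(best_case_scores, best, words_left=3):
    """Words whose best-case score reaches the current threshold."""
    threshold = min_word_score(best, words_left)
    return sorted(w for w, s in best_case_scores.items() if s >= threshold)


# Hand-computed best-case scores for three sample words (illustrative data).
sample = {'RAZZMATAZZ': 34, 'POLYSYLLABLES': 27, 'CAT': 5}
print(min_word_score(50))
print(worth_checking(sample, 50))
```

As the best total found so far rises, the threshold rises with it, so the set of words worth checking keeps shrinking during the search.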

Here's some code that does the above. We compute the best possible score for each word in the dictionary and store the words in an array indexed by score. Then we use that array to check only the words that have the required minimum score.
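A small Python 3 sketch of the score-indexed array, under the assumption that a word's "best possible score" is the optimistic bound the code below uses (every consonant counted at its set's value, as in the `S` table). The names `upper_bound`, `bucket_by_score` and `candidates` are illustrative.

```python
SETS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']
LETTER_VALUE = {l: p for p, s in enumerate(SETS, start=1) for l in s}


def upper_bound(word):
    """Optimistic score: every consonant counted as if its letter were chosen."""
    return sum(LETTER_VALUE.get(letter, 0) for letter in word)


def bucket_by_score(words, max_score=100):
    """buckets[s] lists the words whose optimistic score is s."""
    buckets = [[] for _ in range(max_score)]
    for w in words:
        # Clamp so an unusually high-scoring word cannot index past the end.
        buckets[min(upper_bound(w), max_score - 1)].append(w)
    return buckets


def candidates(buckets, minimum):
    """All words whose optimistic score is at least `minimum`."""
    return [w for bucket in buckets[max(minimum, 0):] for w in bucket]


buckets = bucket_by_score(['RAZZMATAZZ', 'POLYSYLLABLES', 'CAT'])
print(candidates(buckets, 17))
```

Because the bound never underestimates a word, skipping everything below the minimum cannot discard a word that could actually reach it.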

from itertools import permutations
import time

S={'A':0,'E':0,'I':0,'O':0,'U':0,
   'L':1,'T':1,'N':1,
   'R':2,'D':2,'S':2,
   'G':3,'B':3,'M':3,
   'C':4,'H':4,'P':4,
   'F':5,'W':5,'V':5,
   'Y':6,'K':6,'J':6,
   'Q':7,'X':7,'Z':7,
   }

def best_word(min, s):
    global score_to_words
    best_score = 0
    best_word = ''
    for i in xrange(min, 100):
        for w in score_to_words[i]:
            score = (-2*len(w)+2*(w.count('A')+w.count('E')+w.count('I')+w.count('O')+w.count('U')) +
                      3*w.count(s[0])+4*w.count(s[1])+5*w.count(s[2])+6*w.count(s[3])+7*w.count(s[4])+
                      8*w.count(s[5])+9*w.count(s[6]))
            if score > best_score:
                best_score = score
                best_word = w
    return (best_score, best_word)

def load_words():
    global score_to_words
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    score_to_words = [[] for i in xrange(100)]
    for w in wlist: score_to_words[sum(S[c] for c in w)].append(w)
    for i in xrange(100):
        if score_to_words[i]: print i, len(score_to_words[i])

def find_best_words():
    load_words()
    best = 0
    bestwords = ()
    for c1 in permutations('LTN'):
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
                print time.ctime(),c1,c2,c3
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                sets = zip(c1, c2, c3, c4, c5, c6, c7)
                                (s1, w1) = best_word((best + 3) / 3, sets[0])
                                (s2, w2) = best_word((best - s1 + 2) / 2, sets[1])
                                (s3, w3) = best_word(best - s1 - s2 + 1, sets[2])
                                score = s1 + s2 + s3
                                if score > best:
                                    best = score
                                    bestwords = (w1, w2, w3)
                                    print score, w1, w2, w3
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

The minimum score gets up to 100 quickly, which means we need only consider words worth 33+ points, a very small fraction of the overall total (my /usr/share/dict/words has 208662 valid words, only 1723 of which are 33+ points = 0.8%). It runs in about half an hour on my machine and generates:

(('MAXILLOPREMAXILLARY', 'KNICKKNACKED', 'ZIGZAGWISE'), 101)

— Keith Randall

Nice. I'll add that to the matrix solution (removing words as their score falls too low), but this is significantly better than any of the pure Python solutions I had come up with.

— thouis

I'm not sure I've ever seen that many nested for loops before.

— Peter Olson

Combining your idea with matrix scoring (and a tighter upper bound on the best possible score) cuts the time to about 80 seconds on my machine (from about an hour). Code here.

— thouis

A good part of that time is spent precomputing the best possible scores, which could be done much faster.

— thouis