Shuffle two lists at once with the same order

Question 1

I am using the movie_reviews corpus from the nltk library, which contains a large number of documents. My task is to measure predictive performance on these reviews with and without data preprocessing. The problem is that the lists documents and documents2 hold the same documents, and I need to shuffle them so that both lists keep the same order. I cannot shuffle them separately, because every shuffle produces a different order. That is why I need to shuffle both at once with the same order: in the end I have to compare them, and the comparison depends on the order. I am using Python 2.7.

Example (in reality they are tokenized strings, but that is not relevant here):

documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]

documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

And I need to get this result after shuffling both lists:

documents = [(['one of the guys dies'], 'neg'),
             (['they get into an accident . '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['plot : two teen couples go to a church party , '], 'neg')]

documents2 = [(['one guys dies'], 'neg'),
              (['they get accident . '], 'neg'),
              (['drink then drive . '], 'pos'),
              (['plot two teen couples church party'], 'neg')]

I have this code:

import random

import nltk
from nltk.corpus import movie_reviews, stopwords

def cleanDoc(doc):
    stopset = set(stopwords.words('english'))
    stemmer = nltk.PorterStemmer()
    clean = [token.lower() for token in doc if token.lower() not in stopset and len(token) > 2]
    final = [stemmer.stem(word) for word in clean]
    return final

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

documents2 = [(list(cleanDoc(movie_reviews.words(fileid))), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

# here: shuffle documents and documents2 with the same order (random.shuffle or some other way)

Question 2

You can do it like this:

import random

a = ['a', 'b', 'c']
b = [1, 2, 3]

c = list(zip(a, b))

random.shuffle(c)

a, b = zip(*c)

print(a)
print(b)

[OUTPUT] (one possible result; zip returns tuples)
('a', 'c', 'b')
(1, 3, 2)

Of course, this example uses simpler lists, but the adaptation is the same for your case.

Hope it helps. Good luck.
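For lists shaped like the ones in the question, the same zip-based idea could look like the sketch below. The shortened sample data is only an illustration, and converting the results back to lists is an assumption about what the later code expects.

```python
import random

# Shortened sample data in the same (tokens, label) shape as the question.
documents = [(['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]
documents2 = [(['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

# Pair each raw document with its cleaned counterpart, then shuffle the pairs.
paired = list(zip(documents, documents2))
random.shuffle(paired)

# Unzip again; list() turns the tuples produced by zip back into lists.
documents, documents2 = (list(column) for column in zip(*paired))

print(documents)
print(documents2)
```

Note that the unpacking on the last step fails for empty input, so an empty corpus would need a separate check.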

Question 3

I have a simple way to do this:

import numpy as np
a = np.array([0,1,2,3,4])
b = np.array([5,6,7,8,9])

indices = np.arange(a.shape[0])
np.random.shuffle(indices)

a = a[indices]
b = b[indices]
# a, array([3, 4, 1, 2, 0])
# b, array([8, 9, 6, 7, 5])
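The same index-permutation idea also works on plain Python lists without NumPy, which may be convenient when the elements are tuples of tokens and labels. This is a minimal sketch using only the standard library; the placeholder strings stand in for real documents.

```python
import random

# Placeholder items standing in for the raw and the cleaned documents.
documents = ['raw0', 'raw1', 'raw2', 'raw3']
documents2 = ['clean0', 'clean1', 'clean2', 'clean3']

# Build one random permutation of the positions...
indices = list(range(len(documents)))
random.shuffle(indices)

# ...and apply that same permutation to both lists.
documents = [documents[i] for i in indices]
documents2 = [documents2[i] for i in indices]

print(documents)
print(documents2)
```

Keeping `indices` around also makes it possible to apply the identical order to a third list later.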

Question 4

import numpy as np
from sklearn.utils import shuffle

a = ['a', 'b', 'c','d','e']
b = [1, 2, 3, 4, 5]

a_shuffled, b_shuffled = shuffle(np.array(a), np.array(b))
print(a_shuffled, b_shuffled)

#random output
#['e' 'c' 'b' 'd' 'a'] [5 3 2 4 1]

Question 5

Shuffle an arbitrary number of lists simultaneously.

from random import shuffle

def shuffle_list(*ls):
  l =list(zip(*ls))

  shuffle(l)
  return zip(*l)

a = [0,1,2,3,4]
b = [5,6,7,8,9]

a1,b1 = shuffle_list(a,b)
print(a1,b1)

a = [0,1,2,3,4]
b = [5,6,7,8,9]
c = [10,11,12,13,14]
a1,b1,c1 = shuffle_list(a,b,c)
print(a1,b1,c1)

Output:

$ (0, 2, 4, 3, 1) (5, 7, 9, 8, 6)
$ (4, 3, 0, 2, 1) (9, 8, 5, 7, 6) (14, 13, 10, 12, 11)

Note: the objects returned by shuffle_list() are tuples.

P.S. shuffle_list() can also be applied to numpy.array():

import numpy as np

a = np.array([1,2,3])
b = np.array([4,5,6])

a1,b1 = shuffle_list(a,b)
print(a1,b1)

Output:

$ (3, 1, 2) (6, 4, 5)
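If lists are preferred over tuples, a variant along these lines might help. The name `shuffle_together` and the length check are assumptions added for illustration; they are not part of the answer above.

```python
import random

def shuffle_together(*sequences):
    """Shuffle equal-length sequences with one shared order and return lists."""
    # Hypothetical safety check: zip would silently truncate unequal inputs.
    if len(set(len(s) for s in sequences)) > 1:
        raise ValueError("all sequences must have the same length")
    rows = list(zip(*sequences))
    random.shuffle(rows)
    if not rows:
        # Nothing to shuffle: return one empty list per input sequence.
        return [[] for _ in sequences]
    return [list(column) for column in zip(*rows)]

a = [0, 1, 2, 3, 4]
b = [5, 6, 7, 8, 9]
a1, b1 = shuffle_together(a, b)
print(a1, b1)
```

The inputs are left untouched; only the returned lists carry the new order.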

Question 6

A quick and easy way to do this is to use random.seed() together with random.shuffle(). It lets you generate the same random order as many times as you want. It looks like this:

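import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
seed = random.random()
random.seed(seed)   # reset the generator before each shuffle
random.shuffle(a)   # lists have no .shuffle() method; use random.shuffle
random.seed(seed)
random.shuffle(b)
print(a)
print(b)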

>>[3, 1, 4, 2, 5]
>>[8, 6, 9, 7, 10]

It also works when you cannot hold both lists in memory at the same time.
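A variation on the same idea, sketched here as one possible alternative, uses separate `random.Random` instances so the global generator state is left alone. The seed value is an arbitrary assumption; both lists must have the same length for the orders to match.

```python
import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]

seed = 12345  # arbitrary example seed; any fixed value works

# Each Random(seed) starts from the same state, so both shuffles
# apply the same permutation to lists of equal length.
random.Random(seed).shuffle(a)
random.Random(seed).shuffle(b)

print(a)
print(b)
```

Because each list is shuffled independently, the second list can be loaded and shuffled later, as long as the seed is kept.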

Question 7

You can use the second argument of the shuffle function to fix the shuffling order.

Specifically, you can pass as the second argument of shuffle a zero-argument function that returns a value in [0, 1). The value returned by this function fixes the shuffling order. (By default, i.e. if you do not pass a function as the second argument, random.random() is used. You can see it at line 277 here.) Note that this second argument was deprecated in Python 3.9 and removed in Python 3.11, so the approach only works on older versions such as Python 2.7.

This example illustrates what I described:

import random

a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]

r = random.random()            # randomly generate a real number in [0, 1)
random.shuffle(a, lambda : r)  # lambda : r is a zero-argument function that returns r
random.shuffle(b, lambda : r)  # same function as the previous line, so the shuffling order is the same

print a
print b

Output:

['e', 'c', 'd', 'a', 'b']
[5, 3, 4, 1, 2]
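On interpreters where random.shuffle no longer accepts a second argument, the effect can be approximated with a small hand-written loop. The sketch below is an assumption about how such a loop could look (a Fisher-Yates shuffle with a pluggable zero-argument function); `shuffle_with` is a hypothetical helper, not a library function.

```python
def shuffle_with(items, rand):
    """Shuffle items in place, drawing every random value from rand()."""
    for i in reversed(range(1, len(items))):
        # Pick an index in [0, i] and swap it with position i.
        j = int(rand() * (i + 1))
        items[i], items[j] = items[j], items[i]

a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]

r = 0.2  # fixed here so the result is reproducible; random.random() also works
shuffle_with(a, lambda: r)
shuffle_with(b, lambda: r)

print(a)  # ['e', 'c', 'd', 'a', 'b']
print(b)  # [5, 3, 4, 1, 2]
```

A constant function can only reach a small subset of the possible orderings, so a seeded generator is usually the better choice when a uniformly random order matters.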