How to obtain the unknown values

19

Can anyone help me with the following problem?

I want to find some values $a_i,b_j$ (mod $N$), where $i=1,2,…,K, j=1,2,…,K$ (for example $K=6$), given a list of $K^2$ values that are the differences $a_i-b_j\pmod N$ (for example $N=251$), without knowing which difference corresponds to which pair $(i,j)$. Since the values $a_i,b_j\pmod N$ are not uniquely determined by the differences $a_i-b_j\pmod N$, we are looking for any valid assignment of values.
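For concreteness, here is a minimal Python sketch of such an instance. The helper name `make_instance`, the fixed seed, and the choice of distinct residues are illustrative assumptions, not part of the question.

```python
import random

def make_instance(K=6, N=251, seed=0):
    """Return secret lists a, b and the shuffled list of all (a_i - b_j) mod N."""
    rng = random.Random(seed)
    a = rng.sample(range(N), K)   # K distinct residues, chosen only for illustration
    b = rng.sample(range(N), K)
    diffs = [(x - y) % N for x in a for y in b]
    rng.shuffle(diffs)            # the solver sees the values, not which (i, j) produced them
    return a, b, diffs

a, b, diffs = make_instance()
print(len(diffs))                 # K^2 = 36 values
```

The task is to recover some valid `a` and `b` from `diffs` alone.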

Certainly, trying every permutation of the $K^2$ numbers in the list ($K^2!$ possible cases in total) and then solving the modular equations with the $a_i,b_j$ as variables is infeasible.

In fact, this problem arises in a paper on the cryptanalysis of an early version of the NTRU signature scheme ( http://eprint.iacr.org/2001/005 ). However, the author wrote only one sentence, "A simple backtrack algorithm finds a solution ..." (in Section 3.3), so can anyone provide further explanation? The author also said that "every circular shift $\{((a_i+M) \bmod N,\ (b_i+M) \bmod N)\}_{i=1}^K$ or a swap $\{(N-1-b_i,\ N-1-a_i)\}_{i=1}^K$ produces the same pattern of $a_i-b_j \bmod N$". Is this statement useful?
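The invariance quoted above can be checked numerically. The sketch below uses arbitrary example values for $a_i$, $b_j$ and $M$ (assumptions made only for illustration) and compares the sorted lists of differences before and after a shift and a swap.

```python
def diff_multiset(a, b, N):
    """Sorted list of all (a_i - b_j) mod N, i.e. the multiset of differences."""
    return sorted((x - y) % N for x in a for y in b)

N, M = 251, 17                              # M is an arbitrary shift
a = [3, 50, 77, 120, 200, 249]
b = [0, 9, 64, 101, 180, 233]

shifted_a = [(x + M) % N for x in a]
shifted_b = [(y + M) % N for y in b]
swapped_a = [(N - 1 - y) % N for y in b]    # new a_i = N - 1 - b_i
swapped_b = [(N - 1 - x) % N for x in a]    # new b_i = N - 1 - a_i

same_after_shift = diff_multiset(shifted_a, shifted_b, N) == diff_multiset(a, b, N)
same_after_swap = diff_multiset(swapped_a, swapped_b, N) == diff_multiset(a, b, N)
print(same_after_shift, same_after_swap)    # True True
```

The shift invariance means one value (for example $b_1=0$) can be fixed without loss of generality, which is one way the statement can be put to use.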

— a guest

7

Note that it is impossible to recover $a_i,b_j$ exactly, since adding a constant $C$ to all the numbers leaves the differences unchanged.

— Yuval Filmus

1

@Yuval: this is already covered by the last sentence of the description. I think only one solution is needed, since several may exist.

— domotorp

2

@Yuval Sorry for not emphasizing that the $a_i,b_j$ should be taken modulo $N$. So there are not infinitely many solutions.

— a guest

@domotorp Yes, finding one of the solutions is OK.

— a guest

1

Perhaps the OP could clarify earlier in the post that the $a_i$, $b_j$ are taken modulo $N$: maybe in the title or in the first paragraph. It is also worth mentioning the issue with the constant $C$. Both things confused me when I started reading.

— Juan Bermejo Vega

4

Here is a hint, for $K = 6$ and $N = 251$. We are given a list of the $a_i - b_j \pmod{N}$. Start by taking one of them, without loss of generality $a_1-b_1$. Without loss of generality $b_1=0$, and we get the value of $a_1$. Now take another one, and hope that it is of the form $a_2-b_1$ (this happens with probability $5/35 = 1/7$), and deduce $a_2$.

At this stage, we know $a_1,a_2,b_1$. Our next goal is to look for $a_1-b_j$ for $j \neq 1$. For each candidate $a_i-b_j$, if $i=1$, then $(a_i-b_j)+(a_2-a_1)=a_2-b_j$ should also be on the list. If $i \neq 1$, then the probability that $(a_i-b_j)+(a_2-a_1)$ is also on the list is about $33/251$. So if we find a candidate $a_i-b_j$ for which $(a_i-b_j)+(a_2-a_1)$ is also on the list, then probably $i=1$. In this way, we can recover $b_2$ with some certainty.
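The filtering step just described can be sketched as follows. The function name `likely_a1_rows` and the example values are assumptions made for illustration; the sketch only shows the membership test, not the full algorithm.

```python
from collections import Counter

def likely_a1_rows(diffs, a1, a2, N):
    """Return the differences d for which d + (a2 - a1) also occurs in the list."""
    present = Counter(d % N for d in diffs)
    shift = (a2 - a1) % N
    return [d for d in diffs if present[(d + shift) % N] > 0]

N = 251
a = [12, 40, 95, 130, 201, 222]
b = [0, 7, 33, 88, 150, 199]          # b_1 = 0, as assumed without loss of generality
diffs = [(x - y) % N for x in a for y in b]

hits = likely_a1_rows(diffs, a[0], a[1], N)
print(len(hits), "of", len(diffs), "differences pass the test")
```

Every true $a_1-b_j$ passes; any other difference that passes is a false positive, and each hit $d$ proposes $b_j = a_1 - d$ as a candidate.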

At this stage, we know $a_1,a_2,b_1,b_2$. In the same way that we recovered $b_2$, we can recover $a_3$ with reasonable certainty. We can then recover $b_3$ by looking for a candidate $a_i-b_j$ for which $(a_i-b_j)+(a_2-a_1)$ and $(a_i-b_j)+(a_3-a_1)$ are both in the list. Because we have more $a$'s, our probability of failure goes down considerably. We continue and find $a_4,b_4,a_5,b_5,a_6,b_6$.

At any point in this algorithm, we might have guessed something wrong, and this will eventually result in a contradiction (say, at some point there is no good candidate $a_i-b_j$). We then backtrack and try another possibility; if we exhaust all the possibilities, we backtrack further and try another possibility at an earlier stage of the algorithm; and so on.
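A minimal Python sketch of a backtracking search in this spirit is shown below. It is not the code from the paper or from this answer: the function name, the strict alternation $a_2, b_2, a_3, b_3, \dots$, and the rule "every difference to the already chosen values must still be unused in the list" are assumptions made to keep the example short.

```python
import random
from collections import Counter

def solve_by_backtracking(diffs, K, N):
    """Return (A, B) whose differences a - b mod N match diffs as a multiset, or None."""
    remaining = Counter(d % N for d in diffs)
    first = diffs[0] % N
    A, B = [first], [0]                    # wlog diffs[0] = a_1 - b_1 and b_1 = 0
    remaining[first] -= 1

    def extend():
        if len(A) == K and len(B) == K:
            return True
        adding_a = len(A) == len(B)        # alternate: a_2, b_2, a_3, b_3, ...
        for d in [x for x, c in remaining.items() if c > 0]:
            if adding_a:
                cand = d                   # guess d = cand - b_1, with b_1 = 0
                needed = Counter((cand - y) % N for y in B)
            else:
                cand = (A[0] - d) % N      # guess d = a_1 - cand
                needed = Counter((x - cand) % N for x in A)
            # The candidate is consistent only if all its differences are still unused.
            if all(remaining[v] >= c for v, c in needed.items()):
                remaining.subtract(needed)
                (A if adding_a else B).append(cand)
                if extend():
                    return True
                (A if adding_a else B).pop()   # contradiction later on: undo and try the next guess
                remaining.update(needed)
        return False

    return (A, B) if extend() else None

# Illustrative instance with K = 6, N = 251 (values drawn with a fixed seed).
rng = random.Random(1)
N, K = 251, 6
secret_a, secret_b = rng.sample(range(N), K), rng.sample(range(N), K)
diffs = [(x - y) % N for x in secret_a for y in secret_b]
rng.shuffle(diffs)
found = solve_by_backtracking(diffs, K, N)
print(found)
```

The solution found is generally a shifted or relabelled version of the secret values, which is all the problem asks for.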

It is a good exercise to actually program this algorithm; that is probably the only way to understand how to implement the backtracking correctly. It is also the only way to tell whether this algorithm works in practice.

— Yuval Filmus

Thanks, and I will also code this backtracking to make it clearer. Perhaps the author of the original paper used a similar method, since he also mentioned "backtrack".

— a guest

Sorry for forgetting to post my comment on your answer! I have also implemented the method you suggested (in C++). The conclusion is that your algorithm works quite well and one of the solutions can be found very quickly (in less than a second on my PC). And this time I understand the backtracking procedure better. Thanks a lot!

— a guest

Why can't I "@Yuval" in my last comment?! Sorry, but I tried several times.

— a guest

Maybe you could share the code online, so that other people reading the paper can access it.

— Yuval Filmus

5

Update: the description that follows is for a different problem (in which you have all pairwise distances within one set, rather than pairwise distances between two distinct sets). I'll leave it anyway since it is closely related.

This problem is called the beltway problem and is a special case of the general $d$-torus embedding problem. It is also closely related to the turnpike problem, in which the distance differences are absolute (not modulo a number).

It is not known whether the beltway problem admits a poly-time algorithm. There are various pseudo-poly-time algorithms for related questions. The best reference (unfortunately an old one) is the paper by Lemke, Skiena and Smith.

— Suresh Venkat

1

I think this problem is different. In the beltway problem we know all the pairwise distances; here we know them only between two points that lie in different groups. Although this looks like less information, it may actually help to solve the problem.

— domotorp

Ah yes. It is a bipartite graph. Good point.

— Suresh Venkat

Bipartite graph? Something like that. Maybe I should try the problem this way, but I don't have a concrete train of thought right now.

— a guest

3

Here is an observation that I think gives you a foothold, perhaps enough to solve it.

Suppose we have four differences $a_1-b_1$, $a_1-b_2$, $a_2-b_1$, $a_2-b_2$ that arise as pairwise differences between two $a$'s and two $b$'s. Call this a quartet of differences. Note that we have a non-trivial relation:

$(a_1-b_1)-(a_1-b_2) = (a_2-b_1)-(a_2-b_2) \pmod N.$

You can try to use this relation to identify potential quartets from the list of $K^2$ differences. For example, select four differences from the list; if they do not satisfy the relation above, they certainly do not arise from a quartet structure; if they do satisfy the relation, they might arise from a quartet.
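As a small illustration, the relation can be rearranged to $(a_1-b_1)+(a_2-b_2) = (a_1-b_2)+(a_2-b_1) \pmod N$, so four differences form a potential quartet when they can be split into two pairs with equal sums. The sketch below assumes the four values arrive unordered and therefore tries all three pairings; the function name is hypothetical.

```python
def passes_quartet_test(four, N):
    """True if the four differences can be paired so that both pairs have the same sum mod N."""
    w, x, y, z = four
    return ((w + x - y - z) % N == 0 or
            (w + y - x - z) % N == 0 or
            (w + z - x - y) % N == 0)

N = 251
a1, a2, b1, b2 = 10, 20, 3, 7
quartet = [(a1 - b1) % N, (a1 - b2) % N, (a2 - b1) % N, (a2 - b2) % N]
print(passes_quartet_test(quartet, N))        # True: a genuine quartet always passes
print(passes_quartet_test([1, 2, 4, 8], N))   # False: no pairing has equal sums
```

A `True` result does not prove the four values form a quartet; it only fails to rule it out.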

There are many ways you can take things from here, but I suspect this is enough.

In particular, I suspect that, for the example parameter settings, the problem will be fairly easy, since the test above for recognizing a quartet probably won't have too many false positives. Out of all ${K^2 \choose 4}$ ways of choosing 4 differences from the list, there will be ${K \choose 2}^2$ quartets (which all satisfy the relation), and the rest are non-quartets (which satisfy the relation with probability $1/N$, heuristically). Therefore we expect to see about $({K^2 \choose 4}-{K \choose 2}^2)/N$ false positives, i.e., 4-tuples that pass the test even though they are not quartets. For your parameters, this means that we have 225 quartets and $(58905-225)/251 \approx 234$ other false positives; so about half of the 4-tuples that pass the test are actually quartets. This means that the test above is a very good way to recognize quartets. Once you can recognize quartets, you can really go to town on recovering the structure of the list of differences.
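The counts quoted for $K=6$, $N=251$ can be reproduced with a few lines of Python (a plain arithmetic check, using the same heuristic $1/N$ false-positive rate as the text):

```python
from math import comb

K, N = 6, 251
four_subsets = comb(K * K, 4)                      # ways to choose 4 of the 36 differences
true_quartets = comb(K, 2) ** 2                    # choose two a's and two b's
expected_false = (four_subsets - true_quartets) / N

print(four_subsets, true_quartets, round(expected_false))   # 58905 225 234
```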

— DW

@DW: Thanks, but now I am wondering about the next step after all the possible quartets have been found (225 + 234 = 459 in total). Should one look for 3 non-overlapping quartets and check whether they can form a possible solution? How can this be done efficiently? Maybe it is not so hard, because there won't be many overlaps.

— a guest

@aguest, good question! I can't remember what I was thinking at the time. I think I recall thinking that one approach might be to start with one quartet, then look for all the others that overlap it in 2 differences (e.g., those arising from $a_1,a_j,b_1,b_2$ where $j\ne 2$), but I don't know where to go from there (how to filter out the false positives).

— DW

3

Here's a different approach, based upon iteratively finding numbers that can't appear among $\{a_1,\dots,a_6\}$. Call a set $A$ an over-approximation of the $a$'s if we know that $\{a_1,\dots,a_6\} \subseteq A$. Similarly, $B$ is an over-approximation of the $b$'s if we know that $\{b_1,\dots,b_6\} \subseteq B$. Obviously, the smaller $A$ is, the more useful this over-approximation is, and the same goes for $B$. My approach is based upon iteratively refining these over-approximations, i.e., iteratively reducing the size of these sets (as we rule out more and more values as impossible).

The core of this approach is a method for refinement: given an over-approximation $A$ for the $a$'s and an over-approximation $B$ for the $b$'s, find a new over-approximation $A^*$ for the $a$'s such that $A^* \subseteq A$. In particular, normally $A^*$ will be smaller than $A$, so this lets us refine the over-approximation for the $a$'s.

By symmetry, essentially the same trick will let us refine our over-approximation for the $b$'s: given an over-approximation $A$ for the $a$'s and an over-approximation $B$ for the $b$'s, it will produce a new over-approximation $B^*$ for the $b$'s.

So, let me tell you how to do refinement, then I'll put everything together to get a full algorithm for this problem. In what follows, let $D$ denote the multi-set of differences, i.e., $D=\{a_i-b_j:1 \le i,j \le 6\}$; we'll focus on finding a refined over-approximation $A^*$, given $A,B$.

How to compute a refinement. Consider a single difference $d \in D$. Consider the set $d+B=\{d+y : y \in B\}$. Based on our knowledge that $B$ is an over-approximation of the $b$'s, we know that at least one element of $d+B$ must be an element of $\{a_1,\dots,a_6\}$. Therefore, we can treat each of the elements in $d+B$ as a "suggestion" for a number to possibly include in $A^*$. So, let's sweep over all differences $d \in D$ and, for each, identify which numbers are "suggested" by $d$.

Now I'm going to observe that the number $a_1$ is sure to be suggested at least 6 times during this process. Why? Because the difference $a_1-b_1$ is in $D$, and when we process it, $a_1$ will be one of the numbers it suggests (since we're guaranteed that $b_1 \in B$, $(a_1-b_1)+B$ will surely include $a_1$). Similarly, the difference $a_1-b_2$ appears somewhere in $D$, and it'll cause $a_1$ to be suggested again. In this way, we see that the correct value of $a_1$ will be suggested at least 6 times. The same holds for $a_2$, and $a_3$, and so on.

So, let $A^*$ be the set of numbers $a^*$ that have been suggested at least 6 times. This is sure to be an over-approximation of the $a$'s, by the above comments.

As an optimization, we can filter out all suggestions that are not present in $A$ immediately: in other words, we can treat the difference $d$ as suggesting all of the values $(d+B)\cap A$. This ensures that we will have $A^* \subseteq A$. We are hoping that $A^*$ is strictly smaller than $A$; no guarantees, but if all goes well, maybe it will be.

Putting this together, the algorithm to refine $A,B$ to yield $A^*$ is as follows:

1. Let $S = \cup_{d \in D} (d+B)\cap A$. This is the multi-set of suggestions.
2. Count how many times each value appears in $S$. Let $A^*$ be the set of values that appear at least 6 times in $S$. (This can be implemented efficiently by building an array $c$ of 251 counters, initially all zero; each time the number $s$ is suggested, you increment $c[s]$; at the end you sweep through $c$ looking for elements whose value is 6 or larger.)

A similar method can be built to refine $A,B$ to get $B^*$. You basically reverse things above and flip some signs: e.g., instead of $d+B$, you look at $-d+A$.
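The two refinement steps can be sketched in Python as follows. The function names `refine_a` and `refine_b`, the use of Python sets for $A$ and $B$, and the example values are assumptions for illustration; the counting follows the description above, with the threshold 6 written as `K`.

```python
def refine_a(D, A, B, N, K):
    """Keep the values of A that are suggested at least K times by (d + B) over all d in D."""
    counts = [0] * N
    for d in D:
        for y in B:
            s = (d + y) % N
            if s in A:                 # filter immediately, so the result is a subset of A
                counts[s] += 1
    return {s for s in A if counts[s] >= K}

def refine_b(D, A, B, N, K):
    """Mirror image: keep the values of B suggested at least K times by (-d + A)."""
    counts = [0] * N
    for d in D:
        for x in A:
            s = (x - d) % N
            if s in B:
                counts[s] += 1
    return {s for s in B if counts[s] >= K}

N, K = 251, 6
a = [12, 40, 95, 130, 201, 222]
b = [0, 7, 33, 88, 150, 199]
D = [(x - y) % N for x in a for y in b]
A = set(D)                             # valid over-approximation because b contains 0
B = {(a[0] - d) % N for d in D}        # valid because a[0] really is one of the a's
A_star = refine_a(D, A, B, N, K)
B_star = refine_b(D, A, B, N, K)
print(len(A), len(A_star), len(B), len(B_star))
```

Whatever the sizes turn out to be, the true values are never discarded, because each of them is suggested once for every difference it takes part in.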

How to compute an initial over-approximation. To get our initial over-approximation, one idea is to assume (wlog) that $b_1=0$. It follows that each value $a_i$ must appear somewhere among $D$, thus the list of differences $D$ can be used as our initial over-approximation for the $a$'s. Unfortunately, this doesn't give us a very useful over-approximation for the $b$'s.

A better approach is to additionally guess the value of one of the $a$'s. In other words, we assume (wlog) that $b_1=0$, and use $A=D$ as our initial over-approximation of the $a$'s. Then, we guess which one of these 36 values is indeed one of the $a$'s, say $a_1$. That then gives us an over-approximation $B=a_1-D$ for the $b$'s. We use this initial over-approximation $A,B$, then iteratively refine it until convergence, and test whether the result is correct. We repeat up to 36 times, with 36 different guesses at $a_1$ (on average 6 guesses should be enough) till we find one that works.

A full algorithm. Now we can have a full algorithm to compute $a_1,\dots,a_6,b_1,\dots,b_6$. Basically, we derive an initial over-approximation for $A$ and $B$, then iteratively refine.

Make a guess: For each $z \in D$, guess that $a_1=z$. Do the following:
1. Initial over-approximation: Define $A=D$ and $B=z-D$.
2. Iterative refinement: Repeatedly apply the following until convergence:
  - Refine $A,B$ to get a new over-approximation $B^*$ of the $b$'s.
  - Refine $A,B^*$ to get a new over-approximation $A^*$ of the $a$'s.
  - Let $A:= A^*$ and $B:= B^*$.
3. Check for success: If the resulting sets $A,B$ each have size 6, test whether they are a valid solution to the problem. If they are, stop. If not, continue with the loop over candidate values of $z$.
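The loop above can be sketched in Python as follows. All names (`refine`, `refine_to_fixpoint`, `solve`) and the example values are hypothetical, and the sketch makes no claim that refinement alone converges to sets of size 6; a comment later in this thread reports sets stalling at around 10 or 11 elements, in which case `solve` returns `None` and a further search over the surviving candidates would be needed.

```python
def refine(D, target, other, N, K, sign):
    """Keep the members of `target` that are suggested at least K times.
    sign=+1: target is A, suggestions are d + y for y in other (= B).
    sign=-1: target is B, suggestions are x - d for x in other (= A)."""
    counts = dict.fromkeys(target, 0)
    for d in D:
        for v in other:
            s = (v + sign * d) % N
            if s in counts:
                counts[s] += 1
    return {s for s, c in counts.items() if c >= K}

def refine_to_fixpoint(D, z, N, K):
    A = {d % N for d in D}              # step 1: A = D (assuming b_1 = 0)
    B = {(z - d) % N for d in D}        #         B = z - D (guessing a_1 = z)
    while True:                         # step 2: the sets only shrink, so this terminates
        B_new = refine(D, B, A, N, K, -1)
        A_new = refine(D, A, B_new, N, K, +1)
        if A_new == A and B_new == B:
            return A, B
        A, B = A_new, B_new

def is_solution(D, A, B, N):
    return sorted((x - y) % N for x in A for y in B) == sorted(d % N for d in D)

def solve(D, N, K):
    for z in sorted(set(D)):            # loop over the guesses for a_1
        A, B = refine_to_fixpoint(D, z, N, K)
        if len(A) == K and len(B) == K and is_solution(D, A, B, N):   # step 3
            return A, B
    return None                         # refinement alone did not isolate a solution

N, K = 251, 6
a = [12, 40, 95, 130, 201, 222]
b = [0, 7, 33, 88, 150, 199]
D = [(x - y) % N for x in a for y in b]
A_fix, B_fix = refine_to_fixpoint(D, a[0], N, K)
result = solve(D, N, K)
print(len(A_fix), len(B_fix), result)
```

For a correct guess, the fixed point always contains the true values, so it narrows the search even when it does not finish it.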

Analysis. Will this work? Will it eventually converge on $A=\{a_1,\dots,a_6\}$ and $B=\{b_1,\dots,b_6\}$, or will it get stuck without completely solving the problem? The best way to find out is probably to test it. However, for your parameters, yes, I expect it will be effective.

If we use method #1, as long as $|A|,|B|$ are not too large, heuristically I expect the sizes of the sets to monotonically shrink. Consider deriving $A^*$ from $A,B$. Each difference $d$ suggests $|B|$ values; one of them is correct, and the other $|B|-1$ can be treated (heuristically) as random numbers. If $x$ is a number that does not appear among the $a$'s, what is the probability that it survives the filtering and is added to $A^*$? Well, we expect $x$ to be suggested about $(|B|-1) \times 36/251$ times in total (on average, with standard deviation about the square root of that). If $|B|\le 36$, the probability that a wrong $x$ survives the filtering should be about $p=0.4$ or so (using the normal approximation for the binomial, with continuity correction). (The probability is smaller if $|B|$ is smaller; e.g., for $|B|=30$, I expect $p\approx 0.25$.) I expect the size of $A^*$ to be about $p (|A|-6) + 6$, which will strictly improve the over-approximation since it is strictly smaller than $|A|$. For instance, if $|A|=|B|=36$, then based upon these heuristics I expect $|A^*|\approx 18$, which is a big improvement over $|A|$.
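This heuristic can be evaluated with an exact binomial tail instead of the normal approximation. The model below is an assumption made for illustration: each of the 36 differences is taken to suggest a given wrong value independently with probability $(|B|-1)/251$, and the value survives if it is suggested at least 6 times.

```python
from math import comb

def survival_probability(size_B, K=6, N=251):
    """P(Binomial(K*K, (size_B - 1)/N) >= K): chance that a wrong value is suggested >= K times."""
    n, q = K * K, (size_B - 1) / N
    below = sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(K))
    return 1 - below

p36 = survival_probability(36)
p30 = survival_probability(30)
expected_size = p36 * (36 - 6) + 6     # heuristic |A*| when |A| = |B| = 36
print(round(p36, 2), round(p30, 2), round(expected_size, 1))
```

Under this model the exact tail is somewhat below the normal-approximation figures, but of the same order.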

Therefore, I predict that the running time will be very fast. I expect about 3-5 iterations of refinement to be enough for convergence, typically, and about 6 guesses at $z$ should probably be enough. Each refinement operation involves maybe a few thousand memory reads/writes, and we do that maybe 20-30 times. So, I expect this to be very fast, for the parameters you specified. However, the only way to find out for sure is to try it and see if it works well or not.

— D.W.

@DW: Thank you very much for your long answer and the effort you took to type so many words!!! According to your description, your algorithm here is quite correct. And I’m going to code it to test the efficiency right now.

— a guest

@DW: Hi, I've implemented your description in C++. The algorithm runs fast and the refinement step does reduce the sizes of the original sets $A$ and $B$. However, the convergence seems to be less than perfect. In fact, for each guess $z\in D$, the final sizes of $A^*$ and $B^*$ are still more than 10 according to the output recorded by my program. The most frequent number of remaining elements when $A^*$ (and $B^*$) cannot be improved by further repetitions of refinement is 11, and I hardly ever see a number below 10. However, this has made the problem solvable by trying each 6-element subset chosen from

— a guest

@DW: (Continued) the final $A^*$ and $B^*$ for each guess $z$ (although I didn't implement the last step on my PC). The total amount of computation will be about $2^{20}$, I estimate. Thank you very much!

— a guest

Sorry, but my last comment is too long, and I have to split it into two.

— a guest