# When do we say that two algorithms are "similar"?

16

I don't work in theory, but my job requires reading (and understanding) theoretical papers from time to time. Once I have understood a (set of) results, I discuss them with the people I work with, many of whom don't work in theory either. During one such discussion, the following question came up:

When do we say that two given algorithms are "similar"?

What do I mean by "similar"? Let's say two algorithms are similar if you can make one of the following claims in a paper without confusing/annoying any reviewer (better definitions are welcome):

Claim 1. "Algorithm $A$, which is similar to algorithm $B$, also solves problem $X$."

Claim 2. "Our algorithm is similar to algorithm $C$."

I would like to make this slightly more specific. Suppose we are working with graph algorithms. First, some necessary conditions for two algorithms to be similar:

1. They must solve the same problem.
2. They must have the same high-level intuitive idea.

For example, for graph traversal, breadth-first search and depth-first search satisfy the two conditions above; for shortest-path computations, breadth-first search and Dijkstra's algorithm satisfy the two conditions above (on unweighted graphs, of course); and so on.
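As a minimal sketch of the unweighted shortest-path case (the graph and function names here are my own toy example, not from the question), the following runs plain BFS and Dijkstra with all edge weights equal to 1 and checks that they compute the same distances:

```python
from collections import deque
import heapq

def bfs_distances(graph, source):
    """Breadth-first search: distance = number of edges from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def dijkstra_distances(graph, source):
    """Dijkstra's algorithm with every edge weight taken to be 1."""
    dist = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in dist:
            continue  # already settled with a shorter distance
        dist[u] = d
        for v in graph[u]:
            if v not in dist:
                heapq.heappush(heap, (d + 1, v))
    return dist

# A small unweighted graph as adjacency lists; both algorithms agree on it.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
assert bfs_distances(graph, 0) == dijkstra_distances(graph, 0)
```

On unweighted inputs the two differ in data structure (queue versus priority queue) and running time, yet compute identical outputs, which is exactly the tension the question is probing.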

Are these conditions also sufficient? More specifically, suppose two algorithms satisfy the necessary conditions for being similar. Would you still call them similar if

1. they have different asymptotic performance?
2. for a special class of graphs, one algorithm takes $\Omega(n)$ time while the other takes $O(n^{1/3})$ time?
3. they have different termination conditions? (remember, they are solving the same problem)
4. the pre-processing stage is different in the two algorithms?
5. the memory complexity is different in the two algorithms?

Edit: The question is clearly very context-dependent and subjective. I was hoping, however, that the five conditions above would allow for some suggestions. I am happy to edit the question further and provide more details if needed to get an answer. Thanks!

1
It really depends on the context. For example, for some sequential algorithms, DFS and BFS are very different, and one of them might not even work. In parallel settings, DFS (or at least a variant of it) is P-complete, while BFS is "easy in parallel".
Suresh Venkat

@SureshVenkat - I agree that the question is very context-dependent. In the interest of not starting a debate, I refrained from naming "the two algorithms" at the risk of seeming

4
The problem is that there's close, and then there's close. There is a way of thinking of the multiplicative weight update method as "essentially binary search", but in the wrong context this would sound crazy. FWIW, in all your cases above I can imagine declaring the algorithms different.
Suresh Venkat

1
This question seems too subjective to me. You are basically asking for a definition of "similar", when no canonical definition exists.
Joe

Answers:

23

It is hard to give even a coherent definition of "algorithm $A$ is similar to algorithm $B$". For one, I don't think "they must solve the same problem" is a necessary condition. Often when a paper says "the algorithm $A$ of Theorem $2$ is similar to the algorithm $B$ of Theorem $1$", algorithm $A$ is actually solving a different problem from that of $B$, but has some minor modifications to handle the new problem.

Even trying to determine what it means for two algorithms to be the same is an interesting and difficult problem. See the paper "When are two algorithms the same?" http://research.microsoft.com/~gurevich/Opera/192.pdf

17

More often than not, it means "I don't want to write out algorithm B in detail, because all the interesting details are nearly identical to those of algorithm A, and I don't want to go over the 10-page limit, and anyway the submission deadline is in three hours."

7

If you mean "similar" in the colloquial sense, I think JeffE's answer captures what some people mean.

In a technical sense though, it depends on what you care about. If asymptotic time complexity is all you care about, the difference between recursion and iteration may not matter. If computability is all you care about, the difference between a counter variable and a one-symbol stack does not matter.

To compare algorithms, a first step would be to make the notion of equivalence precise. Intuitively, let $A$ be the space of algorithms, $M$ be a space of mathematical objects, and $\mathit{sem}: A \to M$ be a function encoding that $\mathit{sem}(P)$ is the meaning of algorithm $P$. The space $M$ could contain anything ranging from the number of variables in your algorithm, to its state graph, to its time complexity. I don't believe there is an absolute notion of what $M$ can be. Given $M$ though, we can say two algorithms are equivalent if $\mathit{sem}(P)$ equals $\mathit{sem}(Q)$. Let me add that I think each of the five criteria you mentioned can be formalised mathematically in this manner.
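To make this concrete, here is a toy sketch of my own (the function names and the choice of $M$ are illustrative assumptions, not from the answer) where $\mathit{sem}$ maps an algorithm to its input/output behaviour over a finite test domain; under this deliberately coarse semantics, two syntactically different sorting routines come out equivalent:

```python
def sem(algorithm, domain):
    """One possible semantics: the input/output map over a finite domain."""
    return {inputs: tuple(algorithm(list(inputs))) for inputs in domain}

def insertion_sort(xs):
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] <= x:
            i += 1
        out.insert(i, x)
    return out

def selection_sort(xs):
    xs = list(xs)
    out = []
    while xs:
        m = min(xs)
        xs.remove(m)
        out.append(m)
    return out

domain = [(3, 1, 2), (1, 1, 0), (), (5,)]
# Different algorithms, yet equal under this choice of sem.
assert sem(insertion_sort, domain) == sem(selection_sort, domain)
```

Choosing a finer $M$ (say, comparison counts or memory traces) would separate the two again, which is the point: equivalence is relative to the chosen semantics.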

If we want to talk about an algorithm being more general than another (or an algorithm refining another), I would endow $M$ with more structure. Imagine that $(M, \sqsubseteq)$ is a partially ordered set and the order $x \sqsubseteq y$ encodes that $x$ is a more defined object than $y$. For example, if $M$ contains sets of traces of an algorithm and $\sqsubseteq$ is set inclusion, $\mathit{sem}(P) \sqsubseteq \mathit{sem}(Q)$ means that every trace of $P$ is a trace of $Q$. We can interpret this as saying that $P$ is more deterministic than $Q$.
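A minimal sketch of this trace-inclusion order, with made-up traces (my own toy model, assuming an "algorithm" is represented by its set of possible execution traces):

```python
# Toy model: an "algorithm" is the set of execution traces it can produce
# (each trace a tuple of states). P refines Q when traces(P) is a subset
# of traces(Q), i.e. P resolves some of Q's nondeterminism.
def refines(traces_p, traces_q):
    return traces_p <= traces_q  # set inclusion

# Q nondeterministically picks a branch; P always picks the first one.
traces_q = {("start", "left", "done"), ("start", "right", "done")}
traces_p = {("start", "left", "done")}

assert refines(traces_p, traces_q)       # P is more deterministic than Q
assert not refines(traces_q, traces_p)   # the order is not symmetric
```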

Next, we could ask if it is possible to quantify how close two algorithms are. In this case, I would imagine that $M$ has to be endowed with a metric. Then we can measure the distance between the mathematical objects that two algorithms represent. Further possibilities are to map algorithms to measure spaces or probability spaces and compare them using other criteria.

More generally, I would ask - what do you care about (in intuitive terms), what are the mathematical objects representing these intuitive properties, how can I map from algorithms to these objects, and what is the structure of this space? I would also ask if the space of objects enjoys enough structure to admit a notion of similarity. This is the approach I would take coming from a programming language semantics perspective. I'm not sure if you find this approach appealing, given the vastly different cultures of thought in computer science.

5

Along the lines of Jeff's answer, two algorithms are similar if the author of one of them expects that the author of the other one might be reviewing her paper.

But joking aside, in the theory community, I would say that what problem algorithm A is solving is rather tangential to whether it is "similar" to algorithm B, which might be solving a completely different problem. A is similar to B if it "works" because of the same main theoretical idea. For example, is the main idea in both algorithms that you can project the data into a much lower dimensional space, preserve norms with the Johnson-Lindenstrauss lemma, and then do a brute-force search? Then your algorithm is similar to other algorithms that do this, no matter what problem you are solving. There are some small number of heavy-duty algorithmic techniques that can be used to solve a wide variety of problems, and I would think that these techniques form the centroids of many sets of "similar" algorithms.

3

Very interesting question, and very nice paper Ryan!

I definitely agree with the idea that assessing the overall similarity of algorithms is mainly a subjective value judgement. While from a technical point of view there are a number of features one examines closely to decide on the similarity of algorithms, in the end it is also a matter of personal taste. I will try to describe the importance of both sides of the same coin while referring to the particular points of your question:

From a technical point of view:

1. Ryan already pointed out that both algorithms must solve the same problem. One could go even further and generalize this notion by saying that it is usually enough to prove that there is a polynomial transformation of the same instance understood by algorithm A so that algorithm B can handle it. However, this would actually be very weak. I prefer to think of similarity in a stronger sense.
2. However, I would never expect two equivalent algorithms to have the same intuitive idea ---though, again, this is a notion that is not easy to capture. More than that, it is often the case that algorithms deemed to be similar do not follow the same main rationale. Consider, for example, sorting algorithms that originated in different ways, following different ideas. As an extreme example, consider genetic algorithms, which are usually regarded by the mathematical community simply as stochastic processes (and are therefore equivalent in their view), yet are modeled and analyzed in quite a different way.
3. Moreover, I would even generalize this notion to say that other technicalities, such as the termination condition or the pre-processing stage, often do not matter. But this is not always the case. See for example Dijkstra's Algorithm versus Uniform Cost Search or a Case Against Dijkstra's Algorithm. Both algorithms are so close that most people cannot tell the difference, yet the (technical) differences were very important to the author of that paper. Much the same happens with the pre-processing step. If you are familiar with the $N$-puzzle, then observe that an A$^*$-like search algorithm using the Manhattan distance or $(N^2-1)$ additive pattern databases would actually expand the same number of nodes in exactly the same order, which makes both algorithms (and their heuristics) strictly equivalent in a very strong sense, whereas the first approach has no pre-processing and the second has a significant overhead before starting to solve a particular instance. However, as soon as your pattern databases consider more simultaneous interactions, there is a huge gap in performance between them, so they are definitely different ideas/algorithms.
4. As a matter of fact, I think most people would judge algorithms by their purpose and performance. Therefore, asymptotic performance is a good metric to reason about the similarity between programs. However, bear in mind that this performance is not necessarily the typical case, so if two algorithms have the same asymptotic performance but behave differently in practice, you would probably conclude that they are different. Strong evidence in this regard would be that both algorithms have the same performance in both time and memory (and this, as Suresh said, makes DFS and BFS look different). If this assertion does not sound convincing to you, please refer to the excellent (and highly recommended) book Programming the Universe by Seth Lloyd. On page 189 he refers to a list of more than 30 measures of complexity that can be used to regard algorithms as being different.

So what makes algorithms similar/different? In my view (and this is purely speculative), the main difference is in what they suggest to you. Many, many (many!) algorithms differ in just a few technicalities while serving the same purpose, so that the typical case differs for different ranges of the input. However, the greatest of all differences is (to my eye) what they suggest to you. Algorithms have different capabilities and therefore their own strengths and weaknesses. If two algorithms look the same but might be extended in different ways to cope with different cases, then I would conclude that they are different. Often, however, two algorithms look so much alike that you would regard them as the same ... until someone arrives making a key distinction and suddenly they are completely different!

Sorry my response ended up being so long ...

Cheers,

1
Actually, Ryan suggested that it is not necessary for both algorithms to solve the same problem.
Jeffε

True! I was just collecting my opinions in this regard, but you are definitely right!
Carlos Linares López

2

Any mention of similarity without defining a similarity metric is not well-defined. There are many ways in which two algorithms can be similar:

Quicksort and Mergesort solve very similar problems, but they do so in different ways. They have similar algorithmic complexity (although their worst-case performance and memory usage can vary). Quicksort and Mergesort are both similar to Bubblesort; however, Bubblesort has very different performance characteristics. If you ignore complexity statistics, Quicksort, Mergesort, and Bubblesort are all in the same equivalence class. However, if you care at all about algorithmic complexity, then Quicksort and Mergesort are much more similar to each other than either is to Bubblesort.
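The distinction can be made tangible by instrumenting comparison counts (a toy sketch of my own; the counter convention and input are made up): both sorts below agree as input/output maps, yet differ sharply in how many comparisons they make on the same worst-case input.

```python
def merge_sort(xs, counter):
    """Mergesort, counting element comparisons in counter[0]."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], counter)
    right = merge_sort(xs[mid:], counter)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def bubble_sort(xs, counter):
    """Bubblesort, counting element comparisons in counter[0]."""
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            counter[0] += 1
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

data = list(range(64, 0, -1))  # reversed input: worst case for bubble sort
mc, bc = [0], [0]
assert merge_sort(data, mc) == bubble_sort(data, bc) == sorted(data)
assert mc[0] < bc[0]  # roughly n log n comparisons versus roughly n^2
```

Under the coarse "same input/output" equivalence they collapse into one class; under the comparison-count semantics they separate, just as the paragraph argues.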

Smith-Waterman dynamic programming and HMM-sequence comparison attempt to solve the problem of aligning two sequences. However, they take different inputs. Smith-Waterman takes two sequences as input, while HMM-sequence comparison takes an HMM and a sequence as input. Both output sequence alignments. In terms of motivating ideas, both of these are similar to Levenshtein's edit distance, but only at a very high level.
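For reference, the shared high-level idea is the dynamic-programming recurrence of Levenshtein's edit distance, sketched below (a standard textbook formulation, kept minimal; Smith-Waterman adds scoring matrices and local alignment on top of this same skeleton):

```python
def levenshtein(a, b):
    """Classic DP edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

assert levenshtein("kitten", "sitting") == 3
```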

Here are some criteria by which two algorithms might be called similar:

1. Input/output types
2. Algorithmic/Memory Complexity
3. Assumptions about types of inputs (e.g., only positive numbers, or floating-point stability)
4. Nested relationships (e.g., some algorithms are special cases of others)

The critical decision about the meaning of similarity remains. Sometimes you care about the complexity of an algorithm, sometimes you don't. As the definition of similarity depends on the context of the discussion, the term "similar algorithm" isn't well-defined.
