Since I have never taught a class, I do not think I can claim anything convincingly about what we should teach. Nevertheless, here are my thoughts on it.
There are natural examples where the “limit trick,” as written, cannot be applied. For example, suppose you implement a “variable-length vector” (like vector<T> in C++) using a fixed-length array with size doubling (that is, every time you are about to exceed the size of the array, you reallocate an array twice as large as the current one and copy all the elements over). The size S(n) of the array when the vector stores n elements is the smallest power of 2 greater than or equal to n. We want to say that S(n)=O(n), but taking the “limit trick” as the definition would not allow us to do so, because S(n)/n oscillates densely in the range [1,2) and therefore has no limit. The same applies to Ω() and Θ().
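Here is a minimal sketch of just the growth rule (the function name S mirrors the notation above; it is not a full vector implementation), which prints the ratio to make the oscillation visible:

```cpp
#include <cstdio>
#include <initializer_list>

// S(n): capacity of a size-doubling array holding n elements,
// i.e. the smallest power of 2 that is >= n.
unsigned long long S(unsigned long long n) {
    unsigned long long cap = 1;
    while (cap < n) cap *= 2;
    return cap;
}

int main() {
    // The ratio S(n)/n returns to 1 at every power of 2 and climbs
    // back toward 2 just past it, so lim S(n)/n does not exist.
    for (unsigned long long n : {7ULL, 8ULL, 9ULL, 1023ULL, 1024ULL, 1025ULL}) {
        std::printf("n = %4llu   S(n) = %4llu   S(n)/n = %.3f\n",
                    n, S(n), (double)S(n) / (double)n);
    }
}
```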
As a somewhat separate matter, when we use these notations to describe the complexity of an algorithm, I find your definition of Ω() sometimes inconvenient (although I suppose that definition is common). It is more convenient to define f(n)=Ω(g(n)) if and only if limsup f(n)/g(n) > 0. This is because some problems are trivial for infinitely many values of n (such as the perfect matching problem on a graph with an odd number n of vertices), and on those inputs the running time can be tiny, so a liminf-based Ω() can never assert a nontrivial lower bound. The same applies to Θ() and ω().
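To see the inconvenience concretely, consider a hypothetical algorithm whose running time T(n) is quadratic on even inputs but constant on odd inputs (where, as with matching on an odd number of vertices, the answer is trivial):

```latex
% Hypothetical running time: quadratic on even n, trivial on odd n.
T(n) =
\begin{cases}
  n^2 & \text{if $n$ is even,}\\
  1   & \text{if $n$ is odd,}
\end{cases}
\qquad
\liminf_{n\to\infty} \frac{T(n)}{n^2} = 0,
\qquad
\limsup_{n\to\infty} \frac{T(n)}{n^2} = 1.
```

A liminf-based Ω() refuses to call this T(n)=Ω(n²), even though the algorithm takes quadratic time on half of all input sizes; the limsup-based definition below does.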
Therefore, I personally find the following definitions the most convenient for describing the complexity of an algorithm: for functions f, g: ℕ → ℝ>0,
- f(n)=o(g(n)) if and only if limsup f(n)/g(n) = 0. (This is equivalent to lim f(n)/g(n) = 0.)
- f(n)=O(g(n)) if and only if limsup f(n)/g(n) < ∞.
- f(n)=Θ(g(n)) if and only if 0 < limsup f(n)/g(n) < ∞.
- f(n)=Ω(g(n)) if and only if limsup f(n)/g(n) > 0. (Equivalently, f(n) is not o(g(n)).)
- f(n)=ω(g(n)) if and only if limsup f(n)/g(n) = ∞. (Equivalently, f(n) is not O(g(n)).)
or equivalently,
- f(n)=o(g(n)) if and only if for every c>0, for sufficiently large n, f(n) ≤ c⋅g(n).
- f(n)=O(g(n)) if and only if for some c>0, for sufficiently large n, f(n) ≤ c⋅g(n).
- f(n)=Θ(g(n)) if and only if f(n)=O(g(n)) and f(n)=Ω(g(n)).
- f(n)=Ω(g(n)) if and only if for some d>0, for infinitely many n, f(n) ≥ d⋅g(n).
- f(n)=ω(g(n)) if and only if for every d>0, for infinitely many n, f(n) ≥ d⋅g(n).
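As a sanity check, these definitions do give the desired answer for the vector example above:

```latex
% S(n) = 2^{\lceil \log_2 n \rceil}, the array size from the vector example.
\liminf_{n\to\infty} \frac{S(n)}{n} = 1
\qquad\text{and}\qquad
\limsup_{n\to\infty} \frac{S(n)}{n} = 2,
```

so by the definitions above S(n)=O(n), S(n)=Ω(n), and hence S(n)=Θ(n), even though lim S(n)/n does not exist.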
But I do not know whether this is common practice, or whether it is suitable for teaching. The problem is that we sometimes want to define Ω() by liminf instead (as you did in the first definition). For example, when we say “the probability of error of this randomized algorithm is 2^(−Ω(n)),” we do not mean that the error probability is exponentially small merely for infinitely many n!
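Spelled out, the intended reading of that statement is the liminf-style (for-all-large-n) one:

```latex
\Pr[\text{error}] = 2^{-\Omega(n)}
\quad\Longleftrightarrow\quad
\exists c > 0:\ \Pr[\text{error}] \le 2^{-cn} \text{ for all sufficiently large } n.
```

Under the limsup-based Ω() above, the same notation would only guarantee the bound 2^(−cn) for infinitely many n, which is far weaker.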