Valuta l'intervallo definito di distribuzione normale

So che manca una formula facile da gestire per il CDF di una distribuzione normale, a causa della complicata funzione di errore in esso.

Tuttavia, mi chiedo se esiste una buona formula per $N(c_{-} \leq x < c_{+}| \mu, \sigma^2)$ . O quale potrebbe essere l'approssimazione dello "stato dell'arte" per questo problema.

normal-distribution approximation

— bayerj
fonte

Risposte:

Dipende esattamente da cosa stai cercando . Di seguito sono riportati alcuni brevi dettagli e riferimenti.

Gran parte della letteratura per i centri di approssimazione attorno alla funzione

Q (x) = \int_{x}^{\infty} \frac{1}{\sqrt{2 π}} e^{- \frac{u^{2}}{2}} d u

$Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-\frac{u^2}{2}} \, \mathrm{d}u$

per . Questo perché la funzione che hai fornito può essere scomposta come una semplice differenza della funzione sopra (eventualmente regolata da una costante). Questa funzione è indicata da molti nomi, tra cui "coda superiore della distribuzione normale", "integrale normale destro" e "funzione gaussiana ", solo per citarne alcuni. Vedrai anche approssimazioni al rapporto di Mills , che è $x > 0$ $Q$ doveè il pdf gaussiana.

R (x) = \frac{Q (x)}{φ (x)}

$R(x) = \frac{Q(x)}{\varphi(x)}$

φ (x) = (2 π)^{- 1 / 2} e^{- x^{2} / 2}

$\varphi(x) = (2\pi)^{-1/2} e^{-x^2 / 2}$

Qui elencherò alcuni riferimenti per vari scopi che potrebbero interessarti.

computazionale

Lo standard di fatto per il calcolo della funzione o della relativa funzione di errore complementare è $Q$

WJ Cody, approssimazioni razionali di Chebyshev per la funzione di errore , matematica. Comp. , 1969, pagg. 631-637.

Ogni implementazione (che si rispetti) utilizza questo documento. (MATLAB, R, ecc.)

Approssimazioni "semplici"

Abramowitz e Stegun ne hanno uno basato su un'espansione polinomiale di una trasformazione dell'input. Alcune persone lo usano come approssimazione "di alta precisione". Non mi piace per quello scopo poiché si comporta male intorno allo zero. Ad esempio, la loro approssimazione non non cede , che ritengo un grande no-no. A volte cose cattive $\hat{Q}(0) = 1/2$ accadono per questo.

Borjesson e Sundberg forniscono una semplice approssimazione che funziona abbastanza bene per la maggior parte delle applicazioni in cui si richiedono solo poche cifre di precisione. L' errore relativo assoluto non è mai peggiore dell'1%, il che è abbastanza buono considerando la sua semplicità. L'approssimazione base è e le loro scelte preferite delle costanti sonoe. Tale riferimento è

\hat{Q} (x) = \frac{1}{(1 - a) x + a \sqrt{x^{2} + b}} φ (x)

$\hat{Q}(x) = \frac{1}{(1-a) x + a \sqrt{x^2 + b}} \varphi(x)$

a = 0.339

$a = 0.339$

b = 5.51

$b = 5.51$

PO Borjesson e CE Sundberg. Semplici approssimazioni della funzione di errore Q (x) per applicazioni di comunicazione . IEEE Trans. Commun. , COM-27 (3): 639–643, marzo 1979.

Ecco una trama del suo assoluto errore relativo.

inserisci qui la descrizione dell'immagine

La letteratura elettrotecnica è piena di varie approssimazioni del genere e sembrano interessarsi troppo intensamente. Molti di loro sono poveri o si espandono in espressioni molto strane e contorte.

Potresti anche guardare

W. Bryc. Un'approssimazione uniforme al giusto integrale normale . Matematica applicata e calcolo , 127 (2-3): 365–374, aprile 2002.

La frazione continua di Laplace

Laplace ha una bellissima frazione continua che produce limiti superiori e inferiori successivi per ogni valore di . È, in termini di rapporto di Mills, $x > 0$

R (x) = \frac{1}{x +} \frac{1}{x +} \frac{2}{x +} \frac{3}{x +} \dots,

$R(x) = \frac{1}{x+}\frac{1}{x+}\frac{2}{x+}\frac{3}{x+}\cdots ,$

dove la notazione che ho usato è abbastanza standard per una frazione continua , cioè . Questa espressione non converge molto velocemente per la piccola , tuttavia, e diverge in $1/(x+1/(x+2/(x+3/(x+\cdots))))$ $x$ $x = 0$ .

Questa frazione continua produce in realtà molti dei limiti "semplici" su che furono "riscoperti" dalla metà alla fine del 1900. È facile intuire che per una frazione continua in forma "standard" (cioè composta da coefficienti interi positivi), troncare la frazione in termini dispari (pari) dà un limite superiore (inferiore). $Q(x)$

Quindi, Laplace ci dice immediatamente che entrambi i quali sono limiti "riscoperti" a metà del 1900. In termini di funzione , questo equivale a

\frac{x}{x^{2} + 1} < R (x) < \frac{1}{x},

$\frac{x}{x^2 + 1} < R(x) < \frac{1}{x} \>,$

Q

$Q$

Una prova alternativa di ciò utilizzando una semplice integrazione per parti può essere trovata in S. Resnick,

\frac{x}{x^{2} + 1} φ (x) < Q (x) < \frac{1}{x} φ (x) .

$\frac{x}{x^2 + 1} \varphi(x) < Q(x) < \frac{1}{x} \varphi(x) .$ Adventures in Stochastic Processes , Birkhauser, 1992, nel capitolo 6 (Brownian motion). L'errore relativo assoluto di questi limiti non è peggiore di

, come mostrato in questa risposta correlata .

x^{- 2}

$x^{-2}$

Si noti, in particolare, che le disuguaglianze sopra implicano immediatamente che . Questo fatto può essere dimostrato anche usando la regola di L'Hopital. Questo aiuta anche a spiegare la scelta della forma funzionale dell'approssimazione di Borjesson-Sundberg. Qualsiasi scelta di mantiene l'equivalenza asintotica come . Il parametro $Q(x) \sim \varphi(x)/x$ $a \in [0,1]$ $x \to \infty$ $b$ funge da "correzione di continuità" vicino a zero.

Ecco un grafico della funzione e dei due limiti di Laplace. $Q$

Limiti di Laplace per la coda superiore della distribuzione normale

CI. C. Lee ha un documento dei primi anni '90 che fa una "correzione" per piccoli valori di . Vedere $x$

CI. C. Lee. Su Laplace è proseguita la frazione per l'integrale normale . Ann. Inst. Statist. Matematica. , 44 (1): 107–120, marzo 1992.

Probabilità di Durrett : teoria ed esempi fornisce i limiti superiore e inferiore classici su alle pagine 6–7 della 3a edizione. Sono pensati per valori maggiori di (diciamo, ) e sono asintoticamente stretti. $Q(x)$ $x$ $x > 3$

Spero che questo ti inizi. Se hai un interesse più specifico, potrei essere in grado di indicarti da qualche parte.

— cardinale
fonte

Suppongo di essere troppo tardi per l'eroe, ma volevo commentare il post del cardinale e questo commento è diventato troppo grande per la sua scatola prevista.

Per questa risposta, suppongo che ; formule di riflessione appropriate possono essere utilizzate per negativi $x >0$ $x$

$\mathrm{erf}(x)$ $R(x)$ (come definito nella risposta del cardinale).

Esistono in realtà modi alternativi per calcolare la funzione di errore (complementare) oltre all'uso delle approssimazioni di Chebyshev. Poiché l'uso di un'approssimazione di Chebyshev richiede la memorizzazione di non pochi coefficienti, questi metodi potrebbero avere un vantaggio se le strutture degli array sono un po 'costose nel tuo ambiente di elaborazione (potresti incorporare i coefficienti, ma il codice risultante sembrerebbe probabilmente un barocco pasticcio).

$|x|$

R (x) = \sqrt{\frac{π}{2}} \exp (\frac{x^{2}}{2}) - x \sum_{j = 0}^{\infty} \frac{2^{j} j!}{(2 j + 1)!} x^{2 j}

$R(x)=\sqrt{\frac{\pi}{2}}\exp\left(\frac{x^2}{2}\right)-x\sum_{j=0}^\infty\frac{2^j j!}{(2j+1)!}x^{2j}$ (adapted from formula 7.1.6)

Note that the coefficients of $x^{2j}$ in the series $c_j=\frac{2^j j!}{(2j+1)!}$ can be computed by starting with $c_0=1$ and then using the recursion formula $c_{j+1}=\frac{c_j}{2j+3}$ . This is convenient when implementing the series as a summation loop.

cardinal gave the Laplacian continued fraction as a way to bound Mills's ratio for large $|x|$ ; what is not as well-known is that the continued fraction is also useful for numerical evaluation.

Lentz, Thompson and Barnett derived an algorithm for numerically evaluating a continued fraction as an infinite product, which is more efficient than the usual approach of computing a continued fraction "backwards". Instead of displaying the general algorithm, I'll show how it specializes to the computation of Mills's ratio:

$\displaystyle Y_0=x,\,C_0=Y_0,\,D_0=0$
$\text{repeat for }j=1,2,\dots$

D_{j} = \frac{1}{x + j D_{j - 1}}

$D_j=\frac1{x+jD_{j-1}}$

C_{j} = x + \frac{j}{C_{j - 1}}

$C_j=x+\frac{j}{C_{j-1}}$

H_{j} = C_{j} D_{j}

$H_j=C_j D_j$

Y_{j} = H_{j} Y_{j - 1}

$Y_j=H_j Y_{j-1}$

until | H_{j} - 1 | < tol

$\text{until }|H_j-1| < \text{tol}$

R (x) = \frac{1}{Y_{j}}

$\displaystyle R(x)=\frac1{Y_j}$

where $\text{tol}$ determines the accuracy.

The CF is useful where the previously mentioned series starts to converge slowly; you will have to experiment with determining the appropriate "break point" to switch from the series to the CF in your computing environment. There is also the alternative of using an asymptotic series instead of the Laplacian CF, but my experience is that the Laplacian CF is good enough for most applications.

Finally, if you don't need to compute the (complementary) error function very accurately (i.e., to only a few significant digits), there are compact approximations due to Serge Winitzki. Here is one of them:

R (x) \approx \frac{\sqrt{2 π} + x (π - 2)}{2 + x \sqrt{2 π} + x^{2} (π - 2)}

$R(x)\approx \frac{\sqrt{2\pi}+x(\pi-2)}{2+x\sqrt{2\pi}+x^2(\pi-2)}$

This approximation has a maximum relative error of $1.84\times 10^{-2}$ and becomes more accurate as $x$ increases.

— J. M. is not a statistician
fonte

(This reply originally appeared in response to a similar question, subsequently closed as a duplicate. The O.P. only wanted "an" implementation of the Gaussian integral, not necessarily "state of the art." In his comments it became apparent that a relatively simple, short implementation would be preferred.)

As comments point out, you need to integrate the PDF. There are many ways to perform the integral. Long ago, when computations were slow and expensive, David Hill worked out an approximation using simple arithmetic (rational functions and an exponentiation). It has double precision accuracy for typical arguments (between $-8.5$ and $+8.5$ , approximately). In 1973 he published a Fortran version in Applied Statistics called ALNORM.F. Over the years I have ported this to various environments which did not have a Normal (Gaussian) integral or which had suspect ones (such as Excel).

A MatLab version (with appropriate attributions) is available at http://people.sc.fsu.edu/~jburkardt/m_src/asa005/alnorm.m. A completely undocumented version of the original Fortran code appears on a "Koders Code Search" (sic) site.

Many years ago I ported this to AWK. This version may be more congenial for the modern developer to port due to its C-like (rather than Fortran) syntax and some additional comments I inserted when developing and testing it, because I needed to enhance its accuracy. It appears below.

For those without much experience porting scientific/math/stats code, some words of advice: one single typographical mistake can create serious errors that might not be easily detectable. (Trust me on this, I've made lots of them.) Always, always create a careful and exhaustive test. Because the normal integral/Gaussian integral/error function is available in so many tables and so much software, it's simple and fast to tabulate a huge number of values of your ported function and systematically compare (i.e., with the computer, not by eye) the values to correct ones. You can see such a test at the beginning of my code: it produces a table of values in -8.5:8.5 (by 0.1) which can be piped (via STDOUT) to another program for systematic checking.

Another testing approach--for those with enough numerical analysis background to know how to estimate expected errors--would be to numerically differentiate the values and compare them to the PDF (which is readily computed).

By the way: this code is only for the case with a mean of $0$ and unit standard deviation ("sigma"). But that's all one needs: to integrate from $-\infty$ to $x$ when the mean is $\mu$ and the SD is $\sigma$ , just compute $z = (x-\mu)/\sigma$ and apply alnorm to it.

Edit

I tested a port of alnorm to Mathematica, which computes the values to arbitrary precision. To compare the results, here is a plot of the natural log of the ratios of upper tail values $1 - \Phi(z)$ with $z\ge 1$ . (A positive relative error means alnorm is too large.)

Alnorm

The values are always accurate to $4 \times 10^{-11}$ relative to the vanishingly small tail probabilities. You can see where the calculation switches to an asymptotic formula (at $z=16$ ) and it is evident that this formula becomes extremely accurate as $z$ increases. The plot stops at $z=\sqrt(2\times 708) \approx 37.6$ because here is where double-precision exponentiation begins underflowing.

For example, alnorm[-6.0] returns $9.865\ 876\ 450\ 315E-10$ while the true value, equal to $\frac{1}{2}\text{erfc}(3\sqrt{2})$ , is approximately $9.865\ 876\ 450\ 377E-10$ , first differing in the twelfth decimal digit.

NB As part of this edit, I changed UPPER_TAIL_IS_ZERO from 15. to 16. in the code: it makes the result a tiny bit more accurate for $Z$ between $15$ and $16$ . (End of edit.)

#----------------------------------------------------------------------#
#   ALNORM.AWK
#   Compute values of the cumulative normal probability function.
#   From G. Dallal's STAT-SAK (Fortran code).
#   Additional precision using asymptotic expression added 7/8/92.
#----------------------------------------------------------------------#
BEGIN {
    for (i=-85; i<=85; i++) {
        x = i/10
        p = alnorm(x, 0)
        printf("%3.1f %12.10f\n", x, p)
    }
    exit
}
function alnorm(z,up,    y,aln,w) {
#
#    ALGORITHM AS 66 APPL. STATIST. (1973) VOL.22, NO.3:
#    Hill,  I.D.  (1973).  Algorithm AS 66.  The normal  integral.
#                          Appl. Statist.,22,424-427.
#
#    Evaluates the tail area of the standard normal curve from
#    z to infinity if up, or from -infinity to z if not up.
#
#    LOWER_TAIL_IS_ONE, UPPER_TAIL_IS_ZERO, and EXP_MIN_ARG
#    must be set to suit this computer and compiler.

    LOWER_TAIL_IS_ONE = 8.5     # I.e., alnorm(8.5,0) = .999999999999+
    UPPER_TAIL_IS_ZERO = 16.0   # Changes to power series expression
    FORMULA_BREAK = 1.28        # Changes cont. fraction coefficients
    EXP_MIN_ARG = -708          # I.e., exp(-708) is essentially true 0

    if (z < 0.0) {
        up = !up
        z = -z
    }
    if ((z <= LOWER_TAIL_IS_ONE) || (up && z <= UPPER_TAIL_IS_ZERO)) {
        y = 0.5 * z * z
        if (z > FORMULA_BREAK) {
            if (-y > EXP_MIN_ARG) {
                aln = .398942280385 * exp(-y) / \
                  (z - 3.8052E-8 + 1.00000615302 / \
                  (z + 3.98064794E-4 + 1.98615381364 / \
                  (z - 0.151679116635 + 5.29330324926 / \
                  (z + 4.8385912808 - 15.1508972451 / \
                  (z + 0.742380924027 + 30.789933034 / \
                  (z + 3.99019417011))))))
            } else {
                aln = 0.0
            }
        } else {
            aln = 0.5 - z * (0.398942280444 - 0.399903438504 * y / \
              (y + 5.75885480458 - 29.8213557808 / \
              (y + 2.62433121679 + 48.6959930692 / \
              (y + 5.92885724438))))
        }
    } else {
        if (up) {   # 7/8/92
            # Uses asymptotic expansion for exp(-z*z/2)/alnorm(z)
            # Agrees with continued fraction to 11 s.f. when z >= 15
            # and coefficients through 706 are used.
            y = -0.5*z*z
            if (y > EXP_MIN_ARG) {
                w = -0.5/y  # 1/z^2
                aln = 0.3989422804014327*exp(y)/ \
                    (z*(1 + w*(1 + w*(-2 + w*(10 + w*(-74 + w*706))))))
                # Next coefficients would be -8162, 110410
            } else {
                aln = 0.0
            }
        } else {
            aln = 0.0
        }
    }
    return up ? aln : 1.0 - aln
}
### end of file ###

— whuber
fonte

I used boost in C++ to compute the CDF of normal distribution. But sometimes when I compute P( x > mean1+sigma1 ) for the normal(mean1, sigma1), and then recompute the P( x > mean2+sigma2 ) for the for the normal(mean2, sigma2), it always gives the same probability value ! Even if I try with other slightly different values of mean and sigma. Does this have any signification ?

— shn

@user995434 That's a good observation. It is addressed in the last line of my reply: both calculations are equivalent to

Pr (Z > 1)

$\Pr(Z \gt 1)$ where

Z = (X - m e a n_{1}) / σ_{1}

$Z = (X-mean_1)/\sigma_1$ or

Z = (X - m e a n_{2}) / σ_{2}

$Z = (X-mean_2)/\sigma_2$ has a standard normal distribution (of zero mean and unit SD). It's easy to understand as a change of units: it's like counting the number of days when the temperature exceeded 86 degrees (F) and noting that it's exactly the same number of days the temperature exceeded 30 degrees (C).

— whuber

Oh great then, I thought that it was an error in my code.

— shn

And yeah actually it is not the same probability, but very close to each other, like 0.158655273989975 and 0.158655230168700

— shn

@Cardinal: done.

— whuber