NumPy isnan() fails on an array of floats (from a pandas DataFrame apply)

101

I have an array of floats (some normal numbers, some NaNs) that comes out of an apply on a pandas DataFrame.

For some reason, numpy.isnan fails on this array. However, as shown below, every element is a float, numpy.isnan runs correctly on each element, and the variable's type is definitely a numpy array.

What's going on?!

set([type(x) for x in tester])
Out[59]: {float}

tester
Out[60]: 
array([-0.7000000000000001, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan], dtype=object)

set([type(x) for x in tester])
Out[61]: {float}

np.isnan(tester)
Traceback (most recent call last):

File "<ipython-input-62-e3638605b43c>", line 1, in <module>
np.isnan(tester)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

set([np.isnan(x) for x in tester])
Out[65]: {False, True}

type(tester)
Out[66]: numpy.ndarray

— tim654321

163

np.isnan can be applied to NumPy arrays of native dtype (such as np.float64):

In [99]: np.isnan(np.array([np.nan, 0], dtype=np.float64))
Out[99]: array([ True, False], dtype=bool)

but it raises TypeError when applied to object arrays:

In [96]: np.isnan(np.array([np.nan, 0], dtype=object))
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Since you have pandas, you can use pd.isnull instead; it accepts NumPy arrays of object or native dtypes:

In [97]: pd.isnull(np.array([np.nan, 0], dtype=float))
Out[97]: array([ True, False], dtype=bool)

In [98]: pd.isnull(np.array([np.nan, 0], dtype=object))
Out[98]: array([ True, False], dtype=bool)

Note that None is also considered a null value in object arrays.
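As a minimal sketch of the points above, the following reproduces the failure on an object array and shows two ways around it. The arrays `tester` and `with_none` are made-up stand-ins for the asker's data, not the original values.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype array, similar to what DataFrame.apply can return
tester = np.array([-0.7, np.nan, np.nan], dtype=object)

# np.isnan has no implementation for object dtype, so it raises TypeError
try:
    np.isnan(tester)
    isnan_failed = False
except TypeError:
    isnan_failed = True

# Option 1: pd.isnull works directly on the object array
null_mask = pd.isnull(tester)

# Option 2: cast to a native float dtype first, then np.isnan works
nan_mask = np.isnan(tester.astype(float))

# In object arrays, pd.isnull also flags None as null
with_none = np.array([1.0, None, np.nan], dtype=object)
none_mask = pd.isnull(with_none)

print(isnan_failed, null_mask, nan_mask, none_mask)
```

Casting with astype(float) only works when every element can be converted to a float; pd.isnull avoids that requirement.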

— unutbu

3

Thanks, I used pd.isnull(). It doesn't seem to have any performance impact either.

— tim654321

11

A great substitute for np.isnan() and pd.isnull() is

for i in range(a.shape[0]):
    if a[i] != a[i]:
        # a[i] is NaN; handle it here
        pass

since only nan is not equal to itself.
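A short sketch of the same self-inequality idea applied to a one-dimensional object array like the one in the question. The array `tester` here is an invented example, and the list comprehension is just one way to collect the result into a boolean mask.

```python
import numpy as np

# Hypothetical 1-D object array holding Python floats
tester = np.array([-0.7, np.nan, 2.0], dtype=object)

# Compare each element with itself; only NaN is unequal to itself
mask = np.array([x != x for x in tester], dtype=bool)

# On a native float array the same test can be written without a loop
as_float = tester.astype(float)
vectorized_mask = as_float != as_float

print(mask, vectorized_mask)
```

Unlike pd.isnull, this test does not flag None, because None is equal to itself.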

— Statham

That might not work for arrays because it raises the well-known "ValueError: The truth value of an xxx is ambiguous".

— MSeifert

@MSeifert Are you talking about Python? I use this method for some machine learning work, so why haven't I run into that well-known error?

— Statham

Yes, it looks like you have never used numpy or pandas before. Use import numpy as np; a = np.array([1,2,3, np.nan]) and run your code.

— MSeifert

@MSeifert Er, I'm new to numpy, but the code ran fine and no error occurred.

— Statham

In [1]: import numpy as np
In [2]: a = np.array([1,2,3, np.nan])
In [3]: print a
[ 1.  2.  3. nan]
In [4]: print a[3] == a[3]
False

— Statham
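Both sides of the exchange above can be reproduced. The sketch below, using invented arrays `a1` and `a2`, suggests the loop is fine when each `a[i]` is a scalar (the one-dimensional case) and raises ValueError when `a[i]` is itself an array (for example, a row of a two-dimensional array).

```python
import numpy as np

# 1-D case: a1[i] is a scalar, so the comparison yields a single boolean
a1 = np.array([1.0, 2.0, 3.0, np.nan])
scalar_check = bool(a1[3] != a1[3])

# 2-D case: a2[0] is a whole row, so the comparison yields an array,
# and using that array in an `if` statement raises ValueError
a2 = np.array([[1.0, np.nan], [2.0, 3.0]])
try:
    if a2[0] != a2[0]:
        pass
    raised = False
except ValueError:
    raised = True

print(scalar_check, raised)
```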

10

In addition to @unutbu's answer, you could coerce the pandas object array to a native (float64) type, something along the lines of

import pandas as pd
pd.to_numeric(df['tester'], errors='coerce')

Specify errors='coerce' to force strings that cannot be parsed as a numeric value to become NaN. The column would then have dtype: float64, so the isnan check should work.
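A small end-to-end sketch of this approach, assuming a made-up DataFrame whose 'tester' column has object dtype and contains one unparseable string:

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype column mixing floats and an unparseable string
df = pd.DataFrame({"tester": pd.Series([-0.7, np.nan, "n/a"], dtype=object)})

# Coerce to a native float64 column; unparseable values become NaN
converted = pd.to_numeric(df["tester"], errors="coerce")

# np.isnan now works because the underlying array is float64
mask = np.isnan(converted.to_numpy())

print(converted.dtype, mask)
```

Note that errors='coerce' silently turns bad values into NaN, so it may hide data problems that would otherwise surface as an error.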

— Severin Pappadeux

His name appears to be unutbu ;)

— Dr_Zaszuś

@Dr_Zaszuś Thanks, fixed

— Severin Pappadeux

0

Make sure you import the CSV file using pandas

import pandas as pd

condition = pd.isnull(data[i][j])

— Dariswan Janweri P.