Query 1:

select distinct email from mybigtable where account_id=345

takes 0.1 seconds

Query 2:

Select count(*) as total from mybigtable where account_id=123 and email IN (<include all from above result>)

takes 0.2 seconds

Query 3:

Select count(*) as total from mybigtable where account_id=123 and email IN (select distinct email from mybigtable where account_id=345)

takes 22 minutes, and 90% of that time is spent in the "preparing" state. Why does it take so long?

The table is InnoDB with 3.2 million rows, on MySQL 5.0.

— Stewie

In Query 3, you are essentially executing a subquery for every row of mybigtable against itself.

To avoid this, you need to make two major changes:

MAJOR CHANGE #1: Refactor the query

Here is your original query

Select count(*) as total from mybigtable
where account_id=123 and email IN
(select distinct email from mybigtable where account_id=345)

You could try

select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A;

or maybe the count per email

select email,count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A group by email;

MAJOR CHANGE #2: Proper indexing

I think you already have this, since Query 1 and Query 2 run fast. Make sure you have a compound index on (account_id, email). Run SHOW CREATE TABLE mybigtable\G and make sure you have one. If you don't have it, or if you are not sure, create the index anyway:

ALTER TABLE mybigtable ADD INDEX account_id_email_ndx (account_id,email);
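
To check that the index exists and is picked up, something like the following could be run. This is only a sketch: the exact EXPLAIN output depends on the data and the server version.

```sql
-- List the indexes on the table. account_id_email_ndx should appear twice:
-- once for account_id (Seq_in_index = 1) and once for email (Seq_in_index = 2).
SHOW INDEX FROM mybigtable;

-- Check which index Query 1 would use.
EXPLAIN SELECT DISTINCT email FROM mybigtable WHERE account_id = 345;
```

If the `key` column of the EXPLAIN output names account_id_email_ndx and `Extra` mentions "Using index", the query is typically being answered from the index alone.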

UPDATE 2012-03-07 13:26 EST

If you want to do a NOT IN (), change the INNER JOIN to a LEFT JOIN and check that the right side is NULL, like this:

select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    LEFT JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
    WHERE tbl345.email IS NULL
) A;
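
To sanity-check the rewrites before running them on 3.2 million rows, a tiny test table can help. The sketch below is an assumption for illustration only: the table name mybigtable_demo and its rows are hypothetical, and the joins use ON instead of USING, which is equivalent here.

```sql
-- Hypothetical demo table with the two columns the queries rely on.
CREATE TABLE mybigtable_demo (
    account_id INT NOT NULL,
    email VARCHAR(100) NOT NULL,
    INDEX account_id_email_ndx (account_id, email)
) ENGINE=InnoDB;

INSERT INTO mybigtable_demo (account_id, email) VALUES
    (123, 'a@example.com'),
    (123, 'b@example.com'),
    (123, 'c@example.com'),
    (345, 'a@example.com'),
    (345, 'a@example.com'),
    (345, 'b@example.com');

-- IN () equivalent: emails of account 123 that also exist for account 345.
-- Returns 2 (a@ and b@).
SELECT COUNT(*) AS EmailCount
FROM (SELECT email FROM mybigtable_demo WHERE account_id = 123) tbl123
INNER JOIN (SELECT DISTINCT email FROM mybigtable_demo WHERE account_id = 345) tbl345
    ON tbl345.email = tbl123.email;

-- NOT IN () equivalent: emails of account 123 missing from account 345.
-- Returns 1 (c@).
SELECT COUNT(*) AS EmailCount
FROM (SELECT email FROM mybigtable_demo WHERE account_id = 123) tbl123
LEFT JOIN (SELECT DISTINCT email FROM mybigtable_demo WHERE account_id = 345) tbl345
    ON tbl345.email = tbl123.email
WHERE tbl345.email IS NULL;
```

The DISTINCT on the account 345 side matters: without it, the duplicate a@example.com row would be joined twice and inflate the first count.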

UPDATE 2012-03-07 14:13 EST

Please read these two links on doing JOINs

Here is a great YouTube video where I learned to refactor queries, and the book it was based on

— RolandoMySQLDBA

In MySQL, subselects inside an IN clause are re-executed for every row in the outer query, which makes the query O(n^2). The short story is: don't use IN (SELECT).
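
One way to observe this, as a sketch, is to run EXPLAIN on Query 3 from the question. On MySQL 5.0 the inner SELECT is expected to appear with a select_type of DEPENDENT SUBQUERY, meaning it is evaluated again for each outer row. Later MySQL versions may optimize this pattern differently, so the output there can vary.

```sql
-- Show the execution plan for the slow query without running it.
-- Look at the select_type column of the row for the inner SELECT.
EXPLAIN
SELECT COUNT(*) AS total
FROM mybigtable
WHERE account_id = 123
  AND email IN (SELECT DISTINCT email FROM mybigtable WHERE account_id = 345);
```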

— Aaron Brown

Do you have an index on account_id?
The second problem could be the nested subqueries, which perform terribly in 5.0.
GROUP BY with a HAVING clause is faster than DISTINCT.
What are you trying to do that could be done better through joins, aside from item 3?
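
As a sketch of the GROUP BY alternative mentioned above, Query 1 could be written as follows. No HAVING clause is included because nothing needs to be filtered per group here. Whether this actually beats DISTINCT on this table is an assumption to verify by timing both forms.

```sql
-- Same result set as SELECT DISTINCT email ... WHERE account_id = 345,
-- expressed with GROUP BY instead of DISTINCT.
SELECT email
FROM mybigtable
WHERE account_id = 345
GROUP BY email;
```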

— Stephen Senkomago Musoke

Handling an IN () subquery like yours requires a lot of processing. You can read more here.

My first suggestion would be to try rewriting the subquery as a JOIN. Something like this (untested):

SELECT COUNT(*) AS total FROM mybigtable AS t1
 INNER JOIN
   (SELECT DISTINCT email FROM mybigtable WHERE account_id=345) AS t2
   ON t2.email=t1.email
WHERE t1.account_id=123;
— Derek Downey

MySQL subquery slows down drastically, but the queries work fine independently
