Full Table Scan vs. BITMAP Index vs. VIRTUAL Column

Friday, 02.11.2012 – Dejan

Ovaj put donosim još jedan primjer iz prakse, na osnovu kojeg ću pokazati interesantne metode optimizacije problematičnog SQL upita.

U stvarnoj bazi postoji jedna tabela, u koju se podaci unose i obrađuju po slijedećem principu:
– podaci se unose u nejednakim razmacima, pri čemu se jedna status kolona označi sa ‘N’, što znači da podaci nisu obrađeni; ukoliko dođe do neke greške prilikom unosa ili su podaci nepotpuni, onda status kolona dobija vrijednost NULL
– jedan SQL upit se izvršava svakih 5 sekundi i provjerava, da li postoje neobrađeni ili nepotpuni podaci
– ukoliko postoje neobrađeni podaci, onda bivaju obrađeni, nakon čega se vrijednost u status koloni mijenja iz ‘N’ u ‘Y’

E sad, ostavimo na stranu što je dizajn ove tabele, odnosno njena fizička struktura, u praksi veoma loša…

Zadatak je da optimiram onaj SQL, koji svakih 5 sekundi provjerava, da li postoje neobrađeni podaci. Možeš se žaliti da je struktura tabele loša i da ju treba izmijeniti – odgovor je uvijek isti:”Nemamo sad vremena za komplikovane izmjene, moramo do kraja mjeseca završiti druge bitnije stvari, a to ćemo naknadno…bla bla“…

Dakle, na posao… Trebalo mi je skoro 2 sata vremena, ali rezultat je bio itekako dojmljiv… Zanima vas kako?

Kako?

Za ovaj primjer sam kreirao skoro identičnu tabelu sa istim kolonama, samo pod drugim nazivom, kako bih zaštitio podatke iz stvarne prakse:

create table t(
    dummy_string  VARCHAR2(50 BYTE) NOT NULL,
    dummy_number      NUMBER(9)     NOT NULL,
    dummy_number2   NUMBER(14),
    dummy_check    CHAR(1 BYTE),
    dummy_datum    DATE DEFAULT sysdate NOT NULL
   );

Table created.

Aktualizirajmo statistike, kako bi optimizer kreirao efikasan execution plan i usput pogledajmo, koliko tabela trenutno ima redova:

SQL> exec dbms_stats.gather_table_stats(user, 'T');

PL/SQL procedure successfully completed.

SQL> select count(*) from t;

  COUNT(*)
----------
   8000000

Aktivirajmo autotrace sa explain planom i statistikama, te uključimo timing:

SQL> set autotrace traceonly explain statistics
SQL> set timing on lines 180 trimspool on

Ovako izgleda originalni SQL:

SELECT dummy_string
    FROM t
   WHERE dummy_check is null or dummy_check <> 'Y';

FTS (Full Table Scan)

Da, dobro ste vidjeli – tabela nema nijednog indeksa. Kreirana je davno i tada je FTS (Full Table Scan) bio brži u odnosu na index, ali developeri su previdjeli činjenicu da je količina podataka svakim danom sve veća…

Izvršimo originalni SQL upit, koji koristi FTS, te analizirajmo rezultat:

SQL> SELECT dummy_string
  2    FROM t
  3   WHERE dummy_check is null or dummy_check <> 'Y';

no rows selected

Elapsed: 00:00:02.41

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   713 | 37789 | 22522   (2)| 00:04:31 |
|*  1 |  TABLE ACCESS FULL| T    |   713 | 37789 | 22522   (2)| 00:04:31 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DUMMY_CHECK"<>'Y' OR "DUMMY_CHECK" IS NULL)


Statistics
----------------------------------------------------------
         22  recursive calls
          0  db block gets
      81803  consistent gets
      81785  physical reads
          0  redo size
        340  bytes sent via SQL*Net to client
        509  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Dakle, za FTS potrebno je 00:00:02.41 sekundi, te se učita preko 80 000 blokova.

Nije strašno, ali bi vremenom moglo postati kritično, pošto se taj SQL izvršava svakih 5 sekundi, te se odmah potom podaci obrađuju i aktualiziraju, tako da može doći do konflikta iliti blokade u resursima. Onda bi se interval izvršavanja morao povećati sa 5 na npr. 10 sekundi.

Bitmap Index

Iz iskustva znam da se u određenim slučajevima može kreirati Bitmap index kada u SQL upitu postoji OR uslov. Kreirajmo Bitmap index i pogledajmo rezultat:

SQL> create bitmap index ix_bmp_t_dummy_check on t(dummy_check);

Index created.

SQL> SELECT /*+ index(t ix_bmp_t_dummy_check) */ dummy_string
  2    FROM t
  3   WHERE dummy_check is null or dummy_check <> 'Y';

no rows selected

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 1287377879

-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |   713 | 37789 |130K  (1)| 00:26:07 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T                    |   713 | 37789 |130K  (1)| 00:26:07 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                      |       |       |         |          |
|*  3 |    BITMAP INDEX FULL SCAN    | IX_BMP_T_DUMMY_CHECK |       |       |         |          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("DUMMY_CHECK"<>'Y' OR "DUMMY_CHECK" IS NULL)


Statistics
----------------------------------------------------------
         22  recursive calls
          0  db block gets
        194  consistent gets
          0  physical reads
          0  redo size
        340  bytes sent via SQL*Net to client
        509  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Opaaa! Koja razlika! Mnogo bolje! Samo 0.3 sekunde, te samo 194 blokova (consistent gets). E, sad, budimo picajzle i pomislimo odmah na blokadu resursa prilikom DML upita (uglavnom prilikom UPDATE i DELETE), kada se Bitmap Index mora aktualizirati, pri čemu blokira ostale redove. Znači, nije idealno rješenje…

VIRTUAL Column

Od nedavno sam počeo aktivnije testirati i koristiti novu opciju u 11g verziji, pod nazivom “VIRTUAL column“, pa sam odlučio i u ovom slučaju pokušati primijeniti rješenje zasnovano na toj opciji:

SQL> alter table t
  2    add (
  3    virtual_check_number NUMBER
  4    GENERATED ALWAYS AS (case when dummy_check is null or dummy_check <> 'Y' then 1 else 0 end )
  5    VIRTUAL);

Table altered.

SQL> create index ix_t_virtual_check on t(virtual_check_number);

Index created.

SQL> exec dbms_stats.gather_table_stats(user, 'T');

PL/SQL procedure successfully completed.


SQL> SELECT dummy_string
  2    FROM t
  3   WHERE virtual_check_number = 1;

no rows selected

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2480226401

--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |    55 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T                  |     1 |    55 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IX_T_VIRTUAL_CHECK |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VIRTUAL_CHECK_NUMBER"=1)


Statistics
----------------------------------------------------------
         22  recursive calls
          0  db block gets
          9  consistent gets
          2  physical reads
          0  redo size
        340  bytes sent via SQL*Net to client
        509  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Ohohoho! Vjerujte, da sam se i ja ugodno iznenadio kada sam vidio rezultat: rezultat je došao promptno i učitano je samo 9 blokova! Fascinantno!

A nakon unosa novih podataka?

Ništa. VIRTUAL column opet rastura FTS i Bitmap Index.

SQL> insert into t(dummy_string, dummy_number, dummy_number2, dummy_check, dummy_datum)
  2  select DBMS_RANDOM.STRING('A', 50) as dummy_string,
  3         round(dbms_random.value(1, 100)) as dummy_number,
  4         round(dbms_random.value(1, 100000)) as dummy_number2,
  5         case when mod(rownum,150) = 0 then NULL
  6              else 'N'
  7         end as dummy_check,
  8         to_date('02.11.2012', 'DD.MM.YYYY') as dummy_datum
  9   from dual
 10   connect by level <= 1000;

1000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user, 'T');

PL/SQL procedure successfully completed.

SQL> SELECT dummy_string
  2    FROM t
  3   WHERE dummy_check is null or dummy_check <> 'Y';

1000 rows selected.

Elapsed: 00:00:02.43

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  2221 |   114K| 22522   (2)| 00:04:31 |
|*  1 |  TABLE ACCESS FULL| T    |  2221 |   114K| 22522   (2)| 00:04:31 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DUMMY_CHECK"<>'Y' OR "DUMMY_CHECK" IS NULL)


Statistics
----------------------------------------------------------
         22  recursive calls
          0  db block gets
      81869  consistent gets
      81785  physical reads
          0  redo size
      64908  bytes sent via SQL*Net to client
       1246  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1000  rows processed


SQL> SELECT /*+ index(t ix_bmp_t_dummy_check) */ dummy_string
  2    FROM t
  3   WHERE dummy_check is null or dummy_check <> 'Y';

1000 rows selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 1287377879

-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |  2221 |   114K|130K  (1)| 00:26:07 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T                    |  2221 |   114K|130K  (1)| 00:26:07 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                      |       |       |         |          |
|*  3 |    BITMAP INDEX FULL SCAN    | IX_BMP_T_DUMMY_CHECK |       |       |         |          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("DUMMY_CHECK"<>'Y' OR "DUMMY_CHECK" IS NULL)

Statistics
----------------------------------------------------------
         22  recursive calls
          0  db block gets
        279  consistent gets
          9  physical reads
          0  redo size
      64908  bytes sent via SQL*Net to client
       1246  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1000  rows processed

SQL> SELECT dummy_string
  2    FROM t
  3   WHERE virtual_check_number = 1;

1000 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 2480226401

--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |  1477 | 82712 |    23   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T                  |  1477 | 82712 |    23   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IX_T_VIRTUAL_CHECK |  1510 |       |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VIRTUAL_CHECK_NUMBER"=1)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        150  consistent gets
          0  physical reads
        116  redo size
      64908  bytes sent via SQL*Net to client
       1246  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1000  rows processed



SQL> update t
  2  set dummy_check = 'Y'
  3  where virtual_check_number = 1;

1000 rows updated.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user, 'T');

PL/SQL procedure successfully completed.


SQL> SELECT dummy_string
  2    FROM t
  3   WHERE dummy_check is null or dummy_check <> 'Y';

no rows selected

Elapsed: 00:00:02.41

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   724 | 38372 | 22522   (2)| 00:04:31 |
|*  1 |  TABLE ACCESS FULL| T    |   724 | 38372 | 22522   (2)| 00:04:31 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DUMMY_CHECK"<>'Y' OR "DUMMY_CHECK" IS NULL)


Statistics
----------------------------------------------------------
         22  recursive calls
          0  db block gets
      81803  consistent gets
      81785  physical reads
          0  redo size
        340  bytes sent via SQL*Net to client
        509  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL> SELECT /*+ index(t ix_bmp_t_dummy_check) */ dummy_string
  2    FROM t
  3   WHERE dummy_check is null or dummy_check <> 'Y';

no rows selected

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 1287377879

-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |   724 | 38372 |130K  (1)| 00:26:07 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T                    |   724 | 38372 |130K  (1)| 00:26:07 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                      |       |       |         |          |
|*  3 |    BITMAP INDEX FULL SCAN    | IX_BMP_T_DUMMY_CHECK |       |       |         |          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("DUMMY_CHECK"<>'Y' OR "DUMMY_CHECK" IS NULL)


Statistics
----------------------------------------------------------
         22  recursive calls
          0  db block gets
        194  consistent gets
          0  physical reads
          0  redo size
        340  bytes sent via SQL*Net to client
        509  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed


SQL> SELECT dummy_string
  2    FROM t
  3   WHERE virtual_check_number = 1;

no rows selected

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 2480226401

--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |    55 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T                  |     1 |    55 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IX_T_VIRTUAL_CHECK |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VIRTUAL_CHECK_NUMBER"=1)


Statistics
----------------------------------------------------------
         22  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        340  bytes sent via SQL*Net to client
        509  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Znači, sitna DDL izmjena na tabeli, a tako moćna! Kao šlag na torti – moguće je virtualnu kolonu particionirati. Testiranje i rezultate sa tom opcijom prepuštam vama. 🙂

  1. 3 Responses to “Full Table Scan vs. BITMAP Index vs. VIRTUAL Column”

  2. Pozdrav Dejane,

    zanimljiv primjer kako sa virtualnom kolonom riješiti problem sa performansama.
    Morati ću se malo pozabaviti virtualnim kolonama – moram priznati da su malo zapostavljene kod mene.

    Kod nas često puta bilo kakav DDL na produkcijskim tablicama je vrlo osjetljiva stvar, pa stoga pokušavam i zaobići tu metodu ako je moguće. Ovaj bi problem riješio upotrebom funkcijskog indeksa koji daje vrlo slične rezultate kao i virtualna kolona.
    Baci pogled ispod 😉

    SQL> select count(*),status
    2 from btab
    3 group by status;

    COUNT(*) STATUS
    ———- ——-
    251
    349 NO
    499400 YES

    3 rows selected.

    SQL> alter table btab
    2 add (
    3 virtual_check_number NUMBER
    4 GENERATED ALWAYS AS (case when status is null or status ‘YES’ then 1 else 0 end )
    5 VIRTUAL);

    Table altered.

    SQL> create index ix_t_virtual_check on btab(virtual_check_number);

    Index created.

    SQL> select status
    2 from btab
    3 where virtual_check_number=1;

    600 rows selected.

    Elapsed: 00:00:00.03

    Execution Plan
    ———————————————————-
    Plan hash value: 1817721707

    ————————————————————————————————–
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ————————————————————————————————–
    | 0 | SELECT STATEMENT | | 5000 | 75000 | 489 (0)| 00:00:06 |
    | 1 | TABLE ACCESS BY INDEX ROWID| BTAB | 5000 | 75000 | 489 (0)| 00:00:06 |
    |* 2 | INDEX RANGE SCAN | IX_T_VIRTUAL_CHECK | 2000 | | 455 (0)| 00:00:06 |
    ————————————————————————————————–

    Predicate Information (identified by operation id):
    —————————————————

    2 – access(“VIRTUAL_CHECK_NUMBER”=1)

    Statistics
    ———————————————————-
    29 recursive calls
    0 db block gets
    637 consistent gets
    588 physical reads
    0 redo size
    9020 bytes sent via SQL*Net to client
    849 bytes received via SQL*Net from client
    41 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    600 rows processed

    SQL> drop index ix_t_virtual_check;

    SQL> alter table btab drop column virtual_check_number;

    admin@AOCDB> create index check_idx on btab
    2 (case when status is null or status ‘YES’ then 1 else 0 end);

    Index created.

    SQL> select status
    2 from btab
    3 where (case when status is null or status ‘YES’ then 1 else 0 end) = 1;

    600 rows selected.

    Elapsed: 00:00:00.03

    Execution Plan
    ———————————————————-
    Plan hash value: 935407903

    —————————————————————————————–
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    —————————————————————————————–
    | 0 | SELECT STATEMENT | | 5000 | 25000 | 489 (0)| 00:00:06 |
    | 1 | TABLE ACCESS BY INDEX ROWID| BTAB | 5000 | 25000 | 489 (0)| 00:00:06 |
    |* 2 | INDEX RANGE SCAN | CHECK_IDX | 2000 | | 455 (0)| 00:00:06 |
    —————————————————————————————–

    Predicate Information (identified by operation id):
    —————————————————

    2 – access(CASE WHEN (“STATUS” IS NULL OR “STATUS”‘YES’) THEN 1 ELSE 0 END
    =1)

    Statistics
    ———————————————————-
    15 recursive calls
    0 db block gets
    633 consistent gets
    589 physical reads
    0 redo size
    9020 bytes sent via SQL*Net to client
    849 bytes received via SQL*Net from client
    41 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    600 rows processed

    By Marko Sutic on Nov 3, 2012

  3. Nije dobro da koristis bitmap indexe kada koristis <> umesto = jer onda ce uvek da radi BITMAP INDEX FULL SCAN. Pokusaj da promenis WHERE dummy_check is null or dummy_check <> ‘Y’; u WHERE dummy_check is null or dummy_check = ‘N’;

    By Srdjan Mitrovic on Mar 19, 2013

  4. Ponovo naleteh na ovaj tekst pa se pitam da li je Dejan probao da izmeni query koji koristi bitmap index. Problem kod queryja je sto za dummy_check ‘Y’ on mora da prodje kroz ceo index jer Oracle ne zna da postoji jos samo jedna vrednost ‘N’ (ok, ima i null) i zato je preporucljivo da se za bitmap indekse uvek koristi equi-join. Filter WHERE dummy_check is null or dummy_check = ‘N’ bi trebalo da radi mnoooogo brze jer bitmap indeksi indeksiraju i null vrednosti za razliku od obicnih indeksa.

    Druga primedba je to sto si za virtuelnu kolonu koristio WHERE virtual_check_number = 1 umesto WHERE virtual_check_number 0 pa nije fer poredjenje. Da si koristio ovo drugo onda bi i za to morao da imas full index scan a ne range scan jer oracle onda mora da prodje kroz sve vrednosti da vidi da li su razlicite od 0. Koristio si drugaciji query za virtualnu kolonu i za bitmap indekse.

    By Srdjan Mitrovic on Sep 6, 2016

Post a Comment