OceanBase 基于代價的查詢改寫

2021-06-30 10:50 更新

OceanBase 數據庫目前只支持一種基于代價的查詢改寫——或展開(OR-EXPANSION)。

數據庫中很多高級的改寫規(guī)則(例如 complex view merge 和窗口函數改寫)都需要基于代價進行改寫,OceanBase 數據庫后續(xù)版本會支持這些復雜的改寫規(guī)則。

或展開(OR-EXPANSION)

OR-EXPANSION 是將一個查詢改寫成若干個用 UNION 組成的子查詢,可以為每個子查詢提供更優(yōu)的優(yōu)化空間,但是也會導致多個子查詢的執(zhí)行,所以這個改寫需要基于代價去判斷。

OR-EXPANSION 的改寫主要有如下三個作用:

  • 允許每個分支使用不同的索引來加速查詢。

    如下例所示,Q1 會被改寫成 Q2 的形式,其中 Q2 中的謂詞 LNNVL(t1.a = 1) 保證了這兩個子查詢不會生成重復的結果。如果不進行改寫,Q1 一般來說會選擇主表作為訪問路徑,對于 Q2 來說,如果 t1 上存在索引(a)和索引(b),那么該改寫可能會讓 Q2 中的每一個子查詢選擇索引作為訪問路徑。

    Q1: 
    obclient>SELECT * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
    Q2: 
    obclient>SELECT * FROM t1 WHERE t1.a = 1 UNION ALL SELECT * FROM t1.b = 1 
           AND LNNVL(t1.a = 1);

    完整示例如下:

    obclient>CREATE TABLE t1(a INT, b INT, c INT, d INT, e INT, INDEX IDX_a(a), 
             INDEX IDX_b(b));
    Query OK, 0 rows affected (0.17 sec)
    
    /*如果不進行 OR-EXPANSION 的改寫,該查詢只能使用主表訪問路徑*/
    obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
    +--------------------------------------------------------------+
    | Query Plan                                                                         |
    +--------------------------------------------------------------+
    | ===================================
    |ID|OPERATOR  |NAME|EST. ROWS|COST|
    -----------------------------------
    |0 |TABLE SCAN|t1  |4        |649 |
    ===================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter([t1.a = 1 OR t1.b = 1]),
          access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p0)
    
    /*改寫之后,每個子查詢能使用不同的索引訪問路徑*/
    obclient>EXPLAIN SELECT * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
    +------------------------------------------------------------------------+
    | Query Plan                                                                                         |
    +------------------------------------------------------------------------+
    | =========================================
    |ID|OPERATOR   |NAME     |EST. ROWS|COST|
    -----------------------------------------
    |0 |UNION ALL  |         |3        |190 |
    |1 | TABLE SCAN|t1(idx_a)|2        |94  |
    |2 | TABLE SCAN|t1(idx_b)|1        |95  |
    =========================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)], [UNION(t1.c, t1.c)], [UNION(t1.d, t1.d)], [UNION(t1.e, t1.e)]), filter(nil)
      1 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter(nil),
          access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p0)
      2 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter([lnnvl(t1.a = 1)]),
          access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p02
  • 允許每個分支使用不同的連接算法來加速查詢,避免使用笛卡爾聯接。

    如下例所示,Q1 會被改寫成 Q2 的形式。對于 Q1 來說,它的聯接方式只能是 NESTED LOOP JOIN (笛卡爾乘積), 但是被改寫之后,每個子查詢都可以選擇 NESTED LOOP JOIN、HASH JOIN 或者 MERGE JOIN,這樣會有更多的優(yōu)化空間。

    Q1: 
    obclient>SELECT * FROM t1, t2 WHERE t1.a = t2.a OR t1.b = t2.b;
    
    Q2: 
    obclient>SELECT * FROM t1, t2 WHERE t1.a = t2.a UNION ALL
         SELECT * FROM t1, t2 WHERE t1.b = t2.b AND LNNVL(t1.a = t2.a);

    完整示例如下:

    obclient> CREATE TABLE t1(a INT, b INT);
    Query OK, 0 rows affected (0.17 sec)
    
    obclient> CREATE TABLE t2(a INT, b INT);
    Query OK, 0 rows affected (0.13 sec)
    
    /*如果不進行改寫,只能使用 NESTED LOOP JOIN*/
    obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1, t2 
           WHERE t1.a = t2.a OR t1.b = t2.b;
    +--------------------------------------------------------------------------+
    | Query Plan                                                                                          |
    +--------------------------------------------------------------------------+
    | ===========================================
    |ID|OPERATOR        |NAME|EST. ROWS|COST  |
    -------------------------------------------
    |0 |NESTED-LOOP JOIN|    |3957     |585457|
    |1 | TABLE SCAN     |t1  |1000     |499   |
    |2 | TABLE SCAN     |t2  |4        |583   |
    ===========================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
          conds(nil), nl_params_([t1.a], [t1.b])
      1 - output([t1.a], [t1.b]), filter(nil),
          access([t1.a], [t1.b]), partitions(p0)
      2 - output([t2.a], [t2.b]), filter([? = t2.a OR ? = t2.b]),
          access([t2.a], [t2.b]), partitions(p0)
    
    /*被改寫之后,每個子查詢都使用了 HASH JOIN*/
    obclient>  EXPLAIN SELECT * FROM t1, t2 WHERE t1.a = t2.a OR t1.b = t2.b;
    +--------------------------------------------------------------------------+
    | Query Plan                                                                                         |
    +--------------------------------------------------------------------------+
    |ID|OPERATOR    |NAME|EST. ROWS|COST|
    -------------------------------------
    |0 |UNION ALL   |    |2970     |9105|
    |1 | HASH JOIN  |    |1980     |3997|
    |2 |  TABLE SCAN|t1  |1000     |499 |
    |3 |  TABLE SCAN|t2  |1000     |499 |
    |4 | HASH JOIN  |    |990      |3659|
    |5 |  TABLE SCAN|t1  |1000     |499 |
    |6 |  TABLE SCAN|t2  |1000     |499 |
    =====================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)], [UNION(t2.a, t2.a)], [UNION(t2.b, t2.b)]), filter(nil)
      1 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
          equal_conds([t1.a = t2.a]), other_conds(nil)
      2 - output([t1.a], [t1.b]), filter(nil),
          access([t1.a], [t1.b]), partitions(p0)
      3 - output([t2.a], [t2.b]), filter(nil),
          access([t2.a], [t2.b]), partitions(p0)
      4 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
          equal_conds([t1.b = t2.b]), other_conds([lnnvl(t1.a = t2.a)])
      5 - output([t1.a], [t1.b]), filter(nil),
          access([t1.a], [t1.b]), partitions(p0)
      6 - output([t2.a], [t2.b]), filter(nil),
          access([t2.a], [t2.b]), partitions(p0)
  • 允許每個分支分別消除排序,更加快速的獲取 TOP-K 結果。

    如下例所示,Q1 會被改寫成 Q2。對于 Q1 來說,執(zhí)行方式是只能把滿足條件的行數找出來,然后進行排序,最終取 TOP-10 結果。對于 Q2 來說,如果存在索引(a,b), 那么 Q2 中的兩個子查詢都可以使用索引把排序消除,每個子查詢取 TOP-10 結果,然后最終對這 20 行數據排序一下取出最終的 TOP-10 行。

    Q1: 
    obclient>SELECT * FROM t1 WHERE t1.a = 1 OR t1.a = 2 ORDER BY b LIMIT 10;
    
    Q2: 
    obclient>SELECT * FROM  
        (SELECT * FROM  t1 WHERE t1.a = 1 ORDER BY b LIMIT 10 UNION ALL
         SELECT * FROM  t1 WHERE t1.a = 2 ORDER BY b LIMIT 10) AS TEMP
        ORDER BY temp.b LIMIT 10;

    完整示例如下:

    obclient> CREATE TABLE t1(a INT, b INT, INDEX IDX_a(a, b));
    Query OK, 0 rows affected (0.20 sec)
    
    /*不改寫的話,需要排序最終獲取 TOP-K 結果*/
    obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1 WHERE t1.a = 1 OR t1.a = 2 
            ORDER BY b LIMIT 10;
    +-------------------------------------------------------------------------+
    | Query Plan                                                                                         |
    +-------------------------------------------------------------------------+
    | ==========================================
    |ID|OPERATOR    |NAME     |EST. ROWS|COST|
    ------------------------------------------
    |0 |LIMIT       |         |4        |77  |
    |1 | TOP-N SORT |         |4        |76  |
    |2 |  TABLE SCAN|t1(idx_a)|4        |73  |
    ==========================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([t1.a], [t1.b]), filter(nil), limit(10), offset(nil)
      1 - output([t1.a], [t1.b]), filter(nil), sort_keys([t1.b, ASC]), topn(10)
      2 - output([t1.a], [t1.b]), filter(nil),
          access([t1.a], [t1.b]), partitions(p0)
    
    /* 進行改寫的話,排序算子可以被消除,最終獲取 TOP-K 結果*/
    obclient>EXPLAIN SELECT * FROM t1 WHERE t1.a = 1 OR t1.a = 2 
            ORDER BY b LIMIT 10;
    +-------------------------------------------------------------------------+
    | Query Plan                                                                                          |
    +-------------------------------------------------------------------------+
    | ===========================================
    |ID|OPERATOR     |NAME     |EST. ROWS|COST|
    -------------------------------------------
    |0 |LIMIT        |         |3        |76  |
    |1 | TOP-N SORT  |         |3        |76  |
    |2 |  UNION ALL  |         |3        |74  |
    |3 |   TABLE SCAN|t1(idx_a)|2        |37  |
    |4 |   TABLE SCAN|t1(idx_a)|1        |37  |
    ===========================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil), limit(10), offset(nil)
      1 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil), sort_keys([UNION(t1.b, t1.b), ASC]), topn(10)
      2 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil)
      3 - output([t1.a], [t1.b]), filter(nil),
          access([t1.a], [t1.b]), partitions(p0),
          limit(10), offset(nil)
      4 - output([t1.a], [t1.b]), filter([lnnvl(t1.a = 1)]),
          access([t1.a], [t1.b]), partitions(p0),
          limit(10), offset(nil)
以上內容是否對您有幫助:
在線筆記
App下載
App下載

掃描二維碼

下載編程獅App

公眾號
微信公眾號

編程獅公眾號