
Is LIMIT applied as a filter only after scanning the whole posting list? #86

Open

Description

@digoal
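(The issue does not show the table definition; from the INSERT statement and the query output it presumably looks roughly like the following. This is an assumed sketch, not taken from the original report.)

```sql
-- Assumed layout, inferred from the transcript:
-- id comes from generate_series(1,10000000), arr is a tsvector, ts a timestamp.
create table test_rum_add (
    id  int,
    arr tsvector,
    ts  timestamp
);
```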
postgres=> insert into test_rum_add select generate_series(1,10000000),  tsvector 'a b c', clock_timestamp();
INSERT 0 10000000
postgres=> create index idx_test_rum_add_1 on test_rum_add using rum (arr rum_tsvector_hash_addon_ops, ts) with (attach='ts', to='arr');
CREATE INDEX

postgres=> select * from test_rum_add where arr @@ 'a|b' order by ts <=> '2020-05-23' limit 10;
    id    |     arr     |             ts             
----------+-------------+----------------------------
 10000000 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945628
  9999999 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945628
  9999998 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945627
  9999997 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945627
  9999996 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945626
  9999995 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945625
  9999994 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945624
  9999993 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945624
  9999992 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945623
  9999991 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945623
(10 rows)

postgres=> explain (analyze,verbose,timing,costs,buffers) select * from test_rum_add where arr @@ 'a|b' order by ts <=> '2020-05-23' limit 10;
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.20..13.51 rows=10 width=40) (actual time=6335.531..6335.539 rows=10 loops=1)
   Output: id, arr, ts, ((ts <=> '2020-05-23 00:00:00'::timestamp without time zone))
   Buffers: shared hit=28705, temp read=42536 written=67010
   ->  Index Scan using idx_test_rum_add_1 on public.test_rum_add  (cost=13.20..309926.60 rows=10000000 width=40) (actual time=6335.529..6335.534 rows=10 loops=1)
         Output: id, arr, ts, (ts <=> '2020-05-23 00:00:00'::timestamp without time zone)
         Index Cond: (test_rum_add.arr @@ '''a'' | ''b'''::tsquery)
         Order By: (test_rum_add.ts <=> '2020-05-23 00:00:00'::timestamp without time zone)
         Buffers: shared hit=28705, temp read=42536 written=67010
 Planning Time: 0.050 ms
 Execution Time: 6391.589 ms
(10 rows)


postgres=> explain (analyze,verbose,timing,costs,buffers) select * from test_rum_add where arr @@ 'a|b' limit 10;
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.20..13.46 rows=10 width=32) (actual time=2380.119..2380.126 rows=10 loops=1)
   Output: id, arr, ts
   Buffers: shared hit=28706, temp read=1 written=14678
   ->  Index Scan using idx_test_rum_add_1 on public.test_rum_add  (cost=13.20..259926.60 rows=10000000 width=32) (actual time=2380.117..2380.122 rows=10 loops=1)
         Output: id, arr, ts
         Index Cond: (test_rum_add.arr @@ '''a'' | ''b'''::tsquery)
         Buffers: shared hit=28706, temp read=1 written=14678
 Planning Time: 0.072 ms
 Execution Time: 2414.058 ms
(9 rows)

I think this could be improved by pushing the LIMIT down into the posting-list scan phase, so the scan can stop early instead of computing the distance for every matching row and sorting them all first.

Best regards,
digoal
