
Is LIMIT applied as a filter only after scanning the whole posting list? #86

Open

Description

@digoal
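(The issue does not show the table definition; from the INSERT statement and the query output it presumably looks roughly like the following. This is an assumed sketch, not taken from the original report.)

```sql
-- Assumed layout, inferred from the transcript:
-- id comes from generate_series(1,10000000), arr is a tsvector, ts a timestamp.
create table test_rum_add (
    id  int,
    arr tsvector,
    ts  timestamp
);
```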
postgres=> insert into test_rum_add select generate_series(1,10000000),  tsvector 'a b c', clock_timestamp();
INSERT 0 10000000
postgres=> create index idx_test_rum_add_1 on test_rum_add using rum (arr rum_tsvector_hash_addon_ops, ts) with (attach='ts', to='arr');
CREATE INDEX

postgres=> select * from test_rum_add where arr @@ 'a|b' order by ts <=> '2020-05-23' limit 10;
    id    |     arr     |             ts             
----------+-------------+----------------------------
 10000000 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945628
  9999999 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945628
  9999998 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945627
  9999997 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945627
  9999996 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945626
  9999995 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945625
  9999994 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945624
  9999993 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945624
  9999992 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945623
  9999991 | 'a' 'b' 'c' | 2020-05-22 17:43:01.945623
(10 rows)

postgres=> explain (analyze,verbose,timing,costs,buffers) select * from test_rum_add where arr @@ 'a|b' order by ts <=> '2020-05-23' limit 10;
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.20..13.51 rows=10 width=40) (actual time=6335.531..6335.539 rows=10 loops=1)
   Output: id, arr, ts, ((ts <=> '2020-05-23 00:00:00'::timestamp without time zone))
   Buffers: shared hit=28705, temp read=42536 written=67010
   ->  Index Scan using idx_test_rum_add_1 on public.test_rum_add  (cost=13.20..309926.60 rows=10000000 width=40) (actual time=6335.529..6335.534 rows=10 loops=1)
         Output: id, arr, ts, (ts <=> '2020-05-23 00:00:00'::timestamp without time zone)
         Index Cond: (test_rum_add.arr @@ '''a'' | ''b'''::tsquery)
         Order By: (test_rum_add.ts <=> '2020-05-23 00:00:00'::timestamp without time zone)
         Buffers: shared hit=28705, temp read=42536 written=67010
 Planning Time: 0.050 ms
 Execution Time: 6391.589 ms
(10 rows)


postgres=> explain (analyze,verbose,timing,costs,buffers) select * from test_rum_add where arr @@ 'a|b' limit 10;
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.20..13.46 rows=10 width=32) (actual time=2380.119..2380.126 rows=10 loops=1)
   Output: id, arr, ts
   Buffers: shared hit=28706, temp read=1 written=14678
   ->  Index Scan using idx_test_rum_add_1 on public.test_rum_add  (cost=13.20..259926.60 rows=10000000 width=32) (actual time=2380.117..2380.122 rows=10 loops=1)
         Output: id, arr, ts
         Index Cond: (test_rum_add.arr @@ '''a'' | ''b'''::tsquery)
         Buffers: shared hit=28706, temp read=1 written=14678
 Planning Time: 0.072 ms
 Execution Time: 2414.058 ms
(9 rows)

I think this could be improved by pushing the LIMIT down into the posting-list scan phase, so the scan can stop early instead of computing the distance for every matching row and sorting them all first.

Best regards,
digoal
