Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERROR: no binary output function available for type theta_sketch #7633

Open
biber-baek opened this issue Jun 25, 2024 · 5 comments
Open

ERROR: no binary output function available for type theta_sketch #7633

biber-baek opened this issue Jun 25, 2024 · 5 comments
Assignees

Comments

@biber-baek
Copy link

PostgreSQL 15.7
Citus 12.1.-1
datasketches 1.6.0
select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
explain analyze select  report_date, campaign_id, theta_sketch_get_estimate(theta_sketch_union(reach)) 
 from 
(select report_date,campaign_id, theta_sketch_union(reach_ds) as reach
 from report_day where service_account_id='599267'
 group by report_date, campaign_id,representative_id
 ) a
 group by report_date, campaign_id
order by  campaign_id, report_date
 limit 10;
 
ERROR:  42883: no binary output function available for type theta_sketch
CONTEXT:  while executing command on postgres-citus09:5432
LOCATION:  ReportResultError, remote_commands.c:324
Time: 1248.635 ms (00:01.249)
explain select  report_date, campaign_id, theta_sketch_get_estimate(theta_sketch_union(reach)) 
 from 
(select report_date,campaign_id, theta_sketch_union(reach_ds) as reach
 from report_day where service_account_id='599267'
 group by report_date, campaign_id,representative_id
 ) a
 group by report_date, campaign_id
order by  campaign_id, report_date
 limit 10;

                                                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=757.32..757.35 rows=10 width=20)
   ->  Sort  (cost=757.32..757.82 rows=200 width=20)
         Sort Key: remote_scan.campaign_id, remote_scan.report_date
         ->  HashAggregate  (cost=750.00..753.00 rows=200 width=20)
               Group Key: remote_scan.campaign_id, remote_scan.report_date
               ->  Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=44)
                     Task Count: 128
                     Tasks Shown: One of 128
                     ->  Task
                           Node: host=postgres-citus02 port=5432 dbname=citus_poc
                           ->  Subquery Scan on a  (cost=97482.75..108129.48 rows=31046 width=44)
                                 ->  Finalize GroupAggregate  (cost=97482.75..107819.02 rows=31046 width=52)
                                       Group Key: report_day.report_date, report_day.campaign_id, report_day.representative_id
                                       ->  Gather Merge  (cost=97482.75..106654.80 rows=62092 width=52)
                                             Workers Planned: 2
                                             ->  Partial GroupAggregate  (cost=96482.73..98487.82 rows=31046 width=52)
                                                   Group Key: report_day.report_date, report_day.campaign_id, report_day.representative_id
                                                   ->  Sort  (cost=96482.73..96806.13 rows=129361 width=147)
                                                         Sort Key: report_day.report_date, report_day.campaign_id, report_day.representative_id
                                                         ->  Parallel Append  (cost=309.83..78549.31 rows=129361 width=147)
                                                               ->  Parallel Bitmap Heap Scan on report_day_p20240606_104824 report_day_1  (cost=372.08..5512.13 rows=15365 width=147)
                                                                     Recheck Cond: (service_account_id = '599267'::bigint)
                                                                     ->  Bitmap Index Scan on report_day_p20240606_service_account_id_report_date_idx_104824  (cost=0.00..362.86 rows=36875 width=0)
                                                                           Index Cond: (service_account_id = '599267'::bigint)
                                                               ->  Parallel Bitmap Heap Scan on report_day_p20240608_105080 report_day_3  (cost=309.83..5381.77 rows=12555 width=147)
                                                                     Recheck Cond: (service_account_id = '599267'::bigint)
                                                                     ->  Bitmap Index Scan on report_day_p20240608_service_account_id_report_date_idx_105080  (cost=0.00..302.29 rows=30133 width=0)
                                                                           Index Cond: (service_account_id = '599267'::bigint)
....
 

Hi.

When using the datasketches parallel aggregate function on distributed tables in Citus, an error occurs as seen in 'explain analyze'. However, if I run a query without an analyze or use only 'explain', it works fine.

@onurctirtir
Copy link
Member

Hey @biber-baek,

Could you share the commands you used to create this distributed table so that I can try reproducing this on my end?

Any commands you used to create & distribute the table and to create the underlying column types would help a lot.

Thanks!

@biber-baek
Copy link
Author

@onurctirtir
thanks for your reply.
It is reproduced by the command below.

CREATE TABLE report_day (
    report_date timestamp without time zone NOT NULL,
    campaign_id bigint NOT NULL,
    ad_id bigint NOT NULL,
    account_id bigint NOT NULL,
    sketch_col public.theta_sketch
);

select create_distributed_table('report_day','ad_id');

insert into report_day select now() - (g%10 || 'day')::interval, g, trunc(random() * 100 + 1), g%10, (select  public.theta_sketch_build(1)) from generate_series(1, 100) g;

select report_date, public.theta_sketch_union(sketch_col) as reach
 from report_day where account_id='7'
 group by report_date;

@onurctirtir
Copy link
Member

Could you please also share the definition of theta_sketch type?

@biber-baek
Copy link
Author

I am using the datasketches extension to calculate approximate values in analytical work.
The theta_sketch type is a type provided by datasketches.

@m3hm3t
Copy link
Contributor

m3hm3t commented Aug 23, 2024

[local] citus@citus:9700-18493=# explain analyze select  report_date, campaign_id, theta_sketch_get_estimate(theta_sketch_union(reach)) 
 from 
(select report_date,campaign_id, theta_sketch_union(reach_ds) as reach
 from report_day where service_account_id='599267'
 group by report_date, campaign_id,representative_id
 ) a
 group by report_date, campaign_id
order by  campaign_id, report_date
 limit 10;
ERROR:  column "reach_ds" does not exist
LINE 3: ...elect report_date,campaign_id, theta_sketch_union(reach_ds) ...
                                                             ^
Time: 0.233 ms

[local] citus@citus:9700-18493=# explain select  report_date, campaign_id, theta_sketch_get_estimate(theta_sketch_union(reach)) 
 from 
(select report_date,campaign_id, theta_sketch_union(reach_ds) as reach
 from report_day where service_account_id='599267'
 group by report_date, campaign_id,representative_id
 ) a
 group by report_date, campaign_id
order by  campaign_id, report_date
 limit 10;
ERROR:  column "reach_ds" does not exist
LINE 3: ...elect report_date,campaign_id, theta_sketch_union(reach_ds) ...
                                                             ^
Time: 0.217 ms

@biber-baek When I attempted to reproduce the issue, I encountered the same error mentioned above. Could there be a missing step in the reproduction process?

@m3hm3t m3hm3t self-assigned this Aug 23, 2024
@m3hm3t m3hm3t pinned this issue Aug 26, 2024
@m3hm3t m3hm3t unpinned this issue Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants