layout	title	nav_order
page	Supported Operators and Functions	5

The Operators and Functions Support Progress

Gluten is still in active development. Here is a list of supported operators and functions.

Velox backend

Since the same function may have different semantics between Presto and Spark, Velox implement the functions in Presto category, if we note a different sematics from Spark, then the function is implemented in Spark category. So Gluten firstly will use Velox's spark category, if a function isn't implemented there then refer to Presto category.

The total supported functions' number for Spark3.3 is 387 and for Velox is 204. Gluten supported frequently used 94, in which offloaded 62 is implemented in velox/spark and 32 in velox/presto, shown as below picture.

Value	Description
S	(Supported) Gluten or Velox support fully.
	(Not Applicable) Neither Gluten not Velox support this type in this situation.
PS	(Partial Support) Apache Spark supports this type, but Velox only partially supports it.
NS	(Not Supported) Apache Spark supports this type but Velox backend does not.

Value	Description
Mismatched	Some functions implimented by Velox, which return results mismatched with Apache Spark. So we marked then as "Mismatched"
Ansi OFF	Velox's doesn't fully support ANSI mode). Once Ansi is enabled, Gluten will fallback to Vanilla Spark.

Operator Map

Gluten supports 14 operators (Draw to right to see all data types)

Executor	Description	Gluten Name	Velox Name	BOOLEAN	BYTE	SHORT	INT	LONG	FLOAT	DOUBLE	STRING	NULL	BINARY	ARRAY	MAP	STRUCT(ROW)	DATE	TIMESTAMP	DECIMAL	CALENDAR	UDT
FileSourceScanExec	Reading data from files, often from Hive tables	FileSourceScanExecTransformer	TableScanNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
BatchScanExec	The backend for most file input	BatchScanExecTransformer	TableScanNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
FilterExec	The backend for most filter statements	FilterExecTransformer	FilterNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
ProjectExec	The backend for most select, withColumn and dropColumn statements	ProjectExecTransformer	ProjectNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
HashAggregateExec	The backend for hash based aggregations	HashAggregateBaseTransformer	AggregationNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
BroadcastHashJoinExec	Implementation of join using broadcast data	BroadcastHashJoinExecTransformer	HashJoinNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
ShuffledHashJoinExec	Implementation of join using hashed shuffled data	ShuffleHashJoinExecTransformer	HashJoinNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
SortExec	The backend for the sort operator	SortExecTransformer	OrderByNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
SortMergeJoinExec	Sort merge join, replacing with shuffled hash join	SortMergeJoinExecTransformer	MergeJoinNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
WindowExec	Window-operator backend	WindowExecTransformer	WindowNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
GlobalLimitExec	Limiting of results across partitions	LimitTransformer	LimitNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
LocalLimitExec	Per-partition limiting of results	LimitTransformer	LimitNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
ExpandExec	The backend for the expand operator	ExpandExecTransformer	GroupIdNode	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
UnionExec	The backend for the union operator	UnionExecTransformer	N	S	S	S	S	S	S	S	S	S	S	NS	NS	NS	S	NS	NS	NS	NS
DataWritingCommandExec	Writing data	Y	TableWriteNode	S	S	S	S	S	S	S	S	S	S	S	NS	S	S	NS	S	NS	NS
CartesianProductExec	Implementation of join using brute force	N	CrossJoinNode	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS
ShuffleExchangeExec	The backend for most data being exchanged between processes	N	ExchangeNode	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS
	The unnest operation expands arrays and maps into separate columns. Arrays are expanded into a single column, and maps are expanded into two columns (key, value). Can be used to expand multiple columns. In this case produces as many rows as the highest cardinality array or map (the other columns are padded with nulls). Optionally can produce an ordinality column that specifies the row number starting with 1.	N	UnnestNode	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS
	The top-n operation reorders a dataset based on one or more identified sort fields as well as a sorting order. Rather than sort the entire dataset, the top-n will only maintain the total number of records required to ensure a limited output. A top-n is a combination of a logical sort and logical limit operations.	N	TopNNode	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS
	The partitioned output operation redistributes data based on zero or more distribution fields.	N	PartitionedOutputNode	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS
	The values operation returns specified data.	N	ValuesNode	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS
	A receiving operation that merges multiple ordered streams to maintain orderedness. Input streams are coming from remote exchange or shuffle.	N	MergeExchangeNode	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS
	An operation that merges multiple ordered streams to maintain orderedness. Input streams are coming from local exchange.	N	LocalMergeNode	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS
	A local exchange operation that partitions input data into multiple streams or combines data from multiple streams into a single stream.	N	LocalPartitionNode	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS
	The enforce single row operation checks that input contains at most one row and returns that row unmodified. If input is empty, returns a single row with all values set to null. If input contains more than one row raises an exception.	N	EnforceSingleRowNode	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS
	The assign unique id operation adds one column at the end of the input columns with unique value per row. This unique value marks each output row to be unique among all output rows of this operator.	N	AssignUniqueIdNode	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	NS	S	S	S	S	S
CollectLimitExec	Reduce to single partition and apply limit	N	N
BroadcastExchangeExec	The backend for broadcast exchange of data	N	N
ObjectHashAggregateExec	The backend for hash based aggregations supporting TypedImperativeAggregate functions	N	N
SortAggregateExec	The backend for sort based aggregations	N	N
CoalesceExec	Reduce the partition numbers	N	N
GenerateExec	The backend for operations that generate more output rows than input rows like explode	N	N
RangeExec	The backend for range operator	N	N
SampleExec	The backend for the sample operator	N	N
SubqueryBroadcastExec	Plan to collect and transform the broadcast key values	N	N
TakeOrderedAndProjectExec	Take the first limit elements as defined by the sortOrder, and do projection if needed	N	N
CustomShuffleReaderExec	A wrapper of shuffle query stage	N	N
InMemoryTableScanExec	Implementation of InMemoryTableScanExec to use GPU accelerated caching	N	N
BroadcastNestedLoopJoinExec	Implementation of join using brute force. Full outer joins and joins where the broadcast side matches the join side (e.g.: LeftOuter with left broadcast) are not supported	N	N
AggregateInPandasExec	The backend for an Aggregation Pandas UDF, this accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled.	N	N
ArrowEvalPythonExec	The backend of the Scalar Pandas UDFs. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled	N	N
FlatMapGroupsInPandasExec	The backend for Flat Map Groups Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled.	N	N
MapInPandasExec	The backend for Map Pandas Iterator UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled.	N	N
WindowInPandasExec	The backend for Window Aggregation Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. For now it only supports row based window frame.	N	N

Function support

Gluten supports 94 functions. (Draw to right to see all data types)

Spark Functions	Velox/Presto Functions	Velox/Spark functions	Gluten	Restrictions	BOOLEAN	BYTE	SHORT	INT	LONG	FLOAT	DOUBLE	DATE	TIMESTAMP	STRING	BINARS	CALENDAR	ARRAS	MAP	STRUCT
!		not	S		S	S	S	S	S	S	S			S
&
AND			S		S	S	S	S	S	S	S			S
\	bitwise_or		S				S	S	S
OR
^	bitwise_xor		S				S	S	S
~	bitwise_not		S				S	S	S
<	lt	lt	S		S	S	S	S	S	S	S			S
<=	lte	lte	S		S	S	S	S	S	S	S			S
>	gt	gt	S		S	S	S	S	S	S	S			S
>=	gte	gte	S		S	S	S	S	S	S	S			S
=
==	eq	eq	S		S	S	S	S	S	S	S			S
<==>
%	mod	remainder	S	Ansi Off		S	S	S	S	S
/	divide	divide	S	Ansi Off		S	S	S	S	S
*	multiply	multiply	S	Ansi Off		S	S	S	S	S
+	plus	add	S	Ansi Off		S	S	S	S	S
-	minus	substract	S	Ansi Off		S	S	S	S	S
isnull	is_null	isnull	S		S	S	S	S	S	S	S			S
isnotnull		isnotnull	S		S	S	S	S	S	S	S			S
in		in	S				S	S	S	S	S	S	S	S
between	between	between	S		S	S	S	S	S	S	S	S		S
if
<>	neq	neq	S		S	S	S	S	S	S	S			S
asc_nulls_first
asc_nulls_last
asc
ascii		ascii	S											S
base64, unbase64
bin
bit_length
char_length, character_length	length	length	S
chr, char	chr	chr	S											S
concat	concat	concat	S											S
concat_ws													S
contains		contains	S											S
endswith		endsWith	S											S
first, first_value
initcap
instr		instr	S
lcase, lower	lower	lower	S											S
left
length	length	length	S											S
lower	lower	lower	S											S
locate	strpos			Mismatched									S
lpad	lpad		S											S
ltrim		ltrim	S											S
position
printf
repeat
reverse	reverse		S											S
replace	replace	replace	S											S
right
rpad	rpad		S											S
rtrim		rtrim	S											S
second
soundex
space
split	split	split		Mismatched
split_part	split_part			Mismatched
substr, substring	substr	substring	S											S
substring_index
startswith	startsWith		S											S
translate
trim		trim	S											S
ucase, upper	upper	upper	S											S
like	like	like	S											S
rlike	rlike	rlike	S	Not support lookaround
regexp	rlike	rlike	S	Not support lookaround										S
regexp_extract	regexp_extract	regexp_extract	S	Not support lookaround										S
regexp_extract_all	regexp_extract_all		S	Not support lookaround										S
regexp_like	rlike	rlike	S	Not support lookaround										S
regexp_replace	regexp_replace			Mismatched									S
abs	abs	abs	S	Ansi Off		S	S	S	S	S
acos	acos		S				S	S	S	S	S
asin	asin		S				S	S	S	S	S
atan, atan2	atan, atan2		S				S	S	S	S	S
bitwiseNOT			S
bround
cbrt	cbrt
ceil	ceil	ceil	S				S	S	S	S	S
ceiling	ceiling		S				S	S	S	S	S
cos	cos		S				S	S	S	S	S
cosh	cosh		S				S	S	S	S	S
degrees	degrees		S				S	S	S	S	S
factorial
floor	floor	floor	S				S	S	S	S	S
e	e		S				S	S	S	S	S
exp	exp	exp	S				S	S	S	S	S
greatest	greatest	greatest	S						S	S	S	S	S
hex, unhex
hypot
least	least	least	S						S	S	S	S	S
log	ln		S
log10	log10		S				S	S	S	S	S
log1p
log2	log2		S				S	S	S	S	S
ln	ln		S				S	S	S	S	S
pi	pi		S				S	S	S	S	S
pow	pow						S	S	S	S	S
power	power	power	S				S	S	S	S	S
pmod		pmod	S	Ansi Off		S	S	S	S	S
quarter
radians	radians		S				S	S	S	S	S
rand	rand						S	S	S	S	S
random	random						S	S	S	S	S
rint
round	round	round	S				S	S	S	S	S
width_bucket	width_bucket					S	S	S	S	S
sequence						S	S	S	S	S
sign	sign		S				S	S	S	S	S
signum
sin, sinh	sin		S				S	S	S	S	S
sqrt	sqrt		S				S	S	S	S	S
tan, tanh	tan, tanh		S				S	S	S	S	S
array_contains	contains	array_contains														S
array_distinct	array_distinct															S
array_except	array_except															S
array_intersect	array_intersect	array_intersect													S
array_join	array_join															S
array_max	array_max															S
array_min	array_min															S
array_position	array_position															S
array_remove
array_repeat
array_sort	sort_array	sort_array														S
array_union
array
arrays_overlap
arrays_zip
element_at	element_at	element_at														S
flatten
filter		filter
map	map	map																S
map_concat	map_concat																S
map_entries	map_entries																S
map_filter	map_filter	map_filter															S
map_from_arrays	map_from_arrays	map_from_arrays														S
map_from_entries
map_keys
map_values
negate
size		size															S
slice	slice																S
sort_array	sort_array	sort_array														S
explode
add_months
current_date
current_timestamp
date
date_add	date_add	date_add				S	S	S			S	S
date_format	date_format										S	S
date_sub
date_trunc	date_trunc										S	S
datediff		date_diff										S
day	day		S									S	S
dayofmonth	day_of_month		S									S	S
dayofweek	day_of_week		S									S	S
dayofyear	day_of_year		S									S	S
from_unixtime	from_unixtime											S
from_utc_timestamp
hour	hour											S	S
last_day
month	month		S									S	S
minute	minute											S	S
months_between
next_day
now
quarter	quarter		S									S	S
second	second											S	S
timestamp
to_date
to_timestamp
to_unix_timestamp	to_unixtime											S
to_utc_timestamp
trunc
unix_timestamp
weekofyear
window
year	year		S									S	S
year_of_week	year_of_week		S
approx_count_distinct	approx_distinct		S		S	S	S	S	S	S	S	S		S
avg	avg	avg	S	Ansi Off		S	S	S	S	S
collect_list
collect_set
corr	corr
count	count	count	S				S	S	S	S	S
countDistinct
covar_pop	covar_pop					S	S	S	S	S
covar_samp	covar_samp					S	S	S	S	S
first
grouping_id
kurtosis
last
max	max	max	S				S	S	S	S	S
mean
min	min	min	S				S	S	S	S	S
skewness
stddev_pop	stddev_pop	stddev_pop	S			S	S	S	S	S
stddev_samp	stddev_samp	stddev_samp	S				S	S	S	S	S
stddev	stddev	stddev	S				S	S	S	S	S
sum	sum	sum	S	Ansi Off		S	S	S	S	S
sumDistinct
var_pop	var_pop					S	S	S	S	S
var_samp	var_samp					S	S	S	S	S
variance	variance					S	S	S	S	S
cume_dist	cume_dist		S
dense_rank	dense_rank		S
lag
lead
ntile
percent_rank	percent_rank
rank	rank		S
row_number	row_number		S				S	S	S
from_json
get_json_object	json_extract_scalar	get_json_object	S																S
input_file_name
json_array_length	json_array_length		S																S
json_tuple
schema_of_json
sha1		sha1	S											S
sha2		sha2	S											S
to_json
callUDF
crc32	crc32		S											S
decode
encode
hash	hash	hash	S		S	S	S	S	S	S	S
md5	md5		S			S
monotonically_increasing_id
shiftLeft	bitwise_left_shift		S				S	S	S
shiftRight	bitwise_right_shift		S				S	S	S
shiftRightUnsigned
shuffle
spark_partition_id
udf
when
xxhash64	xxhash64

Clickhouse Backed

To be added

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SupportProgress.md

SupportProgress.md

The Operators and Functions Support Progress

Velox backend

Operator Map

Function support

Clickhouse Backed

Files

SupportProgress.md

Latest commit

History

SupportProgress.md

File metadata and controls

The Operators and Functions Support Progress

Velox backend

Operator Map

Function support

Clickhouse Backed