The following expressions are slower with Comet enabled, according to the benchmarks in https://github.com/apache/datafusion-comet/pull/2984.
This epic tracks progress on optimizing them. Separate issues should be created and linked to from these tables. Some issues already exist (look for issues tagged with the performance label).
Note that these tables were generated by AI; they contain some duplicate entries and may also contain errors.

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
|----------------|------------|-----------------|----------------|----------|
| CometStringExpressionBenchmark | octet_length | 373.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | trim | 435.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | ltrim | 434.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | rtrim | 436.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | repeat | 720.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | concat | 595.0 | 0.5X | 50.0% |
| CometStringExpressionBenchmark | startswith | 396.0 | 0.5X | 50.0% |
| CometStringExpressionBenchmark | ascii | 405.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | bit_length | 451.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | concat_ws | 702.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | instr | 3805.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | endswith | 414.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | chr | 27.0 | 0.8X | 20.0% |
| CometStringExpressionBenchmark | space | 28.0 | 0.8X | 20.0% |
| CometStringExpressionBenchmark | translate | 28908.0 | 0.8X | 20.0% |
| CometStringExpressionBenchmark | initCap | 4560.0 | 0.9X | 10.0% |
| CometStringExpressionBenchmark | rlike | 3396.0 | 0.9X | 10.0% |

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
|----------------|------------|-----------------|----------------|----------|
| CometDatetimeExpressionBenchmark | Timestamp Truncate | 134.0 | 0.6X | 40.0% |
| CometDatetimeExpressionBenchmark | Date Truncate | 34.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Timestamp Extract - year | 61.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Extract - year | 24.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Date Arithmetic - date_add | 24.0 | 0.9X | 10.0% |

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
|----------------|------------|-----------------|----------------|----------|
| CometArrayExpressionBenchmark | array_remove | 12.0 | 0.5X | 50.0% |
| CometArrayExpressionBenchmark | array_compact | 13.0 | 0.5X | 50.0% |
| CometArrayExpressionBenchmark | array_max | 13.0 | 0.8X | 20.0% |
| CometArrayExpressionBenchmark | array_min | 12.0 | 0.8X | 20.0% |
| CometArrayExpressionBenchmark | array_contains | 15.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | array_distinct | 14.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | array_append | 12.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | arrays_overlap | 12.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | array_insert | 11.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | array_join | 13.0 | 0.9X | 10.0% |

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
|----------------|------------|-----------------|----------------|----------|
| CometHashExpressionBenchmark | xxhash64_multi | 15.0 | 0.9X | 10.0% |
| CometHashExpressionBenchmark | murmur3_hash_single | 13.0 | 0.9X | 10.0% |
| CometHashExpressionBenchmark | murmur3_hash_multi | 14.0 | 0.9X | 10.0% |
| CometHashExpressionBenchmark | sha2_224 | 28.0 | 0.8X | 20.0% |
| CometHashExpressionBenchmark | sha2_256 | 29.0 | 0.8X | 20.0% |
| CometHashExpressionBenchmark | sha2_512 | 34.0 | 0.6X | 40.0% |
| CometHashExpressionBenchmark | sha2_384 | 34.0 | 0.7X | 30.0% |

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
|----------------|------------|-----------------|----------------|----------|
| CometBitwiseExpressionBenchmark | shift_right_unsigned | 10.0 | 0.9X | 10.0% |
| CometBitwiseExpressionBenchmark | shift_left | 10.0 | 0.7X | 30.0% |
| CometBitwiseExpressionBenchmark | bitwise_or | 12.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | bitwise_xor | 11.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | bitwise_not | 10.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | shift_right | 10.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | bit_count | 10.0 | 0.8X | 20.0% |

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
|----------------|------------|-----------------|----------------|----------|
| CometCastStringToNumericBenchmark | CAST String to BYTE | 59.0 | 0.8X | 20.0% |
| CometCastStringToNumericBenchmark | CAST String to SHORT | 59.0 | 0.8X | 20.0% |
| CometCastStringToNumericBenchmark | CAST String to INT | 56.0 | 0.8X | 20.0% |
| CometCastStringToNumericBenchmark | CAST String to LONG | 59.0 | 0.8X | 20.0% |

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
|----------------|------------|-----------------|----------------|----------|
| CometComparisonExpressionBenchmark | greater_than | 11.0 | 0.8X | 20.0% |
| CometComparisonExpressionBenchmark | is_null | 10.0 | 0.8X | 20.0% |
| CometComparisonExpressionBenchmark | is_nan_float | 10.0 | 0.8X | 20.0% |
| CometComparisonExpressionBenchmark | not_equal_to | 13.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | less_than | 12.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | less_than_or_equal | 11.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | greater_than_or_equal | 11.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | equal_null_safe | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | is_not_null | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | and | 11.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | or | 11.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | not | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | in_list | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | not_in_list | 11.0 | 0.9X | 10.0% |

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
|----------------|------------|-----------------|----------------|----------|
| CometMathExpressionBenchmark | hex_int | 11.0 | 0.7X | 30.0% |
| CometMathExpressionBenchmark | floor | 10.0 | 0.8X | 20.0% |
| CometMathExpressionBenchmark | hex_long | 11.0 | 0.8X | 20.0% |
| CometMathExpressionBenchmark | unhex | 13.0 | 0.8X | 20.0% |
| CometMathExpressionBenchmark | unary_minus | 10.0 | 0.8X | 20.0% |
| CometMathExpressionBenchmark | ceil | 11.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | round | 19.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | atan2 | 11.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | log | 11.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | log10 | 11.0 | 0.9X | 10.0% |

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
|----------------|------------|-----------------|----------------|----------|
| CometConditionalExpressionBenchmark | Case When Expr | 41.0 | 0.8X | 20.0% |
| CometPredicateExpressionBenchmark | in Expr | 42.0 | 0.8X | 20.0% |
| CometConditionalExpressionBenchmark | If Expr | 38.0 | 0.9X | 10.0% |

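
For anyone reading the numbers above, here is a rough sketch of how the columns appear to relate. It assumes that "Comet Relative" is Spark time divided by Comet time (so 0.4X means Comet took about 2.5 times as long) and that "Slowdown" is simply `(1 - relative) * 100`, which matches every row in the tables. The function names are made up for illustration, and because the relative value is rounded to one decimal place, the estimated Comet times are only approximate.

```python
def slowdown_percent(comet_relative: float) -> float:
    """Slowdown as it appears in the tables: (1 - relative) * 100."""
    return round((1.0 - comet_relative) * 100.0, 1)


def estimated_comet_ms(spark_ms: float, comet_relative: float) -> float:
    """Approximate Comet time, ASSUMING relative = Spark time / Comet time."""
    return round(spark_ms / comet_relative, 1)


# A few rows copied from the tables: (expression, Spark time in ms, Comet relative).
rows = [
    ("octet_length", 373.0, 0.4),
    ("instr", 3805.0, 0.6),
    ("rlike", 3396.0, 0.9),
]

for name, spark_ms, relative in rows:
    print(
        f"{name}: slowdown {slowdown_percent(relative)}%, "
        f"estimated Comet time ~{estimated_comet_ms(spark_ms, relative)} ms"
    )
```

Under that assumption, a 60% "slowdown" means Comet takes roughly 2.5 times as long, not 1.6 times, which may matter when prioritizing which expressions to optimize first.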