In this issue, we will list and track all tasks for ANSI mode support.
There are two Spark configurations directly related to ANSI usage:

spark.sql.ansi.enabled (default is true since Spark 4.0)
spark.sql.storeAssignmentPolicy (default is ANSI since Spark 3.0)

[x] cast string to boolean (@malinjawi) https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L701
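To make the expected semantics concrete, here is a hedged Python sketch of the string-to-boolean cast (the function name and accepted literal sets are illustrative, not the actual Catalyst code): in ANSI mode an unrecognized string is an error (CAST_INVALID_INPUT in Spark), while in non-ANSI mode the cast returns NULL.

```python
# Illustrative model of Spark's string-to-boolean cast semantics.
TRUE_STRINGS = {"t", "true", "y", "yes", "1"}
FALSE_STRINGS = {"f", "false", "n", "no", "0"}

def cast_string_to_boolean(s: str, ansi_enabled: bool):
    v = s.strip().lower()
    if v in TRUE_STRINGS:
        return True
    if v in FALSE_STRINGS:
        return False
    if ansi_enabled:
        # ANSI mode: invalid input raises an error instead of producing NULL.
        raise ValueError(f"invalid input syntax for type boolean: {s!r}")
    return None  # non-ANSI mode: NULL

print(cast_string_to_boolean(" TRUE ", True))   # True
print(cast_string_to_boolean("abc", False))     # None
```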
[ ] cast decimal to string (@Mariamalmesfer)
In ANSI mode, Spark always uses a plain string representation when casting Decimal values to strings. Otherwise, the cast uses BigDecimal.toString, which may produce scientific notation if an exponent is needed.
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L678
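The plain-vs-scientific distinction can be sketched with Python's stdlib Decimal (a rough analogue: `format(d, "f")` stands in for BigDecimal.toPlainString, and `str()` for BigDecimal.toString; the function name is illustrative):

```python
from decimal import Decimal

def decimal_to_string(d: Decimal, ansi_enabled: bool) -> str:
    # ANSI mode: always a plain (non-scientific) representation.
    # Non-ANSI: str() may use scientific notation when an exponent is needed.
    return format(d, "f") if ansi_enabled else str(d)

print(decimal_to_string(Decimal("1E+7"), True))   # 10000000
print(decimal_to_string(Decimal("1E+7"), False))  # 1E+7
```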
[ ] cast string to timestamp (@infvg) https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L733
[ ] cast string to timestampNTZ https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L775
[ ] cast float/double to timestamp (@infvg) https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L758 https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L765
[x] cast string to date (@malinjawi) https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L811
[ ] cast string to time (@malinjawi) https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L826 The codegen implementation, assumed equivalent to the link above: https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L1493
[ ] cast string to long/int/short/byte (@malinjawi) As one example, here is the related code for the long type: https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L883
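A hedged sketch of the string-to-integral cast (the function name and error message are illustrative): in ANSI mode both a malformed string and an out-of-range value raise an error, while non-ANSI mode returns NULL.

```python
# Illustrative model of ANSI string-to-int casting with a range check.
def cast_string_to_int(s: str, ansi_enabled: bool, bits: int = 32):
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    try:
        v = int(s.strip())
        if not lo <= v <= hi:
            raise ValueError("overflow")
        return v
    except ValueError:
        if ansi_enabled:
            # ANSI: malformed input or overflow is an error.
            raise ValueError(f"invalid input or overflow for INT cast: {s!r}")
        return None  # non-ANSI: NULL

print(cast_string_to_int("42", True))          # 42
print(cast_string_to_int("12abc", False))      # None
print(cast_string_to_int(str(1 << 40), False)) # None (overflows 32 bits)
```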
[ ] cast NumericType to long/int/short/byte (@minni31) As one example, here is the related code for the long type: https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L896
[ ] cast timestamp to int/short/byte. As one example, here is the related code for the int type: https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L933
[ ] cast time to short/byte (requires TimeType support). As one example, here is the related code for the short type: https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L980
[ ] cast several types to decimal. ANSI controls the overflow behavior in changePrecision: https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L1103
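The precision-change check can be sketched with Python's Decimal (a rough analogue of changePrecision; the function name and rounding mode are assumptions): round to the target scale, then fail (ANSI) or return NULL (non-ANSI) if the result does not fit the target precision.

```python
from decimal import Decimal, ROUND_HALF_UP

def change_precision(d: Decimal, precision: int, scale: int, ansi_enabled: bool):
    # Round to `scale` fractional digits, then check the total digit count
    # fits within `precision`.
    rounded = d.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    if len(rounded.as_tuple().digits) > precision:
        if ansi_enabled:
            # ANSI: overflow is an error.
            raise ArithmeticError(
                f"{d} cannot be represented as DECIMAL({precision},{scale})")
        return None  # non-ANSI: overflow becomes NULL
    return rounded

print(change_precision(Decimal("123.456"), 5, 2, True))   # 123.46
print(change_precision(Decimal("123.456"), 4, 2, False))  # None
```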
[ ] cast string to double/float. ANSI controls the behavior when handling an incorrect number format. As one example, here is the related code for the double type: https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L1159
[ ] A base type: AnsiIntervalType (@malinjawi) https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/api/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala#L168
[ ] Unary expressions like Abs, UnaryMinus (@malinjawi). The ANSI config controls failOnError: https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L152C35-L152C46
[ ] Binary arithmetic expressions that use BinaryArithmetic as the base class, such as Add, Divide, Multiply, etc. (@malinjawi): https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L209
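A hedged sketch of the failOnError effect on two representative operators (names are illustrative, not Catalyst code): in ANSI mode integer overflow and division by zero raise errors; in non-ANSI mode addition wraps like Java 32-bit arithmetic and division by zero yields NULL.

```python
# Illustrative model of ANSI overflow / divide-by-zero handling.
INT_MIN, INT_MAX = -(1 << 31), (1 << 31) - 1

def ansi_add_int(a: int, b: int, ansi_enabled: bool) -> int:
    r = a + b
    if not INT_MIN <= r <= INT_MAX:
        if ansi_enabled:
            # ANSI: overflow is an error (Spark suggests try_add instead).
            raise OverflowError("integer overflow")
        # non-ANSI: wrap around like Java 32-bit addition.
        r = ((r - INT_MIN) % (1 << 32)) + INT_MIN
    return r

def ansi_div_int(a: int, b: int, ansi_enabled: bool):
    if b == 0:
        if ansi_enabled:
            # ANSI: division by zero is an error (try_divide returns NULL).
            raise ZeroDivisionError("Division by zero")
        return None  # non-ANSI: NULL
    return int(a / b)  # truncate toward zero, as in Java

print(ansi_add_int(INT_MAX, 1, False))  # -2147483648 (wrapped)
print(ansi_div_int(7, 0, False))        # None
```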
[ ] String expressions: Elt https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L286
[ ] Collection expression Size. Its legacySizeOfNull is impacted by the ANSI config: https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L118
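A hedged sketch of how the ANSI flag feeds into legacySizeOfNull (assuming, per the linked code, that legacy behavior is disabled when ANSI is on; the function signature is illustrative): with ANSI, size(NULL) is NULL; with legacy behavior, it is -1.

```python
# Illustrative model of Size's null handling.
def size(collection, ansi_enabled: bool):
    # Assumption: legacySizeOfNull is forced off when ANSI mode is enabled.
    legacy_size_of_null = not ansi_enabled
    if collection is None:
        return -1 if legacy_size_of_null else None
    return len(collection)

print(size([1, 2, 3], True))  # 3
print(size(None, True))       # None
print(size(None, False))      # -1
```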
[ ] Collection expression ElementAt https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L2622
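ElementAt's ANSI behavior can be sketched as follows (function name is illustrative): SQL arrays are 1-based and negative indexes count from the end; an out-of-bounds index is an error in ANSI mode and NULL otherwise, and index 0 is always invalid.

```python
# Illustrative model of element_at on arrays.
def element_at(arr, index: int, ansi_enabled: bool):
    if index == 0:
        raise ValueError("SQL array indices start at 1")
    # 1-based positive index, or negative index counting from the end.
    pos = index - 1 if index > 0 else len(arr) + index
    if 0 <= pos < len(arr):
        return arr[pos]
    if ansi_enabled:
        # ANSI: out-of-bounds access is an error.
        raise IndexError(
            f"index {index} is out of bounds for array of {len(arr)} elements")
    return None  # non-ANSI: NULL

print(element_at([10, 20, 30], 1, True))    # 10
print(element_at([10, 20, 30], -1, True))   # 30
print(element_at([10, 20, 30], 4, False))   # None
```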
[ ] conv https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L451
[ ] round functions: Round, BRound, RoundCeil, RoundFloor. As one example, see how rounding to ByteType behaves with ANSI enabled: https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L1579
[ ] STORE_ASSIGNMENT_POLICY defaults to ANSI https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L4487
SUM, AVG, VAR_POP, VAR_SAMP, STDDEV_POP, STDDEV_SAMP
TRY_SUM (Spark 3.4+): NULL on overflow instead of error.

The same overflow checks apply in window operations:

SUM(...) OVER(...)
AVG(...) OVER(...)

SUBSTRING / SUBSTR
SUBSTRING(str FROM start [FOR len])
SUBSTRING(str, start, len)

TRIM
TRIM(LEADING '0' FROM col)
TRIM(BOTH ...), TRIM(TRAILING ...)

OVERLAY
OVERLAY(string PLACING replacement FROM start [FOR length])

See Spark ANSI compliance: https://github.com/apache/spark/blob/v4.0.0/docs/sql-ref-ansi-compliance.md
Related discussions: https://github.com/apache/incubator-gluten/issues/4740, https://github.com/facebookincubator/velox/issues/3869