Incorrect import style masks builtin Python operations #1

Open
@curtkohler

Description

A few modules in the code import the entire PySpark SQL functions module like so:
from pyspark.sql.functions import *

This is an anti-pattern: the wildcard import shadows the builtin Python functions that share a name with a PySpark function, e.g. sum, max, etc. The end result is that you can't invoke the builtin functions within UDFs; you get signature mismatches and odd errors in Python notebooks. The imports should instead be of the format:
from pyspark.sql import functions as F

and then reference the PySpark variants using the 'F' prefix as needed.
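
For illustration, here is a minimal sketch of the collision and of how the aliased import avoids it. So that it runs without a Spark installation, a hypothetical stand-in module named fake_functions plays the role of pyspark.sql.functions; its sum function and the TypeError it raises are assumptions made for the demo, not PySpark's actual behavior or error message.

```python
import sys
import types

# Hypothetical stand-in for pyspark.sql.functions (assumption: lets this run without Spark).
fake_functions = types.ModuleType("fake_functions")


def _column_sum(col):
    # Like a column aggregate, this expects a column name, not an iterable of numbers.
    if not isinstance(col, str):
        raise TypeError(f"expected a column name, got {type(col).__name__}")
    return f"sum({col})"


fake_functions.sum = _column_sum
sys.modules["fake_functions"] = fake_functions

# Style 1: wildcard import. The module's `sum` now shadows the builtin in this namespace.
wildcard_ns = {}
exec("from fake_functions import *", wildcard_ns)
try:
    eval("sum([1, 2, 3])", wildcard_ns)
    wildcard_error = None
except TypeError as exc:
    wildcard_error = str(exc)

# Style 2: aliased import. The builtin `sum` is untouched; the module's version is F.sum.
aliased_ns = {}
exec("import fake_functions as F", aliased_ns)
builtin_total = eval("sum([1, 2, 3])", aliased_ns)
column_expr = eval("F.sum('amount')", aliased_ns)

print(wildcard_error)
print(builtin_total)
print(column_expr)
```

With the wildcard style, the plain call to sum reaches the module's function and fails; with the 'F' alias, the same call reaches the builtin and the module's version stays available as F.sum.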
