bug: lambda that uses imported functions #11

Closed
kasra-keshavarz opened this issue Jul 7, 2021 · 10 comments
Labels
bug Something isn't working

Comments

@kasra-keshavarz

kasra-keshavarz commented Jul 7, 2021

Description

Using imported functions

Use case / motivation

mapply only works with imported functions when they are explicitly imported inside the body of the function that calls them. Take the following code, for example:

from math import sqrt

import pandas as pd
import numpy as np

import mapply
mapply.init(
    n_workers=-1,
    chunk_size=1,
    max_chunks_per_worker=10,
    progressbar=False
)

df = pd.DataFrame(np.random.rand(500,5))
df.mapply(lambda x: sqrt(x), axis=0)

The code above raises an error saying that sqrt is not defined. It only works when the import statement is placed inside the lambda function.
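The workaround described above can be sketched as follows. This is a minimal illustration, not mapply's documented usage: the name sqrt_with_local_import is hypothetical, and a plain list stands in for the DataFrame so that the snippet runs without pandas or mapply installed.

```python
def sqrt_with_local_import(x):
    # Importing inside the body means the name is resolved wherever the
    # function actually runs, rather than relying on module-level globals.
    from math import sqrt
    return sqrt(x)


values = [0.0, 4.0, 9.0]
print([sqrt_with_local_import(v) for v in values])  # [0.0, 2.0, 3.0]
```

A function shaped like this could presumably be passed to df.mapply in place of the lambda; that call is left out here to keep the sketch self-contained.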

Related Issues

Nothing comes to mind.

@kasra-keshavarz kasra-keshavarz added the enhancement New feature or request label Jul 7, 2021
@ddelange
Owner

ddelange commented Jul 8, 2021

Hi @kasra-keshavarz,

Thanks for posting this issue, I'm glad you found a workaround.

I'm afraid this is most likely an upstream bug (although I'm not at my machine to confirm).

I agree that your example above looks like something you'd want to work. A generally safe practice, however, is to avoid lambdas for complex operations:

from math import sqrt

df.mapply(sqrt, axis=0)

def cool_polynomial(x):
    """Putting more complex logic like this in a lambda quickly becomes unreadable."""
    return sqrt(x) + 1

df.mapply(cool_polynomial, axis=0)

Hope that helps!

@ddelange ddelange changed the title using mapply for imported functions bug: lambda that uses imported functions Jul 8, 2021
@ddelange ddelange added bug Something isn't working and removed enhancement New feature or request labels Jul 8, 2021
@thoughtfuldata

I just ran into this bug. I can confirm that importing inside the lambda function works.

@ddelange
Owner

ddelange commented Nov 5, 2021

>>> from math import sqrt
... 
... import pandas as pd
... import numpy as np
... import mapply
... 
... mapply.init(progressbar=False)
... 
... df = pd.DataFrame(np.random.rand(500))
... df.mapply(lambda x: sqrt(x), axis=1)
0      0.649900
1      0.653619
2      0.910843
3      0.901363
4      0.955056
         ...
495    0.448268
496    0.696137
497    0.786392
498    0.833481
499    0.980583
Length: 500, dtype: float64

>>> from packaging.markers import default_environment
>>> default_environment()
{'implementation_name': 'cpython', 'implementation_version': '3.8.10', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_release': '19.6.0', 'platform_system': 'Darwin', 'platform_version': 'Darwin Kernel Version 19.6.0: Thu May  6 00:48:39 PDT 2021; root:xnu-6153.141.33~1/RELEASE_X86_64', 'python_full_version': '3.8.10', 'platform_python_implementation': 'CPython', 'python_version': '3.8', 'sys_platform': 'darwin'}

On my system the MWE works as expected. Can you provide more details about your system and the traceback you receive?
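For anyone gathering the requested system details, a small standard-library sketch along these lines may help. The function name describe_environment is hypothetical, and the start method reported here is that of the standard multiprocessing module, which is only a proxy: the multiprocessing backend mapply relies on upstream may be configured differently.

```python
import multiprocessing
import platform
import sys


def describe_environment():
    """Collect basic details that help reproduce platform-specific bugs."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
        "sys_platform": sys.platform,
        # "spawn" workers start from a fresh interpreter, whereas
        # "fork" workers inherit a copy of the parent's memory.
        "start_method": multiprocessing.get_start_method(),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Pasting this output alongside the full traceback makes it easier to tell whether the failure is tied to one operating system.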

@thoughtfuldata

temp_df['transcripts'] = temp_df.mapply(get_transcripts, axis=1)

The above call won't work unless import youtube_transcript_api as yta is placed inside the custom function.

import requests as rq

import numpy as np
import pandas as pd

import youtube_transcript_api as yta


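def get_transcripts(video_id):
    try:
        print(video_id[0], video_id)
        transcripts = yta.YouTubeTranscriptApi.get_transcript(
            video_id[0], cookies='cookies/cookies.Youtube.txt')
        return transcripts
    except Exception:
        # The posted snippet ends before its except clause; returning None is a placeholder assumption.
        return None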

@ddelange
Owner

ddelange commented Nov 6, 2021

The 'not defined' error does not occur for me when using an imported namespace or function in a lambda, either on my Mac or on Linux (Google Colab). My hunch is that this is a Windows-specific pickling issue. I would suggest creating an issue in the dill repository with a reproducible script (and traceback) that demonstrates this behaviour on Windows.
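As background for the pickling hunch, here is a rough standard-library illustration of why lambdas are awkward to send to worker processes. It uses only the built-in pickle module, not dill or mapply, and the helper name can_pickle is hypothetical. The built-in pickler stores functions by qualified name, which a lambda does not have, so a more capable serialiser is needed for them.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard-library pickler can serialise obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A lambda has no importable name, even when bound to a variable.
square_root = lambda x: x ** 0.5

print("lambda picklable:", can_pickle(square_root))
print("built-in picklable:", can_pickle(len))
```

This does not reproduce the Windows behaviour reported here; it only shows the kind of serialisation step where such a platform difference could plausibly arise.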

If your issue gets resolved upstream, please report back here!

@ma7555

ma7555 commented Feb 8, 2022

This issue happens to me too.

@ddelange
Owner

ddelange commented Feb 8, 2022

@ma7555 Are you on Windows? If yes, please see my last comment above. If no, please post a reproducible script and I can take a look!

@ma7555

ma7555 commented Feb 11, 2022

@ddelange
Yes, I am on Windows.

@ddelange
Owner

In that case, this issue might be related: uqfoundation/dill#323

As a fix was merged for it, you could try upgrading dill to the latest version with pip install -U dill
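To confirm which dill version is installed before and after the upgrade, a short check like the following may be useful. It relies only on the standard library's importlib.metadata (Python 3.8+); the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("dill:", installed_version("dill"))
```

Running it inside the same environment that runs mapply avoids checking a different interpreter's packages by mistake.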

@ddelange
Owner

Closing as duplicate of uqfoundation/multiprocess#58


4 participants