Identify missing csv data preprocessing methods and implement the methods with a test case | Generic Issue - Not to be assigned #12

Open
Ask149 opened this issue Mar 9, 2021 · 6 comments
Labels
enhancement New feature or request gssoc21 GirlScript Summer of Code'21 Level3 Very Hard Level Difficulty

Comments

@Ask149
Contributor

Ask149 commented Mar 9, 2021

Description

  1. Identify the missing methods of CSV data preprocessing in this repository.
  2. Find suitable cases of data and machine learning problems for which the method should be used.
  3. Implement the method only for those cases.

Please Note -
This is a generic issue, and multiple students can work on it. Notify the mentors once you identify a method (as described above). A mentor will then create a separate issue and assign it to you.

Contribution guidelines will be updated soon. Please refer to them for guidance before committing any development work.

@Ask149 Ask149 added enhancement New feature or request good first issue Good for newcomers gssoc21 GirlScript Summer of Code'21 labels Mar 9, 2021
@rubyruins

rubyruins commented Mar 10, 2021

Hi, I would like to work on this for GSSOC. Could you give an example of what kinds of missing data visualization methods you are looking to implement?

@Ask149
Contributor Author

Ask149 commented Mar 10, 2021

Hi @rubyruins, thank you for your interest. If you take a look at csv_preprocess.py, there are methods such as fill numerical na, normalize numerical columns, label encode categorical columns, etc. Identify any methods that we have missed and that should be included. One I can think of for now: identify the format of a date column, then extract the month, day of the week, date, year, etc. from it and append them to the column list.
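
For illustration, the date-column idea above could be sketched roughly as follows. This is only a minimal sketch using the Python standard library, not the code in csv_preprocess.py: the function names `detect_date_format` and `expand_date_column`, the list of candidate formats, and the `_year`/`_month`/`_day`/`_weekday` suffixes are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical candidate formats; a real dataset may need a different list.
# Note: "%d/%m/%Y" and "%m/%d/%Y" are ambiguous for values such as 03/04/2021;
# the first format that parses every value wins.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d-%b-%Y"]


def detect_date_format(values, formats=CANDIDATE_FORMATS):
    """Return the first format that parses every non-empty value, else None."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    for fmt in formats:
        try:
            for v in non_empty:
                datetime.strptime(v, fmt)
            return fmt
        except ValueError:
            continue  # this format failed on some value; try the next one
    return None


def expand_date_column(rows, column):
    """Return copies of the row dicts with year/month/day/weekday keys added.

    Weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    Empty values produce None in all derived columns.
    """
    fmt = detect_date_format([row[column] for row in rows])
    if fmt is None:
        raise ValueError(f"could not detect a date format for {column!r}")
    expanded = []
    for row in rows:
        new_row = dict(row)
        parsed = datetime.strptime(row[column], fmt) if row[column] else None
        new_row[f"{column}_year"] = parsed.year if parsed else None
        new_row[f"{column}_month"] = parsed.month if parsed else None
        new_row[f"{column}_day"] = parsed.day if parsed else None
        new_row[f"{column}_weekday"] = parsed.weekday() if parsed else None
        expanded.append(new_row)
    return expanded


rows = [{"opened": "2021-03-09"}, {"opened": "2021-03-10"}, {"opened": ""}]
result = expand_date_column(rows, "opened")
```

Rows read with `csv.DictReader` have the same dict shape, so they could be passed to `expand_date_column` directly.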

@rubyruins

@Ask149 sounds good. Do let me know if there are any other examples you can think of. If you can create an issue for those, I can start working on them. Hopefully, we can discuss it this evening!

@ashish-hacker
Contributor

ashish-hacker commented Mar 11, 2021

@Ask149 I checked out csv_preprocess.py and noticed there is only one method for scaling the features, i.e., min-max normalisation. I think it would be better to add some more scaling methods, such as mean normalisation and standardization for Gaussian distributions, to make it more flexible.
May I work on the same?
I am a participant in GSSOC'21.
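
For reference, the three scaling methods mentioned above differ only in what is subtracted and what is divided by. The following is a minimal standard-library sketch, not the repository's implementation; the function names are hypothetical, and returning 0.0 for a constant column is an assumption made here to avoid division by zero.

```python
from statistics import fmean, pstdev


def min_max_scale(values):
    """Min-max normalisation: (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def mean_normalize(values):
    """Mean normalisation: (x - mean) / (max - min), centred on 0."""
    mu = fmean(values)
    span = max(values) - min(values)
    return [0.0 if span == 0 else (v - mu) / span for v in values]


def standardize(values):
    """Standardization (z-score): (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values, mu)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


column = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = min_max_scale(column)
centred = mean_normalize(column)
z_scores = standardize(column)
```

Standardization is usually the better fit when a feature is roughly Gaussian, while min-max and mean normalisation keep the values inside a fixed range.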

@Ask149
Contributor Author

Ask149 commented Mar 11, 2021

Thank you for your interest @ashish-hacker! Could you please refer to issue #15 and add a similar short description before you start the implementation? We can then discuss it and make it precise before you begin. I am assigning an issue under your name, Issue #16; use it to add the details.

@ashish-hacker
Contributor

Sure @Ask149 : )

@Ask149 Ask149 added Level3 Very Hard Level Difficulty and removed good first issue Good for newcomers labels Apr 3, 2021