RagavendranMRN/Machine-Learning-Scratch

Machine-Learning(Basics)

This repo covers machine learning from the very basics.

Different Packages

  • Pandas
  • NumPy
  • Scikit-learn
  • Matplotlib

Let's look at what each package is used for.

Pandas

Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the most powerful and flexible open source data analysis and manipulation tool available in any language.

Main Features of Pandas
  • Easy handling of missing data
  • Columns can be inserted into and deleted from a DataFrame
  • Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can ignore the labels and let pandas align the data automatically in computations
  • Intuitive merging and joining of data sets
  • Flexible reshaping and pivoting of data sets
  • Loading data from and saving data to flat files (CSV and delimited), Excel files, and databases
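
The features above can be sketched in a few lines. This is a minimal illustration, not code from this repo: the `sales` and `regions` tables, their column names, and their values are all made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration.
sales = pd.DataFrame({
    "city": ["Chennai", "Chennai", "Mumbai", "Mumbai"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100.0, np.nan, 80.0, 95.0],
})

# Missing data: replace the NaN with the mean of the known values.
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].mean())

# Insert a derived column, then delete it again.
sales["revenue_k"] = sales["revenue"] / 1000
sales = sales.drop(columns=["revenue_k"])

# Automatic alignment: values are matched by label, not by position.
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "z"])
aligned = a + b  # only "y" exists in both, so "x" and "z" become NaN

# Merging: attach a region to each row through the shared "city" key.
regions = pd.DataFrame({"city": ["Chennai", "Mumbai"],
                        "region": ["South", "West"]})
merged = sales.merge(regions, on="city", how="left")

# Pivoting: one row per city, one column per quarter.
wide = merged.pivot(index="city", columns="quarter", values="revenue")
print(wide)
```

Filling with the mean is only one of several ways to handle missing values; `dropna()` or a fixed fill value may suit other data better.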

Documentation

The official documentation is hosted on PyData.org: https://pandas.pydata.org/pandas-docs/stable


NumPy

NumPy is the fundamental package for scientific computing with Python. Its core type, the ndarray (NumPy array), is a multidimensional array that stores values of a single data type. Like Python sequences, these arrays are indexed starting at zero.
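
As a small sketch of what "single data type" and "indexed from zero" mean in practice (the array contents here are arbitrary example values):

```python
import numpy as np

# A 2 x 3 array; every element shares one dtype.
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.dtype)    # an integer dtype (exact width depends on the platform)
print(arr.shape)    # (2, 3)

# Indexing starts at zero, as with Python lists.
print(arr[0, 0])    # 1  (first row, first column)
print(arr[1, -1])   # 6  (second row, last column)
print(arr[:, 1])    # [2 5]  (second column of every row)

# Mixing ints and floats upcasts everything to a common dtype.
mixed = np.array([1, 2.5])
print(mixed.dtype)  # float64
```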

Main Features of NumPy
  • A powerful N-dimensional array object
  • Sophisticated (broadcasting) functions
  • Mathematical and logical operations
  • Fourier Transforms
  • Linear Algebra, Random Number Generation
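
A brief sketch touching each feature in the list above. The arrays, the small linear system, and the seed are arbitrary values chosen for illustration.

```python
import numpy as np

# N-dimensional array plus broadcasting: the 1-D row is added to each row of a.
a = np.arange(6).reshape(2, 3)          # [[0 1 2], [3 4 5]]
row = np.array([10, 20, 30])
broadcast_sum = a + row                 # [[10 21 32], [13 24 35]]

# Logical operations: boolean masks select elements.
mask = (a % 2 == 0) & (a > 0)
evens = a[mask]                         # positive even values: [2 4]

# Fourier transform of a short signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Random number generation with a seeded generator for reproducibility.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=5)

print(broadcast_sum, evens, spectrum, x, samples, sep="\n")
```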

Documentation

The official documentation is hosted on numpy.org: https://www.numpy.org


Scikit-learn

scikit-learn provides simple and efficient tools for data mining and data analysis. The library is built on SciPy (Scientific Python), which must be installed before you can use scikit-learn. SciPy is part of a wider stack that includes:

  • NumPy: Base n-dimensional array package
  • SciPy: Fundamental library for scientific computing
  • Matplotlib: Comprehensive 2D/3D plotting
  • IPython: Enhanced interactive console
  • Sympy: Symbolic mathematics
  • Pandas: Data structures and analysis

scikit-learn is accessible to everybody and reusable in various contexts.

Main Features of scikit-learn
  • Clustering: for grouping unlabeled data, with algorithms such as KMeans.
  • Datasets: for test datasets and for generating datasets with specific properties for investigating model behavior.
  • Feature extraction: for defining attributes in image and text data.
  • Dimensionality reduction: for reducing the number of attributes in data for summarization, visualization, and feature selection, with methods such as principal component analysis (PCA).
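
The four features above can be combined in one short sketch. The sample counts, cluster count, seeds, and the two example sentences are assumptions picked for illustration, not settings used elsewhere in this repo.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer

# Datasets: generate 150 synthetic points in 4 dimensions around 3 centers.
X, _ = make_blobs(n_samples=150, centers=3, n_features=4, random_state=0)

# Clustering: group the unlabeled points with KMeans.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Dimensionality reduction: project 4 features down to 2 with PCA.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Feature extraction: turn raw text into word-count vectors.
docs = ["machine learning from scratch", "learning pandas and numpy"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

print(X_2d.shape, counts.shape)
```

The 2-D output of PCA is convenient for plotting the clusters, for example by coloring each point by its KMeans label.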

Documentation

The official documentation is hosted on scikit-learn.org: http://scikit-learn.org


Matplotlib

Matplotlib is a comprehensive library for 2D and 3D plotting in Python.
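
A minimal 2D plotting sketch. The output file name `sine_cosine.png` is hypothetical, and the `Agg` backend is selected only so the script also runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# 100 evenly spaced x values over one full period.
x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Sine and cosine")
ax.legend()

# Write the figure to disk (hypothetical file name).
fig.savefig("sine_cosine.png")
```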

Documentation

The official documentation is hosted on matplotlib.org: http://matplotlib.org