From ab4e7563ff3c6210ee53375c24796cf5b911dde3 Mon Sep 17 00:00:00 2001 From: "shivant47@gmail.com" Date: Sat, 29 Aug 2020 16:57:38 +0530 Subject: [PATCH] all ML train and model added --- Mlwork.txt | 10 --- Regression.txt | 166 ++++++++++++++++++++++++++++++++++++++++++++++ Step1&2.txt | 119 +++++++++++++++++++++++++++++++++ TrainingModel.txt | 75 +++++++++++++++++++++ 4 files changed, 360 insertions(+), 10 deletions(-) delete mode 100644 Mlwork.txt create mode 100644 Regression.txt create mode 100644 Step1&2.txt create mode 100644 TrainingModel.txt diff --git a/Mlwork.txt b/Mlwork.txt deleted file mode 100644 index 84e07fb..0000000 --- a/Mlwork.txt +++ /dev/null @@ -1,10 +0,0 @@ -khdskfjhlkdjlkshlnkjsahrh kjgsdkjfbk iuaw sahr kjhaiwhoiuw -kljlksdjlkfjds -dsjhflkdhdf -jdhslkjdflk -sakfklds -jksahflkd -iwyrioueo -sjhklashd -ksaljflkdsjf -asklfjldskf diff --git a/Regression.txt b/Regression.txt new file mode 100644 index 0000000..7c565a5 --- /dev/null +++ b/Regression.txt @@ -0,0 +1,166 @@ +References + +1 +Breiman, “Random Forests”, Machine Learning, 45(1), 5-32, 2001. + +Examples + +>>> +>>> from sklearn.ensemble import RandomForestClassifier +>>> from sklearn.datasets import make_classification +>>> X, y = make_classification(n_samples=1000, n_features=4, +... n_informative=2, n_redundant=0, +... random_state=0, shuffle=False) +>>> clf = RandomForestClassifier(max_depth=2, random_state=0) +>>> clf.fit(X, y) +RandomForestClassifier(...) +>>> print(clf.predict([[0, 0, 0, 0]])) +[1] +Methods + +apply(X) + +Apply trees in the forest to X, return leaf indices. + +decision_path(X) + +Return the decision path in the forest. + +fit(X, y[, sample_weight]) + +Build a forest of trees from the training set (X, y). + +get_params([deep]) + +Get parameters for this estimator. + +predict(X) + +Predict class for X. + +predict_log_proba(X) + +Predict class log-probabilities for X. + +predict_proba(X) + +Predict class probabilities for X. + +score(X, y[, sample_weight]) + +Return the mean accuracy on the given test data and labels. + +set_params(**params) + +Set the parameters of this estimator. + +__init__(n_estimators=100, *, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None, ccp_alpha=0.0, max_samples=None)[source] +Initialize self. See help(type(self)) for accurate signature. + +apply(X)[source] +Apply trees in the forest to X, return leaf indices. + +Parameters +X{array-like, sparse matrix} of shape (n_samples, n_features) +The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix. + +Returns +X_leavesndarray of shape (n_samples, n_estimators) +For each datapoint x in X and for each tree in the forest, return the index of the leaf x ends up in. + +decision_path(X)[source] +Return the decision path in the forest. + +New in version 0.18. + +Parameters +X{array-like, sparse matrix} of shape (n_samples, n_features) +The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix. + +Returns +indicatorsparse matrix of shape (n_samples, n_nodes) +Return a node indicator matrix where non zero elements indicates that the samples goes through the nodes. 
The matrix is of CSR format. + +n_nodes_ptrndarray of shape (n_estimators + 1,) +The columns from indicator[n_nodes_ptr[i]:n_nodes_ptr[i+1]] gives the indicator value for the i-th estimator. + +property feature_importances_ +The impurity-based feature importances. + +The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance. + +Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative. + +Returns +feature_importances_ndarray of shape (n_features,) +The values of this array sum to 1, unless all trees are single node trees consisting of only the root node, in which case it will be an array of zeros. + +fit(X, y, sample_weight=None)[source] +Build a forest of trees from the training set (X, y). + +Parameters +X{array-like, sparse matrix} of shape (n_samples, n_features) +The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csc_matrix. + +yarray-like of shape (n_samples,) or (n_samples, n_outputs) +The target values (class labels in classification, real numbers in regression). + +sample_weightarray-like of shape (n_samples,), default=None +Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. In the case of classification, splits are also ignored if they would result in any single class carrying a negative weight in either child node. + +Returns +selfobject +get_params(deep=True)[source] +Get parameters for this estimator. + +Parameters +deepbool, default=True +If True, will return the parameters for this estimator and contained subobjects that are estimators. + +Returns +paramsmapping of string to any +Parameter names mapped to their values. + +predict(X)[source] +Predict class for X. + +The predicted class of an input sample is a vote by the trees in the forest, weighted by their probability estimates. That is, the predicted class is the one with highest mean probability estimate across the trees. + +Parameters +X{array-like, sparse matrix} of shape (n_samples, n_features) +The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix. + +Returns +yndarray of shape (n_samples,) or (n_samples, n_outputs) +The predicted classes. + +predict_log_proba(X)[source] +Predict class log-probabilities for X. + +The predicted class log-probabilities of an input sample is computed as the log of the mean predicted class probabilities of the trees in the forest. + +Parameters +X{array-like, sparse matrix} of shape (n_samples, n_features) +The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix. + +Returns +pndarray of shape (n_samples, n_classes), or a list of n_outputs +such arrays if n_outputs > 1. The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_. + +predict_proba(X)[source] +Predict class probabilities for X. + +The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. 
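A quick usage sketch for the methods documented here (apply, decision_path, predict_proba); it reuses the make_classification data and the max_depth=2 forest from the example at the top of this file, with the data setup repeated so the snippet runs on its own. This is only an illustration, not part of the upstream scikit-learn documentation:

# Usage sketch: inspecting a fitted RandomForestClassifier (same toy data as the example above)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = RandomForestClassifier(max_depth=2, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:5])                   # (5, n_classes); each row sums to 1
log_proba = clf.predict_log_proba(X[:5])           # log of the probabilities above
leaves = clf.apply(X[:5])                          # (5, n_estimators): leaf index reached in each tree
indicator, n_nodes_ptr = clf.decision_path(X[:5])  # sparse CSR node-indicator matrix
print(proba.shape, leaves.shape, indicator.shape)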
The class probability of a single tree is the fraction of samples of the same class in a leaf. + +Parameters +X{array-like, sparse matrix} of shape (n_samples, n_features) +The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix. + +Returns +pndarray of shape (n_samples, n_classes), or a list of n_outputs +such arrays if n_outputs > 1. The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_. + +score(X, y, sample_weight=None)[source] +Return the mean accuracy on the given test data and labels. + +In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. \ No newline at end of file diff --git a/Step1&2.txt b/Step1&2.txt new file mode 100644 index 0000000..0ee0077 --- /dev/null +++ b/Step1&2.txt @@ -0,0 +1,119 @@ +step 1: +import numpy as np +import pandas as pd +import matplotlib.pyplot as plt +import seaborn as sns +%matplotlib inline +import scipy.stats as stats +import warnings +warnings.filterwarnings("ignore") + + +from google.colab import files +uploaded = files.upload() +df = pd.read_csv(io.BytesIO(uploaded['Bank_Personal_Loan_Modelling.csv'])) + + + +Importing Data +Use these commands to import data from a variety of different sources and formats. + +pd.read_csv(filename) | From a CSV file +pd.read_table(filename) | From a delimited text file (like TSV) +pd.read_excel(filename) | From an Excel file +pd.read_sql(query, connection_object) | Read from a SQL table/database +pd.read_json(json_string) | Read from a JSON formatted string, URL or file. +pd.read_html(url) | Parses an html URL, string or file and extracts tables to a list of dataframes +pd.read_clipboard() | Takes the contents of your clipboard and passes it to read_table() +pd.DataFrame(dict) | From a dict, keys for columns names, values for data as lists + +Exporting Data +Use these commands to export a DataFrame to CSV, .xlsx, SQL, or JSON. + +df.to_csv(filename) | Write to a CSV file +df.to_excel(filename) | Write to an Excel file +df.to_sql(table_name, connection_object) | Write to a SQL table +df.to_json(filename) | Write to a file in JSON format + +Create Test Objects +These commands can be useful for creating test segments. + +pd.DataFrame(np.random.rand(20,5)) | 5 columns and 20 rows of random floats +pd.Series(my_list) | Create a series from an iterable my_list +df.index = pd.date_range('1900/1/30', periods=df.shape[0]) | Add a date index + +Viewing/Inspecting Data +Use these commands to take a look at specific sections of your pandas DataFrame or Series. + +df.head(n) | First n rows of the DataFrame +df.tail(n) | Last n rows of the DataFrame +df.shape | Number of rows and columns +df.info() | Index, Datatype and Memory information +df.describe() | Summary statistics for numerical columns +s.value_counts(dropna=False) | View unique values and counts +df.apply(pd.Series.value_counts) | Unique values and counts for all columns + +Selection +Use these commands to select a specific subset of your data. + +df[col] | Returns column with label col as Series +df[[col1, col2]] | Returns columns as a new DataFrame +s.iloc[0] | Selection by position +s.loc['index_one'] | Selection by index +df.iloc[0,:] | First row +df.iloc[0,0] | First element of first column + +Data Cleaning +Use these commands to perform a variety of data cleaning tasks. 
+ +df.columns = ['a','b','c'] | Rename columns +pd.isnull() | Checks for null values, returns a Boolean array +pd.notnull() | Opposite of pd.isnull() +df.dropna() | Drop all rows that contain null values +df.dropna(axis=1) | Drop all columns that contain null values +df.dropna(axis=1,thresh=n) | Drop all columns that have fewer than n non-null values +df.fillna(x) | Replace all null values with x +s.fillna(s.mean()) | Replace all null values with the mean (mean can be replaced with almost any function from the statistics module) +s.astype(float) | Convert the datatype of the series to float +s.replace(1,'one') | Replace all values equal to 1 with 'one' +s.replace([1,3],['one','three']) | Replace all 1 with 'one' and 3 with 'three' +df.rename(columns=lambda x: x + 1) | Mass renaming of columns +df.rename(columns={'old_name': 'new_name'}) | Selective renaming +df.set_index('column_one') | Change the index +df.rename(index=lambda x: x + 1) | Mass renaming of index + +Filter, Sort, and Groupby +Use these commands to filter, sort, and group your data. + +df[df[col] > 0.5] | Rows where the column col is greater than 0.5 +df[(df[col] > 0.5) & (df[col] < 0.7)] | Rows where 0.7 > col > 0.5 +df.sort_values(col1) | Sort values by col1 in ascending order +df.sort_values(col2,ascending=False) | Sort values by col2 in descending order +df.sort_values([col1,col2],ascending=[True,False]) | Sort values by col1 in ascending order then col2 in descending order +df.groupby(col) | Returns a groupby object for values from one column +df.groupby([col1,col2]) | Returns a groupby object for values from multiple columns +df.groupby(col1)[col2].mean() | Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics module) +df.pivot_table(index=col1,values=[col2,col3],aggfunc=np.mean) | Create a pivot table that groups by col1 and calculates the mean of col2 and col3 +df.groupby(col1).agg(np.mean) | Find the average across all columns for every unique col1 group +df.apply(np.mean) | Apply the function np.mean() across each column +df.apply(np.max,axis=1) | Apply the function np.max() across each row + +Join/Combine +Use these commands to combine multiple dataframes into a single one. + +df1.append(df2) | Append the rows of df2 to the end of df1 (columns should be identical) +pd.concat([df1, df2],axis=1) | Add the columns of df2 to the end of df1 (rows should be identical) +df1.join(df2,on=col1,how='inner') | SQL-style join of the columns in df1 with the columns of df2 where the rows for col1 have identical values. 'how' can be one of 'left', 'right', 'outer', 'inner' + +Statistics +Use these commands to compute summary statistics. (These can all be applied to a Series as well.) + + +df.describe() | Summary statistics for numerical columns +df.mean() | Returns the mean of all columns +df.corr() | Returns the correlation between columns in a DataFrame +df.count() | Returns the number of non-null values in each DataFrame column +df.max() | Returns the highest value in each column +df.min() | Returns the lowest value in each column +df.median() | Returns the median of each column +df.std() | Returns the standard deviation of each column \ No newline at end of file diff --git a/TrainingModel.txt b/TrainingModel.txt new file mode 100644 index 0000000..0798803 --- /dev/null +++ b/TrainingModel.txt @@ -0,0 +1,75 @@ +Gaussian Naive Bayes +GaussianNB implements the Gaussian Naive Bayes algorithm for classification.
The likelihood of the features is assumed to be Gaussian: + +P(x_i | y) = 1 / sqrt(2 * pi * sigma_y^2) * exp(-(x_i - mu_y)^2 / (2 * sigma_y^2)) + +The parameters sigma_y and mu_y are estimated using maximum likelihood. + +>>> +>>> from sklearn.datasets import load_iris +>>> from sklearn.model_selection import train_test_split +>>> from sklearn.naive_bayes import GaussianNB +>>> X, y = load_iris(return_X_y=True) +>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0) +>>> gnb = GaussianNB() +>>> y_pred = gnb.fit(X_train, y_train).predict(X_test) +>>> print("Number of mislabeled points out of a total %d points : %d" +... % (X_test.shape[0], (y_test != y_pred).sum())) +Number of mislabeled points out of a total 75 points : 4 + + +GPR with noise-level estimation +This example illustrates that GPR with a sum-kernel including a WhiteKernel can estimate the noise level of the data. An illustration of the log-marginal-likelihood (LML) landscape shows that there exist two local maxima of the LML. + +../_images/sphx_glr_plot_gpr_noisy_0011.png +The first corresponds to a model with a high noise level and a large length scale, which explains all variations in the data by noise. + +../_images/sphx_glr_plot_gpr_noisy_0021.png +The second one has a smaller noise level and a shorter length scale, which explains most of the variation by the noise-free functional relationship. The second model has a higher likelihood; however, depending on the initial value for the hyperparameters, the gradient-based optimization might also converge to the high-noise solution. It is thus important to repeat the optimization several times for different initializations. + +../_images/sphx_glr_plot_gpr_noisy_0031.png +1.7.2.2. Comparison of GPR and Kernel Ridge Regression +Both kernel ridge regression (KRR) and GPR learn a target function by internally employing the “kernel trick”. KRR learns a linear function in the space induced by the respective kernel, which corresponds to a non-linear function in the original space. The linear function in the kernel space is chosen based on the mean-squared error loss with ridge regularization. GPR uses the kernel to define the covariance of a prior distribution over the target functions and uses the observed training data to define a likelihood function. Based on Bayes’ theorem, a (Gaussian) posterior distribution over target functions is defined, whose mean is used for prediction. + +A major difference is that GPR can choose the kernel’s hyperparameters based on gradient ascent on the marginal likelihood function, while KRR needs to perform a grid search on a cross-validated loss function (mean-squared error loss). A further difference is that GPR learns a generative, probabilistic model of the target function and can thus provide meaningful confidence intervals and posterior samples along with the predictions, while KRR only provides predictions. + +The following figure illustrates both methods on an artificial dataset, which consists of a sinusoidal target function and strong noise. The figure compares the learned model of KRR and GPR based on an ExpSineSquared kernel, which is suited for learning periodic functions. The kernel’s hyperparameters control the smoothness (length_scale) and periodicity of the kernel (periodicity). Moreover, the noise level of the data is learned explicitly by GPR through an additional WhiteKernel component in the kernel and by the regularization parameter alpha of KRR. + +../_images/sphx_glr_plot_compare_gpr_krr_0011.png +The figure shows that both methods learn reasonable models of the target function.
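A rough sketch of that comparison setup (not the exact gallery script; the synthetic sinusoidal data and the small search grid below are assumptions) could look like this:

# Rough sketch of the GPR-vs-KRR comparison described above (illustrative data and grids).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, WhiteKernel
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = 15 * rng.rand(100, 1)
y = np.sin(X).ravel() + 0.5 * rng.randn(100)       # noisy sinusoidal target (assumed data)

# GPR: kernel hyperparameters and the noise level (WhiteKernel) are tuned by maximizing the LML.
gpr = GaussianProcessRegressor(
    kernel=ExpSineSquared(length_scale=1.0, periodicity=3.0) + WhiteKernel(noise_level=1.0),
    n_restarts_optimizer=5).fit(X, y)
y_gpr, y_std = gpr.predict(X, return_std=True)     # mean prediction plus predictive uncertainty

# KRR: same kernel family, but alpha and the kernel parameters come from a cross-validated grid search.
param_grid = {"alpha": [1e0, 1e-1, 1e-2],
              "kernel": [ExpSineSquared(l, p)
                         for l in [0.1, 1.0, 10.0]
                         for p in [1.0, 3.0, 10.0]]}
krr = GridSearchCV(KernelRidge(kernel=ExpSineSquared()), param_grid=param_grid).fit(X, y)
y_krr = krr.predict(X)

In this sketch the WhiteKernel term plays for GPR the role that alpha plays for KRR, which is why GPR can report an explicit noise level while KRR cannot.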
GPR correctly identifies the periodicity of the function to be roughly 2*pi (6.28), while KRR chooses the doubled periodicity 4*pi. Besides that, GPR provides reasonable confidence bounds on the prediction which are not available for KRR. A major difference between the two methods is the time required for fitting and predicting: while fitting KRR is fast in principle, the grid search for hyperparameter optimization scales exponentially with the number of hyperparameters (“curse of dimensionality”). The gradient-based optimization of the parameters in GPR does not suffer from this exponential scaling and is thus considerably faster on this example with a 3-dimensional hyperparameter space. The time for predicting is similar; however, generating the variance of the predictive distribution of GPR takes considerably longer than just predicting the mean. + +1.7.2.3. GPR on Mauna Loa CO2 data +This example is based on Section 5.4.3 of [RW2006]. It illustrates an example of complex kernel engineering and hyperparameter optimization using gradient ascent on the log-marginal-likelihood. The data consists of the monthly average atmospheric CO2 concentrations (in parts per million by volume (ppmv)) collected at the Mauna Loa Observatory in Hawaii, between 1958 and 1997. The objective is to model the CO2 concentration as a function of the time t. + +The kernel is composed of several terms that are responsible for explaining different properties of the signal: + +a long-term, smooth rising trend is to be explained by an RBF kernel. The RBF kernel with a large length-scale enforces this component to be smooth; it is not enforced that the trend is rising, which leaves this choice to the GP. The specific length-scale and the amplitude are free hyperparameters. + +a seasonal component, which is to be explained by the periodic ExpSineSquared kernel with a fixed periodicity of 1 year. The length-scale of this periodic component, controlling its smoothness, is a free parameter. In order to allow decaying away from exact periodicity, the product with an RBF kernel is taken. The length-scale of this RBF component controls the decay time and is a further free parameter. + +smaller, medium-term irregularities are to be explained by a RationalQuadratic kernel component, whose length-scale and alpha parameter, which determines the diffuseness of the length-scales, are to be determined. According to [RW2006], these irregularities can better be explained by a RationalQuadratic than by an RBF kernel component, probably because it can accommodate several length-scales. + +a “noise” term, consisting of an RBF kernel contribution, which shall explain the correlated noise components such as local weather phenomena, and a WhiteKernel contribution for the white noise. The relative amplitudes and the RBF’s length scale are further free parameters. + +Maximizing the log-marginal-likelihood after subtracting the target’s mean yields the following kernel with an LML of -83.214: + +34.4**2 * RBF(length_scale=41.8) ++ 3.27**2 * RBF(length_scale=180) * ExpSineSquared(length_scale=1.44, + periodicity=1) ++ 0.446**2 * RationalQuadratic(alpha=17.7, length_scale=0.957) ++ 0.197**2 * RBF(length_scale=0.138) + WhiteKernel(noise_level=0.0336) +Thus, most of the target signal (34.4ppm) is explained by a long-term rising trend (length-scale 41.8 years). The periodic component has an amplitude of 3.27ppm, a decay time of 180 years and a length-scale of 1.44. The long decay time indicates that the seasonal component is locally very close to periodic.
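Schematically, such a composite kernel can be written down directly with the kernel classes. In the sketch below, X_years and y_ppmv are synthetic stand-ins for the Mauna Loa series (the real data loading is not shown), and the starting hyperparameters are rough guesses rather than the tuned values quoted above:

# Schematic composition of the CO2 kernel described above; synthetic stand-in data, guessed start values.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (RBF, ExpSineSquared,
                                              RationalQuadratic, WhiteKernel)

X_years = np.arange(1958, 1998, 1 / 12).reshape(-1, 1)        # placeholder monthly time axis
y_ppmv = (315 + 1.5 * (X_years.ravel() - 1958)
          + 3 * np.sin(2 * np.pi * X_years.ravel()))          # synthetic stand-in for the CO2 series

k1 = 50.0**2 * RBF(length_scale=50.0)                         # long-term smooth rising trend
k2 = (2.0**2 * RBF(length_scale=100.0)
      * ExpSineSquared(length_scale=1.0, periodicity=1.0,
                       periodicity_bounds="fixed"))           # seasonal term, allowed to decay
k3 = 0.5**2 * RationalQuadratic(alpha=1.0, length_scale=1.0)  # medium-term irregularities
k4 = (0.1**2 * RBF(length_scale=0.1)
      + WhiteKernel(noise_level=0.1**2))                      # correlated noise + white noise
kernel = k1 + k2 + k3 + k4

# normalize_y=True subtracts the target mean, matching the "after subtracting the target's mean" step above.
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_years, y_ppmv)
print(gp.kernel_)                                             # fitted kernel, comparable to the expression above
print(gp.log_marginal_likelihood(gp.kernel_.theta))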
The correlated noise has an amplitude of 0.197ppm with a length scale of 0.138 years and a white-noise contribution of 0.197ppm. Thus, the overall noise level is very small, indicating that the data can be very well explained by the model. The figure also shows that the model makes very confident predictions until around 2015. + +../_images/sphx_glr_plot_gpr_co2_0011.png +1.7.3. Gaussian Process Classification (GPC) +The GaussianProcessClassifier implements Gaussian processes (GP) for classification purposes, more specifically for probabilistic classification, where test predictions take the form of class probabilities. GaussianProcessClassifier places a GP prior on a latent function f, which is then squashed through a link function to obtain the probabilistic classification. The latent function f is a so-called nuisance function, whose values are not observed and are not relevant by themselves. Its purpose is to allow a convenient formulation of the model, and f is removed (integrated out) during prediction. GaussianProcessClassifier implements the logistic link function, for which the integral cannot be computed analytically but is easily approximated in the binary case. + +In contrast to the regression setting, the posterior of the latent function f is not Gaussian even for a GP prior since a Gaussian likelihood is inappropriate for discrete class labels. Rather, a non-Gaussian likelihood corresponding to the logistic link function (logit) is used. GaussianProcessClassifier approximates the non-Gaussian posterior with a Gaussian based on the Laplace approximation. More details can be found in Chapter 3 of [RW2006]. + +The GP prior mean is assumed to be zero. The prior’s covariance is specified by passing a kernel object. The hyperparameters of the kernel are optimized during fitting of GaussianProcessClassifier by maximizing the log-marginal-likelihood (LML) based on the passed optimizer. As the LML may have multiple local optima, the optimizer can be started repeatedly by specifying n_restarts_optimizer. The first run is always conducted starting from the initial hyperparameter values of the kernel; subsequent runs are conducted from hyperparameter values that have been chosen randomly from the range of allowed values. If the initial hyperparameters should be kept fixed, None can be passed as optimizer. + +GaussianProcessClassifier supports multi-class classification by performing either one-versus-rest or one-versus-one based training and prediction. In one-versus-rest, one binary Gaussian process classifier is fitted for each class, which is trained to separate this class from the rest. In “one_vs_one”, one binary Gaussian process classifier is fitted for each pair of classes, which is trained to separate these two classes. The predictions of these binary predictors are combined into multi-class predictions. See the section on multi-class classification for more details. + +In the case of Gaussian process classification, “one_vs_one” might be computationally cheaper since it has to solve many problems involving only a subset of the whole training set rather than fewer problems on the whole dataset. Since Gaussian process classification scales cubically with the size of the dataset, this might be considerably faster. However, note that “one_vs_one” does not support predicting probability estimates but only plain predictions. Moreover, note that GaussianProcessClassifier does not (yet) implement a true multi-class Laplace approximation internally; as discussed above, it instead solves several binary classification tasks, which are combined using one-versus-rest or one-versus-one. \ No newline at end of file
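A minimal sketch of the two multi-class strategies discussed above, on the iris data (illustrative kernel and settings, not taken from the original text):

# Sketch: multi-class GPC with the two strategies discussed above (illustrative settings).
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

X, y = load_iris(return_X_y=True)

gpc_ovr = GaussianProcessClassifier(kernel=1.0 * RBF(1.0),
                                    multi_class="one_vs_rest").fit(X, y)
print(gpc_ovr.predict_proba(X[:3]))   # class probabilities are available

gpc_ovo = GaussianProcessClassifier(kernel=1.0 * RBF(1.0),
                                    multi_class="one_vs_one").fit(X, y)
print(gpc_ovo.predict(X[:3]))         # plain class predictions only; predict_proba is not supported here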