ClassifAI_ 2 - KNN Part 2
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"ClassifAI: 2 - KNN Part 2","provenance":[{"file_id":"1aVu1mFabhsDag5dDR3_B_RZtvZnOw-hF","timestamp":1653851833704}],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[
{"cell_type":"markdown","metadata":{"id":"JYs4rkgADcRi"},"source":["### KNN (k-nearest neighbors) is a classification technique: the model predicts the class of an input from the labels of the training datapoints closest to it\n","In this section we go over a high-level version of running a KNN model with sklearn, which handles the distance calculations and neighbor voting for us."]},
{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"PCd0keB8C9GB","executionInfo":{"status":"ok","timestamp":1660432220967,"user_tz":420,"elapsed":2750,"user":{"displayName":"Leo Huang","userId":"16558901284710269921"}},"outputId":"d534793f-a5e5-49bb-f10b-6c96e16ab2af"},"source":["from sklearn.neighbors import KNeighborsClassifier\n","from sklearn.model_selection import train_test_split\n","import pandas as pd\n","\n","# Load the Iris dataset: 150 flowers, 4 measurements each, 3 species\n","url = \"https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv\"\n","names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']\n","dataset = pd.read_csv(url, names=names)\n","dataset.head()"],"execution_count":1,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" sepal_length sepal_width petal_length petal_width class\n","0 5.1 3.5 1.4 0.2 Iris-setosa\n","1 4.9 3.0 1.4 0.2 Iris-setosa\n","2 4.7 3.2 1.3 0.2 Iris-setosa\n","3 4.6 3.1 1.5 0.2 Iris-setosa\n","4 5.0 3.6 1.4 0.2 Iris-setosa"]},"metadata":{},"execution_count":1}]},
{"cell_type":"markdown","metadata":{"id":"0QphJgahAwNd"},"source":["`X` is an array with one row per flower. Each row holds 4 floating-point measurements: sepal length, sepal width, petal length, and petal width.\n","`y` is the class (species) of each flower.\n","\n","In pandas you select by integer position through `.iloc`, which takes NumPy-style `[rows, columns]` slices. `iloc[:, :-1]` means every row and every column except the last (the four measurements), while `iloc[:, 4]` means every row of column 4 (the class). `.values` converts the selection into a NumPy array.\n","\n","HINT FOR HW: THE NUMBER OF ROWS IN X SHOULD BE THE SAME AS THE NUMBER OF VALUES IN y (ONE CLASS PER FLOWER)"]},
{"cell_type":"code","metadata":{"id":"sCNklkn9w2Hf","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1660432283583,"user_tz":420,"elapsed":185,"user":{"displayName":"Leo Huang","userId":"16558901284710269921"}},"outputId":"73d140f0-8c7a-49e4-d02c-8782b026dc58"},"source":["# Features: all rows, every column except the last\n","X = dataset.iloc[:, :-1].values\n","# Labels: all rows, column 4 (the class)\n","y = dataset.iloc[:, 4].values\n","print(X[0:5])"],"execution_count":3,"outputs":[{"output_type":"stream","name":"stdout","text":["[[5.1 3.5 1.4 0.2]\n"," [4.9 3. 1.4 0.2]\n"," [4.7 3.2 1.3 0.2]\n"," [4.6 3.1 1.5 0.2]\n"," [5. 3.6 1.4 0.2]]\n"]}]},
{"cell_type":"markdown","metadata":{"id":"GlWVey-vBDVO"},"source":["Now we split the dataset with sklearn's `train_test_split`. It takes `X` (the measurements), `y` (the classes), and `test_size`, which is the fraction of the data held out for testing. The rest is used for training. With `test_size = 0.40`, 40% of the 150 flowers (60) go to the test set and the other 90 go to the training set. The rows are shuffled randomly before splitting, so each run produces a different split unless you also pass `random_state`."]},
{"cell_type":"code","metadata":{"id":"7xIoDR8LxgKs","executionInfo":{"status":"ok","timestamp":1660432284401,"user_tz":420,"elapsed":4,"user":{"displayName":"Leo Huang","userId":"16558901284710269921"}}},"source":["X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)"],"execution_count":4,"outputs":[]},
{"cell_type":"markdown","metadata":{"id":"zPla0GbqBoED"},"source":["Creating a model in sklearn is as simple as constructing a `KNeighborsClassifier`. Its `n_neighbors` argument sets k, the number of nearby training points that vote on each prediction. For KNN, `fit` mostly amounts to storing the training data so it can be searched at prediction time."]},
{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"etEUgoc6xjts","executionInfo":{"status":"ok","timestamp":1660432289438,"user_tz":420,"elapsed":139,"user":{"displayName":"Leo Huang","userId":"16558901284710269921"}},"outputId":"c53f6255-4aed-4679-c3da-f02f115495cb"},"source":["model = KNeighborsClassifier(n_neighbors=9)\n","model.fit(X_train, y_train)"],"execution_count":5,"outputs":[{"output_type":"execute_result","data":{"text/plain":["KNeighborsClassifier(n_neighbors=9)"]},"metadata":{},"execution_count":5}]},
{"cell_type":"markdown","metadata":{"id":"0MubGuveB0JP"},"source":["`model.predict` classifies every flower in `X_test`, and `y_pred` stores those predicted classes so we can compare them with the true classes in `y_test`.\n"]},
{"cell_type":"code","metadata":{"id":"cIpXisfexpIr","executionInfo":{"status":"ok","timestamp":1660432309111,"user_tz":420,"elapsed":157,"user":{"displayName":"Leo Huang","userId":"16558901284710269921"}}},"source":["y_pred = model.predict(X_test)"],"execution_count":6,"outputs":[]},
{"cell_type":"markdown","metadata":{"id":"eSxe_6LdDHc4"},"source":["The classification report lists precision, recall, and F1 score for each species. The accuracy score is the fraction of test flowers classified correctly. On this run the model got 95% of the 60 test flowers right, which suggests it fits the data well. Because the split is random, rerunning the notebook will give somewhat different numbers. Try playing around and see whether changing the number of neighbors drastically changes accuracy."]},
{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"qF-34q2ixpzu","executionInfo":{"status":"ok","timestamp":1660432310693,"user_tz":420,"elapsed":146,"user":{"displayName":"Leo Huang","userId":"16558901284710269921"}},"outputId":"b176ac0a-17ab-4f35-db45-25bf29a3f935"},"source":["from sklearn.metrics import classification_report, accuracy_score\n","\n","# Per-class precision, recall, and F1 on the test set\n","report = classification_report(y_test, y_pred)\n","print(\"Classification Report:\")\n","print(report)\n","\n","# Fraction of test flowers whose class was predicted correctly\n","accuracy = accuracy_score(y_test, y_pred)\n","print(\"Accuracy:\", accuracy)"],"execution_count":7,"outputs":[{"output_type":"stream","name":"stdout","text":["Classification Report:\n"," precision recall f1-score support\n","\n"," Iris-setosa 1.00 1.00 1.00 19\n","Iris-versicolor 0.94 0.89 0.92 19\n"," Iris-virginica 0.91 0.95 0.93 22\n","\n"," accuracy 0.95 60\n"," macro avg 0.95 0.95 0.95 60\n"," weighted avg 0.95 0.95 0.95 60\n","\n","Accuracy: 0.95\n"]}]},
{"cell_type":"code","source":[""],"metadata":{"id":"ZJSfVGZs5PWN"},"execution_count":null,"outputs":[]}]}
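The notebook above relies on sklearn's `KNeighborsClassifier` to do the neighbor search. The sketch below shows the basic idea behind it in plain Python:

1. Measure the Euclidean distance from the input to every training point.
2. Keep the k closest points.
3. Take a majority vote of their labels.

This is a simplified illustration, not sklearn's implementation. sklearn uses optimized search structures and may break ties differently. The helper `knn_predict` and the two toy clusters are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest training points.

    Hypothetical helper for illustration; not part of sklearn.
    """
    # Distance from the query to every training point, paired with that point's label
    scored = [(dist(point, query), label) for point, label in zip(train_X, train_y)]
    # Sort by distance and keep the k closest neighbours
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Count the neighbours' labels; on a tie, Counter keeps the label encountered
    # first, i.e. the tied label whose neighbour is closest to the query
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Two made-up clusters of 2-D points, labelled "a" and "b"
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8]]
train_y = ["a", "a", "a", "b", "b"]

print(knn_predict(train_X, train_y, [1.1, 0.9], k=3))  # a
print(knn_predict(train_X, train_y, [4.9, 5.1], k=3))  # b
```

In the second call, the two "b" points are the closest, and the third neighbour is an "a" point. "b" still wins the vote 2 to 1.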
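The `dataset.iloc[:, :-1]` and `dataset.iloc[:, 4]` slices used to build `X` and `y` can be checked on a small frame. This minimal sketch retypes the five rows printed by `dataset.head()`, so it needs no download. The names `small`, `X_small`, and `y_small` are illustrative. The final check confirms the homework hint: `X` and `y` have one entry per flower.

```python
import pandas as pd

# The five rows shown by dataset.head() in the notebook, typed in by hand
names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
rows = [
    [5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
    [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
    [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
    [4.6, 3.1, 1.5, 0.2, 'Iris-setosa'],
    [5.0, 3.6, 1.4, 0.2, 'Iris-setosa'],
]
small = pd.DataFrame(rows, columns=names)

# All rows, every column except the last -> a (5, 4) array of measurements
X_small = small.iloc[:, :-1].values
# All rows, the column at position 4 -> a length-5 array of class labels
y_small = small.iloc[:, 4].values

print(X_small.shape)  # (5, 4)
print(y_small.shape)  # (5,)

# The homework hint: one class label per row of measurements
assert X_small.shape[0] == y_small.shape[0]
```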
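The notebook suggests changing the number of neighbors to see how accuracy responds. The sketch below runs that experiment, with two assumptions that differ from the notebook:

- It loads scikit-learn's bundled copy of the Iris data with `load_iris`. This runs offline, and the species are encoded as 0, 1, 2 instead of strings.
- It passes `random_state=0` and `stratify=y` to `train_test_split`. Every k is then scored on the same split, with equal class proportions.

The accuracies it prints will not necessarily match the 0.95 shown above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Bundled Iris data: X is (150, 4), y holds species codes 0, 1, 2
X, y = load_iris(return_X_y=True)

# Same 60/40 train/test proportions as the notebook, but a fixed, stratified split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, random_state=0, stratify=y
)

# Fit and score one model per odd k from 1 to 25
accuracies = {}
for k in range(1, 26, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    accuracies[k] = accuracy_score(y_test, model.predict(X_test))

for k, acc in accuracies.items():
    print(f"k={k:2d}  accuracy={acc:.3f}")

# k with the highest test accuracy (the smallest such k if several tie)
best_k = max(accuracies, key=accuracies.get)
print("Best k on this split:", best_k)
```

Choosing k by its test-set accuracy, as here, can make the model look better than it will on new data. Scoring each candidate k with cross-validation on the training set is a more careful alternative.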