<!DOCTYPE HTML>
<!--
Massively by HTML5 UP
html5up.net | @ajlkn
Free for personal and commercial use under the CCA 3.0 license (html5up.net/license)
-->
<html>
<head>
<title>Predicting Insurance Costs</title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no" />
<link rel="stylesheet" href="assets/css/main.css" />
<noscript><link rel="stylesheet" href="assets/css/noscript.css" /></noscript>
</head>
<body class="is-preload">
<!-- Wrapper -->
<div id="wrapper">
<!-- Header -->
<header id="header">
<a href="" class="logo">Python Project</a>
</header>
<!-- Nav -->
<nav id="nav">
<ul class="links">
<li><a href="index.html">HOME</a></li>
<!-- <li class="active"><a href="generic.html">Generic Page</a></li>
<li><a href="elements.html">Elements Reference</a></li> -->
</ul>
<ul class="icons">
<!-- <li><a href="#" class="icon brands fa-twitter"><span class="label">Twitter</span></a></li>
<li><a href="#" class="icon brands fa-facebook-f"><span class="label">Facebook</span></a></li> -->
<li><a href="https://www.linkedin.com/in/mcbrownmwale" target="_blank" class="icon brands fa-linkedin"><span class="label">LinkedIn</span></a></li>
<li><a href="https://github.com/mcbrownmwale" target="_blank" class="icon brands fa-github"><span class="label">GitHub</span></a></li>
</ul>
</nav>
<!-- Main -->
<div id="main">
<!-- Post -->
<section class="post">
<header class="major">
<span class="date">February 01, 2025</span>
<h2>Predicting Insurance Costs</h2>
<p><b>Tools</b>: Python (pandas, NumPy, Seaborn, scikit-learn)</p>
</header>
<hr style="height: 2px; border-width: 0; color:rgba(190, 188, 188, 0.986); background-color: rgba(190, 188, 188, 0.986);">
<!--<div class="image main"><img src="images/analytics 1.jfif" alt="" /></div> -->
<h3>1. Introduction</h3>
<p>This project aims to predict individual medical insurance costs using the Medical
Costs dataset from Kaggle. The dataset contains demographic and personal
characteristics associated with insurance charges. The primary objective is to
build a predictive model that accurately estimates these costs based on available
patient data.</p>
<h4>1.1 Problem Statement</h4>
<p>We wanted to understand how these characteristics relate to the total medical cost,
and to build a predictive model that estimates that cost from the information
available about a patient.
</p>
<hr style="height: 2px; border-width: 0; color:rgba(190, 188, 188, 0.986); background-color: rgba(190, 188, 188, 0.986);">
<h3>2. Methodology</h3>
<h4>2.1 Data Exploration and Preparation</h4>
<h4><em>Checking for Missing Data Points</em></h4>
<p>A review of the dataset confirmed that there were no missing values, so no imputation or row removal was needed.</p>
<h4><em>Identifying and Removing Duplicates</em></h4>
<p>One duplicate record was found and removed to maintain data integrity.</p>
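<p>The two checks above can be sketched with pandas as follows. This is a minimal illustration rather than the notebook's exact code: the small data frame is a made-up sample that only borrows the dataset's column names, and the variable names are assumptions.</p>

```python
import pandas as pd

# Made-up sample rows using the dataset's column names (not the real data).
df = pd.DataFrame({
    "age": [19, 18, 28, 19],
    "sex": ["female", "male", "male", "female"],
    "bmi": [27.9, 33.77, 33.0, 27.9],
    "children": [0, 1, 3, 0],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "southeast", "southeast", "southwest"],
    "charges": [16884.924, 1725.5523, 4449.462, 16884.924],
})

# Count missing values per column; all zeros means the data is complete.
missing_per_column = df.isna().sum()

# Count fully duplicated rows, then drop them (the first occurrence is kept).
n_duplicates = int(df.duplicated().sum())
df_clean = df.drop_duplicates().reset_index(drop=True)

print(missing_per_column)
print(f"duplicates removed: {n_duplicates}, rows left: {len(df_clean)}")
```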
<h4><em>Correlation Analysis</em></h4>
<p style="line-height: 0%;">A correlation analysis revealed that:</p>
<ul>
<li>Age and insurance charges had a moderate positive correlation (r ≈ 0.30).</li>
<li>BMI and number of children had weak correlations with insurance charges.</li>
<li>The distribution of charges was right-skewed, prompting a log transformation to make it more symmetric.</li>
<!--<li>-</li>
<li>-</li>-->
</ul>
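<p>A correlation and skewness check of this kind might look like the sketch below. The data is synthetic and generated only for illustration, so the printed numbers will not match the figures reported above.</p>

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the numeric columns (assumption: not the real data).
rng = np.random.default_rng(0)
n = 500
age = rng.integers(18, 65, size=n)
bmi = rng.normal(30, 6, size=n)
children = rng.integers(0, 6, size=n)
charges = np.exp(7.0 + 0.03 * age + rng.normal(0, 0.5, size=n))
df = pd.DataFrame({"age": age, "bmi": bmi, "children": children, "charges": charges})

# Pearson correlation of each numeric feature with charges.
corr_with_charges = df.corr()["charges"].drop("charges")

# Positive skewness indicates a long right tail.
charges_skew = df["charges"].skew()

print(corr_with_charges.round(2))
print(f"skewness of charges: {charges_skew:.2f}")
```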
<h4><em>Feature Engineering</em></h4>
<ul>
<li>A new column, <b>log_charges</b>, was created by applying a logarithmic transformation to the <b>charges</b> column.</li>
<li>The categorical <b>smoker</b> variable was converted into a binary format (1 for smokers, 0 for non-smokers).</li>
<!--<li>-</li>
<li>-</li>
<li>-</li>-->
</ul>
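<p>The two feature engineering steps could be written roughly as follows. The natural logarithm is an assumption, since the base of the transformation is not stated here, and the sample rows are made up. The resulting indicator column is named <b>smoker_yes</b>, matching the feature name used in the model.</p>

```python
import numpy as np
import pandas as pd

# Made-up sample rows (not the real data).
df = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "smoker": ["yes", "no", "no", "no"],
    "charges": [16884.92, 1725.55, 4449.46, 21984.47],
})

# Assumption: natural log; the base used in the project is not stated.
df["log_charges"] = np.log(df["charges"])

# One-hot encode smoker and keep only the "yes" indicator
# (1 for smokers, 0 for non-smokers).
df = pd.get_dummies(df, columns=["smoker"], drop_first=True, dtype=int)

print(df[["age", "smoker_yes", "log_charges"]])
```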
<h4>2.2 Model Building</h4>
<h4><em>Data Splitting</em></h4>
<ul>
<li>The dataset was split into training <b>(80%)</b> and testing <b>(20%)</b> sets.</li>
<li>Features selected for the model: <b>age</b> and <b>smoker_yes</b>.</li>
</ul>
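<p>An 80/20 split on the two selected features can be sketched with scikit-learn's <b>train_test_split</b>. The data is synthetic, and the target column and <b>random_state</b> value are assumptions made for the example.</p>

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data with the two selected features (assumption: not the real data).
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "smoker_yes": rng.integers(0, 2, size=n),
})
df["log_charges"] = (
    7.0 + 0.03 * df["age"] + 1.5 * df["smoker_yes"] + rng.normal(0, 0.4, size=n)
)

X = df[["age", "smoker_yes"]]
y = df["log_charges"]

# Hold out 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
```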
<h4><em>Selection and Training</em></h4>
<p>A <b>Linear Regression</b> model was trained using the selected features.</p>
<h4><em>Model Evaluation</em></h4>
<ul>
<li><b>R-Squared value (Goodness of fit)</b>
<ul>
<li>Training data: <b>72.19%</b></li>
<li>Testing data: <b>79.97%</b></li>
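</ul>
</li>
<li>The model explains about 72% of the variance in the target on the training data and about 80% on the testing data.
The comparable test score suggests that the model generalizes to unseen records rather than overfitting.</li>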
</ul>
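<p>Training and scoring the model might look like the sketch below. It runs on synthetic data with hypothetical coefficients, so the printed R-squared values are illustrative and are not the project's results.</p>

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data generated from hypothetical coefficients (not the real data).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "smoker_yes": rng.integers(0, 2, size=n),
})
df["log_charges"] = (
    7.0 + 0.03 * df["age"] + 1.5 * df["smoker_yes"] + rng.normal(0, 0.4, size=n)
)

X_train, X_test, y_train, y_test = train_test_split(
    df[["age", "smoker_yes"]], df["log_charges"], test_size=0.2, random_state=42
)

# Fit ordinary least squares on the training rows only.
model = LinearRegression().fit(X_train, y_train)

# score() returns R-squared, the share of target variance explained.
r2_train = model.score(X_train, y_train)
r2_test = model.score(X_test, y_test)

print(f"R-squared (train): {r2_train:.4f}")
print(f"R-squared (test):  {r2_test:.4f}")
print(dict(zip(X_train.columns, model.coef_.round(3))))
```

<p>Comparing the two scores is a quick check for overfitting: a test score far below the training score would suggest the model has memorized the training rows.</p>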
<h4>2.3 Interpretation of Coefficients</h4>
<ul>
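<li><b>Age:</b> Each additional year of age is associated with roughly a <b>5%</b> increase in predicted insurance charges, holding smoking status constant.</li>
<li><b>Smoking:</b> Being a smoker is associated with an <b>822%</b> increase in predicted insurance charges relative to a non-smoker of the same age,
highlighting the impact of smoking on insurance costs.</li>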
</ul>
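<p>When the target is log-transformed, a coefficient is usually read as a percentage change by raising the log base to the coefficient and subtracting one. The helper below is a sketch of that conversion. The function name and the example coefficients are hypothetical and are not the project's fitted values. The result depends on the base of the logarithm applied to <b>charges</b>.</p>

```python
import math


def percent_change(coef, log_base=math.e):
    """Percent change in the original-scale target for a one-unit increase
    in a feature, given its coefficient in a log-target linear model."""
    return (log_base ** coef - 1.0) * 100.0


# Hypothetical coefficients, for illustration only.
print(round(percent_change(0.05), 2))             # 5.13
print(round(percent_change(1.0, log_base=2), 2))  # 100.0
```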
<hr style="height: 2px; border-width: 0; color:rgba(190, 188, 188, 0.986); background-color: rgba(190, 188, 188, 0.986);">
<h3>3. Conclusion</h3>
<p>This project successfully developed a regression model that predicts insurance costs based on age and
smoking status. While the model performs well, further improvements can be achieved by incorporating
additional features such as BMI and region. Additionally, exploring alternative machine learning
models could enhance predictive performance.</p>
<!--<p><strong>Recommendations</strong></p>
<p>For our special problem, we were interested in how these different characteristics relate
to the total medical cost. We constructed the best possible predictive model
for the cost given some information about the patient.<ol>
<li>-</li>
<li>-</li>
<li>-</li>
<li>-</li>
<li>-</li>
</ol></p>-->
<p style="font-weight: bold; text-align: center;">For more details, please refer to the full analysis by clicking on the following button:</p>
<ul class="actions special">
<li><a href="https://github.com/mcbrownmwale/Data_Analytics_Project_3/blob/main/Predicting_Insurance_Costs.ipynb" class="button" target="_blank" style="background-color: rgb(0, 204, 255);">Go To Analysis</a></li>
</ul>
<hr style="height: 2px; border-width: 0; color:rgba(190, 188, 188, 0.986); background-color: rgba(190, 188, 188, 0.986);">
</section>
</div>
<!-- Footer -->
<footer id="footer">
<!--<section>
<form method="post" action="#">
<div class="fields">
<div class="field">
<label for="name">Name</label>
<input type="text" name="name" id="name" />
</div>
<div class="field">
<label for="email">Email</label>
<input type="text" name="email" id="email" />
</div>
<div class="field">
<label for="message">Message</label>
<textarea name="message" id="message" rows="3"></textarea>
</div>
</div>
<ul class="actions">
<li><input type="submit" value="Send Message" /></li>
</ul>
</form>
</section>-->
<section class="split contact">
<section class="alt">
<h3>Address</h3>
<p>Kasiwa Academy<br />
Lilongwe, Malawi</p>
</section>
<section>
<h3>Phone</h3>
<p><a href="#">+265 991 149 241 <br /> +265 888 177 387</a></p>
</section>
<section>
<h3>Email</h3>
<p><a href="#">[email protected]</a></p>
</section>
<section>
<h3>Social</h3>
<ul class="icons alt">
<!--<li><a href="#" class="icon brands alt fa-twitter"><span class="label">Twitter</span></a></li>
<li><a href="#" class="icon brands alt fa-facebook-f"><span class="label">Facebook</span></a></li>-->
<li><a href="https://www.linkedin.com/in/mcbrownmwale" target="_blank" class="icon brands alt fa-linkedin"><span class="label">LinkedIn</span></a></li>
<li><a href="https://github.com/mcbrownmwale" target="_blank" class="icon brands alt fa-github"><span class="label">GitHub</span></a></li>
</ul>
</section>
</section>
</footer>
<!-- Copyright
<div id="copyright">
<ul><li>© Untitled</li><li>Design: <a href="https://html5up.net">HTML5 UP</a></li></ul>
</div> -->
</div>
<!-- Scripts -->
<script src="assets/js/jquery.min.js"></script>
<script src="assets/js/jquery.scrollex.min.js"></script>
<script src="assets/js/jquery.scrolly.min.js"></script>
<script src="assets/js/browser.min.js"></script>
<script src="assets/js/breakpoints.min.js"></script>
<script src="assets/js/util.js"></script>
<script src="assets/js/main.js"></script>
</body>
</html>