-
Notifications
You must be signed in to change notification settings - Fork 0
/
430276677.html
153 lines (140 loc) · 42.6 KB
/
430276677.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
<!DOCTYPE html>
<html>
<head>
<title>Di Luo : Di Luo's CS231 Project 7</title>
<link rel="stylesheet" href="styles/site.css" type="text/css" />
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body class="theme-default aui-theme-default">
<div id="page">
<div id="main" class="aui-page-panel">
<div id="main-header">
<div id="breadcrumb-section">
<ol id="breadcrumbs">
<li class="first">
<span><a href="index.html">Di Luo</a></span>
</li>
<li>
<span><a href="413631037.html">Di Luo’s Home</a></span>
</li>
<li>
<span><a href="474513827.html">Di Luo's CS231</a></span>
</li>
</ol>
</div>
<h1 id="title-heading" class="pagetitle">
<span id="title-text">
Di Luo : Di Luo's CS231 Project 7
</span>
</h1>
</div>
<div id="content" class="view">
<div class="page-metadata">
Created by <span class='author'> Di Luo</span>, last modified on Apr 24, 2019
</div>
<div id="main-content" class="wiki-content group">
<p><strong><u>Description</u></strong></p><p>In this project, I implemented Hashmap and<span style="color: rgb(34,51,68);"> examined the efficiency of my Hashmap and my BSTMap when counting words in the Reddit comment files. The Hashmap's basic structure is an array with a key value pair in each entry and the Hashmap has a hash function, which decides the index that a key is going to have. </span>Then in order to see which one of BSTMap or Hashmap is a more effective data structure, I used the Wordcounter from last project, which analyzed text files by storing each word and its frequency as a keyvalue pair in either a BSTMap or a Hashmap. I analyzed each of the reddit comments files five times with both BSTMap and Hashmap and got the average run time, total word count, unique word count, the depth of the BSTMap and the number of collisions in the Hashmap.</p><p>Finally I used the data to analyze the efficiency of BSTMap and Hashmap. I found that if the hashTable in Hashmap doubles the size when the table is full, which is the case in the <strong>Solution</strong>, then BSTMap is more efficient than Hashmap; but if the hashTable in Hashmap doubles the size when the table is half full, which is the case in Extension 1, then the Hashmap is much more efficient than BSTMap.</p><p><span style="color: rgb(34,51,68);"><br/></span></p><p><strong><u><span style="color: rgb(34,51,68);">Solution</span></u></strong></p><p><span style="color: rgb(34,51,68);">First, I changed the WordCounter class and enable it to use different data structures depending on the type indicator. With different type value, the word counter can use Hashmap or BSTMap to analyze the file.</span></p><p><span style="color: rgb(34,51,68);"><br/></span></p><p><span style="color: rgb(34,51,68);">Second, I enabled my <span style="color: rgb(34,51,68);">WordCounter main function to calculate the amount of time it takes to analyze a file.</span></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><br/></span></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);">Third, I <span style="color: rgb(34,51,68);">enabled the WordCounter to analyze the input file five times. Record each of the 5 times, drop the low time, drop the high time, and compute the mean of the remaining three. If you wish, automate the process for all eight of the Reddit files. At the end of the analysis, the program will print out the average run time, the total number of words, and the number of unique words.</span></span></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="500" src="attachments/430276677/430178361.png" data-image-src="attachments/430276677/430178361.png" data-unresolved-comment-count="0" data-linked-resource-id="430178361" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-21 at 8.13.22 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span></span></span></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><br/></span></span></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);">Fourth, I r</span></span></span>un the analysis using the BSTMap on all eight Reddit files and record the run-times, total words, total unique words, and the depth of my tree in a spreadsheet. The file size for each year was also added as a separate column. The result and the sheet are shown below.</p><p><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="600" src="attachments/430276677/430178363.png" data-image-src="attachments/430276677/430178363.png" data-unresolved-comment-count="0" data-linked-resource-id="430178363" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-21 at 8.42.08 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span></p><p><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="400" src="attachments/430276677/430178368.png" data-image-src="attachments/430276677/430178368.png" data-unresolved-comment-count="0" data-linked-resource-id="430178368" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-21 at 9.19.34 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span></p><p> </p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"> </span></span></span></p><p><span style="color: rgb(34,51,68);">Fifth, I r<span style="color: rgb(34,51,68);">epeated the process using the Hashtable data structure. The difference comparing with BSTMap is that there is no depth but the number of collisions. The sheet is also shown below.</span></span></p><p><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="400" src="attachments/430276677/430178369.png" data-image-src="attachments/430276677/430178369.png" data-unresolved-comment-count="0" data-linked-resource-id="430178369" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-21 at 9.28.27 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><br/></span></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);">Sixth, I made graphs to show the results. </span></span></p><ol><li>Year versus run-time: <br/>x-axis: year; y-axis: run time.<br/><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="300" src="attachments/430276677/430178370.png" data-image-src="attachments/430276677/430178370.png" data-unresolved-comment-count="0" data-linked-resource-id="430178370" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-21 at 9.33.29 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span><br/>We can see that in the same year, Hashmap costs more time than BSTMap.<br/><br/></li><li>Total number of words versus run-time: <br/>x-axis: total number of words; y-axis: run time.<br/><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="300" src="attachments/430276677/430178371.png" data-image-src="attachments/430276677/430178371.png" data-unresolved-comment-count="0" data-linked-resource-id="430178371" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="image2019-4-21 21:56:40.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span><br/>We can see that for Hashmap and BSTMap, the total word count is proportional to run time in general. But we can also see that when the difference between total word count is not big, there is a zigzag shape. Still, BSTMap has better performance than Hashmap.<br/><br/></li><li>Unique words versus run-time: <br/>x-axis: the number of unique words; y-axis: run time.<br/><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="300" src="attachments/430276677/430178372.png" data-image-src="attachments/430276677/430178372.png" data-unresolved-comment-count="0" data-linked-resource-id="430178372" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="image2019-4-21 21:56:59.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span><br/>We can see that for Hashmap and BSTMap, the unique word count is proportional to run time in general. Still, BSTMap has better performance than Hashmap.</li></ol><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"> </span></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><br/></span></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);">Seventh, for BSTMap, I compared depth with run time; for Hashmap, I compared number of collisions with run time. Besides, I also compared run time with file size.</span></span></p><p> </p><p><strong><span style="color: rgb(34,51,68);">Q1: For the BSTMap, does the depth/height of the tree affect performance?</span></strong></p><p><span style="color: rgb(34,51,68);">No. We can see from the table below that depth is not proportional to run time.</span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="300" src="attachments/430276677/430178375.png" data-image-src="attachments/430276677/430178375.png" data-unresolved-comment-count="0" data-linked-resource-id="430178375" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-21 at 10.15.57 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span></span></span></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><br/></span></span></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"> </span></span></span></p><p><strong><span style="color: rgb(34,51,68);">Q2: For the Hashtable, does the number of collisions affect performance?</span></strong><span style="color: rgb(34,51,68);"> </span></p><p><span style="color: rgb(34,51,68);">Yes, we can see from the table below that the number of collisions is proportional to run time in general.</span></p><p><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="300" src="attachments/430276677/430178376.png" data-image-src="attachments/430276677/430178376.png" data-unresolved-comment-count="0" data-linked-resource-id="430178376" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-21 at 10.16.33 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"> </span></span></span></p><p> </p><p><span style="color: rgb(34,51,68);"><span>For the relationship between run time and file size, we can see from the table below that run time is proportional to the file size in general for both BSTMap and Hashmap.</span></span></p><p><span style="color: rgb(34,51,68);"><span><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="300" src="attachments/430276677/430178377.png" data-image-src="attachments/430276677/430178377.png" data-unresolved-comment-count="0" data-linked-resource-id="430178377" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="image2019-4-21 22:22:12.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span></span></span></p><p><span style="color: rgb(34,51,68);"><span><br/></span></span></p><p><span style="color: rgb(255,0,0);"><strong><span><span>Analysis continues in Extension 1.</span></span></strong></span></p><p><span style="color: rgb(34,51,68);"><span style="color: rgb(34,51,68);"><br/></span></span></p><p><strong><u><span style="color: rgb(34,51,68);">Extension</span></u></strong></p><p><strong><u><span style="color: rgb(34,51,68);"> </span></u></strong></p><p><em><span style="color: rgb(34,51,68);">Extension 1:</span></em></p><p><span style="color: rgb(34,51,68);">In order to improve the efficiency of Hashmap, I added a replacement for ensureCapacity() function to enable the hash map to expand when the hash table is half full, instead of full, which is the case I originally used. The result is shown below.</span></p><p><span style="color: rgb(34,51,68);"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="400" src="attachments/430276677/430178621.png" data-image-src="attachments/430276677/430178621.png" data-unresolved-comment-count="0" data-linked-resource-id="430178621" data-linked-resource-version="2" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-22 at 4.40.27 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span></span></p><p><span style="color: rgb(34,51,68);"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="400" src="attachments/430276677/430178369.png" data-image-src="attachments/430276677/430178369.png" data-unresolved-comment-count="0" data-linked-resource-id="430178369" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-21 at 9.28.27 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span>(old version)</span></p><p><span style="color: rgb(34,51,68);"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="400" src="attachments/430276677/430178368.png" data-image-src="attachments/430276677/430178368.png" data-unresolved-comment-count="0" data-linked-resource-id="430178368" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-21 at 9.19.34 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span>(BSTMap)</span></p><p><span style="color: rgb(34,51,68);"> </span></p><ol><li>Year versus run-time: <br/>x-axis: year; y-axis: run time.<br/><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="300" src="attachments/430276677/430997756.png" data-image-src="attachments/430276677/430997756.png" data-unresolved-comment-count="0" data-linked-resource-id="430997756" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-24 at 8.21.42 AM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span><br/>We can see that in the same year, the new Hashmap has better efficiency.<br/><br/></li><li>Total number of words versus run-time: <br/>x-axis: total number of words; y-axis: run time.<br/><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="300" src="attachments/430276677/430997757.png" data-image-src="attachments/430276677/430997757.png" data-unresolved-comment-count="0" data-linked-resource-id="430997757" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-24 at 8.23.10 AM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span><br/>We can see that for Hashmap and BSTMap, the total word count is proportional to run time in general. But we can also see that when the difference between total word count is not big, there is a zigzag shape. Still, the new Hashmap has better performance than BSTMap.<br/><br/></li><li>Unique words versus run-time: <br/>x-axis: the number of unique words; y-axis: run time.<br/><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="300" src="attachments/430276677/430997758.png" data-image-src="attachments/430276677/430997758.png" data-unresolved-comment-count="0" data-linked-resource-id="430997758" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-24 at 8.23.27 AM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span><br/>We can see that for Hashmap and BSTMap, the unique word count is proportional to run time in general. Still, the new Hashmap has better performance than BSTMap.</li></ol><p><span style="color: rgb(34,51,68);"><br/></span></p><p><span style="color: rgb(34,51,68);">We can see that the new version of Hashmap is much more efficient than the old version, and even more efficient than BSTMap. Besides, we can also see that the number of collisions in the new version is also much smaller than those in the old version, which strengthens our conclusion that number of collisions is proportional to the run time.</span></p><p><span style="color: rgb(34,51,68);">So now we can make a conclusion for our analysis: if the hashTable in Hashmap doubles the size when the table is full, which is the case in the <strong>Solution</strong>, then BSTMap is more efficient than Hashmap; but if the hashTable in Hashmap doubles the size when the table is half full, which is the case in Extension 1, then the Hashmap is much more efficient than BSTMap. So in the next few extensions, I will continue using the new version of Hashmap.</span></p><p><strong><u><span style="color: rgb(34,51,68);"><br/></span></u></strong></p><p><span style="color: rgb(34,51,68);">-</span></p><p><span style="color: rgb(34,51,68);">Extension 2:</span></p><p><span style="color: rgb(34,51,68);">I created some new smaller files, which are parts of reddit comments files, to analysis, including snippet_a.txt and snippet_b.txt. Then I analyzed them with Hashmap and BSTMap. Since the run time is too small, I changed from s to ms. The result is shown below.</span></p><p><strong><u><span style="color: rgb(34,51,68);"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="500" src="attachments/430276677/430178638.png" data-image-src="attachments/430276677/430178638.png" data-unresolved-comment-count="0" data-linked-resource-id="430178638" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-22 at 5.10.43 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span></span></u></strong></p><p><span style="color: rgb(34,51,68);">We can find that the new Hashmap is still more efficient than BSTMap in run time.</span></p><p><span style="color: rgb(34,51,68);"><br/></span></p><p>-</p><p><span style="color: rgb(34,51,68);"><span>Extension 3:</span></span></p><p><span style="color: rgb(34,51,68);"><span>I implemented a new hash function with </span></span>Math.floorMod(), which is a built-in math function in java which returns the floor modulus of the integer arguments passed to it.</p><p><span style="color: rgb(34,51,68);"><span><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="300" src="attachments/430276677/430178362.png" data-image-src="attachments/430276677/430178362.png" data-unresolved-comment-count="0" data-linked-resource-id="430178362" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-21 at 8.41.35 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span></span></span></p><p><span style="color: rgb(34,51,68);"><span>Then I compared it with the original hash function. The result is shown below.</span></span></p><p><span style="color: rgb(34,51,68);"><span>2008:</span></span></p><p><span style="color: rgb(34,51,68);"><span><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="500" src="attachments/430276677/430178640.png" data-image-src="attachments/430276677/430178640.png" data-unresolved-comment-count="0" data-linked-resource-id="430178640" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-22 at 5.16.39 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span>(new hash function)</span></span></p><p><span style="color: rgb(34,51,68);"><span><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="500" src="attachments/430276677/430178641.png" data-image-src="attachments/430276677/430178641.png" data-unresolved-comment-count="0" data-linked-resource-id="430178641" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-22 at 5.17.18 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span>(old hash function)</span></span></p><p><span style="color: rgb(34,51,68);"><span>2009:</span></span></p><p><span style="color: rgb(34,51,68);"><span><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="500" src="attachments/430276677/430178642.png" data-image-src="attachments/430276677/430178642.png" data-unresolved-comment-count="0" data-linked-resource-id="430178642" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-22 at 5.17.45 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span>(new hash function)</span></span></p><p><span style="color: rgb(34,51,68);"><span><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="500" src="attachments/430276677/430178643.png" data-image-src="attachments/430276677/430178643.png" data-unresolved-comment-count="0" data-linked-resource-id="430178643" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-22 at 5.17.36 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span>(old hash function)</span></span></p><p><span style="color: rgb(34,51,68);"><span>We can see that the new hash function is a little less efficient than the old hash function in run time.</span></span></p><p><span style="color: rgb(34,51,68);"><span><br/></span></span></p><p><span style="color: rgb(34,51,68);"><span>-</span></span></p><p><span style="color: rgb(34,51,68);"><span>Extension 4: </span></span></p><p><span style="color: rgb(34,51,68);"><span>I implemented another Hashmap: Hashmap2, which uses open hash table with BSTMap in each entry. The new data structure can be accessed with type variable 2. I tried to use it to analyze the reddit comments file on 2008, but it took too much time (more than 10 minutes) to complete the analysis. So I stopped the execution and tried to analyze smaller files, such as snippet_a and snippet_b. The result is shown below.</span></span></p><p><span style="color: rgb(34,51,68);"><span>snippet_a:</span></span></p><p><span style="color: rgb(34,51,68);"><span><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="500" src="attachments/430276677/430178644.png" data-image-src="attachments/430276677/430178644.png" data-unresolved-comment-count="0" data-linked-resource-id="430178644" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-22 at 5.24.41 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span>(hashmap2)</span></span></p><p><span style="color: rgb(34,51,68);"><span><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="500" src="attachments/430276677/430178645.png" data-image-src="attachments/430276677/430178645.png" data-unresolved-comment-count="0" data-linked-resource-id="430178645" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-22 at 5.24.07 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span>(hashmap)</span></span></p><p><span style="color: rgb(34,51,68);"><span>snippet_b:</span></span></p><p><span style="color: rgb(34,51,68);"><span><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="500" src="attachments/430276677/430178646.png" data-image-src="attachments/430276677/430178646.png" data-unresolved-comment-count="0" data-linked-resource-id="430178646" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-22 at 5.25.09 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span>(hashmap2)</span></span></p><p><span style="color: rgb(34,51,68);"><span><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="500" src="attachments/430276677/430178647.png" data-image-src="attachments/430276677/430178647.png" data-unresolved-comment-count="0" data-linked-resource-id="430178647" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-22 at 5.24.18 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span>(hashmap)</span></span></p><p><span style="color: rgb(34,51,68);"><span>We can see that Hashmap2 with open hash table is much less efficient than Hashmap with closed hash table.</span></span></p><p><span style="color: rgb(34,51,68);"><span><br/></span></span></p><p><span style="color: rgb(34,51,68);"><span>-</span></span></p><p><span style="color: rgb(34,51,68);"><span>Extension 5:</span></span></p><p><span style="color: rgb(34,51,68);"><span>I used scanner class to create some interaction between the user and the terminal.</span></span></p><p><span style="color: rgb(34,51,68);"><span><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="300" src="attachments/430276677/430178651.png" data-image-src="attachments/430276677/430178651.png" data-unresolved-comment-count="0" data-linked-resource-id="430178651" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-22 at 5.41.16 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span></span></span></p><p><span style="color: rgb(34,51,68);"><span><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" draggable="false" width="500" src="attachments/430276677/430178652.png" data-image-src="attachments/430276677/430178652.png" data-unresolved-comment-count="0" data-linked-resource-id="430178652" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Screen Shot 2019-04-22 at 5.40.31 PM.png" data-base-url="https://wiki.colby.edu" data-linked-resource-content-type="image/png" data-linked-resource-container-id="430276677" data-linked-resource-container-version="9"></span></span></span></p><p><span style="color: rgb(34,51,68);"><br/></span></p><p><strong><u><span style="color: rgb(34,51,68);">Reflection</span></u></strong></p><p><span style="color: rgb(34,51,68);">In this project, I learned how to use hash function to implement a Hashmap. By comparing the efficiency of different data structures, Hashmap and BSTmap in this case, I got clearer with the feature of each data structure and found out which data structure has better efficiency while dealing with big data, Reddit comment files in this case.</span></p><p><span style="color: rgb(34,51,68);"><br/></span></p><p><strong><u><span style="color: rgb(34,51,68);">Source</span></u></strong></p><p><span style="color: rgb(34,51,68);"><span>Thanks to Sihang Chen.</span></span></p><p><span style="color: rgb(34,51,68);"><span><a href="https://www.geeksforgeeks.org/math-floormod-method-in-java/" class="external-link" rel="nofollow">https://www.geeksforgeeks.org/math-floormod-method-in-java/</a></span></span></p>
</div>
<div class="pageSection group">
<div class="pageSectionHeader">
<h2 id="attachments" class="pageSectionTitle">Attachments:</h2>
</div>
<div class="greybox" align="left">
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178361.png">Screen Shot 2019-04-21 at 8.13.22 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178362.png">Screen Shot 2019-04-21 at 8.41.35 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178363.png">Screen Shot 2019-04-21 at 8.42.08 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178368.png">Screen Shot 2019-04-21 at 9.19.34 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178369.png">Screen Shot 2019-04-21 at 9.28.27 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178370.png">Screen Shot 2019-04-21 at 9.33.29 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178371.png">image2019-4-21 21:56:40.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178372.png">image2019-4-21 21:56:59.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178373.png">Screen Shot 2019-04-21 at 9.52.50 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178374.png">image2019-4-21 21:57:44.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178375.png">Screen Shot 2019-04-21 at 10.15.57 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178376.png">Screen Shot 2019-04-21 at 10.16.33 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178377.png">image2019-4-21 22:22:12.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178622.png">Screen Shot 2019-04-22 at 4.40.27 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178621.png">Screen Shot 2019-04-22 at 4.40.27 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178638.png">Screen Shot 2019-04-22 at 5.10.43 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178640.png">Screen Shot 2019-04-22 at 5.16.39 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178641.png">Screen Shot 2019-04-22 at 5.17.18 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178642.png">Screen Shot 2019-04-22 at 5.17.45 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178643.png">Screen Shot 2019-04-22 at 5.17.36 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178644.png">Screen Shot 2019-04-22 at 5.24.41 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178645.png">Screen Shot 2019-04-22 at 5.24.07 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178646.png">Screen Shot 2019-04-22 at 5.25.09 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178647.png">Screen Shot 2019-04-22 at 5.24.18 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178651.png">Screen Shot 2019-04-22 at 5.41.16 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430178652.png">Screen Shot 2019-04-22 at 5.40.31 PM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430997756.png">Screen Shot 2019-04-24 at 8.21.42 AM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430997757.png">Screen Shot 2019-04-24 at 8.23.10 AM.png</a> (image/png)
<br/>
<img src="images/icons/bullet_blue.gif" height="8" width="8" alt=""/>
<a href="attachments/430276677/430997758.png">Screen Shot 2019-04-24 at 8.23.27 AM.png</a> (image/png)
<br/>
</div>
</div>
</div> </div>
<div id="footer" role="contentinfo">
<section class="footer-body">
<p>Document generated by Confluence on Aug 29, 2022 09:29</p>
<div id="footer-logo"><a href="http://www.atlassian.com/">Atlassian</a></div>
</section>
</div>
</div> </body>
</html>