Change Log
==========
1.2.0
Updated gem dependencies due to vulnerabilities.
1.1.0
Updated gems to latest versions, which is possibly a breaking change
Merged pull request #45 - Relative urls are handled properly
1.0.29
Removed a method_missing from String class as it was causing performance issues
1.0.28
merged pull request from @pisaacs fixing reference to awesome_print
added crawl cancel to sidekiq jobs
1.0.27
removed references to awesome_print
1.0.26
added ability to specify a file to contain seed urls for the command line report and export functions
1.0.25
merged pull request regarding referencing redis if cache disabled
fixed bug and updated specs for treat_https_as_http
1.0.24
added https to http normalization with option to make unique
1.0.23
fixed report command of executable to stop removing the url
1.0.22
limited slop version due to v4 changes
added cobweb into executables in gemspec
1.0.21
switched from namespaced_redis to redis-namespace
increased ruby version to 2.1.3
1.0.20
removed dependency on thin and resque from gemfile
fixed destroy method on CobwebCrawlHelper to no longer take separate options; it now takes options from the initializer
1.0.19
updated redis connection to reuse existing connection
fixed some end of crawl detection bugs
updated specs for stub! deprecations
fixed small bug in worker start detection in spec
merged pull request #20 (Add missing config options, fix default values and layout)
1.0.18
sidekiq gem now optional
1.0.17
added basic authentication support
1.0.16
remove forced output of cache hit notifications
1.0.15
post bug fix version bump...
1.0.14
added seed_list functionality
binding web statistics to 0.0.0.0 instead of 127.0.0.1
updated debug info for cache hits
1.0.13
changed order of crawls listed in web statistics to be oldest last
fixed bug in time statistics on web view, they were an hour out of date
1.0.12
added sidekiq support for processing
fixed bug around full cache being set into crawl-based cache
1.0.11
added command line options report, export
report allows you to generate a csv with various data from your site
export downloads your site to your filesystem with each page stored as yaml
1.0.10
only show current version's crawls in statistics
1.0.9
improvements to external_urls
1.0.8
fixed threading for cobwebcrawler
1.0.7
added thread_count as an option for cobweb crawler (defaults to 1)
1.0.6
changed CobwebCrawler to output stats class rather than hash
added ability to store inbound links for pages
1.0.5
fixed bug in following redirects where the :redirect_through response wasn't correct
1.0.4
fixed bug.. sinatra settings.root missing
1.0.3
added ability to set exceptions to be raised in host application
1.0.2
added user-agent to options passed through
updated specs to accept the new default user-agent
1.0.1
fixed bug where CobwebCrawler was going beyond the end of the queued urls
1.0.0
released version 1... yay!
some bug fixes
0.0.77
removed some redundant locks
fixed bug where some locks deadlock
removed lock debug being output by default
0.0.76
fixed bug with connecting to redis
0.0.75
Content link parser now uses the base tag if it is available
added locking to crawl to get correct count of pages and limits
0.0.74
Major refactor of CrawlJob moved logic out into CobwebModule::Crawl
0.0.73
fixed bug in cancel crawl that didn't remove all items from queue
0.0.72
updating cancel crawl code
0.0.71
added json to dependencies
0.0.70
updates to detecting the end of a crawl
0.0.69
renamed Crawl class to CobwebCrawlHelper as the name Crawl may cause clashes
0.0.68
added Crawl class as a helper for crawls
added destroy method to stop a crawl running that is in progress
0.0.67
changed logo on statistics web interface to cobweb
0.0.66
updated server monitor to use redis options
updated default user agent to include versions of cobweb and relevant modules
changed crawl_job to directly use redis counters rather than storing them to more tightly integrate with other processes
0.0.65
fixed bug with crawling same page multiple times
0.0.64
fixed bug if option was array
updated readme to include cobweb_sample
some minor changes to statistics from resque crawl
0.0.63
fixed bug in statistics sinatra app with encoding issues
removed some debug output
changed "Crawl Stopped" status to "Crawl Finished"
0.0.62
fixed bugs in redirect logic
0.0.61
remove url from queued and add it to crawled for both original and redirected url if a redirect is in place
0.0.60
bug fixes in the resque job to make crawl finished detection better
0.0.59
merged pull request from ephox for running method instead of enqueueing to process queue
0.0.58
removed rogue 'puts' that was displaying gem path
added obey_robots option
0.0.57
fixed bug with default internal_urls when port was non standard
added specs for crawl_job
0.0.56
updated gemspec
0.0.55
updated comments for rdoc
0.0.54
fixed bug where a url containing ' was breaking statistics
0.0.53
added limit on links returned to exclude duplicated sections of the path (i.e. in a link loop situation)
0.0.52
fixed bug with escaping for regex
0.0.51
removed debug
0.0.50
fixed bug in escaping urls for regex
0.0.49
added escape code to ? in urls for internal and external urls
added specs for excluding based on querystrings
0.0.48
fixed bug when ENVIRONMENT isn't defined
0.0.47
switched to perform join of url before excluding based on scheme
0.0.46
fixed bug in cookies
0.0.45
moved internal and external link logic into its own class
0.0.44
additional improvements for redis
0.0.43
improved performance of redis on large crawls
0.0.42
fixed bug with cobweb crawler returning empty hash
0.0.41
fixed bug showing duplicates in statistics
0.0.40
added ability for cobweb_crawler to set crawl_id manually which allows restarting crawls from last position
0.0.39
added internal links addition to cobweb_crawler
added more advanced statistics to sinatra app
0.0.38
fixed detection of crawl finishing when no crawl_limit has been set
0.0.37
pulled merge request for encoding issues
0.0.36
pulled merge request for seeing errors on a redirect
0.0.35
0.0.34
Updated to use namespaced_redis gem
pulled merge request to remove anchors when making requests
0.0.33
Bug in parsing url('') directives in css fixed
0.0.32
Added enqueue counter to update for info on how many items have been queued for processing
0.0.31
Removed debug
0.0.30
Fixed bug causing problems where the redirected location was relative
0.0.29
Including internal_urls in content hash as crawler may detect a redirect on the first page
0.0.28
Removed debug on crawl finishing
0.0.27
Fixed bug in set_base_url
0.0.24
Added internal_urls to the start options. It allows you to limit what is within a site for the crawl; it's an array and * is a wildcard.
0.0.13 (or thereabouts)
Changed CobWeb to Cobweb in line with conventions
0.0.7
Adding the retrieved url to cache if required even if it is a redirect
Added workaround for bug in addressable gem for https addresses
0.0.5
Added Addressable gem to do some of the uri parsing as it does a better job than the standard ruby parser
0.0.4
Some large changes have been made, can't remember them all. If you were using 0.0.3 then some things will break. Best bet is to read through the documentation to see what has changed.
0.0.3
Started Change log