<?php
$page_title="Apache Solr vs Elasticsearch - the Feature Smackdown!";
include_once("inc/header.php");
?>
<div class="container">
<div class="jumbotron subhead">
<h1 class="secthead" style="margin-top:-10px;margin-bottom:-5px">Apache Solr vs Elasticsearch</h1><h3 class="secthead">The Feature Smackdown</h3>
</div>
</div>
<hr/>
<div class="container">
<h2 class="secthead">API</h2>
<table class="table table-striped table-bordered table-hover">
<thead><tr>
<th width="20%">Feature</th>
<th width="40%"><?=$solr_version;?></th>
<th width="40%"><?=$es_version;?></th></tr>
</thead>
<tbody>
<tr>
<td>Format</td>
<td>XML, CSV, JSON</td>
<td>JSON</td>
</tr>
<tr>
<td>HTTP REST API</td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Binary API <a href="#" title="A binary API is likely to be more efficient for large data." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> SolrJ</td>
<td><img src="img/tick.png"> TransportClient, Thrift (through a <a href="https://github.com/elasticsearch/elasticsearch-transport-thrift">plugin</a>)</td>
</tr>
<tr>
<td>JMX support</td>
<td><img src="img/tick.png"></td>
<td><img src="img/cross.png"> ES specific stats are exposed through the REST API</td>
</tr>
<tr>
<td>Official client libraries <a href="#" title="Actively-maintained official client libraries in various languages." class="tt"><img src="img/help.png"></a></td>
<td>Java</td>
<td>Java, Groovy, PHP, Ruby, Perl, Python, .NET, JavaScript <a href="https://www.elastic.co/guide/en/elasticsearch/client/index.html">Official list of clients</a></td>
</tr>
<tr>
<td>Community client libraries <a href="#" title="Community-maintained client libraries in various languages." class="tt"><img src="img/help.png"></a></td>
<td>PHP, Ruby, Perl, Scala, Python, .NET, JavaScript, Go, Erlang, Clojure</td>
<td>Clojure, Cold Fusion, Erlang, Go, Groovy, Haskell, Java, JavaScript, .NET, OCaml, Perl, PHP, Python, R, Ruby, Scala, Smalltalk, Vert.x <a href="https://www.elastic.co/guide/en/elasticsearch/client/community/current/index.html">Complete list</a></td>
</tr>
<tr>
<td>3rd-party product integration (open-source)<a href="#" title="3rd-party open-source products which use Solr/ES to provide search functionality." class="tt"><img src="img/help.png"></a></td>
<td>Drupal, Magento, Django, ColdFusion, WordPress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna)</td>
<td>Drupal, Django, Symfony2, WordPress, Couchbase</td>
</tr>
<tr>
<td>3rd-party product integration (commercial)<a href="#" title="3rd-party commercial products which use Solr/ES to provide search functionality." class="tt"><img src="img/help.png"></a></td>
<td>DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR</td>
<td>SearchBlox, Hortonworks Data Platform, MapR etc <a href="https://www.elastic.co/guide/en/elasticsearch/plugins/current/integrations.html">Complete list</a></td>
</tr>
<tr>
<td>Output<a href="#" title="Output formats" class="tt"><img src="img/help.png"></a></td>
<td>JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java</td>
<td>JSON, XML/HTML (via <a href="http://blog.zenika.com/index.php?post/2012/12/20/Introducing-the-Elasticsearch-View-Plugin">plugin</a>)</td>
</tr>
</tbody>
</table>
<br/>
<h2 class="secthead">Infrastructure</h2>
<table class="table table-striped table-bordered table-hover">
<thead><tr>
<th width="20%">Feature</th>
<th width="40%"><?=$solr_version;?></th>
<th width="40%"><?=$es_version;?></th></tr>
</thead>
<tbody>
<tr>
<td>Master-slave replication</td>
<td><img src="img/tick.png"> </td>
<td><img src="img/cross.png"> Not an issue because shards are replicated across nodes.</td>
</tr>
<tr>
<td>Integrated snapshot and restore</td>
<td>Filesystem</td>
<td>Filesystem, AWS Cloud Plugin for S3 repositories, HDFS Plugin for Hadoop environments, Azure Cloud Plugin for Azure storage repositories</td>
</tr>
</tbody>
</table>
<br/>
<h2 class="secthead">Indexing</h2>
<table class="table table-striped table-bordered table-hover">
<thead><tr>
<th width="20%">Feature</th>
<th width="40%"><?=$solr_version;?></th>
<th width="40%"><?=$es_version;?></th></tr>
</thead>
<tbody>
<tr>
<td>Data Import</td>
<td>DataImportHandler - JDBC, CSV, XML, Tika, URL, Flat File</td>
<td><font color=maroon>[DEPRECATED in 2.x]</font> Rivers modules - ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia</td>
</tr>
<tr>
<td>ID field for updates and deduplication</td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>DocValues <a href="#" title="Disk-based field data. Replacement for FieldCache" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Partial Doc Updates <a href="#" title="Partial document updates allow you to update a document by sending just the fields that have changed. <p>This makes it more similar to SQL update statements.</p>" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> with stored fields</td>
<td><img src="img/tick.png"> with _source field</td>
</tr>
<tr>
<td>Custom Analyzers and Tokenizers <a href="#" title="Analyzers and Tokenizers are what break up text into terms, or tokens, which are then indexed for searching." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Per-field analyzer chain <a href="#" title="You can specify a sequence of analyzers/tokenizers on a per-field basis." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Per-doc/query analyzer chain <a href="#" title="You can specify a sequence of analyzers/tokenizers on a per-document or per-query basis." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/cross.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Index-time synonyms <a href="#" title="You can specify synonyms either through term expansion, or term substitution." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"> Supports Solr and Wordnet synonym format</td>
</tr>
<tr>
<td>Query-time synonyms <a href="#" title="You can specify synonyms either through term expansion, or term substitution." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> Solr 6 provides proper multi-word synonyms via SynonymGraphFilter</td>
<td><img src="img/tick.png"> Synonym Graph Token Filter is in beta in ES 6.2</td>
</tr>
<tr>
<td>Multiple indexes <a href="#" title="Lucene stores documents in an index. This feature allows you to manage multiple indices from a single installation. The RDBMS-equivalent of an index is a database." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Near-Realtime Search/Indexing <a href="#" title="Near-Realtime search means that documents are available for search almost immediately after being indexed - additions and updates to documents are seen in 'near' realtime." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Complex documents <a href="#" title="Parent-child relationship between documents is supported. You can nest documents, rather than having to flatten documents." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Schemaless <a href="#" title="A mode that requires no up-front schema modifications, in which previously unknown fields' types are guessed based on the values in added/updated documents, and are then added to the schema prior to processing the update." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> </td>
<td><img src="img/tick.png"> </td>
</tr>
<tr>
<td>Multiple document types per schema <a href="#" title="The RDBMS-equivalent of a schema is a database. <br/>A database table is a collection of fields.<br/> Having multiple doc types per schema is akin to having multiple tables per database." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/cross.png"> One set of fields per schema, one schema per core</td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Online schema changes <a href="#" title="Can changes to the schema be made without restarting the server?" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> Schemaless mode or via dynamic fields.</td>
<td><img src="img/tick.png"> Only backward-compatible changes.</td>
</tr>
<tr>
<td>Apache Tika integration <a href="#" title="Apache Tika is a Java library which supports full-text extraction from binary files such as PDF, MS Word etc" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Dynamic fields <a href="#" title="Dynamic fields are field definitions which support wildcard matching. e.g. book_* matches all field names starting with book_" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Field copying <a href="#" title="Field copying is useful where you need multiple versions of a field indexed differently, e.g. a stemmed and an unstemmed version of a field." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"> via multi-fields</td>
</tr>
<tr>
<td>Hash-based deduplication <a href="#" title="Determining the uniqueness of a document not based on an ID-field, but the hash signature of a field. Useful for web pages for example, where the URL may be different but the content the same." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"> <a href="https://www.elastic.co/guide/en/elasticsearch/plugins/current/mapper-murmur3.html">Murmur plugin</a> or <a href="https://github.com/YannBrrd/elasticsearch-entity-resolution">ER plugin</a></td>
</tr>
<tr>
<td>Index-time sorting <a href="#" title="Stores documents sorted at index-time, reducing the performance overhead of sorting at query-time" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/cross.png"></td>
<td><img src="img/tick.png"></td>
</tr>
</tbody>
</table>
<br/>
<h2 class="secthead">Searching</h2>
<table class="table table-striped table-bordered table-hover">
<thead><tr>
<th width="20%">Feature</th>
<th width="40%"><?=$solr_version;?></th>
<th width="40%"><?=$es_version;?></th></tr>
</thead>
<tbody>
<tr>
<td>Lucene Query parsing <a href="#" title="Lucene provides a string-based query syntax for performing searches. This is adequate for most instances, but may come up short for more complex queries." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Structured Query DSL <a href="#" title="A Domain-Specific Language which allows you to build complex queries not otherwise possible with just a Lucene query string." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> <a href="https://lucene.apache.org/solr/guide/7_1/json-query-dsl.html">JSON Query DSL</a> is new in Solr 7.x</td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Span queries <a href="#" title="SpanQueries allow for nested, positional restrictions when matching documents in Lucene. They're kind of like phrase queries, but much more expressive." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> via <a href="https://issues.apache.org/jira/browse/SOLR-2703">SOLR-2703</a></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Spatial/geo search <a href="#" title="Searching for documents by latitude/longitude." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Multi-point spatial search <a href="#" title="An advanced spatial search feature which allows for each document to possess more than one spatial point. A good example is an index where documents are companies, which may have more than one physical office." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Faceting <a href="#" title="Faceting allows for efficient computation of doc counts by facets. An example of facets may be 'Category', 'Price' or 'Shipping Method'." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"> Top N term accuracy can be controlled with <a href="https://www.elastic.co/guide/en/elasticsearch/reference/2.1/search-aggregations-bucket-terms-aggregation.html#_shard_size_2">shard_size</a> </td>
</tr>
<tr>
<td>Advanced Faceting <a href="#" title="Advanced operations such as hierarchical faceting, metrics and bucketing" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> New <a href="https://lucene.apache.org/solr/guide/7_2/analytics.html">Analytics component</a> and <a href="http://yonik.com/json-facet-api/">JSON faceting API</a></td>
<td><img src="img/tick.png"> <a href="http://www.elasticsearch.org/blog/data-visualization-elasticsearch-aggregations" rel="nofollow">blog post</a></td>
</tr>
<tr>
<td>Geo-distance Faceting </td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Pivot Facets <a href="#" title="A pivot facet, aka decision tree, is a multi-level facet across multiple fields. e.g. pivoting on price then category returns category facet counts for each price facet." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>More Like This</td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Boosting by functions <a href="#" title="Modify document scores through pre-built or custom function classes." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Boosting using scripting languages <a href="#" title="Modify document scores through custom code written in a scripting language like JavaScript." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/cross.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Push Queries <a href="#" title="Think of push queries as the reverse operation of indexing and then searching. Instead of sending docs, indexing them, and then running queries, one sends queries, registers them, and then sends docs to find out which queries match each doc." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> Via <a href="https://lucene.apache.org/solr/guide/7_2/streaming-expressions.html">Streaming Expressions</a></td>
<td><img src="img/tick.png"> Percolation. Distributed percolation supported in 1.0</td>
</tr>
<tr>
<td>Field collapsing/Results grouping <a href="#" title="Field Collapsing collapses a group of results with the same field value down to a single (or fixed number) of entries. For example, most search engines such as Google collapse on site so only one or two entries are shown, along with a link to click to see more results from that site." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Query Re-Ranking <a href="#" title="Query Re-Ranking allows you to run a simple query (A) for matching documents and then re-rank the top N documents using the scores from a more complex query (B)." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"> via <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-rescore.html">Rescoring</a> or <a href="https://github.com/codelibs/elasticsearch-dynarank">a plugin</a></td>
</tr>
<tr>
<td>Index-based Spellcheck <a href="#" title="Performs spellcheck recommendations based on words/terms that exist in the index." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"> <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html">Phrase Suggester</a></td>
</tr>
<tr>
<td>Wordlist-based Spellcheck <a href="#" title="Performs spellcheck recommendations based on a wordlist, for example, a dictionary file or a list of user queries." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/cross.png"></td>
</tr>
<tr>
<td>Autocomplete</td>
<td><img src="img/tick.png"> </td>
<td><img src="img/tick.png"> </td>
</tr>
<tr>
<td>Document-oriented Autocomplete</td>
<td><img src="img/cross.png"> The Solr suggester returns phrases, not documents.</td>
<td><img src="img/tick.png"> </td>
</tr>
<tr>
<td>Learning to Rank</td>
<td><img src="img/tick.png"> </td>
<td><img src="img/tick.png"> Via <a href="https://github.com/o19s/elasticsearch-learning-to-rank">https://github.com/o19s/elasticsearch-learning-to-rank</a></td>
</tr>
<tr>
<td>Query elevation <a href="#" title="Query elevation enables you to configure the top results for a given query regardless of the normal lucene scoring. This is sometimes called 'sponsored search', 'editorial boosting' or 'best bets'." class="tt"><img src="img/help.png"></a> </td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"> <a href="https://github.com/elasticsearch/elasticsearch/issues/1066#issuecomment-8625739">workaround</a></td>
</tr>
<tr>
<td>Intra-index joins <a href="#" title="A method of searching on inter-document relationships, just like SQL joins." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> via parent-child query</td>
<td><img src="img/tick.png"> via <i>has_child</i> and <i>top_children</i> queries</td>
</tr>
<tr>
<td>Inter-index joins <a href="#" title="Joining between indexes" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> Joined index has to be single-shard and replicated across all nodes.</td>
<td><img src="img/cross.png"></td>
</tr>
<tr>
<td>Resultset Scrolling <a href="#" title="Efficient scrolling/paging of large result sets" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> </td>
<td><img src="img/tick.png"> via <i>scan</i> search type</td>
</tr>
<tr>
<td>Filter queries <a href="#" title="Cached queries which only limit the set of document results and do not affect doc score." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"> also supports filtering by native scripts</td>
</tr>
<tr>
<td>Filter execution order <a href="#" title="The ability to specify when a filter query is expensive and thus should run last, on the smallest document set possible." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> local params and <i>cache</i> property</td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Alternative QueryParsers <a href="#" title="An example of an alternative QueryParser is Solr's DisjunctionMaxQueryParser." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> DisMax, eDisMax</td>
<td><img src="img/tick.png"> query_string, dis_max, match, multi_match etc</td>
</tr>
<tr>
<td>Negative boosting <a href="#" title="Reducing the score of certain documents which match a query." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> but awkward. Involves positively boosting the inverse set of negatively-boosted documents.</td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Search across multiple indexes</td>
<td><img src="img/tick.png"> it can search across multiple compatible collections</td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Result highlighting</td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Custom Similarity <a href="#" title="Lucene's Similarity class provides a way to customize some variables used in calculating document scores." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Searcher warming on index reload <a href="#" title="When an index is changed, Searchers need to be reloaded. All existing FieldCaches are refreshed. By warming Searchers with queries before making them live, you avoid the instance where the first search is always a slow one." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"> <a href="http://www.elasticsearch.org/guide/reference/api/admin-indices-warmers/">Warmers API</a></td>
</tr>
<tr>
<td>Term Vectors API</td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>SQL queries</td>
<td><img src="img/tick.png"> Via <a href="https://lucene.apache.org/solr/guide/7_2/parallel-sql-interface.html#parallel-sql-interface">Parallel SQL</a>. SolrCloud only</td>
<td><img src="img/cross.png"></td>
</tr>
<tr>
<td>Distributed Map/Reduce processing</td>
<td><img src="img/tick.png"> Via <a href="https://lucene.apache.org/solr/guide/7_2/streaming-expressions.html">Streaming Expressions</a>. SolrCloud only</td>
<td><img src="img/cross.png"></td>
</tr>
<!--
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
-->
</tbody>
</table>
<br/>
<h2 class="secthead">Customizability</h2>
<table class="table table-striped table-bordered table-hover">
<thead><tr>
<th width="20%">Feature</th>
<th width="40%"><?=$solr_version;?></th>
<th width="40%"><?=$es_version;?></th></tr>
</thead>
<tbody>
<tr>
<td>Pluggable API endpoints <a href="#" title="You can define new API endpoints." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Pluggable search workflow <a href="#" title="You can modify the workflow of existing API endpoints." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> via SearchComponents</td>
<td><img src="img/cross.png"></td>
</tr>
<tr>
<td>Pluggable update workflow <a href="#" title="You can modify the workflow of document inserts and updates." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> via <a href="https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors" target="_blank">UpdateRequestProcessor</a></td>
<td><img src="img/cross.png"></td>
</tr>
<tr>
<td>Pluggable Analyzers/Tokenizers</td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Pluggable QueryParsers <a href="#" title="An example of an alternative QueryParser is Solr's DisjunctionMaxQueryParser." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> </td>
<td><img src="img/tick.png"> </td>
</tr>
<tr>
<td>Pluggable Field Types</td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Pluggable Function queries</td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Pluggable scoring scripts</td>
<td><img src="img/cross.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Pluggable hashing <a href="#" title="See Hash-based deduplication above" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Pluggable webapps <a href="#" title="Webapps integrated with the application" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/cross.png"></td>
<td><img src="img/cross.png"> <font color=maroon>[site plugins DEPRECATED in 5.x]</font> <a href="https://www.elastic.co/blog/running-site-plugins-with-elasticsearch-5-0">blog post</a></td>
</tr>
<tr>
<td>Automated plugin installation <a href="#" title="Can plugins be installed via some kind of manager?" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/cross.png"></td>
<td><img src="img/tick.png"> Installable from GitHub, maven, sonatype or elasticsearch.org</td>
</tr>
<!--
<tr>
<td></td>
<td></td>
<td></td>
</tr>
-->
</tbody>
</table>
<br/>
<h2 class="secthead">Distributed</h2>
<table class="table table-striped table-bordered table-hover">
<thead><tr>
<th width="20%">Feature</th>
<th width="40%"><?=$solr_version;?></th>
<th width="40%"><?=$es_version;?></th></tr>
</thead>
<tbody>
<tr>
<td>Self-contained cluster <a href="#" title="Does the cluster depend on any other servers, or is it self-contained?" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/cross.png"> Depends on separate ZooKeeper server</td>
<td><img src="img/tick.png"> Only Elasticsearch nodes</td>
</tr>
<tr>
<td>Automatic node discovery</td>
<td><img src="img/tick.png"> ZooKeeper</td>
<td><img src="img/tick.png"> internal Zen Discovery or ZooKeeper</td>
</tr>
<tr>
<td>Partition tolerance</td>
<td><img src="img/tick.png"> The partition without a ZooKeeper quorum will stop accepting indexing requests or cluster state changes, while the partition with a quorum continues to function.</td>
<td><img src="img/cross.png"> Partitioned clusters can diverge unless discovery.zen.minimum_master_nodes is set to at least N/2+1, where N is the size of the cluster. If configured correctly, the partition without a quorum will stop operating, while the other continues to work. See <a href="http://elasticsearch-users.115913.n3.nabble.com/Split-brain-td3620149.html">this</a></td>
</tr>
<tr>
<td>Automatic failover</td>
<td><img src="img/tick.png"> If all nodes storing a shard and its replicas fail, client requests will fail, unless requests are made with the shards.tolerant=true parameter, in which case partial results are returned from the available shards.</td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Automatic leader election</td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Shard replication</td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Sharding <a href="#" title="A shard is a subset of an index stored on a node." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"></td>
<td><img src="img/tick.png"></td>
</tr>
<tr>
<td>Automatic shard rebalancing <a href="#" title="Shards are automatically rebalanced to adhere to the desired replication factor." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> Solr Autoscaling is new in Solr 7.</td>
<td><img src="img/tick.png"> It can be machine, rack, availability zone, and/or data center aware. Arbitrary tags can be assigned to nodes, and it can be configured not to assign a shard and its replicas to nodes with the same tags.</td>
</tr>
<tr>
<td>Change # of shards</td>
<td><img src="img/tick.png"> Shards can be added (when using implicit routing) or split (when using compositeId). Cannot be lowered. Replicas can be increased anytime.</td>
<td><img src="img/cross.png"> Each index has 5 shards by default. The number of primary shards cannot be changed once the index is created. Replicas can be increased anytime. The <a href="https://www.elastic.co/guide/en/elasticsearch/reference/6.2/indices-shrink-index.html">Shrink Index API</a> lets you shrink an existing index into a new index with fewer primary shards. </td>
</tr>
<tr>
<td>Shard splitting</td>
<td><img src="img/tick.png"></td>
<td><img src="img/cross.png"> You can use the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/6.2/indices-split-index.html">Index Splitting API</a> to split an existing index into a new index with more primary shards.</td>
</tr>
<tr>
<td>Relocate shards and replicas <a href="#" title="Move shards and replicas within a cluster" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> can be done by creating a shard replica on the desired node and then removing the shard from the source node</td>
<td><img src="img/tick.png"> can move shards and replicas to any node in the cluster on demand</td>
</tr>
<tr>
<td>Control shard routing <a href="#" title="Control which shard a search request gets routed to" class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> <i>shards</i> or <i>_route_</i> parameter</td>
<td><img src="img/tick.png"> <i>routing</i> parameter</td>
</tr>
<tr>
<td>Pluggable shard/replica assignment </td>
<td><img src="img/tick.png"> New Autoscaling API replaces the old rule-based replica assignment</td>
<td><img src="img/tick.png"> Probabilistic shard balancing with <a href="https://github.com/datarank/tempest">Tempest plugin</a></td>
</tr>
<tr>
<td>Avoid duplicate indexing on replicas <a href="#" title="Each document that gets indexed on the master is by default reindexed on each replica, incurring unnecessary overhead." class="tt"><img src="img/help.png"></a></td>
<td><img src="img/tick.png"> Solr 7 provides 3 kinds of replica types: NRT (default and the pre-Solr 7 behavior), tlog and pull. Non-SolrCloud master-slave replication can be achieved with tlog replica types.</td>
<td><img src="img/cross.png"></td>
</tr>
<tr>
<td>Consistency</td>
<td>Indexing requests are synchronous with replication. An indexing request won't return until all replicas respond. No check for downed replicas. They will catch up when they recover. When new replicas are added, they won't start accepting and responding to requests until they are finished replicating the index.</td>
<td>Replication between nodes is synchronous by default, thus ES is consistent by default, but it can be set to asynchronous on a per document indexing basis. Index writes can be configured to fail if there are not sufficient active shard replicas. The default is quorum, but all or one are also available.</td>
</tr>
</tbody>
</table>
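<p>To make the "Control shard routing" row above a little more concrete, here is a minimal sketch of where the routing parameters end up in a search URL. The hosts, the <i>products</i> collection/index name and the <i>tenantA</i> routing key are hypothetical, and the Solr half assumes the compositeId router, where a shard key is written with a trailing <i>!</i>. The sketch only builds the URLs; it does not talk to a cluster.</p>
<pre>
```python
from urllib.parse import urlencode

# Hypothetical hosts and collection/index names, for illustration only.
SOLR_BASE = "http://localhost:8983/solr/products"
ES_BASE = "http://localhost:9200/products"


def solr_routed_query(q, route_key):
    """Build a Solr select URL limited to the shard that owns route_key."""
    # Assumption: compositeId routing, so the shard key gets a trailing "!".
    params = {"q": q, "_route_": route_key + "!"}
    return SOLR_BASE + "/select?" + urlencode(params)


def es_routed_query(q, routing_value):
    """Build an Elasticsearch URI search routed by a custom routing value."""
    params = {"q": q, "routing": routing_value}
    return ES_BASE + "/_search?" + urlencode(params)


print(solr_routed_query("title:shoes", "tenantA"))
print(es_routed_query("title:shoes", "tenantA"))
```
</pre>
<p>In both engines the routing value used at query time should match the one used at index time, otherwise the request is likely to be sent to a shard that does not hold the documents.</p>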
<br/>
<h2 class="secthead">Misc</h2>
<table class="table table-striped table-bordered table-hover">
<thead><tr>
<th width="20%">Feature</th>
<th width="40%"><?=$solr_version;?></th>
<th width="40%"><?=$es_version;?></th></tr>
</thead>
<tbody>
<tr>
<td>Web Admin interface </td>
<td><img src="img/tick.png"> bundled with Solr</td>
<td><img src="img/tick.png"> Marvel or Kibana apps</td>
</tr>
<tr>
<td>Visualisation</td>
<td><a href="https://github.com/LucidWorks/banana" rel="nofollow">Banana (Port of Kibana)</a></td>
<td><a href="https://www.elastic.co/products/kibana" rel="nofollow">Kibana</a></td>
</tr>
<tr>
<td>Hosting providers </td>
<td><a href="http://www.websolr.com" rel="nofollow">WebSolr</a>, <a href="http://www.searchify.com" rel="nofollow">Searchify</a>, <a href="http://www.hosted-solr.com" rel="nofollow">Hosted-Solr</a>, <a href="http://www.indexdepot.com" rel="nofollow">IndexDepot</a>, <a href="http://www.opensolr.com" rel="nofollow">OpenSolr</a>, <a href="http://www.gotosolr.com" rel="nofollow">gotosolr</a></td>
<td><a href="https://www.elastic.co/found" rel="nofollow">Found</a>, <a href="https://www.scalefastr.io/" rel="nofollow">Scalefastr</a>, <a href="http://objectrocket.com/elasticsearch/" rel="nofollow">ObjectRocket</a>, <a href="http://www.bonsai.io" rel="nofollow">bonsai.io</a>, <a href="http://www.indexisto.com" rel="nofollow">Indexisto</a>, <a href="http://www.qbox.io" rel="nofollow">qbox.io</a>, <a href="http://www.indexdepot.com" rel="nofollow">IndexDepot</a>, <a href="http://www.compose.io" rel="nofollow">Compose.io</a>, <a href="https://www.sematext.com/logsene" rel="nofollow">Sematext Logsene</a></td>
</tr>
</tbody>
</table>
<br/><hr/>
<h2 class="secthead">My recommendations as of May 2018</h2>
<p>Here are some simple guidelines if the crazy long grid of features above did not help.</p>
<h3>Choose Solr if any of the following are true...</h3>
<ul>
<li>Your team consists mainly of Java programmers</li>
<li>You're already using ZooKeeper in your stack</li>
<li>You're already using Java in your stack</li>
<li>You are building a search application that has specific and nuanced relevancy requirements</li>
<li>You are building an ecommerce, job, or product search engine</li>
<li>Search is a central part of your product and user experience and there is the organizational mandate for search to be a core strength</li>
</ul>
<h3>Choose Elasticsearch if any of the following are true...</h3>
<ul>
<li>Your team consists mainly of Ruby/PHP/Python/full stack programmers (and your application does not have specific and nuanced relevancy requirements)</li>
<li>You live and breathe JSON</li>
<li>You already use Kibana/ELK for managing your logs</li>
<li>Your application is analytics-heavy</li>
</ul>
<h3>If in doubt...</h3>
<p>Every serious search application I have worked on has required in-depth customization of the search workflow and relevancy tweaks, and at the time of writing, this is simply not possible in Elasticsearch without major hacking. If in doubt, go Solr.</p>
<br/><hr/>
<h2 class="secthead">Thoughts... (somewhat outdated)</h2>
<p>I'm embedding my answer to this "Solr-vs-Elasticsearch" Quora question verbatim here:</p>
<blockquote>
<p>
1. Elasticsearch was born in the age of REST APIs. If you love REST APIs, you'll probably feel more at home with ES from the get-go. I don't actually think it's 'cleaner' or 'easier to use', but just that it is more aligned with web 2.0 developers' mindsets.<br/>
<br/>
2. Elasticsearch's Query DSL syntax is really flexible and it's pretty easy to write complex queries with it, though it does border on being verbose. Solr doesn't have an equivalent, last I checked. Having said that, I've never found Solr's query syntax wanting, and I've always been able to easily write a custom SearchComponent if needed (more on this later).<br/>
<br/>
3. I find Elasticsearch's documentation to be pretty awful. It doesn't help that some examples in the documentation are written in YAML and others in JSON. I wrote an ES code parser once to auto-generate documentation from Elasticsearch's source and found a number of discrepancies between code and what's documented on the website, not to mention a number of undocumented/alternative ways to specify the same config key. <br/>
<br/>
By contrast, I've found Solr to be consistent and really well-documented. I've found pretty much everything I've wanted to know about querying and updating indices without having to dig into code much. Solr's schema.xml and solrconfig.xml are *extensively* documented with most if not all commonly used configurations. <br/>
<br/>
4. Whilst what Rick says about ES being mostly ready to go out-of-box is true, I think that is also a possible problem with ES. Many users don't take the time to do the most simple config (e.g. type mapping) of ES because it 'just works' in dev, and end up running into issues in production. <br/>
<br/>
And once you do have to do config, then I personally prefer Solr's config system over ES'. Long JSON config files can get overwhelming because of the JSON's lack of support for comments. Yes you can use YAML, but it's annoying and confusing to go back and forth between YAML and JSON. <br/>
<br/>
5. If your own app works/thinks in JSON, then without a doubt go for ES because ES thinks in JSON too. Solr merely supports it as an afterthought. ES has a number of nice JSON-related features such as parent-child and nested docs that makes it a very natural fit. Parent-child joins are awkward in Solr, and I don't think there's a Solr equivalent for ES Inner hits.<br/>
<br/>
6. ES doesn't require ZooKeeper for its 'elastic' features which is nice coz I personally find ZK unpleasant, but as a result, ES does have issues with split-brain scenarios (google 'elasticsearch split-brain' or see this: Elasticsearch Resiliency Status).<br/>
<br/>
7. Overall from working with clients as a Solr/Elasticsearch consultant, I've found that developer preferences tend to end up along language party lines: if you're a Java/c# developer, you'll be pretty happy with Solr. If you live in Javascript or Ruby, you'll probably love Elasticsearch. If you're on Python or PHP, you'll probably be fine with either. <br/>
<br/>
Something to add about this: ES doesn't have a very elegant Java API IMHO (you'll basically end up using REST because it's less painful), whereas Solrj is very satisfactory and more efficient than Solr's REST API. If you're primarily a Java dev team, do take this into consideration for your sanity. There's no scenario in which constructing JSON in Java is fun/simple, whereas in Python it's absolutely pain-free, and believe me, if you have a non-trivial app, your ES JSON query strings will be works of art. <br/>
<br/>
8. ES doesn't have in-built support for pluggable 'SearchComponents', to use Solr's terminology. SearchComponents are (for me) a pretty indispensable part of Solr for anyone who needs to do anything customized and in-depth with search queries. <br/>
<br/>
Yes of course, in ES you can just implement your own RestHandler, but that's just not the same as being able to plug-into and rewire the way search queries are handled and parsed. <br/>
<br/>
9. Whichever way you go, I highly suggest you choose a client library which is as 'close to the metal' as you can get. Both ES and Solr have *really* simple search and update APIs. If a client library introduces an additional DSL layer in an attempt to 'simplify', I suggest you think long and hard about using it, as it's likely to complicate matters in the long run, and make debugging and asking for help on SO more problematic. <br/>
<br/>
In particular, if you're using Rails + Solr, consider using rsolr/rsolr instead of sunspot/sunspot if you can help it. ActiveRecord is complex code and sufficiently magical. The last thing you want is more magic on top of that. <br/>
<br/>
---<br/>
<br/>
To conclude, ES and Solr have more or less feature-parity and from a feature standpoint, there's rarely one reason to go one way or the other (unless your app lives/breathes JSON). Performance-wise, they are also likely to be quite similar (I'm sure there are exceptions to the rule. ES' relatively new autocomplete implementation, for example, is a pretty dramatic departure from previous Lucene/Solr implementations, and I suspect it produces faster responses at scale).<br/>
<br/>
ES does offer less friction from the get-go and you feel like you have something working much quicker, but I find this to be illusory. Any time gained in this stage is lost when figuring out how to properly configure ES because of poor documentation - an inevitability when you have a non-trivial application. <br/>
<br/>
Solr encourages you to understand a little more about what you're doing, and the chance of you shooting yourself in the foot is somewhat lower, mainly because you're forced to read and modify the 2 well-documented XML config files in order to have a working search app.<br/>
<br/>
---<br/>
<br/>
EDIT on Nov 2015: <br/>
<br/>
ES has been gradually distinguishing itself from Solr when it comes to data analytics. I think it's fair to attribute this to the immense traction of the ELK stack in the logging, monitoring and analytic space. My guess is that this is where Elastic (the company) gets the majority of its revenue, so it makes perfect sense that ES (the product) reflects this.<br/>
<br/>
We see this manifesting primarily in the form of aggregations, which are a more flexible and nuanced replacement for facets. Read more about aggregations here: Migrating to aggregations<br/>
<br/>
Aggregations have been out for a while now (since 1.4), but with the recently released ES 2.0 come pipeline aggregations, which let you compute aggregations such as derivatives, moving averages, and series arithmetic on the results of other aggregations. Very cool stuff, and Solr simply doesn't have an equivalent. More on pipeline aggregations here: Out of this world aggregations<br/>
<br/>
If you're currently using or contemplating using Solr in an analytics app, it is worth your while to look into ES aggregation features to see if you need any of it. </p></blockquote>
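<p>As a rough illustration of the pipeline aggregations mentioned in the answer above, and of how painless it is to build a Query DSL body from plain Python dicts, here is a minimal sketch. The field names <i>date</i> and <i>price</i> and the aggregation names are hypothetical, and the <i>interval</i> key assumes the 2.x to 6.x request syntax (later versions use <i>calendar_interval</i> instead). It only builds and prints the JSON body.</p>
<pre>
```python
import json


def monthly_sales_with_derivative(date_field, value_field):
    """Return a search body with a derivative pipeline aggregation."""
    return {
        "size": 0,  # only the aggregation results are wanted, not the hits
        "aggs": {
            "sales_per_month": {
                "date_histogram": {"field": date_field, "interval": "month"},
                "aggs": {
                    # Regular metric aggregation: total per monthly bucket.
                    "sales": {"sum": {"field": value_field}},
                    # Pipeline aggregation: month-over-month change of "sales".
                    "sales_deriv": {"derivative": {"buckets_path": "sales"}},
                },
            }
        },
    }


# Hypothetical field names, for illustration only.
body = monthly_sales_with_derivative("date", "price")
print(json.dumps(body, indent=2))
```
</pre>
<p>The resulting JSON would be sent as the request body to an index's <i>_search</i> endpoint. The derivative aggregation reads its input from the sibling <i>sales</i> metric through <i>buckets_path</i>, which is what makes it a pipeline aggregation rather than a regular one.</p>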
<br/><hr/>
<h2 class="secthead">Resources</h2>
<ul>
<li>My other sites may be of interest if you're new to Lucene, Solr and Elasticsearch:
<ul>
<li><a href="http://www.lucenetutorial.com">Lucene Tutorial</a></li>
<li><a href="http://www.elasticsearchtutorial.com">Elasticsearch Tutorial</a></li>
<li><a href="http://www.solrtutorial.com">Solr Tutorial</a></li>
</ul>
</li>
<li>The <a href="http://wiki.apache.org/solr">Solr wiki</a> and the <a href="http://www.elasticsearch.org/guide">Elasticsearch Guide</a> are your friends.</li>
</ul>
<br/><hr/>
<h2 class="secthead">Contribute</h2>
<p>If you see any mistakes, or would like to append to the information on this webpage, you can clone the <a href="https://github.com/superkelvint/solr-vs-elasticsearch">GitHub repo for this site</a> with:</p>
<blockquote>git clone https://github.com/superkelvint/solr-vs-elasticsearch</blockquote>
<p>and submit a pull request.</p>
<br/><hr/>
<h2 class="secthead">Popular books related to Search</h2>
<?php include_once("inc/amazon-books.php");?>
</div>
<?php
include_once("inc/footer.php");
?>