Fuzzy Word Matching #56

ZaLiTHkA · 2014-07-08T09:41:31Z

Could DYM be extended to use the idea of "approximate string matching" (Wikipedia reference)? Not everybody phrases their sentences the same way.. I honestly believe this plugin would be a million times more useful if words like "faulty" also returned a match for "fault", "saving" matched "saved" or "save", etc etc.

Unfortunately I'm not familiar enough with Ruby to do this myself, but a quick Google search shows many Ruby Gems that provide this type of functionality. So I'm guessing (read: hoping) this isn't an unreasonable request. :)

abahgat · 2014-07-08T21:19:10Z

Yes, that's something that we had considered in the past, but never implemented as we did not want to kill the underlying database with too many queries.

Can you please list a few of the Ruby Gems you had found?

ZaLiTHkA · 2014-07-09T06:21:07Z

From my limited understanding, I can only imagine a DB hit when the front end gets an updated string to compare (I use the "as you type" method in my setup, which could actually skip key presses like shift, space etc.. but that's not the point of my request here). Do you think this idea would cause more hits?

One that keeps popping up in SO questions is amatch, but after a bit of digging it looks like either fuzzy_match or fuzzy-string-match might be faster.

Another interesting looking one is fuzzily, however the dev recommends using blurrily instead for large datasets.

From the lib weight point of view, it looks like amatch and fuzzily have fewer dependencies, but I'm not too sure how much difference that would actually make. Keep in mind I don't speak Ruby, so I'm running blind here. :)
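
To make "approximate string matching" concrete: a common building block is an edit distance between two words. The following is only a minimal pure-Ruby sketch of that idea; the helper names `levenshtein` and `fuzzy_match?` and the default threshold are made up for illustration and are not the API of any gem mentioned above.

```ruby
# Classic dynamic-programming edit distance (Levenshtein):
# the number of single-character insertions, deletions and
# substitutions needed to turn string a into string b.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: true when two words are within max_distance edits.
# The threshold of 2 is an arbitrary assumption.
def fuzzy_match?(query, candidate, max_distance = 2)
  levenshtein(query.downcase, candidate.downcase) <= max_distance
end
```

With this sketch, "faulty" and "fault" are one edit apart, but "saving" and "saved" are three edits apart, so a small distance threshold alone would not catch every inflected form from the original request.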

rlisowski · 2014-08-05T19:52:33Z

What about full-text search with elasticsearch-rails?
It could work with a sidekiq worker as an async indexer.
The indexer could be triggered by hooks in Issue (added via patch).
There is a redmine_sidekiq plugin, so you only need to provide the worker.
What do you say?

rlisowski · 2014-08-05T20:04:47Z

Consider also thinkingsphinx, which may fit the current feature list (filter by project_id, issue_id, status) better than elasticsearch.

rlisowski · 2014-09-16T11:26:22Z

👍

ZaLiTHkA · 2014-09-16T16:56:04Z

I just pulled the changes referenced above into my fork of the repo, but I still don't get any fuzzy searching. When I migrate plugins, I get a warning saying "Sphinx cannot be found on your system". I tried installing the gem manually (gem install sphinx from Redmine's htdocs root folder), which didn't give any errors but also didn't change anything, so I'm not sure how to get around this.

@korin, was that thumbs-up meant to imply that @swiatkiewicz's changes work? Or have you not had a chance to test them yet?

rlisowski · 2014-09-16T17:19:12Z

It works, but you need to install Sphinx. See the ThinkingSphinx quickstart guide; Sphinx is already available in most distros.

To be honest, @swiatkiewicz's changes allow replacing the SQL LIKE search with the Sphinx indexer, which is the faster option. It's only one step away from the fuzzy matching feature.

ZaLiTHkA · 2014-09-16T18:39:24Z

Thanks for the link. Unfortunately I need to run my system in a Windows environment, so it looks like I've got some reading to do before I get that part working.

The SQL like search works perfectly and the additional configuration options are helpful, so this is already a nice improvement.

Fuzzy Word Matching with Sphinx #56

dominch · 2015-02-04T17:18:35Z

I can't get this feature to work :(
I have already started sphinx etc., then tried searching for "test"
and got these results:
Bug # 1 – Testowy błąd (Closed w projekcie Projekt testowy)
Bug # 11 – test (New w projekcie Projekt testowy)
Feature # 19 – Test (New w projekcie)
Feature # 21 – testowe zadanie (New w projekcie Zadania w realizacji KD)
Feature # 22 – Ticket testowo pokazowy (New w projekcie Zadania w realizacji KD)
so I assume it should show something for "testy" (it should strip the "y" and match "test"?)

rlisowski · 2015-02-04T17:52:41Z

Show us the plugin settings at /settings/plugin/redmine_didyoumean

dominch · 2015-02-04T17:56:10Z

It's in Polish, but you know all the settings.
http://i.imgur.com/gvLDYVy.png
any ideas?

rlisowski · 2015-02-04T18:02:49Z

Should work; I have similar settings. Any errors in the Redmine log file, or in the browser's developer console?

rlisowski · 2015-02-04T18:07:02Z

You can also try rebuilding the Thinking Sphinx index with rake ts:rebuild.

dominch · 2015-02-04T18:09:23Z

No errors noticed so far :( I'll try this out with new tickets.
BTW: is ts:rebuild better than ts:index?

rlisowski · 2015-02-04T18:13:18Z

Only a small difference in how the configuration is handled; see http://pat.github.io/thinking-sphinx/rake_tasks.html

dominch · 2015-02-05T14:57:21Z

Ok, so basically ts:rebuild is the same as stop+index+start, great :)
I still can't get this to work as expected. I tried MySQL first; is there a chance it's still using it? The rake tasks run without any errors and the settings should be OK, but I still can't get the right results.
Maybe something is wrong with Sphinx on my system? How can I test that?

swiatkiewicz · 2015-02-05T16:40:56Z

Have you tried testing it in the Rails console?
It works like " Issue.search 'somethig' ", and then you should see
info about the search engine (SQL or Sphinx).

dominch · 2015-02-05T19:47:38Z

Trying now:

2.0.0-p594 :001 > Issue.search 'somethig'
  CustomField Load (0.6ms)  SELECT `custom_fields`.* FROM `custom_fields` WHERE `custom_fields`.`type` = 'IssueCustomField' AND `custom_fields`.`searchable` = 1
  Role Load (0.3ms)  SELECT `roles`.* FROM `roles` WHERE `roles`.`builtin` = 2 LIMIT 1
  GroupAnonymous Load (0.6ms)  SELECT `users`.* FROM `users` WHERE `users`.`type` IN ('GroupAnonymous') ORDER BY id LIMIT 1
  Member Load (0.4ms)  SELECT `members`.* FROM `members` INNER JOIN `projects` ON `projects`.`id` = `members`.`project_id` WHERE (projects.status <> 9) AND (members.user_id = 2 OR (projects.is_public = 1 AND members.user_id = 49))
   (15.0ms)  SELECT COUNT(DISTINCT `issues`.`id`) FROM `issues` LEFT OUTER JOIN `projects` ON `projects`.`id` = `issues`.`project_id` LEFT OUTER JOIN `journals` ON `journals`.`journalized_id` = `issues`.`id` AND (journals.private_notes = 0 OR (1=0)) AND `journals`.`journalized_type` = 'Issue' WHERE (((projects.status <> 9 AND projects.id IN (SELECT em.project_id FROM enabled_modules em WHERE em.name='issue_tracking')) AND ((projects.is_public = 1 AND ((issues.is_private = 0)))))) AND (((LOWER(subject) LIKE '%somethig%') OR (LOWER(issues.description) LIKE '%somethig%') OR (LOWER(journals.notes) LIKE '%somethig%') OR issues.id IN (SELECT cfs.customized_id FROM custom_values cfs WHERE cfs.customized_type='Issue' AND cfs.customized_id=issues.id AND LOWER(cfs.value) LIKE '%somethig%' AND cfs.custom_field_id IN (2,4) AND ((1=1) AND (issues.tracker_id IN (SELECT tracker_id FROM custom_fields_trackers WHERE custom_field_id = cfs.custom_field_id)) AND (EXISTS (SELECT 1 FROM custom_fields ifa WHERE ifa.is_for_all = 1 AND ifa.id = cfs.custom_field_id) OR issues.project_id IN (SELECT project_id FROM custom_fields_projects WHERE custom_field_id = cfs.custom_field_id))))))
  SQL (23.8ms)  SELECT `issues`.`id` AS t0_r0, `issues`.`tracker_id` AS t0_r1, `issues`.`project_id` AS t0_r2, `issues`.`subject` AS t0_r3, `issues`.`description` AS t0_r4, `issues`.`due_date` AS t0_r5, `issues`.`category_id` AS t0_r6, `issues`.`status_id` AS t0_r7, `issues`.`assigned_to_id` AS t0_r8, `issues`.`priority_id` AS t0_r9, `issues`.`fixed_version_id` AS t0_r10, `issues`.`author_id` AS t0_r11, `issues`.`lock_version` AS t0_r12, `issues`.`created_on` AS t0_r13, `issues`.`updated_on` AS t0_r14, `issues`.`start_date` AS t0_r15, `issues`.`done_ratio` AS t0_r16, `issues`.`estimated_hours` AS t0_r17, `issues`.`parent_id` AS t0_r18, `issues`.`root_id` AS t0_r19, `issues`.`lft` AS t0_r20, `issues`.`rgt` AS t0_r21, `issues`.`is_private` AS t0_r22, `issues`.`ir_position` AS t0_r23, `issues`.`closed_on` AS t0_r24, `issues`.`sprint_id` AS t0_r25, `issues`.`position` AS t0_r26, `projects`.`id` AS t1_r0, `projects`.`name` AS t1_r1, `projects`.`description` AS t1_r2, `projects`.`homepage` AS t1_r3, `projects`.`is_public` AS t1_r4, `projects`.`parent_id` AS t1_r5, `projects`.`created_on` AS t1_r6, `projects`.`updated_on` AS t1_r7, `projects`.`identifier` AS t1_r8, `projects`.`status` AS t1_r9, `projects`.`lft` AS t1_r10, `projects`.`rgt` AS t1_r11, `projects`.`inherit_members` AS t1_r12, `projects`.`default_assignee_id` AS t1_r13, `projects`.`product_backlog_id` AS t1_r14, `journals`.`id` AS t2_r0, `journals`.`journalized_id` AS t2_r1, `journals`.`journalized_type` AS t2_r2, `journals`.`user_id` AS t2_r3, `journals`.`notes` AS t2_r4, `journals`.`created_on` AS t2_r5, `journals`.`private_notes` AS t2_r6 FROM `issues` LEFT OUTER JOIN `projects` ON `projects`.`id` = `issues`.`project_id` LEFT OUTER JOIN `journals` ON `journals`.`journalized_id` = `issues`.`id` AND (journals.private_notes = 0 OR (1=0)) AND `journals`.`journalized_type` = 'Issue' WHERE (((projects.status <> 9 AND projects.id IN (SELECT em.project_id FROM enabled_modules em WHERE em.name='issue_tracking')) AND 
((projects.is_public = 1 AND ((issues.is_private = 0)))))) AND (((LOWER(subject) LIKE '%somethig%') OR (LOWER(issues.description) LIKE '%somethig%') OR (LOWER(journals.notes) LIKE '%somethig%') OR issues.id IN (SELECT cfs.customized_id FROM custom_values cfs WHERE cfs.customized_type='Issue' AND cfs.customized_id=issues.id AND LOWER(cfs.value) LIKE '%somethig%' AND cfs.custom_field_id IN (2,4) AND ((1=1) AND (issues.tracker_id IN (SELECT tracker_id FROM custom_fields_trackers WHERE custom_field_id = cfs.custom_field_id)) AND (EXISTS (SELECT 1 FROM custom_fields ifa WHERE ifa.is_for_all = 1 AND ifa.id = cfs.custom_field_id) OR issues.project_id IN (SELECT project_id FROM custom_fields_projects WHERE custom_field_id = cfs.custom_field_id)))))) ORDER BY issues.id ASC
 => [[], 0]

It seems to be using SQL, doesn't it?
I migrated the old Redmine to the new version and it's still searching without the fuzzy words feature. Any ideas what I can check? The plugin settings seem to be saved correctly. Is there any debug output available so I can see which engine it's using?

rlisowski · 2015-02-05T19:52:06Z

Run Setting.plugin_redmine_didyoumean['search_method'] in the Rails console: "0" means SQL, "1" means Thinking Sphinx.

dominch · 2015-02-05T20:07:28Z

I'm trying in console:

2.0.0-p594 :006 > Issue.sphinx_search 'test'
  Issue Load (0.8ms)  SELECT `issues`.* FROM `issues` WHERE `issues`.`id` IN (231, 1, 51, 52, 53, 114, 150, 153, 167, 173, 232, 235, 244, 284, 381, 523, 687, 717, 747, 912)
(20 results)

and the same for the word 'testy' gives me only one result.
So Sphinx is working and gives me some results, but they are almost the same as for SQL.

dominch · 2015-02-05T20:10:42Z

That seems to be ok:

2.0.0-p594 :009 > Setting.plugin_redmine_didyoumean['search_method']
 => "1"

And that's:

  def search_class
    case Setting.plugin_redmine_didyoumean['search_method']
    when "0"
      SqlSearch
    when "1"
      ThinkingSphinxSearch
    else
      raise 'There is no search method selected!'
    end
  end

so it's Sphinx. I tried modifying searching_by_thinking_sphinx.rb and that had an effect, so it's definitely being used. The question is what is wrong with Sphinx that makes the results wrong.

swiatkiewicz · 2015-02-06T09:06:33Z

@dominch
Follow these steps to fix this:
Open issues_index.rb and replace line 7 with set_property :enable_star => true,
then, in the main Redmine directory (not in the plugin), run 'rake ts:rebuild' and try searching for duplicates again.

My case was: 'test', 'tester', 'testowy'. Before these steps I got only 1 result where there should have been 3; after them I get the correct result (3).

Check your application log for something like:
Sphinx Query (1.4ms) SELECT * FROM issue_core WHERE MATCH('test') AND project_id IN (450) AND sphinx_deleted = 0 AND sphinx_internal_id NOT IN (0) LIMIT 0, 10
Sphinx Found 3 results

dominch · 2015-02-06T13:11:46Z

How can I turn on debug mode?
Trying

script/rails server webrick -e production -d -p 3000

plus:

http://localhost:3000/searchissues?project_id=1&issue_id=&query=testowy

gives me:

Processing by SearchIssuesController#index as HTML
  Parameters: {"project_id"=>"1", "issue_id"=>"", "query"=>"testowy"}
  Current user: dominik.chmaj (id=3)
Completed 200 OK in 485.4ms (Views: 1.0ms | ActiveRecord: 15.7ms)

The previous setting for :enable_star was 1; I changed it to true, but still no effect :(

swiatkiewicz · 2015-02-06T13:42:00Z

@dominch
Open config/environments/production.rb, find the line with config.logger.level or config.log_level, and set it to :debug, e.g. config.log_level = :debug

There is another problem with Thinking Sphinx: if you add a new issue or edit an existing one, you have to run ts:index to update the indexes.
You can use Unix cron to run this every five minutes or so.

I'm trying to implement real-time indexing, but it doesn't work as I expected and it may take a while.

dominch · 2015-02-06T14:33:26Z

Ok, debug logs are working and I have:

  Sphinx Query (0.8ms)  SELECT * FROM `issue_core` WHERE MATCH('*testowej*') AND `project_id` IN (1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20, 23, 24, 25, 26) AND `sphinx_deleted` = 0 AND `sphinx_internal_id` NOT IN (0) LIMIT 0, 5
  Sphinx  Found 3 results
  Issue Load (0.6ms)  SELECT `issues`.* FROM `issues` WHERE `issues`.`id` IN (336, 717, 962)

So that's proof that Sphinx is working; somehow it just doesn't return many results. Only exact words.

My development.sphinx.conf looks like:

indexer
{
}
searchd
{
  listen = 127.0.0.1:9306:mysql41
  log = /var/data/redmine/log/development.searchd.log
  query_log = /var/data/redmine/log/development.searchd.query.log
  pid_file = /var/data/redmine/log/development.sphinx.pid
  workers = threads
  binlog_path = /var/data/redmine/tmp/binlog/development
}
source issue_core_0
{
  type = mysql
  sql_host = localhost
  sql_user = redmine
  sql_pass = ***
  sql_db = redmine
  sql_query_pre = SET TIME_ZONE = '+0:00'
  sql_query_pre = SET NAMES utf8
  sql_query = SELECT SQL_NO_CACHE `issues`.`id` * 2 + 0 AS `id`, `issues`.`subject` AS `subject`, `issues`.`id` AS `sphinx_internal_id`, 'Issue' AS `sphinx_internal_class`, 0 AS `sphinx_deleted`, `issues`.`id` AS `id`, `issues`.`status_id` AS `status_id`, `issues`.`project_id` AS `project_id` FROM `issues`  WHERE (`issues`.`id` BETWEEN $start AND $end) GROUP BY `issues`.`id`, `issues`.`subject`, `issues`.`id`, `issues`.`id`, `issues`.`status_id`, `issues`.`project_id` ORDER BY NULL
  sql_query_range = SELECT IFNULL(MIN(`issues`.`id`), 1), IFNULL(MAX(`issues`.`id`), 1) FROM `issues`
  sql_attr_uint = sphinx_internal_id
  sql_attr_uint = sphinx_deleted
  sql_attr_uint = id
  sql_attr_uint = status_id
  sql_attr_uint = project_id
  sql_attr_string = sphinx_internal_class
  sql_field_string = subject
  sql_query_info = SELECT `issues`.* FROM `issues`  WHERE (`issues`.`id` = ($id - 0) / 2)
}
index issue_core
{
  type = plain
  path = /var/data/redmine/db/sphinx/development/issue_core
  docinfo = extern
  charset_type = utf-8
  min_infix_len = 2
  enable_star = 1
  source = issue_core_0
}
index issue
{
  type = distributed
  local = issue_core
}

Everything seems to work except for getting the right results :) This has to be something to do with tokenization etc. I assume the language is not that important, because it should look for any word in any language using the described rules. Is that correct?

swiatkiewicz · 2015-02-06T15:52:13Z

@dominch
In my case when I'm looking for word 'test':
Project:

Results:

This seems to be OK, right?
I think you would like to get something like this, right?

dominch · 2015-02-06T16:17:57Z

In description there is:

"faulty" also returned a match for "fault", "saving" matched "saved" or "save"

from plugin settings:

- Thinking Sphinx - firsly search words 1:1 after then substract last character and search again ('Running' will be looking for 'Runner' 'Running' etc.). Substract to min word length which is definded below

So in your case, for the word "tester" it should find everything containing the word "test" (2 letters subtracted).
Of course the ideal would be to match all forms as in the example: Running should find both Runner and Running.

Right now, for your example, it's looking for test, and I see the same thing. Above you can find: "FROM issue_core WHERE MATCH('testowej') "
which is the equivalent of SQL "where x like '%test%'". Sure, it will find "testowy" and even "wytestuj", but that's not very useful. Full-text search should tokenize words, reduce them to their base forms, remove stopwords, and match the result against the query (processed with the same rules).

Try "tester": it should find # 101497 and of course # 101512. I expect this word to trigger searches for "tester" + "teste" + "test" + "tes", with weights assigned etc. That should give many more results.
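
The expected behaviour described here (search the full word, then keep cutting off the last character down to a minimum length, with lower weight for shorter forms) can be sketched roughly as follows. This is an illustration only, not the plugin's actual code; `truncated_variants`, the default minimum length of 3 and the weighting formula are all assumptions.

```ruby
# Hypothetical helper, not taken from the plugin: build progressively
# shorter prefixes of a word, each paired with a weight that shrinks
# as more characters are cut off (weight = prefix length / word length).
def truncated_variants(word, min_length = 3)
  word = word.downcase
  return [[word, 1.0]] if word.length <= min_length

  word.length.downto(min_length).map do |len|
    [word[0, len], (len.to_f / word.length).round(2)]
  end
end

truncated_variants("tester")
# => [["tester", 1.0], ["teste", 0.83], ["test", 0.67], ["tes", 0.5]]
```

Each variant could then be sent to the search backend and the hits ranked by weight; how the plugin actually combines the results is not shown in this thread.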

rlisowski · 2015-02-14T23:36:00Z

It seems that Sphinx does not stem words (reduce them to their base forms) by default. To make it work, install it with the libstemmer library.

links:
https://pat.github.io/thinking-sphinx/advanced_config.html
http://sphinxsearch.com/docs/current.html#conf-morphology
http://snowball.tartarus.org/download.php

dominch · 2015-02-18T13:02:53Z

I just changed my config and now it's working.
I think this should be added to the README to make it clear :) I needed to add a config/thinking_sphinx.yml file with this content:

production:
  morphology: stem_en
  mem_limit: 128M
  wordforms: "/var/data/redmine/config/sphinx/wordforms.txt"
  stopwords: "/var/data/redmine/config/sphinx/stopwords.txt"

This added morphology and a stemmer to my generated config files.
I also added wordforms and stopwords to my config; they are easy to find for any language with Google.

Now it's working great! :) Thank you for the help, I wasn't sure if I needed anything more in my configuration.

Androc · 2015-02-18T14:17:56Z

@dominch Hello, I am very interested in what you found. Could you please describe in a little more detail how you achieved that?

Did you find your wordforms/stopwords or did you generate them (and how)?
When you write config/thinking_sphinx.yml, do you mean redmine_root_path/config/thinking_sphinx.yml?
Did you do anything other than add the thinking_sphinx.yml file?

I would like to use a language other than English/Russian, and after reading the documentation it appears I have to take more steps to get morphology search working.

Androc · 2015-02-18T14:46:43Z

I found the beginning of an answer:

Yes, it is redmine_root_path/config/thinking_sphinx.yml.
You must create a sphinx directory inside config.
You must install or download your_language.dict and your_language.aff (I found the .dict here: http://icon.shef.ac.uk/Moby/mlang.html and the .aff with ispell).
You must create a wordforms.txt file inside the config/sphinx directory by running spelldump your_language.dict your_language.aff.
You must create a stopwords.txt file (even if empty) inside config/sphinx. I was not able to fill it, as indexer --buildstops wordforms.txt 1000 returned nothing to do.

I ran rake ts:index (not sure if it is necessary or useful).

The morphology still does not work :(

dominch · 2015-02-18T15:21:21Z

Yes, it's inside the Redmine config directory. Edit this file (thinking_sphinx.yml), and after ts:rebuild you should notice a change in the production.sphinx.conf file (in the same directory), which is regenerated when the command runs.
For other morphologies you must install dictionaries.
Stopwords and wordforms are simple text files: the first contains one stopword per line, and the wordforms file contains "from > to" lines.

I found both files on the internet and placed them inside redmine_dir/config/sphinx (and reflected that path in the config above). That is enough for my needs: wordforms reduce my complex words to basic ones, e.g. "thinking > think".
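
To illustrate the two file formats described above, here is a small Ruby sketch that parses "from > to" lines and applies them together with a stopword list. It is not how Sphinx is implemented; the function names and the order of the two steps (stopwords first, then wordforms) are assumptions made for the example.

```ruby
# Parse wordforms lines of the form "from > to" into a Hash.
# Blank lines and lines without " > " are silently skipped.
def parse_wordforms(text)
  text.each_line.each_with_object({}) do |line, map|
    from, to = line.strip.split(/\s*>\s*/, 2)
    next if from.nil? || from.empty? || to.nil? || to.empty?
    map[from] = to
  end
end

# Illustrative normalization (assumed order): drop stopwords,
# then replace each remaining word with its base form if one is listed.
def normalize(query, wordforms, stopwords)
  query.downcase.split
       .reject { |w| stopwords.include?(w) }
       .map { |w| wordforms.fetch(w, w) }
end

wordforms = parse_wordforms("thinking > think\ntests > test\n")
stopwords = %w[the a of]
normalize("Thinking of the tests", wordforms, stopwords)
# => ["think", "test"]
```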

Good luck! :)

Androc · 2015-02-18T15:27:28Z

Thanks for your answer. My production.sphinx.conf was updated:

type = plain
  path = /usr/share/redmine-stable/db/sphinx/production/issue_core
  docinfo = extern
  morphology = stem_en, libstemmer_fr
  stopwords = /usr/share/redmine-stable/config/sphinx/stopwords.txt
  wordforms = /usr/share/redmine-stable/config/sphinx/wordforms.txt
  charset_type = utf-8
  min_infix_len = 2
  enable_star = 1
  source = issue_core_0

So I suppose Sphinx is aware that I would like morphology, but what I want to achieve first is that a search for testy finds test.
Morphology will come later, I suppose.

Androc · 2015-02-18T15:29:29Z

What bothers me is that after using the spelldump command, my wordforms.txt is full of word > word.

After reading the documentation, I would expect to find word > other_word.

Am I wrong?

dominch · 2015-02-18T15:37:06Z

No, it should be "something > somethingElse".
The idea is to reduce complex words to basic ones.
Sphinx should give you warnings on reindex that it's ignoring such lines.

Androc · 2015-02-18T15:40:16Z

Ok. So my doubts were justified :)

I just found that my dict file was corrupted, which certainly led to a bad wordforms.txt.

Androc · 2015-02-18T15:47:14Z

My dict file is now clean and I regenerated my wordforms.txt. It is a clean word > word list with no more garbage characters, but still not a word > other_word list :(
As it is not directly the plugin's problem, I will search on my own and come back later.

Thanks for your help.

Androc · 2015-02-19T07:57:10Z

I found a solution.

My problem was:
My .dic was "clean" but did not have the flags needed to take the .aff into account.

Your .dic file must contain a list of word/X entries (where X can be one or several letters like S, A, etc.), not just a list of words.

My solution:
I installed the .dic from myspell rather than ispell and got a valid .dic.

Now I have a valid wordforms.txt.

Androc · 2015-02-19T10:24:49Z

When I ran rake ts:index I got a lot of warnings about duplicates.

My wordforms.txt still had lines like word > word.

I ran awk -F " > " '$1 != $2 { print $0 }' wordforms.txt > wordforms_clean.txt to extract only the word1 > word2 lines.
Then I overwrote my wordforms.txt with the clean one.

But when I run rake ts:index I still get the duplicate warnings :(

I don't know how to clean the former indexes; I have tried rake ts:rebuild, rake ts:regenerate, etc.

dominch · 2015-02-19T10:32:21Z

Duplicates are not only

word1 > word1

but

word1 > word2
word1 > word3

are duplicates too. Each source word may map to only one target. In other words, a unique index is built, and the engine needs to know exactly what each word should be replaced with. Try grepping your wordforms for any example from the warnings ("word1 > ") :)
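
Both kinds of problem line (identity mappings and a source word that appears more than once) can be found in one pass. The sketch below is a hypothetical checker written for this thread, not a Sphinx tool; it keeps the first mapping for each source word, which is an arbitrary choice.

```ruby
# Split wordforms lines into usable mappings and problem lines.
# Returns [clean_hash, problems]; the first mapping for a source word wins.
def lint_wordforms(lines)
  seen = {}
  problems = []
  lines.each do |line|
    from, to = line.strip.split(/\s*>\s*/, 2)
    next if from.nil? || to.nil?
    if from == to
      problems << [:identity, line.strip]
    elsif seen.key?(from)
      problems << [:duplicate_source, line.strip]
    else
      seen[from] = to
    end
  end
  [seen, problems]
end

clean, problems = lint_wordforms(["word1 > word1", "word1 > word2", "word1 > word3"])
# clean    => {"word1"=>"word2"}
# problems => [[:identity, "word1 > word1"], [:duplicate_source, "word1 > word3"]]
```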

Androc · 2015-02-19T11:15:25Z

Mmm, I think I found my real problem: the accents!

I don't have word > word1, word > word2; what I have is wôrd > word2, fôrd > word2, which leads to a duplicate on rd > word2.

Probably a UTF-8 problem.
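
One plausible explanation for the "rd" collision, assuming that characters outside the configured character set act as word separators: both wôrd and fôrd would then be split around the ô and leave the same trailing token. The Ruby sketch below only imitates that behaviour; the tokenizer and the tiny folding table are illustrative assumptions, written in the spirit of charset_table entries such as U+00F4->o.

```ruby
# Assumption: any character outside a-z, 0-9 and _ is a separator.
def tokenize_ascii_only(text)
  text.downcase.split(/[^a-z0-9_]+/).reject(&:empty?)
end

# A tiny, hand-picked folding table (illustrative, far from complete).
FOLD = { "ô" => "o", "é" => "e", "è" => "e", "à" => "a", "ç" => "c" }.freeze

# Replace known accented characters with their unaccented form;
# unknown non-ASCII characters are left unchanged.
def fold_accents(text)
  text.downcase.gsub(/[^\x00-\x7F]/) { |ch| FOLD.fetch(ch, ch) }
end

tokenize_ascii_only("wôrd")                # => ["w", "rd"]
tokenize_ascii_only("fôrd")                # => ["f", "rd"]
tokenize_ascii_only(fold_accents("wôrd"))  # => ["word"]
```

Under these assumptions, folding accented characters before tokenizing keeps each word whole, so the two entries no longer collide.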

dominch · 2015-02-19T12:14:16Z

Then check your database and its table encoding; e.g. MySQL uses latin1_swedish by default. I needed to change both the DB and all tables to utf8, but that was some time ago. New tables were not being created in utf8 back then, and now they are. I'm not sure whether that information is stored in the DB on indexing, but some documents are.

Androc · 2015-02-19T13:47:47Z

They are all UTF8.

I think it is more a Sphinx problem. I found several posts about UTF8 problems.

Still searching :)

Androc · 2015-02-19T15:42:04Z

Ok... This is strange. I was totally focused on my warnings, but in fact the morphology seems to work.

When I search for traiter or traités, it matches traités. That is one good point.

Now my problem is that the did_you_mean plugin's example has the tests case, which should match test without using morphology.

I have an issue named test, and when I search for tests the issue is not found.

Maybe I misunderstood the example.

dominch · 2015-02-19T15:45:28Z

That example is based on English morphology, which cuts all words down to the minimum length, so that tes+t = tes+ts.
It was not working on my side until I added morphology.

Androc · 2015-02-19T15:51:24Z

Since I added this to thinking_sphinx.yml, I am not sure what is making the morphology work.

charset_table: "0..9, a..z, _, A..Z->a..z, U+00C0->a, U+00C1->a, U+00C2->a, U+00C3->a, U+00C4->a, U+00C5->a, U+00C7->c, U+00C8->e, U+00C9->e, U+00CA->e, U+00CB->e, U+00CC->i, U+00CD->i, U+00CE->i, U+00CF->i, U+00D1->n, U+00D2->o, U+00D3->o, U+00D4->o, U+00D5->o, U+00D6->o, U+00D8->o, U+00D9->u, U+00DA->u, U+00DB->u, U+00DC->u, U+00DD->y, U+00E0->a, U+00E1->a, U+00E2->a, U+00E3->a, U+00E4->a, U+00E5->a, U+00E7->c, U+00E8->e, U+00E9->e, U+00EA->e, U+00EB->e, U+00EC->i, U+00ED->i, U+00EE->i, U+00EF->i, U+00F1->n, U+00F2->o, U+00F3->o, U+00F4->o, U+00F5->o, U+00F6->o, U+00F8->o, U+00F9->u, U+00FA->u, U+00FB->u, U+00FC->u, U+00FD->y, U+00FF->y, U+0100->a, U+0101->a, U+0102->a, U+0103->a, U+0104->a, U+0105->a, U+0106->c, U+0107->c, U+0108->c, U+0109->c, U+010A->c, U+010B->c, U+010C->c, U+010D->c, U+010E->d, U+010F->d, U+0112->e, U+0113->e, U+0114->e, U+0115->e, U+0116->e, U+0117->e, U+0118->e, U+0119->e, U+011A->e, U+011B->e, U+011C->g, U+011D->g, U+011E->g, U+011F->g, U+0120->g, U+0121->g, U+0122->g, U+0123->g, U+0124->h, U+0125->h, U+0128->i, U+0129->i, U+0131->i, U+012A->i, U+012B->i, U+012C->i, U+012D->i, U+012E->i, U+012F->i, U+0130->i, U+0134->j, U+0135->j, U+0136->k, U+0137->k, U+0139->l, U+013A->l, U+013B->l, U+013C->l, U+013D->l, U+013E->l, U+0141->l, U+0142->l, U+0143->n, U+0144->n, U+0145->n, U+0146->n, U+0147->n, U+0148->n, U+014C->o, U+014D->o, U+014E->o, U+014F->o, U+0150->o, U+0151->o, U+0154->r, U+0155->r, U+0156->r, U+0157->r, U+0158->r, U+0159->r, U+015A->s, U+015B->s, U+015C->s, U+015D->s, U+015E->s, U+015F->s, U+0160->s, U+0161->s, U+0162->t, U+0163->t, U+0164->t, U+0165->t, U+0168->u, U+0169->u, U+016A->u, U+016B->u, U+016C->u, U+016D->u, U+016E->u, U+016F->u, U+0170->u, U+0171->u, U+0172->u, U+0173->u, U+0174->w, U+0175->w, U+0176->y, U+0177->y, U+0178->y, U+0179->z, U+017A->z, U+017B->z, U+017C->z, U+017D->z, U+017E->z, U+01A0->o, U+01A1->o, U+01AF->u, U+01B0->u, U+01CD->a, U+01CE->a, U+01CF->i, U+01D0->i, U+01D1->o, U+01D2->o, U+01D3->u, 
U+01D4->u, U+01D5->u, U+01D6->u, U+01D7->u, U+01D8->u, U+01D9->u, U+01DA->u, U+01DB->u, U+01DC->u, U+01DE->a, U+01DF->a, U+01E0->a, U+01E1->a, U+01E6->g, U+01E7->g, U+01E8->k, U+01E9->k, U+01EA->o, U+01EB->o, U+01EC->o, U+01ED->o, U+01F0->j, U+01F4->g, U+01F5->g, U+01F8->n, U+01F9->n, U+01FA->a, U+01FB->a, U+0200->a, U+0201->a, U+0202->a, U+0203->a, U+0204->e, U+0205->e, U+0206->e, U+0207->e, U+0208->i, U+0209->i, U+020A->i, U+020B->i, U+020C->o, U+020D->o, U+020E->o, U+020F->o, U+0210->r, U+0211->r, U+0212->r, U+0213->r, U+0214->u, U+0215->u, U+0216->u, U+0217->u, U+0218->s, U+0219->s, U+021A->t, U+021B->t, U+021E->h, U+021F->h, U+0226->a, U+0227->a, U+0228->e, U+0229->e, U+022A->o, U+022B->o, U+022C->o, U+022D->o, U+022E->o, U+022F->o, U+0230->o, U+0231->o, U+0232->y, U+0233->y, U+1E00->a, U+1E01->a, U+1E02->b, U+1E03->b, U+1E04->b, U+1E05->b, U+1E06->b, U+1E07->b, U+1E08->c, U+1E09->c, U+1E0A->d, U+1E0B->d, U+1E0C->d, U+1E0D->d, U+1E0E->d, U+1E0F->d, U+1E10->d, U+1E11->d, U+1E12->d, U+1E13->d, U+1E14->e, U+1E15->e, U+1E16->e, U+1E17->e, U+1E18->e, U+1E19->e, U+1E1A->e, U+1E1B->e, U+1E1C->e, U+1E1D->e, U+1E1E->f, U+1E1F->f, U+1E20->g, U+1E21->g, U+1E22->h, U+1E23->h, U+1E24->h, U+1E25->h, U+1E26->h, U+1E27->h, U+1E28->h, U+1E29->h, U+1E2A->h, U+1E2B->h, U+1E2C->i, U+1E2D->i, U+1E2E->i, U+1E2F->i, U+1E30->k, U+1E31->k, U+1E32->k, U+1E33->k, U+1E34->k, U+1E35->k, U+1E36->l, U+1E37->l, U+1E38->l, U+1E39->l, U+1E3A->l, U+1E3B->l, U+1E3C->l, U+1E3D->l, U+1E3E->m, U+1E3F->m, U+1E40->m, U+1E41->m, U+1E42->m, U+1E43->m, U+1E44->n, U+1E45->n, U+1E46->n, U+1E47->n, U+1E48->n, U+1E49->n, U+1E4A->n, U+1E4B->n, U+1E4C->o, U+1E4D->o, U+1E4E->o, U+1E4F->o, U+1E50->o, U+1E51->o, U+1E52->o, U+1E53->o, U+1E54->p, U+1E55->p, U+1E56->p, U+1E57->p, U+1E58->r, U+1E59->r, U+1E5A->r, U+1E5B->r, U+1E5C->r, U+1E5D->r, U+1E5E->r, U+1E5F->r, U+1E60->s, U+1E61->s, U+1E62->s, U+1E63->s, U+1E64->s, U+1E65->s, U+1E66->s, U+1E67->s, U+1E68->s, U+1E69->s, U+1E6A->t, U+1E6B->t, U+1E6C->t, 
U+1E6D->t, U+1E6E->t, U+1E6F->t, U+1E70->t, U+1E71->t, U+1E72->u, U+1E73->u, U+1E74->u, U+1E75->u, U+1E76->u, U+1E77->u, U+1E78->u, U+1E79->u, U+1E7A->u, U+1E7B->u, U+1E7C->v, U+1E7D->v, U+1E7E->v, U+1E7F->v, U+1E80->w, U+1E81->w, U+1E82->w, U+1E83->w, U+1E84->w, U+1E85->w, U+1E86->w, U+1E87->w, U+1E88->w, U+1E89->w, U+1E8A->x, U+1E8B->x, U+1E8C->x, U+1E8D->x, U+1E8E->y, U+1E8F->y, U+1E96->h, U+1E97->t, U+1E98->w, U+1E99->y, U+1EA0->a, U+1EA1->a, U+1EA2->a, U+1EA3->a, U+1EA4->a, U+1EA5->a, U+1EA6->a, U+1EA7->a, U+1EA8->a, U+1EA9->a, U+1EAA->a, U+1EAB->a, U+1EAC->a, U+1EAD->a, U+1EAE->a, U+1EAF->a, U+1EB0->a, U+1EB1->a, U+1EB2->a, U+1EB3->a, U+1EB4->a, U+1EB5->a, U+1EB6->a, U+1EB7->a, U+1EB8->e, U+1EB9->e, U+1EBA->e, U+1EBB->e, U+1EBC->e, U+1EBD->e, U+1EBE->e, U+1EBF->e, U+1EC0->e, U+1EC1->e, U+1EC2->e, U+1EC3->e, U+1EC4->e, U+1EC5->e, U+1EC6->e, U+1EC7->e, U+1EC8->i, U+1EC9->i, U+1ECA->i, U+1ECB->i, U+1ECC->o, U+1ECD->o, U+1ECE->o, U+1ECF->o, U+1ED0->o, U+1ED1->o, U+1ED2->o, U+1ED3->o, U+1ED4->o, U+1ED5->o, U+1ED6->o, U+1ED7->o, U+1ED8->o, U+1ED9->o, U+1EDA->o, U+1EDB->o, U+1EDC->o, U+1EDD->o, U+1EDE->o, U+1EDF->o, U+1EE0->o, U+1EE1->o, U+1EE2->o, U+1EE3->o, U+1EE4->u, U+1EE5->u, U+1EE6->u, U+1EE7->u, U+1EE8->u, U+1EE9->u, U+1EEA->u, U+1EEB->u, U+1EEC->u, U+1EED->u, U+1EEE->u, U+1EEF->u, U+1EF0->u, U+1EF1->u, U+1EF2->y, U+1EF3->y, U+1EF4->y, U+1EF5->y, U+1EF6->y, U+1EF7->y, U+1EF8->y, U+1EF9->y"

Because I have tests > test in my wordforms.txt, so tests should find test.

Edit: ah ok. So I suppose my morphology is only half working :)

swiatkiewicz mentioned this issue Sep 15, 2014

Fuzzy Word Matching #56 #61

Merged

abahgat added the feature label Oct 2, 2014

abahgat added a commit that referenced this issue Jan 2, 2015

Merge pull request #61 from efigence/master

40fb2b4

Fuzzy Word Matching with Sphinx #56

rlisowski mentioned this issue Feb 15, 2015

elasticsearch engine support #70

Closed
