Spell Checker suggestions #2227
Replies: 10 comments
-
Hello, This task involves
In short, this task requires changing a data structure and loads of PyFST code. I wish the best of luck to the volunteers :)
-
Have you tried SymSpell? I just published an extension for spaCy on PyPI that implements SymSpell. I have a simple idea for improving its accuracy using character groups, which act like unit morphemes.
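For readers who have not met SymSpell before, here is a minimal sketch of the symmetric-delete idea it is built on: index every dictionary word under its delete-variants, then look up the delete-variants of the input. This is not the code of the extension mentioned above. The toy vocabulary and the function names are made up for illustration, and the final edit-distance verification and frequency ranking that a real implementation performs are left out.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance=1):
    """Return the word plus every string made by deleting up to max_distance characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def build_index(vocabulary, max_distance=1):
    """Map each delete-variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in vocabulary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def candidates(term, index, max_distance=1):
    """Collect dictionary words that share at least one delete-variant with the term."""
    found = set()
    for variant in deletes(term, max_distance):
        found |= index.get(variant, set())
    return found


# Hypothetical toy dictionary, only for illustration.
vocab = ["what", "that", "know", "about", "anything"]
index = build_index(vocab)
print(sorted(candidates("whas", index)))  # -> ['what']
```

Note that the lookup only sees one word at a time, so nothing here uses context; candidates still have to be ranked by some other signal.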
-
@xwiz explain it in detail please, I am interested
-
Basically the root idea is to sense the context and grammatical correctness based on word juxtaposition frequency and potentially grammatical rules for completeness. For example look at the following sentences:
These two sentences are corrections provided by From those suggestions, it is clear that SymSpell does not take into account context, grammatical rules or word-with-word commonness (i.e. how words are commonly combined). Furthermore, a spelling corrector should be able to detect randomness as well as incomprehensibility. Most human beings won't be able to understand 'whas do dou meansad thati am thefull', but most deep-learning-based correctors would still make something out of it. The right way to respond to a sentence like that would be 'what?' or 'pardon?'. It is not completely random, but it is incoherent. The solution would imply
I will start a repo to test the statistical (non-deep-learning) approach right away.
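As a rough illustration of the statistical direction described above (not code from the planned repo), the sketch below scores candidate sequences by how often adjacent words occur together and flags input whose word pairs were never seen. The corpus, the candidate lists and the function names are all hypothetical; a real system would count over a large text collection and use smoothed probabilities instead of raw counts.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus, only for illustration.
corpus = "what do you mean . what do you know . that is what i mean".split()
bigram_counts = Counter(zip(corpus, corpus[1:]))


def sequence_score(words):
    """Sum of bigram counts over adjacent pairs (higher means a more common combination)."""
    return sum(bigram_counts[pair] for pair in zip(words, words[1:]))


def best_correction(candidate_lists):
    """Pick one candidate per position so that the whole sequence scores highest."""
    return max(product(*candidate_lists), key=sequence_score)


def unseen_ratio(words):
    """Fraction of adjacent word pairs never observed in the corpus (1.0 = fully unfamiliar)."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(1 for pair in pairs if bigram_counts[pair] == 0) / len(pairs)


# Per-word candidates as an isolated-word corrector might return them (made up here).
options = [["what", "that"], ["do", "to"], ["you"], ["mean", "men"]]
print(best_correction(options))  # -> ('what', 'do', 'you', 'mean')
print(unseen_ratio("whas do dou meansad".split()))  # -> 1.0
```

A high unseen ratio could be the cue for answering 'pardon?' instead of forcing a correction. The exhaustive product over candidates is only workable for short inputs; longer sentences would need a Viterbi-style or beam search.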
-
Hi, I created a spell checker which uses context to correct misspellings. If you get time, do have a look! I plan to submit it to spaCy Universe. GitHub: https://github.com/R1j1t/contextualSpellCheck There are some features still left to develop (like RWE, i.e. real-word errors, or using Cython), but I plan to complete them in the coming days. While developing this, I really liked the spaCy documentation and architecture. I did not think it would be this easy to add extensions or pipeline objects. Love the spaCy project!
-
@R1j1t have you ever compared your project with the other major contextual spell checkers? It would be good to see which ones have more features and which ones integrate best with other systems, and to learn from that.
-
Interesting one, @R1j1t. Did you try it with my example above? i.e.
-
That is a good idea, @DonaldTsang, contributions are always welcome! @xwiz I tried the sentence you quoted, but the model output is not what you or I would expect.

>>> doc = nlp("What doyuoknowabout antyhing?")
>>> doc._.outcome_spellCheck
'What about mean?'

I would not defend the model, but I think it will take time. Just out of curiosity, I tried the same input sentence on DuckDuckGo and Google Search. On DuckDuckGo I did not get any suggestion (which seemed a bit weird), and on Google I was prompted
But, like I said, it is still a work in progress, and at present I am focusing on RWE and performance improvements. If you find the work interesting, please jump in and help out on some of the tasks you mentioned. I would love to see people contributing.
-
Yeah, Google uses a mostly statistics-based approach and it works well. The return value could mean that
-
Okay, but don't you think
-
Having a spell checker that involves: