Commit 80659a7

First version
lulinbing committed
1 parent d1335f4, commit 80659a7

35 files changed, +815 -70 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 *.log
 _build/
 temp/
+data/
 
 # pycharm temp files
 .idea/

README.md

Lines changed: 9 additions & 70 deletions
@@ -1,72 +1,11 @@
-# Review Sentiment Polarity Classification Model
-A sentiment polarity classification model for JD.com reviews, using fasttext for classification.
+# Preface
+Some basic models for natural language processing
 
-## Model Performance
-Trained on 800,000 JD.com reviews and tested on 100,000 reviews:
-#### Model Parameters
+# Contents
+* HMM-based Chinese word segmentation model
+* fasttext-based sentiment polarity classification model
 
-    lr = 0.01
-    lr_update_rate = 100
-    dim = 300
-    ws = 5
-    epoch = 10
-    word_ngrams = 3
-    loss = hs
-    bucket = 2000000
-    thread = 4
-
-#### Results
-
-    ('precision:', 0.85055)
-    ('recall:', 0.85055)
-    ('examples:', 100000)
-
-## Quick Start
-#### Corpus Segmentation
-
-    python manage.py cut
-
-#### Model Training
-
-    python manage.py train
-
-#### Model Testing
-
-    python manage.py test
-
-## Documentation
-#### Code Documentation
-Generated with Sphinx; make sure it is installed
-
-    cd doc
-    make html
-
-#### Blog
-Blog post: http://blog.csdn.net/sinat_33741547/article/details/78803766
-
-## References
-#### Enriching Word Vectors with Subword Information
-
-[1] P. Bojanowski\*, E. Grave\*, A. Joulin, T. Mikolov, [*Enriching Word Vectors with Subword Information*](https://arxiv.org/pdf/1607.04606v1.pdf)
-
-```
-@article{bojanowski2016enriching,
-  title={Enriching Word Vectors with Subword Information},
-  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
-  journal={arXiv preprint arXiv:1607.04606},
-  year={2016}
-}
-```
-
-#### Bag of Tricks for Efficient Text Classification
-
-[2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, [*Bag of Tricks for Efficient Text Classification*](https://arxiv.org/pdf/1607.01759v2.pdf)
-
-```
-@article{joulin2016bag,
-  title={Bag of Tricks for Efficient Text Classification},
-  author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
-  journal={arXiv preprint arXiv:1607.01759},
-  year={2016}
-}
-```
+# Version History
+## 2017.12.21
+* Added the HMM-based Chinese word segmentation model
+* Added the fasttext-based sentiment polarity classification model
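
Note on the removed results: the old README reported identical precision and recall (0.85055) on 100000 test examples. That is the expected outcome if, as is common for fastText-style evaluation, each example carries exactly one gold label and the model emits exactly one prediction, because precision at 1 and recall at 1 then share the same numerator and denominator. The commit does not show the evaluation code, so the sketch below is only an illustration on hypothetical toy data, with a hypothetical helper name.

```python
def precision_recall_at_1(gold_labels, predicted_labels):
    """Return (P@1, R@1).

    gold_labels[i] is the set of true labels for example i;
    predicted_labels[i] is the single label predicted for example i.
    """
    correct = sum(
        1 for gold, pred in zip(gold_labels, predicted_labels) if pred in gold
    )
    n_predictions = len(predicted_labels)            # one prediction per example
    n_gold = sum(len(gold) for gold in gold_labels)  # total number of gold labels
    return correct / n_predictions, correct / n_gold


# Hypothetical toy data: four reviews, each with exactly one gold label.
gold = [{"pos"}, {"neg"}, {"pos"}, {"neg"}]
pred = ["pos", "neg", "neg", "neg"]

precision, recall = precision_recall_at_1(gold, pred)
print(precision, recall)
```

With one gold label per example, `n_predictions` and `n_gold` are equal, so the two scores cannot differ.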

data/.gitkeep

Whitespace-only changes.

segment/README.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# Chinese Word Segmentation Model
+A Chinese word segmentation model that uses only an HMM
+
+# Model Performance
+
+    Precision: 0.7416711648494009
+    Recall: 0.6686293783881109
+    F1: 0.7032587944456672
+
+# Quick Start
+### Model Training
+
+    python manage.py train
+
+### Model Testing
+
+    python manage.py test
+
+# Documentation
+### Code Documentation
+Generated with Sphinx; make sure it is installed
+
+    cd doc
+    make html
+
+### Blog
+Blog post
+
+# References
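
Note on the reported scores: the three numbers in segment/README.md are mutually consistent, since the F1 value is the harmonic mean of the other two. The commit does not show how they were computed. The sketch below assumes the usual word-level convention for segmentation, where a predicted word counts as correct only if its character span matches a gold word exactly. The function names and the toy sentence are hypothetical.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def segmentation_scores(gold_words, predicted_words):
    """Span-level precision, recall and F1 for one segmented sentence."""
    gold, pred = to_spans(gold_words), to_spans(predicted_words)
    correct = len(gold & pred)
    precision = correct / len(pred)
    recall = correct / len(gold)
    f1 = f1_score(precision, recall) if correct else 0.0
    return precision, recall, f1


# Toy sentence: the prediction wrongly merges two gold words into one.
p, r, f1 = segmentation_scores(
    ["我", "爱", "北京", "天安门"],
    ["我", "爱北京", "天安门"],
)
print(p, r, f1)

# Consistency check on the scores reported in the README.
reported_f1 = f1_score(0.7416711648494009, 0.6686293783881109)
print(reported_f1)
```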

segment/doc/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line.
+SPHINXOPTS =
+SPHINXBUILD = sphinx-build
+SPHINXPROJ = segment
+SOURCEDIR = .
+BUILDDIR = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

segment/doc/conf.py

Lines changed: 162 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,162 @@
1+
# -*- coding: utf-8 -*-
2+
#
3+
# segment documentation build configuration file, created by
4+
# sphinx-quickstart on Tue Dec 19 14:03:29 2017.
5+
#
6+
# This file is execfile()d with the current directory set to its
7+
# containing dir.
8+
#
9+
# Note that not all possible configuration values are present in this
10+
# autogenerated file.
11+
#
12+
# All configuration values have a default; values that are commented out
13+
# serve to show the default.
14+
15+
# If extensions (or modules to document with autodoc) are in another directory,
16+
# add these directories to sys.path here. If the directory is relative to the
17+
# documentation root, use os.path.abspath to make it absolute, like shown here.
18+
#
19+
import os
20+
import sys
21+
sys.path.insert(0, os.path.abspath('.'))
22+
root_path = os.path.join(os.path.abspath(os.path.dirname(__file__)), '..')
23+
sys.path.insert(0, root_path)
24+
25+
26+
# -- General configuration ------------------------------------------------
27+
28+
# If your documentation needs a minimal Sphinx version, state it here.
29+
#
30+
# needs_sphinx = '1.0'
31+
32+
# Add any Sphinx extension module names here, as strings. They can be
33+
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
34+
# ones.
35+
extensions = ['sphinx.ext.autodoc']
36+
37+
# Add any paths that contain templates here, relative to this directory.
38+
templates_path = ['_templates']
39+
40+
# The suffix(es) of source filenames.
41+
# You can specify multiple suffix as a list of string:
42+
#
43+
# source_suffix = ['.rst', '.md']
44+
source_suffix = '.rst'
45+
46+
# The master toctree document.
47+
master_doc = 'index'
48+
49+
# General information about the project.
50+
project = u'segment'
51+
copyright = u'2017, lpty'
52+
author = u'lpty'
53+
54+
# The version info for the project you're documenting, acts as replacement for
55+
# |version| and |release|, also used in various other places throughout the
56+
# built documents.
57+
#
58+
# The short X.Y version.
59+
version = u'0.1'
60+
# The full version, including alpha/beta/rc tags.
61+
release = u'0.1'
62+
63+
# The language for content autogenerated by Sphinx. Refer to documentation
64+
# for a list of supported languages.
65+
#
66+
# This is also used if you do content translation via gettext catalogs.
67+
# Usually you set "language" from the command line for these cases.
68+
language = None
69+
70+
# List of patterns, relative to source directory, that match files and
71+
# directories to ignore when looking for source files.
72+
# This patterns also effect to html_static_path and html_extra_path
73+
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
74+
75+
# The name of the Pygments (syntax highlighting) style to use.
76+
pygments_style = 'sphinx'
77+
78+
# If true, `todo` and `todoList` produce output, else they produce nothing.
79+
todo_include_todos = False
80+
81+
82+
# -- Options for HTML output ----------------------------------------------
83+
84+
# The theme to use for HTML and HTML Help pages. See the documentation for
85+
# a list of builtin themes.
86+
#
87+
html_theme = 'alabaster'
88+
89+
# Theme options are theme-specific and customize the look and feel of a theme
90+
# further. For a list of options available for each theme, see the
91+
# documentation.
92+
#
93+
html_theme_options = {
94+
'description': u'segment',
95+
'font_family': u'"Hiragino Sans GB","STHeiti","Microsoft Yahei"',
96+
'caption_font_family': u'"Hiragino Sans GB","STHeiti","Microsoft Yahei"',
97+
}
98+
99+
# Add any paths that contain custom static files (such as style sheets) here,
100+
# relative to this directory. They are copied after the builtin static files,
101+
# so a file named "default.css" will overwrite the builtin "default.css".
102+
html_static_path = ['_static']
103+
104+
105+
# -- Options for HTMLHelp output ------------------------------------------
106+
107+
# Output file base name for HTML help builder.
108+
htmlhelp_basename = 'segmentdoc'
109+
110+
111+
# -- Options for LaTeX output ---------------------------------------------
112+
113+
latex_elements = {
114+
# The paper size ('letterpaper' or 'a4paper').
115+
#
116+
# 'papersize': 'letterpaper',
117+
118+
# The font size ('10pt', '11pt' or '12pt').
119+
#
120+
# 'pointsize': '10pt',
121+
122+
# Additional stuff for the LaTeX preamble.
123+
#
124+
# 'preamble': '',
125+
126+
# Latex figure (float) alignment
127+
#
128+
# 'figure_align': 'htbp',
129+
}
130+
131+
# Grouping the document tree into LaTeX files. List of tuples
132+
# (source start file, target name, title,
133+
# author, documentclass [howto, manual, or own class]).
134+
latex_documents = [
135+
(master_doc, 'segment.tex', u'segment Documentation',
136+
u'lpty', 'manual'),
137+
]
138+
139+
140+
# -- Options for manual page output ---------------------------------------
141+
142+
# One entry per manual page. List of tuples
143+
# (source start file, name, description, authors, manual section).
144+
man_pages = [
145+
(master_doc, 'segment', u'segment Documentation',
146+
[author], 1)
147+
]
148+
149+
150+
# -- Options for Texinfo output -------------------------------------------
151+
152+
# Grouping the document tree into Texinfo files. List of tuples
153+
# (source start file, target name, title, author,
154+
# dir menu entry, description, category)
155+
texinfo_documents = [
156+
(master_doc, 'segment', u'segment Documentation',
157+
author, 'segment', 'One line description of project.',
158+
'Miscellaneous'),
159+
]
160+
161+
162+
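
Note on the `sys.path` setup in conf.py: `root_path` is built by joining the doc directory with `'..'`, so the entry placed on `sys.path` keeps a literal `..` component. That resolves correctly at import time. A normalized path is just easier to read when debugging autodoc import failures. The sketch below is one possible variant, not the author's code; `project_root` and the example path are hypothetical.

```python
import os


def project_root(conf_file):
    """Return the normalized parent of the directory that holds conf_file."""
    doc_dir = os.path.abspath(os.path.dirname(conf_file))
    # normpath collapses the trailing '..' into the real parent directory
    return os.path.normpath(os.path.join(doc_dir, '..'))


# Hypothetical location of conf.py, used only to show the result.
example = project_root(os.path.join(os.sep, 'repo', 'segment', 'doc', 'conf.py'))
print(example)
```

In conf.py itself this would be called as `project_root(__file__)` before `sys.path.insert(0, ...)`.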

segment/doc/index.rst

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+.. sentiment documentation master file, created by
+   sphinx-quickstart on Thu Dec 14 14:42:17 2017.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+Documentation for the HMM-based Chinese word segmentation model
+=================================================================
+
+The Chinese word segmentation model consists mainly of CORPUS, MODEL, and API
+
+1.CORPUS
+--------
+
+Wraps a set of APIs for corpus processing. For details, see:
+
+.. toctree::
+   :maxdepth: 1
+
+   segment.corpus
+
+
+2.MODEL
+--------
+
+Wraps the model code. For details, see:
+
+.. toctree::
+   :maxdepth: 1
+
+   segment.model
+
+
+3.API
+--------
+
+Wraps the interfaces that the project exposes externally. For details, see:
+
+.. toctree::
+   :maxdepth: 1
+
+   segment.api
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
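
Note on the MODEL component: index.rst only says that it wraps the model code, and the model source is not part of this excerpt. As an illustration of what an HMM segmenter typically does at decode time, the sketch below tags each character as B (begin), M (middle), E (end) or S (single) with the Viterbi algorithm and cuts after every E or S. The tag set, the probabilities, the `UNSEEN` floor and the function names are all assumptions made for this example. A real model would estimate the tables from a tagged corpus.

```python
import math

STATES = "BMES"
NEG_INF = float("-inf")

# Hypothetical toy parameters, chosen by hand for the example sentence.
START = {"B": 0.6, "S": 0.4}
TRANS = {
    "B": {"M": 0.3, "E": 0.7},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.5, "S": 0.5},
    "S": {"B": 0.5, "S": 0.5},
}
EMIT = {
    "B": {"北": 0.5, "天": 0.3},
    "M": {"安": 0.6},
    "E": {"京": 0.5, "门": 0.4},
    "S": {"我": 0.5, "爱": 0.4},
}
UNSEEN = 1e-6  # crude floor for characters a state never emitted


def log(p):
    return math.log(p) if p > 0 else NEG_INF


def viterbi(sentence):
    """Return the most probable B/M/E/S tag string for sentence."""
    if not sentence:
        return ""
    # scores[t][s]: best log-probability of a path ending in state s at position t
    scores = [{s: log(START.get(s, 0)) + log(EMIT[s].get(sentence[0], UNSEEN))
               for s in STATES}]
    back = [{}]
    for t in range(1, len(sentence)):
        scores.append({})
        back.append({})
        for s in STATES:
            prev, best = max(
                ((p, scores[t - 1][p] + log(TRANS[p].get(s, 0))) for p in STATES),
                key=lambda item: item[1],
            )
            scores[t][s] = best + log(EMIT[s].get(sentence[t], UNSEEN))
            back[t][s] = prev
    # A well-formed tagging must end on E or S.
    state = max("ES", key=lambda s: scores[-1][s])
    tags = [state]
    for t in range(len(sentence) - 1, 0, -1):
        state = back[t][state]
        tags.append(state)
    return "".join(reversed(tags))


def cut(sentence):
    """Split sentence into words, closing a word at every E or S tag."""
    words, start = [], 0
    for i, tag in enumerate(viterbi(sentence)):
        if tag in "ES":
            words.append(sentence[start:i + 1])
            start = i + 1
    return words


print(cut("我爱北京天安门"))
```

Working in log space avoids underflow on long sentences, and disallowed transitions (for example B followed by S) simply score negative infinity.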

segment/doc/make.bat

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=.
+set BUILDDIR=_build
+set SPHINXPROJ=segment
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.http://sphinx-doc.org/
+	exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
+
+:end
+popd

segment/doc/segment.api.rst

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+.. automodule:: segment.api
+    :members:
+    :undoc-members:
+    :show-inheritance:
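
Note on the autodoc stubs: the toctree entries in index.rst (`segment.corpus`, `segment.model`, `segment.api`) each need a matching `.rst` file, and segment.api.rst shows the four-line pattern they follow. The sketch below writes such stubs for all three modules. `write_stubs` is a hypothetical helper, and the assumption that the other two stubs use the same options as segment.api.rst is not confirmed by this excerpt. Sphinx's own `sphinx-apidoc` tool is the usual way to generate files like these.

```python
import tempfile
from pathlib import Path

# Module names taken from the toctree entries in index.rst.
MODULES = ["segment.corpus", "segment.model", "segment.api"]

STUB = """.. automodule:: {module}
    :members:
    :undoc-members:
    :show-inheritance:
"""


def write_stubs(doc_dir, modules=MODULES):
    """Write one <module>.rst autodoc stub per module and return the paths."""
    doc_dir = Path(doc_dir)
    paths = []
    for module in modules:
        path = doc_dir / f"{module}.rst"
        path.write_text(STUB.format(module=module), encoding="utf-8")
        paths.append(path)
    return paths


# Demonstrate in a throwaway directory rather than a real doc tree.
with tempfile.TemporaryDirectory() as tmp:
    created = [p.name for p in write_stubs(tmp)]
    first_line = (Path(tmp) / "segment.api.rst").read_text(
        encoding="utf-8"
    ).splitlines()[0]

print(created)
```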
