qinwf
diff --git a/.Rbuildignore b/.Rbuildignore
Lines changed: 2 additions & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 3 additions & 0 deletions
diff --git a/DESCRIPTION b/DESCRIPTION
Lines changed: 15 additions & 0 deletions
diff --git a/LICENSE b/LICENSE
Lines changed: 2 additions & 0 deletions
diff --git a/NAMESPACE b/NAMESPACE
Lines changed: 1 addition & 0 deletions
diff --git a/R/jiebaRD-package.r b/R/jiebaRD-package.r
Lines changed: 17 additions & 0 deletions
diff --git a/inst/dict/README.md b/inst/dict/README.md
Lines changed: 31 additions & 0 deletions
diff --git a/inst/dict/backup.rda b/inst/dict/backup.rda
210 Bytes
diff --git a/inst/dict/hmm_model.zip b/inst/dict/hmm_model.zip
193 KB
diff --git a/inst/dict/idf.zip b/inst/dict/idf.zip
1.9 MB
@@ -0,0 +1,2 @@
+^.*\.Rproj$
+^\.Rproj\.user$
@@ -0,0 +1,3 @@
+.Rproj.user
+.Rhistory
+.RData
@@ -0,0 +1,15 @@
+Package: jiebaRD
+Type: Package
+Title: Chinese Text Segmentation Data for jiebaR Package
+Description: jiebaR is a package for Chinese text segmentation, keyword extraction
+   and part-of-speech tagging. This package provides the data files required by jiebaR.
+Version: 0.1
+Date: 2015-01-03
+Author: Qin Wenfeng
+Maintainer: Qin Wenfeng <[email protected]>
+License: MIT + file LICENSE
+Suggests:
+    jiebaR
+URL: https://github.com/qinwf/jiebaRD/
+BugReports: https://github.com/qinwf/jiebaRD/issues
+NeedsCompilation: no
@@ -0,0 +1,2 @@
+YEAR: 2014-2015
+COPYRIGHT HOLDER: Qin Wenfeng
@@ -0,0 +1 @@
+exportPattern("^[[:alpha:]]+")
@@ -0,0 +1,17 @@
+#' A package for Chinese text segmentation
+#'
+#' jiebaR is a package for Chinese text segmentation, keyword extraction
+#' and part-of-speech tagging. This package provides the data files required by jiebaR.
+#' jiebaR supports four segmentation modes: Maximum Probability, Hidden Markov Model,
+#' Query Segment and Mix Segment.
+#'
+#' You can add a custom dictionary to the jiebaR default dictionary.
+#' jiebaR can also identify new words, but adding your own new words will ensure
+#' higher accuracy.
+#'
+#' @docType package
+#' @name jiebaRD
+#' @author Qin Wenfeng <\url{http://qinwenfeng.com}>
+#' @references CppJieba \url{https://github.com/aszxqw/cppjieba};
+#' @seealso JiebaR \url{https://github.com/qinwf/jiebaR};
+NULL
@@ -0,0 +1,31 @@
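+# CppJieba Dictionaries
+
+The file extension indicates the dictionary's encoding.
+For example, filename.utf8 is UTF-8 encoded and filename.gbk is GBK encoded.
+
+
+## Segmentation
+
+### jieba.dict.utf8/gbk
+
+The dictionary used by the maximum probability segmenter (MPSegment: Max Probability).
+
+### hmm_model.utf8/gbk
+
+The dictionary used by the hidden Markov model segmenter (HMMSegment: Hidden Markov Model).
+
+__MixSegment (which combines MPSegment and HMMSegment) uses both of the dictionaries above.__
+
+
+## Keyword Extraction
+
+### idf.utf8
+
+IDF (Inverse Document Frequency).
+KeywordExtractor uses the classic TF-IDF algorithm, so it needs a dictionary that supplies IDF values.
+
+### stop_words.utf8
+
+The stop-word dictionary.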
+
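
Reviewer note: the README in this hunk says that keyword extraction combines a precomputed IDF dictionary with a stop-word list. The sketch below illustrates that idea in Python rather than R, and it is not the jiebaR or CppJieba implementation. The function name `extract_keywords`, the toy IDF values, and the median-IDF fallback for unseen words are all assumptions made for illustration; real values would be loaded from `idf.utf8` and `stop_words.utf8`.

```python
from collections import Counter


def extract_keywords(tokens, idf, stop_words, top_k=3):
    """Rank tokens by TF-IDF using a precomputed IDF table (hypothetical helper)."""
    # Drop stop words before counting term frequencies.
    kept = [t for t in tokens if t not in stop_words]
    if not kept:
        return []
    tf = Counter(kept)
    # Assumption: words missing from the IDF table get the median IDF value.
    default_idf = sorted(idf.values())[len(idf) // 2] if idf else 0.0
    scores = {
        word: (count / len(kept)) * idf.get(word, default_idf)
        for word, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy data standing in for the contents of idf.utf8 and stop_words.utf8.
idf = {"segmentation": 3.0, "text": 1.5, "chinese": 2.0}
stop_words = {"the", "of"}
tokens = ["the", "segmentation", "of", "chinese", "text", "segmentation"]

keywords = extract_keywords(tokens, idf, stop_words)
print(keywords)
```

With this toy input, "segmentation" ranks first because it is both the most frequent remaining token and the one with the highest IDF, while "the" and "of" never receive a score.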
+