Packaged vPhon for PyPI and added convert_line() function #2

Open. Wants to merge 13 commits into base: master
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
**/*.pyc
**/**/__pycache__
9 changes: 7 additions & 2 deletions CHANGELOG.md
@@ -1,12 +1,17 @@
# CHANGELOG

## [0.3.2] - 2018-11-26
### Added
- Added `convert_line` function to use vPhon within Python code.
- Added `setup.py` and packaged vPhon for PyPI

## [0.2.6] - 2016-11-09
### Added
- Added -o, --ortho flag to output Viet orthography along with IPA transcription.
- Added -m, --delimit flag to have output separated by a user-specified delimiter.

### Changed
- Output now uses Python 3-style print function
- Supplying a filename as a command-line argument no longer works: vPhon now reads only from STDIN and writes only to STDOUT. Any text processing should be done in usual Unix fashion (i.e. using shell redirection, `cat`, etc.).

### Fixed
@@ -16,5 +21,5 @@

## [0.2.5b] - 2016-03-16
### Added
- Added -t, --tokenize flag to preserve underscores or hyphens in tokenized inputs, so that e.g. anh_ta is output as anh1_ta1.
This flag has the effect of not automatically treating inputs with hyphens as non-Viet words.
34 changes: 25 additions & 9 deletions README.md
@@ -1,8 +1,8 @@
# vPhon: a Vietnamese phonetizer

Package: vPhon version 0.2.6
Package: vPhon version 0.3.2

Author: James Kirby <[email protected]>
Authors: James Kirby <[email protected]>

Web: https://github.com/kirbyj/vPhon

@@ -35,7 +35,7 @@ By default, vPhon does not recognize final palatal segments [c ɲ], as their val

As of version 0.2.2, final labialized allophones of /ŋ k/ are represented as [ŋ͡m k͡p].

###Tones
### Tones

vPhon represents tone using one of two methods. By default, vPhon will return Chao tone numbers based on Alves (2007a), Hoàng (1989), Nguyễn and Edmondson (1997), and Vũ (1982).

@@ -66,11 +66,27 @@ tones are both phonetized as 4 when vPhon is passed the `-6` or `-8` flags, repr

## Installation

No installation is required. You must have a working version of Python (>= 2.4) installed and in your path. vPhon requires
the `__future__`, `string`, `StringIO`, and `optparse` modules, all of which should come standard with Python >= 2.4.x.
<!-- No installation is required. You must have a working version of Python (>= 2.4) installed and in your path. vPhon requires -->
<!-- the `__future__`, `string`, `StringIO`, and `optparse` modules, all of which should come standard with Python >= 2.4.x. -->

vPhon can be installed with pip (for now, from the Test PyPI index):
<!-- edit this once uploaded to PyPI -->
```
pip install --index-url https://test.pypi.org/simple/ vPhon
```

## Usage

### From within Python

Import vPhon and call the `convert_line()` function like so:
``` python
from vPhon import vPhon
vPhon.convert_line("Trong một cuộc", dialect="n")
```
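
To phonetize several lines at once, a small wrapper along these lines may help. This is only a sketch: `convert_lines` and `fake_convert` are hypothetical names that are not part of vPhon, and the sketch assumes only that the converter accepts a string plus a `dialect` keyword, as in the call above. In real use you would pass `vPhon.convert_line` in place of the stand-in.

```python
from typing import Callable, Iterable, List


def convert_lines(lines: Iterable[str],
                  convert: Callable[..., object],
                  dialect: str = "n") -> List[object]:
    """Apply a convert_line-style callable to every non-blank line.

    `convert` is expected to accept (text, dialect=...), mirroring the
    convert_line() call shown in this README. Blank lines are skipped.
    """
    results = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip empty or whitespace-only lines
        results.append(convert(text, dialect=dialect))
    return results


def fake_convert(text: str, dialect: str = "n") -> str:
    """Hypothetical stand-in so the sketch runs without vPhon installed.

    It does NOT phonetize anything; it only tags the text with the dialect.
    """
    return "[{}] {}".format(dialect, text)


if __name__ == "__main__":
    # With vPhon installed, replace fake_convert with vPhon.convert_line.
    for item in convert_lines(["Trong một cuộc", "", "  anh ta  "], fake_convert):
        print(item)
```

Passing the converter in as an argument keeps the helper independent of how vPhon is imported, which is convenient while the package layout is still settling.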

### As a standalone script

vPhon takes an obligatory `-d, --dialect` option, specifying the dialect correspondence set to be used for phonetization
([N]orthern, [C]entral, or [S]outhern). The correspondence files may be found in the `Rules/` directory, and modified as necessary.

@@ -107,15 +123,15 @@ If no argument is supplied on the command line, vPhon will enter an interactive
The `--tokenize` flag is useful if you are processing an older source in which morphemes are separated by hyphens, and you wish to retain the hyphens in your output, or if you are processing the output of e.g. [vnTokenizer](http://mim.hus.vnu.edu.vn/phuonglh/softwares/vnTokenizer):

```
[user@terminal]$ python vPhon.py -d N -t test/tokenized.txt
căw24 oŋ͡m33_ta3 kuŋ͡m35g viən33 cɯə33 biət45
```

The `--delimit` flag will produce output in which each phonetic symbol is separated by a user-specified delimiter. If you use this flag, you must also specify a delimiter.

## Notes

All non-alphanumeric characters in the input are stripped prior to processing (unless the `--tokenize` option is selected, in which case `-` and `_` will be retained in the output).

Any input containing non-Vietnamese orthography, or series of characters not conforming to Vietnamese phonotactics, will be braced in the output, e.g.

@@ -130,10 +146,10 @@ Try running the examples in the `test/` directory to get a better idea of this b
If you use vPhon for a project or paper, please cite it as:

Kirby, James. 2008. vPhon: a Vietnamese phonetizer (version 0.2.6). Retrieved on <date> from http://github.com/kirbyj/vPhon/.

## Alternatives

[ADRPhone](http://www.mica.edu.vn/ADRPhone) is a lightweight, standalone phonetizer for Vietnamese written in standard C by Nguyễn Thị Minh Tuyền and Mathias Rossignol. It has many of the same functions as vPhon, but helpfully outputs XML as well.

## Thank You

5 changes: 5 additions & 0 deletions build/lib/vPhon/__init__.py
@@ -0,0 +1,5 @@
# -*- coding: utf-8 -*-
name = "vPhon"
from .rules.north import *
from .rules.central import *
from .rules.south import *
5 changes: 5 additions & 0 deletions build/lib/vPhon/rules/__init__.py
@@ -0,0 +1,5 @@
# -*- coding: utf-8 -*-
name = "rules"
from .north import *
from .central import *
from .south import *
File renamed without changes.
File renamed without changes.
File renamed without changes.