NOTE: This project is a work in progress and will not currently run properly.

Japanese Data Extractor

Extracting data from Japanese texts is a complicated topic. In this project, we do not try any fancy machine learning, but rather try to extract and process data using regular expressions.

Getting Started

I recommend setting up a project in PyCharm and pulling the sources from git. Run 'pip install requirements.txt' to install all required packages. Go to Edit Configurations in PyCharm and specify the below scripts. (PyCharm will add the project root to PYTHONPATH, otherwise you will have to do this manually.)

The executable scripts are per below: OBS: This project is still in early stages and have no executable scripts yet

Prerequisites

This should run fine on any environment that supports Python 3.6.

Built With

Development tools

Python 3.6 - Language runtime
PyCharm - IDE by JetBrains

Key Libraries

regex - Regex library that extends the standard re-library that is the default library that comes with Python.

See requirements.py for all libraries used.

Authors

Krister S Jakobsson - Implementation and pretty much everything else

License

This project is licensed under the Boost License - see the license file for details

Acknowledgments

Regular-Expression.info - Great page explaining regex in general and differences between platforms and libraries in particular. Link
regex101.com - Great online tool for playing around with and learning about regex. Link

Disclaimer: I am in no way associated with above mentioned homepages and tools, and take no responsibility for how they use data you input on their platforms. Use them at your own risk.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Japanese Data Extractor

Getting Started

Prerequisites

Built With

Authors

License

Acknowledgments

Files

README.md

Latest commit

History

README.md

File metadata and controls

Japanese Data Extractor

Getting Started

Prerequisites

Built With

Authors

License

Acknowledgments