NOTE: This project is a work in progress and will not currently run properly.

Japanese Data Extractor

Extracting data from Japanese texts is a complicated topic. In this project, we do not use machine learning; instead, we extract and process data with regular expressions.
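As a rough illustration of the regular-expression approach (not code from this project, which has no scripts yet), the sketch below pulls Western-calendar dates such as 2018年3月15日 out of running text. The pattern and the helper names normalise_digits and extract_dates are hypothetical, and only Python's standard re module is used.

```python
import re
from datetime import date

# Hypothetical pattern for a Western-calendar date such as 2018年3月15日.
# [0-9] is used instead of \d, which in Python 3 also matches other Unicode digits.
DATE_PATTERN = re.compile(r"([0-9]{4})年([0-9]{1,2})月([0-9]{1,2})日")

# Translation table mapping full-width digits (０-９) to ASCII digits.
FULLWIDTH_DIGITS = str.maketrans("０１２３４５６７８９", "0123456789")


def normalise_digits(text):
    """Convert full-width digits to ASCII so DATE_PATTERN can match them."""
    return text.translate(FULLWIDTH_DIGITS)


def extract_dates(text):
    """Return every date in text as a datetime.date.

    date() raises ValueError for impossible dates such as 2月30日.
    """
    matches = DATE_PATTERN.findall(normalise_digits(text))
    return [date(int(y), int(m), int(d)) for y, m, d in matches]


sample = "会議は2018年3月15日に開催され、次回は２０１８年４月２日です。"
print(extract_dates(sample))
# → [datetime.date(2018, 3, 15), datetime.date(2018, 4, 2)]
```

Normalising full-width digits before matching keeps the pattern itself simple; Japanese text frequently mixes both digit widths.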

Getting Started

I recommend setting up a project in PyCharm and pulling the sources from Git. Run 'pip install -r requirements.txt' to install all required packages. Then open Edit Configurations in PyCharm and add the scripts listed below. (PyCharm adds the project root to PYTHONPATH; otherwise, you will have to do this manually.)

The executable scripts are listed below. Note: this project is still at an early stage and has no executable scripts yet.

Prerequisites

This should run in any environment that supports Python 3.6.

Built With

Development tools

Key Libraries

  • regex - A regular-expression library that extends re, the standard regular-expression module that ships with Python.

See requirements.txt for all libraries used.
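One reason to prefer regex over re for Japanese text is its support for Unicode script properties such as \p{Han}, which the standard library rejects. The sketch below, built around a hypothetical kanji_runs helper, uses regex when it is installed and otherwise falls back to an approximate character-range pattern in re. It illustrates the difference between the two libraries and is not necessarily how this project uses them.

```python
import re

try:
    import regex  # third-party package: pip install regex
except ImportError:
    regex = None  # fall back to the standard library below


def kanji_runs(text):
    """Return consecutive runs of kanji (Han script characters) in text."""
    if regex is not None:
        # regex understands Unicode script properties such as \p{Han}.
        return regex.findall(r"\p{Han}+", text)
    # re has no \p{...} support, so approximate Han with the CJK Unified
    # Ideographs block (U+4E00-U+9FFF); this misses e.g. 々 and rarer kanji.
    return re.findall(r"[\u4e00-\u9fff]+", text)


# The standard re module rejects the \p escape outright.
try:
    re.compile(r"\p{Han}")
except re.error as exc:
    print("re does not support \\p{Han}:", exc)

print(kanji_runs("東京都に住んでいます"))
# → ['東京都', '住']
```

The script property is both shorter and more complete than a hand-written code-point range, which is the kind of extension the regex library provides over re.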

Authors

  • Krister S Jakobsson - Implementation and pretty much everything else

License

This project is licensed under the Boost Software License; see the license file for details.

Acknowledgments

  • Regular-Expressions.info - A great site explaining regular expressions in general, and the differences between platforms and libraries in particular.
  • regex101.com - A great online tool for experimenting with and learning about regular expressions.

Disclaimer: I am in no way associated with the above-mentioned websites and tools, and take no responsibility for how they use data you enter on their platforms. Use them at your own risk.