Skip to content

A JavaScript module for web applications that need to work with Chinese text.

License

Notifications You must be signed in to change notification settings

alexamies/chinesedict-js

Repository files navigation

Chinese Dictionary JavaScript Module

Status: early prototype, interface will change

An ECMAScript 2015 (ES6) browser module for showing Chinese-English dictionary terms in web pages. The JavaScript code will load one or more dictionaries from JSON files. Basical lookup is provided as well as parsing Chinese text and highlighting the words contained in the dictionary. When a user mouses over the dictionary terms then a tooltip with the English equivalent will be displayed. When a user clicks on a term then the other details of the dictionary term will be shown.

The JavaScript module does not require a web framework, like Material or React, but it should be compatible with those. It is designed and built using plain JavaScript and to be used with modern browsers.

Indexing by traditional and simplified Chinese text is supported but no lookup by Pinyin or English is possible.

This module is used on the web sites Chinese Notes, NTI Reader, and the Humanistic Buddhism Reader.

Prerequisites

Install Node.js, version 11.

Quickstart

The file index.html is ready to be served as a demo web page. The easiest way to run this yourself is to clone it from GitHub:

git clone https://github.com/alexamies/chinesedict-js.git
cd chinesedict-js

It needs to be served on a web server (not just opened in a browser from the local file system). For example, using Express:

npm install
npm run start

Open the index.html file in a web browser at http://localhost:8080/index.html Click on one of the highlighted words. If everything is ok you should see a dialog like this (on Chrome):

Depending on the contents of the contents of the dictionary, multi-character terms may show the details for individual characters. For example, in the screenshot below.

Using the module in your own projects

Follow the instructions here to use the chinesedict-js in your own module. Refer to the demo examples if you get stuck.

Setup

Suppose that your own project is in directory public_html:

mkdir public_html
cd public_html
npm init -y

Installation

Get the chinesedict-js JavaScript module with the command:

npm install @alexamies/chinesedict-js

The files from the GitHub project will be located in the directory

node_modules/@alexamies/chinesedict-js

Basic use

With the basic use of the chinesedict-js module, a DictionaryView is created that takes care of TML DOM manipulation required to display the dictionary elements. You can import the module into your web pages with a JavaScript import statement:

import { DictionarySource,
         PlainJSBuilder } from '@alexamies/chinesedict-js';

Make sure that you reference your own JavaScript code in your HTML page using a script element with type="module"

<script src="demo_app.js" type="module"></script>

In demo_app.js, add JavaScript code to import the ES6 module:

const source = new DictionarySource('assets/words.json',
                                    'Demo Dictionary',
                                    'Just for a demo');
const builder = new PlainJSBuilder([source],
	                               '.textbody',
	                               'dict-dialog',
	                               'all');
const dictView = builder.buildDictionary(); 

Matching terms will be highlighted. The dictionary will be loaded asynchronously.

After the dictionary is loaded it can respond to user clicks on a word with a dialog. Words can also looked up directly:

const term = dictView.lookup('力'); // Example term
const entry = term.getEntries()[0]; // Get the entry
console.log(`English: ${ entry.getEnglish() }`);
console.log(`Pinyin: ${ entry.getPinyin() }`);

The PlainJSBuilder is a DictionaryBuilder implementation that creates and initializes the dictionary view for browser apps that do not depend on a web application framework. The parameters to the constructor of PlainJSBuilder are

  1. source - An array of sources with the filenames of the dictionary JSON files
  2. selector - A DOM selector for the Chinese text to be segmented
  3. dialog_id - A DOM id used to find the dialog
  4. highlight - Highlight either all the terms ('all' by default) or only proper nouns ('proper')

where 'div_id' is select for the HTML elements containing the Chinese text.

Also, in your SCSS or CSS file import the stylesheet:

@import '@alexamies/chinesedict-js/chinesedict';

The dialog-polyfill can be used for cross-browser compatibility. The dialog-polyfill files needs to be copied manually at the moment. On the command line:

npm install dialog-polyfill
cp node_modules/dialog-polyfill/dialog-polyfill.js dist/.
cp node_modules/dialog-polyfill/dialog-polyfill.css dist/.

The dialog-polyfill is not used as a module. Load it into your HTML page:

<script src="/assets/dialog-polyfill.js"></script>

Also, import the stylesheet into your CSS file with a CSS import statement:

@import '/assets/dialog-polyfill.css';

Customize the Dictionary

A very small demo dict is provided with some examples.

You can customize the module with your own dictionaries, HTML content and styles. The dictionary should be structured the same as the example words.json files provided. If you have not got your own dictionary then you can use the NTI Buddhist Text Reader Project, or Chinese Notes Chinese-English dictionary, which may be reused under the Creative Commons Attribution-Share Alike 3.0 License-CCASE 3.0.

You can use either the build/word2json.py or build/gen_dictionary.js file Nodejs command line utility to generate the dictionary file. Both utilities assumes the tab separated variable format of the words.txt file in the Chinese Notes project.

Basic usage for the Python command line utility is

python3 words2json.py "file1.tsv,file2.tsv" words.json

The words2json.py script can accept a number of source files as the first argument. They should be enclosed in double quotes and separated by commas.

Basic usage for the Node.js command line utility is

node build/gen_dictionary.js

To build with the Chinese Notes dictionary:

CHINESE_DICT_JS=$PWD
cd ..
git clone https://github.com/alexamies/chinesenotes.com.git
CNREADER_HOME=$PWD/chinesenotes.com
cd $CHINESE_DICT_JS

To generate the dictionary use the command

npm install
npm run generate_dict $CNREADER_HOME/data/words.txt

To restrict the entries to a specific topic use the --topic argument. For example,

node build/gen_dictionary.js --topic "Literary Chinese" build/words.tsv

The dictionary file is stored in JSON format.

For the CC-CEDICT dictionary

curl -O https://www.mdbg.net/chinese/export/cedict/cedict_1_0_ts_utf-8_mdbg.zip
unzip cedict_1_0_ts_utf-8_mdbg.zip
rm cedict_1_0_ts_utf-8_mdbg.zip
npm run cc-cedict

It is possible to add multiple dictionaries, as shown in the screenshot below.

Example code for mulitple dictionaries is give in demo2/test3.html with JavaScript file demo2/demo_app.js.

Integration

The module JavaScript is generated from TypeScript, which can help provide direct integration for TypeScript apps.

Building with TypeScript

The JavaScript module is based on a TypeScript module. Both use the same ECMAScript 2015 (ES6) module concepts.

Compile the TypeScript module and demo app

npm run compile

Build the JavaScript into an optimized bundle with module dependencies

npm run build

Run the demo app

npm run start

This will generate the demo_app.js file used in the basic example.

You may need to copy the type declaration file index.d.ts from the GitHub project. For example,

mkdir your_project
cd your_project
npm install @alexamies/chinesedict-js
cd ..
git clone https://github.com/alexamies/chinesedict-js.git
cp chinesedict-js/index.d.ts node_modules/@alexamies/chinesedict-js/.

Using with dictionary with your own HTML presentation

The Material Design Web example shows how to use the module without DOM dependencies. The load the dictionary data only first import DictionaryLoader and DictionarySource.

import { DictionaryLoader,
         DictionarySource,
         TextParser } from '@alexamies/chinesedict-js';

Load the dictionary with code like

const source = new DictionarySource(
               'ntireader.json',
               'NTI Reader Dictionary',
               'Nan Tien Institute Reader dictionary');
const dictionaries = new DictionaryCollection();
const loader = new DictionaryLoader([source], dictionaries);
const observable = loader.loadDictionaries();
observable.subscribe(
  () => { 
    thisApp.dictionaries = loader.getDictionaryCollection();
    const loadingStatus = thisApp.querySelectorNonNull("#loadingStatus")
    loadingStatus.innerHTML = "Dictionary loading status: loaded";
  },
  (err) => { console.error(`load error:  + ${ err }`); },
);

The loadDictionaries() function returns an RxJS Observable. When that completes the dictionary will be loaded and you can get the headwords with type Map<string, Term>.

You can parse a text string into individual terms with TextParser:

const parser = new TextParser(dictionaries);
const terms = parser.segmentText(text);

To run the demo, first copy the material demo app to an external directory

cd
cp -r chinesedict-js/material .

Copy a demo dictionary

cp chinesedict-js/assets/ntireader.json .

Change to the material directory

cd material

Install the dependencies with the comm

npm install

Compile the TypeScript

npm run compile

Build the CSS and JavaScript bundles with the command

npm run build

Start the webpack dev server with the command

npm start

You should see something like the screenshot below.

If you get stuck read the instructions at Using MDC Web with Sass and ES2015.

Cross browser support

Cross browser support is provided for the HTML dialog using dialog-polyfill since the dialog element is not yet supported natively by Edge or Safari.

Modern browsers ES6 style JavaScript including modules. If you want to support older browsers you will need to do that with a different compilation target for the tsc TypeScript compiler above. However, this will result in less readable and slower code.

Mobile Device Support

The module can be used on web pages designed for mobile devices although it has not yet been designed for an optimal experience.

Performance

Bundling and minification with WebPack or Babel may help but their current ES6 module support lags behind browsers. Use of common JS modules does not perform adequately, except for very small dictionaries and text sizes.

App Engine

An example of deploying to App Engine is given in demo. See this at chinesedictdemo.appspot.com.

Other Frameworks

Develop an implementation of the TypeScript DictionaryBuilder interface or work with the JavaScript directly to create an initialize the dictionary for your framework.

About

A JavaScript module for web applications that need to work with Chinese text.

Resources

License

Stars

Watchers

Forks

Packages

No packages published