KOBE

Project | arXiv

Towards KnOwledge-Based pErsonalized Product Description Generation in E-commerce.
Qibin Chen^*, Junyang Lin^*, Yichang Zhang, Hongxia Yang, Jingren Zhou, Jie Tang.
^*Equal contribution.
In KDD 2019 (Applied Data Science Track)

Prerequisites

Linux or macOS
Python 3
PyTorch >= 1.0.1
NVIDIA GPU + CUDA cuDNN

Getting Started

Installation

Clone this repo.

git clone https://github.com/Lumonk/KOBE
cd KOBE

Install the dependencies:

pip install -r requirements.txt

Dataset

We use the TaoDescribe dataset, which contains 2,129,187 product titles and descriptions in Chinese.
(optional) You can download the un-preprocessed dataset from here or here (for users in China).

Training

Download preprocessed data

First, download the preprocessed TaoDescribe dataset by running python scripts/download_preprocessed_tao.py.
- If you're in a region where Dropbox is blocked (e.g. Mainland China), try python scripts/download_preprocessed_tao.py --cn.
(optional) You can peek into data/aspect-user/preprocessed/test.src.str and data/aspect-user/preprocessed/test.tgt.str, which contain the product titles and descriptions in the test set, respectively. In src files, <x> <y> means the product is intended to be shown with aspect <x> and user category <y>. Note: this differs slightly from the <A-1>, <U-1> format described in the paper, but the meaning is the same. You can also peek into data/aspect-user/preprocessed/test.supporting_facts_str to see the knowledge we extracted from DBpedia for the corresponding product.
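To make the src format concrete, here is a minimal sketch of how one line could be split into its aspect tag, user-category tag, and title tokens. It assumes the two tags are the first two whitespace-separated tokens on the line, which the text above does not state explicitly; the function name split_src_line and the sample line are hypothetical and not taken from the repository.

```python
import re
from typing import List, Tuple

# A tag is a single token wrapped in angle brackets, e.g. "<3>" or "<b>".
TAG = re.compile(r"^<[^<>\s]+>$")


def split_src_line(line: str) -> Tuple[str, str, List[str]]:
    """Split a src line into (aspect_tag, user_tag, title_tokens).

    Assumption: the aspect tag and the user-category tag are the first
    two whitespace-separated tokens of the line.
    """
    tokens = line.split()
    if len(tokens) < 2 or not (TAG.match(tokens[0]) and TAG.match(tokens[1])):
        raise ValueError("expected the line to start with two <...> tags")
    return tokens[0], tokens[1], tokens[2:]


if __name__ == "__main__":
    # Hypothetical sample line; real tag values may look different.
    aspect, user, title = split_src_line("<3> <b> 夏季 连衣裙")
    print(aspect, user, title)
```

If the real files place the tags elsewhere, only the indexing in split_src_line would need to change.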

Start training

Configurations for the models in the paper are stored under the configs/ directory. Launch an experiment with --config, the path to the desired model config, and --expname, the name or number of the experiment, which is used in logging.
We include three config files: the baseline, KOBE without external knowledge, and the full KOBE model.
Baseline

python core/train.py --config configs/baseline.yaml --expname baseline

KOBE without adding knowledge

python core/train.py --config configs/aspect_user.yaml --expname aspect-user

KOBE

python core/train.py --config configs/aspect_user_knowledge.yaml --expname aspect-user-knowledge

The default batch size is 64. If you run out of memory (OOM), reduce it with the --batch-size flag.
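If you want to script the three runs above, the following sketch builds the same command lines from Python. It only uses the flags mentioned in this README (--config, --expname, --batch-size); the helper build_train_command and the EXPERIMENTS table are hypothetical names introduced for illustration and are not part of the repository.

```python
from typing import List, Optional

# (config path, experiment name) pairs copied from the commands above.
EXPERIMENTS = [
    ("configs/baseline.yaml", "baseline"),
    ("configs/aspect_user.yaml", "aspect-user"),
    ("configs/aspect_user_knowledge.yaml", "aspect-user-knowledge"),
]


def build_train_command(
    config: str, expname: str, batch_size: Optional[int] = None
) -> List[str]:
    """Return the argument list for one training run."""
    cmd = ["python", "core/train.py", "--config", config, "--expname", expname]
    if batch_size is not None:
        # Only pass the flag when overriding the default batch size.
        cmd += ["--batch-size", str(batch_size)]
    return cmd


if __name__ == "__main__":
    for config, expname in EXPERIMENTS:
        # Print the commands; batch size 32 is an arbitrary example value.
        print(" ".join(build_train_command(config, expname, batch_size=32)))
```

Each returned list can be passed to subprocess.run from the repository root if you prefer to launch the runs directly rather than print them.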

Track training progress

You can monitor training with TensorBoard. With the current settings, training takes roughly 12 hours to finish. To obtain results comparable to those in the paper, you need to train for longer (by editing epoch in the config files). However, the current setting is enough to demonstrate the effectiveness of our model.

tensorboard --logdir experiments --port 6006

Generation

During training, the generated descriptions on the test set are saved at experiments/<expname>/candidate.txt and the ground truth is at reference.txt. These descriptions are produced by greedy search to save time during training, and repeated terms are not blocked.
To do beam search with beam width = 10, run the following command.

python core/train.py --config configs/baseline.yaml --mode eval --restore experiments/finals-baseline/checkpoint.pt --expname eval-baseline --beam-size 10
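To illustrate why beam search can give different output than the greedy search used during training, here is a generic, self-contained sketch on a toy next-token distribution. It is not the decoder in core/; the function beam_search, the toy_model table, and the "</s>" end token are all assumptions made for the example, and it omits length normalization and repetition blocking.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]


def beam_search(
    next_log_probs: Callable[[Sequence], Dict[str, float]],
    beam_size: int,
    max_len: int,
    eos: str = "</s>",
) -> List[Tuple[Sequence, float]]:
    """Return hypotheses sorted by total log-probability, best first."""
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]
    finished: List[Tuple[Sequence, float]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (prefix + (token,), score + logp)
            for prefix, score in beams
            for token, logp in next_log_probs(prefix).items()
        ]
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        # Keep only the top beam_size expansions.
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_len
    finished.sort(key=lambda item: item[1], reverse=True)
    return finished


def toy_model(prefix: Sequence) -> Dict[str, float]:
    """Hypothetical distribution where the greedy first step is a trap."""
    if not prefix:
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if prefix[-1] == "a":
        return {"x": math.log(0.5), "y": math.log(0.5)}
    if prefix[-1] == "b":
        return {"z": math.log(0.9), "x": math.log(0.1)}
    return {"</s>": 0.0}


if __name__ == "__main__":
    print(beam_search(toy_model, beam_size=1, max_len=5)[0])  # greedy
    print(beam_search(toy_model, beam_size=2, max_len=5)[0])  # beam width 2
```

With beam size 1 the search commits to "a" and ends with probability 0.3, while beam size 2 keeps "b" alive and finds the sequence with probability 0.36.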

Evaluation

BLEU
DIVERSITY
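As an illustration of the diversity metric, one common way to quantify it is to count the distinct n-grams in the generated text. The sketch below follows that definition on already tokenized sentences; it is an assumption about how diversity could be measured, not the evaluation script of this repository, and the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def distinct_ngrams(sentences: Iterable[List[str]], n: int) -> Set[Tuple[str, ...]]:
    """Collect the set of distinct n-grams over all tokenized sentences."""
    seen: Set[Tuple[str, ...]] = set()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            seen.add(tuple(tokens[i:i + n]))
    return seen


def distinct_n_ratio(sentences: Iterable[List[str]], n: int) -> float:
    """Distinct n-grams divided by the total number of n-grams."""
    sentences = list(sentences)
    total = sum(max(len(tokens) - n + 1, 0) for tokens in sentences)
    if total == 0:
        return 0.0
    return len(distinct_ngrams(sentences, n)) / total


if __name__ == "__main__":
    # Toy tokenized outputs; real input would come from candidate.txt.
    generated = [["a", "b", "a"], ["a", "b", "c"]]
    print(len(distinct_ngrams(generated, 1)), distinct_n_ratio(generated, 1))
    print(len(distinct_ngrams(generated, 2)), distinct_n_ratio(generated, 2))
```

The raw count rewards longer outputs, so the ratio is often reported alongside it when comparing systems that generate different amounts of text.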

Experiment results

baseline
aspect-user:
aspect-User_2: aspect, no user, encoder layers = 4
aspect-user-know:

TODO

If you have any difficulty getting the steps above to work, feel free to open an issue. You can expect a reply within 24 hours.

Cite

Please cite our paper if you use this code in your own work:

@article{chen2019towards,
  title={Towards Knowledge-Based Personalized Product Description Generation in E-commerce},
  author={Chen, Qibin and Lin, Junyang and Zhang, Yichang and Yang, Hongxia and Zhou, Jingren and Tang, Jie},
  journal={arXiv preprint arXiv:1903.12457},
  year={2019}
}
