joeyliang1024/Social-Media-Mining

SMM

Teammates

王竣樺 梁致銓 曾繁斌

Patent Crawler

Build Env

  1. python -m venv venv
  2. venv\Scripts\activate (Windows) or source venv/bin/activate (Linux/macOS)
  3. pip install -r requirements.txt

Proxy Pool Usage

We currently use 10 proxies provided by a free Webshare account.
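Rotating through the pool can be sketched as a simple round-robin iterator. The proxy URLs below are placeholders, not actual Webshare endpoints or credentials:

```python
from itertools import cycle

# Placeholder proxy URLs -- substitute the 10 endpoints from the
# Webshare dashboard (host, port, and credentials here are hypothetical).
PROXIES = [f"http://user:pass@proxy{i}.example.com:8080" for i in range(10)]

_pool = cycle(PROXIES)

def next_proxy() -> dict:
    """Return the next proxy in round-robin order, in the dict
    format expected by the `proxies` argument of requests."""
    proxy = next(_pool)
    return {"http": proxy, "https": proxy}
```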

Warning

REMEMBER to delete the page-saving part of get_page(), or your disk will fill up with HTML files.
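One way to avoid deleting code is to gate page-saving behind a flag. This is a sketch, not the repository's actual get_page() implementation; the helper name and flag are hypothetical:

```python
import pathlib

# Set to True only while debugging; crawled pages pile up quickly otherwise.
SAVE_PAGES = False

def maybe_save_page(html: str, name: str, out_dir: str = "pages") -> None:
    """Persist a crawled page to disk only when debugging is enabled."""
    if not SAVE_PAGES:
        return
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    (out / f"{name}.html").write_text(html, encoding="utf-8")
```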

Run Crawler

cd Patent_Search_Crawler
# crawling
python main.py
# merging files
python merger.py

Data

You can find the complete raw data at this link: link

  • We use merge_data.sqlite for training.

Tip

merge_data.sqlite should be placed in the SMM folder.

Explanation of the preprocessed data (in /SMM/EDGPAT/data):

Tip

The datasets below are ready for training; no preprocessing is needed.

  • 2-1-level.csv: IPC Level mapping from 1 to 2.
  • 3-2-level.csv: IPC Level mapping from 2 to 3.
  • 4-3-level.csv: IPC Level mapping from 3 to 4.
  • 5-4-level.csv: IPC Level mapping from 4 to 5.
  • real-data.json: Lists each company's patents within the corresponding year.
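As a sketch of how the level-mapping files could be consumed (the parent-before-child column order is an assumption; check the actual CSV headers):

```python
import csv
import io

def load_level_map(fp) -> dict:
    """Build a child-level -> parent-level lookup from a two-column
    IPC mapping CSV (assumed column order: parent, child)."""
    return {child: parent for parent, child in csv.reader(fp)}

# In-memory stand-in for a file such as 2-1-level.csv
# (IPC section -> class); these rows are illustrative only.
sample = io.StringIO("A,A01\nA,A23\nB,B01\n")
mapping = load_level_map(sample)
```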

Note

We encode year 2018 as 0.
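The encoding is a simple offset from the base year:

```python
BASE_YEAR = 2018  # per the note above, year 2018 is encoded as 0

def encode_year(year: int) -> int:
    """Map a calendar year to its dataset index."""
    return year - BASE_YEAR

def decode_year(code: int) -> int:
    """Map a dataset index back to a calendar year."""
    return code + BASE_YEAR
```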

IPC Patent level example: patent example

Final Project: Exploring Patent Trends in Taiwan with Event-based Graph Techniques

Our Goal

Our project focuses on developing a patent prediction model specifically for forecasting Taiwan's future patent trends. Utilizing Event-based Graph techniques, this model analyzes historical patent data to identify emerging trends and patterns.

  • Data-Driven: Uses real-world patent data (Taiwan) to identify trends.
  • Dynamic: Adapts to changes in technology and innovation.
  • Predictive: Forecasts areas likely to see growth in patent filings.

Model Framework

model architecture

Framework of the proposed model. For simplicity, we show only the calculations for the patent classification codes and one of the related companies.

Code

We utilized the code from EDGPAT.

Warning

The Python environment should use Python 3.6!

Required packages:

Preprocessing

Just run build_input.ipynb.

Split Data

We split the data into three parts by year: training, validation, and testing.
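A chronological split of this kind can be sketched as follows; the cutoff years below are placeholders, since the actual boundaries are set in the preprocessing notebook:

```python
def split_by_year(records, train_end, valid_end):
    """Split records chronologically: years <= train_end go to training,
    years up to valid_end go to validation, and the rest to testing."""
    train = [r for r in records if r["year"] <= train_end]
    valid = [r for r in records if train_end < r["year"] <= valid_end]
    test = [r for r in records if r["year"] > valid_end]
    return train, valid, test

# Hypothetical cutoffs over the years 2018-2023.
data = [{"year": y} for y in range(2018, 2024)]
train, valid, test = split_by_year(data, 2020, 2021)
```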

Training

Run the code:

sh EDGPAT/train.sh

Note

This script outputs the training results to EDGPAT/out.txt

Our Results

Origin Paper Results

|        | Recall | NDCG   | PHR    |
| ------ | ------ | ------ | ------ |
| Top-10 | 0.1175 | 0.1725 | 0.5491 |
| Top-20 | 0.1646 | 0.1742 | 0.6304 |
| Top-30 | 0.1868 | 0.1769 | 0.6612 |
| Top-40 | 0.2006 | 0.1769 | 0.6800 |

Our Results

|        | Recall | NDCG   | PHR    |
| ------ | ------ | ------ | ------ |
| Top-10 | 0.1577 | 0.3395 | 0.8381 |
| Top-20 | 0.2349 | 0.3364 | 0.8995 |
| Top-30 | 0.2837 | 0.3401 | 0.9217 |
| Top-40 | 0.3150 | 0.3427 | 0.9321 |

Dropout 0.5

|        | Recall | NDCG   | PHR    |
| ------ | ------ | ------ | ------ |
| Top-10 | 0.1668 | 0.3552 | 0.8407 |
| Top-20 | 0.2363 | 0.3496 | 0.8851 |
| Top-30 | 0.2762 | 0.3487 | 0.9073 |
| Top-40 | 0.3051 | 0.3490 | 0.9164 |

Dropout 0

|        | Recall | NDCG   | PHR    |
| ------ | ------ | ------ | ------ |
| Top-10 | 0.1577 | 0.3395 | 0.8381 |
| Top-20 | 0.2349 | 0.3364 | 0.8995 |
| Top-30 | 0.2837 | 0.3401 | 0.9217 |
| Top-40 | 0.3150 | 0.3427 | 0.9321 |