Technology forecasting using GNN

Research for a master's degree in data science

Autonomous technology forecasting with GNN

About this project

In this project, I used a link prediction algorithm based on graph neural networks (GNNs) to predict promising technologies in the self-driving vehicle field. I compared two GNN models, a graph convolutional network (GCN) and a variational graph auto-encoder (VGAE). VGAE performed better than GCN, so the link prediction task in this project is conducted with VGAE.

I will upload my paper ASAP.

Please check this if you want to know how to build the co-contribution network and how to extract promising technologies from it.

I will upload a presentation file about link prediction.

Experiment details and results

The framework of this project is as follows.

A network was built based on the 'co-contribution relationship' between repositories. This is like projecting a heterogeneous developer-repository network onto the repositories: two repositories are connected when they share a contributor.

Refer to the figure below for how to build the network.

[Figure: how the co-contribution network is built]
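
The construction in the figure above can be sketched in a few lines of plain Python. This is only an illustration, not the code in utils.py: the function name `build_co_contribution_edges`, the toy repositories and contributors, and the choice to weight each edge by the number of shared contributors are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_co_contribution_edges(repo_contributors):
    """Project a repository -> contributors mapping onto repositories.

    Returns {(repo_a, repo_b): shared_contributor_count} with repo_a < repo_b.
    """
    # Invert the mapping: contributor -> repositories they contributed to.
    contributor_repos = defaultdict(set)
    for repo, contributors in repo_contributors.items():
        for contributor in contributors:
            contributor_repos[contributor].add(repo)

    # Each pair of repositories that shares a contributor gets an edge.
    # The weight (an assumption here) counts the shared contributors.
    edge_weights = defaultdict(int)
    for repos in contributor_repos.values():
        for repo_a, repo_b in combinations(sorted(repos), 2):
            edge_weights[(repo_a, repo_b)] += 1
    return dict(edge_weights)


# Hypothetical toy data, not taken from the real dataset.
toy = {
    "org/planner": ["alice", "bob"],
    "org/perception": ["bob", "carol"],
    "org/simulator": ["carol", "alice", "bob"],
    "org/docs": ["dave"],
}
edges = build_co_contribution_edges(toy)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

A repository whose contributors appear nowhere else (here `org/docs`) gets no edge and ends up as an isolated node.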

Community detection (Louvain method) was used to find communities in the network, each of which represents an independent research area in the field of autonomous driving open source. In this study, six current major technical fields were derived.
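
The Louvain method searches for a partition that maximizes modularity. The sketch below does not implement Louvain itself; it only computes the modularity of a given partition for an unweighted, undirected graph, to show what the method optimizes. The function name `modularity` and the toy graph are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an unweighted, undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")

    intra_edges = defaultdict(int)   # L_c
    degree_sum = defaultdict(int)    # d_c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1

    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single edge (hypothetical data).
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles scores higher than putting every node in one community, which is the kind of partition Louvain would favor.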

The figure below shows the six major autonomous driving open source technologies at the present time. Each node represents a repository. The figure shows the main technologies at the moment and the main repositories for each technology.

[Figure: major tech]

The figure below shows the result of running community detection again after link prediction. From this, promising future technologies for autonomous driving open source can be derived.

[Figure: promising tech]
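
The link prediction step that precedes this second round of community detection can be illustrated as follows. The sketch assumes the inner-product decoder commonly paired with (V)GAE, where the probability of a link is the sigmoid of the dot product of two node embeddings. The embeddings are made-up numbers rather than model output, and the function names and the 0.9 threshold are hypothetical; the README does not state how predicted links were selected.

```python
import math


def link_probability(z_u, z_v):
    """Inner-product decoder: sigmoid of the dot product of two embeddings."""
    dot = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-dot))


def predict_new_links(embeddings, existing_edges, threshold=0.9):
    """Return unobserved node pairs scored at or above the threshold."""
    existing = {tuple(sorted(edge)) for edge in existing_edges}
    nodes = sorted(embeddings)
    predicted = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if (u, v) in existing:
                continue  # only score pairs that are not linked yet
            score = link_probability(embeddings[u], embeddings[v])
            if score >= threshold:
                predicted.append((u, v, score))
    # Highest-scoring candidate links first.
    return sorted(predicted, key=lambda item: item[2], reverse=True)


# Made-up 2-dimensional embeddings for four hypothetical repositories.
embeddings = {
    "repo_a": [2.0, 0.5],
    "repo_b": [1.5, 1.0],
    "repo_c": [-1.0, 2.0],
    "repo_d": [-1.5, 1.5],
}
existing_edges = [("repo_a", "repo_b")]
new_links = predict_new_links(embeddings, existing_edges, threshold=0.9)
print(new_links)
```

Adding the returned pairs to the original edge list and rerunning community detection gives a network like the one in the figure above.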





Dataset

Past studies on predicting promising technologies have mostly used paper data. However, paper data makes it difficult to discover the latest research trends because of the time it takes for research to be registered as a paper. I therefore use open source data to address this shortcoming and to predict promising technologies that reflect the latest research trends.

The data used in the project consist of 385 repositories that contain keywords related to 'autonomous driving'.

Each repository has basic information such as 'repository name', 'owner', and 'star counts' as well as data such as 'contributor list'.

Statistics

  • 23,017 repositories contain related keywords such as 'self-driving car' or 'autonomous driving'.

  • 3.2% of these repositories are owned by an 'organization' rather than a 'user'. This study deals only with the organization-owned repositories.

  • 385 repositories remained after filtering by 'contributor counts', 'stargazer counts' and 'forker counts'. These are the repositories used in the experiments.
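
The filtering steps above can be sketched as follows. The dictionary keys, the function name `filter_repositories`, and above all the threshold values are hypothetical; the README does not state the cut-offs that were actually used.

```python
def filter_repositories(repos, min_contributors, min_stars, min_forks):
    """Keep organization-owned repositories that pass all three thresholds."""
    kept = []
    for repo in repos:
        # Step 1: drop repositories owned by individual users.
        if repo["owner_type"].lower() != "organization":
            continue
        # Step 2: require a minimum level of activity and popularity.
        if (
            repo["contributor_counts"] >= min_contributors
            and repo["stargazer_counts"] >= min_stars
            and repo["forker_counts"] >= min_forks
        ):
            kept.append(repo)
    return kept


# Hypothetical records; the real data comes from the collected repositories.
toy_repos = [
    {"name": "org/planner", "owner_type": "Organization",
     "contributor_counts": 12, "stargazer_counts": 300, "forker_counts": 40},
    {"name": "user/demo", "owner_type": "User",
     "contributor_counts": 30, "stargazer_counts": 900, "forker_counts": 80},
    {"name": "org/tiny", "owner_type": "Organization",
     "contributor_counts": 1, "stargazer_counts": 2, "forker_counts": 0},
]

# Threshold values are placeholders chosen only for this example.
selected = filter_repositories(toy_repos, min_contributors=2,
                               min_stars=10, min_forks=1)
print([repo["name"] for repo in selected])
```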

Features

Data                    Data type
repository name         str
repository ID           int
owner ID                int
owner type              str
repository full name    str
topics                  list
contributors            list
contributor counts      int
stargazer counts        int
forker counts           int
created date            date
last updated date       date
readme                  str





Software Requirements

  • python >= 3.5
  • pytorch >= 1.9
  • pytorch geometric >= 2.0.2 : Some methods are not supported in lower versions, so be sure to install this version or higher. For example, the 'Train test edge split' method is not supported in earlier versions.
  • scikit-learn
  • numpy
  • pandas
  • scipy
  • gephi : A tool for network visualization. It is optional, but the network visualizations in this project were made with gephi. See here for more details.
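
A quick way to check an environment against the version requirements above is sketched below. It is not part of the repository. It assumes Python 3.8 or newer for `importlib.metadata`, uses the PyPI distribution names `torch` and `torch-geometric`, and takes the PyTorch Geometric minimum to be 2.0.2, which is an assumption about the intended version. The helper names are hypothetical.

```python
import re
from importlib import metadata  # available from Python 3.8

# Minimum versions; the torch-geometric value is an assumed reading.
MINIMUMS = {"torch": "1.9", "torch-geometric": "2.0.2"}


def version_tuple(version):
    """Turn '1.9.0+cu111' into (1, 9, 0); local build tags are ignored."""
    public = version.split("+")[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Report 'ok', 'too old' or 'not installed' for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


print(check_environment())
```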





Key files

  • link_prediction_GCN.py : Conducts link prediction using a graph convolutional network model. This model was not used for the final results because it did not perform as well as the other model.

  • link_prediction_GAE.py : The model used for the actual link prediction. It performed better than GCN.

  • utils.py : Includes code to build the network and visualize the results. Run this file if you want to check the degree or centrality of the network.
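
For readers who want to inspect the network without running utils.py, degree and degree centrality can be computed from an edge list as sketched below. This is not the code in utils.py. The function names are hypothetical, and degree centrality (degree divided by the number of other nodes) is only one common choice, since the description above does not say which centrality measure is meant.

```python
from collections import defaultdict


def degrees(edges):
    """Number of distinct neighbors of every node that appears in an edge."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


def degree_centrality(edges):
    """Degree divided by (n - 1), where n is the number of nodes seen."""
    deg = degrees(edges)
    n = len(deg)
    if n <= 1:
        return {node: 0.0 for node in deg}
    return {node: d / (n - 1) for node, d in deg.items()}


# Toy star graph: one hub repository linked to three others (hypothetical).
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
deg = degrees(toy_edges)
centrality = degree_centrality(toy_edges)
print(deg)
print(centrality)
```

Isolated repositories never appear in an edge list, so they are not counted here; they would need to be added separately with degree 0.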