update
LJoson committed May 9, 2021
0 parents commit c669515
Showing 471 changed files with 52,772 additions and 0 deletions.
13 changes: 13 additions & 0 deletions readme.md
@@ -0,0 +1,13 @@
## Learning

https://zhuanlan.zhihu.com/p/362193124





2021/5/3 15:16:47
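If you run into problems during the competition, please refer to the FAQ document. It covers questions about data, vouchers, TI-ONE platform operations, and more.
[Tencent Docs] 2021 Tencent Advertising Algorithm Competition FAQ
https://docs.qq.com/doc/DV1hFUGpMV1l3eVdV
If you have other questions, feel free to raise them in the group; we will keep updating and expanding the FAQ. Good luck with the competition 🥰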
80 changes: 80 additions & 0 deletions structuring/VideoStructuring/MultiModal-Tagging/ReadMe.md
@@ -0,0 +1,80 @@
# 0. Introduction
A multimodal video tagging model framework

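# 1. Code structure
- configs--------------------# model selection and parameter files
- src------------------------# data loading and model code
- scripts--------------------# data preprocessing / training / testing scripts
- checkpoints----------------# model weights and logs
- pretrained-----------------# pretrained models
- dataset--------------------# datasets and tag dictionaries
- utils----------------------# utility scripts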
- ReadMe.md

# 2. Environment setup
sudo apt-get update
sudo apt-get install libsndfile1-dev ffmpeg imagemagick nfs-kernel-server
pip install -r requirement.txt

## 2.1 Configure ImageMagick
Delete or comment out the following line in the config file /etc/ImageMagick-6/policy.xml:
`<policy domain="path" rights="none" pattern="@*" />`
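
The edit can also be scripted. The following is a minimal sketch, not part of this repository: `comment_out_policy` is a hypothetical helper that takes the text of policy.xml and wraps the line above in an XML comment. It assumes the line sits on its own line in the file; writing the result back to /etc normally requires root privileges.

```python
# The policy line that the README says to delete or comment out.
POLICY_LINE = '<policy domain="path" rights="none" pattern="@*" />'


def comment_out_policy(xml_text: str) -> str:
    """Wrap every active occurrence of POLICY_LINE in an XML comment.

    Hypothetical helper: lines that are already commented out are left
    untouched, so the function is safe to run more than once.
    """
    out = []
    for line in xml_text.splitlines(keepends=True):
        if line.strip() == POLICY_LINE:
            indent = line[: len(line) - len(line.lstrip())]
            newline = "\n" if line.endswith("\n") else ""
            line = f"{indent}<!-- {POLICY_LINE} -->{newline}"
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = "<policymap>\n  " + POLICY_LINE + "\n</policymap>\n"
    print(comment_out_policy(sample))
```

Commenting the line out rather than deleting it keeps the original policy easy to restore.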

# 3. Training workflow
## 3.1 Data preprocessing
Prepare training data for your specific task. For video/audio feature extraction, see scripts/preprocess.sh
bash scripts/preprocess.sh tagging.txt 0 0 4 1

## 3.2 Load pretrained model parameters
See the instructions in the pretrained directory

## 3.3 Start training
python scripts/train_tagging.py --config configs/config.ad_content.yaml

## 3.4 TensorBoard curves for training and validation
tensorboard --logdir checkpoints --port 8080

# 4. Validation workflow
python scripts/eval.py --config configs/config.ad_content.yaml --ckpt_step -1

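1. Outputs evaluation metrics separately for the fused multimodal features, visual features, audio features, and text features:
* Hit@1 (accuracy of the model's highest-scoring predicted tag)
* PERR (precision of the top k predicted tags ranked by score, where k is the number of tags in the sample's ground truth)
* MAP (mean average precision)
* [GAP (Global Average Precision)](https://www.kaggle.com/c/youtube8m/overview/evaluation)
2. Outputs the frequency and the AP (average precision) of each tag, **used to analyze per-tag accuracy**
3. Outputs a correlation matrix M between tags, where $M_{a,B}$ is the frequency distribution of the predictions B (b1, b2, ...) for samples whose tag is a. The top_k results are kept, **used to merge similar tags and update the tag dictionary file**
4. Saves the output file eval_tag_analysis.txt, where each line contains `tag_freq,tag_ap,tag_conf,tag_precision,tag_recall` in that order. Use scripts/tag_analysis.ipynb to analyze the validation results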
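
To make the definitions above concrete, here is a minimal pure-Python sketch of Hit@1, PERR, and the tag correlation statistic. It is not the code in scripts/eval.py. The function names, the `preds_per_sample` parameter, and the choice to count each sample's highest-scoring tags as its "predictions" are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def rank_tags(scores):
    """Return tag ids sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hit_at_1(all_scores, all_gt):
    """Fraction of samples whose highest-scoring tag is in the ground truth."""
    hits = sum(rank_tags(s)[0] in g for s, g in zip(all_scores, all_gt))
    return hits / len(all_scores)


def perr(all_scores, all_gt):
    """Mean precision of the top-k tags, with k = number of ground-truth tags."""
    total, counted = 0.0, 0
    for s, g in zip(all_scores, all_gt):
        k = len(g)
        if k == 0:
            continue  # samples without labels are skipped
        total += sum(t in g for t in rank_tags(s)[:k]) / k
        counted += 1
    return total / counted if counted else 0.0


def tag_confusion(all_scores, all_gt, preds_per_sample=2, top_k=2):
    """For each ground-truth tag a, list the top_k most frequently predicted tags.

    Assumption: a sample's predictions are its preds_per_sample best-scoring tags.
    """
    counts = defaultdict(Counter)
    for s, g in zip(all_scores, all_gt):
        predicted = rank_tags(s)[:preds_per_sample]
        for a in g:
            counts[a].update(predicted)
    return {a: c.most_common(top_k) for a, c in counts.items()}


if __name__ == "__main__":
    # Toy data: 3 samples, 3 tags; ground truth is a set of tag ids per sample.
    scores = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.7], [0.3, 0.6, 0.5]]
    gt = [{0, 2}, {2}, {1, 2}]
    print(hit_at_1(scores, gt), perr(scores, gt))
    print(tag_confusion(scores, gt))
```

A tag a whose list is dominated by some other tag b is a candidate for merging with b in the tag dictionary.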

# 5. Testing workflow
python scripts/inference.py --model_pb checkpoints/ad_content_form/v1/export/step_7000_0.8217 \
--tag_id_file dataset/dict/tag-id-ad_content_b0.txt \
--test_dir dataset/looklike_interview \
--postfix mp4 \
--output ./pred_output.txt \
--top_k 5
> Parameter description
```
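--model_pb directory of the exported model pb
--tag_id_file tag dictionary file
--test_dir directory of input test files
--output file where the predicted tags are saved
--top_k number of tags to output per prediction
--postfix test file format: mp4 or jpg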
```
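
The role of `--tag_id_file` and `--top_k` can be illustrated with a short sketch. It is not taken from scripts/inference.py: the helper names are hypothetical, and the dictionary format (one `tag_name<TAB>id` pair per line) is an assumption, so check the actual file before relying on it.

```python
def load_tag_dict(lines):
    """Parse 'tag_name<TAB>id' lines into an {id: tag_name} mapping.

    The line format is an assumption made for this sketch.
    """
    id_to_tag = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, idx = line.rsplit("\t", 1)
        id_to_tag[int(idx)] = name
    return id_to_tag


def top_k_tags(scores, id_to_tag, top_k=5):
    """Return the top_k (tag_name, score) pairs, highest score first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(id_to_tag[i], scores[i]) for i in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical tag names; a real run would read the --tag_id_file instead.
    tag_dict = load_tag_dict(["indoor\t0", "outdoor\t1", "talking\t2"])
    print(top_k_tags([0.1, 0.7, 0.9], tag_dict, top_k=2))
```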

# 6. Badcase analysis
## 6.1 Prediction visualization
python scripts/write_prediction.py --inference_file checkpoints/ad_content_form/v1/inference_result_fusion.txt --sample_num 200 --save_dir temp --tag_id_file dataset/dict/tag-id-ad_content_b0.txt --test_dir dataset/videos/ad_content --gt_file dataset/info_files/ad_content_datafile_b0.txt --postfix mp4

> Parameter description
```
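--inference_file file where the predicted tags were saved
--sample_num number of videos to randomly sample for visualization
--gt_file ground-truth file for the samples (optional)
--save_dir directory where the visualization files are saved
--filter_tag_name only visualize samples that have this tag (optional)
--tag_id_file tag dictionary file (optional; required when filter_tag_name is set)
--test_dir directory of test files
--postfix test file format: mp4 or jpg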
```
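
How `--sample_num` and `--filter_tag_name` interact can be sketched as follows. This is an illustration under assumptions, not the logic of scripts/write_prediction.py: `select_samples` is a hypothetical helper, and predictions are assumed to be already parsed into `(file_name, tags)` pairs.

```python
import random


def select_samples(predictions, sample_num, filter_tag_name=None, seed=None):
    """Pick up to sample_num predictions at random, optionally filtered by tag.

    predictions: list of (file_name, [tag_name, ...]) pairs (assumed layout).
    """
    if filter_tag_name:
        predictions = [p for p in predictions if filter_tag_name in p[1]]
    rng = random.Random(seed)  # a seed makes the selection reproducible
    count = min(sample_num, len(predictions))
    return rng.sample(predictions, count)


if __name__ == "__main__":
    # Hypothetical file and tag names.
    preds = [("a.mp4", ["x", "y"]), ("b.mp4", ["y"]), ("c.mp4", ["z"])]
    print(select_samples(preds, sample_num=200, filter_tag_name="y", seed=0))
```

Filtering before sampling means `sample_num` caps the number of visualized videos that carry the chosen tag, rather than the number drawn before filtering.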
@@ -0,0 +1,154 @@
#############################################################
# 1. Model Define Configs
#############################################################
ModelConfig:
  model_type: 'NextVladBERT'
  use_modal_drop: True # drop one modality of the multimodal features during training
  with_embedding_bn: False # apply batch normalization (BN) to the input features of each modality
  modal_drop_rate: 0.3
  with_video_head: True # video features
  with_audio_head: True # audio features
  with_text_head: True # text features
  with_image_head: True # False # image features

  # video features (16384)
  video_head_type: 'NeXtVLAD'
  video_head_params:
    nextvlad_cluster_size: 128
    groups: 16
    expansion: 2
    feature_size: 1024 # inception feature dim
    max_frames: 300

  # audio features (1024)
  audio_head_type: 'NeXtVLAD'
  audio_head_params:
    nextvlad_cluster_size: 64
    groups: 16
    expansion: 2
    feature_size: 128 # VGGish feature dim
    max_frames: 300

  # text features (1024)
  text_head_type: 'BERT'
  text_head_params:
    bert_config:
      attention_probs_dropout_prob: 0.1
      hidden_act: "gelu"
      hidden_dropout_prob: 0.1
      hidden_size: 768
      initializer_range: 0.02
      intermediate_size: 3072
      max_position_embeddings: 512
      num_attention_heads: 12
      num_hidden_layers: 12
      type_vocab_size: 2
      vocab_size: 21128
    bert_emb_encode_size: 1024

  # image features (2048)
  image_head_type: 'resnet_v2_50'
  image_head_params: {}


  # multimodal feature fusion method
  fusion_head_type: 'SE'
  fusion_head_params:
    hidden1_size: 1024
    gating_reduction: 8 # reduction factor in se context gating
    drop_rate:
      video: 0.8
      audio: 0.5
      image: 0.5
      text: 0.5
      fusion: 0.8

  # tagging classifier parameters
  tagging_classifier_type: 'LogisticModel'
  tagging_classifier_params:
    num_classes: 82 # number of tags; change as needed

#############################################################
# 2. Optimizer & Train Configs
#############################################################
OptimizerConfig:
  optimizer: 'AdamOptimizer'
  optimizer_init_params: {}
  clip_gradient_norm: 1.0
  learning_rate_dict:
    video: 0.0001
    audio: 0.0001
    text: 0.00001
    image: 0.0001
    classifier: 0.01
  loss_type_dict:
    tagging: "CrossEntropyLoss"
  max_step_num: 10000
  export_model_steps: 1000
  learning_rate_decay: 0.1
  start_new_model: True # if True, train from scratch; if False, resume
  num_gpu: 1
  log_device_placement: False
  gpu_allow_growth: True
  pretrained_model:
    text_pretrained_model: 'pretrained/bert/chinese_L-12_H-768_A-12/bert_model.ckpt'
    image_pretrained_model: 'pretrained/resnet_v2_50/resnet_v2_50.ckpt'
  train_dir: './checkpoints/tagging5k_temp' # directory where trained models are saved; change as needed

#############################################################
# 3. DataSet Config
#############################################################
DatasetConfig:
  batch_size: 32
  shuffle: True
  train_data_source_list:
    train799:
      file: '../dataset/tagging/GroundTruth/datafile/train.txt' # file generated by the preprocessing script; change as needed (datafile)
      batch_size: 32

  valid_data_source_list:
    val799:
      file: '../dataset/tagging/GroundTruth/datafile/val.txt' # file generated by the preprocessing script; change as needed
      batch_size: 32

  preprocess_root: 'src/dataloader/preprocess/'
  preprocess_config:
    feature:
      - name: 'video,video_frames_num,idx'
        shape: [[300,1024], [],[]]
        dtype: 'float32,int32,string'
        class: 'frames_npy_preprocess.Preprocess'
        extra_args:
          max_frames: 300
          feat_dim: 1024
          return_frames_num: True
          return_idx: True

      - name: 'audio,audio_frames_num'
        shape: [[300,128], []]
        dtype: 'float32,int32'
        class: 'frames_npy_preprocess.Preprocess'
        extra_args:
          max_frames: 300
          feat_dim: 128
          return_frames_num: True

      - name: 'image'
        shape: [[224,224,3]]
        dtype: 'float32'
        class: 'image_preprocess.Preprocess'

      - name: 'text'
        shape: [[128]]
        dtype: 'int64'
        class: 'text_preprocess.Preprocess'
        extra_args:
          vocab: 'pretrained/bert/chinese_L-12_H-768_A-12/vocab.txt'
          max_len: 128
    label:
      - name: 'tagging'
        dtype: 'float32'
        shape: [[82]] # change to match num_classes
        class: 'label_preprocess.Preprocess_label_sparse_to_dense'
        extra_args:
          index_dict: '../dataset/label_id.txt' # change as needed