Scene Graph Generation by Iterative Message Passing #106

Open
wrryu09 opened this issue Jan 29, 2023 · 0 comments

Danfei Xu, Yuke Zhu, Christopher B. Choy, Li Fei-Fei; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5410-5419


we propose a novel end-to-end model that generates such structured scene representation from an input image.

The model uses standard RNNs to solve the scene graph inference problem and learns to iteratively improve its predictions via message passing.

By using a joint inference model, it takes advantage of better contextual cues about objects and their relationships.


Intro

scene graph is a visually-grounded graph over the object instances in an image, where the edges depict their pairwise relationships.

goal : build a model that automatically generates a scene graph from an image

  • object instance : characterized by a bounding box with an object category label
  • relationship : characterized by a directed edge btw two bounding boxes with a relationship predicate
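A minimal sketch of the two definitions above as a data structure. The class and field names (`ObjectInstance`, `Relationship`, `SceneGraph`, `triples`) are hypothetical, chosen for illustration only; the paper does not prescribe a container format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectInstance:
    box: tuple   # bounding box, assumed here to be (x1, y1, x2, y2)
    label: str   # object category label


@dataclass(frozen=True)
class Relationship:
    subject: int    # index of the source object (edge tail)
    predicate: str  # relationship predicate
    obj: int        # index of the target object (edge head)


@dataclass
class SceneGraph:
    objects: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def add_object(self, box, label):
        """Add a node and return its index."""
        self.objects.append(ObjectInstance(tuple(box), label))
        return len(self.objects) - 1

    def add_relationship(self, subject, predicate, obj):
        """Add a directed edge subject -> obj labelled with a predicate."""
        n = len(self.objects)
        if not (0 <= subject < n and 0 <= obj < n):
            raise IndexError("relationship refers to an unknown object")
        if subject == obj:
            raise ValueError("a relationship needs two distinct objects")
        self.relationships.append(Relationship(subject, predicate, obj))

    def triples(self):
        """Return (subject label, predicate, object label) for every edge."""
        return [
            (self.objects[r.subject].label, r.predicate, self.objects[r.obj].label)
            for r in self.relationships
        ]
```

Usage: add two objects, connect them with one directed edge, then read the graph back with `triples()`. Since edges are directed, (man, riding, horse) and (horse, riding, man) are different relationships.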

major challenge : reasoning about relationships

local prediction simplifies scene graph generation into predicting the relationship between each pair of objects independently, but it ignores the surrounding context

⇒ instead of inferring each component of a scene graph in isolation, the model passes messages containing contextual information btw a pair of bipartite sub-graphs of the scene graph, and iteratively refines its predictions using RNNs.



Scene graph generation

densely connected graph inference → expensive

inference follows the CRF-as-RNN idea, but to achieve greater flexibility, use a generic RNN unit, GRU (Gated Recurrent Unit), instead of a CRF-specific RNN layer.

at each iteration, each GRU takes its previous hidden state and an incoming message as input and produces a new hidden state as output

⇒ lets the model pass messages between GRU units along the scene graph topology
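A rough pure-Python sketch of one such GRU step, assuming the usual update/reset gate formulation with biases omitted. The class name `GRUCell`, the random initialisation, and the dimensions are illustrative assumptions, not the paper's implementation; the sign convention for the update gate also varies between libraries.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class GRUCell:
    """Toy GRU cell: input is the incoming message, state is the hidden vector."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: mat(hidden_dim, input_dim) for g in "zrh"}
        self.U = {g: mat(hidden_dim, hidden_dim) for g in "zrh"}

    def step(self, h_prev, message):
        """Return the new hidden state from the previous state and a message."""
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], message), matvec(self.U["z"], h_prev))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], message), matvec(self.U["r"], h_prev))]
        gated = [ri * hi for ri, hi in zip(r, h_prev)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], message), matvec(self.U["h"], gated))]
        # Interpolate between the old state and the candidate state.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]
```

Calling `step` repeatedly with fresh messages gives the iterative refinement described above: the hidden state carries what was inferred so far, and each message nudges it with context from neighbouring units.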


we formulate two disjoint sub-graphs that are essentially the dual graph to each other.

defines channels for msgs to pass from…

  • primal graph : edge GRUs → node GRUs.
  • dual graph : node GRUs → edge GRUs

⇒ with primal-dual formulation … can improve inference efficiency by iteratively passing msgs btw sub-graphs instead of through a densely connected graph.
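The two message channels can be sketched as below. This is a simplification under stated assumptions: messages are pooled by a plain average, and `update` is a `tanh` stand-in for a real GRU step. Neither choice is taken from the paper, and the function names are hypothetical.

```python
import math


def mean(vectors, dim):
    """Element-wise mean of a list of vectors; zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def update(h, m):
    """Stand-in for a GRU step (assumption): squash state plus message."""
    return [math.tanh(a + b) for a, b in zip(h, m)]


def iterate(node_h, edge_h, num_iters=2):
    """Pass messages between the node and edge sub-graphs.

    node_h: list of hidden vectors, one per object.
    edge_h: dict mapping (subject index, object index) -> hidden vector.
    """
    dim = len(node_h[0])
    for _ in range(num_iters):
        # Primal channel: edge units -> node units.
        # Each node pools the states of the edges it takes part in.
        node_msgs = [
            mean([h for (i, j), h in edge_h.items() if n in (i, j)], dim)
            for n in range(len(node_h))
        ]
        # Dual channel: node units -> edge units.
        # Each edge pools the states of its subject and object nodes.
        edge_msgs = {(i, j): mean([node_h[i], node_h[j]], dim) for (i, j) in edge_h}
        # Both messages were computed from the previous states, so the
        # two sub-graphs are updated synchronously.
        node_h = [update(h, m) for h, m in zip(node_h, node_msgs)]
        edge_h = {e: update(h, edge_msgs[e]) for e, h in edge_h.items()}
    return node_h, edge_h
```

Each unit only talks to its direct neighbours in the other sub-graph, so one iteration costs on the order of the number of edges rather than requiring all-to-all communication.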



Experiments

goal : analyze our model in datasets with both sparse & dense relationship annotations

dataset : VisualGenome(sparse), NYU Depth v2(dense)


semantic scene graph generation

setup : localize a set of objects, classify their category labels, predict relationships btw each pair of the objects.

  1. predicate classification
  2. scene graph classification
  3. scene graph generation

results ⇒

  • performances of our model and the baselines : shows learning to modulate the info from other hidden states enables the network to extract more relevant information and yields superior performances.
  • predicate classification performances of our models trained with diff # of iterations : degrades after two iterations (noisy msg start to permeate through the graph and hamper the final prediction ?)
  • per-type predicate recall : gap btw models expands for less frequent predicates
    • our model uses contextual info to cope with the uneven distribution in the relationship annotations but baseline model makes predictions in isolation ..so suffers more
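To make "per-type predicate recall" concrete, here is a small sketch that counts, for each predicate, how many ground-truth triples appear among the predictions. The exact matching rule and any top-K cutoff used in the paper are not recorded in these notes, so exact triple matching is an assumption here, and `per_predicate_recall` is a hypothetical helper name.

```python
from collections import defaultdict


def per_predicate_recall(gt_triples, predicted_triples):
    """Recall per predicate type.

    Both arguments are iterables of (subject, predicate, object) triples.
    A ground-truth triple counts as recalled only if the identical triple
    was predicted (an assumption made for this sketch).
    """
    predicted = set(predicted_triples)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for triple in gt_triples:
        predicate = triple[1]
        totals[predicate] += 1
        if triple in predicted:
            hits[predicate] += 1
    return {p: hits[p] / totals[p] for p in totals}
```

Breaking recall down this way exposes rare predicates that an aggregate recall number would hide, which is the comparison the bullet above is about.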

support relation prediction

results ⇒

  • having contextual information further improves support relation prediction
  • incorrect predictions typically occur in ambiguous supports
  • Geometric structures that have weak visual features also cause failures

visual uncertainty may be resolved by having additional depth info?



Conclusion

we addressed the problem of automatically generating a visually grounded scene graph from an img by a novel end-to-end model

it performs iterative msg passing btw primal and dual sub-graph along the topological structure of a scene graph → improves the quality of node and edge predictions by incorporating informative contextual cues.
