Yu-Jung Heo (허유정)

I am a principal research engineer at KT Tech Innovation Group, South Korea. My research lies at the intersection of natural language processing (NLP), computer vision (CV), and machine learning (ML), with a focus on representation learning and generative modeling. I earned my Ph.D. degree in Computer Science and Engineering from Seoul National University under the supervision of Prof. Byoung-Tak Zhang.
Research interests: Multimodal learning, Knowledge-enhanced reasoning
Email: yj.heo@kt.com | yjheo@snu.ac.kr | yjheo.ai@gmail.com
[2025/06] Our paper on evaluating multicultural VLMs is accepted at ACL 2025
[2024/07] Our paper on multimodal dialogue response generation is accepted at ECCV 2024
[2024/06] Our team ranked 1st place in The Smart-101 Challenge at Multimodal Algorithmic Reasoning Workshop, CVPR 2024 🏆
[2024/05] Our paper on translation artifacts in cross-lingual visual question answering is accepted at ACL 2024 Findings
[2024/04] Our paper on evaluation metrics for multimodal story understanding is accepted at COGSCI 2024
[2024/02] Our paper on structure-aware multimodal sequential learning is accepted at AAAI 2024
[2023/10] Our article on video turing test is published in AI Magazine
[2023/06] I joined AI2X Lab, Tech Innovation Group, KT 💡
[2022/02] Our paper on knowledge-based visual question answering is accepted at ACL 2022
[2021/10] Our paper on video turing test is accepted at AAAI-FSS 2021
[2021/02] Our paper on video story understanding is accepted at AAAI 2021
[2020/06] Our team ranked 1st place in The 1st ActivityNet Entities Object Localization challenge at International Challenge on Activity Recognition (ActivityNet), CVPR 2020 🏆
[2020/03] Our paper on multimodal learning is accepted at CVPR 2020
[2020/02] I'm co-organizing The 2nd workshop on Video Turing Test: Toward Human-Level Video Story Understanding and the 2nd DramaQA Challenge in ECCV 2020
[2019/12] Our paper on compositional structure learning is accepted at AAAI 2020 (oral)
[2019/12] I'm co-organizing The 1st DramaQA Challenge in KSC 2019
[2019/10] I'm co-organizing The 1st workshop on Video Turing Test: Toward Human-Level Video Story Understanding in ICCV 2019
[2019/06] Our team ranked 1st place in The 1st GQA challenge at Visual Question Answering and Dialog Workshop in CVPR 2019 🏆
[2019/01-2019/05] I interned at Kakao Brain, Pangyo, Seongnam, Korea
[2018/12] Our paper on goal-oriented visual dialogue is accepted at NeurIPS 2018 (spotlight)
[2018/08] Our team ranked 5/312~1.6% (in-the-money) in The 2nd YouTube-8M Video Understanding challenge, ECCV 2018
*Authors contributed equally

Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration
ChaeHun Park*, Yujin Baek*, Jaeseok Kim, Yu-Jung Heo, Du-Seong Chang, and Jaegul Choo
ACL 2025 [pdf] [code] [data]

BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
Hee Suk Yoon*, Eunseop Yoon*, Joshua Tian Jin Tee*, Kang Zhang, Yu-Jung Heo, Du-Seong Chang, and Chang D. Yoo
ECCV 2024 (acceptance ratio: 2395/8585~27.9%) [pdf] [code]

Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering
ChaeHun Park*, Koanho Lee*, Hyesu Lim, Jaeseok Kim, Junmo Park, Yu-Jung Heo, Du-Seong Chang, and Jaegul Choo
ACL 2024 Findings (acceptance ratio: 975/4407~22.1%) [pdf]

CogME: A Cognition-Inspired Multi-Dimensional Evaluation Metric for Story Understanding
Minjung Shin, Seongho Choi, Yu-Jung Heo, Minsu Lee, Byoung-Tak Zhang, and Jeh-Kwang Ryu
COGSCI 2024 [pdf]

Structure-aware Multimodal Sequential Learning for Visual Dialog
Young-Jin Kim*, Min-Jun Kim*, Kyunghwan An, Jinwoo Ahn, Jaeseok Kim, Yu-Jung Heo, Du-Seong Chang, and Eun-Sol Kim
AAAI 2024 (acceptance ratio: 2342/9862~23.75%) [pdf]

Video Turing Test: A First Step Towards Human-Level AI
Minsu Lee*, Yu-Jung Heo*, Seongho Choi, Woo Suk Choi and Byoung-Tak Zhang
AI Magazine, Volume 44, Issue 4 (Winter 2023) [pdf]

Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering
Yu-Jung Heo, Eun-Sol Kim, Woo Suk Choi and Byoung-Tak Zhang
ACL 2022 (acceptance ratio: 701/3378~20.75%) [pdf] [code]


Toward a Human-Level Video Understanding Intelligence
Yu-Jung Heo*, Minsu Lee*, Seongho Choi, Woo Suk Choi, Minjung Shin, Minjoon Jung, Jeh-Kwang Ryu and Byoung-Tak Zhang
AAAI 2021 Fall Symposium Series on Artificial Intelligence for Human-Robot Interaction [pdf]

DramaQA: Character-Centered Video Story Understanding with Hierarchical QA
Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee and Byoung-Tak Zhang
AAAI 2021 (acceptance ratio: 1692/7911~21.39%) [pdf] [dataset] [code]

Hypergraph Attention Networks for Multimodal Learning
Eun-Sol Kim*, Woo-Young Kang*, Kyoung-Woon On, Yu-Jung Heo and Byoung-Tak Zhang
CVPR 2020 (acceptance ratio: 1470/6656~22.09%) [pdf]
**We ranked 1/51 at the 1st GQA Challenge in Visual Question Answering and Dialog Workshop in CVPR 2019.


Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data
Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo and Byoung-Tak Zhang
AAAI 2020 Oral (acceptance ratio: 454/7737~5.87%) [pdf]
**A preliminary version of this paper was presented at the ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Representations [pdf] and the AAAI 2019 Workshop on Network Interpretability for Deep Learning [pdf].


Constructing Hierarchical Q&A Datasets for Video Story Understanding
Yu-Jung Heo, Kyoung-Woon On, Seongho Choi, Jaeseo Lim, Jinah Kim, Jeh-Kwang Ryu, Byung-Chull Bae and Byoung-Tak Zhang
AAAI 2019 Spring Symposium Series on Story-Enabled Intelligence [pdf]

Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog
Sang-Woo Lee, Yu-Jung Heo and Byoung-Tak Zhang
NeurIPS 2018 Spotlight (acceptance ratio: 198/4856~4.07%) [pdf]
**A preliminary version of this paper was presented at the NeurIPS 2017 Workshop on Visually-Grounded Interaction and Language.




Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024
Jinwoo Ahn*, Junhyeok Park*, Min-Jun Kim, Kang-Hyeon Kim, So-Yeong Sohn, Yun-Ji Lee, Du-Seong Chang, Yu-Jung Heo†, Eun-Sol Kim†
CVPR 2024 Workshop on Multimodal Algorithmic Reasoning (MAR) [pdf]
**We ranked 1st place in the Smart-101 Challenge at Multimodal Algorithmic Reasoning Workshop in CVPR 2024


Instruction-tuned Self-Questioning Framework for Multimodal Reasoning
You-Won Jang, Yu-Jung Heo, Jaeseok Kim, Minsu Lee, Du-Seong Chang, and Byoung-Tak Zhang
ICCV 2023 Workshop on Closing the Loop between Vision and Language (CLVL)

Scene Graph Parsing via Abstract Meaning Representation in Pre-trained Language Models
Woo Suk Choi, Yu-Jung Heo, Dharani Punitan and Byoung-Tak Zhang
NAACL 2022 Workshop on Deep Learning on Graphs for Natural Language Processing (DLG4NLP) [pdf]

Toward General Scene Graph: Integration of Visual Semantic Knowledge with Entity Synset Alignment
Woo Suk Choi, Kyoung-Woon On, Yu-Jung Heo and Byoung-Tak Zhang
ACL 2020 Workshop on Advances in Language and Vision Research (ALVR) [pdf] [code]

Temporal Attention Mechanism with Conditional Inference for Large-scale Multi-Label Video Classification
Eun-Sol Kim, Kyoung-Woon On, Jongseok Kim, Yu-Jung Heo, Seongho Choi, Hyun-Dong Lee and Byoung-Tak Zhang
ECCV 2018 Workshop on the 2nd YouTube-8M Large-Scale Video Understanding [pdf] [slide]
**We ranked 5/312~1.6% (in-the-money) in the 2nd YouTube-8M Video Understanding Challenge in ECCV 2018.


Attention Memory for Locating an Object through Visual Dialogue
Cheolho Han*, Yu-Jung Heo*, Woo-Young Kang, Jae-Hyun Jun and Byoung-Tak Zhang
CVPR 2017 Workshop on VQA Challenge [pdf]

Criteria for Human-Compatible AI in Two-Player Vision-Language Tasks
Cheolho Han*, Sang-Woo Lee*, Yu-Jung Heo, Woo-Young Kang, Jae-Hyun Jun and Byoung-Tak Zhang
IJCAI 2017 Workshop on Linguistic and Cognitive Approaches to Dialog Agents (LaCATODA) [pdf]

Domestic Journal

Scene Graph Generation Framework using Image Region Description
Woo Suk Choi, Yu-Jung Heo, and Byoung-Tak Zhang
KIISE Transactions on Computer Practices, Vol. 29, No. 12, Dec, 2023


Efficient Compositional Translation Embedding for Visual Relationship Detection
Yu-Jung Heo, Eun-Sol Kim, Woo Suk Choi, Kyoung-Woon On and Byoung-Tak Zhang
Journal of KIISE, Vol. 49, No. 7, Jul, 2022 [pdf]


DramaQA: Character-Centered Video Story Understanding with Hierarchical QA
Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Youwon Jang, Ahjeong Seo, Seungchan Lee, Minsu Lee and Byoung-Tak Zhang
KIISE Transactions on Computer Practices, Vol. 27, No. 1, Jan, 2021 [pdf]


Analyzing and Solving GuessWhat?!
Sang-Woo Lee, Cheolho Han, Yu-Jung Heo, Woo-Young Kang, Jae-Hyun Jun and Byoung-Tak Zhang
Journal of KIISE, Vol. 45, No. 1, Jan, 2018 [pdf]


Robust Scheduling based on Daily Activity Learning by using Markov Decision Process and Inverse Reinforcement Learning
Sang-Woo Lee, Dong-Hyun Kwak, Kyoung-Woon On, Yu-Jung Heo, Woo-Young Kang, Ceyda Cinarel and Byoung-Tak Zhang
KIISE Transactions on Computer Practices, Vol. 23, No. 10, Oct, 2017 [pdf]


Regional Projection Histogram Matching and Linear Regression based Video Stabilization for a Moving Vehicle
Yu-Jung Heo, Min-Kook Choi, Hyun-Gyu Lee and Sang-Chul Lee
Journal of Broadcast Engineering Vol. 19, No. 6, Nov, 2014 [pdf]


Domestic Conference

Scene Graph Generation Model utilizing Image Region Descriptions
Woo Suk Choi, Yu-Jung Heo and Byoung-Tak Zhang
Proc. Korea Computer Congress 2023 (KCC 2023)

✨ Best presentation award

Event Detection based on Predictive Uncertainty of User World Models
Yu-Jung Heo, Kibeom Kim, HoJoon Song, Hyejung Yoon and Byoung-Tak Zhang
Proc. Korea Computer Congress 2022 (KCC 2022)

✨ Award for 7 top-performing teams (announced at the ETRI human understanding AI challenge: Learning and Reasoning lifelog)

Video Story Understanding with Multi-level Character Attention Model
Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee and Byoung-Tak Zhang
Proc. Korea Computer Congress 2021 (KCC 2021) [pdf]

✨ Best paper award

Future State Generation for Action Prediction in Cross Domain
Hyunseo Kim, Yu-Jung Heo, Kibeom Kim and Byoung-Tak Zhang
Proc. Korea Computer Congress 2021 (KCC 2021) [pdf]


Knowledge-aware Visual Question Answering with Structural Attention Model
Yu-Jung Heo, Eun-Sol Kim, Woo Suk Choi and Byoung-Tak Zhang
Proc. Korea Computer Congress 2020 (KCC 2020) [pdf]


A study on Scene Graph Unification of Visual Semantic Knowledge using synonym
Woo Suk Choi, Kyoung-Woon On, Yu-Jung Heo and Byoung-Tak Zhang
Proc. Korea Computer Congress 2020 (KCC 2020) [pdf]


A study on analysis of human and machine visual attention map for Visual Question Answering
Hyuk-Gi Lee, Yu-Jung Heo and Byoung-Tak Zhang
Proc. Korea Computer Congress 2020 (KCC 2020) [pdf]


DramaQA: Human Level Video Story Understanding through Multilevel Question-Answering
Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Gi-Cheon Kang and Byoung-Tak Zhang
Proc. Korea Software Congress 2019 (KSC 2019) [pdf]

✨ Best presentation award

Compositional Structure Learning for Sequential Video Data
Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, and Byoung-Tak Zhang
Proc. Korea Computer Congress 2019 (KCC 2019) [pdf]

✨ Best paper award

A Study on Object Detection Technology for an Improved Visual Relationship Detection
Hyunji Choi, Yu-Jung Heo and Byoung-Tak Zhang
Proc. Korea Computer Congress 2019 (KCC 2019) [pdf]

✨ Best paper award

Analysis of Learning Strategy in AQM Framework for Goal-Oriented Visual Dialogue
Yu-Jung Heo, Sang-Woo Lee and Byoung-Tak Zhang
Proc. Korea Computer Congress 2018 (KCC 2018) [pdf]


Comparison of Generative Classification Model and Discriminative Classification Model for AQM Framework
Yu-Jung Heo, Sang-Woo Lee and Byoung-Tak Zhang
Proc. Korea Software Congress 2017 (KSC 2017) [pdf]


Structural Knowledge Representation Learning for Content-based Question Answering
Yu-Jung Heo, Kyoung-Woon On, Eun-Sol Kim and Byoung-Tak Zhang
Proc. Korea Computer Congress 2017 (KCC 2017) [pdf]

✨ Best presentation award

Analyzing and Solving GuessWhat?!
Sang-Woo Lee, Cheolho Han, Yu-Jung Heo, Woo-Young Kang, Jae-Hyun Jun and Byoung-Tak Zhang
Proc. Korea Computer Congress 2017 (KCC 2017) [pdf]

✨ Best paper award

Goal-oriented Question Generator model using Attention for GuessWhat?!
Jae-Hyun Jun, Woo-Young Kang, Cheolho Han, Yu-Jung Heo and Byoung-Tak Zhang
Proc. Korea Computer Congress 2017 (KCC 2017) [pdf]

✨ Best presentation award

Adaptive Question Answering System for Personalized Language Education
Yu-Jung Heo, Eun-Sol Kim, Kyoung-Woon On and Byoung-Tak Zhang
Proc. Korean Institute of Intelligence Systems Spring Conference 2017 (KIIS 2017) [pdf]


Multimodal Story Learning with Dynamic Memory Construction
Yu-Jung Heo, Eun-Sol Kim, Kyoung-Woon On and Byoung-Tak Zhang
Proc. Korea Software Congress 2016 (KSC 2016) [pdf]


Robust Scheduling based on Daily Activity Learning by using Markov Decision Process and Inverse Reinforcement Learning
Sang-Woo Lee, Dong-Hyun Kwak, Kyoung-Woon On, Yu-Jung Heo, Woo-Young Kang, Ceyda Cinarel and Byoung-Tak Zhang
Proc. Korea Software Congress 2016 (KSC 2016) [pdf]

✨ Best presentation award

Dual Deep Memories for Video Question Answering
Kyung-Min Kim, Changjun Nan, Jung-Woo Ha, Yu-Jung Heo and Byoung-Tak Zhang
Proc. Korea Software Congress 2016 (KSC 2016) [pdf]

✨ Best presentation award

Pororobot: A Deep Learning Robot that Plays Video Q&A Games
Yu-Jung Heo, Kyung-Min Kim, and Byoung-Tak Zhang
Proc. Korea Software Congress 2015 (KSC 2015) [pdf]

✨ Best paper award

Automated Visualization Methodology for Surface of Driving Road by Extracting Motion Parameters of Road Images
Yu-Jung Heo, Bo-Gyu Park, Hyun-Gyu Lee, Min-Kook Choi and Sang-Chul Lee
Workshop on Image Processing and Image Understanding 2015 (IPIU 2015) [pdf]


Classification of Driving Events using Multi-sensor and Visualization of Driving Information
Bo-Gyu Park, Yu-Jung Heo, Hyun-Gyu Lee, Min-Kook Choi and Sang-Chul Lee
Workshop on Image Processing and Image Understanding 2015 (IPIU 2015) [pdf]


Regional Projection Histogram Matching and Linear Regression based Video Stabilization for a Moving Vehicle
Yu-Jung Heo, Min-Kook Choi, Hyun-Gyu Lee, and Sang-Chul Lee
Proc. Korean Institute of Broadcast and Media Engineers summer conference 2014 (KIBME 2014) [pdf]

✨ Best paper award

Last update: June 2025 by Yu-Jung Heo