
Summary of CEM: Commonsense-aware Empathetic Response Generation

쿠쿠*_* 2023. 8. 28. 10:17

Abstract

However, since empathy includes both aspects of affection and cognition, we argue that in addition to identifying the user’s emotion, cognitive understanding of the user’s situation should also be considered. To this end, we propose a novel approach for empathetic response generation, which leverages commonsense to draw more information about the user’s situation and uses this additional information to further enhance the empathy expression in generated responses. (The EMPATHETICDIALOGUES dataset was used, and the model performed well in both automatic and human evaluations.)

 

1 Introduction

Empathy is a desirable trait of human daily conversations that enables individuals to understand, perceive, and respond appropriately to the situation and feelings of others. (Notes the importance of empathy.)

However, empathy is a broad construct that includes aspects of affection and cognition. (Here affection and cognition are distinguished: emotion plays an important role in empathy, but it is not the whole of it.)

 

Examples from the EMPATHETICDIALOGUES dataset in which commonsense is used to gain additional information about the user’s emotion and situation before responding empathetically.

Therefore, we believe that providing dialogue systems with this external knowledge could play a critical role in understanding the user’s situation and feelings, which leads to more informative and empathetic responses.

 

Hence the proposed model => "Commonsense-aware Empathetic Chatting Machine (CEM)"

Our contributions are summarized as follows:

- propose to leverage commonsense to improve the understanding of interlocutors’ situations and feelings.
- introduce CEM, a novel approach that uses various types of commonsense reasoning to enhance empathetic response generation.
- Automatic and manual evaluation demonstrate that with the addition of commonsense, CEM is able to generate more informative and empathetic responses compared with the previous methods.

 

2 Preliminaries

2.1 Empathetic Dialogue Generation

Empathy is commonly known as a complex multi-dimensional construct that includes broad aspects of affection and cognition. (The term is fairly new, so it has had no clear definition in social psychology or psychotherapy.)

Affective empathy enables us to experience the emotion of others through various emotional stimuli (Cuff et al. 2016), while cognitive empathy enables us to understand the situations and implicit mental states of others, such as intentions, causes, desires, requirements, etc. (Again, the difference between affective and cognitive empathy is spelled out.)

 

In recent years, research on implementing empathy in dialogue systems and generating empathetic responses has gained considerable attention. (Much work has followed, but it mostly focuses on detecting the context emotion and does not pay enough attention to the cognitive aspect of empathy. => This paper tries not to miss the cognitive side either.)

 

2.2 Commonsense and Empathy

As mentioned, a major part of cognitive empathy is understanding the situations and feelings of others.

Hence, we hypothesize that enabling dialogue systems to leverage commonsense and drive implications from what the user has explicitly shared is highly beneficial for a better understanding of the user’s situation and feelings, which leads to more effective cognitive empathy and thus, more empathetic responses.

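So this paper uses ATOMIC (a collection of commonsense reasoning inferences about everyday events; it serves as the commonsense knowledge base rather than as a dialogue dataset) and generates inferences with COMET. The paper describes COMET as a pre-trained GPT-2 model fine-tuned on ATOMIC, and the variant actually used here is the BART-based COMET.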

 

2.3 Task Formulation

The EMPATHETICDIALOGUES dataset is used. (This part is a general description of the data.)

=> Our goal is to generate the listener’s next utterance which is coherent to the context, informative, and empathetic to the speaker’s situation and feelings.

 

3 Methodology

Our proposed model, CEM, is built upon the standard Transformer.

CEM is broadly divided into five stages => context encoding, knowledge acquisition, context refinement, knowledge selection, and response generation.

Overview of our model (CEM)

3.1 Context Encoding

We concatenate the utterances in the dialogue history and prepend a special token [CLS] to obtain the context input C = [CLS] ⊕ u_1 ⊕ u_2 ⊕ u_3 ⊕ ... ⊕ u_{k−1}, where ⊕ is the concatenation operation. We use the final hidden representation of [CLS] as the representation of the whole sequence.

We acquire the embedding E_C of the sequence C by summing up the word embedding, positional embedding, and dialogue state embedding. (Each utterance in C can come from either the listener or the speaker, so the dialogue state embedding is used to tell the two parties apart.) The sequence embedding E_C is then fed to a context encoder to produce the contextual representation.
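The context encoding step can be sketched in plain Python as follows. This is only an illustration under stated assumptions: whitespace tokenization, random embedding tables, strictly alternating speaker/listener turns, and the helper names `build_context`, `make_table`, and `embed` are all hypothetical and not taken from the paper or its code.

```python
import random

CLS = "[CLS]"

def build_context(utterances):
    """Flatten a dialogue into tokens plus one dialogue-state id per token.

    Assumption: speaker and listener alternate, and the speaker goes first.
    """
    tokens, states = [CLS], [0]  # [CLS] gets the speaker state (arbitrary choice)
    for turn, utt in enumerate(utterances):
        words = utt.split()  # whitespace tokenization, for illustration only
        tokens.extend(words)
        states.extend([turn % 2] * len(words))  # 0 = speaker, 1 = listener
    return tokens, states

def make_table(n_rows, dim, rng):
    """Random embedding table standing in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_rows)]

def embed(tokens, states, vocab, word_emb, pos_emb, state_emb):
    """E_C[i] = word_emb[token_i] + pos_emb[i] + state_emb[state_i]."""
    out = []
    for i, (tok, st) in enumerate(zip(tokens, states)):
        w, p, s = word_emb[vocab[tok]], pos_emb[i], state_emb[st]
        out.append([a + b + c for a, b, c in zip(w, p, s)])
    return out

rng = random.Random(0)
dialogue = ["I failed my exam", "Oh no what happened", "I did not study enough"]
tokens, states = build_context(dialogue)
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
dim = 8
word_emb = make_table(len(vocab), dim, rng)
pos_emb = make_table(len(tokens), dim, rng)
state_emb = make_table(2, dim, rng)
E_C = embed(tokens, states, vocab, word_emb, pos_emb, state_emb)
print(len(tokens), len(E_C), len(E_C[0]))
```

In a real model the three tables would be learned, and E_C would then go through the Transformer context encoder; the hidden state at position 0 ([CLS]) would serve as the sequence representation.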

3.2 Knowledge Acquisition

For input sequence C, we respectively append five special relation tokens ([xReact], [xWant], [xNeed], [xIntent], [xEffect]) to the last utterance in the dialogue history and use COMET to generate five commonsense inferences [cs^r_1, cs^r_2, ..., cs^r_5] per relation r. For each relation, we concatenate the generated commonsense inferences to obtain its commonsense sequence CS_r = cs^r_1 ⊕ cs^r_2 ⊕ ... ⊕ cs^r_5. (The relations are then split into two groups, as in the figure: xReact is knowledge about the user's affective state, such as emotion, while the other four are knowledge about the user's cognitive state, such as the situation. [CLS] is added to the cognitive sequences, and because xReact inferences are mostly emotion words rather than sentences, the average of their hidden representations is used.)
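A rough sketch of this bookkeeping is given below. COMET itself is not called: `fake_comet` is a hypothetical stand-in that returns placeholder strings, and `acquire_knowledge` and `pool` are illustrative names. Placing [CLS] at the front of the cognitive sequences is also an assumption made here.

```python
RELATIONS = ["xReact", "xWant", "xNeed", "xIntent", "xEffect"]
AFFECTIVE = {"xReact"}  # the remaining four relations are treated as cognitive

def fake_comet(utterance, relation, n=5):
    """Hypothetical stand-in for COMET: returns n placeholder inferences."""
    return [f"{relation.lower()}_{i}" for i in range(1, n + 1)]

def acquire_knowledge(last_utterance, generate=fake_comet):
    """Build one commonsense sequence CS_r per relation r."""
    sequences = {}
    for rel in RELATIONS:
        inferences = generate(last_utterance, rel)
        # CS_r = cs^r_1 (+) cs^r_2 (+) ... (+) cs^r_5
        tokens = [tok for inf in inferences for tok in inf.split()]
        if rel not in AFFECTIVE:
            tokens = ["[CLS]"] + tokens  # cognitive sequences get a [CLS] token
        sequences[rel] = tokens
    return sequences

def pool(relation, hidden_states):
    """Reduce encoder outputs (one vector per token) to one vector per relation.

    xReact: mean over all token vectors (inferences are mostly emotion words).
    Others: the vector at position 0, i.e. the [CLS] token.
    """
    if relation in AFFECTIVE:
        n = len(hidden_states)
        return [sum(col) / n for col in zip(*hidden_states)]
    return hidden_states[0]

seqs = acquire_knowledge("I did not study enough")
for rel, toks in seqs.items():
    print(rel, toks)
```

Swapping `fake_comet` for a real generator only requires a callable with the same `(utterance, relation)` arguments that returns a list of strings.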

3.3 Context Refinement

In order to refine the context by additional information, we first respectively concatenate each of the commonsense relation representations to the context representation H_CTX at the token level.

In contrast to concatenating the representations at a sequence level, token-level concatenation fuses the additional knowledge into each word of the sequence. As with the earlier split into two groups, we use two separate encoders (affection-refined and cognition-refined), corresponding to the two groups of relations, to encode the fused representations and obtain commonsense-refined context representations for each relation respectively.
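The difference between token-level and sequence-level concatenation is easiest to see on shapes. The toy example below assumes the context representation is a list of per-token vectors and the relation representation is a single vector; `token_level_concat` is an illustrative name, and the numbers are made up.

```python
def token_level_concat(h_ctx, rel_vec):
    """Append the same relation vector to every token representation."""
    return [tok_vec + rel_vec for tok_vec in h_ctx]

h_ctx = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 tokens, hidden size 2
h_xwant = [9.0, 8.0]                          # one vector for a relation

# Token level: sequence length stays 3, hidden size grows from 2 to 4,
# so every word position carries the commonsense vector.
fused = token_level_concat(h_ctx, h_xwant)

# Sequence level (for contrast): length grows to 4, hidden size stays 2,
# so the knowledge sits in one extra position only.
sequence_level = h_ctx + [h_xwant]

print(len(fused), len(fused[0]))
print(len(sequence_level), len(sequence_level[0]))
```

The fused token vectors would then be passed to the affection-refined or cognition-refined encoder, depending on the relation group.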

3.4 Knowledge Selection

Using only one of the commonsense representations is not ideal for producing an empathetic response. Hence, we want to enable our model to generate responses based on the mixture of both affective and cognitive information.

 

3.5 Response Generation

Note that the cross attention to the encoder outputs is modified to the commonsense-refined contextual representation H̃_CTX, which has fused the information from both the context and the commonsense inferences.

3.6 Training Objectives

We adopt the standard negative log-likelihood (NLL) loss on the target response Y.
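Written out, the standard NLL loss over a target response Y = [y_1, ..., y_T] takes the form below. The notation (T for the response length, y_{<t} for the previously generated tokens, conditioning on the context C) is assumed here for illustration, since this summary does not reproduce the paper's formula; in CEM the conditioning would also carry the commonsense inferences through the refined context representation.

```latex
\mathcal{L}_{\text{nll}} = -\sum_{t=1}^{T} \log P\left(y_t \mid y_{<t},\, C\right)
```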

4 Experiments

5 Conclusions and Future Work
