Abstract: Multi-modal emotion recognition (MER) using speech and text has attracted extensive attention because data for these two modalities are readily available. Recently, the self-supervised ...