When people chat, they intuitively know when to speak up and when to keep listening. This is because they can detect "transition points" at which the speaker may change. Scientists emphasize that this communication mechanism relies not only on pauses or intonation, but also on the semantic content of the speech.
According to JP de Ruiter, a professor of psychology and computer science, it was previously believed that intonation and visual cues were what helped identify such moments in a conversation. But experiments show that the semantic content matters more: even when speech is delivered in a flat monotone, humans, unlike artificial intelligence, can still identify appropriate transition points.
AI systems have been trained primarily on written texts, such as articles, discussions, and reference material, rather than on transcripts of actual conversations. Live speech is more informal, shorter, and simpler than standard written language, so AI has little experience with the flow of natural conversation.
Scientists believe that artificial intelligence needs additional training on natural dialogues to improve its conversational ability. But this remains a challenge: speech data at scale is not yet available.
Source: Ferra
