RI: Small: Identifying and Producing Code-Switching from Spoken, Lexical, and Sociolinguistic Features

Hirschberg, Julia (PI)

Columbia University

Project

Description

This project will investigate conversations among the many people worldwide who speak multiple languages and frequently interleave them in their speech, a practice known as code-switching. It is important not only to identify when, why, and to what effect code-switching occurs, but also to interpret correctly what is said and to enable voice assistants to generate similarly code-switched responses, which can improve their ability to interact with such users. Advances in speech technology in recent years have led to widespread use of voice assistants such as Siri, Google Assistant, and Alexa. These interfaces have vastly improved spoken access to information in languages such as English, French, German, Cantonese, Mandarin, and Spanish. However, such access is limited to monolingual speech, which for many multilingual speakers is not the most natural way of speaking. As a result, voice assistants rarely understand code-switched speech correctly and never produce it in their responses. To enable efficient and natural communication for these speakers, speech technology must be able not only to understand code-switched input but also to produce similar, human-like output. This project examines how spoken and written code-switching interacts with other paralinguistic aspects of communication, with the aim of improving the understanding and production of code-switched text and speech.
It will explore research questions not yet studied in code-switching research, including: (1) whether speakers entrain (that is, come to speak more similarly to each other) on code-switching strategies in speech; (2) whether there is a quantifiable relationship between code-switching and empathy in speech, where empathy is a speaker's intention to convey that they understand another's problems and want to help address them; (3) whether the presence of named entities, such as names or geographical locations, primes code-switching; (4) which dialogue acts, such as questions, statements, or backchannels, are produced most often in code-switched speech; and (5) how speakers produce intonational contours when they code-switch, that is, whether their intonation matches one of the languages they are speaking or differs from both. Statistical and machine-learning techniques will be used to address these questions, drawing on spoken and lexical features of code-switched speech that pairs Standard American English with Spanish, Mandarin Chinese, and Hindi. The goal is to highlight aspects of code-switching that the research community can explore further. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status	Active
Actual start/end date	9/1/24 → 8/31/27

Keywords

Language and Linguistics
Linguistics and Language
Cultural Studies
Computer Networks and Communications
Engineering (all)
Computer Science (all)

Access the project

https://www.nsf.gov/awardsearch/showAward?AWD_ID=2418307
