RI: Small: Identifying and Producing Code-Switching from Spoken, Lexical, and Sociolinguistic Features

  • Hirschberg, Julia (PI)

Project: Research project

Project Details

Description

This project will investigate conversations among the many people worldwide who speak multiple languages and who frequently interleave those languages in their speech, a practice known as code-switching. It is important not only to identify when, why, and to what effect code-switching occurs but also to correctly interpret what is said and to generate similarly code-switched responses from voice assistants, which can improve their ability to interact with such users. Advances in speech technology in recent years have resulted in widespread use of voice assistants such as Siri, Google Assistant, and Alexa. These interfaces enable vast improvement in information access by voice for languages such as English, French, German, Cantonese, Mandarin, and Spanish. However, such access is limited to monolingual speech, which for many multilingual speakers is not the most natural form of speech production. Thus, code-switched speech is rarely understood correctly by voice assistants and is never produced in their responses. To enable efficient and natural communication for these speakers, it is important to develop speech technology that can not only understand code-switched input but also produce similar human-like output. This project examines how spoken and written code-switching interacts with other paralinguistic aspects of communication to improve code-switched text and speech understanding and production.
It will explore research questions not yet studied in code-switching research, including (1) whether speakers entrain (that is, come to speak more similarly) on strategies of code-switching in speech; (2) whether there is a quantifiable relationship between code-switching and empathy in speech, where empathy is a speaker’s intention to convey that they understand another’s problems and want to help address them; (3) whether the presence of named entities, such as names or geographical locations, primes code-switching; (4) which dialogue acts, such as questions, statements, or backchannels, tend to be produced most often in code-switched speech; and (5) how speakers produce intonational contours when they code-switch: does their intonation match either of the languages they are producing, or is it different from both? Statistical and machine-learning techniques will be used to address these questions in spoken and lexical features of code-switched speech in Standard American English with Spanish, Mandarin Chinese, and Hindi. The goal is to highlight aspects of code-switching that can be further explored by the research community.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Active
Effective start/end date: 9/1/24 to 8/31/27

ASJC Scopus Subject Areas

  • Language and Linguistics
  • Linguistics and Language
  • Cultural Studies
  • Computer Networks and Communications
  • Engineering(all)
  • Computer Science(all)