R. Clifton Bailey Statistics Seminar Series
Capturing distributional patterns in sparse linguistic data
Géraldine Walther
Assistant Professor
Department of English (Linguistics)
George Mason University
Date: Friday, January 26, 2024
Time: 11:00 A.M. – 12:00 P.M. Eastern Time
Location: Nguyen Engineering Building, Room 1109
Abstract
Language is widely regarded as a distinctively human capability. While other species exhibit – sometimes fairly complex – communication systems, none of them come even close to the expressive capabilities of human language. There are over 7000 languages in the world, with tremendous variation along all observable dimensions, from sound patterns to discourse structures.
From a general cognitive perspective, a language can be viewed as a coded communication system that exhibits two familiar yet remarkable characteristics. It is learned to an approximately equivalent level of proficiency from typically sparse and biased input. It is also processed efficiently by speakers under conditions that tend to be ‘noisy’ or non-optimal in other respects. This talk focuses on some of the organizational properties of language that confirm its status as a coding system, and illustrates how these properties contribute to the striking uniformity of learning outcomes and the impressively high levels of success achieved in meeting communicative goals. Particular emphasis is placed on the importance of studying linguistic distributions across a variety of linguistic systems, while acknowledging the challenges that arise due to the sparsity of naturalistic data and related behavioral evidence. The talk concludes by illustrating how an understanding of distributional patterns can be applied to the development of assistive technologies that enhance the inclusivity of diagnostic tests by identifying distinctive linguistic variation that correlates with observed differences in attention profiles of neurotypical and neurodiverse populations.
About the Speaker
Géraldine Walther is an Assistant Professor of Computational Linguistics in the Linguistics Program at George Mason University. Her research lies at the intersection of Natural Language Processing (NLP), Cognitive Psychology, and Linguistics and Philology. The main focus of her research is the way observable linguistic system organization relates to human interaction and development. At the core of her lab’s research lies the quantitative investigation of the distributions of linguistic patterns across a wide range of typologically diverse languages, including Romance (Latin, French, Romansh), Semitic (Maltese, Modern Standard Arabic), Western Iranian (Persian, Kurmanji Kurdish, Sorani Kurdish), Indo-Aryan (Hindi/Urdu, Sinhalese), Dravidian (Tamil), Kiranti (Khaling, Kulung, Limbu), Sinitic (Mandarin), Japonic (Japanese), Koreanic (Korean), and Cariban (Akawaio). Depending on the question under investigation, members of the GMU Computational Linguistics lab either apply state-of-the-art quantitative techniques to data that are already available in large-scale linguistic resources, or else collect original data, leveraging state-of-the-art NLP techniques for Language Documentation, which are especially suited to collecting naturalistic data for un(der)described and un(der)documented languages. A long-term goal of the lab’s research is also to contribute to more applied initiatives.
Géraldine Walther is currently a Co-PI on two NSF grants: an NSF DEL/DLI Grant focusing on leveraging NLP tools for Language Documentation and an NSF EAGER Grant exploring the use of Computational Linguistics for research into assistive technologies that enhance the inclusivity of cybersecurity mitigation strategies by identifying diagnostic linguistic patterns that correlate with variation in attention patterns of neurotypical and various neurodiverse populations.
Event Organizers
David Kepplinger
Nicholas Rios