UMBC's Language Technology Seminar Series (LaTeSS, pronounced "lattice") showcases talks from experts researching various language technologies, including but not limited to natural language processing, computational linguistics, speech processing, and digital humanities. Join the group on myUMBC here: https://my3.my.umbc.edu/groups/langtech
Schedule
Maxwell Hope
Data Scientist at the U.S. Census Bureau
Expanding Voices: Nonbinary Representation in Speech-Generating Devices
Synthetic voices used by Speech-Generating Device (SGD) users are predominantly shaped by binary gender norms, limiting the representation of nonbinary individuals. This talk begins by exploring the current landscape of gender in synthetic voices and nonbinary vocal characteristics, revealing how existing models, trained on cisgender male and female speech, fail to authentically represent nonbinary identities. I will then present a case study of a nonbinary SGD user who used three synthetic voices constructed from gender expansive speakers over the course of a week, documenting their experiences in daily journal entries. Finally, I’ll examine the impact of inclusive voice design and community-informed research on identity affirmation and communication efficacy, drawing from the case study’s findings. These insights underscore the critical need for gender expansive synthetic voices that prioritize both gender affirmation and expressiveness, with far-reaching implications for voice technology and user engagement.
Bio
Maxwell Hope (he/they) recently earned a PhD from the University of Delaware, where his dissertation explored the creation, perception, and use of gender expansive synthetic voices. By day, they work at the U.S. Census Bureau, focusing on usability testing and data science; by night, they continue to conduct speech science research, deeply rooted in community-driven principles. They have three cats: Hoagie, Helena, and Hensley.
Past Talks
Jonathan K. Kummerfeld
Senior Lecturer at the University of Sydney
AI Resilient Interfaces for Code Generation and Efficient Reading
AI is being integrated into virtually every computer system we use, but often in ways that mean we cannot see the decisions AI makes for us. If we don't see a decision, we cannot notice whether we agree with it, and what we don't notice, we cannot change. For example, using an AI summarization system means trusting that it has captured all the aspects of a document that are relevant to you. If the task is high stakes, the only way to check is to read the original document, but that significantly decreases the value of the summary. In this talk, I will present the concept of AI resilient interfaces: systems that use AI while giving users the information they need to notice and change its decisions. I will walk through two examples of novel systems that are more AI resilient than the typical solution, for (1) SQL generation and (2) faster reading. I will conclude with thoughts on the potential and pitfalls of designing with AI resilience in mind.
Bio
Jonathan K. Kummerfeld is a Senior Lecturer (i.e., research tenure-track Assistant Professor) in the School of Computer Science at the University of Sydney. He is currently also a DECRA fellow and collaborates with a range of academics across the world, including on DARPA-funded projects on AI agents that communicate. He completed his Ph.D. at the University of California, Berkeley, and was previously a postdoc at the University of Michigan and a visiting scholar at Harvard. Jonathan's research focuses on interactions between people and NLP systems, developing more effective algorithms, workflows, and systems for collaboration. He has served on the program committee for over 50 conferences and workshops. He currently serves as Co-CTO of ACL Rolling Review (a peer review system) and is a standing reviewer for the journals Computational Linguistics and Transactions of the Association for Computational Linguistics.
Reno Kriz
Research Scientist at the Johns Hopkins University Human Language Technology Center of Excellence (HLTCOE)
Takeaways from the SCALE 2024 Workshop on Event-Centric Video Retrieval
Information dissemination for current events has traditionally consisted of professionally collected and produced materials, leading to large collections of well-written news articles and high-quality videos. As a result, most prior work in event analysis and retrieval has focused on leveraging this traditional news content, particularly in English. However, much of today's event-centric content is generated by non-professionals, such as on-the-scene witnesses who hastily capture videos and upload them to the internet without further editing; these videos are challenging to find due to their variable quality and the lack of text or speech overlays clearly describing what is occurring. To address this gap, SCALE 2024, a 10-week research workshop hosted at the Human Language Technology Center of Excellence (HLTCOE), focused on multilingual event-centric video retrieval: the task of finding videos about specific current events. Around 50 researchers and students participated in the workshop, split into five sub-teams. The Infrastructure team developed MultiVENT 2.0, a challenging new video retrieval dataset with 20x more videos than prior work and targeted queries about specific world events across six languages. The other four teams worked on improving models for individual modalities: Vision, Optical Character Recognition (OCR), Audio, and Text. Overall, we came away with three primary findings: extracting specific text from a video lets us take better advantage of powerful methods from the text information retrieval community; LLM summarization of initial text outputs from videos is helpful, especially for noisy OCR text; and no single modality is sufficient, with fusing outputs from all modalities yielding significantly higher performance.