Lara J. Martin

UMBC's Language Technology Seminar Series (LaTeSS – pronounced lattice) showcases talks from experts researching various language technologies, including but not limited to natural language processing, computational linguistics, speech processing, and digital humanities. Join the group on myUMBC here: https://my3.my.umbc.edu/groups/langtech

Schedule

Past Talks

Tuesday, October 14, 2025
2:00pm - 3:00pm ET
ITE 325B

Recording

Bryan Li
PhD Candidate at the University of Pennsylvania

Towards Multilingual Evaluations of Knowledge for Large Language Models
Contemporary language models (LMs) support dozens of languages, promising to broaden information access for global users. However, existing multilingual evaluations largely study factual recall tasks, failing to address knowledge-intensive tasks shaped by the uneven coverage and different perspectives of knowledge across languages. This dissertation investigates how LMs handle such tasks by examining their internal parametric knowledge and their use of externally provided contextual knowledge. In the first part, I introduce benchmarks for complex reasoning and territorial disputes, and find that LM responses on both tasks exhibit a lack of cross-lingual robustness, outputting inconsistent answers to the same underlying queries written in different languages. I then show that lightweight methods of leveraging program code and persona-based prompting can mitigate these issues.

In the second part, I explore the retrieval-augmented generation (RAG) setting, which combines an LM's internal parametric knowledge with contextual knowledge from external knowledge bases (KBs). Focusing on the territorial disputes task, I show that while RAG over single-language or single-source KBs has mixed effects on robustness, retrieving over multilingual and multi-source KBs (Wikipedia, as well as a large-scale dataset of state media articles I collected) substantially boosts robustness. Together, these findings highlight the need for LMs that can navigate, and assist users in navigating, the real-world distribution of knowledge across languages and sources. This is a practice dissertation talk, and your feedback would be greatly appreciated!

Bio

Bryan Li is a final-year PhD student at the University of Pennsylvania, advised by Prof. Chris Callison-Burch. His research focuses on multilingual evaluations of LLMs, spanning the fields of both natural language processing and computational social science. His work has appeared in conferences such as ACL, COLM, and ICLR. Outside of research, you can find him in a trendy cafe, on a riverside running trail, or at home listening to a good podcast.

Wednesday, April 30, 2025
4:00pm - 5:00pm ET
ITE 325B

Recording

Alyssa Hillary Zisk
Research Team Lead at AssistiveWare

Trying to say 怎么fucking了: code-mixing with augmentative and alternative communication (AAC)
People who use multiple languages mix our languages. Neurotypical and neurodivergent alike, we mix our languages, and the patterns of how we do so are often similar. However, AAC users may not get to mix our languages: some languages aren't supported by any existing high-tech AAC tools. Even if all of an AAC user's languages are supported by at least one option, they may or may not be supported by the same options -- and AAC systems don't always support combining languages even when they technically have all of a person's languages. This talk mixes the personal (怎么fucking了 is Dr. Zisk's actual test phrase for bilingual AAC) with the technical (automatic language detection) and the sociotechnical (what we don't make available in AAC systems, even though we technically could) to address current gaps in multilingual AAC.

Bio

Dr. Alyssa Hillary Zisk (they/them) is the AAC research team lead at AssistiveWare. They are also an Autistic part-time AAC user with too many research interests and no real intention of narrowing them down. Some of their AAC research is relevant to their AAC use and needs, such as studying AAC use and relevant speech experiences for autistic people who use both AAC and speech. Other parts of their AAC research, including their work on large anonymous language use data, are separate from Dr. Zisk's AAC use and needs.

Monday, November 18, 2024
3:00pm - 4:00pm ET
ITE 406

Stephanie Valencia²
Assistant Professor at the University of Maryland, College Park

Using Participatory Design and AI to Create Agency-increasing Augmentative and Alternative Communication Systems
Agency and communication are integral to personal development, enabling us to pursue and express our goals. However, agency in communication is not fixed: many individuals who use speech-generating devices to communicate encounter social constraints and technical limitations that can restrict what they can say, how they can say it, and when they can contribute to a discussion. In this talk, I will delve into how an agency-centered design approach can foster more accessible communication experiences and help us uncover opportunities for design. Drawing from empirical research and collaborative co-design with people with disabilities, I will highlight how various technological tools (such as automated transcription, physical interaction artifacts, and AI-driven language generation) can impact conversational agency. Additionally, I will share practical design strategies and discuss existing challenges for co-designing communication technologies that enhance user agency and participation.

Bio

Dr. Valencia² is dedicated to promoting equitable access to assistive technologies (AT), advocating for open-source hardware, and championing the inclusion of underrepresented groups in technology design and development. Dr. Valencia²’s research endeavors are centered on elevating user agency, accessibility, and enjoyment. Employing participatory design methodologies, she has explored the integration of diverse design elements such as artificial intelligence and embodied expressive objects to empower augmentative and alternative communication users. Dr. Valencia² works not only on conceptualizing these innovations but also on building and deploying them to make a real-world impact. Rigorous empirical studies are an integral part of her work, ensuring that the efficacy and significance of design contributions are thoroughly assessed. She earned her Ph.D. at the Human-Computer Interaction Institute at Carnegie Mellon University.

Thursday, November 14, 2024
11:30am - 12:45pm ET
Sondheim 110

Recording

Peiqi (Patrick) Sui
PhD Student at McGill University

Confabulation: What Could LLM Hallucinations Do For Storytelling?
Are hallucinations always bad? Most NLP research presumes a normative stance that they are, but it overlooks the cognitive and communicative affordances of a particularly story-like type of hallucination (which we'll call confabulations). Consider two general categories of LLM applications: using them as tools, or interacting with them as viable cultural agents. The two have very different training objectives in terms of the tradeoff between factuality and alignment with the human behavior of storytelling, and when it comes to ensuring the latter, LLMs that could effectively confabulate would be especially useful. For instance, confabulations could enable LLMs to perform speculative narration and address omissions in history resulting from social injustice, in the hope of enacting what literary theorist Saidiya Hartman calls 'critical fabulation' at scale, and giving interactive storytelling a wider social impact.

Bio

Patrick Sui is a second-year PhD student in English at McGill University, advised by Richard Jean So. He mainly works in digital humanities & cultural analytics, and spends most of his time thinking about how literary studies could uniquely contribute to AI research about language. His current research topics include benchmarks for close reading & interpretive reasoning, modeling close reading behaviors with information theory, knowledge-grounded style transfer for co-creative systems, AI literacy & writing pedagogy, and all kinds of computational literary theory.

Friday, October 18, 2024
2:00pm - 3:00pm ET
ITE 325B

Recording

Maxwell Hope
Data Scientist at the U.S. Census Bureau

Expanding Voices: Nonbinary Representation in Speech-Generating Devices
Synthetic voices used by Speech-Generating Device (SGD) users are predominantly shaped by binary gender norms, limiting the representation of nonbinary individuals. This talk begins by exploring the current landscape of gender in synthetic voices and nonbinary vocal characteristics, revealing how existing models, trained on cisgender male and female speech, fail to authentically represent nonbinary identities. I will then present a case study of a nonbinary SGD user who used three synthetic voices constructed from gender expansive speakers over the course of a week, documenting their experiences in daily journal entries. Finally, I’ll examine the impact of inclusive voice design and community-informed research on identity affirmation and communication efficacy, drawing from the case study’s findings. These insights underscore the critical need for gender expansive synthetic voices that prioritize both gender affirmation and expressiveness, with far-reaching implications for voice technology and user engagement.

Bio

Maxwell Hope (he/they) recently earned a PhD from the University of Delaware, where his dissertation explored the creation, perception, and use of gender expansive synthetic voices. By day, they work at the U.S. Census Bureau, focusing on usability testing and data science; by night, they continue to conduct speech science research, deeply rooted in community-driven principles. They have three cats: Hoagie, Helena and Hensley.

Tuesday, October 8, 2024
1:30pm - 2:30pm ET
ITE 325B

Recording

Reno Kriz
Research Scientist at the Johns Hopkins University Human Language Technology Center of Excellence (HLTCOE)

Takeaways from the SCALE 2024 Workshop on Event-Centric Video Retrieval
Information dissemination for current events has traditionally consisted of professionally collected and produced materials, leading to large collections of well-written news articles and high-quality videos. As a result, most prior work in event analysis and retrieval has focused on leveraging this traditional news content, particularly in English. However, much of the event-centric content today is generated by non-professionals, such as on-the-scene witnesses to events who hastily capture videos and upload them to the internet without further editing; these are challenging to find due to quality variance, as well as a lack of text or speech overlays providing clear descriptions of what is occurring. To address this gap, SCALE 2024, a 10-week research workshop hosted at the Human Language Technology Center of Excellence (HLTCOE), focused on multilingual event-centric video retrieval, or the task of finding videos about specific current events. Around 50 researchers and students participated in this workshop and were split up into five sub-teams. The Infrastructure team focused on developing MultiVENT 2.0, a challenging new video retrieval dataset consisting of 20x more videos than prior work and targeted queries about specific world events across six languages. The other teams worked on improving models from specific modalities, specifically Vision, Optical Character Recognition (OCR), Audio, and Text. Overall, we came away with three primary findings: extracting specific text from a video allows us to take better advantage of powerful methods from the text information retrieval community; LLM summarization of initial text outputs from videos is helpful, especially for noisy text coming from OCR; and no one modality is sufficient, with fusing outputs from all modalities resulting in significantly higher performance.

Bio

Reno Kriz is a research scientist at the Johns Hopkins University Human Language Technology Center of Excellence (HLTCOE). His primary research interests involve leveraging large pre-trained models for a variety of natural language understanding tasks, including those crossing into other modalities, e.g., vision and speech understanding. These multimodal interests have recently involved the 2024 Summer Camp for Language Exploration (SCALE) on event-centric video retrieval and understanding. He received his PhD from the University of Pennsylvania, where he worked with Chris Callison-Burch and Marianna Apidianaki on text simplification and natural language generation. Prior to that, he received BA degrees in Computer Science, Mathematics, and Economics from Vassar College.

Tuesday, September 10, 2024
3:00pm - 4:00pm ET
ITE 325B

Recording

Jonathan K. Kummerfeld
Senior Lecturer at the University of Sydney

AI Resilient Interfaces for Code Generation and Efficient Reading
AI is being integrated into virtually every computer system we use, but often in ways that mean we cannot see the decisions AI makes for us. If we don't see a decision, we cannot notice whether we agree with it, and what we don't notice, we cannot change. For example, using an AI summarization system means trusting that it has captured all the aspects of a document that are relevant to you. If the task is high stakes then the only way to check is to read the original document, but that significantly decreases the value of the summary. In this talk, I will present the concept of AI resilient interfaces: systems that use AI while giving users the information they need to notice and change its decisions. I will walk through two examples of novel systems that are more AI resilient than the typical solution to the problem, for (1) SQL generation, and (2) faster reading. I will conclude with thoughts on the potential and pitfalls of designing with AI resilience in mind.

Bio

Jonathan K. Kummerfeld is a Senior Lecturer (i.e., research tenure-track Assistant Professor) in the School of Computer Science at the University of Sydney. He is currently also a DECRA fellow, and collaborates with a range of academics across the world, including on DARPA-funded projects on AI agents that communicate. He completed his Ph.D. at the University of California, Berkeley, and was previously a postdoc at the University of Michigan, and a visiting scholar at Harvard. Jonathan’s research focuses on interactions between people and NLP systems, developing more effective algorithms, workflows, and systems for collaboration. He has been on the program committee for over 50 conferences and workshops. He currently serves as Co-CTO of ACL Rolling Review (a peer review system), and is a standing reviewer for the Computational Linguistics journal and the Transactions of the Association for Computational Linguistics journal.

UMBC Language Technology Seminar Series (LaTeSS)

Hosted by Lara J. Martin at UMBC
