Jesse Thomason

I lead the GLAMOR Lab at USC. Our research enables agents and robots to better understand and respond to human language by considering the grounded context in which that language occurs by considering three threads: 1) We jointly learning models with language, world perception, and physical action to enable end-to-end agent behavior and improve continual learning; 2) We investigate ways to take advantage of the extra-textual visual world and embodied context in which language is uttered to improve reasoning in language-and-vision and language-guided robotics tasks; and 3) We work to improve speech and sign recognition by leveraging contextual and structural information, as well as to apply language technologies to accessibility and health applications.

Assistant Professor @ University of Southern California	jessetho🙃usc.edu
Thomas Lord Department of Computer Science
I am hiring one to two PhD students interested in computational approaches to sign languages or working with robot hardware for language-guided interaction; apply in Fall 2025!

News
Invited Talk RSS	Large Foundation Model for Interactive Robot Learning: Large Pretrained Models Enable Symbolic Planning without Expert Specification	website slides	June 2025

Invited Talk RSS	Workshop on Semantic Reasoning and Goal Understanding in Robotics: Embracing Language as Grounded Communication	website slides	June 2025

Invited Talk UPenn	Computational Linguistics lunch: Embracing Language as Grounded Communication	website slides	May 2025

Invited Talk CoRL	Workshop on Language and Robot Learning: Flip the Script: Bring Robotics to NLP	website slides	November 2024

Invited Talk Georgia Tech	Summit on Responsible Computing, AI, and Society: Use AI Grout without Losing AI Grit	website slides	October 2024

Invited Talk RO-MAN	HRI4Wellbeing Workshop: Bringing LPTMs and Symbolic Reasoning Together for Robots	website slides	August 2024

Featured Research Scientific American	Scientists Are Putting ChatGPT Brains Inside Robot Bodies. What Could Possibly Go Wrong?	website	March 2024

Invited Talk University of Utah	@ Utah Robotics Center Seminar: Language Guided Robots	slides	January 2024

Invited Talk NeurIPS	6th Robot Learning Workshop: LPTMs Can Help Robots Without Ignoring Robotics	website slides	December 2023

Invited Talk CMU	LTI Colloquium: Using Large Models as Duct Tape, Not Hammers	website slides video	October 2023

Invited Talk ICML	Workshop on Interactive Learning with Implicit Human Feedback	website slides	July 2023

Organizer CoRL	Workshop on Language and Robot Learning (LangRob)	website	December 2022

Dataset Release Amazon	TEACh: Task-driven Embodied Agents that Chat	website	October 2021

Organizer IROS	Semantic Policy and Action Representations for Autonomous Robots (SPAR) Workshop	website	September 2021

New Position University of Southern California	Assistant Professor - Viterbi Department of Computer Science	website	August 2021

Invited Talk USC/ISI	@ USC/ISI NL Seminar	website slides video	February 2021

Outreach PhD Recruiting	2020-2021 CS[-ish] PhD Recruiting	website	November 2020

Invited Talk Stanford	@ Stanford NLP Seminar	website slides	October 2020

Organizer ECCV	Embodied Vision, Actions & Language (EVAL) Workshop	website	August 2020

New Position Amazon	Visiting Academic at Alexa AI		August 2020

Invited Talk ACL–NLP4ConvAI	@ Second Workshop on NLP for Conversational AI	website slides	July 2020

Organizer ACL	First Workshop on Advances in Language and Vision Research (ALVR)	website	July 2020

Invited Talk NeurIPS–ViGIL	@ Visually Grounded Interaction and Language (ViGIL) Workshop	website slides video	December 2019

Invited Talk University of Southern California	@ USC AI Rising Stars Symposium	slides	December 2019

Invited Talk University of Utah	@ Utah Robotics Center Seminar	slides	November 2019

Invited Talk IROS–SPAR	@ Semantic Policy and Action Representations for Autonomous Robots (SPAR) Workshop	website	November 2019

Invited Talk Microsoft Research	Vision-and-Dialog Navigation	slides video	July 2019

Co-Chair NAACL	Combined Workshop on Spatial Language Understanding (SpLU) and Grounded Communication for Robotics (RoboNLP)	website	June 2019

Organizer SIGdial	Special Session on Physically Situated Dialogue	website	July 2018

Organizer RSS	Workshop on Models and Representations for Natural Human-Robot Communication	website	June 2018

New Position UW	Postdoc with Luke Zettlemoyer		June 2018

Dissertation Defense UT Austin	Continually Improving Grounded Natural Language Understanding through Human-Robot Dialog		April 2018

Teaching

CSCI 699: History of Language and Computing

▶This course is designed for early career PhD students with an interest in understanding the bases and common assumptions in modern natural language processing research. We will study the history of thought and paradigms surrounding language and computing. We will read original texts as well as retrospectives and summary arguments from influential writers and researchers in recent history as well as those predating modern computation. Students will draw connections between historical perspectives and abstractions to modern day technological innovations and assumptions in natural language processing. Students will develop a rich understanding of the historical context of their own work in computing and language, and be better prepared to situate their research contributions in the long context of language processing.

•Spring 2025 syllabus

•Spring 2024 syllabus

CSCI 444: Natural Language Processing

▶Natural Language Processing (NLP) is an area of computing research and practice that aims to enable machines to reason over human text and speech. High profile technologies like ChatGPT brought NLP to the forefront of public discussion both inside and outside academia. But what underpins such technologies? This course will explore how natural language can serve as an interaction medium between users and machines with a focus on the history and development of language models (LMs). Students will become familiar with concepts and methods in NLP like distributional semantics, and see how those concepts feed into the architectural design of modern LMs trained using deep learning, and will get hands-on experience with building and evaluating small-scale LMs. The class will also explore details and variants of the real-world consequences of deploying large-scale LMs and NLP technologies more generally, such as the ethics and harms associated with them.

•Fall 2024 syllabus

CSCI 566: Deep Learning and its Applications

▶Recently, deep learning has advanced many AI-related problems: image retrieval, video analysis, natural language processing, self-driving, medical applications, and more. Our goal is to guide students to get familiar with these recent cutting-edge deep learning (DL) advances in computer vision and natural language processing. Through this course, students will gain a basic understanding of DL algorithms, and how to set up and solve problems involving deep learning techniques. The course will include a couple of practical assignments and a final course project. For the final course project, students will be encouraged to pick their own topics, but can also select from a provided list of projects.

•Spring 2023 website syllabus

CSCI 499: Natural Language Processing for Interactive AI

▶Natural Language Processing for Interactive AI is an upper division undergraduate course in which students explore how natural language can serve as an interaction medium between users and AI agents. We cover topics in natural language processing, computer vision, and machine learning, as well as the intersection of planning and search-oriented machine learning algorithms with such language understanding techniques and paradigms. The core modules of the course cover text classification, language modeling with LSTMs and word embeddings, attention mechanisms and Transformers, and multimodality and reinforcement learning. Deliverables include paper reviews, a paper presentation, three increasingly complex coding assignments, and a course project expected to be carried out throughout the semester.

•Fall 2022 syllabus

CSCI 699: Grounding Natural Language

▶Grounding Natural Language is a PhD seminar course introducing the broad space of both multimodal language processing, for example language and vision models, and language models for decision making, for example dialogue systems and language-guided robotics. The course explores the ways in which other sensory modalities, especially visual input and embodiment in 3-dimensional space, can influence and guide representation learning for language. Deliverables include hour-long paper presentations, in which students digest research papers and present them in the context of modern NLP, and a course project expected to be carried out throughout the semester.

•Spring 2022 syllabus

Papers and Preprints

2025

Multi-modal Synthetic Data Training and Model Collapse: Insights from VLMs and Diffusion Models
Zizhao Hu, Mohammad Rostami, and Jesse Thomason.
International Conference on Multimodal Interaction (ICMI), 2025.
categories: continual learning, language and vision
conference paper

@inproceedings{hu:mmmc,
  title={Multi-modal Synthetic Data Training and Model Collapse: Insights from {VLMs} and Diffusion Models},
  author={Zizhao Hu and Mohammad Rostami and Jesse Thomason},
  booktitle={International Conference on Multimodal Interaction (ICMI)},
  year={2025},
  url={https://arxiv.org/abs/2505.08803}
}

ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
Jiahui Zhang, Yusen Luo, Abrar Anwar, Sumedh Anand Sontakke, Joseph J Lim, Jesse Thomason, Erdem Biyik, and Jesse Zhang.
2nd Workshop on Out-of-Distribution Generalization in Robotics @ RSS, 2025.
*Best Paper Award.
categories: physical robots, reinforcement learning, language and robotics
workshop paper website

@inproceedings{zhang:rewind,
  title={{ReWiND}: Language-Guided Rewards Teach Robot Policies without New Demonstrations},
  author={Jiahui Zhang and Yusen Luo and Abrar Anwar and Sumedh Anand Sontakke and Joseph J Lim and Jesse Thomason and Erdem Biyik and Jesse Zhang},
  booktitle={2nd Workshop on Out-of-Distribution Generalization in Robotics @ RSS},
  year={2025},
  url={https://arxiv.org/abs/2505.10911}
}

Efficient Evaluation of Multi-Task Robot Policies With Active Experiment Selection
Abrar Anwar, Rohan Gupta, Zain Merchant, Sayan Ghosh, Willie Neiswanger, and Jesse Thomason.
Workshop on Robot Evaluation for the Real World @ RSS, 2025.
categories: physical robots, evaluation
workshop paper

@inproceedings{anwar:efficientroboeval,
  title={Efficient Evaluation of Multi-Task Robot Policies With Active Experiment Selection},
  author={Abrar Anwar and Rohan Gupta and Zain Merchant and Sayan Ghosh and Willie Neiswanger and Jesse Thomason},
  booktitle={Workshop on Robot Evaluation for the Real World @ RSS},
  year={2025},
  url={https://arxiv.org/abs/2502.09829}
}

HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval
Matthew Hong, Anthony Liang, Kevin Kim, Harshitha Rajaprakash, Jesse Thomason, Erdem Biyik, and Jesse Zhang.
Human in the loop Robot Learning Workshop @ RSS, 2025.
categories: physical robots, reinforcement learning, language and robotics
workshop paper

@inproceedings{hong:hand,
  title={{HAND} Me the Data: Fast Robot Adaptation via Hand Path Retrieval},
  author={Matthew Hong and Anthony Liang and Kevin Kim and Harshitha Rajaprakash and Jesse Thomason and Erdem Biyik and Jesse Zhang},
  booktitle={Human in the loop Robot Learning Workshop @ RSS},
  year={2025},
  url={https://arxiv.org/abs/2505.20455}
}

PSALM-V: Automating Symbolic Planning in Interactive Visual Environments with Large Language Models
Wang Zhu, Miaosen Chai, Ishika Singh, Robin Jia, and Jesse Thomason.
arXiv, 2025.
categories: neurosymbolic, physical robots, language and planning
preprint paper

@article{zhu:psalmv,
  title={{PSALM-V}: Automating Symbolic Planning in Interactive Visual Environments with Large Language Models},
  author={Wang Zhu and Miaosen Chai and Ishika Singh and Robin Jia and Jesse Thomason},
  journal={arXiv},
  year={2025},
  url={https://arxiv.org/abs/2506.20097}
}

From Calibration to Collaboration: LLM Uncertainty Quantification Should Be More Human-Centered
Siddartha Devic, Tejas Srinivasan, Jesse Thomason, Willie Neiswanger, and Vatsal Sharan.
arXiv, 2025.
categories: evaluation, AI trust, interpretability
preprint paper

@article{devic:calibtocollab,
  title={From Calibration to Collaboration: LLM Uncertainty Quantification Should Be More Human-Centered},
  author={Siddartha Devic and Tejas Srinivasan and Jesse Thomason and Willie Neiswanger and Vatsal Sharan},
  journal={arXiv},
  year={2025},
  url={https://arxiv.org/abs/2506.07461}
}

Why Do Some Inputs Break Low-Bit LLM Quantization?
Ting-Yun Chang, Muru Zhang, Jesse Thomason, and Robin Jia.
arXiv, 2025.
categories: interpretability, evaluation
preprint paper

@article{chang:breaklowbit,
  title={Why Do Some Inputs Break Low-Bit {LLM} Quantization?},
  author={Ting-Yun Chang and Muru Zhang and Jesse Thomason and Robin Jia},
  journal={arXiv},
  year={2025},
  url={https://arxiv.org/abs/2506.12044}
}

M3PT: A Transformer for Multimodal, Multi-Party Social Signal Prediction with Person-aware Blockwise Attention
Yiming Tang, Abrar Anwar, and Jesse Thomason.
The 2nd Workshop on Nonverbal Cues for Human-Robot Cooperative Intelligence @ ICRA, 2025.
*Best Paper Award.
categories: language and vision, language and action
workshop paper

@inproceedings{tang:m3pt,
  title={{M3PT}: A Transformer for Multimodal, Multi-Party Social Signal Prediction with Person-aware Blockwise Attention},
  author={Yiming Tang and Abrar Anwar and Jesse Thomason},
  booktitle={The 2nd Workshop on Nonverbal Cues for Human-Robot Cooperative Intelligence @ ICRA},
  year={2025},
  url={https://arxiv.org/abs/2501.13416}
}

TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models
David Bai, Ishika Singh, David Traum, and Jesse Thomason.
Workshop on Language and Semantics of Task and Motion Planning @ ICRA, 2025.
categories: language and planning, neurosymbolic
workshop paper website

@inproceedings{singh:twostep,
  title={{TwoStep}: Multi-agent Task Planning using Classical Planners and Large Language Models},
  author={David Bai and Ishika Singh and David Traum and Jesse Thomason},
  booktitle={Workshop on Language and Semantics of Task and Motion Planning @ ICRA},
  year={2025},
  url={https://arxiv.org/abs/2403.17246}
}

Large Language Models Do Multi-Label Classification Differently
Marcus Ma, Georgios Chochlakis, Niyantha Maruthu Pandiyan, Jesse Thomason, and Shrikanth Narayanan.
arXiv, 2025.
categories: evaluation, interpretability
preprint paper

@article{ma:llmmlc,
  title={Large Language Models Do Multi-Label Classification Differently},
  author={Marcus Ma and Georgios Chochlakis and Niyantha Maruthu Pandiyan and Jesse Thomason and Shrikanth Narayanan},
  journal={arXiv},
  year={2025},
  url={https://arxiv.org/abs/2505.17510}
}

Zero-Shot Iterative Formalization and Planning in Partially Observable Environments
Liancheng Gong, Wang Zhu, Jesse Thomason, and Li Zhang.
arXiv, 2025.
categories: language and planning, neurosymbolic
preprint paper

@article{gong:pdddlegoplus,
  title={Zero-Shot Iterative Formalization and Planning in Partially Observable Environments},
  author={Liancheng Gong and Wang Zhu and Jesse Thomason and Li Zhang},
  journal={arXiv},
  year={2025},
  url={https://arxiv.org/abs/2505.13126}
}

The American Sign Language Knowledge Graph: Infusing ASL Models with Linguistic Knowledge
Lee Kezar, Nidhi Munikote, Zian Zeng, Zed Sevcikova Sehyr, Naomi Caselli, and Jesse Thomason.
Findings of North American Chapter of the Association for Computational Linguistics (NAACL Findings), 2025.
categories: benchmark, sign language, neurosymbolic
conference paper

@inproceedings{kezar:aslkg,
  title={The {A}merican Sign Language Knowledge Graph: Infusing {ASL} Models with Linguistic Knowledge},
  author={Lee Kezar and Nidhi Munikote and Zian Zeng and Zed Sevcikova Sehyr and Naomi Caselli and Jesse Thomason},
  booktitle={Findings of North American Chapter of the Association for Computational Linguistics (NAACL Findings)},
  year={2025},
  url={https://arxiv.org/abs/2411.03568}
}

Language Models can Infer Action Semantics for Classical Planners from Environment Feedback
Wang Zhu, Ishika Singh, Robin Jia, and Jesse Thomason.
North American Chapter of the Association for Computational Linguistics (NAACL), 2025.
categories: neurosymbolic, language and planning
conference paper

@inproceedings{zhu:psalm,
  title={Language Models can Infer Action Semantics for Classical Planners from Environment Feedback},
  author={Wang Zhu and Ishika Singh and Robin Jia and Jesse Thomason},
  booktitle={North American Chapter of the Association for Computational Linguistics (NAACL)},
  year={2025},
  url={https://arxiv.org/abs/2406.02791}
}

Promoting Cognitive Health in Elder Care with Large Language Model-Powered Socially Assistive Robots
Maria R. Lima, Amy O'Connell, Feiyang Zhou, Alethea Nagahara, Avni Hulyalkar, Anura Deshpande, Jesse Thomason, Ravi Vaidyanathan, and Maja Matarić.
Conference on Human Factors in Computing Systems (CHI), 2025.
categories: language and robotics, cognitive health, dialogue
conference paper

@inproceedings{limi:chi2025,
  title={Promoting Cognitive Health in Elder Care with Large Language Model-Powered Socially Assistive Robots},
  author={Maria R. Lima and Amy O'Connell and Feiyang Zhou and Alethea Nagahara and Avni Hulyalkar and Anura Deshpande and Jesse Thomason and Ravi Vaidyanathan and Maja Matari\'{c}},
  booktitle={Conference on Human Factors in Computing Systems (CHI)},
  year={2025},
  url={https://dl.acm.org/doi/full/10.1145/3706598.3713582}
}

Adjust for Trust: Mitigating Trust-Induced Inappropriate Reliance on AI Assistance
Tejas Srinivasan and Jesse Thomason.
arXiv, 2025.
categories: AI trust, dialogue
preprint paper

@article{srinivasan:adjustfortrust,
  title={Adjust for Trust: Mitigating Trust-Induced Inappropriate Reliance on AI Assistance},
  author={Tejas Srinivasan and Jesse Thomason},
  journal={arXiv},
  year={2025},
  url={https://arxiv.org/abs/2502.13321}
}

Better Slow than Sorry: Introducing Positive Friction for Reliable Dialogue Systems
Mert Inan, Anthony Sicilia, Suvodip Dey, Vardhan Dongre, Tejas Srinivasan, Jesse Thomason, Gokhan Tur, Dilek Hakkani-Tur, and Malihe Alikhani.
arXiv, 2025.
categories: dialogue, language and action
preprint paper

@article{inan:slowthansorry,
  title={Better Slow than Sorry: Introducing Positive Friction for Reliable Dialogue Systems},
  author={Mert Inan and Anthony Sicilia and Suvodip Dey and Vardhan Dongre and Tejas Srinivasan and Jesse Thomason and Gokhan Tur and Dilek Hakkani-Tur and Malihe Alikhani},
  journal={arXiv},
  year={2025},
  url={https://arxiv.org/abs/2501.17348}
}

2024

Bridging Language and Action: A Survey of Language-Conditioned Robot Manipulation
Hongkuan Zhou, Xiangtong Yao, Oier Mees, Yuan Meng, Ted Xiao, Yonatan Bisk, Jean Oh, Edward Johns, Mohit Shridhar, Dhruv Shah, Jesse Thomason, Kai Huang, Joyce Chai, Zhenshan Bing, and Alois Knoll.
arXiv, 2024.
categories: language and robotics
preprint paper

@article{zhou:langrobmanip,
  title={Bridging Language and Action: A Survey of Language-Conditioned Robot Manipulation},
  author={Hongkuan Zhou and Xiangtong Yao and Oier Mees and Yuan Meng and Ted Xiao and Yonatan Bisk and Jean Oh and Edward Johns and Mohit Shridhar and Dhruv Shah and Jesse Thomason and Kai Huang and Joyce Chai and Zhenshan Bing and Alois Knoll},
  journal={arXiv},
  year={2024},
  url={https://arxiv.org/abs/2312.10807}
}

Systematic Translation from Natural Language Robot Task Descriptions to STL
Sara Mohammadinejad, Sheryl Paul, Yuan Xia, Vidisha Kudalkar, Jesse Thomason, and Jyotirmoy V. Deshmukh.
Bridging the Gap Between AI and Reality (AISoLA), 2024.
categories: neurosymbolic, language and planning
conference paper

@inproceedings{mohammadinejad:dialoguestl,
  title={Systematic Translation from Natural Language Robot Task Descriptions to {STL}},
  author={Sara Mohammadinejad and Sheryl Paul and Yuan Xia and Vidisha Kudalkar and Jesse Thomason and Jyotirmoy V. Deshmukh},
  booktitle={Bridging the Gap Between AI and Reality (AISoLA)},
  year={2024},
  url={https://link.springer.com/chapter/10.1007/978-3-031-75434-0_18}
}

When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models
Ting-Yun Chang, Jesse Thomason, and Robin Jia.
Empirical Methods in Natural Language Processing (EMNLP), 2024.
categories: interpretability
conference paper

@inproceedings{chang:partsgtsums,
  title={When Parts are Greater Than Sums: Individual {LLM} Components Can Outperform Full Models},
  author={Ting-Yun Chang and Jesse Thomason and Robin Jia},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2024},
  url={https://arxiv.org/abs/2406.13131}
}

Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash
Parsa Hejabi, Elnaz Rahmati, Alireza S. Ziabari, Preni Golazizian, Jesse Thomason, and Morteza Dehghani.
Wordplay: When Language Meets Games @ ACL, 2024.
categories: benchmark, evaluation
workshop paper

@inproceedings{hejabi:balderdash,
  title={Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent {B}alderdash},
  author={Parsa Hejabi and Elnaz Rahmati and Alireza S. Ziabari and Preni Golazizian and Jesse Thomason and Morteza Dehghani},
  booktitle={Wordplay: When Language Meets Games @ ACL},
  year={2024},
  url={https://arxiv.org/abs/2411.10422}
}

Contrast Sets for Evaluating Language-Guided Robot Policies
Abrar Anwar, Rohan Gupta, and Jesse Thomason.
Conference on Robot Learning (CoRL), 2024.
categories: language and robotics, evaluation, physical robots
conference paper

@inproceedings{anwar:robotcontrasteval,
  title={Contrast Sets for Evaluating Language-Guided Robot Policies},
  author={Abrar Anwar and Rohan Gupta and Jesse Thomason},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2024},
  url={https://arxiv.org/abs/2406.13636}
}

ViSaRL: Visual Reinforcement Learning Guided by Human Saliency
Anthony Liang, Jesse Thomason, and Erdem Biyik.
Intelligent Robots and Systems (IROS), 2024.
categories: physical robots, reinforcement learning
conference paper

@inproceedings{liang:visarl,
  title={{ViSaRL}: Visual Reinforcement Learning Guided by Human Saliency},
  author={Anthony Liang and Jesse Thomason and Erdem Biyik},
  booktitle={Intelligent Robots and Systems (IROS)},
  year={2024},
  url={https://arxiv.org/abs/2403.10940}
}

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, and Khyathi Raghavi Chandu.
Findings of Association for Computational Linguistics (ACL Findings), 2024.
categories: language and vision, neurosymbolic
conference paper

@inproceedings{srinivasan:recoverr,
  title={Selective {"}Selective Prediction{"}: Reducing Unnecessary Abstention in Vision-Language Reasoning},
  author={Tejas Srinivasan and Jack Hessel and Tanmay Gupta and Bill Yuchen Lin and Yejin Choi and Jesse Thomason and Khyathi Raghavi Chandu},
  booktitle={Findings of Association for Computational Linguistics (ACL Findings)},
  year={2024},
  url={https://arxiv.org/abs/2402.15610}
}

Generating Contextually-Relevant Navigation Instructions for Blind and Low Vision People
Zain Merchant, Abrar Anwar, Emily Wang, Souti Chattopadhyay, and Jesse Thomason.
Interactive AI for Human-Centered Robotics (InterAI) Workshop @ Ro-MAN, 2024.
*Best Paper Award (2nd Place).
categories: evaluation, language and vision
workshop paper

@inproceedings{merchant:blvnav,
  title={Generating Contextually-Relevant Navigation Instructions for Blind and Low Vision People},
  author={Zain Merchant and Abrar Anwar and Emily Wang and Souti Chattopadhyay and Jesse Thomason},
  booktitle={Interactive AI for Human-Centered Robotics (InterAI) Workshop @ Ro-MAN},
  year={2024},
  url={https://arxiv.org/abs/2407.08219}
}

The COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation
Wilbert Pumacay, Ishika Singh, Jiafei Duan, Ranjay Krishna, Jesse Thomason, and Dieter Fox.
Robotics: Science and Systems (RSS), 2024.
categories: benchmark, evaluation, physical robots
conference paper

@inproceedings{pumacay:colosseum,
  title={{The COLOSSEUM}: A Benchmark for Evaluating Generalization for Robotic Manipulation},
  author={Wilbert Pumacay and Ishika Singh and Jiafei Duan and Ranjay Krishna and Jesse Thomason and Dieter Fox},
  booktitle={Robotics: Science and Systems (RSS)},
  year={2024},
  url={https://arxiv.org/abs/2402.08191}
}

Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
Chancharik Mitra, Abrar Anwar, Rodolfo Corona, Dan Klein, Trevor Darrell, and Jesse Thomason.
North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
categories: language and vision
conference paper

@inproceedings{mitra:whichone,
  title={Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding},
  author={Chancharik Mitra and Abrar Anwar and Rodolfo Corona and Dan Klein and Trevor Darrell and Jesse Thomason},
  booktitle={North American Chapter of the Association for Computational Linguistics (NAACL)},
  year={2024},
  url={https://arxiv.org/abs/2311.06694}
}

Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks
Ting-Yun Chang, Jesse Thomason, and Robin Jia.
North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
categories: interpretability
conference paper

@inproceedings{chang:localization,
  title={Do Localization Methods Actually Localize Memorized Data in {LLMs}? {A} Tale of Two Benchmarks},
  author={Ting-Yun Chang and Jesse Thomason and Robin Jia},
  booktitle={North American Chapter of the Association for Computational Linguistics (NAACL)},
  year={2024},
  url={https://arxiv.org/abs/2311.09060}
}

Efficient End-to-End Visual Document Understanding with Rationale Distillation
Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, and Kristina Toutanova.
North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
categories: language and vision, neurosymbolic
conference paper

@inproceedings{zhu:vizdoc,
  title={Efficient End-to-End Visual Document Understanding with Rationale Distillation},
  author={Wang Zhu and Alekh Agarwal and Mandar Joshi and Robin Jia and Jesse Thomason and Kristina Toutanova},
  booktitle={North American Chapter of the Association for Computational Linguistics (NAACL)},
  year={2024},
  url={https://arxiv.org/abs/2311.09612}
}

WinoViz: Probing Visual Properties of Objects Under Different States
Woojeong Jin, Tejas Srinivasan, Jesse Thomason, and Xiang Ren.
Workshop on Secure and Trustworthy Large Language Models (SeT LLM) @ ICLR, 2024.
categories: language and vision, benchmark
workshop paper

@inproceedings{jin:winoviz,
  title={{WinoViz}: Probing Visual Properties of Objects Under Different States},
  author={Woojeong Jin and Tejas Srinivasan and Jesse Thomason and Xiang Ren},
  booktitle={Workshop on Secure and Trustworthy Large Language Models (SeT LLM) @ ICLR},
  year={2024},
  url={https://arxiv.org/abs/2402.13584}
}

2023

Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering
Wang Zhu, Jesse Thomason, and Robin Jia.
Empirical Methods in Natural Language Processing (EMNLP), 2023.
categories: neurosymbolic, semantic parsing
conference paper

@inproceedings{zhu:chainofquestions,
  title={Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering},
  author={Wang Zhu and Jesse Thomason and Robin Jia},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2023},
  url={https://arxiv.org/abs/2305.14901}
}

Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation
Yuliang Cai, Jesse Thomason, and Mohammad Rostami.
Findings of Empirical Methods in Natural Language Processing (EMNLP Findings), 2023.
categories: continual learning, language and vision
conference paper

@inproceedings{cai:taskattentive,
  title={Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation},
  author={Yuliang Cai and Jesse Thomason and Mohammad Rostami},
  booktitle={Findings of Empirical Methods in Natural Language Processing (EMNLP Findings)},
  year={2023},
  url={https://arxiv.org/abs/2303.14423}
}

Exploring Strategies for Efficient Real-World VLN Evaluation
Abrar Anwar, Rohan Gupta, Elle Szabo, and Jesse Thomason.
Workshop on Language and Robot Learning (LangRob) @ CoRL, 2023.
categories: language and robotics, vln
workshop paper

@inproceedings{anwar:langrob23,
  title={Exploring Strategies for Efficient Real-World {VLN} Evaluation},
  author={Abrar Anwar and Rohan Gupta and Elle Szabo and Jesse Thomason},
  booktitle={Workshop on Language and Robot Learning (LangRob) @ CoRL},
  year={2023},
  url={https://openreview.net/forum?id=uABEHp6tjy}
}

The Sem-Lex Benchmark: Modeling ASL Signs and Their Phonemes
Lee Kezar, Elana Pontecorvo, Adele Daniels, Connor Baer, Ruth Ferster, Lauren Berger, Jesse Thomason, Zed Sevcikova Sehyr, and Naomi Caselli.
Conference on Computers and Accessibility (ASSETS), 2023.
categories: sign language, benchmark
conference paper

@inproceedings{kezar:semlex,
  title={The {Sem-Lex} Benchmark: Modeling {ASL} Signs and Their Phonemes},
  author={Lee Kezar and Elana Pontecorvo and Adele Daniels and Connor Baer and Ruth Ferster and Lauren Berger and Jesse Thomason and Zed Sevcikova Sehyr and Naomi Caselli},
  booktitle={Conference on Computers and Accessibility (ASSETS)},
  year={2023},
  url={https://doi.acm.org/?doi=3597638.3608408}
}

Exploring Strategies for Modeling Sign Language Phonology
Lee Kezar, Riley Carlin, Tejas Srinivasan, Zed Sevcikova Sehyr, Naomi Caselli, and Jesse Thomason.
European Symposium on Artificial Neural Networks (ESANN), 2023.
categories: continual learning, sign language
conference paper

@inproceedings{kezar:esann,
  title={Exploring Strategies for Modeling Sign Language Phonology},
  author={Lee Kezar and Riley Carlin and Tejas Srinivasan and Zed Sevcikova Sehyr and Naomi Caselli and Jesse Thomason},
  booktitle={European Symposium on Artificial Neural Networks (ESANN)},
  year={2023},
  url={https://www.esann.org/sites/default/files/proceedings/2023/ES2023-83.pdf}
}

RREx-BoT: Remote Referring Expressions with a Bag of Tricks
Gunnar Sigurdsson, Jesse Thomason, Gaurav Sukhatme, and Robinson Piramuthu.
Intelligent Robots and Systems (IROS), 2023.
categories: physical robots, language and robotics, vln
conference paper

@inproceedings{sigurdsson:rrexbot,
  title={{RREx-BoT}: Remote Referring Expressions with a Bag of Tricks},
  author={Gunnar Sigurdsson and Jesse Thomason and Gaurav Sukhatme and Robinson Piramuthu},
  booktitle={Intelligent Robots and Systems (IROS)},
  year={2023},
  url={https://arxiv.org/abs/2301.12614}
}

ProgPrompt: Program generation for situated robot task planning using large language models
Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, and Animesh Garg.
Autonomous Robots, 2023.
categories: language and planning, language and robotics, physical robots
journal paper coverage

@article{singh:progprompt:ar,
  title={{ProgPrompt}: Program generation for situated robot task planning using large language models},
  author={Ishika Singh and Valts Blukis and Arsalan Mousavian and Ankit Goyal and Danfei Xu and Jonathan Tremblay and Dieter Fox and Jesse Thomason and Animesh Garg},
  journal={Autonomous Robots},
  year={2023},
  url={https://link.springer.com/article/10.1007/s10514-023-10135-3}
}

I2I: Initializing Adapters with Improvised Knowledge
Tejas Srinivasan, Furong Jia, Mohammad Rostami, and Jesse Thomason.
Conference on Lifelong Learning Agents (CoLLAs), 2023.
categories: language and vision, continual learning
conference paper

@inproceedings{srinivasan:i2i,
  title={{I2I}: Initializing Adapters with Improvised Knowledge},
  author={Tejas Srinivasan and Furong Jia and Mohammad Rostami and Jesse Thomason},
  booktitle={Conference on Lifelong Learning Agents (CoLLAs)},
  year={2023},
  url={https://arxiv.org/abs/2304.02168}
}

Multimodal Speech Recognition for Language-Guided Embodied Agents
Allen Chang, Xiaoyuan Zhu, Aarav Monga, Seoho Ahn, Tejas Srinivasan, and Jesse Thomason.
Annual Conference of the International Speech Communication Association (INTERSPEECH), 2023.
categories: language and vision, speech recognition
conference paper

@inproceedings{chang:embodiedspeech,
  title={Multimodal Speech Recognition for Language-Guided Embodied Agents},
  author={Allen Chang and Xiaoyuan Zhu and Aarav Monga and Seoho Ahn and Tejas Srinivasan and Jesse Thomason},
  booktitle={Annual Conference of the International Speech Communication Association (INTERSPEECH)},
  year={2023},
  url={https://arxiv.org/abs/2302.14030}
}

Iterative Vision-and-Language Navigation
Jacob Krantz, Shurjo Banerjee, Wang Zhu, Jason J. Corso, Peter Anderson, Stefan Lee, and Jesse Thomason.
Computer Vision and Pattern Recognition (CVPR), 2023.
categories: continual learning, vln
conference paper website

@inproceedings{krantz:ivln,
  title={Iterative Vision-and-Language Navigation},
  author={Jacob Krantz and Shurjo Banerjee and Wang Zhu and Jason J. Corso and Peter Anderson and Stefan Lee and Jesse Thomason},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2023},
  url={https://arxiv.org/abs/2210.03087}
}

Does VLN Pretraining Work with Nonsensical or Irrelevant Instructions?
Wang Zhu, Ishika Singh, Yuan Huang, Robin Jia, and Jesse Thomason.
Workshop on Open-Domain Reasoning Under Multi-Modal Settings (ODRUM) @ CVPR, 2023.
categories: vln, language and vision
workshop paper

@inproceedings{zhu:nonsensevln,
  title={Does {VLN} Pretraining Work with Nonsensical or Irrelevant Instructions?},
  author={Wang Zhu and Ishika Singh and Yuan Huang and Robin Jia and Jesse Thomason},
  booktitle={Workshop on Open-Domain Reasoning Under Multi-Modal Settings (ODRUM) @ CVPR},
  year={2023},
  url={https://arxiv.org/abs/2311.17280}
}

Curriculum Learning for Data-Efficient Vision-Language Alignment
Tejas Srinivasan, Xiang Ren, and Jesse Thomason.
Workshop on Open-Domain Reasoning Under Multi-Modal Settings (ODRUM) @ CVPR, 2023.
categories: language and vision
workshop paper

@inproceedings{srinivasan:tonics,
  title={Curriculum Learning for Data-Efficient Vision-Language Alignment},
  author={Tejas Srinivasan and Xiang Ren and Jesse Thomason},
  booktitle={Workshop on Open-Domain Reasoning Under Multi-Modal Settings (ODRUM) @ CVPR},
  year={2023},
  url={https://arxiv.org/abs/2207.14525}
}

ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, and Animesh Garg.
International Conference on Robotics and Automation (ICRA), 2023.
categories: language and robotics, language and planning, physical robots
conference paper website coverage

@inproceedings{singh:progprompt:icra,
  title={{ProgPrompt}: Generating Situated Robot Task Plans using Large Language Models},
  author={Ishika Singh and Valts Blukis and Arsalan Mousavian and Ankit Goyal and Danfei Xu and Jonathan Tremblay and Dieter Fox and Jesse Thomason and Animesh Garg},
  booktitle={International Conference on Robotics and Automation (ICRA)},
  year={2023},
  url={https://arxiv.org/abs/2209.11302}
}

Improving Sign Recognition with Phonology
Lee Kezar, Jesse Thomason, and Zed Sevcikova Sehyr.
European Chapter of the Association for Computational Linguistics (EACL), 2023.
categories: language and vision, sign language
conference paper

@inproceedings{kezar:islr_phonology,
  title={Improving Sign Recognition with Phonology},
  author={Lee Kezar and Jesse Thomason and Zed Sevcikova Sehyr},
  booktitle={European Chapter of the Association for Computational Linguistics (EACL)},
  year={2023},
  url={https://arxiv.org/abs/2302.05759}
}

Geolocated Social Media Posts are Happier: Understanding the Characteristics of Check-in Posts on Twitter
Julie Jiang, Jesse Thomason, Francesco Barbieri, and Emilio Ferrara.
Web Sciences (WebSci), 2023.
categories: language and vision
conference paper

@inproceedings{jiang:geolocatedhappy,
  title={Geolocated Social Media Posts are Happier: Understanding the Characteristics of Check-in Posts on Twitter},
  author={Julie Jiang and Jesse Thomason and Francesco Barbieri and Emilio Ferrara},
  booktitle={Web Sciences (WebSci)},
  year={2023},
  url={https://arxiv.org/abs/2207.10887}
}

Multimodal embodied attribute learning by robots for object-centric action policies
Xiaohan Zhang, Saeid Amiri, Jivko Sinapov, Jesse Thomason, Peter Stone, and Shiqi Zhang.
Autonomous Robots, 2023.
categories: language and robotics
journal paper

@article{zhang:multimodal_embodied_ar23,
  title={Multimodal embodied attribute learning by robots for object-centric action policies},
  author={Xiaohan Zhang and Saeid Amiri and Jivko Sinapov and Jesse Thomason and Peter Stone and Shiqi Zhang},
  journal={Autonomous Robots},
  year={2023},
  url={https://link.springer.com/article/10.1007/s10514-023-10098-5}
}

2022

CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation
Vishnu Sashank Dorbala, Gunnar Sigurdsson, Robinson Piramuthu, Jesse Thomason, and Gaurav Sukhatme.
Workshop on Language and Robot Learning (LangRob) @ CoRL, 2022.
categories: vln
workshop paper

@inproceedings{dorbala:clip_nav,
  title={{CLIP-Nav}: Using {CLIP} for Zero-Shot Vision-and-Language Navigation},
  author={Vishnu Sashank Dorbala and Gunnar Sigurdsson and Robinson Piramuthu and Jesse Thomason and Gaurav Sukhatme},
  booktitle={Workshop on Language and Robot Learning (LangRob) @ CoRL},
  year={2022},
  url={https://arxiv.org/abs/2211.16649}
}

ALFRED-L: Investigating the Role of Language for Action Learning in Interactive Visual Environments
Arjun Akula, Spandana Gella, Aishwarya Padmakumar, Mahdi Namazifar, Mohit Bansal, Jesse Thomason, and Dilek Hakkani-Tur.
Empirical Methods in Natural Language Processing (EMNLP), 2022.
categories: language and action, vln
conference paper

@inproceedings{akula:alfredl,
  title={{ALFRED-L}: Investigating the Role of Language for Action Learning in Interactive Visual Environments},
  author={Arjun Akula and Spandana Gella and Aishwarya Padmakumar and Mahdi Namazifar and Mohit Bansal and Jesse Thomason and Dilek Hakkani-Tur},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2022},
  url={https://aclanthology.org/2022.emnlp-main.636/}
}

Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems
Wang Zhu, Jesse Thomason, and Robin Jia.
Findings of Empirical Methods in Natural Language Processing (EMNLP Findings), 2022.
categories: neurosymbolic, evaluation, language and vision
conference paper source

@inproceedings{zhu:multi_image_contrast_vqa,
  title={Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems},
  author={Wang Zhu and Jesse Thomason and Robin Jia},
  booktitle={Findings of Empirical Methods in Natural Language Processing (EMNLP Findings)},
  year={2022},
  url={https://arxiv.org/abs/2210.15037}
}

CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
Tejas Srinivasan, Ting-Yun Chang, Leticia Leonor Pinto Alva, Georgios Chochlakis, Mohammad Rostami, and Jesse Thomason.
Neural Information Processing Systems (NeurIPS), 2022.
categories: language and vision, benchmark, continual learning
conference paper source

@inproceedings{srinivasan:climb,
  title={{CLiMB}: A Continual Learning Benchmark for Vision-and-Language Tasks},
  author={Tejas Srinivasan and Ting-Yun Chang and Leticia Leonor Pinto Alva and Georgios Chochlakis and Mohammad Rostami and Jesse Thomason},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2022},
  url={https://arxiv.org/abs/2206.09059}
}

VAuLT: Augmenting the Vision-and-Language Transformer with the Propagation of Deep Language Representations
Georgios Chochlakis, Tejas Srinivasan, Jesse Thomason, and Shrikanth Narayanan.
arXiv, 2022.
categories: language and vision
preprint paper source

@article{chocklakis:vault,
  title={{VAuLT}: Augmenting the Vision-and-Language Transformer with the Propagation of Deep Language Representations},
  author={Georgios Chochlakis and Tejas Srinivasan and Jesse Thomason and Shrikanth Narayanan},
  journal={arXiv},
  year={2022},
  url={https://arxiv.org/abs/2208.09021}
}

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions
Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, and Xin Eric Wang.
Association for Computational Linguistics (ACL), 2022.
categories: vln, language and action
conference paper source

@inproceedings{gu:acl22,
  title={Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions},
  author={Jing Gu and Eliana Stefani and Qi Wu and Jesse Thomason and Xin Eric Wang},
  booktitle={Association for Computational Linguistics (ACL)},
  year={2022},
  url={https://arxiv.org/abs/2203.12667}
}

TEACh: Task-driven Embodied Agents that Chat
Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, and Dilek Hakkani-Tur.
Conference on Artificial Intelligence (AAAI), 2022.
categories: benchmark, language and action, dialogue
conference paper website source coverage

@inproceedings{padmakumar:teach,
  title={{TEACh}: Task-driven Embodied Agents that Chat},
  author={Aishwarya Padmakumar and Jesse Thomason and Ayush Shrivastava and Patrick Lange and Anjali Narayan-Chen and Spandana Gella and Robinson Piramuthu and Gokhan Tur and Dilek Hakkani-Tur},
  booktitle={Conference on Artificial Intelligence (AAAI)},
  year={2022},
  url={https://arxiv.org/abs/2110.00534}
}

2021

LUMINOUS: Indoor Scene Generation for Embodied AI Challenges
Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, and Gaurav Sukhatme.
Controllable Generative Modeling in Language and Vision (CtrlGen) Workshop @ NeurIPS, 2021.
categories: language and action
workshop paper source

@inproceedings{zhao:luminous,
  title={{LUMINOUS}: Indoor Scene Generation for Embodied AI Challenges},
  author={Yizhou Zhao and Kaixiang Lin and Zhiwei Jia and Qiaozi Gao and Govind Thattai and Jesse Thomason and Gaurav Sukhatme},
  booktitle={Controllable Generative Modeling in Language and Vision (CtrlGen) Workshop @ NeurIPS},
  year={2021},
  url={https://arxiv.org/abs/2111.05527}
}

Language Grounding with 3D Objects
Jesse Thomason, Mohit Shridhar, Yonatan Bisk, Chris Paxton, and Luke Zettlemoyer.
Conference on Robot Learning (CoRL), 2021.
categories: benchmark, language and vision
conference paper video source

@inproceedings{thomason:snare,
  title={Language Grounding with {3D} Objects},
  author={Jesse Thomason and Mohit Shridhar and Yonatan Bisk and Chris Paxton and Luke Zettlemoyer},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2021},
  url={https://arxiv.org/abs/2107.12514}
}

Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion
Alessandro Suglia, Qiaozi Gao, Jesse Thomason, Govind Thattai, and Gaurav Sukhatme.
Novel Ideas in Learning-to-Learn through Interaction (NILLI) Workshop @ EMNLP, 2021.
categories: language and action
workshop paper source

@inproceedings{suglia:embert,
  title={Embodied {BERT}: A Transformer Model for Embodied, Language-guided Visual Task Completion},
  author={Alessandro Suglia and Qiaozi Gao and Jesse Thomason and Govind Thattai and Gaurav Sukhatme},
  booktitle={Novel Ideas in Learning-to-Learn through Interaction (NILLI) Workshop @ EMNLP},
  year={2021},
  url={https://arxiv.org/abs/2108.04927}
}

2020

The RobotSlang Benchmark: Dialog-guided Robot Localization and Navigation
Shurjo Banerjee, Jesse Thomason, and Jason J. Corso.
Conference on Robot Learning (CoRL), 2020.
categories: physical robots, language and robotics, dialogue, vln
conference paper website video source coverage

@inproceedings{banerjee:corl20,
  title={{The RobotSlang Benchmark}: Dialog-guided Robot Localization and Navigation},
  author={Shurjo Banerjee and Jesse Thomason and Jason J. Corso},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2020},
  url={https://arxiv.org/abs/2010.12639}
}

Experience Grounds Language
Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, and Joseph Turian.
Empirical Methods in Natural Language Processing (EMNLP), 2020.
categories: language and action, language and vision, language and robotics
conference paper video coverage

@inproceedings{bisk:emnlp20,
  title={Experience Grounds Language},
  author={Yonatan Bisk and Ari Holtzman and Jesse Thomason and Jacob Andreas and Yoshua Bengio and Joyce Chai and Mirella Lapata and Angeliki Lazaridou and Jonathan May and Aleksandr Nisnevich and Nicolas Pinto and Joseph Turian},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2020},
  url={https://arxiv.org/abs/2004.10151}
}

RMM: A Recursive Mental Model for Dialog Navigation
Homero Roman Roman, Yonatan Bisk, Jesse Thomason, Asli Celikyilmaz, and Jianfeng Gao.
Findings of Empirical Methods in Natural Language Processing (EMNLP Findings), 2020.
categories: vln, dialogue
—
Also presented at the Third International Workshop on Spatial Language Understanding (SpLU), 2020.
conference paper source

@inproceedings{roman:emnlpf20,
  title={{RMM}: A Recursive Mental Model for Dialog Navigation},
  author={Homero Roman Roman and Yonatan Bisk and Jesse Thomason and Asli Celikyilmaz and Jianfeng Gao},
  booktitle={Findings of Empirical Methods in Natural Language Processing (EMNLP Findings)},
  year={2020},
  url={https://arxiv.org/abs/2005.00728}
}

| SpLU website

Interpreting Black Box Models via Hypothesis Testing
Collin Burns, Jesse Thomason, and Wesley Tansey.
Foundations of Data Science (FODS), 2020.
categories: interpretability
conference paper source

@inproceedings{burns:fods20,
  title={Interpreting Black Box Models via Hypothesis Testing},
  author={Collin Burns and Jesse Thomason and Wesley Tansey},
  booktitle={Foundations of Data Science (FODS)},
  year={2020},
  url={https://arxiv.org/abs/1904.00045}
}

ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, and Dieter Fox.
Computer Vision and Pattern Recognition (CVPR), 2020.
categories: benchmark, language and action
conference paper website video source

@inproceedings{shridhar:cvpr20,
  title={{ALFRED}: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks},
  author={Mohit Shridhar and Jesse Thomason and Daniel Gordon and Yonatan Bisk and Winson Han and Roozbeh Mottaghi and Luke Zettlemoyer and Dieter Fox},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  url={https://arxiv.org/abs/1912.01734}
}

Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog
Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, and Raymond J. Mooney.
The Journal of Artificial Intelligence Research, 2020.
categories: dialogue, language and robotics, physical robots
—
Also presented at the IJCAI Journal Track (IJCAI), 2021.
journal paper

@article{thomason:jair20,
  title={Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog},
  author={Jesse Thomason and Aishwarya Padmakumar and Jivko Sinapov and Nick Walker and Yuqian Jiang and Harel Yedidsion and Justin Hart and Peter Stone and Raymond J. Mooney},
  journal={The Journal of Artificial Intelligence Research},
  volume={67},
  year={2020},
  url={https://jair.org/index.php/jair/article/view/11485}
}

| IJCAI website

2019

Vision-and-Dialog Navigation
Jesse Thomason, Michael Murray, Maya Cakmak, and Luke Zettlemoyer.
Conference on Robot Learning (CoRL), 2019.
categories: dialogue, vln, benchmark
conference paper website video demo source poster

@inproceedings{thomason:corl19,
  title={Vision-and-Dialog Navigation},
  author={Jesse Thomason and Michael Murray and Maya Cakmak and Luke Zettlemoyer},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2019},
  url={https://arxiv.org/abs/1907.04957}
}

Improving Robot Success Detection using Static Object Data
Rosario Scalise, Jesse Thomason, Yonatan Bisk, and Siddhartha Srinivasa.
Intelligent Robots and Systems (IROS), 2019.
categories: physical robots, language and robotics, language and vision
—
Also presented at the Combined Workshop on Spatial Language Understanding & Grounded Communication for Robotics (SpLU-RoboNLP), 2019.
conference paper video source slides

@inproceedings{scalise:iros19,
  title={Improving Robot Success Detection using Static Object Data},
  author={Rosario Scalise and Jesse Thomason and Yonatan Bisk and Siddhartha Srinivasa},
  booktitle={Intelligent Robots and Systems (IROS)},
  year={2019},
  url={https://arxiv.org/abs/1904.01650}
}

| SpLU-RoboNLP poster

Augmenting Knowledge through Statistical, Goal-oriented Human-Robot Dialog
Saeid Amiri, Sujay Bajracharya, Cihangir Goktolga, Jesse Thomason, and Shiqi Zhang.
Intelligent Robots and Systems (IROS), 2019.
categories: dialogue, language and robotics
conference paper video slides

@inproceedings{amiri:iros19,
  title={Augmenting Knowledge through Statistical, Goal-oriented Human-Robot Dialog},
  author={Saeid Amiri and Sujay Bajracharya and Cihangir Goktolga and Jesse Thomason and Shiqi Zhang},
  booktitle={Intelligent Robots and Systems (IROS)},
  year={2019},
  url={https://arxiv.org/abs/1907.03390}
}

Shifting the Baseline: Single Modality Performance on Visual Navigation & QA
Jesse Thomason, Daniel Gordon, and Yonatan Bisk.
North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
categories: evaluation, language and vision, vln
conference paper poster

@inproceedings{thomason:naacl19,
  title={Shifting the Baseline: Single Modality Performance on Visual Navigation \& {QA}},
  author={Jesse Thomason and Daniel Gordon and Yonatan Bisk},
  booktitle={North American Chapter of the Association for Computational Linguistics (NAACL)},
  year={2019},
  url={https://arxiv.org/abs/1811.00613}
}

Improving Grounded Natural Language Understanding through Human-Robot Dialog
Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, and Raymond J. Mooney.
International Conference on Robotics and Automation (ICRA), 2019.
categories: dialogue, language and robotics, physical robots
—
Also presented at the SIGDIAL Special Session on Physically Situated Dialogue (RoboDIAL), 2018.
Also presented at the RSS Workshop on Models and Representations for Natural Human-Robot Communication (MRHRC), 2018.
conference paper video poster

@inproceedings{thomason:icra19,
  title={Improving Grounded Natural Language Understanding through Human-Robot Dialog},
  author={Jesse Thomason and Aishwarya Padmakumar and Jivko Sinapov and Nick Walker and Yuqian Jiang and Harel Yedidsion and Justin Hart and Peter Stone and Raymond J. Mooney},
  booktitle={International Conference on Robotics and Automation (ICRA)},
  year={2019},
  url={https://arxiv.org/abs/1903.00122}
}

| RoboDIAL paper RoboDIAL video MRHRC paper MRHRC poster

Prospection: Interpretable Plans From Language By Predicting the Future
Chris Paxton, Yonatan Bisk, Jesse Thomason, Arunkumar Byravan, and Dieter Fox.
International Conference on Robotics and Automation (ICRA), 2019.
categories: language and robotics, language and planning
conference paper

@inproceedings{paxton:icra19,
  title={Prospection: Interpretable Plans From Language By Predicting the Future},
  author={Chris Paxton and Yonatan Bisk and Jesse Thomason and Arunkumar Byravan and Dieter Fox},
  booktitle={International Conference on Robotics and Automation (ICRA)},
  year={2019},
  url={https://arxiv.org/abs/1903.08309}
}

2018

Interaction and Autonomy in RoboCup@Home and Building-Wide Intelligence
Justin Hart, Harel Yedidsion, Yuqian Jiang, Nick Walker, Rishi Shah, Jesse Thomason, Aishwarya Padmakumar, Rolando Fernandez, Jivko Sinapov, Raymond J. Mooney, and Peter Stone.
AI-HRI AAAI Fall Symposium Series @ AAAI, 2018.
categories: language and robotics
workshop paper

@inproceedings{hart:aaai-fss18,
  title={Interaction and Autonomy in RoboCup@Home and Building-Wide Intelligence},
  author={Justin Hart and Harel Yedidsion and Yuqian Jiang and Nick Walker and Rishi Shah and Jesse Thomason and Aishwarya Padmakumar and Rolando Fernandez and Jivko Sinapov and Raymond J. Mooney and Peter Stone},
  booktitle={AI-HRI AAAI Fall Symposium Series @ AAAI},
  year={2018},
  url={https://arxiv.org/abs/1810.02919}
}

Multi-modal Predicate Identification using Dynamically Learned Robot Controllers
Saeid Amiri, Suhua Wei, Shiqi Zhang, Jivko Sinapov, Jesse Thomason, and Peter Stone.
International Joint Conference on Artificial Intelligence (IJCAI), 2018.
categories: physical robots, language and robotics
conference paper

@inproceedings{amiri:ijcai18,
  title={Multi-modal Predicate Identification using Dynamically Learned Robot Controllers},
  author={Saeid Amiri and Suhua Wei and Shiqi Zhang and Jivko Sinapov and Jesse Thomason and Peter Stone},
  booktitle={International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2018},
  url={https://www.ijcai.org/proceedings/2018/0645.pdf}
}

Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions
Jesse Thomason, Jivko Sinapov, Raymond J. Mooney, and Peter Stone.
Conference on Artificial Intelligence (AAAI), 2018.
categories: language and robotics
—
Also presented at the Workshop on Language Grounding for Robotics (RoboNLP), 2017.
conference paper source slides

@inproceedings{thomason:aaai18,
  title={Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions},
  author={Jesse Thomason and Jivko Sinapov and Raymond J. Mooney and Peter Stone},
  booktitle={Conference on Artificial Intelligence (AAAI)},
  year={2018},
  url={https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16512/}
}

| RoboNLP paper RoboNLP poster

Maximum-Variance Total Variation Denoising for Interpretable Spatial Smoothing
Wesley Tansey, Jesse Thomason, and James G. Scott.
Conference on Artificial Intelligence (AAAI), 2018.
categories: interpretability
—
Also presented at the ICML Workshop on Human Interpretability in Machine Learning (ICML-WHI), 2017.
conference paper poster

@inproceedings{tansey:aaai18,
  title={Maximum-Variance Total Variation Denoising for Interpretable Spatial Smoothing},
  author={Wesley Tansey and Jesse Thomason and James G. Scott},
  booktitle={Conference on Artificial Intelligence (AAAI)},
  year={2018},
  url={https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16974}
}

| ICML-WHI paper ICML-WHI poster

2017

Opportunistic Active Learning for Grounding Natural Language Descriptions
Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Justin Hart, Peter Stone, and Raymond J. Mooney.
Conference on Robot Learning (CoRL), 2017.
categories: dialogue, language and robotics, physical robots
conference paper video source poster

@inproceedings{thomason:corl17,
  title={Opportunistic Active Learning for Grounding Natural Language Descriptions},
  author={Jesse Thomason and Aishwarya Padmakumar and Jivko Sinapov and Justin Hart and Peter Stone and Raymond J. Mooney},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2017},
  url={http://proceedings.mlr.press/v78/thomason17a/thomason17a.pdf}
}

Improving Black-box Speech Recognition using Semantic Parsing
Rodolfo Corona, Jesse Thomason, and Raymond J. Mooney.
International Joint Conference on Natural Language Processing (IJCNLP), 2017.
categories: speech recognition, semantic parsing
conference paper poster

@inproceedings{corona:ijcnlp17,
  title={Improving Black-box Speech Recognition using Semantic Parsing},
  author={Rodolfo Corona and Jesse Thomason and Raymond J. Mooney},
  booktitle={International Joint Conference on Natural Language Processing (IJCNLP)},
  year={2017},
  url={https://www.aclweb.org/anthology/I17-2021/}
}

Multi-Modal Word Synset Induction
Jesse Thomason and Raymond J. Mooney.
International Joint Conference on Artificial Intelligence (IJCAI), 2017.
categories: language and vision
conference paper poster slides

@inproceedings{thomason:ijcai17,
  title={Multi-Modal Word Synset Induction},
  author={Jesse Thomason and Raymond J. Mooney},
  booktitle={International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2017},
  url={https://www.ijcai.org/proceedings/2017/0575.pdf}
}

Integrated Learning of Dialog Strategies and Semantic Parsing
Aishwarya Padmakumar, Jesse Thomason, and Raymond J. Mooney.
European Chapter of the Association for Computational Linguistics (EACL), 2017.
categories: dialogue, semantic parsing
conference paper

@inproceedings{padmakumar:eacl17,
  title={Integrated Learning of Dialog Strategies and Semantic Parsing},
  author={Aishwarya Padmakumar and Jesse Thomason and Raymond J. Mooney},
  booktitle={European Chapter of the Association for Computational Linguistics (EACL)},
  year={2017},
  url={http://www.cs.utexas.edu/users/ml/papers/padmakumar.eacl17.pdf}
}

BWIBots: A platform for bridging the gap between AI and human--robot interaction research
Piyush Khandelwal, Shiqi Zhang, Jivko Sinapov, Matteo Leonetti, Jesse Thomason, Fangkai Yang, Ilaria Gori, Maxwell Svetlik, Priyanka Khante, Vladimir Lifschitz, J. K. Aggarwal, Raymond J. Mooney, and Peter Stone.
The International Journal of Robotics Research, 2017.
categories: language and robotics
journal paper

@article{khandelwal:ijrr17,
  title={BWIBots: A platform for bridging the gap between AI and human--robot interaction research},
  author={Piyush Khandelwal and Shiqi Zhang and Jivko Sinapov and Matteo Leonetti and Jesse Thomason and Fangkai Yang and Ilaria Gori and Maxwell Svetlik and Priyanka Khante and Vladimir Lifschitz and J. K. Aggarwal and Raymond J. Mooney and Peter Stone},
  journal={The International Journal of Robotics Research},
  publisher={Sage},
  year={2017},
  url={http://www.cs.utexas.edu/users/pstone/Papers/bib2html-links/IJRR17-khandelwal.pdf}
}

2016

Learning Multi-Modal Grounded Linguistic Semantics by Playing "I Spy"
Jesse Thomason, Jivko Sinapov, Maxwell Svetlik, Peter Stone, and Raymond J. Mooney.
International Joint Conference on Artificial Intelligence (IJCAI), 2016.
categories: dialogue, physical robots, language and robotics
conference paper video source poster slides

@inproceedings{thomason:ijcai16,
  title={Learning Multi-Modal Grounded Linguistic Semantics by Playing ``{I} Spy''},
  author={Jesse Thomason and Jivko Sinapov and Maxwell Svetlik and Peter Stone and Raymond J. Mooney},
  booktitle={International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2016},
  url={http://www.ijcai.org/Proceedings/16/Papers/491.pdf}
}

2015

Learning to Interpret Natural Language Commands through Human-Robot Dialog
Jesse Thomason, Shiqi Zhang, Raymond J. Mooney, and Peter Stone.
International Joint Conference on Artificial Intelligence (IJCAI), 2015.
categories: physical robots, semantic parsing, dialogue, language and robotics
conference paper video source poster slides

@inproceedings{thomason:ijcai15,
  title={Learning to Interpret Natural Language Commands through Human-Robot Dialog},
  author={Jesse Thomason and Shiqi Zhang and Raymond J. Mooney and Peter Stone},
  booktitle={International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2015},
  url={https://www.ijcai.org/Proceedings/15/Papers/273.pdf}
}

2014

Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild
Jesse Thomason, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, and Raymond J. Mooney.
Conference on Computational Linguistics (COLING), 2014.
categories: language and vision
conference paper poster

@inproceedings{thomason:coling14,
  title={Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild},
  author={Jesse Thomason and Subhashini Venugopalan and Sergio Guadarrama and Kate Saenko and Raymond J. Mooney},
  booktitle={Conference on Computational Linguistics (COLING)},
  year={2014},
  url={http://anthology.aclweb.org/C/C14/C14-1115.pdf}
}

2013

Prosodic Entrainment and Tutoring Dialogue Success
Jesse Thomason, Huy Nguyen, and Diane Litman.
Artificial Intelligence in Education (AIED), 2013.
categories: dialogue
conference paper poster

@inproceedings{thomason:aied13,
  title={Prosodic Entrainment and Tutoring Dialogue Success},
  author={Jesse Thomason and Huy Nguyen and Diane Litman},
  booktitle={Artificial Intelligence in Education (AIED)},
  year={2013},
  url={https://link.springer.com/chapter/10.1007/978-3-642-39112-5_104}
}

Differences in User Responses to a Wizard-of-Oz versus Automated System
Jesse Thomason and Diane Litman.
North American Chapter of the Association for Computational Linguistics (NAACL), 2013.
categories: dialogue
conference paper slides

@inproceedings{thomason:naacl13,
  title={Differences in User Responses to a Wizard-of-Oz versus Automated System},
  author={Jesse Thomason and Diane Litman},
  booktitle={North American Chapter of the Association for Computational Linguistics (NAACL)},
  year={2013},
  url={http://www.aclweb.org/anthology/N13-1098}
}

Thesis work

2018

Continually Improving Grounded Natural Language Understanding through Human-Robot Dialog
Jesse Thomason.
Department of Computer Science, The University of Texas at Austin, 2018.
categories: language and robotics, dialogue, language and vision, semantic parsing
thesis paper slides

@phdthesis{thomason:thesis18,
  title={Continually Improving Grounded Natural Language Understanding through Human-Robot Dialog},
  author={Jesse Thomason},
  booktitle={Department of Computer Science, The University of Texas at Austin},
  school={Department of Computer Science, The University of Texas at Austin},
  year={2018},
  url={http://www.cs.utexas.edu/users/ml/papers/thomason.thesis18.pdf}
}

2016

Continuously Improving Natural Language Understanding for Robotic Systems through Semantic Parsing, Dialog, and Multi-modal Perception
Jesse Thomason.
Doctoral Dissertation Proposal, 2016.
categories: dialogue, language and robotics, language and vision, semantic parsing
thesis paper slides

@inproceedings{thomason:proposal16,
  title={Continuously Improving Natural Language Understanding for Robotic Systems through Semantic Parsing, Dialog, and Multi-modal Perception},
  author={Jesse Thomason},
  booktitle={Doctoral Dissertation Proposal},
  year={2016},
  url={http://www.cs.utexas.edu/users/ml/papers/thomason.proposal16.pdf}
}

Jesse Thomason

Assistant Professor @ University of Southern California

jessetho🙃usc.edu

Thomas Lord Department of Computer Science

News