Posts by Collection

portfolio

publications

Decisions under drift: Adapting binary decision thresholds to drifts in test distribution

Published in Proceedings of the 6th IBM Collaborative Academia Research Exchange Conference, 2014

Recommended citation: Sachin Kumar, Vikas C. Raykar, and Priyanka Agrawal. (2014). "Decisions under drift: Adapting binary decision thresholds to drifts in test distribution." In Proceedings of the 6th IBM Collaborative Academia Research Exchange Conference. ACM, New York, NY, USA, Article 17, 4 pages. DOI=http://dx.doi.org/10.1145/2662117.2662134.
Download Paper

Earth Mover Distance Pooling over Siamese LSTMs for Automatic Short Answer Grading

Published in Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) 2017, 2017

Recommended citation: Sachin Kumar, Soumen Chakrabarti, Shourya Roy. (2017). "Earth Mover Distance Pooling over Siamese LSTMs for Automatic Short Answer Grading." In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) 2017.
Download Paper

Machine Translation with Continuous Outputs

Published in ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models, 2018

Recommended citation: Sachin Kumar, Yulia Tsvetkov. (2018). “Machine Translation with Continuous Outputs.” ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models.

Mining & Summarizing E-petitions for Enhanced Understanding of Public Opinion

Published in Proceedings of the International Conference on Information and Knowledge Management (CIKM) 2018, 2018

Download paper here

Recommended citation: Shreshtha Mundra*, Sachin Kumar*, Manjira Sinha, Sandya Mannarswamy. (2018). "Mining & Summarizing E-petitions for Enhanced Understanding of Public Opinion." In Proceedings of the International Conference on Information and Knowledge Management (CIKM) 2018.
Download Paper

A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards

Published in The 4th Workshop on Neural Generation and Translation (ACL) 2020, 2020

Download paper here

Recommended citation: Zi-Yi Dou, Sachin Kumar, Yulia Tsvetkov. (2020). "A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards." The 4th Workshop on Neural Generation and Translation (ACL) 2020.
Download Paper

Controlled Text Generation as Continuous Optimization with Multiple Constraints

Published in Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS) 2021, 2021

Recommended citation: Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov. (2021). "Controlled Text Generation as Continuous Optimization with Multiple Constraints." Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS) 2021.
Download Paper

Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs

Published in Multilingual Representation Learning Workshop at EMNLP 2021, 2021

Recommended citation: [pdf] [code] Monisha Jegadeesan, Sachin Kumar, John Wieting, Yulia Tsvetkov. (2021). "Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs." Multilingual Representation Learning Workshop at EMNLP 2021.

Machine Translation into Low-Resource Language Varieties

Published in Proceedings of the 2021 Conference of the Association for Computational Linguistics (ACL), 2021

Recommended citation: Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov. (2021). "Machine Translation into Low-Resource Language Varieties." In Proceedings of the 2021 Conference of the Association for Computational Linguistics (ACL).
Download Paper

An Exploration of Data Augmentation Techniques for Improving English to Tigrinya Translation

Published in The 2nd AfricaNLP Workshop at EACL 2021, 2021

Download paper here

Recommended citation: Lidia Kidane, Sachin Kumar, Yulia Tsvetkov. (2021). "An Exploration of Data Augmentation Techniques for Improving English to Tigrinya Translation." The 2nd AfricaNLP Workshop at EACL 2021.
Download Paper

Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation

Published in 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), 2022

Download paper here

Recommended citation: Melanie Sclar, Peter West, Sachin Kumar, Yulia Tsvetkov and Yejin Choi. (2022). "Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation." 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).
Download Paper

Assessing Language Model Deployment with Risk Cards

Published in preprint, 2023

Recommended citation: [pdf] Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. R. Leiser, Saif Mohammad. (2023). “Assessing Language Model Deployment with Risk Cards.” preprint.

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models

Published in 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 2023

Download paper here

Recommended citation: [code] Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, Yulia Tsvetkov. (2023). "Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models." 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).
Download Paper

Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey

Published in 2023 Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), 2023

Download paper here

Recommended citation: Sachin Kumar*, Vidhisha Balachandran*, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov. (2023). "Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey." 2023 Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023).
Download Paper

Minding Language Models’ Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker

Published in 2023 Conference of the Association for Computational Linguistics (ACL 2023). Outstanding Paper Award, 2023

Download paper here

Recommended citation: [code] Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi and Yulia Tsvetkov. (2023). "Minding Language Models’ Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker." 2023 Conference of the Association for Computational Linguistics (ACL 2023). Outstanding Paper Award.
Download Paper

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Published in 2024 Conference of the Association for Computational Linguistics (ACL 2024). Best Resource Paper Award, 2024

Download paper here

Recommended citation: [code] Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo. (2024). "Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research." 2024 Conference of the Association for Computational Linguistics (ACL 2024). Best Resource Paper Award.
Download Paper

MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization

Published in Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS) 2024, 2024

Download paper here

Recommended citation: [code] Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Valentin Hofmann, Tomasz Limisiewicz, Yulia Tsvetkov, Noah A. Smith. (2024). "MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization." Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS) 2024.
Download Paper

SSD-2: Scaling and Inference-time Fusion of Diffusion Language Models

Published in 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), 2024

Download paper here

Recommended citation: [code] Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov, Marjan Ghazvininejad. (2024). "SSD-2: Scaling and Inference-time Fusion of Diffusion Language Models." 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024).
Download Paper

The Art of Saying No: Contextual Noncompliance in Language Models

Published in Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS) 2024: Datasets and Benchmarks, 2024

Download paper here

Recommended citation: Faeze Brahman*, Sachin Kumar*, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi. (2024). "The Art of Saying No: Contextual Noncompliance in Language Models." Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS) 2024: Datasets and Benchmarks.
Download Paper

What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization

Published in 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), 2024

Download paper here

Recommended citation: [code] YuHan Liu, Shangbin Feng, Xiaochuang Han, Vidhisha Balachandran, Chan Young Park, Sachin Kumar, Yulia Tsvetkov. (2024). "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization." 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024).
Download Paper

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

Published in Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS) 2024, 2024

Download paper here

Recommended citation: [code] Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri. (2024). "WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models." Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS) 2024.
Download Paper

BLAB: Brutally Long Audio Bench

Published in preprint, 2025

Download paper here

Recommended citation: Orevaoghene Ahia, Martijn Bartelds, Kabir Ahuja, Hila Gonen, Valentin Hofmann, Siddhant Arora, Shuyue Stella Li, Vishal Puttagunta, Mofetoluwa Adeyemi, Charishma Buchireddy, Ben Walls, Noah Bennett, Shinji Watanabe, Noah A. Smith, Yulia Tsvetkov, Sachin Kumar. (2025). "BLAB: Brutally Long Audio Bench." preprint.
Download Paper

ComPO: Community Preferences for Language Model Personalization

Published in 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025), 2025

Download paper here

Recommended citation: Sachin Kumar*, Chan Young Park*, Yulia Tsvetkov, Noah A. Smith, Hannaneh Hajishirzi. (2025). "ComPO: Community Preferences for Language Model Personalization." 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025).
Download Paper

GroundCocoa: A Benchmark for Evaluating Compositional & Conditional Reasoning in Language Models

Published in 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025), 2025

Download paper here

Recommended citation: Harsh Kohli, Sachin Kumar, Huan Sun. (2025). "GroundCocoa: A Benchmark for Evaluating Compositional & Conditional Reasoning in Language Models." 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025).
Download Paper

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Published in 2025 Conference of the Association for Computational Linguistics (ACL 2025), 2025

Download paper here

Recommended citation: Lester James V. Miranda, Yizhong Wang, Yanai Elazar, Sachin Kumar, Valentina Pyatkin, Faeze Brahman, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi. (2025). "Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback." 2025 Conference of the Association for Computational Linguistics (ACL 2025).
Download Paper

On Distributional Robustness of In-Context Learning for Text Classification

Published in Second Workshop on Test-Time Adaptation: Putting Updates to the Test! @ICML 2025, 2025

Recommended citation: [pdf] Carolina Hatanpää, Noah A. Smith, Sachin Kumar. (2025). “On Distributional Robustness of In-Context Learning for Text Classification.” Second Workshop on Test-Time Adaptation: Putting Updates to the Test! @ICML 2025.

RewardBench: Evaluating Reward Models for Language Modeling

Published in 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025) Findings, 2025

Download paper here

Recommended citation: Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi. (2025). "RewardBench: Evaluating Reward Models for Language Modeling." 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025) Findings.
Download Paper

Steering off Course: Reliability Challenges in Steering Language Models

Published in 2025 Conference of the Association for Computational Linguistics (ACL 2025). Oral (top 8%), Panel (top 0.8%), 2025

Download paper here

Recommended citation: Patrick Queiroz Da Silva, Hari Sethuraman, Dheeraj Rajagopal, Hannaneh Hajishirzi, Sachin Kumar. (2025). "Steering off Course: Reliability Challenges in Steering Language Models." 2025 Conference of the Association for Computational Linguistics (ACL 2025). Oral (top 8%), Panel (top 0.8%).
Download Paper

talks

teaching