Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models
Published in the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)
Recommended citation: Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, Yulia Tsvetkov. (2023). “Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models.” 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).