PaLM: Scaling language modeling with Pathways A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... Journal of Machine Learning Research 24 (240), 1-113, 2023 | 5088 | 2023 |
Gemini: a family of highly capable multimodal models Gemini Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 2144 | 2023 |
PaLM 2 technical report R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 1411 | 2023 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Gemini Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 671* | 2024 |
UL2: Unifying language learning paradigms Y Tay, M Dehghani, VQ Tran, X Garcia, J Wei, X Wang, HW Chung, ... arXiv preprint arXiv:2205.05131, 2022 | 276 | 2022 |
Scaling up models and data with t5x and seqio A Roberts, HW Chung, G Mishra, A Levskaya, J Bradbury, D Andor, ... Journal of Machine Learning Research 24 (377), 1-8, 2023 | 154 | 2023 |
Unifying language learning paradigms Y Tay, M Dehghani, VQ Tran, X Garcia, D Bahri, T Schuster, HS Zheng, ... arXiv preprint arXiv:2205.05131, 2022 | 139 | 2022 |
PaLM: Scaling language modeling with Pathways A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... arXiv preprint arXiv:2204.02311, 2022 | 114 | 2022 |
Scaling laws for neural machine translation B Ghorbani, O Firat, M Freitag, A Bapna, M Krikun, X Garcia, C Chelba, ... arXiv preprint arXiv:2109.07740, 2021 | 86 | 2021 |
Building machine translation systems for the next thousand languages A Bapna, I Caswell, J Kreutzer, O Firat, D van Esch, A Siddhant, M Niu, ... arXiv preprint arXiv:2205.03983, 2022 | 81 | 2022 |
MADLAD-400: A multilingual and document-level large audited dataset S Kudugunta, I Caswell, B Zhang, X Garcia, D Xin, A Kusupati, R Stella, ... Advances in Neural Information Processing Systems 36, 2024 | 77 | 2024 |
Transcending scaling laws with 0.1% extra compute Y Tay, J Wei, HW Chung, VQ Tran, DR So, S Shakeri, X Garcia, HS Zheng, ... arXiv preprint arXiv:2210.11399, 2022 | 68 | 2022 |
The unreasonable effectiveness of few-shot learning for machine translation X Garcia, Y Bansal, C Cherry, G Foster, M Krikun, M Johnson, O Firat International Conference on Machine Learning, 10867-10878, 2023 | 64 | 2023 |
Beyond human data: Scaling self-training for problem-solving with language models A Singh, JD Co-Reyes, R Agarwal, A Anand, P Patil, X Garcia, PJ Liu, ... arXiv preprint arXiv:2312.06585, 2023 | 63 | 2023 |
PaLM 2 technical report R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 51 | 2023 |
Towards continual learning for multilingual machine translation via vocabulary substitution X Garcia, N Constant, AP Parikh, O Firat arXiv preprint arXiv:2103.06799, 2021 | 43 | 2021 |
UniMax: Fairer and more effective language sampling for large-scale multilingual pretraining HW Chung, N Constant, X Garcia, A Roberts, Y Tay, S Narang, O Firat arXiv preprint arXiv:2304.09151, 2023 | 41 | 2023 |
A multilingual view of unsupervised machine translation X Garcia, P Foret, T Sellam, AP Parikh arXiv preprint arXiv:2002.02955, 2020 | 39 | 2020 |
Harnessing multilinguality in unsupervised machine translation for rare languages X Garcia, A Siddhant, O Firat, AP Parikh arXiv preprint arXiv:2009.11201, 2020 | 33 | 2020 |
Towards the next 1000 languages in multilingual machine translation: Exploring the synergy between supervised and self-supervised learning A Siddhant, A Bapna, O Firat, Y Cao, MX Chen, I Caswell, X Garcia arXiv preprint arXiv:2201.03110, 2022 | 32 | 2022 |