Transformers: State-of-the-art natural language processing T Wolf, L Debut, V Sanh, J Chaumond, C Delangue, A Moi, P Cistac, ... Proceedings of the 2020 Conference on Empirical Methods in Natural Language …, 2020 | 16107* | 2020 |
BLOOM: A 176B-parameter open-access multilingual language model T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... | 1620 | 2023 |
Datasets: A Community Library for Natural Language Processing Q Lhoest, A Villanova del Moral, Y Jernite, A Thakur, P von Platen, S Patil, ... Proceedings of the 2021 Conference on Empirical Methods in Natural Language …, 2021 | 574* | 2021 |
The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset H Laurençon, L Saulnier, T Wang, C Akiki, A Villanova del Moral, ... Advances in Neural Information Processing Systems 35, 31809-31826, 2022 | 168 | 2022 |
Distributed deep learning in open collaborations M Diskin, A Bukhtiyarov, M Ryabinin, L Saulnier, A Sinitsin, D Popov, ... Advances in Neural Information Processing Systems 34, 7879-7897, 2021 | 51 | 2021 |
Evaluate & evaluation on the hub: Better best practices for data and model measurements L Von Werra, L Tunstall, A Thakur, AS Luccioni, T Thrush, A Piktus, ... arXiv preprint arXiv:2210.01970, 2022 | 24 | 2022 |
Croissant: A Metadata Format for ML-Ready Datasets M Akhtar, O Benjelloun, C Conforti, P Gijsbers, J Giner-Miguelez, N Jain, ... Proceedings of the Eighth Workshop on Data Management for End-to-End Machine …, 2024 | 15 | 2024 |
Training transformers together A Borzunov, M Ryabinin, T Dettmers, Q Lhoest, L Saulnier, M Diskin, ... NeurIPS 2021 Competitions and Demonstrations Track, 335-342, 2022 | 9 | 2022 |
AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages CC Emezue, S Gandhi, L Tunstall, A Abid, J Meyer, Q Lhoest, P Allen, ... arXiv preprint arXiv:2303.12582, 2023 | | 2023 |
Actes de la conférence CAID 2020 F de Vieilleville, S May, A Lagrange, A Dupuis, R Ruiloba, FN Mboula, ... | | 2021 |