Sharan Narang
Research Engineer, Meta AI
Verified email at meta.com
Title
Cited by
Year
Exploring the limits of transfer learning with a unified text-to-text transformer
C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ...
Journal of machine learning research 21 (140), 1-67, 2020
Cited by 19694 · 2020
Llama 2: Open foundation and fine-tuned chat models
H Touvron, L Martin, K Stone, P Albert, A Almahairi, Y Babaei, ...
arXiv preprint arXiv:2307.09288, 2023
Cited by 10598 · 2023
Palm: Scaling language modeling with pathways
A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ...
Journal of Machine Learning Research 24 (240), 1-113, 2023
Cited by 5098 · 2023
Deep speech 2: End-to-end speech recognition in english and mandarin
D Amodei, S Ananthanarayanan, R Anubhai, J Bai, E Battenberg, C Case, ...
International conference on machine learning, 173-182, 2016
Cited by 3842 · 2016
Scaling instruction-finetuned language models
HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, Y Li, X Wang, ...
Journal of Machine Learning Research 25 (70), 1-53, 2024
Cited by 2909 · 2024
Mixed precision training
P Micikevicius, S Narang, J Alben, G Diamos, E Elsen, D Garcia, ...
arXiv preprint arXiv:1710.03740, 2017
Cited by 2026 · 2017
The llama 3 herd of models
A Dubey, A Jauhri, A Pandey, A Kadian, A Al-Dahle, A Letman, A Mathur, ...
arXiv preprint arXiv:2407.21783, 2024
Cited by 1222 · 2024
Self-consistency improves chain of thought reasoning in language models
X Wang, J Wei, D Schuurmans, Q Le, E Chi, S Narang, A Chowdhery, ...
arXiv preprint arXiv:2203.11171, 2022
Cited by 1063 · 2022
Deep voice 3: Scaling text-to-speech with convolutional sequence learning
W Ping, K Peng, A Gibiansky, SO Arik, A Kannan, S Narang, J Raiman, ...
arXiv preprint arXiv:1710.07654, 2017
Cited by 899* · 2017
Deep learning scaling is predictable, empirically
J Hestness, S Narang, N Ardalani, G Diamos, H Jun, H Kianinejad, ...
arXiv preprint arXiv:1712.00409, 2017
Cited by 783 · 2017
Byt5: Towards a token-free future with pre-trained byte-to-byte models
L Xue, A Barua, N Constant, R Al-Rfou, S Narang, M Kale, A Roberts, ...
Transactions of the Association for Computational Linguistics 10, 291-306, 2022
Cited by 421 · 2022
Exploring sparsity in recurrent neural networks
S Narang, E Elsen, G Diamos, S Sengupta
arXiv preprint arXiv:1704.05119, 2017
Cited by 371 · 2017
DSD: regularizing deep neural networks with dense-sparse-dense training flow
S Han, J Pool, S Narang, H Mao, S Tang, E Elsen, B Catanzaro, J Tran, ...
arXiv preprint arXiv:1607.04381 3 (6), 2016
Cited by 346* · 2016
Wt5?! training text-to-text models to explain their predictions
S Narang, C Raffel, K Lee, A Roberts, N Fiedel, K Malkan
arXiv preprint arXiv:2004.14546, 2020
Cited by 205 · 2020
Block-sparse recurrent neural networks
S Narang, E Undersander, G Diamos
arXiv preprint arXiv:1711.02782, 2017
Cited by 157 · 2017
Scaling up models and data with t5x and seqio
A Roberts, HW Chung, G Mishra, A Levskaya, J Bradbury, D Andor, ...
Journal of Machine Learning Research 24 (377), 1-8, 2023
Cited by 154 · 2023
Effective long-context scaling of foundation models
W Xiong, J Liu, I Molybog, H Zhang, P Bhargava, R Hou, L Martin, ...
arXiv preprint arXiv:2309.16039, 2023
Cited by 140 · 2023
Scale efficiently: Insights from pre-training and fine-tuning transformers
Y Tay, M Dehghani, J Rao, W Fedus, S Abnar, HW Chung, S Narang, ...
arXiv preprint arXiv:2109.10686, 2021
Cited by 131 · 2021
Llama 2: open foundation and fine-tuned chat models. arXiv
H Touvron, L Martin, K Stone, P Albert, A Almahairi, Y Babaei, ...
arXiv preprint arXiv:2307.09288, 2023
Cited by 115 · 2023
Palm: Scaling language modeling with pathways. arXiv 2022
A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ...
arXiv preprint arXiv:2204.02311 10, 2022
Cited by 114 · 2022
Articles 1–20