On the parameterization and initialization of diagonal state space models A Gu, A Gupta, K Goel, C Ré Advances in Neural Information Processing Systems 35, 35971-35983, 2022 | 303 | 2022 |
Diagonal state spaces are as effective as structured state spaces A Gupta, A Gu, J Berant Advances in Neural Information Processing Systems 35, 22982-22994, 2022 | 276 | 2022 |
Injecting numerical reasoning skills into language models M Geva*, A Gupta*, J Berant Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020 | 233 | 2020 |
Long range language modeling via gated state spaces H Mehta, A Gupta, A Cutkosky, B Neyshabur The Eleventh International Conference on Learning Representations, 2023 | 217 | 2023 |
Arithmetic circuits: A chasm at depth 3 A Gupta, P Kamath, N Kayal, R Saptharishi SIAM Journal on Computing 45 (3), 1064-1079, 2016 | 187* | 2016 |
Break It Down: A Question Understanding Benchmark T Wolfson, M Geva, A Gupta, M Gardner, Y Goldberg, D Deutch, J Berant Transactions of the Association for Computational Linguistics 8, 183-198, 2020 | 186 | 2020 |
Approaching the chasm at depth four A Gupta, P Kamath, N Kayal, R Saptharishi Journal of the ACM (JACM) 61 (6), 1-16, 2014 | 140 | 2014 |
SCROLLS: Standardized comparison over long language sequences U Shaham, E Segal, M Ivgi, A Efrat, O Yoran, A Haviv, A Gupta, W Xiong, ... Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022 | 129 | 2022 |
Analyzing transformers in embedding space G Dar, M Geva, A Gupta, J Berant Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023 | 111 | 2023 |
GMAT: Global memory augmentation for transformers A Gupta, J Berant arXiv preprint arXiv:2006.03274, 2020 | 52 | 2020 |
Memory-efficient Transformers via Top-k Attention A Gupta, G Dar, S Goodman, D Ciprut, J Berant Proceedings of the Second Workshop on Simple and Efficient Natural Language …, 2021 | 37 | 2021 |
Diagonal state space augmented transformers for speech recognition G Saon, A Gupta, X Cui ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 32 | 2023 |
Reconstruction of depth-4 multilinear circuits with top fan-in 2 A Gupta, N Kayal, S Lokam Proceedings of the forty-fourth annual ACM symposium on Theory of computing …, 2012 | 29 | 2012 |
Algebraic geometric techniques for depth-4 PIT & Sylvester-Gallai conjectures for varieties A Gupta Electronic Colloquium on Computational Complexity (ECCC) 21 (130), 50, 2014 | 28 | 2014 |
Never train from scratch: Fair comparison of long-sequence models requires data-driven priors I Amos, J Berant, A Gupta The Twelfth International Conference on Learning Representations, 2024 | 21 | 2024 |
Random arithmetic formulas can be reconstructed efficiently A Gupta, N Kayal, Y Qiao Computational Complexity 23, 207-303, 2014 | 21 | 2014 |
Simplifying and understanding state space models with diagonal linear RNNs A Gupta, H Mehta, J Berant arXiv preprint arXiv:2212.00768, 2022 | 20 | 2022 |
Efficient reconstruction of random multilinear formulas A Gupta, N Kayal, S Lokam 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, 778-787, 2011 | 18 | 2011 |
Value-aware Approximate Attention A Gupta, J Berant Proceedings of the 2021 Conference on Empirical Methods in Natural Language …, 2021 | 5 | 2021 |
Exploring the limits of decoder-only models trained on public speech recognition corpora A Gupta, G Saon, B Kingsbury arXiv preprint arXiv:2402.00235, 2024 | 4 | 2024 |