arXiv

Open Access Scientific Research

Established 1991 · Cornell University · 2,400,000+ articles



Most Cited This Month

Attention Is All You Need

Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin (2017)

We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable.

12,847
Citations
 
cs.CL
1706.03762
10.48550/arXiv.1706.03762

Scaling Laws for Neural Language Models

Kaplan, McCandlish, Henighan, Brown, Chess, Child, Gray, Radford, Wu, Amodei (2020)

We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude.

4,291
Citations
 
cs.LG
2001.08361
10.48550/arXiv.2001.08361

Constitutional AI: Harmlessness from AI Feedback

Bai, Kadavath, Kundu, Askell, Kernion, Jones, Chen, Goldie, Mirhoseini, McKinnon, et al. (2022)

We experiment with methods for training a harmless AI assistant through a process we call Constitutional AI. The method involves both a supervised learning phase and a reinforcement learning phase driven by AI feedback, with a set of principles guiding the model.

2,103
Citations
 
cs.AI
2212.08073
10.48550/arXiv.2212.08073

Denoising Diffusion Probabilistic Models

Ho, Jain, Abbeel (2020)

We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion models and denoising score matching.

8,412
Citations
 
cs.LG
2006.11239
10.48550/arXiv.2006.11239

Language Models are Few-Shot Learners

Brown, Mann, Ryder, Subbiah, Kaplan, Dhariwal, Neelakantan, Shyam, Sastry, Askell, et al. (2020)

We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. We train GPT-3, an autoregressive language model with 175 billion parameters.

15,230
Citations
 
cs.CL
2005.14165
10.48550/arXiv.2005.14165

Deep Residual Learning for Image Recognition

He, Zhang, Ren, Sun (2015)

We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.

21,547
Citations
 
cs.CV
1512.03385
10.48550/arXiv.1512.03385
Archive Statistics
Total Articles: 2,417,893
New This Week: 14,291
Years Online: 33

Subject Areas
cs.AI Artificial Intelligence
cs.LG Machine Learning
cs.CL Computation & Language
cs.CV Computer Vision
cs.CR Cryptography & Security
cs.SE Software Engineering
stat.ML Statistics: Machine Learning
math.OC Optimization & Control
physics.comp-ph Computational Physics
quant-ph Quantum Physics
hep-th High Energy Physics: Theory
cond-mat Condensed Matter

Submission Guidelines

arXiv accepts submissions in physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. All submissions undergo a moderation process to verify appropriateness.

Authors retain copyright. Articles are available under open access licenses. There are no publication fees.


