JMLR

BitNet: 1-bit Pre-training for Large Language Models

Authors

Lei Wang, Yi Wu, Hongyu Wang, Shuming Ma, Lingxiao Ma, Wenhui Wang, Li Dong, Shaohan Huang, Huaijie Wang, Jilong Xue, Ruiping Wang, Furu Wei

Research Topics

Machine Learning


Paper Information

Journal: Journal of Machine Learning Research
Added to Tracker: Sep 08, 2025

Abstract

The increasing size of large language models (LLMs) has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption. Previous research typically applies quantization after pre-training. While these methods avoid the need for model retraining, they often cause notable accuracy loss at extremely low bit-widths. In this work, we explore the feasibility and scalability of 1-bit pre-training. We introduce BitNet b1 and BitNet b1.58, scalable and stable 1-bit Transformer architectures designed for LLMs. Specifically, we introduce BitLinear as a drop-in replacement for the nn.Linear layer in order to train 1-bit weights from scratch. Experimental results show that BitNet b1 achieves competitive performance compared to state-of-the-art 8-bit quantization methods and FP16 Transformer baselines. With ternary weights, BitNet b1.58 matches the half-precision Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, BitNet defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. It enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
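
The abstract mentions ternary weights but does not spell out the quantizer. As a rough illustration only, the sketch below assumes an absmean scheme (divide by the mean absolute weight, round, clip to -1, 0, +1) and shows why a dot product with such weights needs only additions and subtractions. The function names `ternary_quantize` and `ternary_dot` are hypothetical, the code is plain Python rather than the authors' BitLinear layer, and it leaves out everything training-related.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Map float weights to values in {-1, 0, +1} plus one shared scale.

    Assumption (not stated on this page): absmean scaling, i.e. divide by
    the mean absolute weight, round to the nearest integer, clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def ternary_dot(quantized, scale, activations):
    """Dot product with ternary weights: add, subtract, or skip each input."""
    acc = 0.0
    for q, x in zip(quantized, activations):
        if q == 1:
            acc += x
        elif q == -1:
            acc -= x
        # q == 0 contributes nothing, so the input is skipped
    return acc * scale  # one multiplication to restore the magnitude


weights = [0.9, -0.05, -1.2, 0.3]
q, s = ternary_quantize(weights)
y = ternary_dot(q, s, [1.0, 2.0, 3.0, 4.0])

# Three possible states per weight carry log2(3) bits, roughly 1.58.
bits_per_weight = math.log2(3)
```

With these example weights the quantized vector is `[1, 0, -1, 0]`, so the dot product reduces to one addition, one subtraction, and a single rescale.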



Citation Information

APA Format

Lei Wang, Yi Wu, Hongyu Wang, Shuming Ma, Lingxiao Ma, Wenhui Wang, Li Dong, Shaohan Huang, Huaijie Wang, Jilong Xue, Ruiping Wang & Furu Wei. BitNet: 1-bit Pre-training for Large Language Models. Journal of Machine Learning Research.

BibTeX Format


@article{paper530,
  title = {BitNet: 1-bit Pre-training for Large Language Models},
  author = {Lei Wang and Yi Wu and Hongyu Wang and Shuming Ma and Lingxiao Ma and Wenhui Wang and Li Dong and Shaohan Huang and Huaijie Wang and Jilong Xue and Ruiping Wang and Furu Wei},
  journal = {Journal of Machine Learning Research},
  url = {https://www.jmlr.org/papers/v26/24-2050.html}
}
