LoRA (Low-Rank Adaptation) is a popular technique for fine-tuning large language models (LLMs). It was originally proposed by Microsoft researchers in the paper "LoRA: Low-Rank Adaptation of Large Language Models". LoRA differs from other techniques in that, instead of adjusting all of a neural network's parameters, it updates only a small number of low-rank matrices, significantly reducing the computation required to train the model.
Because LoRA's fine-tuning quality is comparable to full-model fine-tuning, many people regard the method as something of a fine-tuning "magic weapon". Since its release, many have been curious about the technique and wanted to write code to understand the research better. In the past, the lack of proper documentation was an issue; now, there are tutorials to help.
The author of this tutorial is Sebastian Raschka, a well-known machine learning and AI researcher. He says that among the various effective LLM fine-tuning methods, LoRA remains his first choice, and he wrote the blog post "Code LoRA From Scratch" to build LoRA from the ground up, which he considers a good way to learn the technique.
This article introduces low-rank adaptation (LoRA) by writing the code from scratch. In the experiment, Sebastian fine-tunes a DistilBERT model and applies it to a classification task.
Comparing the LoRA method with traditional fine-tuning shows that LoRA reached 92.39% test accuracy, better than fine-tuning only the last few layers of the model (86.22% test accuracy). This result highlights how much the choice of fine-tuning technique matters for a model's performance and generalization. Let's look at how Sebastian achieves this.
A LoRA layer can be expressed in code as follows:
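A minimal PyTorch sketch consistent with the description below; the exact initialization constant (scaling A by 1/sqrt(rank)) is an assumption, though it matches common practice:

```python
import torch
import torch.nn as nn

class LoRALayer(nn.Module):
    """Low-rank update alpha * (x @ A @ B) added to a layer's output."""
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        # A starts with small random values, B with zeros,
        # so the LoRA update is initially a no-op.
        std_dev = 1 / torch.sqrt(torch.tensor(rank).float())
        self.A = nn.Parameter(torch.randn(in_dim, rank) * std_dev)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * (x @ self.A @ self.B)
```

Because B is zero-initialized, wrapping a layer with LoRA leaves the model's initial behavior unchanged.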
Here, in_dim is the input dimension of the layer you want to modify with LoRA, and out_dim is its output dimension. The code also adds a hyperparameter, the scaling factor alpha: higher alpha values mean larger adjustments to the model's behavior, and lower values mean the opposite. In addition, matrix A is initialized with small values from a random distribution, and matrix B is initialized with zeros.
It is worth mentioning that LoRA usually comes into play in the linear (feed-forward) layers of a neural network. For example, for a simple PyTorch module with two linear layers (which might be, say, the feed-forward module of a Transformer block), the forward method can be expressed as:
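A sketch of such a two-linear-layer module; the class and attribute names (linear_1, linear_2) are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultilayerPerceptron(nn.Module):
    """A simple feed-forward module with two linear layers."""
    def __init__(self, num_features, num_hidden, num_classes):
        super().__init__()
        self.linear_1 = nn.Linear(num_features, num_hidden)
        self.linear_2 = nn.Linear(num_hidden, num_classes)

    def forward(self, x):
        x = self.linear_1(x)
        x = F.relu(x)
        x = self.linear_2(x)
        return x
```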
When using LoRA, the LoRA updates are usually added to the outputs of these linear layers, like this:
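Building on the two-layer module above (LoRALayer is repeated here so the sketch is self-contained; names and default hyperparameters are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALayer(nn.Module):
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        std_dev = 1 / torch.sqrt(torch.tensor(rank).float())
        self.A = nn.Parameter(torch.randn(in_dim, rank) * std_dev)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * (x @ self.A @ self.B)

class MLPWithLoRA(nn.Module):
    def __init__(self, num_features, num_hidden, num_classes, rank=4, alpha=8):
        super().__init__()
        self.linear_1 = nn.Linear(num_features, num_hidden)
        self.linear_2 = nn.Linear(num_hidden, num_classes)
        self.lora_1 = LoRALayer(num_features, num_hidden, rank, alpha)
        self.lora_2 = LoRALayer(num_hidden, num_classes, rank, alpha)

    def forward(self, x):
        # The LoRA update is added to each linear layer's output.
        x = self.linear_1(x) + self.lora_1(x)
        x = F.relu(x)
        x = self.linear_2(x) + self.lora_2(x)
        return x
```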
If you want to implement LoRA by modifying an existing PyTorch model, a simple way is to replace each linear layer with a LinearWithLoRA layer:
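One possible shape for that wrapper (again repeating LoRALayer so the sketch runs on its own; the rank and alpha values are placeholders):

```python
import torch
import torch.nn as nn

class LoRALayer(nn.Module):
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        std_dev = 1 / torch.sqrt(torch.tensor(rank).float())
        self.A = nn.Parameter(torch.randn(in_dim, rank) * std_dev)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * (x @ self.A @ self.B)

class LinearWithLoRA(nn.Module):
    """Wraps an existing nn.Linear and adds a LoRA update to its output."""
    def __init__(self, linear, rank, alpha):
        super().__init__()
        self.linear = linear
        self.lora = LoRALayer(linear.in_features, linear.out_features,
                              rank, alpha)

    def forward(self, x):
        return self.linear(x) + self.lora(x)

# Replacing an existing linear layer in a model looks like:
layer = nn.Linear(8, 16)
wrapped = LinearWithLoRA(layer, rank=4, alpha=8)
```

The wrapper keeps the pretrained linear weights intact; only the small A and B matrices are new.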
A summary of the above concepts is shown in the figure below:
To apply LoRA, this article replaces each existing linear layer in the neural network with a LinearWithLoRA layer that combines the original Linear layer with a LoRALayer.
LoRA can be used with models such as GPT or image-generation models. For simplicity of explanation, this article uses a small BERT model (DistilBERT) for text classification.
Since this article only trains the new LoRA weights, all of the pretrained model's parameters must be frozen by setting their requires_grad attribute to False:
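The freezing loop itself is model-agnostic. The tutorial applies it to a DistilBERT classifier loaded from Hugging Face transformers (shown only as a comment here to keep the sketch self-contained; the stand-in module is purely illustrative):

```python
import torch.nn as nn

# In the tutorial the model is a DistilBERT classifier, e.g.:
# from transformers import AutoModelForSequenceClassification
# model = AutoModelForSequenceClassification.from_pretrained(
#     "distilbert-base-uncased", num_labels=2)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))  # stand-in

# Freeze every parameter so only the LoRA weights added later will train.
for param in model.parameters():
    param.requires_grad = False
```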
Next, use print(model) to inspect the model's structure:
The output shows that the model consists of 6 transformer layers, each containing linear layers (the attention projections q_lin, k_lin, v_lin, out_lin and the feed-forward layers lin1, lin2):
In addition, the model has two linear output layers (pre_classifier and classifier):
LoRA can be selectively enabled for these linear layers by defining an assignment function and a loop:
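Raschka's tutorial walks DistilBERT's submodules explicitly and reassigns chosen layers (e.g. the attention's q_lin and v_lin); the sketch below shows the same idea generically, with a partial-application assignment function and a recursion that swaps every nn.Linear. LoRALayer and LinearWithLoRA are repeated so the sketch is self-contained:

```python
import torch
import torch.nn as nn
from functools import partial

class LoRALayer(nn.Module):
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        std_dev = 1 / torch.sqrt(torch.tensor(rank).float())
        self.A = nn.Parameter(torch.randn(in_dim, rank) * std_dev)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * (x @ self.A @ self.B)

class LinearWithLoRA(nn.Module):
    def __init__(self, linear, rank, alpha):
        super().__init__()
        self.linear = linear
        self.lora = LoRALayer(linear.in_features, linear.out_features,
                              rank, alpha)

    def forward(self, x):
        return self.linear(x) + self.lora(x)

# Assignment function with the LoRA hyperparameters fixed (values illustrative).
assign_lora = partial(LinearWithLoRA, rank=4, alpha=8)

def enable_lora(module):
    """Recursively replace every nn.Linear with a LoRA-wrapped version.
    The tutorial instead targets specific layers, e.g.
    layer.attention.q_lin = assign_lora(layer.attention.q_lin)."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, assign_lora(child))
        else:
            enable_lora(child)
```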
Check the model again with print(model) to confirm its updated structure:
As the output above shows, the Linear layers have been successfully replaced by LinearWithLoRA layers.
If you train the model with the default hyperparameters shown above, you get the following performance on the IMDb movie review classification dataset:
In the next section, these LoRA fine-tuning results are compared with traditional fine-tuning results.
In the previous section, LoRA achieved a test accuracy of 89.44% under the default settings. How does this compare to traditional fine-tuning methods?
For comparison, this article conducted another experiment, again training the DistilBERT model, but updating only the last 2 layers. This is done by freezing all model weights and then unfreezing the two linear output layers:
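A sketch of that baseline. DistilBERT's sequence-classification head exposes pre_classifier and classifier layers; the stand-in module below only mimics those attribute names so the sketch runs without downloading the model:

```python
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Stand-in mirroring the head attribute names of DistilBERT's classifier."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)          # stands in for the transformer
        self.pre_classifier = nn.Linear(8, 8)
        self.classifier = nn.Linear(8, 2)

model = TinyClassifier()

# Freeze everything...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze only the two linear output layers.
for param in model.pre_classifier.parameters():
    param.requires_grad = True
for param in model.classifier.parameters():
    param.requires_grad = True
```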
The classification performance obtained by training only the last two layers is as follows:
The results show that LoRA outperforms the traditional approach of fine-tuning the last two layers while using 4 times fewer parameters. Fine-tuning all layers required updating 450 times more parameters than the LoRA setup, yet improved test accuracy by only 2%.
The results above were all obtained with LoRA under its default settings, using the following hyperparameters:
If you want to try a different hyperparameter configuration, you can use the following command:
The optimal hyperparameter configuration found was as follows:
Under this configuration, the result is:
Notably, even though the LoRA setup trains only a small set of parameters (500k vs. 66M), it achieves slightly higher accuracy than full fine-tuning.
Original link: https://lightning.ai/lightning-ai/studios/code-lora-from-scratch?cnotallow=f5fc72b1f6eeeaf74b648b2aa8aaf8b6