In the era of large models, both the size of large language models (LLMs) and the scale of their training data have grown rapidly, and that data now includes code alongside natural language.
Code is the intermediary between humans and computers, translating high-level goals into executable intermediate steps. It is characterized by standardized syntax, logical consistency, abstraction, and modularity.
A research team at the University of Illinois at Urbana-Champaign recently published a survey summarizing the multiple benefits of integrating code into LLM training data.
Paper link: https://arxiv.org/abs/2401.00812v1
In addition to improving LLMs' code generation capabilities, the benefits include the following three points:
1. Code helps unlock LLMs' reasoning capabilities, enabling them to be applied to a range of more complex natural language tasks;
2. Code guides LLMs to generate structured and precise intermediate steps, which can then be connected to external execution endpoints through function calls;
3. Code compilation and execution environments provide more diverse feedback signals for further improving the model.
In addition, the researchers trace how LLMs' abilities to understand instructions, decompose goals, plan and execute actions, and refine themselves from feedback play a key role in downstream tasks.
Finally, the article also proposes key challenges and future research directions in the field of "enhancing LLM with code".
Taking OpenAI's Codex as an example, pre-training an LLM on code expands the range of tasks it can handle: beyond natural language processing, the model can also generate code for mathematical problems, perform general programming tasks, carry out data retrieval, and more.
Code generation tasks have two characteristics: 1) a code sequence must execute successfully, so it requires coherent logic; 2) each intermediate step can be verified step by step (step-by-step logic verification).
Embedding code in pre-training data also improves the performance of Chain-of-Thought (CoT) prompting on traditional natural language downstream tasks, indicating that code training improves LLMs' ability to perform complex reasoning.
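This connects naturally to program-aided reasoning, where the model writes its intermediate steps as executable code rather than free-form text. Below is a minimal illustrative sketch in this style; the word problem and the generated program are made up for illustration, not taken from the paper:

```python
# Illustrative program-aided reasoning (PAL/PoT style): instead of free-form
# chain-of-thought text, the model emits executable intermediate steps whose
# result can be checked by running them. The problem below is a made-up example.

def solve():
    # "A store sells pens at $3 each. Alice buys 4 pens and pays with a $20
    # bill. How much change does she get?"
    price_per_pen = 3
    pens_bought = 4
    paid = 20
    total_cost = price_per_pen * pens_bought  # step 1: 12
    change = paid - total_cost                # step 2: 8
    return change

print(solve())  # 8 -- each intermediate value can be verified step by step
```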
By implicitly learning from the structured form of code, code LLMs also perform better on structured commonsense reasoning tasks, such as those involving markup, HTML, and chart understanding.
Recent research indicates that connecting LLMs to other functional endpoints (i.e., augmenting LLMs with external tools and execution modules) helps them perform tasks more accurately and reliably.
These functional endpoints enable LLMs to acquire external knowledge, engage with multimodal data, and interact effectively with the environment.
Across related work, the researchers observe a common trend: LLMs establish connections with other functional endpoints by generating programming languages or invoking predefined functions, a "code-centric" paradigm.
In contrast to inference pipelines with strictly hard-coded tool calls, the code-centric paradigm lets the LLM dynamically generate tokens that invoke execution modules with adaptable parameters. This gives the LLM a simple and clear way to interact with other functional endpoints, enhancing the flexibility and scalability of its applications.
Importantly, this paradigm allows LLMs to interact with numerous functional endpoints across different modalities and domains; by expanding the number and variety of accessible endpoints, LLMs can handle more complex tasks.
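As a rough sketch of how this code-centric dispatch might work in practice, consider the toy example below; the tool registry, the JSON call format, and both tool functions are assumptions made for illustration rather than an interface described in the survey:

```python
import json

# Hypothetical tool registry: the LLM selects an endpoint by name and fills in
# parameters; new endpoints can be added without changing the dispatch logic.
def web_search(query: str) -> str:
    return f"top results for: {query}"   # stub standing in for a real search API

def calculator(expression: str) -> str:
    # Demo-only evaluation of arithmetic; never eval untrusted input in production.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"web_search": web_search, "calculator": calculator}

def dispatch(llm_output: str) -> str:
    """Parse an LLM-generated call like {"tool": ..., "args": {...}} and run it."""
    call = json.loads(llm_output)
    return TOOLS[call["tool"]](**call["args"])

# At inference time the LLM would generate this string with adaptable parameters:
print(dispatch('{"tool": "calculator", "args": {"expression": "3 * 4 + 8"}}'))  # 20
```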
The survey focuses on textual and multimodal tools connected to LLMs, as well as physical-world functional endpoints such as robots and autonomous vehicles, demonstrating the versatility of LLMs in solving problems across modalities and domains.
LLMs can perform beyond what their training parameters alone capture, in part because of their ability to absorb feedback signals, especially in non-static real-world applications.
However, feedback signals must be chosen carefully, because noisy cues can hinder an LLM's performance on downstream tasks.
Moreover, since human annotation is expensive, it is crucial to collect feedback automatically while maintaining fidelity.
Embedding LLMs into a code execution environment enables automatic feedback that meets all of these conditions.
Because code execution is largely deterministic, the feedback an LLM receives from execution results remains faithful to the target task; the code interpreter also provides an automatic path for the LLM to query internal feedback, debugging and refining erroneous generated code without manual annotation.
Additionally, the code environment allows LLMs to integrate a variety of external feedback forms, including but not limited to binary correctness feedback, natural language explanations of results, and ranked reward values, enabling highly customizable approaches to improving performance.
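A minimal sketch of such an automatic execution-feedback loop is shown below; the `generate` callable stands in for the LLM, and the retry logic and feedback format are illustrative assumptions, not a procedure specified in the survey:

```python
import subprocess
import sys
import tempfile

def run_candidate(code: str) -> tuple[bool, str]:
    """Execute generated code in a subprocess and return (passed, feedback).
    Largely deterministic execution keeps the feedback faithful to the task."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True, timeout=10)
    if result.returncode == 0:
        return True, result.stdout   # binary success signal plus program output
    return False, result.stderr      # the traceback serves as debugging feedback

def refine_loop(task: str, generate, max_rounds: int = 3) -> str:
    """generate(task, feedback) is a hypothetical stand-in for the LLM call."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate(task, feedback)
        passed, feedback = run_candidate(code)
        if passed:
            return code              # success reached without manual annotation
    return code                      # return the last attempt if all rounds fail
```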
The causal relationship between code pre-training and enhanced LLM reasoning
While it seems intuitive that certain properties of code data may contribute to the reasoning capabilities of LLMs, the exact extent of their impact on enhancing reasoning skills remains obscure.
An important next step is to study whether strengthening these attributes of code in the training data actually enhances the reasoning ability of the trained LLMs.
If it is true that pre-training on specific properties of code can directly improve the reasoning capabilities of LLMs, then understanding this phenomenon will be key to further improving the complex reasoning capabilities of current models.
Reasoning capabilities not limited to code
Although code pre-training enhances reasoning ability, the underlying models still lack the human-like reasoning capabilities expected of true artificial general intelligence.
Beyond code, a large number of other textual data sources have the potential to enhance LLM reasoning, and the inherent characteristics of code, such as its lack of ambiguity, executability, and logical sequential structure, offer guiding principles for collecting or creating such datasets.
However, as long as we stick to the paradigm of training language models on large corpora with a language-modeling objective, it is hard to imagine a sequentially readable language more abstract than formal language: highly structured, closely tied to symbolic language, and abundantly present in digital network environments.
The researchers envision that exploring alternative data modalities, diverse training objectives, and novel architectures will provide more opportunities to further enhance models' reasoning capabilities.
Challenges in applying the code-centric paradigm
When code is used to connect LLMs to different functional endpoints, the main challenge is learning the correct way to invoke different functions, including selecting the right functional endpoint and passing the correct parameters at the appropriate time.
For example, even for a simple task such as web page navigation, given a limited set of action primitives such as mouse movement, clicking, and page scrolling, a capable base LLM often needs a few examples (few-shot) to master the use of these primitives accurately.
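To make this concrete, here is a hypothetical sketch of such an action-primitive interface; the primitive names, coordinates, and prompt format are all illustrative assumptions rather than a benchmark from the paper:

```python
# Hypothetical web-navigation action primitives. At each step the LLM must
# select the right primitive and fill in correct parameters.
from dataclasses import dataclass

@dataclass
class Action:
    name: str        # one of: "move", "click", "scroll"
    x: int = 0
    y: int = 0
    delta: int = 0

FEW_SHOT_PROMPT = """\
Task: open the search box
move(x=512, y=40)
click(x=512, y=40)

Task: read the next page of results
scroll(delta=800)

Task: {task}
"""

def parse_action(line: str) -> Action:
    """Parse one LLM-generated step, e.g. 'click(x=512, y=40)'."""
    name, _, args = line.partition("(")
    pairs = args.rstrip(")").split(", ") if args.rstrip(")") else []
    kwargs = {k: int(v) for k, v in (p.split("=") for p in pairs)}
    return Action(name=name.strip(), **kwargs)

print(parse_action("click(x=512, y=40)"))  # Action(name='click', x=512, y=40, delta=0)
```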
For more complex tasks in data-intensive fields such as chemistry, biology, and astronomy, which involve calls to domain-specific Python libraries containing many complex functions, enhancing LLMs' ability to learn to call these functions correctly is a forward-looking direction that would enable LLMs to perform expert-level tasks in fine-grained domains.
Learning from multi-turn interaction and feedback
LLMs often require multiple rounds of interaction with users and the environment, continuously correcting themselves to improve completion of complex tasks.
While code execution provides reliable and customizable feedback, a perfect way to fully exploit this feedback has not yet been established.
Current selection-based methods are useful, but they cannot guarantee improved performance and are inefficient; recursion-based methods rely heavily on the in-context learning ability of the LLM, which may limit their applicability; and while fine-tuning achieves lasting improvement, data collection and fine-tuning are resource-intensive, making them difficult to use in practice.
The researchers believe reinforcement learning may be a more effective way to exploit feedback and improve, providing a dynamic way to adapt to feedback through carefully designed reward functions and potentially addressing the limitations of current techniques.
But a lot of research is still needed to understand how to design reward functions and how to best integrate reinforcement learning with LLMs to complete complex tasks.
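As a hedged illustration of what such a reward function could look like when unit tests are available, consider the sketch below; the entry-point convention and the penalty values are illustrative choices, not a design from the paper:

```python
def execution_reward(code: str, tests: list) -> float:
    """Shape code-execution feedback into a scalar reward for RL.
    Each test is an (args, expected) pair; the weights are illustrative."""
    namespace: dict = {}
    try:
        exec(code, namespace)             # does the program even run?
    except Exception:
        return -1.0                       # penalize code that fails to execute
    solution = namespace.get("solution")  # hypothetical entry-point convention
    if solution is None:
        return -0.5                       # runs, but defines no entry point
    passed = 0
    for args, expected in tests:
        try:
            if solution(*args) == expected:
                passed += 1
        except Exception:
            pass                          # a failing test earns no credit
    return passed / len(tests)            # dense reward: fraction of tests passed

# Example: reward a candidate that defines solution(x) = x squared
candidate = "def solution(x):\n    return x * x\n"
print(execution_reward(candidate, [((2,), 4), ((3,), 9)]))  # 1.0
```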