Fudan University team releases DISC-LawLLM, a Chinese smart legal system, builds a judicial evaluation benchmark, and open-sources 300,000 fine-tuning samples

With the rise of smart justice, smart legal systems driven by intelligent methods are expected to benefit different groups: easing paperwork for legal professionals, providing legal advisory services to the general public, and providing study and exam coaching for law students.

Because legal knowledge is highly specialized and judicial tasks are diverse, previous research on smart justice mainly focused on designing automated algorithms for specific tasks, which falls far short of the supporting services the judicial field needs. Large language models (LLMs) have demonstrated powerful capabilities across many traditional tasks, bringing hope for the further development of intelligent legal systems.

Recently, Fudan University's Data Intelligence and Social Computing Laboratory (FudanDISC) released DISC-LawLLM, a Chinese smart legal system driven by a large language model. The system can provide a variety of legal services to different user groups. The laboratory also built an evaluation benchmark, DISC-Law-Eval, to evaluate legal language models from both objective and subjective perspectives. In this evaluation, the model shows clear advantages over existing large legal models.

The research team also released DISC-Law-SFT, a high-quality supervised fine-tuning (SFT) dataset containing 300,000 samples. The model parameters and the technical report are open source as well.

Home page address: https://law.fudan-disc.com
Github address: https://github.com/FudanDISC/DISC-LawLLM
Technical report: https://arxiv.org/abs/2309.11325

01 Sample Display

When users have legal questions, they can describe them to the model, which returns the relevant laws and regulations with explanations, recommended solutions, and so on.

Legal professionals and judicial agencies can use the model for legal text summarization, judicial event detection, entity and relationship extraction, and similar tasks, reducing paperwork and improving work efficiency.

Figure 2 Judicial document analysis

Students preparing for the judicial examination can ask the model questions to help consolidate their legal knowledge and answer legal exam questions.

When an answer needs the support of legal provisions, the model searches the knowledge base for content relevant to the question and gives a reply based on it.

02 Introduction to DISC-LawLLM

DISC-LawLLM is a large legal model obtained by full-parameter instruction fine-tuning of Baichuan-13B, a general-domain Chinese large model, on DISC-Law-SFT, the high-quality dataset we built. It is worth noting that our training data and training methods can be adapted to any base large model.

DISC-LawLLM has three core capabilities:

1. Basic legal text processing. To cover the basic capabilities of legal text understanding and generation, including information extraction and text summarization, we constructed fine-tuning data from existing public datasets for judicial NLP tasks and from real-world legal texts.
2. Legal reasoning. To meet the needs of tasks in smart justice, we used the legal syllogism, the basic reasoning process of judges, to reconstruct the instruction data, which effectively improves the model's legal reasoning ability.
3. Retrieving and following judicial knowledge. Solving problems in smart justice usually requires searching for the laws or cases relevant to the question. To strengthen the system's ability to retrieve and follow such material, we equipped it with a retrieval enhancement module.

The overall framework of the model is shown in Figure 5.

03 Construction of the DISC-Law-SFT dataset

Figure 6 The structure of DISC-Law-SFT

DISC-Law-SFT is divided into two sub-datasets, DISC-Law-SFT-Pair and DISC-Law-SFT-Triplet. The former introduces legal reasoning capabilities into the LLM, while the latter helps improve the model's ability to use external knowledge.

Table 1: Overview of the contents of the DISC-Law-SFT dataset

Data source

The data in DISC-Law-SFT comes from three sources. The first is public datasets for NLP judicial tasks related to Chinese law, including legal information extraction, entity and relationship extraction, judicial text summarization, judicial examination question answering, judicial reading comprehension, and crime/sentence prediction. The second is original legal texts collected from the real world, such as laws and regulations, judicial cases, judgment documents, and judicial examinations. The third is general open-source datasets: we used alpaca_gpt4_data_zh and Firefly, which enrich the diversity of the training set and reduce the risk of the model's basic capabilities degrading during the SFT training phase.

Instruction pair construction

After converting the data from the first and second sources above into "input-output" instruction pairs, we use the following three methods to reconstruct the instruction data and improve its quality.

Behaviour Shaping

In a legal syllogism, the major premise is the applicable legal rule, the minor premise is the facts of the case, and the conclusion is the legal judgment. This constitutes the basic legal reasoning process of judges. Every case can be brought to a clear conclusion through a syllogism, as follows:
Major premise: legal rules
Minor premise: facts of the case
Conclusion: Legal Judgment

We use GPT-3.5-turbo to reconstruct and refine the outputs for behavior shaping, ensuring that each conclusion is drawn from a legal provision and the facts of the case.
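
As an illustration of the syllogism-shaped output described above, here is a minimal Python sketch. The class name `SyllogismOutput`, its field names, and the `is_complete` check are assumptions made for this example and are not taken from the DISC-LawLLM code or data; the placeholder strings stand in for real legal text.

```python
from dataclasses import dataclass


@dataclass
class SyllogismOutput:
    """Hypothetical container for one syllogism-shaped answer."""
    major_premise: str  # the applicable legal rule
    minor_premise: str  # the facts of the case
    conclusion: str     # the legal judgment

    def is_complete(self) -> bool:
        # A conclusion should be backed by both a rule and the case facts,
        # so all three parts must be non-empty.
        parts = (self.major_premise, self.minor_premise, self.conclusion)
        return all(part.strip() for part in parts)

    def render(self) -> str:
        # Plain-text layout mirroring the three-line structure above.
        return (
            f"Major premise: {self.major_premise}\n"
            f"Minor premise: {self.minor_premise}\n"
            f"Conclusion: {self.conclusion}"
        )


example = SyllogismOutput(
    major_premise="<applicable legal rule>",
    minor_premise="<facts of the case>",
    conclusion="<legal judgment>",
)
print(example.render())
```

A completeness check of this kind is one simple way to reject reconstructed outputs that state a judgment without both premises.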

Knowledge expansion

For multiple-choice questions, where behavior shaping is not applicable, we directly use legal knowledge to extend the output and provide more reasoning details. Many law-related exams and knowledge competitions only provide the answer options, so we use an LLM to expand on the legal knowledge involved, give the correct answer, and reconstruct the instruction pairs.

Thinking Cultivation

Chain of Thought (CoT) has been proven to effectively improve a model's reasoning ability. To further give the model legal reasoning ability, we designed a chain of thought with specific legal meaning, called LCoT, which requires the model to derive its answer through a legal syllogism. LCoT converts the input into a prompt that presents the case and asks for a syllogism-based answer.

Instruction triplet construction

To train the retrieval-enhanced model, we constructed the DISC-Law-SFT-Triplet sub-dataset, whose data are triples of the form <input, output, reference>. We use the three strategies listed under instruction pair construction to process the original data and obtain the input and output, and we design heuristic rules to extract the reference information from the original data.
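
The difference between the two sub-datasets can be sketched as two record types. The Python below is only an illustration: the class names, the field names `input`, `output`, and `reference`, and the JSON Lines layout are assumptions for this example, not a description of the released files.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PairRecord:
    """Hypothetical DISC-Law-SFT-Pair style record."""
    input: str
    output: str


@dataclass
class TripletRecord:
    """Hypothetical DISC-Law-SFT-Triplet style record."""
    input: str
    output: str
    # Reference material (laws, cases) extracted from the original data.
    reference: List[str] = field(default_factory=list)


def to_jsonl(records) -> str:
    # One JSON object per line; ensure_ascii=False keeps Chinese text readable.
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


triplet = TripletRecord(
    input="<question>",
    output="<answer>",
    reference=["<relevant legal provision>"],
)
print(to_jsonl([triplet]))
```

Keeping the reference as a separate field lets the same input and output be used with or without retrieved context during training.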

04 Experiment

Training

The training process of DISC-LawLLM is divided into two stages: SFT and retrieval enhancement.

Retrieval enhancement

Although we fine-tune the LLM with high-quality instruction data, it may still produce inaccurate responses due to hallucinations or outdated knowledge. To solve this problem, we designed a retrieval module to enhance DISC-LawLLM.

Given a user input, the retriever returns the Top-K most relevant documents from the knowledge base by calculating their similarity to the input. These candidate documents, together with the user input, are assembled using a template we designed and then fed into DISC-LawLLM. By querying the knowledge base, the model can better understand the major premise, resulting in more accurate and reliable answers.
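
The retrieve-then-prompt flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the bag-of-words `embed` function stands in for whatever encoder the real retriever uses, and the prompt template, function names, and sample knowledge base are invented for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a real text encoder: lowercase bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, knowledge_base: list, k: int = 2) -> list:
    # Rank every document by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list) -> str:
    # Hypothetical template: numbered references first, then the question.
    refs = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return f"Reference material:\n{refs}\n\nQuestion: {query}\nAnswer:"


knowledge_base = [
    "contract law article on breach of contract and damages",
    "criminal law article on theft and sentencing",
    "labor law article on overtime pay",
]
query = "what damages apply after a breach of contract"
top_docs = retrieve(query, knowledge_base, k=2)
print(build_prompt(query, top_docs))
```

In a real system the similarity would come from dense embeddings and a vector index, but the control flow (score, take Top-K, fill a template, pass to the model) stays the same.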

Figure 7: Retrieval-enhanced DISC-LawLLM

Evaluation method

Evaluation Benchmark DISC-Law-Eval

We built DISC-Law-Eval, a fair benchmark for evaluating smart legal systems. It evaluates from both objective and subjective perspectives, filling the gap left by the lack of a benchmark that comprehensively evaluates smart legal systems.

Figure 8: DISC-Law-Eval evaluation benchmark

Objective Evaluation

To objectively and quantitatively evaluate the legal knowledge and reasoning capabilities of a smart legal system, we designed an objective evaluation dataset. It consists of single-answer and multiple-answer multiple-choice questions drawn from China's standardized legal examinations and knowledge competitions, and the questions are divided into three levels (hard, normal, and easy) based on content complexity and deductive difficulty. This provides a more challenging and reliable way to measure whether the model can use its knowledge to reason its way to the correct answer. We report performance as accuracy.
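
A per-level accuracy calculation of this kind might look like the Python sketch below. The record fields (`level`, `gold`, `predicted`) and the sample items are made up for illustration, and the scoring rule for multiple-answer questions (the full set of options must match, with no partial credit) is an assumption rather than a documented detail of DISC-Law-Eval.

```python
from collections import defaultdict


def is_correct(predicted: str, gold: str) -> bool:
    # Compare option sets so "DBA" matches "ABD".
    # Assumption: a partially correct multiple-answer response counts as wrong.
    return set(predicted.upper()) == set(gold.upper())


def accuracy_by_level(items) -> dict:
    hits = defaultdict(int)
    totals = defaultdict(int)
    for item in items:
        totals[item["level"]] += 1
        hits[item["level"]] += is_correct(item["predicted"], item["gold"])
    return {level: hits[level] / totals[level] for level in totals}


# Invented sample items, not benchmark data.
items = [
    {"level": "easy", "gold": "A", "predicted": "A"},
    {"level": "easy", "gold": "C", "predicted": "B"},
    {"level": "hard", "gold": "ABD", "predicted": "DBA"},
    {"level": "hard", "gold": "AC", "predicted": "A"},
]
print(accuracy_by_level(items))  # {'easy': 0.5, 'hard': 0.5}
```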

Subjective evaluation

For the subjective evaluation, we use a question-and-answer paradigm that simulates subjective exam questions. We hand-constructed a high-quality test set from legal consultations, online forums, justice-related publications, and legal documents. We use GPT-3.5-turbo as a referee model to evaluate each model's output, scoring it from 1 to 5 on three criteria: accuracy, completeness, and clarity.
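
Once the referee model has returned its scores, aggregating them is straightforward. The following Python sketch assumes each judgment is a dictionary with the three criteria as keys, and it assumes the overall score is the unweighted mean of the three criterion averages; both are illustrative choices, and the sample scores are invented.

```python
from statistics import mean

CRITERIA = ("accuracy", "completeness", "clarity")


def validate(score: dict) -> dict:
    # Every criterion must be present and within the 1 to 5 range.
    for name in CRITERIA:
        if not 1 <= score[name] <= 5:
            raise ValueError(f"{name} score out of range: {score[name]}")
    return score


def summarize(scores: list) -> dict:
    checked = [validate(s) for s in scores]
    summary = {name: mean(s[name] for s in checked) for name in CRITERIA}
    # Assumption: overall score is the plain average of the three criteria.
    summary["overall"] = mean(summary[name] for name in CRITERIA)
    return summary


# Invented referee scores for two answers from one model.
referee_scores = [
    {"accuracy": 4, "completeness": 3, "clarity": 5},
    {"accuracy": 2, "completeness": 3, "clarity": 5},
]
print(summarize(referee_scores))
```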

Evaluation results

Comparison models

We compare our model DISC-LawLLM (without the external knowledge base) with four general LLMs and four Chinese legal LLMs: GPT-3.5-turbo, ChatGLM-6B, Baichuan-13B-Chat, and Chinese-Alpaca2-13B on the general side, and LexiLaw, LawGPT, Lawyer LLaMA, and ChatLaw on the legal side.

Objective evaluation results

DISC-LawLLM outperforms all compared large models of the same parameter scale on the tests at every difficulty level. Even compared to GPT-3.5-turbo with 175B parameters, DISC-LawLLM shows superior performance on some tests. Table 2 shows the objective evaluation results, in which bold indicates the best result and underline indicates the second-best result.

Table 2: Objective evaluation results

Subjective evaluation results

In the subjective evaluation, DISC-LawLLM received the highest overall score and the highest scores on two criteria, accuracy and clarity. Table 3 shows the subjective evaluation results, where bold indicates the best results.

Table 3: Subjective evaluation results

05 Summary

We released DISC-LawLLM, an intelligent legal system that provides legal services in multiple application scenarios. Starting from public NLP task datasets in the legal field, original legal texts, and open-source general instruction datasets, we reconstructed legal instructions according to the legal syllogism and used them for supervised fine-tuning. To improve the reliability of the output, we added an external retrieval module. By improving legal reasoning and knowledge retrieval capabilities, DISC-LawLLM outperforms existing legal LLMs on the legal benchmark we constructed. Research in this field opens up more prospects and possibilities, such as a more balanced distribution of legal resources. We have released the constructed dataset and model weights to promote further research.
