The focus of this year’s upgrade is the introduction of multi-modal large model capabilities.
As the videos and music generated by Sora and Suno spark an audiovisual revolution around the world, how will large multimodal models be applied in industry? On March 27, Innovation Qizhi, China's leading "AI manufacturing" solution provider, unveiled its forward-looking answer.
After half a year of hard work, Innovation Qizhi released the more powerful version 2.0 of its Qizhi Haiming Industrial Large Model (AInno-75B) at a press conference in Beijing. Several large-model native applications also made their debut, including ChatVision and ChatCAD, and ChatRobot was upgraded to a Pro version.
## Zhang Faen, CTO of Innovation Qizhi, at the press conference
Scaling laws help researchers and engineers predict the performance gains from increasing model size, as well as the number of parameters needed to reach a specific performance target. The industry has by now reached a rough consensus that increasing the parameter count improves model performance. Compared with AInno-15B, AInno-75B has grown significantly in both size and performance.
The focus of this year's upgrade is the introduction of multimodal large-model capabilities. Zhang Faen explained that the new model can handle multiple information modalities, including text, pictures, and videos, and can even integrate data types unique to industrial scenarios, such as CAD drawings and EEG signals. Its output is equally diverse: it can generate text, images, videos, CAD design drawings, or operating actions for a machine body.
To break this situation, Innovation Qizhi took the lead in bringing industrial large-model technology into the field of industrial design and launched a Text-to-CAD application, "ChatCAD". Through simple question-and-answer dialogue, it quickly grasps the designer's creative intent, automatically generates industrial design drawings that meet the requirements, and exports them to traditional software for fine-tuning.
Live demonstration of industrial pulley design
ChatCAD can handle even lengthy and complex component design requirements. For example: "Help me design a turbine. The turbine consists of a motor and an engine cover. The specific requirements are as follows: the motor is cylindrical, 20 in length and 16 in diameter. The turbine consists of a cylindrical turbine shaft and 5 fan blades. The turbine shaft is 20 in length and 12 in diameter. The top of the turbine should have a cylindrical cone rotating shaft; the shaft cap is 9 in length and 12 in diameter. The engine cover has a diameter of 50 and a length of 30, and the distance between the turbine blades and the engine cover is 1."
Even then, ChatCAD produces a result and keeps refining it based on feedback. The designs it generates support mainstream file formats and connect seamlessly to other industrial software, which eases subsequent integration and modification.
Live demonstration of turbine design
This feature excited Mr. Wang. He believes ChatCAD could help the industry reduce repetitive labor and avoid rigid specification restrictions, which in turn would affect manual-labor quotations across the entire industry.
So, how is ChatCAD implemented? Zhang Faen explained that CAD differs from common modalities such as text, pictures, and videos: it needs to represent geometric data such as points, lines, edges, circles, and columns, as well as processes. "So we also call it a modality, one that the consumer (C) side does not have. We need to invent our own intermediate language to express CAD, have the large model generate this intermediate language or intermediate code, and then translate the intermediate code into CAD."
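The "intermediate language, then CAD" idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the JSON format, the `Cylinder` type, the `z_offset` placement, and the choice of OpenSCAD as the output format are hypothetical and are not Innovation Qizhi's actual intermediate language. The dimensions reuse the motor and turbine shaft from the turbine prompt.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Cylinder:
    """One primitive of a hypothetical intermediate representation."""
    name: str
    length: float
    diameter: float
    z_offset: float = 0.0  # assumed placement along the shared axis


def parse_intermediate(text: str) -> List[Cylinder]:
    """Parse the (hypothetical) JSON intermediate code a model might emit."""
    parts = []
    for item in json.loads(text):
        if item["type"] != "cylinder":
            raise ValueError(f"unsupported primitive: {item['type']}")
        parts.append(
            Cylinder(
                name=item["name"],
                length=item["length"],
                diameter=item["diameter"],
                z_offset=item.get("z_offset", 0.0),
            )
        )
    return parts


def to_openscad(parts: List[Cylinder]) -> str:
    """Translate the intermediate primitives into OpenSCAD source text."""
    lines = []
    for p in parts:
        lines.append(f"// {p.name}")
        lines.append(
            f"translate([0, 0, {p.z_offset:g}]) "
            f"cylinder(h={p.length:g}, d={p.diameter:g});"
        )
    return "\n".join(lines)


# Stand-in for model output; a real system would generate this from the prompt.
MODEL_OUTPUT = """
[
  {"type": "cylinder", "name": "motor", "length": 20, "diameter": 16},
  {"type": "cylinder", "name": "turbine_shaft", "length": 20, "diameter": 12,
   "z_offset": 20}
]
"""

if __name__ == "__main__":
    print(to_openscad(parse_intermediate(MODEL_OUTPUT)))
```

Keeping the model's output in a small, checkable intermediate form means malformed primitives can be rejected before any CAD file is written.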
The generated design can be used directly for processing, but complex designs still need to be perfected. The goal of ChatCAD is to become a right-hand assistant for engineers in design institutes. It is expected to shorten a design process that originally took ten hours to one hour, with the large model responsible for 90% of the work and the remaining 10% optimized manually.
It is worth mentioning that Innovation Qizhi has integrated large-model technology into industrial software such as CAD, MES, and BI, enabling intelligent transformation and upgrading across the whole "R&D design, production control, information management" process.
## 2. ChatVision: A new tool for industrial safety supervision
In the live demonstration, ChatVision carried out instructions such as "Find the power socket in the picture" and "Find the white hard hat" to locate specific targets.
These instructions seem very simple. Without a large model, however, a dedicated algorithm has to be developed for each narrow recognition category (such as hard hats or smoking). Such an algorithm is hard to modify once it has been debugged and deployed, and implementation is costly and slow. Large models overturn this traditional paradigm: a single large model can cover the functions of multiple small models, surpass them in performance, accuracy, and generalization, and support natural-language interaction, which greatly simplifies development and deployment.
During the live demonstration, the scene changed: one colleague took off his work hat to play with his mobile phone, and another took off his safety clothing. The demonstrator issued the instruction: "Please analyze this picture carefully. If there are any violations, send an email to the administrator."
This instruction is very knowledge-intensive. It involves not only judging violations but also deciding whether to trigger an email and who the recipients are. This is the typical service model of large-model native applications. In response, ChatVision called multiple security monitoring skills in the background; it not only marked three violations but also sent an email with screenshots.
There is a clear demonstration in the officially released ChatVision DEMO
The ChatVision demonstration fully reflects the planning and reasoning capabilities of industrial large models. It can convert user intentions into a series of external tool calls to complete complex video understanding tasks in an orderly manner.
Zhang Faen, CTO of Innovation Qizhi, said that the company has accumulated more than 200 visual algorithms and model assets in the past few years, and industrial large models have opened up new horizons for the application of these assets. Large models can not only act as intelligent orchestrators to optimize user experience, but their multimodal capabilities can also enhance video understanding and play a significant role in the field of enterprise security.
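The pattern described here, turning one user request into an ordered series of tool calls plus a conditional follow-up action, can be sketched as follows. This is a minimal illustration under stated assumptions, not ChatVision's implementation: the skill names, the `Frame` stand-in (tags instead of pixels), the hard-coded `PLAN`, and the `Outbox` class are all hypothetical. In a real system the plan would be produced by the large model and the skills would be actual vision algorithms.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Frame:
    """Stand-in for a video frame: tags a real detector would infer from pixels."""
    tags: List[str]


# Hypothetical detection skills; each returns a list of violation descriptions.
def check_hard_hat(frame: Frame) -> List[str]:
    return ["hard hat removed"] if "no_hard_hat" in frame.tags else []


def check_phone_use(frame: Frame) -> List[str]:
    return ["mobile phone in use"] if "phone_use" in frame.tags else []


def check_safety_clothing(frame: Frame) -> List[str]:
    return ["safety clothing removed"] if "no_safety_clothing" in frame.tags else []


SKILLS: Dict[str, Callable[[Frame], List[str]]] = {
    "hard_hat": check_hard_hat,
    "phone_use": check_phone_use,
    "safety_clothing": check_safety_clothing,
}


@dataclass
class Outbox:
    """Records emails instead of sending them, so the sketch has no side effects."""
    sent: List[str] = field(default_factory=list)

    def send_email(self, to: str, body: str) -> None:
        self.sent.append(f"{to}: {body}")


def run_plan(plan: List[str], frame: Frame, outbox: Outbox,
             admin: str = "admin@example.com") -> List[str]:
    """Run each planned skill in order; email the administrator only on violations."""
    violations: List[str] = []
    for skill_name in plan:
        violations.extend(SKILLS[skill_name](frame))
    if violations:
        outbox.send_email(admin, "; ".join(violations))
    return violations


# Assumed plan; a large model would derive this from the user's instruction.
PLAN = ["hard_hat", "phone_use", "safety_clothing"]

if __name__ == "__main__":
    box = Outbox()
    scene = Frame(tags=["no_hard_hat", "phone_use", "no_safety_clothing"])
    print(run_plan(PLAN, scene, box))
    print(box.sent)
```

Separating the plan (which skills to call) from the skills themselves is what lets existing detection algorithms be reused behind a natural-language front end.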
The last demonstration case highlights the cutting-edge application of large models in the multimodal field. Faced with a real workshop video, the demonstrator put forward a difficult requirement: "Please analyze this video carefully, tell me whether anyone is eating and mark the time when this action occurred." This task requires the large model to perform continuous action recognition over a long image sequence and mark the start and end times of the action. As a result, ChatVision accurately located the scene where workers were eating within the first 15 seconds of the video.
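Marking the start and end times of an action can be illustrated with a generic post-processing step: given one yes/no decision per frame, merge consecutive positive frames into time intervals. This is a sketch under assumptions, not a description of how ChatVision works internally; the function name `action_intervals` and the per-frame boolean input are hypothetical.

```python
from typing import List, Sequence, Tuple


def action_intervals(flags: Sequence[bool], fps: float) -> List[Tuple[float, float]]:
    """Merge consecutive positive frames into (start_seconds, end_seconds) intervals.

    flags[i] is True when the action was detected in frame i.
    The end time is the start of the first frame after the run.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    intervals: List[Tuple[float, float]] = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i  # a new run of positive frames begins
        elif not flag and start is not None:
            intervals.append((start / fps, i / fps))
            start = None
    if start is not None:  # the action is still ongoing at the last frame
        intervals.append((start / fps, len(flags) / fps))
    return intervals


if __name__ == "__main__":
    per_frame = [False, False, True, True, True, False, True]
    print(action_intervals(per_frame, fps=1.0))
```

In practice the per-frame decisions would usually be smoothed first, since a single missed frame would otherwise split one action into two intervals.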
"Eating is a very common event, and the ability of large models to understand events is far better than traditional small algorithm models." Zhang Faen explained. For a long time, there has been an urgent need to ensure production and engineering safety through video. In the future, related work around large models will be expected to achieve intelligent video understanding of production safety conditions and production process compliance.
In Wang Xian's view, safety is always the top priority in engineering projects. For many years, engineering safety training has rarely involved on-site hazard identification. He believes that ChatVision has broad application prospects and expects it to be used for on-site safety helmet detection, checking that safety ropes are worn for work at height, checking that safety equipment is carried, and similar scenarios. ChatVision also has great potential in the supervision industry, where many on-site safety inspections still rely heavily on manpower.
ChatRobot, a native application of AInno-15B, already implemented voice control of industrial robots. Just tell ChatRobot "Bring me a cup of coffee", and it directs the industrial robot arm to search for coffee on the shelf and plan its own route to deliver it to you. ChatRobot Pro can process a more complex information carrier: EEG signals.
At the press conference, the demonstrator randomly selected a product (Uniform Green Tea) and asked a person with multiple electrodes fixed on his scalp to use motor imagery to control an industrial robot to put the drink into his hands. The man wearing the EEG collector tries to think of three things: left, right, and select. The cursor on the screen moves left and right according to the signals decoded by the large model. When the cursor reaches the target icon, he stares at the icon and "clicks" to select it.
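The interaction described above, three decoded intents driving a cursor across a row of icons, amounts to a small state machine. The sketch below assumes the EEG decoding has already happened and starts from a list of decoded commands; the command strings, the icon list, and the function `run_cursor` are hypothetical and only illustrate the control flow.

```python
from typing import List, Optional, Sequence


def run_cursor(commands: Sequence[str], icons: List[str], start: int = 0) -> Optional[str]:
    """Apply decoded 'left' / 'right' / 'select' commands to a cursor over icons.

    Returns the selected icon, or None if no 'select' command arrives.
    """
    if not icons:
        raise ValueError("icons must not be empty")
    pos = start
    for cmd in commands:
        if cmd == "left":
            pos = max(0, pos - 1)  # stay on the first icon at the left edge
        elif cmd == "right":
            pos = min(len(icons) - 1, pos + 1)  # stay on the last icon at the right edge
        elif cmd == "select":
            return icons[pos]
        else:
            raise ValueError(f"unknown command: {cmd}")
    return None


if __name__ == "__main__":
    shelf = ["cola", "green tea", "water"]
    print(run_cursor(["right", "select"], shelf))
```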
Next, ChatRobot Pro will independently complete the intelligent orchestration of tasks, generate executable task steps, and interact with the industrial robot interface in real time to instruct the robot to complete the task.
EEG signals are signals generated during brain activity. The relationship between brain activity and EEG signals is very complex, and how to decode it has become a major problem for researchers. While traditional approaches have low accuracy, AInno-75B shows potential for interpreting this type of multimodal information. Some foreign brain-computer interface technologies use invasive electrodes to obtain EEG signals, which involves a series of engineering issues such as electrode design, surgical implantation, rejection reaction, signal transmission, and signal decoding. Innovation Qizhi uses non-invasive EEG caps to collect EEG information, which greatly reduces the engineering difficulty.
However, Zhang Faen also said that the invasive method can obtain more channels and clearer EEG signals, which will facilitate subsequent decoding of more complex brain intentions. A vivid metaphor is: the invasive method of collecting EEG signals is like listening to a concert inside a stadium, while the non-invasive method is like listening to a concert outside the stadium. There will be a big difference in the clarity of the singing. Currently, the research and development work that Innovation Qizhi is doing is to verify the multi-modal capabilities of large industrial models and conduct technical pre-research for possible future brain-controlled industrial automation scenarios.
This is also an end-to-end native application, Zhang Faen emphasized. The entire process from EEG signal input to direct output of the final result (a robotic arm delivering the goods to the demonstrator) is completed by the neural network, without relying on hand-designed features or traditional data processing.
In addition to natural language interaction and motor imagery recognition, ChatRobot Pro also makes full use of the reasoning capabilities of industrial large models to realize long-sequence task orchestration and complex decision-making. Giving powerful intelligent control and decision-making capabilities to different machine bodies (whether industrial robotic arms, AGVs, or others) will also be the future direction of Innovation Qizhi's industrial large models.
In the era of generative AI, there is no precedent for industrial application, and Innovation Qizhi has been exploring the possibilities in industrial scenarios all along.
Zhang Faen calls the prospect of large models in the direction of enterprise services “Promising”. But he admitted that during a window of technological change, understanding is often uneven, especially when the changes are large. People's understanding needs time to catch up, and he is no exception.
In addition to the new native applications, the overall performance and effect of ChatDOC, released last year, have been improved, and its product functions have become more complete. ChatBI has added support for Excel and CSV data, and the accuracy of its generated SQL statements and analysis reports has increased by 15%. The large-model serving engine is easier to deploy and provides higher inference performance.
"Innovation Qizhi will further polish the ChatX application built directly based on the core generation capabilities of industrial large models." Zhang Faen said.