AI large-model melee! Huawei, Alibaba, Tencent and other giants all make their moves within a single day

In recent months, competition among China's domestic large models has been so intense that it has been described as a "clash of the gods". On Friday, the melee reached a new high. According to incomplete statistics from Wallstreetcn, on that day alone Huawei, Alibaba, Tencent, SenseTime, JD.COM and other companies released or updated large models.

In this "Hundred Models War", who is most likely to build China's answer to GPT-4?

On July 7, Huawei Cloud released Pangu Models 3.0 at its Developer Conference 2023. Zhang Pingan, Huawei managing director and CEO of Huawei Cloud, said that Pangu 3.0 is a large model built entirely for industry, with a "5+N+X" three-tier architecture.

Zhang said at the conference that the Pangu models "don't write poems; they only get work done." Pangu will focus on three directions of innovation, "reshaping industries", "deepening technology roots", and "opening up to soar together", continuing to build core competitiveness and to serve industry customers, partners, and developers better.

The three-tier architecture is as follows:

The L0 layer consists of five foundation models: natural language, vision, multimodal, prediction, and scientific computing, which supply the skills needed across industry scenarios. Pangu 3.0 offers customers a series of foundation models with 10 billion, 38 billion, 71 billion, and 100 billion parameters, matching customers' varied needs across scenarios, latency requirements, and response speeds. It also provides a new set of capabilities, including knowledge Q&A, copywriting, and code generation from the NLP model, and image generation and image understanding from the multimodal model. Customers and partner enterprises can call these skills directly, and the capability set is the same regardless of model size.

The L1 layer consists of N industry models. Huawei Cloud provides general-purpose industry models trained on open industry data, covering government affairs, finance, manufacturing, mining, meteorology, and other sectors; it can also train proprietary models for industry customers on top of Pangu's L0 and L1 layers, using the customers' own data.

The L2 layer (the "X") provides more fine-grained scenario models that focus on specific industry applications or business scenarios, such as government service hotlines, branch assistants, lead-compound drug screening, foreign-object detection on conveyor belts, and typhoon track prediction, giving customers "out-of-the-box" model services.

Pangu uses a fully layered, decoupled design, so it can adapt quickly to changing industry needs. Customers can load their own datasets into their models, and upgrade the foundation model and the capability set independently of each other.

On top of the L0 and L1 models, Huawei Cloud also offers a large-model industry development kit. By further training on their own data, customers can obtain an exclusive large model of their own. To meet differing data-security and compliance requirements, Pangu can be deployed on the public cloud, in a dedicated large-model cloud zone, or in a hybrid cloud.
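To make the "5+N+X" layering and its decoupled upgrades more concrete, here is a minimal Python sketch. It is purely illustrative: every class, field, and function name (`FoundationModel`, `IndustryModel`, `SceneModel`, `upgrade_foundation`, `load_dataset`, `add_skill`) is a hypothetical assumption and not a Huawei Cloud or Pangu API. It only shows the idea that the base model, the datasets, and the skill set can each be changed without touching the others.

```python
from dataclasses import dataclass, replace

# Hypothetical model of the "5+N+X" layering. All names here are
# illustrative assumptions, not Huawei Cloud or Pangu APIs.

# L0: the five basic model families named in the announcement.
L0_DOMAINS = ("nlp", "vision", "multimodal", "prediction", "scientific_computing")


@dataclass(frozen=True)
class FoundationModel:
    """L0 layer: a basic model of a given domain and size."""
    domain: str
    params_billion: int
    version: str

    def __post_init__(self) -> None:
        if self.domain not in L0_DOMAINS:
            raise ValueError(f"unknown L0 domain: {self.domain}")


@dataclass(frozen=True)
class IndustryModel:
    """L1 layer (one of N): an industry model built on an L0 model."""
    industry: str
    foundation: FoundationModel
    datasets: tuple[str, ...] = ()
    skills: tuple[str, ...] = ()


@dataclass(frozen=True)
class SceneModel:
    """L2 layer (one of X): a model for one concrete business scenario."""
    scene: str
    industry_model: IndustryModel


# Decoupling: each function changes exactly one aspect and returns a new object.
def upgrade_foundation(m: IndustryModel, new_base: FoundationModel) -> IndustryModel:
    """Swap the L0 dependency; datasets and skills are kept as they are."""
    return replace(m, foundation=new_base)


def load_dataset(m: IndustryModel, dataset: str) -> IndustryModel:
    """Attach another dataset without touching the base model or skills."""
    return replace(m, datasets=m.datasets + (dataset,))


def add_skill(m: IndustryModel, skill: str) -> IndustryModel:
    """Extend the capability set independently of data and base model."""
    return replace(m, skills=m.skills + (skill,))


# Usage: a hypothetical finance model moved from a 38B to a 100B NLP base.
nlp_38b = FoundationModel("nlp", 38, "3.0")
nlp_100b = FoundationModel("nlp", 100, "3.0")

finance = IndustryModel("finance", nlp_38b,
                        datasets=("public_finance_data",), skills=("qa",))
finance_v2 = upgrade_foundation(finance, nlp_100b)
finance_v2 = load_dataset(finance_v2, "customer_private_data")
finance_v2 = add_skill(finance_v2, "copywriting")

hotline = SceneModel("government_hotline", IndustryModel("government", nlp_38b))

print(finance_v2.foundation.params_billion)  # 100
print(finance_v2.datasets)  # ('public_finance_data', 'customer_private_data')
print(finance.foundation.params_billion)  # 38: the original object is unchanged
```

Frozen dataclasses are used so that every upgrade produces a new version and leaves the previous one intact, which mirrors the idea of upgrading one layer without disturbing the rest.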

At the 2023 World Artificial Intelligence Conference, Alibaba Cloud officially launched its new AI image-generation product, "Tongyi Wanxiang".

Tongyi Wanxiang is built on Composer, a compositional generation model developed by Alibaba, which proposes a diffusion-based "compositional generation" framework. By decomposing image design elements such as color scheme, layout, and style and then recombining them, it offers highly controllable image generation with a great deal of creative freedom.

Users enter prompts into Tongyi Wanxiang to generate matching images. Beyond text-to-image generation, it also offers style transfer and similar-image generation.

This is expected to lower the barrier to image design considerably and bring change to art and design, games, and cultural and creative work.

Tongyi Wanxiang currently offers three functions: text-to-image, similar-image generation, and style transfer.

Text-to-image is the basic mode. Enter a prompt and choose a style (watercolor, oil painting, Chinese painting, flat illustration, anime, sketch, 3D cartoon, etc.), and Tongyi Wanxiang automatically generates a wealth of creative ideas. Tongyi Wanxiang is now officially open to the public.

Similar-image generation lets users quickly expand existing material into batches of similar assets: from a single reference image, users get images with similar content and style.

Style transfer generates a new version of an original image in a specified style.

In a test by the AI media outlet Xinzhiyuan, Tongyi Wanxiang was used to restyle a picture of a woman in white gauze in the manner of the French Impressionist painter Renoir.

Once the transfer was complete, the result was an Impressionist-style portrait.

According to Xinzhiyuan's evaluation, some of Tongyi Wanxiang's image-generation abilities are approaching those of Midjourney, widely regarded as the world's leading AI image generator.

During the World Artificial Intelligence Conference, Tencent Cloud announced an upgrade to its MaaS platform, applying its industry large-model capabilities to new scenarios such as financial risk control, simultaneous interpretation and translation, and digital-human customer service. The first of these, a financial risk-control model, is said to be 10 times more efficient than traditional risk control.

On the infrastructure side, Tencent's self-developed Xingmai ("Star Pulse") high-performance computing network and vector database provide richer computing infrastructure for industrial applications of large models. The newly upgraded Xingmai network can raise GPU utilization by 40%, cut model training costs by 30% to 60%, and deliver a 10x improvement in communication performance for large AI models. Tencent Cloud's new-generation HCC computing cluster supports supercomputing at the scale of 100,000 GPUs. Tencent Cloud's AI-native vector database supports billion-scale vector retrieval with millisecond-level latency, a retrieval scale 10 times that of traditional standalone plug-in databases, and a peak capacity of millions of queries per second (QPS).

On the application side, Tencent Cloud applies its large-model capabilities to financial risk control, interactive translation, and digital-human customer service, substantially improving the efficiency of intelligent applications.

The financial risk-control solution powered by the industry model is 10 times more efficient than its predecessor. Drawing on Tencent's more than 20 years of experience fighting black- and gray-market fraud and on thousands of real business scenarios, its overall anti-fraud performance is about 20% better than that of traditional models. With prompt-based modeling, enterprises can iterate on their risk-control capabilities with zero manual involvement across the whole pipeline, from sample collection and model training to online deployment, cutting modeling time from 2 weeks to just 2 days. Even with limited accumulated samples, a model can be built quickly, skipping the "cold start" phase.

In interactive translation, industry-model technology means simultaneous interpretation no longer needs millions of training examples; "few-shot" training is enough to achieve better results. Translation in specialized fields also needs less manual post-editing while maintaining quality, and has been deployed across multiple vertical industries. Tencent's simultaneous interpretation has provided AI interpretation for the main forum of the World Artificial Intelligence Conference for six consecutive years.

In digital humans, Tencent Cloud launched a few-shot digital human factory this year that can replicate a 2D digital avatar within 24 hours from only a small amount of data, greatly lowering the cost for enterprises adopting digital humans. With AI generation algorithms, 3D digital human replication is now much faster as well. Through generative motion driving and industry-model capabilities, enterprises can obtain more "personalized, professional, natural and realistic" digital employees, making "face-to-face" professional services possible.

At the "Boundless Love, Growing Day by Day" AI forum during the World Artificial Intelligence Conference, SenseTime announced an across-the-board upgrade of its "SenseNova" large-model system and updated and rolled out a series of large-model products under it.

SenseChat 2.0, SenseTime's natural language processing model with hundreds of billions of parameters, overcomes the input-length limits of large language models and comes in versions with different parameter scales, adapting to deployments ranging from mobile devices to cloud platforms while reducing deployment costs. The parameter count of Miaohua SenseMirage 3.0, SenseTime's self-developed image generation model, has grown from 1 billion at its first release in April this year to 7 billion, enabling image detail at the level of professional photography.

Compared with version 1.0, SenseTime's Ruying SenseAvatar 2.0 digital-human platform improves the fluency of speech and lip sync by more than 30%, supports 4K HD video, and enables AIGC-generated avatars and singing digital humans. Qiongyu SenseSpace 2.0 improves spatial reconstruction efficiency by 20% and rendering performance by 50%; producing a 100-square-kilometer scene takes only 38 hours (with 1,200 TFLOPS of computing power). Meanwhile, Gewu SenseThings 2.0 reproduces the textures and materials of small objects at millimeter-level detail and overcomes the difficulty of capturing highly reflective and specular objects.
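The SenseSpace figures can be turned into a rough throughput and compute estimate. This is a back-of-the-envelope sketch, assuming the stated 1200 TFLOPS means $1.2\times10^{15}$ floating-point operations per second sustained for the entire 38 hours; the article does not say how much of that capacity was actually used, so the compute figure is an upper bound.

$$\text{area rate} = \frac{100\ \text{km}^2}{38\ \text{h}} \approx 2.6\ \text{km}^2/\text{h}$$

$$\text{compute} \le 1.2\times10^{15}\ \tfrac{\text{FLOP}}{\text{s}} \times (38 \times 3600\ \text{s}) = 1.2\times10^{15} \times 1.368\times10^{5}\ \text{FLOP} \approx 1.64\times10^{20}\ \text{FLOP}$$

Spread over 100 km², that is at most roughly $1.6\times10^{18}$ FLOP per square kilometer.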

In the financial sector, SenseTime works with banks, insurers, securities firms and other customers, using digital humans for intelligent customer service and marketing, and adding capabilities such as investment analysis and research-report writing through its large language model, to cut costs and raise efficiency. With a financial knowledge base attached, the model answers questions based entirely on the customer's own product descriptions and keeps that information up to date.

In medical scenarios, SenseTime has built "Da Yi", a Chinese medical language model based on extensive medical knowledge and clinical data, which supports multi-turn conversations across scenarios such as triage guidance, consultation, health advice, and decision support. It is planned to support multimodal analysis of medical images, text, and structured data, continually improving its medical language understanding and reasoning to make hospital diagnosis and treatment more efficient and improve patient services.

Domestic AI unicorn Mobvoi releases "Sequence Monkey"

At the World Artificial Intelligence Conference, Mobvoi presented its large model "Sequence Monkey" and an AI CoPilot solution. According to reports, "Sequence Monkey" is a large language model with multimodal generation capabilities. Its language-centered capability system covers six dimensions: knowledge, dialogue, mathematics, logic, reasoning, and planning, and it supports tasks including text generation, image generation, 3D content generation, speech generation, and speech recognition. Drawing on its natural language understanding, knowledge, logic, and reasoning abilities, it can also hold conversations.

JD.COM: Training its own large model and confident in its prospects

He Xiaodong, vice president of JD.COM Group and president of JD Explore Academy, said that training a general-purpose foundation model currently takes about two months and costs an estimated tens of millions of yuan, and that he is confident in the commercial prospects and application scenarios of large models. He suggested that startups building large models need to find their own "moat". On the current "Hundred Models War", He believes that pressure and competition are good for the market and will effectively drive the industry forward.
