HONG KONG, June 8, 2026 /PRNewswire/ -- Unisound today officially released U2, its new-generation general-purpose large language model.
As a native agentic large model built for individuals, developers, and organizations, U2 is guided by a clear technical proposition: high intelligence density × high Token value. Instead of blindly stacking parameters, it pursues high intelligence density, using fewer activated resources to carry stronger capabilities. Instead of simply competing on output length, it pursues high Token value, making every call move closer to a deliverable result.
Unlike traditional large language models, which are often oriented toward single-turn Q&A or short-chain generation, U2 places greater emphasis on continuous execution of real-world tasks. Across complex office work, software engineering, deep research, and multi-tool collaboration scenarios, U2 can autonomously decompose and advance complex workflows of 100+ steps, connecting requirement understanding, task planning, environment interaction, tool use, process correction, and result validation into a complete execution loop, moving beyond “providing answers” toward “getting work done.”
Top-tier performance in authoritative evaluations, demonstrating U2’s core strength
In the latest series of authoritative capability evaluations in China and overseas, U2 has entered the top tier of mainstream large models across several key capability areas:
On GPQA Diamond, which measures knowledge and complex reasoning capability, U2 scored 87.9, outperforming GLM-5.1, Hy3 preview, DeepSeek-V4-Flash (High), and MiniMax M2.7, showing stable understanding, reasoning, and problem-solving capability on highly challenging knowledge questions.
On SWE-Bench Verified, which evaluates real-world software engineering capability, U2 scored 75, placing it among the top tier of mainstream models.
On Claw-Eval (pass@3), an end-to-end evaluation of autonomous Agent execution capability, U2 scored 76.9, outperforming Hy3 preview, DeepSeek-V4-Flash (High), and MiniMax M2.7. This further validates its stable performance in tool use, workflow orchestration, and task delivery.
On GDPval, which evaluates real-world office and knowledge-work delivery capability, U2 scored 72.9, demonstrating solid professional office productivity. Compared with traditional question-answering benchmarks, GDPval focuses more on whether a model can complete high-value deliverables in real work scenarios, including document analysis, report writing, spreadsheet processing, chart generation, slide creation, and other typical office tasks.
Together, these results send an important signal: U2 does not win through a single isolated capability. Instead, it delivers systematic performance across reasoning, coding, Agent execution, and office delivery.
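For readers unfamiliar with the pass@3 notation used for the Claw-Eval result above, the sketch below shows one common way such a figure is computed. It assumes the widely used unbiased pass@k estimator; the release does not state how Claw-Eval itself defines or aggregates pass@3, and the function names `pass_at_k` and `benchmark_score` are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k attempts, drawn without
    replacement from n recorded attempts (c of which passed), passes.

    Assumption: this is the commonly used unbiased pass@k estimator,
    not necessarily the exact definition used by Claw-Eval.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k draws include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 3) -> float:
    """Mean pass@k over tasks, as a percentage.

    `results` holds one (n_attempts, n_passed) pair per task.
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two hypothetical tasks with 3 attempts each: one solved once, one never.
    print(benchmark_score([(3, 1), (3, 0)], k=3))  # 50.0
```

Under this definition, a task counts fully toward pass@3 as soon as any one of three attempts succeeds, which is why pass@k figures are not directly comparable with single-attempt scores.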
Hybrid Thinking + Harness joint training: bringing native model capabilities into real workflows
For Unisound, U2 is not merely a model codename. It represents our renewed thinking about the value of large models in the AI 2.0 era. We believe that today’s large models should no longer be evaluated only by parameter scale or content generation length. When AI truly enters real workflows, users care not only whether a model can produce an impressive answer, but whether it can actually complete the task.
Therefore, from the beginning of its design, U2 was not intended to be a general-purpose model limited to chat scenarios. It is a native agentic large model built for task execution.
To enable a model to truly complete tasks, a larger parameter count alone is not enough. Real workflows are often complex, dynamic, and long-chain. The model must quickly understand goals, decompose tasks, and search for solution paths, while also performing logical calibration, constraint checking, and result verification at critical points. Traditional explicit Chain-of-Thought (CoT) offers stronger interpretability, but often requires generating large volumes of intermediate reasoning text, resulting in higher Token consumption and inference latency. Relying entirely on latent-space reasoning is more efficient, but may lead to logical drift in complex tasks and lacks sufficient controllability and verifiability.
To resolve this tension, U2 innovatively introduces a Hybrid Thinking mechanism. It does not choose between explicit CoT and implicit reasoning. Instead, within the same reasoning process, it dynamically switches thinking modes according to task stage, complexity, and uncertainty.
At the early stage of a task, U2 prioritizes efficient exploration in latent space, completing path search, task decomposition, candidate solution generation, and execution planning without decoding every intermediate thought into visible Tokens. When the task reaches critical judgment, complex constraint handling, or result convergence, the model switches to explicit reasoning, using a readable and verifiable reasoning process to complete logical calibration, process verification, and final decision-making.
Furthermore, U2 introduces Bounded Latent Rollout and Entropy-aware Switching, enabling the model to dynamically adjust its thinking mode based on uncertainty in the reasoning process. When implicit exploration remains stable, the model maintains efficient reasoning. When uncertainty rises and the reasoning path may diverge, it promptly returns to explicit Chain-of-Thought, using deterministic Tokens for precise derivation and result convergence.
This means U2 is not simply shortening the reasoning chain. It is reconstructing the division of labor in model thinking: high-cost stages such as open-ended exploration and path planning are internalized more into latent space, while logical verification, constraint calibration, and result convergence are left to explicit reasoning. As a result, U2 can reduce ineffective reasoning steps and redundant intermediate text while maintaining reliability and controllability in complex tasks, achieving “fewer Tokens, deeper thinking.”
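The switching behavior described above can be illustrated with a toy controller. This is a minimal sketch under stated assumptions, not Unisound's implementation: it assumes uncertainty is measured as the Shannon entropy of a per-step probability distribution, that "bounded" means a cap on consecutive latent steps, and that a fixed threshold triggers the switch. The names `shannon_entropy`, `choose_modes`, `entropy_threshold`, and `max_latent_steps` are hypothetical.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def choose_modes(
    step_distributions: Sequence[Sequence[float]],
    entropy_threshold: float = 1.0,   # assumed switch point, in nats
    max_latent_steps: int = 4,        # assumed bound on a latent rollout
) -> List[str]:
    """Label each reasoning step as 'latent' or 'explicit'.

    A step goes explicit when its distribution is too uncertain, or when
    the run of consecutive latent steps has reached the bound.
    """
    modes: List[str] = []
    latent_run = 0
    for probs in step_distributions:
        uncertain = shannon_entropy(probs) > entropy_threshold
        budget_spent = latent_run >= max_latent_steps
        if uncertain or budget_spent:
            modes.append("explicit")
            latent_run = 0
        else:
            modes.append("latent")
            latent_run += 1
    return modes


if __name__ == "__main__":
    peaked = [0.97, 0.01, 0.01, 0.01]   # low entropy: model is confident
    flat = [0.25, 0.25, 0.25, 0.25]     # entropy ln(4): model is unsure
    print(choose_modes([peaked, peaked, flat, peaked]))
    # ['latent', 'latent', 'explicit', 'latent']
```

In this toy version the bound guarantees that an explicit, checkable step occurs at least once every `max_latent_steps + 1` steps, even when the entropy signal never fires.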
On the knowledge foundation, U2 further applies high-knowledge-density data screening and purification to filter out duplicated, low-quality, and hallucinated data and to refine and extract content at the knowledge-point level. Combined with sparse knowledge encoding and a knowledge distillation architecture, U2 compresses redundant model parameters and consolidates high-value knowledge into a more efficient model structure.
At the task execution layer, U2 introduces an Agent-Harness collaborative training paradigm. We believe Harness should not simply be an external wrapper, but should co-evolve with model capabilities. Therefore, U2 incorporates the improvement of native Agent capability and the iterative optimization of Harness into the same training loop: on one hand, Harness continuously optimizes the task execution chain according to U2’s model characteristics; on the other hand, high-quality execution trajectories generated in real tasks feed back into the model, further strengthening its capabilities in task planning, tool use, process correction, and result acceptance.
Ultimately, this complete closed loop must be grounded in a pragmatic training system. We did not train U2 to merely memorize correct answers. Instead, through curriculum learning, process supervision, trajectory comparison, and multidimensional rewards, we teach it how to plan, execute, correct errors, and validate results in complex tasks. With Agent-Harness co-evolution, U2 can continuously strengthen long-chain execution capability in real task trajectories, truly moving from “able to chat” to “able to complete tasks.”
Three core capabilities supporting a closed loop of task delivery
Around real task delivery, U2 focuses on strengthening three core capabilities: Reasoning, Coding, and Agent.
In Reasoning, U2 emphasizes low-deviation execution and long-horizon logical stability. When facing complex, multi-step tasks, the model must not only answer local questions, but also maintain goal consistency over time, dynamically balance budget, time, constraints, and feasible paths, and ultimately produce a better solution.
In Coding, U2 is no longer limited to code generation. It is oriented toward end-to-end engineering delivery. It can generate code from natural-language requirements, understand multi-file project structures, maintain consistency across interfaces, dependencies, and invocation logic, and continuously advance task completion by debugging autonomously in the execution environment.
In Agent capability, U2 focuses on improving multi-tool collaboration, long-process orchestration, and environment interaction. When facing open-ended goals, it can decompose task priorities, understand the capability boundaries of APIs, combine calls across different tools, and adjust execution strategies based on feedback from external systems.
Together, these three capabilities form U2’s closed loop of task delivery: first understanding and planning, then execution and collaboration, and finally verification and delivery. This is why U2 is better suited to being tested in real work scenarios, rather than remaining at the level of single-turn dialogue or isolated capability demos.
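The plan, execute, verify loop described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not U2's actual execution harness: the plan is given up front, "process correction" is reduced to a bounded retry, and the names `Step`, `RunLog`, `run_task`, and `max_retries` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    tool: str   # name of the tool to call
    arg: str    # argument passed to that tool


@dataclass
class RunLog:
    outputs: List[str] = field(default_factory=list)
    failed_attempts: int = 0


def run_task(
    plan: List[Step],
    tools: Dict[str, Callable[[str], str]],
    validate: Callable[[Step, str], bool],
    max_retries: int = 2,
) -> RunLog:
    """Execute each planned step, validating every result before moving on.

    A step whose result fails validation is retried up to `max_retries`
    times; if it still fails, the task stops with an error rather than
    delivering an unverified result.
    """
    log = RunLog()
    for step in plan:
        for _ in range(max_retries + 1):
            result = tools[step.tool](step.arg)
            if validate(step, result):
                log.outputs.append(result)
                break
            log.failed_attempts += 1
        else:
            raise RuntimeError(f"step {step.tool!r} failed validation")
    return log


if __name__ == "__main__":
    tools = {"echo": lambda s: s, "upper": lambda s: s.upper()}
    plan = [Step("echo", "draft"), Step("upper", "report")]
    log = run_task(plan, tools, validate=lambda step, out: bool(out))
    print(log.outputs)  # ['draft', 'REPORT']
```

The design point is that validation sits inside the loop rather than at the end, so an error is caught at the step that produced it instead of propagating through the remaining steps.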
Application scenarios: from a single answer to task completion
U2 can execute tasks autonomously from requirement understanding through to generation of the complete deliverable, and applies to the following four typical scenarios:
1. Full-spectrum interface design
Responsive web development: Generate multi-page websites with production-grade layouts, real navigation flows, and complete interaction states based on design requirements, with one-click packaging and deployment support.
Mobile Web App: Build native-like social applications, including feeds, Stories, post-creation entry points, notifications, personal profiles, image grids, and bottom navigation, with all resources localized.
Design system implementation: Automatically constrain colors, fonts, spacing, and other style systems while adapting to both PC and mobile interfaces, enabling end-to-end output from visual design to code.
2. Deep research and analysis
Industry and policy research: Retrieve and clean multi-source data across platforms, then output structured research reports in formats including Word, PPT, and HTML deep-dive webpages with dynamic interactive charts.
Data visualization analysis: Automatically generate interactive charts such as timelines, trend curves, and heatmaps to support expert-level analysis and presentation.
Multi-format compliant delivery: Support one-click export of documents that meet formatting requirements, serving different scenarios such as internal sharing and external reporting.
3. Immersive interactive game development
Classic casual games: Independently complete the closed loop of algorithm design, code writing, and debugging, delivering playable and interactive HTML5 games such as Tetris.
Physics simulators: Build simulators for multi-pendulum chaotic systems, particle motion, and other scenarios based on real physics formulas, supporting parameter adjustment and real-time trajectory rendering.
4. Efficient office automation
Business report analysis: Capture core metrics such as sales, costs, and inventory across systems, then automatically generate visual dashboards and Word reports with trend charts and anomaly annotations.
Industry landscape analysis: Aggregate data on market structure, technology routes, and policy drivers, then output interactive competitive matrices and presentation-ready PPTs.
Periodic business reviews: Fully autonomously orchestrate data cleaning, cross-validation, and report generation workflows, automating core business review processes for organizations.
For Unisound, the release of U2 is not just a routine model upgrade. It is a critical move in our long-term journey toward native agentic large models.
From benchmark results to closed-loop delivery in real scenarios, we aim to use higher intelligence density and higher Token value to turn every call into tangible productivity.
U2 is now officially available on Unisound Token Hub, open to individuals, developers, and organizations.
Experience it here: https://maas.unisound.com/models/u2




