CCAI-SUPER Model
The engine powering the CCAI ecosystem is CCAI-SUPER, a cutting-edge large-scale language model developed and iterated on by the CCAI team over the past few years. It is a high-performance multimodal model series built on a new expert composition architecture; together with significant advances in training and serving infrastructure, this allows it to push the boundaries of efficiency, reasoning, and long-context performance.
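The details of the expert composition architecture are not specified here. As a rough illustration of the general mixture-of-experts idea that such architectures build on, the sketch below routes each token to a small subset of expert sub-networks so that per-token compute stays low while total capacity grows. All names, shapes, and weights are hypothetical; this is not CCAI-SUPER's actual implementation.

```python
# Illustrative mixture-of-experts sketch only; CCAI-SUPER's real expert
# composition architecture is not publicly specified. A router sends each
# token to TOP_K of NUM_EXPERTS feed-forward experts and combines their
# outputs weighted by the routing probabilities.
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, D_FF = 64, 256     # hypothetical hidden sizes
NUM_EXPERTS, TOP_K = 8, 2   # route each token to 2 of 8 experts

router_w = rng.normal(0, 0.02, (D_MODEL, NUM_EXPERTS))
experts_w1 = rng.normal(0, 0.02, (NUM_EXPERTS, D_MODEL, D_FF))
experts_w2 = rng.normal(0, 0.02, (NUM_EXPERTS, D_FF, D_MODEL))

def moe_layer(x):
    """x: (num_tokens, D_MODEL) -> (num_tokens, D_MODEL)."""
    logits = x @ router_w                                  # (tokens, experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)             # softmax routing weights
    top_k = np.argsort(probs, axis=-1)[:, -TOP_K:]         # chosen experts per token

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top_k[t]:
            h = np.maximum(x[t] @ experts_w1[e], 0.0)      # expert FFN with ReLU
            out[t] += probs[t, e] * (h @ experts_w2[e])    # weight by router prob
    return out

tokens = rng.normal(size=(5, D_MODEL))
print(moe_layer(tokens).shape)  # (5, 64): only 2 of 8 experts run per token
```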
CCAI-SUPER is able to handle extremely long contexts and understand the fine details within them: it can recall and reason about fine-grained information from at least 10M tokens of context. This scale is unprecedented among contemporary large language models (LLMs), and it makes it possible to process long-form mixed-modal inputs, including entire document collections, multiple hours of video, and almost a day's worth of audio.
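Claims about recalling fine-grained facts from very long contexts are commonly spot-checked with "needle in a haystack" probes: a single fact is buried in a large body of filler text and the model is asked to retrieve it. The sketch below shows the shape of such a probe; the `ccai_super` client, its `generate` signature, and the model name are hypothetical placeholders, not a documented CCAI API or the evaluation actually used for CCAI-SUPER.

```python
# Minimal "needle in a haystack" style probe. The client library, generate()
# signature, and model name are hypothetical placeholders.
import random

# import ccai_super  # hypothetical client library

NEEDLE = "The access code for the archive room is 7412."
FILLER = "This sentence is routine filler text with no useful information. "

def build_haystack(approx_words: int, needle: str) -> str:
    """Bury one factual 'needle' at a random position inside filler text."""
    sentences = [FILLER] * (approx_words // len(FILLER.split()))
    sentences.insert(random.randrange(len(sentences)), needle + " ")
    return "".join(sentences)

def probe(client, approx_words: int) -> bool:
    context = build_haystack(approx_words, NEEDLE)
    prompt = context + "\n\nQuestion: What is the access code for the archive room?"
    answer = client.generate(model="ccai-super", prompt=prompt)  # hypothetical call
    return "7412" in answer

# Usage sketch (requires a real client object):
# print(probe(ccai_super.Client(), approx_words=500_000))
```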
CCAI-SUPER is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most widely used benchmarks for testing the knowledge and problem-solving abilities of AI models. It also exceeds state-of-the-art performance across a range of benchmarks spanning text and code.
CCAI-SUPER achieves better performance than its predecessors on most benchmarks, with the largest gains in mathematics, science, and reasoning (+28.9%), multilingual capability (+22.3%), video understanding (+11.2%), and code (+8.9%).
A series of improvements across CCAI-SUPER's entire model stack (architecture, data, optimization, and systems) enables long-context understanding of inputs up to 10 million tokens without compromising performance. In real-world terms, this context length allows the CCAI-SUPER model to easily handle almost a day's worth of recordings (about 22 hours), roughly ten times the length of the entire 1,440-page War and Peace (587,287 words), more than the entire Flax codebase (41,070 lines of code), or three hours of video at 1 frame per second. Furthermore, the CCAI-SUPER model is natively multimodal and supports interleaving data from different modalities, so a single input sequence can mix audio, visual, textual, and code inputs.
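As a quick sanity check on these equivalences, the arithmetic below derives the rough per-unit rates implied by the figures quoted in this paragraph. The rates are back-of-the-envelope inferences from those numbers only, not published tokenization constants.

```python
# Back-of-the-envelope arithmetic for the 10M-token context figure, using
# only the numbers quoted above. Implied rates are rough inferences, not
# official tokenization constants.
CONTEXT_TOKENS = 10_000_000

audio_hours = 22               # "almost a day's worth of recordings"
war_and_peace_words = 587_287  # one copy; the context fits about ten copies
flax_loc = 41_070              # lines of code in the Flax codebase
video_seconds = 3 * 3600       # three hours of video at 1 frame per second

print(f"~{CONTEXT_TOKENS / (audio_hours * 3600):.0f} tokens per second of audio implied")
print(f"~{CONTEXT_TOKENS / (10 * war_and_peace_words):.1f} tokens per word of text implied")
print(f"3 hours of video at 1 fps = {video_seconds:,} frames")
print(f"Flax codebase: {flax_loc:,} lines, well under the 10M-token budget")
```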