English-Vietnamese Parallel Speech Dataset
A parallel speech dataset consisting of sentence-level aligned English and Vietnamese utterances, designed for training Speech-to-Speech translation models.
Use verified, licensed data with confidence. Download it right away, or submit an inquiry to review the data first.
A parallel speech dataset consisting of sentence-level aligned English and Indonesian utterances, designed for training Speech-to-Speech translation models.
A parallel speech dataset consisting of sentence-level aligned English and Korean utterances, designed for training Speech-to-Speech translation models.
A multilingual chain-of-thought reasoning dataset built from complex problems requiring step-by-step decomposition and coherent answer generation, with AI-generated drafts reviewed by expert-level annotators.
An expert chain-of-thought text dataset built from expert verbal reasoning to support LLM training for step-by-step reasoning.
A high-difficulty text dataset built from doctoral-level exam questions and solutions to support LLM training for expert reasoning and problem solving.
A multi-turn benchmark dataset, modeled on BFCL, for evaluating agent action performance across the finance, legal, medical, manufacturing, and defense domains.
A multi-turn conversational dataset designed to evaluate model response capabilities against major safety risk categories and attack patterns.
A video dataset collected for training Physical AI models in manufacturing environments. Includes human-object manipulation footage along with structured annotations such as trajectory and mesh data.
A dataset consisting of AI-generated videos sampled at 1 fps with frame-level scene description captions. Suitable for video understanding and multimodal model training.