Data Marketplace

Use verified, licensed data with confidence. Download datasets right away, or submit an inquiry to review the data first.

A total of 245 datasets
  • Pre-training Data | Audio

    English-Vietnamese Parallel Speech Dataset

    A parallel speech dataset consisting of sentence-level aligned English and Vietnamese utterances, designed for training Speech-to-Speech translation models.

  • Pre-training Data | Audio

    English-Indonesian Parallel Speech Dataset

    A parallel speech dataset consisting of sentence-level aligned English and Indonesian utterances, designed for training Speech-to-Speech translation models.

  • Pre-training Data | Audio

    English-Korean Parallel Speech Dataset

    A parallel speech dataset consisting of sentence-level aligned English and Korean utterances, designed for training Speech-to-Speech translation models.

  • Frontier Data | Text

    Multilingual Chain-of-Thought Reasoning Text Dataset

    A multilingual chain-of-thought reasoning dataset built from complex problems requiring step-by-step decomposition and coherent answer generation, with AI-generated drafts reviewed by expert-level annotators.

  • Frontier Data | Text

    Expert CoT Text Dataset

    An expert chain-of-thought text dataset built from expert verbal reasoning to support LLM training for step-by-step reasoning.

  • Frontier Data | Text

    Doctoral Exam Questions and Solutions Text Dataset

    A high-difficulty text dataset built from doctoral-level exam questions and solutions to support LLM training for expert reasoning and problem solving.

  • Frontier Data | Text

    Domain-Specific Benchmark Dataset

    A multi-turn benchmark dataset modeled on BFCL to evaluate agent action performance across the finance, legal, medical, manufacturing, and defense domains.

  • Frontier Data | Text

    Safety Response Multi-turn Dataset

    A multi-turn conversational dataset designed to evaluate model response capabilities against major safety risk categories and attack patterns.

  • Pre-training Data | Video

    Physical AI: Human-Object Interaction Video Dataset

    A video dataset collected for training Physical AI models in manufacturing environments. It includes human-object manipulation footage along with structured annotations such as trajectory and mesh data.

  • Pre-training Data | Video

    AI-Generated Video with Frame-level Caption Dataset

    A dataset of AI-generated videos sampled at 1 fps, with frame-level scene description captions. Suitable for video understanding and multimodal model training.

Check out the details of Snowflakes, Flitto's core dataset.

Explore Flitto's high-precision datasets, structured for seamless integration. Elevate your AI models and business decision-making with high-quality, ready-to-integrate data.