A senior-focused speech corpus for the next generation of voice health AI.
A multilingual speech dataset dedicated to older adults, recorded under harmonized protocols and ethical oversight, and purpose-built for AI and ML teams advancing voice models toward real-world clinical and well-being use.
Speakers spanning four decades of older adulthood, with consistent representation across each age band.
Speaker recruitment is balanced across self-reported gender categories within each language.
Speaker ratios are held consistent across all languages, so cross-language modeling is meaningful from the start.
Consistent acoustic environments and recording standards, validated for downstream modeling.
Seven languages, harmonized under a shared protocol so cross-language modeling is meaningful out of the box. Each language uses locally appropriate stimuli and licensed materials where required, with metadata aligned across the full corpus.
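For teams that want to confirm the speaker balance on their own licensed subset, a check along the following lines may be useful. This is a minimal sketch, not part of the corpus tooling: the row layout, the `language` and `gender` keys, and the function names are all assumptions made for illustration, and the sample rows are placeholders rather than corpus data.

```python
from collections import Counter, defaultdict


def group_shares(rows, key):
    """Return {language: {category: share}} for one metadata key.

    `rows` is assumed to be an iterable of dicts with a "language" entry
    plus the requested key (hypothetical layout, not the corpus schema).
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["language"]][row[key]] += 1
    return {
        lang: {cat: n / sum(per_lang.values()) for cat, n in per_lang.items()}
        for lang, per_lang in counts.items()
    }


def max_share_gap(shares):
    """Largest difference in any category's share between two languages."""
    categories = {cat for per_lang in shares.values() for cat in per_lang}
    gap = 0.0
    for cat in categories:
        values = [per_lang.get(cat, 0.0) for per_lang in shares.values()]
        gap = max(gap, max(values) - min(values))
    return gap


if __name__ == "__main__":
    # Placeholder rows: "xx" and "yy" stand in for real language codes.
    sample = [
        {"language": "xx", "gender": "female"},
        {"language": "xx", "gender": "male"},
        {"language": "yy", "gender": "female"},
        {"language": "yy", "gender": "male"},
    ]
    print(max_share_gap(group_shares(sample, "gender")))
```

A gap close to zero suggests the category is represented in similar proportions in every language. The same call with an age-band key would cover the age dimension.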
Connected speech elicited from a visual stimulus, supporting analysis of fluency, content, and discourse structure.
Short structured narratives capturing memory and language organization.
Semantic and phonemic generation tasks.
Language-matched standardized passages for prosodic and acoustic analysis.
Sustained phonation and elicitations capturing voice quality, pitch, and articulation.
Open-ended prompts for modeling naturalistic, spontaneous speech.
Each speaker is documented with rich contextual metadata, giving model developers a meaningful clinical anchor without requiring access to identifiable medical records.
Captured through widely recognized screening instruments, selected for cross-language comparability.
Self-reported indicators relevant to respiratory, cardiovascular, and pain status.
Standardized mood and affective screening scores.
Age, self-reported gender, language background, and education.
Device class, acoustic environment, task code, and consent version.
Multi-dimensional context enables modeling not only of cognitive decline, but also of the broader signals that voice can carry.
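To make the metadata dimensions above concrete, one way a developer might represent a single recording's metadata is sketched below. The class, its field names, and the example values are hypothetical and chosen only to mirror the categories listed here. The actual delivery schema is not described on this page and may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RecordingMetadata:
    """Hypothetical per-recording metadata row (illustrative field names)."""

    speaker_id: str                        # pseudonymous ID, never a real name
    language: str                          # language of the recording
    age_band: str                          # e.g. a ten-year band
    gender: str                            # self-reported
    education: Optional[str]               # may be missing for some speakers
    cognitive_screen: Optional[float]      # screening instrument score
    mood_screen: Optional[float]           # mood / affective screening score
    device_class: str                      # recording device category
    acoustic_environment: str              # recording setting
    task_code: str                         # which speech task was recorded
    consent_version: str                   # consent form version on file


def select(records: List[RecordingMetadata], task_code: str,
           consent_version: str) -> List[RecordingMetadata]:
    """Keep recordings for one task that fall under one consent version."""
    return [
        r for r in records
        if r.task_code == task_code and r.consent_version == consent_version
    ]
```

Keeping the consent version on every row is a design choice in this sketch: it lets a training pipeline filter by permitted use before any audio is loaded.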
Study protocol cleared across all study sites, under independent oversight.
Consent, data handling, and subject rights aligned with European data protection standards.
Covering research use, commercial licensing, and model training, with subject rights honored throughout the data lifecycle.
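Available to licensees, including the Data Protection Impact Assessment (DPIA), Record of Processing Activities (ROPA), and Data Processing Agreement (DPA).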
Ensuring data integrity and preventing duplication across sources.
Full compliance documentation is shared with licensees as part of the onboarding package.
Tell us what you're building, the languages you need, and your downstream goals. We'll come back with a tailored proposal.
Contact us ↗