FinVoc2Vec: A New Approach to Measuring Vocal Tone in Corporate Disclosures
Corporate disclosures are designed to inform investors through carefully chosen words. Spoken disclosures, however, such as earnings conference calls, carry another layer of communication: the human voice. Even when managers aim to sound neutral, how they speak may unintentionally reveal information about future firm prospects. Understanding whether these vocal cues are informative, and how to measure them reliably, is therefore highly relevant. A recent study published in the Journal of Accounting Research by TRR 266 researcher Doron Reichmann and his co-authors Jonas Ewertz, Charlotte Knickrehm and Martin Nienhaus sheds light on this topic by critically assessing existing methods for measuring vocal tone and introducing FinVoc2Vec, a new approach tailored to real-world corporate disclosures.
Psychology and communications research has long emphasized vocal tone as an important channel of human communication. In finance and accounting research, this has led to the development of machine-learning models that attempt to quantify vocal tone. Yet these models face a fundamental problem: they are trained on actor-based recordings of clearly expressed emotions, created in controlled studio environments. Real conference calls, by contrast, involve telephone-quality audio, background noise, hesitations, voice breaks, and far more subtle emotional expression. As a result, existing models are inaccurate and unreliable when applied to real conference calls.
A deep-learning model for corporate disclosures: FinVoc2Vec
The study introduces FinVoc2Vec, a newly developed deep-learning model designed specifically for corporate disclosure settings. The model is trained directly on earnings conference call audio and can therefore adapt to the acoustic characteristics of real-world financial communication. The key difference lies in how the training data are constructed. To separate vocal tone from what is being said, the model is trained on linguistically neutral sentences, that is, sentences whose words convey no positive or negative meaning. This design choice allows FinVoc2Vec to focus on vocal delivery rather than linguistic sentiment. While the resulting signals are inevitably more subtle, the approach increases confidence that the model captures vocal tone rather than text-based tone. When evaluated against existing methods, FinVoc2Vec is the only model that classifies vocal tone in conference calls significantly better than chance. Although classification accuracy remains moderate, owing to the inherent difficulty of the task, it clearly outperforms actor-trained models and other benchmark approaches.
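To illustrate the idea of a linguistically neutral training set, the following minimal sketch filters out sentences containing positive or negative words. The word lists here are purely illustrative; the actual lists and filtering pipeline used in the paper may differ.

```python
# Illustrative sketch only: tiny sentiment word lists, not the ones from the paper.
POSITIVE = {"growth", "strong", "improve", "gain", "record"}
NEGATIVE = {"decline", "loss", "weak", "risk", "impairment"}

def is_linguistically_neutral(sentence: str) -> bool:
    """True if no word in the sentence appears in the positive or negative list."""
    words = {w.strip(".,!?;:").lower() for w in sentence.split()}
    return words.isdisjoint(POSITIVE) and words.isdisjoint(NEGATIVE)

sentences = [
    "Revenue for the quarter was 4.2 billion dollars.",
    "We saw strong growth across all segments.",
]

# Keep only neutral sentences; their audio could then be used to train a model
# on vocal delivery alone, with lexical sentiment held flat.
neutral = [s for s in sentences if is_linguistically_neutral(s)]
```

Here only the first sentence survives the filter, so any tone the model learns from its audio cannot stem from the words themselves.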
Does vocal tone contain useful information?
The results show that managerial vocal tone in earnings conference calls predicts firms’ future financial outcomes, but only when it is measured with a model designed for real-world corporate disclosures such as FinVoc2Vec. Using FinVoc2Vec, the authors find that a more positive vocal tone, particularly during Q&A sessions, is associated with positive earnings changes in the following quarters. This indicates that vocal tone conveys information beyond what managers explicitly communicate through words. Stock prices do not seem to react to vocal tone immediately. Instead, the market response unfolds gradually over roughly 60 days after the call, suggesting that investors initially underreact to this subtle information. Consistent with this delayed response, a simple trading strategy based on vocal tone earns positive abnormal returns.
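A trading strategy of this kind can be sketched as a simple long-short portfolio sort on tone scores. All firm identifiers, tone scores, and returns below are synthetic illustrations, not numbers from the study.

```python
def long_short_return(scores, future_returns, top_frac=0.2):
    """Go long the firms with the most positive vocal tone, short those with
    the most negative tone, and return the equal-weighted return spread."""
    ranked = sorted(scores, key=scores.get)          # ascending by tone score
    k = max(1, int(len(ranked) * top_frac))
    short_leg = ranked[:k]                           # most negative tone
    long_leg = ranked[-k:]                           # most positive tone
    long_ret = sum(future_returns[f] for f in long_leg) / k
    short_ret = sum(future_returns[f] for f in short_leg) / k
    return long_ret - short_ret

# Synthetic example data: per-firm tone scores and subsequent returns.
scores = {"A": 0.8, "B": -0.5, "C": 0.1, "D": -0.9, "E": 0.4}
rets = {"A": 0.03, "B": -0.01, "C": 0.00, "D": -0.02, "E": 0.02}
spread = long_short_return(scores, rets)  # long A, short D -> 0.05
```

If tone predicts future performance, as the study reports, this spread should on average be positive, which is the intuition behind the abnormal returns mentioned above.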
Impact for stakeholders
These findings are highly relevant because FinVoc2Vec extracts meaningful emotional information from managerial speech far more reliably than existing models. For stakeholders such as investors, financial analysts, and corporate managers, the research matters because it shows that vocal tone contains real information about future firm performance beyond what is communicated in words. Investors might use FinVoc2Vec-based signals to refine trading strategies, analysts can better interpret managerial sentiment, and managers may become more aware that emotional cues in their voice can unintentionally disclose information.
Limitations and further research
Researchers gain a strong benchmarking tool for measuring and classifying vocal tone. However, the training set, while larger than prior datasets, still cannot fully capture the diversity of real-world accents, dialects, and emotional expressions, which may affect generalizability. Future research could therefore explore the role of accents and dialects as a potential source of bias in vocal tone classification. These results also call for research exploring new ways to more accurately capture information signals from managers’ vocal delivery.
The paper is accompanied by two online repositories: the Python package “ccalign” is available on GitHub, and FinVoc2Vec is publicly available on Hugging Face.
To cite this blog:
Reichmann, Doron (2025). FinVoc2Vec: A New Approach to Measuring Vocal Tone in Corporate Disclosures. TRR 266 Accounting for Transparency Blog. https://www.accounting-for-transparency.de/finvoc2vec-measuring-vocal-tone-in-corporate-disclosures