AI's Skewed Language Training Could Reshape Human Communication and Thought

Large language models, the engines behind many AI systems, are trained on very little of what makes up most human speech: the unscripted, face-to-face conversations that form the bedrock of our culture. Instead, they learn largely from written texts, social media posts, and scripted dialogue from movies and television. As AI-generated content becomes more prevalent in daily life, this skewed training data could shape how people speak and even how they think.

The Risk of Adopting AI Linguistic Patterns

As humans increasingly encounter AI-generated text, we may begin to mimic its linguistic patterns and behaviors. This shift could fundamentally alter not only how we communicate with each other but also how we perceive the world around us. Our sense of reality might become distorted in ways we are only starting to understand.

Erosion of Courtesy and Natural Expression

One immediate effect could be a decline in courtesy, with people adopting more commanding tones. Voice assistants like Siri and Alexa offer a precedent: they have influenced children to speak curtly to humans. A 2022 study found that children in households using these tools often issued abrupt commands and expected obedience, particularly from voices resembling the assistants' default female electronic voices. As we interact more with chatbots, we risk falling into similar habits and becoming less polite with one another.

Narrowing of Vocabulary and Sentence Structure

AI-generated language tends to use a narrower vocabulary and more uniform sentence lengths, typically 12 to 20 words per sentence, according to a recent University of Coruña study. Human speech, by contrast, meanders, gets interrupted, and makes emotional leaps. Over time, exposure to such polished but limited text could constrict our own speech, making it less expressive and dynamic.
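To make "narrower vocabulary" and "uniform sentence lengths" concrete, here is a minimal Python sketch that computes two simple proxies for a piece of text: the mean and spread (population standard deviation) of sentence length in words, and the type-token ratio, meaning distinct words divided by total words. The regex-based sentence splitting, the function names, and the choice of these two metrics are illustrative assumptions; they are not the methodology of the University of Coruña study.

```python
import re
import statistics

WORD = re.compile(r"[A-Za-z']+")

def sentence_lengths(text: str) -> list[int]:
    """Split on '.', '!' or '?' followed by whitespace (a crude, assumed heuristic)
    and count the words in each resulting sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(WORD.findall(s)) for s in sentences]

def type_token_ratio(text: str) -> float:
    """Distinct lowercase words divided by total words; higher means more varied vocabulary."""
    words = [w.lower() for w in WORD.findall(text)]
    return len(set(words)) / len(words) if words else 0.0

def profile(text: str) -> dict:
    """Summarize sentence-length uniformity and vocabulary variety for one text."""
    lengths = sentence_lengths(text)
    if not lengths:
        return {"mean_sentence_len": 0.0, "stdev_sentence_len": 0.0, "type_token_ratio": 0.0}
    return {
        "mean_sentence_len": statistics.mean(lengths),
        # Population standard deviation: 0 means every sentence has the same length.
        "stdev_sentence_len": statistics.pstdev(lengths),
        "type_token_ratio": type_token_ratio(text),
    }

# Hypothetical sample texts, invented for illustration only.
UNIFORM = "The model writes clear sentences. The model writes short sentences. The model writes neat sentences."
VARIED = ("Well, I mean, it's hard to say. Honestly? I went back and forth on it for ages, "
          "you know, and then suddenly it clicked.")

if __name__ == "__main__":
    print("uniform:", profile(UNIFORM))
    print("varied: ", profile(VARIED))
```

In this toy comparison the uniform sample has zero spread in sentence length and repeats many of its words, while the conversational sample swings from a one-word sentence to a long, looping one. Type-token ratio falls as a text gets longer, so samples should be of similar length before they are compared, and neither number captures expressiveness on its own.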

Feedback Loops and Confirmation Bias

The problem is compounded by a feedback loop: as more AI-generated text circulates, it becomes training data for future models, amplifying patterns that are unlike natural human speech. Chatbots also often agree with users regardless of what they say, reinforcing biases and potentially worsening conditions like psychosis. When asked absurd questions, for instance, an AI might validate incorrect notions, leading to overconfidence and reduced openness to diverse ideas.

Impact on Education and Self-Perception

In educational settings, students who turn to AI for help may miss out on the critical process of putting their thoughts into words, which is itself a way of clarifying them. AI can restate vague ideas in confident language, masking the natural uncertainty that is part of human learning. This hyperconfident tone might also heighten impostor syndrome, making healthy doubt seem like a failure.

Distortions from Online and Scripted Sources

AI models are trained on sources like social media, where toxic language is common because of the online disinhibition effect, and on scripted media, such as the police dramas that dominate primetime TV. This can skew the picture of society these models present, making it seem more quarrelsome than it is or overly focused on particular topics. Historical texts have misrepresented cultures in much the same way, by recording only certain aspects of them.

Potential Solutions and Future Directions

Addressing these issues will require new approaches to training data. Some startups are exploring recording real conversations for training, but privacy concerns limit how far that approach can scale. The challenge is to develop AI that learns from informal, authentic human speech rather than from stylized or negative examples. Doing so would help these models reflect our true humanity and foster healthier communication and thought.