
1 August 2023
Spiral of silence and confusion with nicknames: what linguistic difficulties do moderators face?
When we talk about moderation, many questions arise. How strict should it be? And what is more important—freedom of expression or user comfort? After all, restrictions on freedom of expression may also lead to discomfort for those being restricted.
Policies of safe spaces are thus concerned with preventing the marginalization of voices already hurt by dominant power relations. This may be implemented through strict no-tolerance policies of “hate speech” or other discussion that would undermine the political project assumed in the space of the community. In practice, this often means that people can be censored or ejected from a space for not properly observing the standards of speech, tone, or style. (Clark-Parsons, “Building a digital girl army: The cultivation of feminist safe spaces online,” New Media & Society, 20, 2018)
There are also issues at a deeper level.
When moderation rules are defined, an automatic system is configured, and human experts read comments to check that the rules are being followed. But how can cognitive distortions be avoided? And what can such distortions lead to? On the other hand, there is also the question of how users feel in actively moderated communities.
Language, distortions and AI
First, a little theory. Language is, simply put, a means of accessing cognitive processes: it preserves and transmits human experience and cognition.
What does this mean? On the one hand, language captures and holds the entire system of human knowledge and thinking; on the other, it captures the experience of each individual speaker. Every person has their own experience and their own set of signs, which explains why we sometimes fail to understand each other. There are as many opinions and ways of thinking as there are people, and just as many cognitive distortions. Just have a go at trying to model them all! This prevalence of cognitive distortions keeps real AI a far cry from how it is portrayed in the movies.
Any attempt to embed all possible variations into AI code seems doomed to fail. This is why online translation tools continue to be imperfect despite all their progress. We always mean something that needs to be understood in context, and this context can be non-verbal and situational.
Much has been written, by researchers and columnists alike, about the ethical problems of such “highly developed yet imperfect” AI. Sometimes system distortions lead to serious consequences, even though the systems were originally built to help us avoid those very consequences. After all, a computer has no understanding of ethics and no feeling for the human factor. But the AI developer does. Even if a developer plans to take into account the maximum number of cognitive distortions when writing their software, the product will nevertheless be affected by others they failed to account for. It is literally impossible for a developer to consider everything: for example, how can they account for their very own perception?
It’s all about context
Automatic moderation systems are also based on AI. Sometimes, this is simpler AI; at other times, it is more complex and trainable. For pre-moderation, the cognitive side of language becomes the main stumbling block. The issue comes to a head at the moment when a system encounters meaning: a single word can have different semantic shades that are entirely dependent upon context.
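To make this stumbling block concrete, here is a minimal sketch of why a keyword-only filter is blind to semantic shades; the blocklist, names, and messages are invented for illustration and belong to no real moderation API.

```python
# Minimal sketch of context-blindness in keyword-based moderation.
# The blocklist and all names are illustrative, not a real Watchers API.

BLOCKLIST = {"gorilla"}  # a word that is abusive only in some contexts

def keyword_flag(message: str) -> bool:
    """Flags a message if any blocklisted token appears, ignoring context."""
    return any(token in BLOCKLIST for token in message.lower().split())

# The same token, two very different intents - one identical verdict.
fan_praise = "what a save, gorilla is the best keeper in serie a"
targeted_abuse = "gorilla"  # aimed at a player after a missed penalty

print(keyword_flag(fan_praise))     # True - a false positive
print(keyword_flag(targeted_abuse)) # True - indistinguishable from the above
```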
Mexican striker Javier Hernandez wears his nickname ‘Chicharito’ on his shirt
But here, we also need to clarify our approach to the word ‘context’. When we wrote about moderating various sporting events, we mentioned FIFA’s favorite example: a football player nicknamed Gorilla and the fans who use the corresponding emoji. What creates the context in this situation? The Italian national team and the club where this player plays? Of course. But the emotional and cognitive tone of the message, and even of previous messages, also forms the context. How can an automatic moderation tool understand the meaning a user intends for a given emoji? What is the emotional tone in each case? What does it refer to? The gorilla example became so interesting precisely in the context of a match between Italy and England, during which fans posted a critically high number of racist comments; within the span of two or three messages, a single word or emoji might quite possibly carry exactly the opposite meaning.
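As a rough illustration of how previous messages can flip an emoji’s meaning, here is a hedged sketch in which the same emoji is scored against a sliding window of recent chat; the marker phrases and weights are entirely invented.

```python
from collections import deque

# Hedged sketch: identical emoji, different risk depending on the last
# few messages. Marker phrases and weights are invented for illustration.
HOSTILE_MARKERS = ("go home", "banana", "monkey")

def emoji_risk(emoji: str, recent: deque) -> float:
    """Scores a gorilla emoji as near-neutral alone, risky amid hostile chat."""
    if emoji != "🦍":
        return 0.0
    hostile_hits = sum(
        any(marker in message.lower() for marker in HOSTILE_MARKERS)
        for message in recent
    )
    return min(1.0, 0.1 + 0.45 * hostile_hits)

friendly_chat = deque(["what a header!", "Gorilla scores again"], maxlen=3)
hostile_chat = deque(["go home", "throw him a banana"], maxlen=3)
print(emoji_risk("🦍", friendly_chat))  # 0.1 - fan usage
print(emoji_risk("🦍", hostile_chat))   # 1.0 - same emoji, opposite meaning
```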
The issue arises not only with Gorilla. Among player nicknames that may create issues, some of our personal favorites are Divine Ponytail, The Butcher of Bilbao, Baby Horse, and Kaiser. Even when there is no context that makes them offensive, such an element of fantasy makes it harder for the system to understand the context around their use.
In what cases do animal names become insulting? When do physical descriptions turn into sexist statements? And when does the shorthand for the name Richard become a swear word? Can a moderation system really recognize these nuances?
Some AI systems, such as the Toxic Mod System, flag potentially toxic comments, lower their priority in search results, and mark them in the admin panel, allowing moderators to decide how to deal with them. Nevertheless, there are two problems with such an approach: first, the system cannot be trained, and second, it lacks independence. It cannot work without live, in-flight moderation and fails to solve problems in real time.
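In pseudocode, such a flag-and-review approach might look like the sketch below; the threshold, the toy scoring heuristic, and the queue are assumptions of ours rather than details of the Toxic Mod System.

```python
from collections import deque

# Hedged sketch of flag-and-review moderation: nothing is blocked outright,
# risky messages are deprioritized and queued for a human. A real system
# would call a trained model instead of the toy heuristic below.
REVIEW_THRESHOLD = 0.6
review_queue = deque()  # what a moderator would see in the admin panel

def toxicity_score(text: str) -> float:
    """Stand-in for an upstream toxicity classifier."""
    toxic_markers = {"idiot", "trash"}
    words = text.lower().split()
    hits = sum(word in toxic_markers for word in words)
    return min(1.0, 5 * hits / max(len(words), 1))

def route(text: str) -> str:
    """Publishes everything, but demotes and queues anything risky."""
    if toxicity_score(text) >= REVIEW_THRESHOLD:
        review_queue.append(text)  # a human makes the final call
        return "published, deprioritized, sent to review"
    return "published normally"

print(route("great goal!"))
print(route("you absolute trash idiot"))  # lands in review_queue
```

Note the structural weakness the paragraph describes: every hard decision ends up in the review queue, so the pipeline is only as fast as the humans draining it.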
In this regard, classical pre-moderation systems can be much more efficient: they simply do not let toxic messages through. Their difficulty lies in finding the proper level of strictness, one that allows users to make a misprint in words like ‘shiitake’, to relate tales of something read in childhood without being required to write in the Queen’s English, or to mention a friend named Dick without the risk of being banned from the system. We also need a deep analysis of the methods users apply to bypass moderation, such as the use of neologisms. Thread crapping, for example, cannot be recognized by automatic systems in principle: when a user persistently writes about the sale of apples in a chat dedicated to a Premier League match, the neural network cannot catch it.
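The ‘shiitake’ case is a classic illustration of strictness tuning: a misprint like ‘shitake’ contains a swear word as a substring. A hedged sketch of the trade-off, with a deliberately tiny blocklist of our own choosing:

```python
import re

# Hedged sketch of the strictness trade-off. The one-word blocklist is
# illustrative; real lists are long and curated by moderators.
BLOCKED = ["shit"]

def substring_filter(text: str) -> bool:
    """Too strict: a blocked string anywhere in the text trips the filter."""
    low = text.lower()
    return any(bad in low for bad in BLOCKED)

def whole_word_filter(text: str) -> bool:
    """Softer: the blocked string must stand alone as a word."""
    low = text.lower()
    return any(re.search(rf"\b{re.escape(bad)}\b", low) for bad in BLOCKED)

typo = "i love these shitake mushrooms"  # a misprint of 'shiitake'
print(substring_filter(typo))   # True  - the typo gets banned
print(whole_word_filter(typo))  # False - misprint-tolerant
print(whole_word_filter("this match is shit"))  # True - still caught
```

Even the whole-word version, though, would still ban a friend named Dick, because the name stands alone as a word; past that point, dictionaries stop helping and only context can decide.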
At Watchers, after the introduction of a three-tier system of moderation, the share of messages containing negative content decreased from 11% to 4%.
The use of linguistically neutral pronouns is also a complex case, because they carry contextual superstructures that cannot be identified by formal features. Anna Gibson, author of “Free Speech and Safe Spaces: How Moderation Policies Shape Online Discussion Spaces,” notes that as individuals identify more strongly with a group, they shift from first-person singular pronouns to plural ones.
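Gibson’s observation can at least be approximated with a crude formal feature, the share of first-person plural pronouns in a user’s messages, even though the contextual superstructure itself stays out of reach. A hedged sketch, with minimal word lists of our own choosing:

```python
# Hedged sketch: the fraction of first-person pronouns that are plural,
# as a crude proxy for group identification. Word lists are deliberately minimal.
SINGULAR = {"i", "me", "my", "mine", "myself"}
PLURAL = {"we", "us", "our", "ours", "ourselves"}

def plural_pronoun_share(messages: list[str]) -> float:
    """Returns the share of first-person pronouns that are plural (0..1)."""
    singular = plural = 0
    for message in messages:
        for token in message.lower().split():
            word = token.strip(".,!?:;")
            if word in SINGULAR:
                singular += 1
            elif word in PLURAL:
                plural += 1
    total = singular + plural
    return plural / total if total else 0.0

newcomer = ["I think the ref was wrong", "my view: it was offside"]
insider = ["we never walk alone", "they fear our away support"]
print(plural_pronoun_share(newcomer))  # 0.0 - speaks as an individual
print(plural_pronoun_share(insider))   # 1.0 - speaks as the group
```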
Does this indicate that a soft filter on automatic moderation is better than a hard one? Especially if the service has human moderators who review complaints and control the flow of messages?
Yes, this turns out to be the case. Human moderators, however, do not solve the primary issue of cognitive distortion. If the problem with automatic moderation is that it simply fails to ‘notice’ nuances of meaning and tars everyone with the same brush, then human specialists have the problem of still being human: they may perceive context incorrectly, read details into or out of a situation, and apply personal bias about what should be banned and what should be allowed.
What is it that helps us avoid such distortions in chat? At Watchers, the answer is clearly formulated rules and a combination of different approaches within a single online community. With this approach, the system backs up the person and vice versa, and users also participate in the moderation process. Users do not influence the creation of rules for the community as a whole, but they do their own ‘polishing’ to make the chat or thread comfortable for themselves, by hiding unpleasant messages or making particular participants invisible.
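Such user-side ‘polishing’ can be as simple as a per-viewer filter layered on top of system and human moderation; the class below is a hypothetical sketch, not the Watchers implementation.

```python
from dataclasses import dataclass, field

# Hedged sketch of per-user 'polishing': a personal mute list and a set of
# hidden messages that affect only this viewer, never the community rules.
@dataclass
class PersonalFilter:
    muted_users: set = field(default_factory=set)
    hidden_messages: set = field(default_factory=set)

    def mute(self, user_id: str) -> None:
        self.muted_users.add(user_id)

    def hide(self, message_id: int) -> None:
        self.hidden_messages.add(message_id)

    def visible(self, message_id: int, author_id: str) -> bool:
        """Checked per viewer, after system and human moderation have run."""
        return (author_id not in self.muted_users
                and message_id not in self.hidden_messages)

my_view = PersonalFilter()
my_view.mute("troll_42")
print(my_view.visible(7, "troll_42"))  # False - invisible to this user only
print(my_view.visible(7, "fan_01"))    # True  - everyone else unaffected
```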
Fully free communities and self-censorship
It might seem that if there is no moderation within a community, allowing absolutely free speech to flourish, then users will feel free and maximally able to express themselves. In fact, self-censorship and the spiral of silence can kick in, making this less than true. When people find themselves in a minority, even in an online community, they tend to hide their point of view so as not to be ostracized.
There is no platform-wide system moderation on Reddit; the moderation rules are set by the creators of each particular subreddit.
It is believed that if a community is built on the principles of anonymity, the spiral of silence is less of a problem, and the tendency to self-censor is therefore weaker. In actual fact, this is not quite true. An online community is still a community, even when users know nothing about each other’s real lives. The reluctance to be ostracized remains exceptionally high, even when a social group is new to the user.
For those belonging to a marginalized group or a community that has long faced discrimination, the spiral of silence and self-censorship have an especially significant effect.
Here, we go back to the cognitive undertones of messages and comments. Hidden intentions and, of course, the context within a chat can convince people to refrain from participating in the conversation. Certain words and expressions that demonstrate the online community’s attitude to a particular phenomenon may signal to users that they are part of ‘the minority’ here.
This issue also arises with the pronouns we use, as mentioned above: they become expressions of group identity, even when only one participant has voiced the opinion. “We make something great again” and “We are against something”… read as collective opinion, even when written by a single participant. Perceived as such, they can deter members of minorities experiencing discrimination from posting in the chat or thread.
It is crucial to evaluate messages from a cognitive point of view, even in anonymous communities emphasizing freedom of speech, whether they are statements, suggestions, assumptions, or discriminatory manifestos. If necessary, such material can be pessimised, i.e. demoted in ranking rather than deleted. This helps create an environment in which all participants can freely express their standpoints and take part in co-creating a comfortable space for everyone.
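A minimal sketch of what ‘pessimising’ might mean in practice; the penalty weight and feed structure are invented parameters, not a description of any real ranking system.

```python
# Hedged sketch of pessimising a message: a flagged item keeps existing
# but sinks in the ranked feed. The penalty weight is an invented parameter.
PENALTY = 0.2  # a flagged message keeps only 20% of its ranking score

def ranked_feed(messages: list[dict]) -> list[dict]:
    """Sorts messages by engagement, demoting flagged ones instead of hiding them."""
    def effective_score(message: dict) -> float:
        score = message["engagement"]
        return score * PENALTY if message.get("flagged") else score
    return sorted(messages, key=effective_score, reverse=True)

feed = [
    {"id": 1, "engagement": 10.0, "flagged": False},
    {"id": 2, "engagement": 40.0, "flagged": True},   # pessimised, not deleted
    {"id": 3, "engagement": 9.0,  "flagged": False},
]
print([m["id"] for m in ranked_feed(feed)])  # [1, 3, 2]
```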