YouTube Monetization and Censorship by Proxy: A Machine Learning Prospective: Review

Authors: Anthony Zappin, Haroon Malik, Elhadi M. Shakshuki, David A. Dampier

As consumption of digital content has climbed, so has censorship of that content. Censorship has increased further as companies have become more sensitive to the type of content their advertising is tied to on digital platforms. Demonetization of videos is a primary way content is censored on YouTube. The goal of the paper is to understand whether a set of YouTube video characteristics (i.e., attributes) can be used to predict whether changing attribute values will lead to the censorship (demonetization) of a video. The methodology is helpful to free speech advocates who may believe content is being unfairly or unlawfully censored.

The aim of the study:

The article analyzes some of the scrutiny and challenges that the YouTube monetization process faces.

The first problem is Flawed Monetization Policy. YouTubers must follow all the community guidelines, terms of service, copyright policy, Google AdSense program policy, channel monetization policy, and ad-friendly content guidelines, and even then YouTube can still limit or remove monetization on a video-by-video basis.

The second problem is Biased Algorithm. YouTube ensures that any patterns indicating inadvertent bias or falsification in its algorithm are concealed from public view.

The third problem is Undisclosed Data. YouTube does not share the results of this censorship algorithm at all, which means that video authors cannot predict which videos will be actively promoted and which will lose monetization for no stated reason.

The research aims to learn how the algorithm censors content through demonetization, using a proxy and machine learning. The article uses machine learning techniques to gain insight into YouTube’s demonetization algorithm, which is said to be a means of censoring content.

The authors begin by describing the grounds for content moderation on the YouTube platform. These include the following five:

Spam and Deceptive Practices
Sensitive Content
Violent or Dangerous Content
Regulated Goods (e.g., firearms)
Copyrighted Content

Monetization is one of the few ways for content creators to make money on the platform. It is important to note that the decision as to whether to monetize a video appears to be automated by YouTube. Demonetizing videos on YouTube suppresses that content: it drastically reduces users’ ability to find it and hurts content creators who are trying to reach an audience.

In the next part, the authors describe which content is most often restricted without a stated reason. According to them, the category most often cited when monetization is removed is so-called harmful or dangerous content. Administrators and machine learning systems can place any content they dislike in this category. However, video authors can dispute the decision. Demonetization can also act as an indirect means of censoring content on the YouTube platform.

Methodology:

The picture shows an overview of the proposed methodology, including all of its steps.

The authors start with data collection. They took videos from channels pre-classified as left-wing, moderate, or right-wing by Transparency.Tube, a website that employs machine learning techniques to classify YouTube channels according to their content. About 400 different videos were collected:

Randomly selected videos from ten channels from each category on Transparency.Tube.
The same number of “Left” and “Right” channels were selected in order to balance the dataset.
Five most recent videos from each channel.
Ensured a broad coverage of video types, i.e., videos that varied in duration, long-form, recorded podcasts, short clips, and live streams.
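
The selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the catalogue layout, and the made-up channel and video identifiers are hypothetical, and the review does not say how the authors actually implemented the sampling.

```python
import random

def sample_seed_videos(channels_by_category, channels_per_category=10,
                       videos_per_channel=5, seed=0):
    """Pick channels at random per category, then the newest videos of each.

    channels_by_category maps category -> {channel: [(video_id, iso_date), ...]}.
    Using the same channel count for every category keeps the classes balanced.
    """
    rng = random.Random(seed)
    selected = []
    for category, channels in channels_by_category.items():
        names = sorted(channels)
        chosen = rng.sample(names, min(channels_per_category, len(names)))
        for channel in chosen:
            # ISO dates sort correctly as plain strings.
            newest_first = sorted(channels[channel], key=lambda v: v[1], reverse=True)
            for video_id, _ in newest_first[:videos_per_channel]:
                selected.append((category, channel, video_id))
    return selected

# Made-up catalogue: 3 categories x 12 channels x 7 videos each.
catalogue = {
    category: {
        f"{category}-channel-{c}": [
            (f"{category}-{c}-v{d}", f"2021-01-{d:02d}") for d in range(1, 8)
        ]
        for c in range(12)
    }
    for category in ("Left", "Moderate", "Right")
}
seed_videos = sample_seed_videos(catalogue)
```

Fixing the random seed makes the selection reproducible, which matters if the dataset has to be rebuilt later.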

Then the set of features listed in the picture, together with the user comments, was collected for the 400 seed YouTube videos using YouTube’s own API.

The authors applied Random Forest, Logistic Regression, and SVM machine learning algorithms to the metadata. These algorithms cannot be applied directly to the text of the comments collected under each video, because many comments were written on cell phones and contain typos, slang, jargon, and abbreviations.

Therefore, the comments went through four stages of pre-processing:

Noun, verb and adjective extraction: done with the part-of-speech (POS) tagging function of the Natural Language Toolkit (NLTK).
Stop-word removal: stop-words in the comments are removed.
Lemmatization: NLTK reduces each word to its base form, so that different forms of a word (test, testing, tests, tested) are not counted separately.
Vectorization: a script was run to turn the array of tags into a single string, with spaces separating each tag.
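
To make the stages above concrete, here is a minimal standard-library sketch. It is not the authors' NLTK pipeline: the stop-word list is a tiny illustrative sample, `crude_lemma` is a naive suffix stripper standing in for real lemmatization, and the POS-based filtering step is left out because it needs a trained tagger such as the one NLTK provides. All names here are hypothetical.

```python
import re

# Tiny illustrative stop-word list; NLTK ships a much fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "this", "that",
              "and", "or", "to", "of", "in", "it", "i"}

# Crude stand-in for lemmatization: strip a few common suffixes.
SUFFIXES = ("ing", "ed", "s")

def crude_lemma(word):
    """Strip one common suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess_comment(comment):
    """Lowercase, tokenize, drop stop-words, normalize, and join with spaces."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    lemmas = [crude_lemma(t) for t in tokens]
    return " ".join(lemmas)

cleaned = preprocess_comment("The tests are testing this tested test")
```

In this toy run, the four forms of "test" collapse to the same token, which is the effect the lemmatization step is meant to achieve.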

The next stage of pre-processing is called Auto-Labelling: a script looked for embedded HTML tags while each video in the dataset was playing to determine whether advertisements were present and/or played. If they were, the video was deemed to be monetized by YouTube.
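
A rough sketch of such a labelling check is shown below. The review does not say which HTML tags or attributes the authors' script looked for, so the marker class `ad-showing` and the function names are assumptions made purely for illustration.

```python
from html.parser import HTMLParser

# Hypothetical marker: the actual tags the authors searched for are not given
# in the review, so this class name is an assumption.
AD_MARKER_CLASS = "ad-showing"

class AdMarkerParser(HTMLParser):
    """Sets found_ad to True if any tag carries the assumed ad marker class."""

    def __init__(self):
        super().__init__()
        self.found_ad = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value and AD_MARKER_CLASS in value.split():
                self.found_ad = True

def label_video(player_html):
    """Return the class label for one video based on its player markup."""
    parser = AdMarkerParser()
    parser.feed(player_html)
    return "Monetized" if parser.found_ad else "De-Monetized"

with_ad = label_video('<div class="html5-video-player ad-showing"></div>')
without_ad = label_video('<div class="html5-video-player"></div>')
```

A label produced this way only records whether an ad marker was seen during one playback, so it inherits the limitation noted later in the review: it says nothing about how often ads appear.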

The main analysis tests whether the set of attributes can act as a proxy, that is, whether a change in the feature values places a video in one of two classes: Monetized or De-Monetized. As shown in picture 1, the authors chose four training methods: C4.5 (a simple decision tree classifier), Random Forest (an advanced decision tree classifier), Logistic Regression (LR), and Support Vector Machine (SVM). All of the machine learning models take as input the attributes listed in table 1. Most of the attributes were chosen based on YouTube’s statements about which factors the platform considers when monetizing.
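
As an illustration of the classification setup, the sketch below trains a logistic regression by plain gradient descent on made-up data. It is not the authors' code or tooling: the two scaled features, the labelling rule `duration + subscribers > 1`, and all function names are hypothetical, chosen only to show how attribute values can be mapped to a Monetized (1) or De-Monetized (0) label.

```python
import math
import random

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(X, y, lr=1.0, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (Monetized) or 0 (De-Monetized)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def accuracy(w, b, X, y):
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)

def make_toy_data(n=200, seed=0):
    """Made-up videos: two scaled attributes and a hypothetical labelling rule."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        duration = rng.uniform(0, 1)     # scaled video duration (assumed feature)
        subscribers = rng.uniform(0, 1)  # scaled subscriber count (assumed feature)
        X.append([duration, subscribers])
        y.append(1 if duration + subscribers > 1.0 else 0)
    return X, y

X, y = make_toy_data()
w, b = train_logistic(X[:150], y[:150])           # train on the first 150 videos
test_accuracy = accuracy(w, b, X[150:], y[150:])  # evaluate on the held-out 50
```

The learned weights indicate which attributes push a video toward the Monetized class, which is the sense in which a model can act as a proxy for an algorithm that is not public.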

Results:

In the results, the logistic regression (LR) model achieved an accuracy of about 70% in predicting whether a video would be monetized. The two most important characteristics were the length of the video and the number of subscribers. The explanation offered is that the longer the video, the more ads YouTube can place and the more likely it is to make money. The same holds for the number of subscribers: channels with a large number of subscribers effectively feed YouTube’s monetization algorithm and win its favor.

The SVM (support vector machine) model also achieved a prediction accuracy of around 70% in predicting whether a video would be monetized. The two most important factors were the channel’s number of subscribers and its number of views, two highly correlated variables.

The Random Forest (RF) model showed the highest prediction accuracy, at 85%. The four predictive models suggest that the political ideology of a YouTube channel (left, center, or right) is irrelevant to the YouTube algorithms that determine the monetization of each video. This conclusion undermines repeated accusations that YouTube “censors” right-wing content.

Limitations:

The study was conducted in the USA on videos and channels aimed primarily at U.S. audiences.
YouTube’s monetization algorithm is designed with advertisers in mind, which can act as a form of indirect censorship towards creators.
A video is considered “monetized” if it contains advertising. The article does not highlight the frequency of advertising in the video.

My opinion:

The main contribution of the article to the analysis of YouTube censorship is its answer to the question of whether right-wing political content is wrongfully restricted. It is crucial to know that YouTube is a fair platform and that everyone can make their videos without unfair restrictions. It is very important that a website with free content exists.

However, as we understand it, the research was carried out on the US segment of YouTube, so the conclusions may differ in other regions. A good direction for future work would be to analyze less democratic regions. For example, some governments can ask YouTube to remove content that contradicts the country’s official policy. In many cases YouTube goes along with totalitarian and authoritarian states and removes these videos.

Another real case in point is the complete removal of monetization in the Russian Federation after the war with Ukraine began. YouTube has disabled monetization there completely, which means that the system for promoting videos on the platform has become much worse. This affects all independent media and free journalists, whose incomes have fallen several-fold, which means they will find it much harder to report truthfully on the crimes of the Russian Federation.

Vasily Levenstam

Sources:

Anthony Zappin, Haroon Malik, Elhadi M. Shakshuki, David A. Dampier, YouTube Monetization and Censorship by Proxy: A Machine Learning Prospective, Procedia Computer Science, Volume 198, 2022, Pages 23-32, ISSN 1877-0509, https://doi.org/10.1016/j.procs.2021.12.207
Website: https://transparency.tube/

One thought on “YouTube Monetization and Censorship by Proxy: A Machine Learning Prospective: Review”

agathecaboulet says:

May 12, 2022 at 9:04 am

Hello, very interesting article. I knew it was difficult for creators to get their videos monetized, but there are different factors at play. I think the platform’s criteria are too broad and some things in the videos that are not very disturbing are seen as problematic by the algorithm, without “real” reasons. I follow a lot of youtubers and now they don’t really rely on monetizing videos anymore because they are generally demonetized, so content creators now rely more on video sponsorship. But I find this quite unequal because small creators or those who are just starting out can’t rely on sponsorships and also get their videos demonetized quite easily. Also, it can be discouraging for creators to always be contesting the various demonetizations. I have several friends who are just starting out on YouTube, and they are always “fighting” with YouTube to get their videos monetized, sometimes for minor details. Also, as mentioned at the end of the article, YouTube remains a free platform in most parts of the world, with a multitude of videos and channels, but, even if for content creators’ freedom of expression is fundamental, monetization is essential to generate profit and have a salary. Just because a video is on YouTube does not mean it is monetized, the presence of the video on the platform and the monetization are two different things.

Published by levnstm