Fake News Detection using Traditional ML and Modern DL Methods

Shivam Sharma
8 min read · Dec 20, 2020


(Image credit: https://www.ie.edu/exponential-learning/blog/data-science/machine-learning-marketing/)

The Internet and social media have made access to news much easier and more convenient. Internet users can follow events of interest online, and the spread of mobile devices makes this even easier.

But with great possibilities come great challenges. Mass media have a huge influence on society, and, as often happens, someone wants to take advantage of this fact. To achieve certain goals, mass media sometimes manipulate information in different ways, producing news articles that are not completely true or are even completely false. There are many websites that produce fake news almost exclusively. They deliberately publish hoaxes, propaganda and disinformation purporting to be real news, often using social media to drive web traffic and amplify their effect. The main goal of fake news websites is to affect public opinion on certain matters (mostly political). Examples of such websites can be found in Ukraine, the United States of America, Germany, China and many other countries [1]. Thus, fake news is both a global issue and a global challenge.

Recent fake news

  • Africa Is Being Used As A Guinea Pig To Test The COVID-19 Vaccine
  • Rs 2,000 Currency Notes Will Not Be Discontinued From December 31
  • A tweet from Bill Gates’ official Twitter account said ‘Everyone is asking me to give back, and now is the time. I am doubling all payments sent to BTC address for the next 30 minutes. You send $1,000, I send you back $2,000’
  • Bill Clinton loses it in interview, admits he is a murderer

Objective

Many scientists believe that the fake news problem can be addressed by means of machine learning and artificial intelligence [2]. There is a reason for that: artificial intelligence algorithms have recently started to perform much better on many classification problems (image recognition, voice detection and so on) because hardware is cheaper and bigger datasets are available [3].

In this blog we present some Machine Learning and Deep Learning algorithms for detecting whether a given article or statement is real or fake, using a public dataset available on Kaggle.

Understanding the Dataset

The dataset has the following attributes:

  • id: unique id for a news article
  • title: the title of a news article
  • author: author of the news article
  • text: the text of the article; could be incomplete
  • label: a label that marks the article as potentially unreliable (1: unreliable or fake; 0: reliable or real)
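As a minimal sketch, assuming the Kaggle CSV has been downloaded locally as train.csv (an illustrative path), loading and inspecting it looks like this:

```python
import pandas as pd

# Load the Kaggle CSV (path is illustrative) and inspect the class balance.
df = pd.read_csv("train.csv")

# Drop rows whose text is missing, since the article body is our main feature.
df = df.dropna(subset=["text"])

print(df.shape)
print(df["label"].value_counts())  # 1 = unreliable/fake, 0 = reliable/real
```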

Word Cloud

A word cloud gives a pictorial representation of the frequency of words in the dataset for a given class label (real or fake).
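A minimal sketch of generating one with the wordcloud library, assuming the df loaded in the earlier snippet:

```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Concatenate all articles of one class and render their word frequencies.
fake_text = " ".join(df[df["label"] == 1]["text"])

cloud = WordCloud(width=800, height=400, background_color="white").generate(fake_text)
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```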

Preprocessing of the Dataset

i) Preparing the training corpus

To extract the best features from the dataset, in the first step of the preprocessing stage we remove common words (stop words) using the nltk library, shorten words (e.g. from 'having' to 'have'), remove punctuation, expand contractions like 'i'm' to 'i am', and remove non-alphanumeric characters from the text.
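A minimal sketch of these steps with nltk (the contraction handling is simplified to a single replacement for illustration):

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean(text):
    text = text.lower()
    text = text.replace("i'm", "i am")            # expand a common contraction
    text = re.sub(r"[^a-z0-9\s]", " ", text)      # drop punctuation / non-alphanumerics
    # Lemmatizing as verbs turns e.g. 'having' into 'have'.
    tokens = [lemmatizer.lemmatize(w, pos="v") for w in text.split()
              if w not in stop_words]
    return " ".join(tokens)

df["clean_text"] = df["text"].apply(clean)
```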

ii) Bag Of Words

The Bag of Words model is a simplifying representation of text (such as a sentence, document or whole dataset) as a bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity.
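A minimal sketch with scikit-learn's CountVectorizer, over the hypothetical clean_text column from the preprocessing step (unigrams plus bigrams, matching the combinations mentioned in the contributions section):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Build a vocabulary over the cleaned corpus and count word occurrences.
bow_vectorizer = CountVectorizer(ngram_range=(1, 2), max_features=50000)
X_bow = bow_vectorizer.fit_transform(df["clean_text"])
print(X_bow.shape)  # (num_articles, vocabulary_size)
```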

iii) Tf-Idf

TF-IDF is an acronym for Term Frequency-Inverse Document Frequency. It is a numerical statistic intended to reflect how important a word or term is to a document in a corpus.
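In the classic formulation, tf-idf(t, d) = tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t (scikit-learn uses a smoothed variant of this). A minimal sketch, again over the hypothetical clean_text column:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Weight each term's count by how rare the term is across the corpus.
tfidf_vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=50000)
X_tfidf = tfidf_vectorizer.fit_transform(df["clean_text"])
```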

Prediction using Traditional Machine Learning Algorithms

Here the input features are first created using Bag of Words and TF-IDF, and the following classifiers are trained on them:

  1. Naive Bayes
  2. Logistic Regression
  3. Decision Tree
  4. SVM
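A minimal sketch of this stage, assuming the X_tfidf features and df from the snippets above (the same loop works unchanged with X_bow; LinearSVC stands in for the SVM):

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

X_train, X_val, y_train, y_val = train_test_split(
    X_tfidf, df["label"], test_size=0.2, random_state=42)

models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
    "SVM": LinearSVC(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    print(f"{name}: {accuracy_score(y_val, preds):.3f}")
```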

But why did our model fail so badly?

As you can see, the accuracy is around 50-55%, which is very bad for binary classification. The model doesn't seem to understand the difference between real and fake news at all. The reason could be that the corpus of the validation set is very different from the training corpus. Or is it that the ML models are not capturing the context at all? We now try something more advanced, GloVe and Word2Vec, and see whether we can improve our model by giving it a more context-aware feature set.

Using Word Embeddings

iv) Glove

Global Vectors for Word Representation, or GloVe, is an unsupervised learning algorithm for obtaining vector representations of words.

v) Word2Vec

Word2vec is a family of models used to produce distributed representations of words in a corpus. In simple words, word-vector algorithms use the context of words to learn numerical representations, so that words used in similar contexts have similar-looking word vectors.
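A minimal sketch of turning each article into a single feature vector by averaging its word vectors, assuming gensim's downloader (glove-wiki-gigaword-100 is just one of the pre-trained sets gensim ships; a Word2Vec set can be swapped in the same way):

```python
import numpy as np
import gensim.downloader as api

# Pre-trained 100-dimensional GloVe vectors.
glove = api.load("glove-wiki-gigaword-100")

def doc_vector(text):
    # Average the vectors of the words the embedding model knows about.
    words = [w for w in text.split() if w in glove]
    if not words:
        return np.zeros(glove.vector_size)
    return np.mean([glove[w] for w in words], axis=0)

X_emb = np.vstack(df["clean_text"].apply(doc_vector).tolist())
```

One caveat: averaged embeddings are dense and can be negative, so when rerunning the models above, MultinomialNB (which expects non-negative counts) has to be swapped for GaussianNB.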

Here the input features are instead created using GloVe and Word2Vec, and the same classifiers are retrained:

  1. Naive Bayes
  2. Logistic Regression
  3. Decision Tree
  4. SVM

Better than before, but still not very reliable.

Prediction Using Modern Deep Learning Algorithms

We have used 3 variants of the LSTM model:

  • LSTM
  • LSTM+Attention
  • Bi-LSTM

Why LSTM over RNN?

Because of long-term dependencies (a later output may depend heavily on a much earlier input, but due to vanishing gradients a vanilla RNN becomes more or less unreliable at capturing them), we opted for LSTM over a vanilla RNN.

Schematic diagram of a basic LSTM cell

(Image credit: https://www.analyticsvidhya.com/blog/2017/12/fundamentals-of-deep-learning-introduction-to-lstm/)
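The post does not pin down a framework, so here is a minimal Keras sketch of the plain-LSTM variant; VOCAB_SIZE and MAX_LEN are illustrative choices, and df is the cleaned dataset from earlier:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000  # illustrative vocabulary cap
MAX_LEN = 300       # illustrative padded article length

# Turn cleaned articles into fixed-length integer sequences.
tokenizer = Tokenizer(num_words=VOCAB_SIZE)
tokenizer.fit_on_texts(df["clean_text"])
X_seq = pad_sequences(tokenizer.texts_to_sequences(df["clean_text"]),
                      maxlen=MAX_LEN)

# Embedding -> LSTM -> sigmoid for binary (fake / real) classification.
model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 100),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X_seq, df["label"].values, validation_split=0.2,
          epochs=3, batch_size=64)
```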

“Attention” is all you need (an improvement on the encoder-decoder model)

So let us talk about intuition first. With conventional methods like TF-IDF/CountVectorizer, we extracted features from the text by a kind of keyword extraction: some words are more helpful than others in determining the category of a text, but we lost the sequential structure of the text. With LSTM and deep learning methods we can take care of the sequence structure, but we lose the ability to give higher weight to more important words. Can we have the best of both worlds?

The answer is yes. Attention really is all you need. In the authors' words:

“Not all words contribute equally to the representation of the sentence’s meaning. Hence, we introduce attention mechanism to extract such words that are important to the meaning of the sentence and aggregate the representation of those informative words to form a sentence vector”

Our LSTM attention model architecture

(Image credit: https://www.sciencedirect.com/science/article/abs/pii/S0306457318305612)
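As a sketch of the idea (not necessarily the exact architecture we used), a simplified additive attention layer over the LSTM's per-step outputs can be written in Keras as follows, reusing VOCAB_SIZE and MAX_LEN from the LSTM sketch above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

class SimpleAttention(layers.Layer):
    """Learn a relevance score per time step, softmax over time, and
    return the weighted sum of the LSTM outputs (a 'sentence vector')."""

    def build(self, input_shape):
        # One weight vector scoring each hidden state.
        self.w = self.add_weight(name="att_w", shape=(input_shape[-1], 1),
                                 initializer="glorot_uniform", trainable=True)
        super().build(input_shape)

    def call(self, h):                                     # h: (batch, time, units)
        scores = tf.tensordot(tf.tanh(h), self.w, axes=1)  # (batch, time, 1)
        alpha = tf.nn.softmax(scores, axis=1)              # attention weights
        return tf.reduce_sum(alpha * h, axis=1)            # (batch, units)

inputs = layers.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB_SIZE, 100)(inputs)
x = layers.LSTM(64, return_sequences=True)(x)   # keep the output at every step
x = SimpleAttention()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

attn_model = models.Model(inputs, outputs)
attn_model.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
```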

Bi-directional LSTM

Bidirectional recurrent neural networks (RNNs) are really just two independent RNNs put together. This structure allows the network to have both backward and forward information about the sequence at every time step.

A bidirectional layer runs your input in two ways, one from past to future and one from future to past. What distinguishes this from the unidirectional approach is that the LSTM running backwards preserves information from the future; using the two hidden states combined, the network can, at any point in time, draw on information from both past and future.

(Image credit: https://www.semanticscholar.org/paper/EagleBot%3A-A-Chatbot-Based-Multi-Tier-Question-for-Rana/de6d854a7f5e4995081b8f58aadf957845f9dbe3)
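In Keras this only requires wrapping the LSTM in the Bidirectional wrapper; a minimal sketch reusing the tokenized inputs from earlier (training mirrors the fit call in the LSTM sketch):

```python
from tensorflow.keras import layers, models

# One LSTM copy reads the article left-to-right, another right-to-left;
# their final hidden states are concatenated before classification.
bi_model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 100),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(1, activation="sigmoid"),
])
bi_model.compile(optimizer="adam", loss="binary_crossentropy",
                 metrics=["accuracy"])
```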

Conclusion

Even though there has been some success in detecting fake news with machine learning techniques, the characteristics, definitions and types of fake news on social media change drastically with every passing day, which makes classification a continuing challenge. With the advent of deep learning, however, most recent research applies deep learning methods, such as CNNs, deep neural networks, deep autoencoders, LSTMs and RNNs, across applications like audio and speech processing, natural language processing and modelling, information retrieval, object recognition and computer vision. Hence, in the second half of the project we worked on modern DL techniques, LSTM, attention models and Bi-LSTM, and saw a huge improvement in accuracy over our traditional ML techniques, as the results tables show.

Link to the source code: github

Acknowledgment

This article is based on a project in the Machine Learning course at IIIT Delhi. We are truly grateful for the guidance of our professor, Dr. Tanmoy Chakraborty (website), and all the TAs of the course, especially Nirav Dhawan (our project mentor), for suggesting the project and guiding us in achieving these results. I also want to thank my teammates for equally contributing to this project.

Teaching Fellow: Ms Ishita Bajaj

Teaching Assistants: Pragya Srivastava, Shiv Kumar Gehlot, Chhavi Jain, Vivek Reddy, Shikha Singh and Nirav Diwan.

Contributions

Soumam Banerjee ( https://www.linkedin.com/in/soumam-banerjee-6393b1132/ ): Literature Survey, Text Preprocessing, Unigram & Bigram combinations for Bag of Words, Logistic Regression, Multinomial Naive Bayes, Report & Analysis of models, Data & Results Visualization.

Jaswanth Naidu ( https://www.linkedin.com/in/jaswanth-846902180/ ): Literature Survey, Text Preprocessing, Datasets Handling, SVM, W2V+GloVe+Bi-LSTM, Report & Analysis of models, Data & Results Visualization.

Shivam Sharma ( https://www.linkedin.com/in/shivam-sharma-30ba265b/ ): Literature Survey, Text Preprocessing, Unigram & Bigram combinations for TF-IDF, Logistic Regression, Multinomial Naive Bayes, Report & Analysis of models, Data & Results Visualization.

References

[1] "Fake news websites," Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Fake_news_website. Accessed Feb. 6, 2017.

[2] C. Metz, "The bittersweet sweepstakes to build an AI that destroys fake news," Wired, Dec. 16, 2016. [Online]. Available: https://www.wired.com/2016/12/bittersweet-sweepstakes-build-ai-destroys-fake-news/

[3] M. Granik and V. Mesyura, Computer Science Department, Vinnytsia National Technical University.

[4] J. Pennington, R. Socher, and C. D. Manning, "GloVe: Global Vectors for Word Representation," 2014.


Shivam Sharma

Student of Computer Science who is passionate about Artificial Intelligence and Programming