They gathered data from Kaggle.com describing each character's basic physical appearance. They then used Python to sort that data into categories and pair each entry with the alignment of the character it described.
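As a rough illustration of this preprocessing step, the sketch below groups characters by one appearance attribute and tallies the alignments within each group. The column names (`eye_color`, `hair_color`, `alignment`) and the sample rows are hypothetical. They are not taken from the Kaggle dataset or from the article's own code.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows; the real Kaggle file and its column names may differ.
SAMPLE_CSV = """name,eye_color,hair_color,alignment
A,blue,blond,good
B,red,none,bad
C,blue,black,good
D,green,black,bad
E,blue,brown,neutral
"""


def alignment_by_category(csv_text, attribute):
    """Count alignments for each value of one appearance attribute."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row[attribute]][row["alignment"]] += 1
    return counts


eye_counts = alignment_by_category(SAMPLE_CSV, "eye_color")
for value, tally in sorted(eye_counts.items()):
    print(value, dict(tally))
```

A table like this is one plausible way to see how each appearance category relates to alignment before handing the data to a modelling tool.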
Then, using a “data mining tool, ‘Weka’, to create a prediction model”, they selected the algorithm that best fit their dataset.
They first ran the model on the dataset without changing any parameters. This baseline run reached an accuracy of about 65.7%.
They “discovered that using a kernel density estimation (KDE) increased the accuracy of almost 2% up to 67.03 %”.
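This summary does not say which classifier the KDE was applied to or how Weka computes it, so the following is only a minimal sketch of the general idea. A Gaussian kernel density estimate places one small bell curve on every observed value and averages them, instead of fitting a single bell curve to the whole attribute. The sample heights and the bandwidth are hypothetical values chosen for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function that averages one Gaussian bump per sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical heights (cm) forming two clusters, which one Gaussian fits poorly.
heights = [160, 162, 165, 198, 200, 203]
kde = gaussian_kde(heights, bandwidth=3.0)

# Compare the estimated density near each cluster with the gap between them.
print(kde(162) > kde(180), kde(200) > kde(180))
```

Because the estimate follows the shape of the data, it can represent attributes with several clusters, which is one possible reason a KDE option might improve accuracy on a dataset like this.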
“It can also be noted under the ‘error prediction’ column that the model is much more certain in classifying ‘good/neutral’ characters than ‘bad’ ones. The reasoning for this can be that many of the ‘bad characters’ have multiple identities/personalities, changing form- or looks when doing evil actions”, which the model did not take into account. There are, however, also more good superheroes than evil ones, so this imbalance could also have led the model to predict “good” more often than “bad”.
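The effect of having more good than evil characters can be made concrete with a majority-class baseline: a "model" that always answers with the most common label. The sketch below uses hypothetical class counts, not the article's actual figures, to show how such a baseline can reach a seemingly reasonable accuracy without learning anything.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most common label."""
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)


# Hypothetical class counts, NOT the article's actual figures.
labels = ["good/neutral"] * 65 + ["bad"] * 35
label, accuracy = majority_baseline_accuracy(labels)
print(f"always predict {label!r}: accuracy = {accuracy:.2f}")
```

Comparing a reported accuracy against this kind of baseline gives a sense of how much the model has learned beyond the class frequencies.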
“There are extensive ethical issues with predicting a person’s alignment based on their characteristics. Rongxing Lu et al. argue that preserving privacy and handling data in an ethical and anonymous way is necessary to defend our freedom. (Rongxing et al.) In this paper, I am not predicting real people’s alignment, and thereby am not exposed to these legal and ethical concerns. Nevertheless, these concerns have to be taken into consideration if this model is to be applied elsewhere — i.e. if someone were to use it to predict a person that they just met is good or evil.
“There are a great array of possibilities in which you can describe a character, and there is information in datasets describing superheroes in more detail (powers, origin etc.). However, one has to limit themselves to a certain threshold for a model to be viable. Even though more information could lead to a more accurate model, the main thought behind this model is to predict a character’s alignment based on a short glimpse or rumors. Too much information or complexity can also cause the model to capture too much noise from the data, causing what is referred to as ‘overfitting’ (Witten et al.).”
Frantzvaag, Vetle Ottem. “Predicting Whether a Marvel Character Is Good or Evil Using Big Data Analytics.” Towards Data Science, 2 Oct. 2019, towardsdatascience.com/predicting-whether-a-marvel-character-is-good-or-evil-using-big-data-analytics-fb2ed78c3610.