Publisher
University of Tennessee at Chattanooga
Place of Publication
Chattanooga (Tenn.)
Abstract
Steemit is a decentralized social media site powered by the STEEM blockchain. Traditional platforms operate on a top-down model that benefits their maintainers rather than content producers. Steemit instead passes rewards to the users who provide value to the platform. Users earn one of three cryptocurrencies by posting and by performing tasks such as content moderation and community administration. This presentation gives an overview of the data collection process performed in support of academic research on the network. Data was collected to determine whether unearned status (acquired via cryptocurrency) affected a user’s reputation. Initial statistics indicate distinct clusters of users with similarly high ranks despite opposing combinations of currencies and user engagement (estimated by the number of comments on their posts). Data was collected via web scraping using Python programs and browser automation software, and the raw data was then cleaned. Topic modeling and sentiment analysis were performed on a portion of the data to determine the content and tone of the collected posts. The result is a set of uniform, feature-rich tables of both continuous and categorical data that can be analyzed directly or used as training data for machine learning algorithms. This presentation provides a low-complexity framework for the collection and preprocessing of numerical and text data from social media sites. Datasets processed in this way could be used to predict the success of a post from its topic before publishing, which can provide guidance on how best to spend time and resources contributing to the platform.
Document Type
presentations
Language
English
Rights
http://rightsstatements.org/vocab/InC/1.0/
License
http://creativecommons.org/licenses/by/4.0/
Recommended Citation
Lloyd, Zach; Mathews, Sara; and Williams, Elise, "Data collection, preprocessing and descriptive analytics of social media data". ReSEARCH Dialogues Conference proceedings. https://scholar.utc.edu/research-dialogues/2024/Proceedings/1.
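The abstract mentions a cleaning step between scraping and analysis but does not describe it. As an illustration only, the sketch below shows one way raw scraped post records might be normalized into uniform rows using the Python standard library. The field names (`author`, `body`, `comments`) and the specific cleaning rules are assumptions made for this example, not the authors' actual schema or method.

```python
import html
import re

# Hypothetical cleaning rules; the presentation does not specify its own.
TAG_RE = re.compile(r"<[^>]+>")   # crude HTML tag matcher
WS_RE = re.compile(r"\s+")        # any run of whitespace


def clean_text(raw):
    """Strip HTML tags, decode entities, collapse whitespace, lowercase."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    return WS_RE.sub(" ", text).strip().lower()


def parse_count(value):
    """Convert a scraped count such as '1,204' to int, or None if unparseable."""
    try:
        return int(str(value).replace(",", "").strip())
    except ValueError:
        return None


def clean_records(raw_records):
    """Return uniform rows, dropping records whose body is empty after cleaning."""
    rows = []
    for rec in raw_records:
        body = clean_text(rec.get("body", ""))
        if not body:
            continue
        rows.append({
            "author": rec.get("author", "").strip().lower(),
            "body": body,
            "comments": parse_count(rec.get("comments", "")),
        })
    return rows


if __name__ == "__main__":
    # Made-up sample input, for demonstration only.
    sample = [
        {"author": " Alice ", "body": "<p>Hello &amp;   <b>World</b></p>", "comments": "1,204"},
        {"author": "bob", "body": "<div> </div>", "comments": "3"},
    ]
    for row in clean_records(sample):
        print(row)
```

Rows in this shape can be loaded into a table for descriptive statistics, or have their `body` field passed on to topic modeling and sentiment analysis. Unparseable counts are kept as `None` rather than coerced to zero so that missing values remain distinguishable from genuine zeros.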