Committee Chair

Sartipi, Mina

Committee Member

Osman, Osama A.; Wu, Dalei

Department

Dept. of Computer Science and Engineering

College

College of Engineering and Computer Science

Publisher

University of Tennessee at Chattanooga

Place of Publication

Chattanooga (Tenn.)

Abstract

Rare event data occur so infrequently that even large data sets can leave researchers short of information. Rare event data also involve a persistent trade-off: a higher count of entries often comes with fewer explanatory variables, and vice versa. In rare event probability prediction, several data sampling methods exist to remedy the central problem of rare event data: a lack of data to collect and learn from. The most effective of these methods typically alter the distribution of the training samples in a data set. The least used of them is negative sampling, in which positive entries in a data set are used to generate negative entries. To demonstrate the utility of negative sampling, this work applies five types of negative sampling to a vehicular accident prediction project, in which non-accident records are generated by manipulating the temporal and spatial attributes of existing accident records. In addition, different data manipulation methods, including feature selection and varying negative-to-positive data ratios, are used to explore which explanatory variables are most important for predicting vehicular accidents. Finally, two predictive models, a Multilayer Perceptron and a Logistic Regression model, are built and directly compared in terms of predictive capability. Ultimately, which model performs best depends heavily on the specific implementation and the desired results.
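The abstract does not spell out the five negative sampling schemes, so the following is only a hypothetical sketch of the general idea it describes: generating non-accident records by perturbing either the time or the location of real accident records. The `Record` type, `negative_samples` function, the 50/50 choice between a temporal and a spatial perturbation, and the one-week shift window are all illustrative assumptions, not the thesis's method. The `ratio` parameter plays the role of the negative-to-positive data ratio.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    segment_id: int      # hypothetical road-segment ID (spatial attribute)
    timestamp: datetime  # hour-resolution time (temporal attribute)
    label: int           # 1 = accident, 0 = generated non-accident


def negative_samples(accidents, segment_ids, ratio,
                     max_window_h=168, seed=0, max_tries=10_000):
    """Hypothetical sketch: create `ratio` negatives per accident by shifting
    its time (same segment) or moving it to another segment (same time).
    Candidates that match a real accident or an earlier negative are rejected."""
    rng = random.Random(seed)
    occupied = {(a.segment_id, a.timestamp) for a in accidents}
    seen = set()
    negatives = []
    # Nonzero hour offsets within +/- max_window_h (assumed one week).
    shifts = [h for h in range(-max_window_h, max_window_h + 1) if h != 0]
    for acc in accidents:
        made = tries = 0
        while made < ratio and tries < max_tries:
            tries += 1
            if rng.random() < 0.5:
                # Temporal perturbation: same place, different hour.
                cand = (acc.segment_id,
                        acc.timestamp + timedelta(hours=rng.choice(shifts)))
            else:
                # Spatial perturbation: same hour, randomly chosen segment.
                cand = (rng.choice(segment_ids), acc.timestamp)
            if cand in occupied or cand in seen:
                continue  # would coincide with a real accident or a duplicate
            seen.add(cand)
            negatives.append(Record(cand[0], cand[1], label=0))
            made += 1
    return negatives


if __name__ == "__main__":
    accidents = [Record(1, datetime(2020, 1, 1, 8), 1),
                 Record(2, datetime(2020, 1, 2, 17), 1)]
    negs = negative_samples(accidents, segment_ids=[1, 2, 3, 4], ratio=3)
    print(len(negs), "negatives for", len(accidents), "accidents")
```

Rejecting candidates that collide with a real accident keeps the generated negatives from contradicting the positive records; raising `ratio` trades class balance for more (synthetic) negative examples.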

Degree

M. S.; A thesis submitted to the faculty of the University of Tennessee at Chattanooga in partial fulfillment of the requirements for the degree of Master of Science.

Date

12-2020

Subject

Machine learning; Sampling (Statistics); Traffic accidents--Mathematical models

Keyword

data manipulation; data sampling; machine learning; negative sampling; rare event data; vehicular accident predictive modelling

Document Type

Masters theses

DCMI Type

Text

Extent

vii, 47 leaves

Language

English

Rights

http://rightsstatements.org/vocab/InC/1.0/

License

http://creativecommons.org/licenses/by/4.0/
