Improving Water Quality Index Prediction Using Regression Learning Models

Abstract

Rivers are the main sources of freshwater supply for the world population. However, many economic activities contribute to river water pollution. River water quality can be monitored using various parameters, such as the pH level, dissolved oxygen, total suspended solids, and the chemical properties. Analyzing the trend and pattern of these parameters enables the prediction of the water quality so that proactive measures can be made by relevant authorities to prevent water pollution and predict the effectiveness of water restoration measures. Machine learning regression algorithms can be applied for this purpose. Here, eight machine learning regression techniques, including decision tree regression, linear regression, ridge, Lasso, support vector regression, random forest regression, extra tree regression, and the artificial neural network, are applied for the purpose of water quality index prediction. Historical data from Indian rivers are adopted for this study. The data refer to six water parameters. Twelve other features are then derived from the original six parameters. The performances of the models using different algorithms and sets of features are compared. The derived water quality rating scale features are identified to contribute toward the development of better regression models, while the linear regression and ridge offer the best performance. The best mean square error achieved is 0 and the correlation coefficient is 1.

Publication
In International Journal of Environmental Research and Public Health