Date of Award
2025
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
Committee Chair
Tathagata Mukherjee
Committee Member
Chaity Banerjee
Committee Member
Aaron Kaulfus
Research Advisor
Tathagata Mukherjee
Subject(s)
Cloud computing, Virtual storage (Computer science), Time-series analysis, Forecasting
Abstract
Large-scale cloud storage systems managing petabyte-scale data face cost challenges due to sparse, irregular file access patterns. Traditional attribute-based methods often fail to capture dynamic temporal and event-driven behaviors. This thesis compares classical statistical and deep learning approaches for predicting file access patterns using five months of real-world access logs from Tsinghua University’s FTP server (2.9 million events). Five forecasting models—ARIMA, SARIMA, Exponential Smoothing, Prophet, and 1D CNNs—are evaluated across 1,161 files. A novel hybrid clustering method combines time series similarity with forecasting accuracy metrics. Results show ARIMA significantly outperforms deep learning (2.3× better accuracy: 0.0188 vs. 0.036 MAE) for hourly forecasts, particularly for high-frequency files. Clustering achieves strong separation (silhouette score: 0.889), identifying four behavioral patterns that support targeted forecasting strategies with 40–60% improved accuracy. These insights enable cluster-specific intelligent data tiering, reducing storage costs by 30–50% while maintaining data accessibility.
Recommended Citation
Ammenadka, Swaroopa, "A comparative study of classical and deep feature learning methods for prediction of user request patterns for cloud resources" (2025). Theses. 767.
https://louis.uah.edu/uah-theses/767