Improving clustering by detecting and removing outliers based on Euclidean and exponential distance approach

Authors
Abstract
This paper presents a new method for identifying outlier time series, based on the GARCH model and an exponential distance approach, in three steps. First, fuzzy and hard clustering methods are applied to the time series. Second, the outlier time series are detected and removed from the dataset. Third, the clustering algorithms are applied to the cleaned dataset again. Thirty of the most active, lucrative, and profitable stocks in the Iranian stock market are used to evaluate the presented methods. The accuracy of the clustering methods is compared by computing the Silhouette and Xie-Beni indices, and it is shown that, after removing the outlier time series, the GARCH model based on the exponential distance approach has the best performance.
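The three steps can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that each series is summarized by hypothetical GARCH(1,1) coefficient estimates (the values in `features` are made up), that the exponential distance takes the form 1 - exp(-s * ||x - y||^2) with s set to the inverse of the sample variance, that a simple k-medoids routine stands in for the hard clustering method, and that a series is flagged as an outlier when its mean distance to the other series exceeds the average score by two standard deviations. Fuzzy clustering and the Xie-Beni index are not covered here.

```python
import numpy as np

def distance_matrices(features):
    """Return (Euclidean, exponential) pairwise distance matrices."""
    diff = features[:, None, :] - features[None, :, :]
    sq = (diff ** 2).sum(axis=-1)
    # Assumed scale: inverse of the mean squared deviation from the mean vector.
    scale = 1.0 / ((features - features.mean(axis=0)) ** 2).sum(axis=1).mean()
    return np.sqrt(sq), 1.0 - np.exp(-scale * sq)

def k_medoids(dist, k, n_iter=100):
    """Simple k-medoids on a distance matrix with farthest-first initialization."""
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            within = dist[np.ix_(members, members)].sum(axis=1)
            new[c] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return np.argmin(dist[:, medoids], axis=1), medoids

def silhouette(dist, labels):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = dist[i, same].sum() / (same.sum() - 1)
        b = min(dist[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

def detect_outliers(dist, n_std=2.0):
    """Flag points whose mean distance to the others is unusually large."""
    score = dist.sum(axis=1) / (dist.shape[0] - 1)
    return np.flatnonzero(score > score.mean() + n_std * score.std())

# Hypothetical (alpha, beta) GARCH(1,1) estimates: two groups plus one outlier.
features = np.array([
    [0.05, 0.90], [0.06, 0.89], [0.05, 0.91], [0.04, 0.90],
    [0.20, 0.70], [0.21, 0.69], [0.19, 0.71], [0.20, 0.72],
    [0.60, 0.10],
])

# Step 1: cluster all series.
_, d_exp = distance_matrices(features)
labels_before, _ = k_medoids(d_exp, k=2)
sil_before = silhouette(d_exp, labels_before)

# Step 2: detect and remove outlying series.
outliers = detect_outliers(d_exp)
kept = np.delete(features, outliers, axis=0)

# Step 3: cluster the cleaned dataset again.
_, d_kept = distance_matrices(kept)
labels_after, _ = k_medoids(d_kept, k=2)
sil_after = silhouette(d_kept, labels_after)

print("outlier indices:", outliers)
print("silhouette before / after:", sil_before, sil_after)
```

In this toy data the outlying series occupies a cluster of its own in the first pass, which forces the two genuine groups to merge; removing it lets the second pass separate them. The Euclidean matrix returned by `distance_matrices` can be passed to the same functions to compare the two distances.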
Keywords
