Estimation of the Parameters of a Generalized Hypergeometric Distribution Generated by a Birth-Death Process

Author
Quchan University of Technology, Department of Mathematics
Abstract
In this paper we consider a generalized hypergeometric distribution that was constructed by means of a birth-death process for modeling bioinformatics data. Under certain conditions, we derive a system of likelihood equations whose solution coincides with the maximum likelihood estimators of the parameters of interest. An approximate method for estimating the parameters is presented, together with a simulation study. In addition, to illustrate applications of this distribution, we fit it to three types of real bioinformatics data and use statistical criteria to compare the results with four other discrete distributions; on this basis, the generalized hypergeometric distribution is a more suitable model than the four other discrete distributions.

Article Title (English)

Estimation of Parameters for a Generalized Hypergeometric Distribution Generated by Birth-Death Process

Author (English)

Davood Farbod
Quchan University of Technology
Abstract (English)

In this paper, we consider a three-parameter regularly varying generalized hypergeometric distribution that is generated by a birth-death process for describing phenomena in bioinformatics (Danielian and Astola, 2006). Under certain conditions, we obtain a system of likelihood equations whose solution coincides with the maximum likelihood estimators. These maximum likelihood estimators coincide with certain moment estimators.

Moreover, an approximate computation of the maximum likelihood estimates of the unknown parameters is given. Simulation studies using MCMC are presented.

Finally, to illustrate applications, some real data sets in bioinformatics are fitted with the model. Based on several important criteria, the model is compared with four other discrete distributions used in bioinformatics. The generalized hypergeometric distribution provides a better fit than the four other discrete distributions.

Keywords (English)

Generalized Hypergeometric Distribution
Birth-Death Process
Bioinformatics
Maximum likelihood estimation (MLE)
Markov Chain Monte Carlo (MCMC)
References
1. Astola J., Danielian E., "Frequency Distributions in Biomolecular Systems and Growing Networks", Tampere International Center for Signal Processing (TICSP), Series no. 31, Tampere, Finland (2007).
2. Astola J., Danielian E., Arzumanyan S., "Frequency distributions in bioinformatics", A Review, Proceedings Yerevan State University: Phys. Math. Sci., 223(3) (2010) 3-22.
3. Astola J., Gasparian K., Danielian E., "Moments estimators for hypergeometric distributions", Proceedings of the TICSP Workshop on Spectral Methods and Multirate Signal Processing (SMMSP), Moscow, Russia (2007) 233-234.
4. Danielian E., Astola J., "On regularly varying hypergeometric distributions", In Astola et al. (eds.), Proceedings International TICSP Workshop on Spectral Methods and Multirate Signal Processing, Florence, Italy, 2-3 Sept. 2006, TICSP Series no. 34 (2006) 127-132.
5. Farbod D., "On the parameters estimators for two frequency distributions arising in bioinformatics", Bulletin of the Georgian National Academy of Sciences, 9(1) (2015) 44-50.
6. Farbod D., Gasparian K., "Maximum likelihood estimators for some generalized Pareto-like frequency distribution", Journal of the Iranian Statistical Society (JIRSS), 12(2) (2013) 211-233.
7. Kabsch W., Sander C., "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features", Biopolymers, 22(12) (1983) 2577-2637.
8. Kuznetsov V. A., "Distributions associated with stochastic processes of gene expression in a single eukaryotic cell", EURASIP Journal on Applied Signal Processing, 4 (2001) 258-296.
9. Kuznetsov V. A., "Family of skewed distributions associated with the gene expression and proteome evolution", Signal Processing, 33(4) (2003) 889-910.
10. Kuznetsov V. A., Pickalov V. A., Senko O. V., Knott G. D., "Analysis of the evolving proteomes: Predictions of the number for protein domains in nature and the number of genes in eukaryotic organisms", Journal of Biological Systems, 10(4) (2002) 381-407.
11. Givens G. H., Hoeting J. A., "Computational Statistics", Wiley and Sons (2005).
12. Ivchenko G. I., Medvedev Yu., "Mathematical Statistics", Mir Press, Moscow (1990), translated from the original Russian edition.
13. Qian N., Sejnowski T. J., "Predicting the secondary structure of globular proteins using neural network models", Journal of Molecular Biology, 202 (1988) 865-884.
14. Rzhetsky A., Gomez Sh. M., "Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome", Bioinformatics, 17(10) (2001) 988-996.
15. Yang J., "Protein secondary structure prediction based on neural network models and support vector machines", CS229 Final Project, Dec (2008).