Estimation of Parameters for a Generalized Hypergeometric Distribution Generated by Birth-Death Process

Author
Quchan University of Technology
Abstract
In this paper, we consider a three-parameter, regularly varying generalized Hypergeometric distribution that was generated by a birth-death process to describe phenomena in bioinformatics (Danielian and Astola, 2006). Under certain conditions, we obtain a system of likelihood equations whose solution coincides with the maximum likelihood estimators. These maximum likelihood estimators are the same as certain moment estimators.

Moreover, we give an approximate computation of the maximum likelihood estimates of the unknown parameters and present simulation studies based on MCMC.

Finally, to illustrate applications, we fit the model to several real data sets from bioinformatics. Using several important criteria, we compare this model with four other discrete distributions used in bioinformatics and find that the generalized Hypergeometric distribution provides a better fit than all four.

References

1. Astola J., Danielian E., "Frequency Distributions in Biomolecular Systems and Growing Networks", Tampere International Center for Signal Processing (TICSP), Series no. 31, Tampere, Finland (2007).
2. Astola J., Danielian E., Arzumanyan S., "Frequency distributions in bioinformatics: A review", Proceedings Yerevan State University: Phys. Math. Sci., 223(3) (2010) 3-22.
3. Astola J., Gasparian K., Danielian E., "Moments estimators for hypergeometric distributions", Proceedings of the TICSP Workshop on Spectral Methods and Multirate Signal Processing (SMMSP), Moscow, Russia (2007) 233-234.
4. Danielian E., Astola J., "On regularly varying hypergeometric distributions", in Astola et al. (eds.), Proceedings International TICSP Workshop on Spectral Methods and Multirate Signal Processing, Florence, Italy, 2-3 Sept. 2006, TICSP Series no. 34 (2006) 127-132.
5. Farbod D., "On the parameters estimators for two frequency distributions arising in bioinformatics", Bulletin of the Georgian National Academy of Sciences, 9(1) (2015) 44-50.
6. Farbod D., Gasparian K., "Maximum likelihood estimators for some generalized Pareto-like frequency distribution", Journal of the Iranian Statistical Society (JIRSS), 12(2) (2013) 211-233.
7. Kabsch W., Sander C., "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features", Biopolymers, 22(12) (1983) 2577-2637.
8. Kuznetsov V. A., "Distributions associated with stochastic processes of gene expression in a single eukaryotic cell", EURASIP Journal on Applied Signal Processing, 4 (2001) 258-296.
9. Kuznetsov V. A., "Family of skewed distributions associated with the gene expression and proteome evolution", Signal Processing, 33(4) (2003) 889-910.
10. Kuznetsov V. A., Pickalov V. A., Senko O. V., Knott G. D., "Analysis of the evolving proteomes: Predictions of the number for protein domains in nature and the number of genes in eukaryotic organisms", Journal of Biological Systems, 10(4) (2002) 381-407.
11. Givens G. H., Hoeting J. A., "Computational Statistics", Wiley and Sons (2005).
12. Ivchenko G. I., Medvedev Yu., "Mathematical Statistics", Mir Press, Moscow (1990), translated from the original Russian edition.
13. Qian N., Sejnowski T. J., "Predicting the secondary structure of globular proteins using neural network models", Journal of Molecular Biology, 202 (1988) 865-884.
14. Rzhetsky A., Gomez Sh. M., "Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome", Bioinformatics, 17(10) (2001) 988-996.
15. Yang J., "Protein secondary structure prediction based on neural network models and support vector machines", CS229 Final Project, Dec. (2008).