[ArXiv] Ripley’s K-function

Thanks to the extensive work of Prof. Peebles and many (observational) cosmologists (I almost always find Prof. Peebles's book cited in the cosmology literature), the two- (or three-) point correlation function dominates over other mathematical and statistical methods for understanding the structure of the universe. Unusually, this week's astro-ph includes a paper written by a statistics professor that uses the K-function to explore the mystery of the universe.

[astro-ph:0804.3044] J.M. Loh
Estimating Third-Order Moments for an Absorber Catalog

Rather than going into the detailed contents, which I leave to the readers, I'd like to cite a few key points without math symbols. The script K denotes the third-order K-function, from which the three-point and reduced three-point correlation functions are derived. The benefits of using the script K function over these correlation functions are discussed in terms of bin size and edge correction. Yet the author does not recommend using the script K function alone, but rather using all of these tools together. The paper also notes that computing third- or higher-order measures of clustering has become feasible thanks to larger datasets and advances in computing. In the appendix, the estimator of the script K is proved to be unbiased.

My reason for bringing up this K-function comes from my early experience in learning statistics. My memory of learning the two-point correlation function in an undergraduate cosmology class is very vague, but the basic idea of modeling this function gave me an epiphany during a spatial statistics class several years ago, when Ripley's K-function was introduced. I vividly remember setting up my own project to use this K-function to characterize the spatial distribution of GRBs. I chose GRBs instead of galaxies for two particular reasons: (1) I was able to find the data set on the internet on my own (the BATSE catalog: astronomers may think accessing data archives is easy, but statistics students were generally not aware that astronomical data sets are available via the internet, and for data sets they depend heavily on data providers, or clients), and (2) I recalled a paper by Professors Efron and Petrosian (1995, ApJ, 449:215-223, Testing Isotropy versus Clustering of Gamma-ray Bursts), who used the nearest-neighbor approach. After a few weeks, I made another discovery: people had found GRB redshifts and had begun to understand the cosmological origin of GRBs more deeply. In other words, 2D spatial statistics was not the way to find the origins of GRBs. Because of a few shortcomings, one of which was BATSE's latitude-dependent observation (as a second-year graduate student, I had not yet confronted the ideas of censoring and truncation), I discontinued my personal project, discouraged that I could not make any contribution (the data themselves, such as the discovery of distances, speak louder than statistical inferences made without distances).

I was delighted to see Prof. Loh's work on Ripley's K function. Those curious about the K function may consult the book by Martinez and Saar, Statistics of the Galaxy Distribution. Many statistical publications are also available under spatial statistics and point processes, which include Ripley's K function.
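For readers who would like a concrete feel for the quantity before opening those references, here is a minimal sketch of the ordinary second-order Ripley's K estimator for points in a flat 2D window. It is not the third-order script K of Prof. Loh's paper, and it is not taken from any of the works mentioned above. The function name `ripley_k`, the unit-square window, and the choice of normalization (one common convention, area times the number of ordered pairs within r, divided by n(n-1)) are my own assumptions for illustration. No edge correction is applied, so the estimate is biased low once r is comparable to the distance to the window boundary.

```python
import math
import random


def ripley_k(points, r, area):
    """Naive estimate of Ripley's K at radius r (illustrative sketch).

    points : list of (x, y) tuples inside an observation window
    r      : radius at which K is evaluated
    area   : area of the observation window
    No edge correction is applied (an assumption made for brevity).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    count = 0  # number of ordered pairs (i, j), i != j, closer than r
    for i in range(n):
        xi, yi = points[i]
        for j in range(n):
            if i == j:
                continue
            xj, yj = points[j]
            if math.hypot(xi - xj, yi - yj) <= r:
                count += 1
    return area * count / (n * (n - 1))


if __name__ == "__main__":
    # Simulated complete spatial randomness in the unit square (toy data,
    # not a real catalog). For such a pattern K(r) is pi * r**2 in theory.
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(500)]
    for r in (0.02, 0.05, 0.1):
        print(r, ripley_k(pts, r, area=1.0), math.pi * r * r)
```

For a completely random (homogeneous Poisson) pattern the theoretical value is pi times r squared, so estimates well above that curve suggest clustering at scale r and estimates well below it suggest regularity. Note that no binning of distances is needed, since K is cumulative in r.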
