Random matrix approach to categorical data analysis

Aashay Patil and M. S. Santhanam
Phys. Rev. E 92, 032130 – Published 21 September 2015

Abstract

Correlation and similarity measures are widely used across the sciences and social sciences. Often the variables are not numbers but qualitative descriptors, known as categorical data. We define and study a similarity matrix as a measure of similarity for categorical data. This is of interest because of the deluge of categorical data in the public domain, such as movie ratings, top-10 rankings, and data from social media, that requires analysis. We show that the statistical properties of the spectra of similarity matrices constructed from categorical data follow random matrix predictions, with the dominant eigenvalue being an exception. We demonstrate this approach by applying it to data from the Indian general elections and to sea level pressures in the North Atlantic Ocean.
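The abstract does not give the authors' definition of the similarity matrix, so the following is only an illustrative sketch under a stated assumption: similarity between two categorical variables is taken to be the simple matching coefficient, that is, the fraction of observations in which both carry the same label. The function name `similarity_matrix`, the synthetic data, and the sizes `T`, `N`, and `k` are all hypothetical choices made for this example, not taken from the paper.

```python
import numpy as np

def similarity_matrix(data):
    """Simple matching similarity between the columns of a categorical array.

    ASSUMPTION: this is a generic matching coefficient, not necessarily the
    definition used in the paper.

    data : array of shape (T, N) holding category labels
           (T observations of N categorical variables).
    Returns an N x N symmetric matrix S, where S[i, j] is the fraction of
    observations in which variables i and j carry the same label.
    """
    data = np.asarray(data)
    # equal[t, i, j] is True when variables i and j agree in observation t
    equal = data[:, :, None] == data[:, None, :]
    return equal.mean(axis=0)

# Hypothetical synthetic data: independent, uniformly random labels.
rng = np.random.default_rng(0)
T, N, k = 500, 20, 3          # observations, variables, categories
data = rng.integers(0, k, size=(T, N))

S = similarity_matrix(data)

# S is real and symmetric, so eigvalsh applies; eigenvalues come back
# sorted in ascending order.
eigvals = np.linalg.eigvalsh(S)
print("largest eigenvalue:", eigvals[-1])
print("second largest eigenvalue:", eigvals[-2])
```

In this synthetic setting every off-diagonal entry fluctuates around a common baseline of 1/k, which pushes one eigenvalue well above the rest. Comparing the remaining eigenvalues against random matrix predictions would require the specific ensemble and unfolding procedure used in the paper, which are not stated in the abstract.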

  • Received 20 May 2015

DOI: https://doi.org/10.1103/PhysRevE.92.032130

©2015 American Physical Society

Authors & Affiliations

Aashay Patil and M. S. Santhanam*

  • Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pune 411 008, India

  • *santh@iiserpune.ac.in

Issue

Vol. 92, Iss. 3 — September 2015
