Organizing the Ocean of Public Data

Data from more than 2 million biological experiments are available through public online databases, but much of this information is not used by the biological and biomedical researchers who need it.

Michigan State University’s Arjun Krishnan is working to standardize the information reported about each sample and make it searchable for researchers through a web interface.

“We will develop machine learning approaches to automatically annotate publicly available samples from six species (human and five animal models) on a massive scale to enable researchers to seamlessly discover relevant published data,” says Arjun Krishnan, assistant professor in the Department of Computational Mathematics, Science and Engineering, and in the Department of Biochemistry and Molecular Biology.

Once the data have been labeled and organized, Krishnan and his team will create a web interface so that researchers can search for data that aligns with their research needs. As the field of biology has gradually shifted toward data science, Krishnan has been inspired by the role that computation can play.

“Data-driven computational biology represents a confluence of ideas from diverse scientific and technological disciplines, including computer science, statistics, physics and applied mathematics,” says Krishnan. “A constant reminder that good ideas can come from anywhere and from anyone.”

Learn more: go.msu.edu/krishnan