DNA Information Systems and Cancer Classification

Tuesday, February 11, 2020 - 12:00pm to 1:00pm


Siddharth Jain, postdoctoral scholar in Electrical Engineering at Caltech

The oldest information system was handed to us by nature in the form of life. One of the fundamental units of this biological information system is the genome, or DNA. The human genome is continuously evolving, so a sequenced genome is a snapshot in time of this evolving entity. Over time, the genome accumulates mutations that can be associated with different phenotypes, such as physical traits and diseases. Underlying this accumulation of mutations is an evolution channel, which is controlled by hereditary, environmental, and stochastic factors. (The term "channel" is motivated by the notion of a communication channel, introduced by Shannon in 1948, which started the field of information and coding theory.) The premise of this talk is to decode the human genome using an information and coding theory framework. In particular, it focuses on (i) the analysis and characterization of the evolution channel using measures of capacity, expressiveness, evolution distance, and uniqueness of ancestry, and on using these insights for (ii) the design of error-correcting codes for DNA storage and (iii) cancer classification.