Spinal muscular atrophy (SMA) is the leading genetic cause of infant mortality, with an incidence of nearly 1 per 12,000 live births. SMA is caused by inheriting two deficient copies of the SMN1 gene, one from each parent. SMN1 encodes the survival motor neuron (SMN) protein, which helps process messenger RNA from its immature form into its mature form. A second, related gene, SMN2, is nearly identical to SMN1, but most of the protein it produces is unstable, so it can only partially compensate for the loss of SMN1. Worldwide, about 1 in every 54 people carries a deficient copy of SMN1, making carrier status relatively common.
SMA is typically associated with dysfunction of the lower motor neurons, which connect to skeletal muscles. Apart from embryonic lethality, SMA ranges from an acute infantile form with severe muscle weakness and early mortality, to an intermediate form with generalized weakness in the torso and limbs, to a mild form in which some ability to walk is retained. However, children with SMA also experience a variety of other conditions that have only recently been recognized, including endocrine and metabolic disorders and defects of the cardiovascular, gastrointestinal, reproductive, and skeletal systems.
Because the parents (and some relatives) of SMA patients have only one functional copy of the SMN1 gene, we hypothesize that they too may be at increased risk for a range of diseases. We will first use the UK Biobank (UKBB) to identify carriers of a deficient SMN1 copy, then determine which diseases are more prevalent in carriers than in a matched control population. In Aim 1, we will apply machine learning to insurance codes to find disease associations. In Aim 2, we will analyze electronic health records for common medical terms and apply machine learning to classify the diseases that are more prevalent in carriers. In Aim 3, we will use generative AI to mine electronic health records for novel disease associations. Identifying diseases that are more prevalent in SMA carriers could lead to wider adoption of SMA therapeutics for new indications.
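
The carrier-versus-control comparison underlying all three aims can be illustrated with a minimal sketch. It is not the machine learning analysis proposed in the aims. It only shows, for a single disease, how an odds ratio and a one-sided Fisher exact p-value could be computed from a 2x2 table of counts. The function names and all counts are hypothetical and chosen for illustration; they are not UKBB data.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table (hypothetical helper).

    a: carriers with the disease      b: carriers without it
    c: controls with the disease      d: controls without it
    """
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero,
    # so the ratio stays finite.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value (hypothetical helper).

    Probability of seeing at least `a` diseased carriers if disease status
    were unrelated to carrier status (hypergeometric upper tail).
    """
    n_carriers = a + b
    n_diseased = a + c
    n_total = a + b + c + d
    denom = comb(n_total, n_carriers)
    upper = min(n_carriers, n_diseased)
    tail = sum(
        comb(n_diseased, k) * comb(n_total - n_diseased, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


# Illustrative counts only (assumed, not real data):
# 1,000 carriers, 30 with the disease; 4,000 matched controls, 60 with it.
example_or = odds_ratio(30, 970, 60, 3940)
example_p = fisher_one_sided(30, 970, 60, 3940)
print(f"odds ratio = {example_or:.2f}, one-sided p = {example_p:.4g}")
```

In practice a test like this would be repeated across many diseases, so some correction for multiple comparisons would be needed before interpreting any single p-value.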