Precision medicine is constrained by two major data interpretation bottlenecks: the “Millions-to-one” challenge of filtering the millions of genomic variants produced by next-generation sequencing down to a single causative variant for molecular diagnostic reporting, and the “Words-to-terms” challenge of transforming unstructured clinical language into standardized, interoperable ontology terms. We present two novel Generative AI (GenAI) frameworks that address these challenges. Both systems integrate contextual information with knowledge from curated medical databases and real-time web data. Evaluations with the open-source Kimi 2 and the closed-source Gemini-2.5-pro yielded similarly accurate results.
To address the first challenge, the first framework automates the application of ACMG/AMP guidelines for variant interpretation. Validation on 68 simulated cases, created by spiking pathogenic variants into negative VCF files, demonstrated 94% sensitivity in variant prioritization and 72% concordance in genetic disease diagnosis. The system generates comprehensive reports detailing ACMG evidence codes, phenotype profile matching, and differential diagnoses, and it achieved high concordance with human experts on internal cases. For the second challenge, the second framework maps unstructured clinical text to phenotype ontology terms and standardized values. We demonstrated its utility by rapidly processing 2,000 literature cases to build a large-scale virtual cohort for Leigh syndrome. These results highlight the substantial potential of GenAI to overcome data interpretation bottlenecks in rare disease diagnosis and research. Open-source models such as Kimi 2 performed comparably to the closed-source model, with the added advantage that they can potentially be deployed locally to strengthen patient privacy protection.
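To make the two validation metrics concrete, the following is a minimal sketch of how sensitivity in variant prioritization and diagnostic concordance could be computed from per-case results. It is an illustration under stated assumptions, not the evaluation code used in this work: the `CaseResult` record, the `top_k` cutoff, the exact-match comparison of diagnoses, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical per-case record; field names are assumptions."""
    case_id: str
    spiked_variant: str              # pathogenic variant spiked into the negative VCF
    prioritized_variants: list[str]  # variants ranked by the framework, best first
    expected_diagnosis: str          # disease associated with the spiked-in variant
    reported_diagnosis: str          # top diagnosis in the generated report


def sensitivity(results: list[CaseResult], top_k: int = 10) -> float:
    """Fraction of cases whose spiked-in variant appears in the top_k ranked variants.

    The top_k cutoff is an assumption; the abstract does not state one.
    """
    hits = sum(r.spiked_variant in r.prioritized_variants[:top_k] for r in results)
    return hits / len(results)


def diagnostic_concordance(results: list[CaseResult]) -> float:
    """Fraction of cases whose reported diagnosis exactly matches the expected one."""
    matches = sum(r.reported_diagnosis == r.expected_diagnosis for r in results)
    return matches / len(results)


# Toy data with placeholder identifiers; these are not results from the study.
toy_results = [
    CaseResult("case_1", "var_A", ["var_A", "var_X"], "disease_1", "disease_1"),
    CaseResult("case_2", "var_B", ["var_Y", "var_B"], "disease_2", "disease_9"),
    CaseResult("case_3", "var_C", ["var_Z"], "disease_3", "disease_8"),
    CaseResult("case_4", "var_D", ["var_D"], "disease_4", "disease_4"),
]

print(f"sensitivity: {sensitivity(toy_results):.2f}")
print(f"concordance: {diagnostic_concordance(toy_results):.2f}")
```

In this sketch a case counts toward sensitivity whenever the spiked-in variant is ranked within the cutoff, regardless of whether the final diagnosis is correct, which is one way the two metrics can differ on the same set of cases.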



