Thomas Charlon received his PhD in Computer Science from the University of Geneva, Switzerland (2019), while employed as a Bioinformatician at Precision for Medicine, Quartz Bio, where he performed unsupervised clustering of genome-wide data in systemic autoimmune diseases. He then independently researched withheld content on social networks and pursued entrepreneurial projects, such as a real-estate price estimation web app.

As a Research Associate with Prof. Tianxi Cai, he applies NLP to texts related to mental health and suicide prevention, such as scientific publications and electronic health records, to help psychiatrists identify at-risk patients for the Center for Suicide Research Prevention project. He also develops standardized codebooks used by many other labs (MGB, CHA, Duke, Pittsburgh), creates web apps to support and facilitate the dissemination of results, and helps set up reproducible analysis processes.

Presentations

23x

Agents for Healthcare Chart Review

LLMs open new perspectives in the analysis of narrative text from patient-level hospital data. I will show how to carefully integrate them into medically sound pipelines to perform tasks such as identifying disease activity, comorbidities, and risk factors, which were traditionally done manually and can now be scaled by orders of magnitude. I will highlight the usual cornerstone steps in these pipelines, distinguish reasoning-heavy from NLP-heavy applications, and showcase the key differences to consider among public pre-trained models such as Llama and GPT-OSS and open-source software such as Ollama and vLLM.
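As a rough illustration of one such pipeline step, the sketch below wraps an LLM call behind a prompt template so the same chart-review logic can be reused across served models (e.g. via Ollama or vLLM). This is a minimal, hypothetical sketch: the `review_note` function, the prompt wording, and the stub model are illustrative assumptions, not the talk's actual pipeline.

```python
# Hypothetical chart-review step: ask a model whether a clinical
# concept (comorbidity, risk factor, ...) is mentioned in a note.
# The `llm` argument is any callable taking a prompt and returning
# text, so a real client (Ollama, vLLM, an API) can be swapped in.

PROMPT = (
    "You are assisting a clinical chart review.\n"
    "Note: {note}\n"
    "Question: does the note mention {concept}? Answer yes or no."
)

def review_note(note: str, concept: str, llm) -> bool:
    """Return True if the model answers that the concept is present."""
    answer = llm(PROMPT.format(note=note, concept=concept))
    return answer.strip().lower().startswith("yes")

# Stub model for demonstration only; it fakes an answer by keyword
# matching, where a real pipeline would query a served LLM.
def stub_llm(prompt: str) -> str:
    return "yes" if "diabetes" in prompt.lower() else "no"

print(review_note("Patient has type 2 diabetes, well controlled.",
                  "diabetes", stub_llm))   # → True
```

Keeping the model behind a plain callable is what makes it cheap to compare Llama-family and GPT-OSS models on the same task.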

See Presentation
22x

The best of both worlds: building R / Python pipelines for biomedical LLM semantic search apps

At the CELEHS laboratory we are particularly interested in LLM-based embeddings such as BGE and BERT. As the number of models increases, we need methods to compare their clinical usefulness. While some R packages exist to leverage GPU capabilities, PyTorch is far more widely used for GPU computation. In contrast, R is efficient for data management and visualization. How should one build robust and reproducible pipelines incorporating both? My answer: well-designed pipelines with Docker, Makefile, and Elasticsearch. In this talk I will showcase my design approaches to such challenges.
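The core of such a comparison can be sketched in a few lines: each embedding model maps text to a vector, and clinical usefulness can be probed by how well similarity rankings match expert judgment. The pure-Python cosine ranking below is an assumed, minimal stand-in; a real pipeline would compute embeddings with PyTorch on GPU and serve the vectors from Elasticsearch.

```python
# Minimal semantic-search sketch: rank documents by cosine
# similarity of their embedding vectors to a query vector.
# The toy 2-D vectors are placeholders for real model embeddings.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: -scores[i])

docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
print(rank([1.0, 0.1], docs))   # → [0, 1, 2]
```

Running the same ranking under two embedding models and comparing the orderings against clinician-labeled pairs is one simple way to quantify which model is more clinically useful.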

See Presentation