Sriharsha Mopidevi

Senior Application Developer

AI-4-AI Lab, Penn Medicine, University of Pennsylvania

Building AI systems that make healthcare research safer, faster, and more accessible.

About

I'm a software engineer specializing in AI/ML applications for healthcare at the University of Pennsylvania. I work in the AI-4-AI Lab (Artificial Intelligence for Ambulatory Care Innovation) in the Department of Biostatistics, Epidemiology, and Informatics, where I build production systems that bridge clinical research and cutting-edge artificial intelligence.

My work spans the full stack of healthcare technology: from designing privacy-preserving de-identification pipelines that process clinical video, audio, and text, to building scalable data platforms that make multimodal research data findable, accessible, and reusable. I am framework-agnostic and adapt to whatever the problem requires, whether that means Python, Node.js, Go, React, or Django.

Before Penn, I worked as a software engineer in India, building web platforms, data pipelines, and microservice architectures. I hold an MS in Computer Science from Pace University, New York.

Experience

Senior Application Developer

September 2023 - Present

University of Pennsylvania, AI-4-AI Lab · Philadelphia, PA

  • Designed and built MedVidDeID, a modular medical audio-video de-identification pipeline using Python, OpenCV, pose estimation, and ASR models. Achieved 92% de-identification accuracy and reduced processing time by 63% over manual methods.
  • Validated the pipeline and reported the results in multiple peer-reviewed publications and conference presentations.
  • Built and optimized Databricks workflows to process and query 10TB+ of multimodal data in real time, enabling concurrent experiments across multiple labs.
  • Developing a multimodal FAIR data repository and dashboard using Azure Data Lake, Next.js, Django, and MariaDB.
  • Developed HPC scripts using SLURM and LSF, reducing experiment runtime by 20%.
  • Enhanced REDCap functionality with AWS and Amazon Mechanical Turk, enabling scalable video data labeling with 350+ valid responses.
  • Led training sessions for junior developers and researchers on OpenCV, Docker, HPC workflows, and Databricks.
  • Collaborated across labs to support NIH grant proposals, securing funding for AI-driven bioinformatics projects.
  • Co-authored 6 peer-reviewed papers including publications in JBI, JAMIA, AAAI, and PSB, covering video de-identification, multimodal datasets, VQA benchmarks, and speaker role identification.
  • Built containerized ML inference pipelines with Docker for reproducible deployment of computer vision and NLP models across HPC and cloud environments.
  • Designed and implemented REST APIs with Django and FastAPI to serve multimodal clinical data to frontend dashboards and external research tools.
  • Developed automated data quality checks and validation pipelines for clinical video and audio datasets, ensuring compliance with IRB and HIPAA standards.

Software Engineer

August 2019 - January 2022

GSPAN · New Delhi, India

  • Developed a User Management System PWA using React.js, Python, and Django, increasing organizational productivity by 25%.
  • Built an interactive web app visualizing Indian election and COVID-19 data using React.js, Deck.gl, Mapbox, and React-vis, increasing website traffic 5x.
  • Migrated from monolithic architecture to microservices with Node.js and Express.js, reducing development cycles by 50%.
  • Implemented CI/CD pipelines using Jenkins and GitLab, reducing deployment errors by 70%.
  • Built a serverless architecture with Netlify and AWS Lambda, cutting operational costs by 70%.
  • Engineered fully automated ETL pipelines using Python and FastAPI for cross-organizational data analysis.

Education

MS, Computer Science

Pace University, Seidenberg School of Computer Science and Information Systems

New York, NY · January 2022 - May 2023

B.Tech, Electronics and Communications

ACE Engineering College

Hyderabad, India · Graduated May 2019

Publications & Research

MedVidDeID: Protecting Privacy in Clinical Encounter Video Recordings.

Mopidevi S, Jang KJ, Alasaly B, Pugh S, Park J, Batugo A, Hwang S, Eaton E, Mowery DL, Johnson KB.

Journal of Biomedical Informatics, 170, 2025.

Observer: Creation of a Novel Multimodal Dataset for Outpatient Care Research.

Johnson KB, Cohen DL, Alasaly B, Jang KJ, Eaton E, Mopidevi S, Koppel R.

Journal of the American Medical Informatics Association (JAMIA), 33(2), 424-433, 2026.

Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models.

Park J, Jang KJ, Alasaly B, Mopidevi S, Zolensky A, Eaton E, Lee I, Johnson K.

Proceedings of the AAAI Conference on Artificial Intelligence, 39(19), 19821-19829, 2025.

Speaker Role Identification in Clinical Conversations.

Zolensky A, Jang KJ, Sabin J, Hartzler A, Alasaly B, Mopidevi S, Liberman M, Johnson K.

Biocomputing 2026: Proceedings of the Pacific Symposium (PSB), 144-157, 2025.

Towards a Real-time Clinical Agenda Setting System for Enhancing Clinical Interactions in Primary Care Visits.

Jang KJ, Bhatti S, Pugh S, Maduno C, Sridhar S, Mopidevi S, Eaton E, Johnson K.

Workshop on LLMs and Generative AI for Health at AAAI 2025.

The Observer Repository: Advancing Ambulatory Care Innovation Through Video-Based Clinical Ethnography.

Johnson KB, Cohen DL, Alasaly B, Jang KJ, Eaton E, Mopidevi S, Koppel R.

medRxiv, 2025.

Presentations

MedVidDeID: Protecting Privacy in Clinical Encounter Video Recordings.

Mopidevi S et al.

AMIA Informatics Summit 2025, Podium Abstract, Pittsburgh, PA.

Featured Projects

MedVidDeID

An open-source medical data de-identification pipeline for video, audio, and text. Uses WhisperX, YOLOv11, PHIlter, and custom AudioScrub modules.

Python · OpenCV · PyTorch · WhisperX · YOLO · NLP

Observer Platform

A FAIR (Findable, Accessible, Interoperable, Reusable) clinical research data repository supporting observational studies in ambulatory care. Handles data collection, storage, exploration, and analysis across research teams with a multimodal, multi-database architecture built on Azure Data Lake.

Django · Next.js · React · MariaDB · Azure Data Lake · Databricks

Telegram User Scraper

29 stars

A tool to scrape Telegram user details from groups. One of my more popular open-source projects.

Python

Low Latency Live Streaming

A live streaming platform template with BigBlueButton, OvenMediaEngine, and FFMPEG-based stream conversion. Supports real-time comments and Docker Compose deployment.

Django · Docker · FFMPEG · PostgreSQL

Indian Election & COVID-19 Data Visualization

Interactive web application visualizing Indian election results and COVID-19 data with geospatial mapping.

React.js · Deck.gl · Mapbox · Data Visualization

Debian Distribution Build

Guide and tooling for building custom Debian-based Linux distributions using live-build. Create bootable ISOs with personalized kernel, packages, and configurations.

Linux · Debian · ISO

Tech Stack

Languages

Python · JavaScript · TypeScript · Go · C · SQL · Shell

Backend

Django · Flask · FastAPI · Express.js · Node.js

Frontend

React · Next.js · Vue.js

AI / ML

PyTorch · TensorFlow · Keras · scikit-learn · OpenCV · Hugging Face · OpenAI · LangChain · WhisperX · YOLO

Data

Databricks · Apache Spark · PostgreSQL · MariaDB · MongoDB · Neo4j · Azure Data Lake · REDCap

Cloud & DevOps

Azure · AWS · Docker · Kubernetes · Jenkins · CI/CD

Other

HPC (SLURM, LSF) · Git · Linux · Jupyter · VS Code

Beyond Code

When I'm not building AI systems, I'm exploring the cosmos.

Astrophysics

Currently taking astrophysics classes at Penn. Reading about black holes, dark matter, and the origins of the universe.

Cooking

Experimenting in the kitchen and trying out new recipes.

Fitness

Regular at the gym and a frequent walker on the Schuylkill River Trail.

Outdoors

Hiking, beach trips, and spending time in nature whenever the weather is right.

Sci-Fi Movies

The more thought-provoking, the better.

Let's Connect

Feel free to reach out if you want to collaborate on research, discuss healthcare AI, or just talk about the universe.