Data Scientist
SIFEI HAN
Phone: - | Email: -
Github: www.github.com/sifei
LinkedIn: www.linkedin.com/in/sifei-han
Address: 2025 Oakford St., Philadelphia, PA, 19146
Objective
Data scientist with 7+ years of experience in academia and 4 years of experience in healthcare. Developed machine learning models to solve real-world challenges. Seeking to apply these skills to build scalable solutions. Skilled in machine learning, deep learning, natural language processing (NLP), large language models (LLMs), and general problem solving.
Skills
Programming Languages
Python, R, C++, Java, Perl
Web Development
R Shiny, React, HTML, JSON, SQL, MySQL, SharePoint, Drupal, CSS, AWS, Postgres, SSL
Machine Learning
SVM, Logistic Regression, Random Forests, KNN, Decision Trees, Naïve Bayes, scikit-learn
Large Language Models
LLaMA, Mistral, Mixtral
Deep Learning
CNN, RNN/LSTM, Hugging Face, Keras, TensorFlow, PyTorch, Transformers, Theano, BERT
Other Tools/Frameworks
LLaMA-Factory, Accelerate, Flask (Python), NLTK, spaCy, NumPy, SciPy, Pandas, Gensim, Scrapy, MetaMap
Experience
Research Postdoctoral Fellow/Data Scientist – Children’s Hospital of Philadelphia – Philadelphia, PA – Apr. 2020 – Present
Applied and fine-tuned large language models (LLMs) for advanced natural language processing tasks in healthcare, including bruise detection, Social Determinants of Health (SDOH) extraction, and Research Domain Criteria (RDoC) extraction from clinical notes.
Created the Wake Up Safe (WUS) Event Report and Analysis platform at www.wusreport.com, enabling pediatric anesthesiologists nationwide to submit incidents and conduct case analyses.
Developed an Expertise Knowledge Platform (EKP) that infers expertise from external and internal resources so that CHOP faculty and staff can navigate and find the expertise they need.
Clinical experimental design: Developed a Data Query Platform (DQP) tool that lets anesthesiologists query initial cohorts for their research.
Manuscript writing: 4 published journal papers, 2 under review.
Graduate Research Assistant – University of Kentucky – Lexington, KY – Aug. 2012 – Dec. 2019
Developed an ensemble model with under-sampling and co-training approaches to triage forum posters in critical need of immediate help, based on their narratives in mental health forums.
Population health: Developed an attention-based CNN model to detect adverse drug reactions and medication intake from tweets.
Population health: Developed a Deep&Wide model to identify whether a Twitter user uses JUUL (an e-cigarette brand).
Population health: Applied topic modeling for demographic and thematic analysis of social media data on electronic cigarettes.
Clinical decision support: Used NER and MetaMap to extract ICD-9 codes from EMRs.
Software Engineer – Office of Sponsored Projects Administration, University of Kentucky – Lexington, KY – May 2016 – Aug. 2016
Developed a grant proposal abstract mining tool that helps University of Kentucky faculty and researchers find potential collaborators and helps the Office of the Vice President for Research identify influential research drivers (researchers or departments).
Developed a proposal summarization tool using topic modeling to help the Office of the Vice President for Research track how research topics change over time and improve budget planning for future fiscal years.
Education
Ph.D. in Computer Science – University of Kentucky – Lexington, KY – Aug. 2012 – Dec. 2019
Published two (2) journal papers, three (3) conference papers, and one (1) workshop paper.
B.S. in Computer Science – University of Kentucky – Lexington, KY – Aug. 2008 – May 2012
B.S. in Mathematical Economics – University of Kentucky – Lexington, KY – Aug. 2008 – May 2012
Professional Activities
Training
I-Corps: NSF entrepreneurship training program
Reviewer
IEEE Access
ACM Transactions on Computing for Healthcare
AMIA Clinical Informatics Conference (2022-Present)
American Medical Informatics Association Annual Symposium (2016-Present)
9th International Conference on Advanced Computer Information Technologies (2019)
International Conference on Information Systems (2019)
Member – Omicron Delta Kappa (ODK) national leadership honor society
Mentorship
Penn Undergraduate Research Mentoring Program (PURM), 2024
Peer-Reviewed Journal Publications
Automated Matchmaking of Researcher Biosketches and Funder Requests for Proposals using Deep Neural Networks
S. Han, R. Richie, L. Shi, F.R. Tsui
IEEE Access, 2024 (Impact Factor: 3.9)
Extracting social determinants of health events with transformer-based multitask, multilabel named entity recognition
R. Richie, V.M. Ruiz, S. Han, L. Shi, F.R. Tsui
Journal of the American Medical Informatics Association (JAMIA), 2023 (Impact Factor: 7.942)
Building siamese attention-augmented recurrent convolutional neural networks for document similarity scoring.
S. Han, L. Shi, R. Richie, F.R. Tsui
Information Sciences, 2022 (Impact Factor: 8.233)
Classifying Social Determinants of Health from Unstructured Electronic Health Records Using Deep Learning-based Natural Language Processing
S. Han, R.F. Zhang, L. Shi, R. Richie, H. Liu, A. Tseng, W. Quan, N. Ryan, D. Brent, F.R. Tsui
Journal of Biomedical Informatics, 2022 (Impact Factor: 8)
Analytical validation of GMEX rapid point-of-care CYP2C19 genotyping system for the CHANCE-2 trial
X. Meng, A. Wang, G. Zhang, S. Niu, W. Li, S. Han, F. Fang, X. Zhao, K. Dong, Z. Jin, H. Zheng
Stroke and Vascular Neurology (SVN), 2021 (Impact Factor: 9.893)
Data and systems for medication-related text classification and concept normalization from Twitter: Insights from the Social Media Mining for Health (SMM4H) 2017 shared task
A. Sarker, M. Belousov, J. Friedrichs, K. Hakala, S. Kiritchenko, F. Mehryary, S. Han, T. Tran, A. Rios, R. Kavuluru, B. de Bruijn, F. Ginter, D. Mahata, S. M. Mohammad, G. Nenadic, G. Gonzalez-Hernandez
Journal of the American Medical Informatics Association (JAMIA), 2018 (Impact Factor: 7.942)
On the popularity of the USB flash drive-shaped electronic cigarette Juul
R. Kavuluru, S. Han, E. Hahn
Tobacco Control, 2018 (Impact Factor: 6.953)
Peer-Reviewed Conference Publications
B. Li, S. Han, L. Shi, L. Wu, F. Tsui. “The Extract-Transform-Load Lessons for Loading Neonatal Healthcare Data to the OMOP-CDM.” 2020 OHDSI Global Symposium, 2020.
S. Han, T. Tran, A. Rios, R. Kavuluru. “Team UKNLP: Detecting ADRs, Classifying Medication Intake Messages, and Normalizing ADR Mentions on Twitter.” In Proceedings of the 2nd Social Media Mining for Health Applications Workshop and Shared Task at AMIA, 2017.
S. Han and R. Kavuluru. “Exploratory analysis of marketing and non-marketing e-cigarette themes on Twitter.” International Conference on Social Informatics. Springer International Publishing, 2016.
S. Han and R. Kavuluru. “On assessing the sentiment of general tweets.” Canadian Conference on Artificial Intelligence. Springer, Cham, 2015.
R. Kavuluru, S. Han, and D. Harris. “Unsupervised extraction of diagnosis codes from EMRs using knowledge-based and extractive text summarization techniques.” Canadian Conference on Artificial Intelligence. Springer, Berlin, Heidelberg, 2013.