Matthew M. Casey
Computer Science PhD Student at Northwestern University
About Me
I am a first-year PhD student in the computer science theory group at Northwestern University. I am currently interested in beyond worst-case analysis, a paradigm for designing and analyzing algorithms that evaluates their performance on inputs closer to those seen in the real world. This makes it possible to prove strong algorithmic results for problems that are otherwise hard under the traditional worst-case paradigm.
Before joining Northwestern, I received my BS in Computer Science and Mathematics from Northeastern University, where I worked on scheduling algorithms under the mentorship of Rajmohan Rajaraman.
Research Interests
- Beyond Worst-Case Analysis
- Tensor Decomposition
- Learning Augmented Algorithms
Contact
Address
2233 Tech Drive
Mudd Room 3016
Evanston, IL 60208-3109
Publications
Scheduling Splittable Jobs on Configurable Machines
Conference Paper
Publisher: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)
Date: 2024
Authors: Matthew Casey, Rajmohan Rajaraman, David Stalfa, Cheng Tan
Analysis of Viability of TCGA and GTEx Gene Expression for Gleason Grade Identification
Conference Paper
Publisher: Artificial Intelligence in Medicine
Date: 2020
Authors: Matthew Casey, Nianjun Zhou
Description: Gleason grade is a critical indicator for determining treatment for prostate cancer patients. In this paper, we analyze the viability of RNA sequencing gene expression data for Gleason grade identification. We combine datasets from the TCGA (sampled from cancer patients) and GTEx (sampled from healthy patients) databases. Using mutual information techniques, we reduce the dimensionality from 19,046 genes to the 20 most predictive genes. We then apply an unsupervised approach to analyze how separable the cancer grades are: we use the t-SNE algorithm to map the features into two dimensions and apply a Gaussian Mixture Model (GMM) for clustering. The results show a clear visual separation between cancer and healthy samples, but the cancer grades themselves are not visually separable. We also apply the Mann-Whitney U test to compare the statistical similarity of the different Gleason grades and find that most grades are similar to one another. Finally, we apply a random forest model to estimate the Gleason grade. The model accurately predicts whether a sample comes from healthy or cancer tissue but is weak at classifying the Gleason grade. The best-performing model has a weighted macro-averaged F1 score of 0.66, improving on a baseline score of 0.22 obtained by random guessing. Our results indicate that the difference in gene expression among Gleason grades is relatively small compared to the difference between healthy and cancer samples. Thus, gene expression alone cannot be used for Gleason grade identification.
A Machine Learning Approach to Prostate Cancer Risk Classification Through Use of RNA Sequencing Data
Conference Paper
Publisher: Big Data – BigData 2019
Date: 2019
Authors: Matthew Casey, Baldwin Chen, Jonathan Zhou, Nianjun Zhou
Description: Advancements in RNA sequencing technology have made the genomic data acquired during sequencing more precise, making models fitted to sequencing data more practical. Previous studies of prostate cancer diagnosis have been limited to microarray data and have had limited success. We used The Cancer Genome Atlas (TCGA) prostate cancer sequencing data to test the viability of fitting machine learning models to RNA sequencing data. A major challenge associated with sequencing data is its high dimensionality. In this research, we addressed two complementary tasks. The first was to identify the genes most associated with potential cancer. We started by using the mutual information metric to identify the most significant genes, then applied the Recursive Feature Elimination (RFE) algorithm to reduce the number of genes needed to identify cancer. The second task was to build a classification model that separates potentially high-risk patients from healthy ones. For this task, we addressed the high dimensionality with Principal Component Analysis (PCA). Beyond high dimensionality, the dataset is also imbalanced, with a 10:1 ratio of cancerous to healthy tissue samples. To address this, we used the Synthetic Minority Oversampling Technique (SMOTE) to create synthetic observations and equalize the class distribution. We trained and tested a logistic regression model using 5-fold cross-validation. The results were promising: the model significantly reduced the false negative rate compared with current diagnostic techniques while keeping the false positive rate low, a substantial improvement over previous machine learning attempts to diagnose prostate cancer. Our model could be applied as part of the patient diagnosis pipeline to help improve accuracy.
Work Experience
Algorithms Research Co-op
From: 2023, until: 2023
Organization: Northeastern University
Location: Boston, Massachusetts, United States
Python Software Engineering Co-op
From: 2022, until: 2022
Organization: MORSE Corp
Location: Cambridge, Massachusetts, United States
Education
PhD
From: 2024, until: present
Field of study: Computer Science
School: Northwestern University
Bachelor of Science (BS)
From: 2020, until: 2024
Field of study: Computer Science and Mathematics
School: Northeastern University