Matthew M. Casey

Computer Science PhD Student at Northwestern University

About Me

I am a first-year PhD student in the computer science theory group at Northwestern University. I am currently interested in beyond worst-case analysis, an algorithm design paradigm that evaluates performance on inputs that resemble those seen in practice rather than on adversarial worst cases. This makes it possible to prove meaningful guarantees for problems that are hard under traditional worst-case analysis.

Before joining Northwestern, I received my BS in Computer Science and Mathematics from Northeastern University, where I worked on scheduling algorithms under the mentorship of Rajmohan Rajaraman.

Research Interests

  • Beyond Worst-Case Analysis
  • Tensor Decomposition
  • Learning Augmented Algorithms

Publications

  • Scheduling Splittable Jobs on Configurable Machines

    Conference Paper
    Publisher: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)
    Date: 2024
    Authors: Matthew Casey, Rajmohan Rajaraman, David Stalfa, Cheng Tan
  • Analysis of Viability of TCGA and GTEx Gene Expression for Gleason Grade Identification

    Conference Paper
    Publisher: Artificial Intelligence in Medicine
    Date: 2020
    Authors: Matthew Casey, Nianjun Zhou
    Description:
    Gleason grade is a critical indicator for determining patient treatment for prostate cancer. In this paper, we analyze the viability of RNA sequencing gene expression data for Gleason grade identification. We combine datasets from the TCGA (sampled from cancer patients) and GTEx (sampled from healthy patients) databases. Using mutual information techniques, we reduce the dimensionality from 19046 genes to only the 20 most predictive genes. Then, we apply an unsupervised approach to analyze the separability of the grades of cancer. We use the t-SNE algorithm to map features into two dimensions and apply a Gaussian Mixture Model (GMM) for clustering. The result shows a clear visual separability between cancer and healthy samples. However, the grades of cancer themselves are not visually separable. Also, we apply the Mann-Whitney U test to compare the statistical similarity of the different Gleason grades and find that most grades are similar to each other. We further apply a random forest model to estimate the Gleason grade. The results show that the model accurately predicts whether a sample comes from healthy or cancer tissue. However, the model is weak in classifying the Gleason grade. The best performing model has a weighted macro-averaged F1 score of 0.66, improving on a baseline score of 0.22 obtained by random guessing. Our results indicate that the difference in gene expression among Gleason grades is relatively small compared to the difference between healthy and cancer samples. Thus, gene expression alone cannot be used for Gleason grade identification.
  • A Machine Learning Approach to Prostate Cancer Risk Classification Through Use of RNA Sequencing Data

    Conference Paper
    Publisher: Big Data – BigData 2019
    Date: 2019
    Authors: Matthew Casey, Baldwin Chen, Jonathan Zhou, Nianjun Zhou
    Description:
    Advancements in RNA sequencing technology have made genomic data acquired during sequencing more precise, making models fitted to sequencing data more practical. Previous studies conducted regarding prostate cancer diagnosis have been limited to microarray data, with limited successes. We utilized The Cancer Genome Atlas' (TCGA) prostate cancer sequencing data to test the viability of fitting machine learning models to RNA sequencing data. A major challenge associated with the sequencing data is its high dimensionality. In this research, we addressed two complementary tasks. The first was to identify genes most associated with potential cancer. We started by using the mutual information metric to identify the most significant genes. Furthermore, we applied the Recursive Feature Elimination (RFE) algorithm to reduce the number of genes needed to identify cancer. The second task was to create a classification model to separate potential high-risk patients from the healthy ones. For the second task, we combated the high dimensionality challenge with Principal Component Analysis (PCA). In addition to high dimensionality, another challenge is the imbalanced data set that has a 10:1 class imbalance of cancerous and healthy tissue respectively. To combat this problem, we used the Synthetic Minority Oversampling Technique (SMOTE) to create synthetic observations and equalize the class distribution. We trained and tested a logistic regression model using 5-fold cross-validation. The results were promising, significantly reducing the false negative rate as compared to current diagnostic techniques while still keeping the false positive rate low. The model showed great improvements over previous machine learning attempts to diagnose prostate cancer. Our model could be applied as part of the patient diagnosis pipeline, helping to improve accuracy.

Work Experiences

  • Algorithms Research Co-op

    from: 2023, until: 2023

    Organization: Northeastern University
    Location: Boston, Massachusetts, United States

  • Python Software Engineering Co-op

    from: 2022, until: 2022

    Organization: MORSE Corp
    Location: Cambridge, Massachusetts, United States

Education

  • PhD

    from: 2024, until: present

    Field of study: Computer Science
    School: Northwestern University

  • Bachelor of Science - BS

    from: 2020, until: 2024

    Field of study: Computer Science and Mathematics
    School: Northeastern University