Matthew M. Casey

About Me

I am a first-year PhD student at Northwestern University in the Computer Science Theory group, advised by Edith Elkind.

I am interested in computational social choice, with a focus on multiwinner voting and participatory budgeting.

Before joining Northwestern, I received my BS in Computer Science and Mathematics from Northeastern University, where I worked on scheduling algorithms under the mentorship of Rajmohan Rajaraman.

Research Interests

Computational Social Choice
Multiwinner Voting
Representation and Fairness in Voting

Contact

Address

2233 Tech Drive
Mudd Room 3016
Evanston, IL 60208-3109

LinkedIn

https://www.linkedin.com/in/matthewmcasey/

Google Scholar

https://scholar.google.com/citations?user=ZESDuX0AAAAJ&hl=en

mattcasey@u.northwestern.edu

Publications

Scheduling Splittable Jobs on Configurable Machines
Conference Paper
Publisher: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)
Date: 2024
Authors:
Matthew Casey, Rajmohan Rajaraman, David Stalfa, Cheng Tan
Analysis of Viability of TCGA and GTEx Gene Expression for Gleason Grade Identification
Conference Paper
Publisher: Artificial Intelligence in Medicine
Date: 2020
Authors:
Matthew Casey, Nianjun Zhou
Description:
Gleason grade is a critical indicator for determining patient treatment for prostate cancer. In this paper, we analyze the viability of RNA sequencing gene expression data for Gleason grade identification. We combine datasets from the TCGA (sampled from cancer patients) and GTEx (sampled from healthy patients) databases. Using mutual information techniques, we reduce the dimensionality from 19046 genes to only the 20 most predictive genes. Then, we apply an unsupervised approach to analyze the separability of the grades of cancer. We use the t-SNE algorithm to map features into two dimensions and apply a Gaussian Mixture Model (GMM) for clustering. The result shows a clear visual separability between cancer and healthy samples. However, the grades of cancer themselves are not visually separable. Also, we apply the Mann-Whitney U test to compare the statistical similarity of the different Gleason grades and find that most grades are similar to each other. We further apply a random forest model to estimate the Gleason grade. The results show that the model accurately predicts whether a sample comes from healthy or cancer tissue. However, the model is weak in classifying the Gleason grade. The best performing model has a weighted macro-averaged F1 score of 0.66, improving on a baseline score of 0.22 obtained by random guessing. Our results indicate that the difference in gene expression among Gleason grades is relatively small compared to the difference between healthy and cancer samples. Thus, gene expression alone cannot be used for Gleason grade identification.
A Machine Learning Approach to Prostate Cancer Risk Classification Through Use of RNA Sequencing Data
Conference Paper
Publisher: Big Data – BigData 2019
Date: 2019
Authors:
Matthew Casey, Baldwin Chen, Jonathan Zhou, Nianjun Zhou
Description:
Advancements in RNA sequencing technology have made genomic data acquired during sequencing more precise, making models fitted to sequencing data more practical. Previous studies conducted regarding prostate cancer diagnosis have been limited to microarray data, with limited successes. We utilized The Cancer Genome Atlas' (TCGA) prostate cancer sequencing data to test the viability of fitting machine learning models to RNA sequencing data. A major challenge associated with the sequencing data is its high dimensionality. In this research, we addressed two complementary tasks. The first was to identify genes most associated with potential cancer. We started by using the mutual information metric to identify the most significant genes. Furthermore, we applied the Recursive Feature Elimination (RFE) algorithm to reduce the number of genes needed to identify cancer. The second task was to create a classification model to separate potential high-risk patients from the healthy ones. For the second task, we combated the high dimensionality challenge with Principal Component Analysis (PCA). In addition to high dimensionality, another challenge is the imbalanced data set that has a 10:1 class imbalance of cancerous and healthy tissue respectively. To combat this problem, we used the Synthetic Minority Oversampling Technique (SMOTE) to create synthetic observations and equalize the class distribution. We trained and tested a logistic regression model using 5-fold cross-validation. The results were promising, significantly reducing the false negative rate as compared to current diagnostic techniques while still keeping the false positive rate low. The model showed great improvements over previous machine learning attempts to diagnose prostate cancer. Our model could be applied as part of the patient diagnosis pipeline, helping to improve accuracy.

Work Experiences

Algorithms Research Co-op
from: 2023, until: 2023
Organization: Northeastern University
Location: Boston, Massachusetts, United States
Python Software Engineering Co-op
from: 2022, until: 2022
Organization: MORSE Corp
Location: Cambridge, Massachusetts, United States

Education

PhD
from: 2024, until: present
Field of study: Computer Science
School: Northwestern University
Bachelor of Science - BS
from: 2020, until: 2024
Field of study: Computer Science and Mathematics
School: Northeastern University
