
Breast Cancer Diagnostics

    This project applies supervised machine learning to predict whether a breast tumor is benign (B) or malignant (M) using the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI Machine Learning Repository.

    Highlights:
    – Ensemble model achieved over 99% accuracy on the test set.
    – Perfect sensitivity (no malignant cases misclassified).
    – Most predictive features: Concave Points (Worst), Radius (Worst), Perimeter (Worst).

    The project demonstrates how ensemble learning methods can enhance diagnostic precision and reduce false negatives in medical classification tasks.
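As a rough illustration of the two metrics quoted above, the sketch below computes accuracy and sensitivity from predicted and true labels. The label lists are made-up values for demonstration only, not results from this project, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="M"):
    """Count TP, FN, FP, TN, treating `positive` (malignant) as the positive class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1  # a malignant case predicted as benign
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def sensitivity(y_true, y_pred, positive="M"):
    """Share of truly malignant cases that the model flags as malignant."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Share of all cases that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only (B = benign, M = malignant).
y_true = ["M", "M", "B", "B", "B", "M", "B", "B"]
y_pred = ["M", "M", "B", "M", "B", "M", "B", "B"]

print("accuracy:", accuracy(y_true, y_pred))
print("sensitivity:", sensitivity(y_true, y_pred))
```

In this toy case sensitivity is perfect because no malignant case is predicted as benign, even though one benign case is misclassified and accuracy is therefore below 100%.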

    Employment Forecasting

      Every month the U.S. Bureau of Labor Statistics publishes employment figures for major sectors of the U.S. economy. The task is to forecast the number of people employed in the financial activities sector over the next four years. The series is published on FRED, the data service of the Federal Reserve Bank of St. Louis: https://fred.stlouisfed.org/series/USFIRE.

      Four years is a long forecast horizon, so prediction intervals will be wide. I will therefore compare several forecasting models by their RMSE and Winkler scores on a held-out portion of the known time series.
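For readers unfamiliar with the two metrics, here is a minimal sketch of how they can be computed. RMSE scores the point forecasts; the Winkler score rewards narrow prediction intervals and penalizes observations that fall outside them. All numbers are made up for illustration, and the function names are hypothetical rather than taken from the project code.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error of point forecasts."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def winkler_score(actual, lower, upper, alpha):
    """Winkler score of one observation for a 100*(1 - alpha)% interval.

    The score is the interval width, plus a penalty of (2 / alpha) times
    the distance by which the observation misses the interval.
    """
    width = upper - lower
    if actual < lower:
        return width + (2 / alpha) * (lower - actual)
    if actual > upper:
        return width + (2 / alpha) * (actual - upper)
    return width


def mean_winkler(actuals, lowers, uppers, alpha):
    """Average Winkler score over the test period (lower is better)."""
    scores = [winkler_score(a, lo, hi, alpha)
              for a, lo, hi in zip(actuals, lowers, uppers)]
    return sum(scores) / len(scores)


# Made-up values for illustration only.
actual = [100.0, 102.0, 104.0]
forecast = [101.0, 102.0, 102.0]
lower = [98.0, 99.0, 100.0]    # lower bounds of 80% intervals
upper = [104.0, 105.0, 103.0]  # upper bounds of 80% intervals

print("RMSE:", rmse(actual, forecast))
print("Mean Winkler:", mean_winkler(actual, lower, upper, alpha=0.2))
```

The third observation falls above its interval, so its score is the width plus a penalty. A model with very wide intervals avoids such penalties but pays through the width term on every observation.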

      Computer Vision System: Dry Beans

        This project is a computer-vision-driven multi-class classification system. It distinguishes seven varieties of dry beans using image-derived features from the UCI Machine Learning Repository (13,611 samples, 16 features). The best-performing model was a radial-kernel support vector machine (SVM), which achieved ~93.5% accuracy on the hold-out test set, slightly above the ~93.1% benchmark reported in prior research using SVMs.
        The results suggest further gains could be possible with additional, less-correlated features (e.g., color information from raw images).

        This project applied end-to-end machine learning techniques and demonstrated how careful preprocessing, model selection, and validation can meaningfully improve real-world classification performance.

        MovieLens

          Recommendation systems have revolutionized e-commerce. Companies like Amazon sell all manner of products and actively collect customer reviews, which are stored in massive databases. Machine learning tools then use this data to give each customer tailored, highly specific product recommendations. This process enables data-driven companies to increase their pool of satisfied, returning consumers. Media companies like Apple Music and Netflix use the same process.

          This project uses a large subset of the MovieLens data, which holds approximately 10 million ratings of approximately 10,000 movies from around 70,000 users. I will evaluate several different models by the Root Mean Squared Error (RMSE) of their predicted ratings against a validation set of actual ratings. The goal is an RMSE below 0.86490, which corresponds to a typical error of less than one star on a Netflix-style rating system.
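To show how such an RMSE comparison works, the sketch below scores two simple baselines on a toy validation set: predicting the global mean rating for everything, and predicting each movie's mean training rating. The ratings are made up, and these baselines are illustrative assumptions, not necessarily the models built in this project.

```python
import math
from collections import defaultdict


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up (user_id, movie_id, rating) tuples for illustration only.
train = [(1, "A", 5.0), (2, "A", 4.0), (1, "B", 2.0), (3, "B", 3.0), (2, "C", 4.0)]
validation = [(3, "A", 4.5), (2, "B", 2.5), (1, "C", 3.0)]

# Baseline 1: predict the global mean rating for every user-movie pair.
global_mean = sum(rating for _, _, rating in train) / len(train)

# Baseline 2: predict each movie's mean rating from the training data.
rating_sums = defaultdict(float)
rating_counts = defaultdict(int)
for _, movie, rating in train:
    rating_sums[movie] += rating
    rating_counts[movie] += 1
movie_mean = {movie: rating_sums[movie] / rating_counts[movie] for movie in rating_sums}

actual = [rating for _, _, rating in validation]
pred_global = [global_mean] * len(validation)
# Fall back to the global mean for movies that never appear in training.
pred_movie = [movie_mean.get(movie, global_mean) for _, movie, _ in validation]

rmse_global = rmse(actual, pred_global)
rmse_movie = rmse(actual, pred_movie)
print("global-mean RMSE:", rmse_global)
print("movie-mean RMSE:", rmse_movie)
```

On this toy data the per-movie baseline scores a lower RMSE than the global mean, and the same comparison extends to any richer model that produces one prediction per validation rating.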