Brent Butler

Data Scientist · (719) 588-1439 · butlerbt.mg@gmail.com

Welcome to my portfolio! I have always loved the surprising insights data can help us see. It can help you question your assumptions or affirm your uncertainties. Data has the potential to tell a story better than a book, or capture a moment in time better than a photo. It is my mission to help further those possibilities. I have a particular affinity for geographic and spatial data, because I am continually interested in the intersection of people and place. Below you will find my personal data science projects. Please reach out with any questions, comments, or recommendations. I'd love to talk about anything data-related!


Projects

FastMap.ai

Identifying and mapping buildings using Deep Learning

Fine-tuned a ResNet-34 neural network and used various geographic/GIS libraries to identify, segment, and label buildings in aerial imagery.

Technologies/Techniques used: Fast.ai, Solaris, GeoPandas, Google Colab, Google Cloud computing, Rasterio, QGIS, Matplotlib.

Challenges Faced: applying computer vision techniques, uncommon data type, mixed quality labeling, feature engineering.


5 year returns

Time series analysis and forecast of nationwide real estate values

Used time series models to analyze and forecast national median home values across 14,000 zip codes. Identified ideal candidate zip codes for various investment strategies.

Technologies/Techniques used: PySpark, Statsmodels time series, FB Prophet, ARMA/ARIMA/SARIMAX, Plotly.

Challenges Faced: cloud computing, hyperparameter selection/tuning, time series assumptions.


Antarctic Ice Shelf vulnerability

Forecasting ice shelf vulnerability based on backscatter data and melt days

Built a vulnerability index for Antarctic ice shelves. The index is based upon lidar backscatter data and annual melt days.

Technologies/Techniques Used: Python, ArcGIS, GDAL, NetCDF, projection transformations, resampling techniques, classification.

Challenges Faced: Obscure data type conversion (.sir), geographic projection issues, large .tiff file sizes (32 bands).


Water Pump Predictor

Applying machine learning algorithms to predict water pumps' functional state across Tanzania

After rigorous data cleaning and feature engineering, we were able to obtain reliable cross-validation predictions using models such as XGBoost, Random Forest, and KNN. With these reliable predictions, limited resources could be distributed with more precision to ensure drinking water access and/or to reduce the burdensome cost of pump monitoring.

Technologies/Techniques used: scikit-learn, XGBoost, KNN, Decision Tree, Random Forest, k-fold CV, hyperparameter tuning, precision, recall, AUC.

Challenges Faced: computation limitations, feature engineering and cleaning, interpretability, class imbalance.


King County Housing

Real Estate Sale Price Inference

My team and I trained an interpretable multiple linear regression model describing home sale prices in 2018 from messy real-world public data.

Technologies/Techniques Used: Python, Pandas, PostgreSQL, scikit-learn, Statsmodels, hypothesis testing, A/B testing, one-hot encoding, normalization, scaling, feature transformation, stepwise feature selection.

Challenges Faced: feature selection, non-normal and nonlinear features, multicollinearity, model fitting, group management.


Opportunity Youth

Exploratory Data Analysis Using Census Data

Coalesced multiple public data sources to explore and paint a picture of the changing demographics amongst "Opportunity Youth" in King County.

Technologies/Techniques used: Python, PostgreSQL, Matplotlib, Leaflet, QGIS, joins and window functions, data visualization, non-technical presentation.

Challenges Faced: vague business question, operating under ambiguity, defining methods, linking geographic and non-geographic data.


Stratiform Mountain Guides 2019 Business Analysis

Exploratory Data Analysis of basic business metrics

I practiced basic EDA to create a report that communicates a broad picture of a small data set: a database tracking the work of my previous business. The final product is an HTML display of the descriptive statistics for each variable in the data set. I used Python, Pandas, Pandas Profiling, Matplotlib, HTML, and CSS for this project.