Ben for short. Experienced in end-to-end data science & engineering using Python, but I consider myself a generalist. Deeply excited about autonomy and cryptocurrency (disjointly). Standing up out of chairs whenever possible. Keeping a beginner's mind as best I can.

Most of these are technical projects whose code is stored on my own GitHub account. For all of them, I did all or a significant portion of the work.

Featuretools

Feature engineering in Python

Contributors: Many; see the Contributors tab of the GitHub repository.

Date: 2016-2018

Provides a language to express complex, composable features on top of relational time-varying data, an efficient multithreaded engine to compute these features at particular points in time (scalable to clusters using Dask or PySpark), and algorithms for automatically generating the list of features (e.g. Deep Feature Synthesis). I was the first user, a lead developer since 2016, and helped in its transition from closed to open source in 2017.
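The idea of computing features "at particular points in time" can be shown without the library. For each (entity, cutoff time) pair, aggregate only the child rows whose timestamps fall before the cutoff, so no future information leaks into the feature. The sketch below is plain Python, not the Featuretools API. The `transactions` table and the `count_and_sum_before` helper are hypothetical, and the feature names only mimic Featuretools' naming style.

```python
from datetime import datetime

# Hypothetical child table rows: (customer_id, timestamp, amount)
transactions = [
    ("a", datetime(2018, 1, 1), 10.0),
    ("a", datetime(2018, 1, 5), 25.0),
    ("a", datetime(2018, 2, 1), 40.0),
    ("b", datetime(2018, 1, 3), 5.0),
]


def count_and_sum_before(rows, entity_id, cutoff):
    """Aggregate one entity's rows using only data strictly before `cutoff`."""
    amounts = [amt for eid, ts, amt in rows if eid == entity_id and ts < cutoff]
    return {
        "COUNT(transactions)": len(amounts),
        "SUM(transactions.amount)": sum(amounts),
    }


# Each (entity, cutoff) pair gets its own feature vector, computed "as of" that time.
cutoff_times = [
    ("a", datetime(2018, 1, 10)),
    ("a", datetime(2018, 3, 1)),
    ("b", datetime(2018, 1, 2)),
]
feature_matrix = [
    (eid, cutoff, count_and_sum_before(transactions, eid, cutoff))
    for eid, cutoff in cutoff_times
]
for row in feature_matrix:
    print(row)
```

The same customer gets different feature values at different cutoffs. That is what lets a model trained on historical cutoffs avoid seeing data from after the moment it is supposed to predict.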

Recognition: Most popular feature engineering library on GitHub, with over 2,400 stars as of November 2018. Has its own Stack Overflow tag.

Note: this clickable card redirects to https://github.com/Featuretools/featuretools/commits?author=bschreck.

AI Project Manager

Deployed Predictive Analytics for Large Software Projects (collab. with Accenture)

Contributors: Kalyan Veeramachaneni (MIT & Feature Labs), Sarvesh Damle (Accenture), Rajendra Prasad (Accenture), Shankar Mallapur (Accenture)

Date: 2015-2018

  • Multi-year effort to deploy a system at Accenture that augments project managers by predicting the degree to which projects' future metrics will be on target.
  • Developed a deployment-focused machine learning workflow that carried over to many other projects at Feature Labs.
  • Successfully deployed; continually tested and periodically updated by Accenture engineers since September 2018.
  • Transferred the necessary machine learning concepts and skills to novice engineers at Accenture Bangalore.
  • One of the first machine learning projects actually used internally at Accenture.

Recognition: Published in IEEE Big Data 2018. Deployed at Accenture.

Note: this clickable card redirects to https://www.featuretools.com/wp-content/uploads/2018/03/AIPM.pdf.

DARPA D3M TA2 System

Contributors: Members of Feature Labs and the MIT Data-to-AI Lab, along with Carles Sala.

Date: 2017-2018

D3M (Data-Driven Discovery of Models) is a multiyear 25+MM project/competition. TA2 systems automatically solve image, audio, text, tabular, relational, time-series, graph, recommendation, and other data science problems. They also interface with TA1 components (data science primitives) and TA3 systems (user interfaces). Our entry used open-source tools we developed, such as MLBlocks (automatic pipeline generation), Featuretools (feature engineering system), and BTB (copula-based hyperparameter optimizer). Our system consistently handled all dataset types. I contributed heavily to our handling of image, text, relational, time-series, graph, and recommendation problems, and to integration with TA1 primitives. Note: the system is not public; the link redirects to the D3M GitLab page.

Recognition: Consistently placed in the top two at integration events (depending on the metric) while I was a member.

Note: this clickable card redirects to https://gitlab.com/datadrivendiscovery.

DARPA D3M TA1 Primitives

Contributors: Members of Feature Labs and the MIT Data-to-AI Lab, along with Carles Sala.

Date: 2017-2018

D3M (Data-Driven Discovery of Models) is a multiyear 25+MM project/competition. TA1 primitives are individual data science components that can be accessed by TA2 and TA3 systems. I was the lead developer of our TA1 entries, including a primitive for applying Featuretools and Random Forest-based feature selection. I also helped debug JPL's scikit-learn primitives.

Note: this clickable card redirects to https://github.com/Featuretools/ta1-primitives.

Lil Neuron

NLP Lyric Generation

Contributors:

Date: 2016-2017

Recurrent neural networks trained on rap lyrics. Used a novel method incorporating word pronunciations to embed rhyme schemes. See the Blog tab for more details.
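One common way to turn pronunciations into rhyme information is to compare words from their last stressed vowel onward. This uses ARPAbet phonemes as in the CMU Pronouncing Dictionary, where vowels carry a stress digit (0, 1 or 2). The sketch below is a hypothetical illustration of that general idea, not the project's own method. The `PRONUNCIATIONS` entries are hand-written for the example, and `rhyme_part` and `rhyme_scheme` are invented helper names.

```python
# Hand-written ARPAbet-style pronunciations (hypothetical sample in CMUdict-like format).
PRONUNCIATIONS = {
    "flow": ["F", "L", "OW1"],
    "go": ["G", "OW1"],
    "beat": ["B", "IY1", "T"],
    "street": ["S", "T", "R", "IY1", "T"],
    "rhyme": ["R", "AY1", "M"],
    "time": ["T", "AY1", "M"],
}


def rhyme_part(word):
    """Return the phonemes from the last primary- or secondary-stressed vowel to the end."""
    phones = PRONUNCIATIONS[word]
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":  # vowels end in a stress digit; 1 and 2 mark stress
            return tuple(phones[i:])
    return tuple(phones)  # no stressed vowel found: fall back to the whole word


def rhyme_scheme(line_end_words):
    """Label line-ending words so that rhyming lines share a letter (e.g. AABB)."""
    labels, seen = [], {}
    for word in line_end_words:
        key = rhyme_part(word)
        if key not in seen:
            seen[key] = chr(ord("A") + len(seen))
        labels.append(seen[key])
    return "".join(labels)


print(rhyme_scheme(["flow", "go", "beat", "street"]))   # AABB
print(rhyme_scheme(["rhyme", "beat", "time", "street"]))  # ABAB
```

A rhyme key like `("IY1", "T")` could then be mapped to an index and fed to a model alongside the words. How the project actually encoded rhyme is not shown here.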

Recognition: AI Grant Finalist

Note: this clickable card redirects to https://github.com/bschreck/lil-neuron.

DLDB

Deep Learning for Databases

Contributors: Kalyan Veeramachaneni

Date: 2018

High-level API to apply deep learning to tabular data. Incorporates cutoff times to prevent label leakage. Automatically embeds sparse categorical variables into a dense numeric representation. Parameterized architecture allows interoperability with Keras. Several demos are available.
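Embedding a sparse categorical variable usually takes two steps. First, map each category to an integer index, reserving one index for unseen or rare values. Second, look that index up in a dense matrix that the model learns. The sketch below shows only those two steps with a randomly initialized matrix in plain Python. It is not DLDB's API: `build_vocab`, `embed`, the `min_count` cutoff and the dimension of 4 are hypothetical choices. In a Keras model, the matrix would typically be a trainable `Embedding` layer.

```python
import random


def build_vocab(values, min_count=1):
    """Map each category seen at least `min_count` times to an index; 0 is reserved for unknowns."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    vocab = {"<UNK>": 0}
    for v in sorted(counts):  # sorted so indices are deterministic
        if counts[v] >= min_count:
            vocab[v] = len(vocab)
    return vocab


def embed(values, vocab, matrix):
    """Replace each category with its dense row from `matrix` (unknowns map to row 0)."""
    return [matrix[vocab.get(v, 0)] for v in values]


rng = random.Random(0)
train_values = ["visa", "amex", "visa", "mastercard", "visa"]
vocab = build_vocab(train_values, min_count=2)  # rare categories fall back to <UNK>
dim = 4
# One dense row per index; in a real model these weights would be learned, not random.
matrix = [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in vocab]

vectors = embed(["visa", "discover"], vocab, matrix)
print(vocab)            # {'<UNK>': 0, 'visa': 1}
print(len(vectors[0]))  # 4
```

Reserving index 0 means a category never seen in training, like `"discover"` here, still gets a valid vector instead of raising an error at prediction time.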

Note: this clickable card redirects to https://github.com/Featuretools/DL-DB.

Personal Food Computer

Hydroponic vegetable & herb growing system

Contributors:

Date: 2018

This is inspired by the MIT Media Lab Open Agriculture Project, specifically a low-cost version discussed on an associated forum. I use several devices (such as a Raspberry Pi) to measure and control variables like temperature and humidity. Eventually I want to see how many variables I can automate and control.

Note: this clickable card redirects to https://www.media.mit.edu/posts/build-a-food-computer.

Predicting Malicious Cyber Connections

Featuretools demo for MIT Lincoln Labs.

Contributors:

Date: 2018

Applies Featuretools to a cybersecurity dataset to build features and predict malicious web traffic in advance.

Note: this clickable card redirects to https://github.com/Featuretools/predict-malicious-cyber-connections.

Trane

A Language to Express Predictive Problems (Part of Master's Thesis)

Contributors: Kalyan Veeramachaneni, Alex Nordin, Lei Xu, & Albert Carter

Date: 2016

Reminiscent of one late, great jazz saxophonist, this project allows anyone (including machines) to improvise over data. It includes a novel language to express prediction problems over datasets, and an interpreter to convert those problems into code. In my thesis, I used an early version of Featuretools in combination with scikit-learn to automatically generate and solve thousands of these problems. While I am no longer actively contributing, work is ongoing with current students in the MIT Data-to-AI Lab. Note that the GitHub link points to the most recent repository, which was heavily refactored from my original work and thus does not show my contributions directly. The published paper (IEEE DSAA 2016) describes the original work.
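Here is a toy version of a language for prediction problems. A problem is a small composition of a row filter, a column and an aggregation. An interpreter evaluates it over each entity's rows after a cutoff time to produce that entity's label. This is a hypothetical sketch for illustration only: the `PredictionProblem` class, its fields and the `AGGREGATIONS` table are not Trane's actual grammar or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]

# Hypothetical aggregation vocabulary for the toy language.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "count": lambda xs: float(len(xs)),
    "sum": lambda xs: float(sum(xs)),
    "max": lambda xs: max(xs) if xs else 0.0,
}


@dataclass
class PredictionProblem:
    """Toy problem: aggregate `column` over rows that pass `row_filter`, after the cutoff."""
    row_filter: Callable[[Row], bool]
    column: str
    aggregation: str

    def describe(self) -> str:
        return f"predict the {self.aggregation} of {self.column} over filtered future rows"

    def label(self, rows: List[Row], cutoff: float) -> float:
        """'Interpret' the problem: compute the label from rows with time >= cutoff."""
        future = [r for r in rows if r["time"] >= cutoff and self.row_filter(r)]
        return AGGREGATIONS[self.aggregation]([r[self.column] for r in future])


rows = [
    {"time": 1.0, "amount": 20.0},
    {"time": 3.0, "amount": 50.0},
    {"time": 4.0, "amount": 5.0},
]
problem = PredictionProblem(row_filter=lambda r: r["amount"] > 10, column="amount", aggregation="sum")
print(problem.describe())
print(problem.label(rows, cutoff=2.0))  # 50.0
```

In a design like this, enumerating every filter × column × aggregation combination is one way to generate many candidate problems automatically. Features would then be computed only from rows before the same cutoff.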

Recognition: Published in IEEE DSAA 2016

Note: this clickable card redirects to https://github.com/HDI-Project/Trane.

Pernican

A machine learning system to recommend meaningful prediction problems depending on the dataset and user (Part of Master's Thesis)

Contributors: Kalyan Veeramachaneni

Date: 2016

Recommender system to learn and suggest meaningful prediction problems from arbitrary datasets. I developed a system that translates Trane problems into human-readable sentences, and a UI that allows humans to rank randomly generated problems by how meaningful they find them. Pernican uses these rankings as labeled data and generates features from each dataset. Features include statistical information about the dataset, as well as semantic clusters built by applying Word2Vec-based topic models to the column names. It also generates an exhaustive list of possible prediction problems using a subset of Trane. This is nontrivial because datasets have different numbers of columns and different numbers of possible prediction problems. Finally, the system uses both implicit low-rank matrix features and explicit extracted features to rank each possible prediction problem and return an ordered list. Users can provide feedback, and the system updates its internal model. The project showed promising early results but needed more data to reach strong conclusions. No URL is provided because the project is not currently public.
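The ranking step described above combines an implicit low-rank term with an explicit feature term and updates from user feedback. The sketch below shows that general shape under stated assumptions. The score is a dot product of a user latent vector and a problem latent vector, plus a linear model over explicit features. When a user prefers one problem but the model ranks it lower, a pairwise, perceptron-style update adjusts the explicit weights. All vectors, names (`problems`, `score`, `rank`, `feedback`) and the update rule are hypothetical, not Pernican's implementation.

```python
from typing import Dict, List

# Hypothetical candidate problems: explicit features plus a latent (low-rank) vector each.
problems: Dict[str, Dict[str, List[float]]] = {
    "p1": {"explicit": [1.0, 0.0], "latent": [0.2, 0.1]},
    "p2": {"explicit": [0.0, 1.0], "latent": [0.1, 0.4]},
    "p3": {"explicit": [0.5, 0.5], "latent": [0.0, 0.0]},
}
user_latent = [1.0, 1.0]  # hypothetical low-rank factor for the current user
weights = [0.5, 0.5]      # linear weights over explicit features


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def score(pid: str) -> float:
    """Implicit (low-rank) term plus explicit (linear) term."""
    p = problems[pid]
    return dot(user_latent, p["latent"]) + dot(weights, p["explicit"])


def rank() -> List[str]:
    """Return problem ids ordered from most to least meaningful."""
    return sorted(problems, key=score, reverse=True)


def feedback(preferred: str, other: str, lr: float = 0.5) -> None:
    """Pairwise update: if the model mis-orders the pair, nudge the explicit weights."""
    if score(preferred) <= score(other):
        diff = [a - b for a, b in zip(problems[preferred]["explicit"], problems[other]["explicit"])]
        for i, d in enumerate(diff):
            weights[i] += lr * d


print(rank())         # ['p2', 'p1', 'p3']
feedback("p1", "p2")  # the user says p1 is more meaningful than p2
print(rank()[0])      # p1
```

Only the explicit weights are updated here to keep the sketch short. A fuller system would likely refit the latent factors as feedback accumulates.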


Online Matrix Prediction Implementation

Final project for MIT 6.883 Online Methods in Machine Learning

Contributors:

Date: 2016

Implemented a theoretically near-optimal online algorithm in Julia and Python to predict unknown values in a matrix according to a low local-trace norm heuristic.

Recognition: Open Source Contribution: first implementation of the algorithm

Note: this clickable card redirects to https://github.com/bschreck/near-optimal-online-matrix-prediction.

Robo-Chef

NLP Recipe Suggestions

Contributors: Nico Rakover, Ambika Krishnamachar

Date: 2015

Using an attention-based recurrent neural network, built a system that gathered recipes and associated comments from websites, isolated refinements within the comments, and placed each refinement at the referenced location in the recipe text.
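The "place the refinement at the referenced location" step can be read as an attention read-out. Score each recipe step against the comment, normalize the scores with a softmax, and pick the highest-weighted step. The sketch below uses hypothetical hand-made vectors and plain dot-product attention in place of the project's trained recurrent encoder, so it illustrates the mechanism only.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(comment_vec: List[float], step_vecs: List[List[float]]) -> List[float]:
    """Dot-product attention weights of one comment over each recipe step."""
    scores = [sum(c * s for c, s in zip(comment_vec, step)) for step in step_vecs]
    return softmax(scores)


# Hypothetical encodings; in a real system these would come from a learned RNN encoder.
steps = ["Preheat the oven.", "Whisk the eggs and sugar.", "Bake for 30 minutes."]
step_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
comment = "I baked mine for 25 minutes instead."
comment_vec = [0.1, 0.0, 2.0]

weights = attend(comment_vec, step_vecs)
best = max(range(len(steps)), key=lambda i: weights[i])
print(steps[best])  # Bake for 30 minutes.
```

The softmax weights also give a confidence signal: a comment whose weight is spread evenly across steps may not refer to any single step.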

Note: this clickable card redirects to https://github.com/bschreck/robo-chef.

Scenic-Recursion

Computer Vision Scene Detection

Contributors:

Date: 2015

Combined convolutional neural networks and recurrent neural networks to identify objects in scenes. Discovered a bug in TensorFlow's data input feeders that did not sort inputs correctly (documented in Stack Overflow questions). Note: the project is unfinished because TensorFlow lacked implementations for several gradient update steps I needed for backpropagation, and I did not have time in the final weeks of the semester to implement them.

Recognition: Open Source Contribution: TensorFlow

Note: this clickable card redirects to https://github.com/bschreck/scenic-recursion.

Gesture-controlled quadcopter using FPGA & Kinect

Final project for MIT 6.111 Digital System Lab

Contributors: Lee Gross

Date: 2014

Built a gesture-controlled, FPGA-based quadcopter system: users' hand movements were captured by a Microsoft Kinect, translated into controls by an FPGA, and sent remotely to a quadcopter. A video of the system is available.

Recognition: MIT EECS Northern Telecom/BNR Project Award for Best Undergraduate Laboratory Project

Note: this clickable card redirects to https://github.com/bschreck/gesture-drone.

MOOCdb: Data Science Foundry for MOOCs

Contributors: Sebastien Boyer, Ben Gelman, Kalyan Veeramachaneni.

Date: 2015

System to allow many different courses and educational providers to generate high-level features for their MOOC data and predict student dropout at any phase in the course.

Recognition: Published in IEEE DSAA 2015

Note: this clickable card redirects to https://github.com/MOOCdb.

Anti-Coordination of Multiple Smart Agents Around a Simple Signal

Final project for MIT 16.412j Cognitive Robotics

Contributors: Lee Gross, Matthew Susskind

Date: 2015

Implementation of the algorithm from a published paper for Arduino-based agents.

Note: this clickable card redirects to https://github.com/bschreck/ez-anti-coordination.

Multicore, multilevel cache, branch-predicting, pipelined processor

Final project for MIT 6.175 Constructive Computer Architecture

Contributors:

Date: 2014

Implemented in Bluespec on an FPGA.


Scheduling for Liquid-Handling Robot Automation

Project for MIT SuperUROP program in the Weiss Lab for Synthetic Biology

Contributors: Jonathan Babb, Felix Sun

Date: 2014

Developed a scheduling system to automate wet lab research, using operating systems principles to execute multiple tasks concurrently on liquid-handling robots. Also developed a user-facing front end for biologists.
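One OS-style way to run protocol steps concurrently across several liquid-handling robots is greedy list scheduling. Order the tasks so that every dependency comes first, then start each task on whichever robot frees up earliest, no sooner than its dependencies finish. The sketch below is a hypothetical illustration of that idea, not the system's actual scheduler. The task names, durations, and the `list_schedule` function are assumptions.

```python
import heapq
from typing import Dict, List, Tuple


def list_schedule(durations: Dict[str, int], deps: Dict[str, List[str]], n_robots: int):
    """Greedy list scheduling: in dependency order, start each task on the earliest-free robot.

    Returns {task: (robot, start, end)}. A task never starts before all of its
    dependencies have finished.
    """
    # Kahn's algorithm: a task becomes ready once all of its dependencies are ordered.
    indegree = {t: len(deps.get(t, [])) for t in durations}
    children: Dict[str, List[str]] = {t: [] for t in durations}
    for t, ds in deps.items():
        for d in ds:
            children[d].append(t)
    ready = sorted(t for t, k in indegree.items() if k == 0)
    order: List[str] = []
    while ready:
        t = ready.pop(0)
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(durations):
        raise ValueError("dependency cycle")

    robots: List[Tuple[int, int]] = [(0, r) for r in range(n_robots)]  # (free_at, robot_id)
    heapq.heapify(robots)
    schedule: Dict[str, Tuple[int, int, int]] = {}
    for t in order:
        free_at, r = heapq.heappop(robots)
        earliest = max([schedule[d][2] for d in deps.get(t, [])], default=0)
        start = max(free_at, earliest)
        end = start + durations[t]
        schedule[t] = (r, start, end)
        heapq.heappush(robots, (end, r))
    return schedule


# Hypothetical protocol: two dilutions can run in parallel once the plate is prepared.
durations = {"prep_plate": 5, "dilute_A": 3, "dilute_B": 3, "mix": 2}
deps = {"dilute_A": ["prep_plate"], "dilute_B": ["prep_plate"], "mix": ["dilute_A", "dilute_B"]}
sched = list_schedule(durations, deps, n_robots=2)
for task, (robot, start, end) in sched.items():
    print(f"{task}: robot {robot}, t={start}..{end}")
```

With two robots, the two dilutions overlap in time, so the whole protocol finishes at t=10 instead of t=13 when run one step at a time.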

Recognition: Published in IWBDA 2014

Note: this clickable card redirects to https://bitbucket.org/jbabb/biocad/src/default/.