Ben for short. Experienced in end-to-end data science & engineering using Python, but I consider myself a generalist. Deeply excited about autonomy and cryptocurrency (disjointly). Standing up out of chairs whenever possible. Keeping a beginner's mind as best I can.

These are projects whose code is not on my own GitHub account.

Projects stored on my GitHub are listed in the "Projects" tab.

Note: I have included projects for which I contributed issues, even though I may not have committed code into their master branches. Many times these issues led to changes in the master branch by other contributors. Furthermore, I want to showcase my usage & understanding of them.

When available, links reference either a single issue, pull request, or other contribution, or my commit history.

Featuretools

Core contributor. See Projects tab for more information.

Note: this clickable card redirects to https://github.com/Featuretools/featuretools/commits?author=bschreck.

Dask/Distributed

Distributed computation in Python. Provided several bug fixes and feature implementations.

Note: this clickable card redirects to https://github.com/dask/distributed/commits?author=bschreck.

Dask-EC2

Automatic deployment of Dask programs to an Amazon EC2 cluster. Several commits.

Note: this clickable card redirects to https://github.com/dask/dask-ec2/commits?author=bschreck.

D3M

Central storage location for shared D3M code. Implemented features & bug fixes, posted issues in multiple repositories.

Note: this clickable card redirects to https://gitlab.com/datadrivendiscovery.

Pandas

Data processing in Python. Proposed Cython-optimized versions of expanding aggregation functions (which I had implemented locally).

Note: this clickable card redirects to https://github.com/pandas-dev/pandas/issues/12430.

TensorFlow

Neural networks in Python and C++. Soon after the initial release, I posted an issue about a bug/feature request on GitHub, and asked and answered several fundamental questions on Stack Overflow.

Note: this clickable card redirects to https://stackoverflow.com/search?q=user%3A2002890+tensorflow.

Cvxpy

Convex optimization in Python. Proposed a feature to allow matrix versions of the exponential and logarithm operators. Started implementation but did not get around to finishing it.

Note: this clickable card redirects to https://github.com/cvxgrp/cvxpy/issues/278.

Convex.jl

Convex optimization in Julia. Proposed a feature to allow matrix versions of the exponential and logarithm operators. Started implementation but did not get around to finishing it.

Note: this clickable card redirects to https://github.com/JuliaOpt/Convex.jl/issues/138.

Wikitable2CSV

Converts Wikipedia tables into CSV files. Several commits.

Note: this clickable card redirects to https://github.com/gambolputty/wikitable2csv/pull/6.

Dask/fastparquet

Python reader/writer for the Parquet file format. Found bugs which led to changes in the master branch.

Note: this clickable card redirects to https://github.com/dask/fastparquet/issues?utf8=%E2%9C%93&q=is%3Aissue+bschreck.

MLBlocks

While I didn't explicitly contribute code, I contributed ideas about architecture and the way the system learns.

Note: this clickable card redirects to https://github.com/HDI-Project/MLBlocks.

HDI Model Provenance Json Metadata

A specification for the construction of a model provenance file that keeps track of the journey from raw data to deployed model in Machine Learning 2.0 projects.

Note: this clickable card redirects to https://github.com/HDI-Project/model-provenance-json.

HDI Trane

Language to express prediction problems. Did not directly contribute code to the latest version on GitHub, but the original work was part of my master's thesis and led to a publication. See Projects for more details.

Note: this clickable card redirects to https://github.com/HDI-Project/Trane.

Predicting Malicious Cyber Connections

Applies Featuretools to a cybersecurity dataset to build features and predict malicious web traffic in advance.

Note: this clickable card redirects to https://github.com/Featuretools/predict-malicious-cyber-connections.

MOOCdb

Data Science Foundry for MOOCs. Works across educational MOOC providers to generate complex features that can be used, for instance, to predict dropout.

Note: this clickable card redirects to https://github.com/MOOCdb.