Like an eager prospector rushing to pan for gold - supplies and dignity be damned - I used to dive into a data science project with just a Jupyter Notebook and a bit of drool collecting in my Civil War era beard. However, that boundless, explore-at-all-cost drive soon backfired. In short time, I'd have an unorganized notebook, duplicate code, plenty of ugly hacks, and a project that's difficult to share, reproduce, and deploy as an ML model.

Rather than using Python best practices for structuring a project, most data science projects evolve from a notebook. Starting with a notebook is logical: who knows if this will go anywhere? What I need is something that lets me focus on data science but adds just enough structure to make the project easy to share. That's why I'm happy to introduce whisk, an open-source data science project framework that makes collaboration, reproducibility, and deployment "just work".

whisk is actually kind of boring, but that's the point. It combines a data science-flavored Python project structure - the same project structure that's gone almost unchanged over a decade - with a suite of lightweight tools. It lets you set up your data science projects like a software engineer without studying to be one.

Now, why should you care about structuring your data science project like a software project? You're not a software engineer, after all. Well, when this structure is applied, it lets you do magical things in a pure, battle-hardened Python way.

In the spirit of "show, don't tell", I'm going to show how to create a whisk data science project. Then, I'll show how to bootstrap, use, distribute, and deploy an existing DS project that uses whisk.

Much like django-admin startproject creates the structure for a Django web app, whisk sets up a project directory structure and environment for you. Open a terminal and type pip install whisk. Next, use the whisk create command to create a new data science project.

The initial structure contains an end-to-end machine learning example that saves a model to disk and shows how to later load the model and generate a prediction. whisk doesn't force you to use a particular ML framework or train a model in a certain way. The structure it creates is uniformly applicable across data science projects (well, at least the ones I've worked on). You can follow the quick tour in the whisk docs to get oriented on the initial scaffolding.

Rather than walking through a trivial example, let's take a look at a project built with whisk. Real or Not: NLP with Disaster Tweets is an ML project that trains a TensorFlow-backed Keras model to predict whether a tweet is about a real disaster or not. It's painful bootstrapping most data science projects, but that's not the case with a whisk project: just run git clone and whisk setup to get started.

whisk includes DVC, an open-source version control system for machine learning projects. This project leverages DVC to store the large training data set, version control the training pipeline, and store the generated model artifacts. DVC is core to a great bootstrapping experience: there's no need to re-train the model when you initially check out the project (which can take more than 20 minutes), as DVC verifies nothing has changed in the training pipeline.

whisk projects include click, a Python package for creating beautiful command line interfaces. This is pre-configured to invoke the model; the default command name is the whisk project name.

Run the notebook

Notebooks are placed within the notebooks/ directory. As the Python package is installed locally, you can easily call functions from the project's src/ directory from a notebook. This reduces the amount of noise in a notebook and makes it easy to use shared code across multiple notebooks.

You can distribute the model as a Python package, letting anyone with Python installed on their computer use the model. A whisk project contains default setup.py and MANIFEST.in files to get you started. To build the package, type whisk package dist. This is easy, as a whisk project is structured like the Python packages you already use.

Run the web service

A Flask app is included in the app/ directory of every whisk project. It's ready to serve predictions for the project's model. To start the web service, type whisk app start. Use curl to send a request to the Flask app and generate a prediction. The Flask app autoreloads when the source code in your project changes. This makes the development and debugging cycle very fast.

The vast majority of ML models do not require a Kubernetes cluster. A whisk project is pre-configured to deploy the web service to Heroku. Kubernetes is expensive and hard to maintain - why not just deploy to Heroku like thousands of Python devs have before you?
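To make the "save a model to disk, load it later, generate a prediction" flow mentioned earlier concrete, here is a minimal, standard-library-only sketch. The KeywordModel class and the model.pkl filename are invented for illustration; whisk's actual generated example differs:

```python
import pickle
from pathlib import Path

class KeywordModel:
    """Toy stand-in for a trained model: flags tweets containing disaster keywords."""
    def __init__(self, keywords):
        self.keywords = set(keywords)

    def predict(self, text):
        # 1 = "about a real disaster", 0 = "not a disaster"
        return int(any(word in text.lower() for word in self.keywords))

# Train-and-save step (in a whisk project, this would live in the training pipeline).
model = KeywordModel(["fire", "flood", "earthquake"])
Path("model.pkl").write_bytes(pickle.dumps(model))

# Later: load the saved artifact and generate a prediction.
loaded = pickle.loads(Path("model.pkl").read_bytes())
print(loaded.predict("Forest fire near La Ronge Sask. Canada"))  # prints 1
```

The point isn't the (deliberately silly) model, it's the shape: training produces an artifact on disk, and any later process can load that artifact and predict without re-training.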
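whisk wires the model into a command line interface with click, named after the project. As a rough sketch of the same idea using only the standard library (argparse instead of click, with an invented program name and a toy predict function standing in for the real model call):

```python
import argparse

def predict(text):
    # Toy stand-in for the project's real model call (invented for illustration).
    return int("fire" in text.lower())

def build_parser():
    # In a whisk project the CLI is generated with click and the command is
    # named after the project; this argparse version only sketches the shape.
    parser = argparse.ArgumentParser(prog="disaster_tweets")
    parser.add_argument("text", help="tweet text to classify")
    return parser

args = build_parser().parse_args(["Forest fire near La Ronge Sask. Canada"])
print(predict(args.text))  # prints 1
```

In a real project the last two lines would live behind an entry point, so the command can be run from any terminal once the package is installed.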
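The post mentions hitting the Flask app with curl; here's the same idea sketched in Python with urllib. The port (5000), route (/predict), and JSON payload shape are my assumptions, not whisk's documented API - check the generated app/ code for the real endpoint:

```python
import json
import urllib.request

def request_prediction(text, url="http://localhost:5000/predict"):
    # Hypothetical client for the whisk Flask app; endpoint and payload
    # shape are assumptions, not whisk's documented API.
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.loads(resp.read())
    except OSError:
        # No server running (e.g. `whisk app start` wasn't run) -- nothing to show.
        return None

# Prints the JSON response, or None if the service isn't running.
print(request_prediction("Forest fire near La Ronge Sask. Canada"))
```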