Setting up our local development environment.
In this lesson, we'll set up the development environment that we'll be using in all of our lessons. Everything in this course will run locally on your personal laptop.
Cluster
We'll start by defining our cluster, which refers to a group of servers that come together to form one system. Our cluster will have a head node that manages the cluster and is connected to a set of worker nodes that execute workloads for us. These clusters can be fixed in size or autoscale based on our application's compute needs, which allows us to scale our workloads as needed. We'll create our cluster by defining a compute configuration and an environment.
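To make the head and worker node idea concrete, here's a minimal sketch (using Ray, which we'll properly introduce later in this lesson) of how a single laptop shows up as a one-node cluster:
import ray

ray.init()  # on a laptop, this starts a local cluster where the single node is the head node
print(ray.nodes())  # lists the cluster's nodes; locally there will be just one entry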
Environment
We'll start by defining our cluster environment which will specify the software dependencies that we'll need for our workloads.
💻 Local
Your personal laptop will need to have Python installed, and we highly recommend using Python 3.10. You can use a tool like pyenv (Mac) or pyenv-win (Windows) to easily download and switch between Python versions.
pyenv install 3.10.11 # install
pyenv global 3.10.11 # set default
Once we have our Python version, we can create a virtual environment to install our dependencies. We'll install our Python dependencies after we clone our repository from GitHub shortly.
mkdir madewithml
cd madewithml
python3 -m venv venv # create virtual environment
source venv/bin/activate # on Windows: venv\Scripts\activate
python3 -m pip install --upgrade pip setuptools wheel
Compute
Next, we'll define our compute configuration, which will specify our hardware dependencies (head and worker nodes) that we'll need for our workloads.
Your personal laptop (single machine) will act as the cluster, where one CPU will be the head node and some of the remaining CPUs will be the worker nodes (no GPUs required). All of the code in this course will work on any personal laptop, though it will be slower than executing the same workloads on a larger cluster.
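As a quick sanity check (just a sketch, not required for the course), you can see how many logical CPUs your laptop has from Python:
import os

print(os.cpu_count())  # number of logical CPUs available on this machine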
Workspaces
With our compute and environment defined, we're ready to create our cluster workspace. This is where we'll be developing our ML application on top of our compute, environment and storage.
💻 Local
Your personal laptop will need to have an integrated development environment (IDE) installed, such as VS Code. For bash commands in this course, you're welcome to use the terminal in VS Code or a separate one.
Git
With our development workspace all set up, we're ready to start developing. We'll start by following these instructions to create a repository:
- Create a new repository
- name it Made-With-ML
- Toggle Add a README file (very important as this creates a main branch)
- Scroll down and click Create repository
Now we're ready to clone the Made With ML repository's contents from GitHub inside our madewithml directory.
export GITHUB_USERNAME="YOUR_GITHUB_USERNAME" # <-- CHANGE THIS to your username
git clone https://github.com/GokuMohandas/Made-With-ML.git .
git remote set-url origin https://github.com/$GITHUB_USERNAME/Made-With-ML.git
git checkout -b dev
export PYTHONPATH=$PYTHONPATH:$PWD # so we can import modules from our scripts
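To verify that the PYTHONPATH change took effect (a quick check, assuming you launch Python from the same terminal and directory), you can confirm the repository root appears on Python's module search path:
import os
import sys

print(os.getcwd() in sys.path)  # should print True so we can import modules from our scripts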
💻 Local
Recall that we created our virtual environment earlier but haven't installed any Python dependencies yet. Now that the repository is cloned, we can install the packages from the requirements.txt file.
python3 -m pip install -r requirements.txt
Caution: make sure that we're installing our Python packages inside our virtual environment.
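One quick way to confirm you're inside the virtual environment (a sketch, assuming the venv created earlier) is to check which interpreter Python is using:
import sys

print(sys.prefix)  # should point to the madewithml/venv directory, not the system Python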
Notebook
Now we're ready to launch our Jupyter notebook to interactively develop our ML application.
💻 Local
We already installed jupyter through our requirements.txt file in the previous step, so we can just launch it.
jupyter lab notebooks/madewithml.ipynb
Ray
We'll be using Ray to scale and productionize our ML application. Ray consists of a core distributed runtime along with libraries for scaling ML workloads, and it's used by companies like OpenAI, Spotify, Netflix, Instacart, Doordash, and many more to develop their ML applications. We're going to start by initializing Ray inside our notebooks:
import ray

# Initialize Ray
if ray.is_initialized():
    ray.shutdown()
ray.init()
We can also check our cluster's available compute resources:
ray.cluster_resources()
💻 Local
If you are running this on a local laptop (no GPU), use the CPU count from ray.cluster_resources() to set your resources. For example, if your machine has 10 CPUs, you'll see something like:
{'CPU': 10.0,
'object_store_memory': 2147483648.0,
'node:127.0.0.1': 1.0}
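If you'd like to leave some CPUs free for other processes on your laptop, you can cap what Ray uses when initializing (an optional sketch; the number below is just an example):
import ray

if ray.is_initialized():
    ray.shutdown()
ray.init(num_cpus=8)  # e.g. reserve 2 of the 10 CPUs for other work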
Head on over to the next lesson, where we'll motivate the specific application that we're trying to build from a product and systems design perspective. And after that, we're ready to start developing!