Setting up our local development environment.
In this lesson, we'll set up the development environment that we'll be using in all of our lessons. Everything in this course will run locally on your personal laptop.
Cluster
We'll start by defining our cluster, which refers to a group of servers that come together to form one system. Our cluster will have a head node that manages the cluster and is connected to a set of worker nodes that execute workloads for us. These clusters can be fixed in size or autoscale based on our application's compute needs, which allows us to scale our workloads as needed. We'll create our cluster by defining a compute configuration and an environment.
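To make the head and worker node idea concrete, here's a minimal sketch (using Ray, which we'll properly introduce later in this lesson) of how a single laptop shows up as a one-node cluster:
import ray

ray.init()  # on a laptop, this starts a local cluster where the single node is the head node
print(ray.nodes())  # lists the cluster's nodes; locally there will be just one entry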
Environment
We'll start by defining our cluster environment which will specify the software dependencies that we'll need for our workloads.
💻 Local
Your personal laptop will need to have Python installed, and we highly recommend using Python 3.10. You can use a tool like pyenv (Mac) or pyenv-win (Windows) to easily download and switch between Python versions.
pyenv install 3.10.11 # install
pyenv global 3.10.11 # set default
Once we have our Python version, we can create a virtual environment to install our dependencies. We'll install our Python dependencies after we clone our repository from GitHub shortly.
mkdir madewithml
cd madewithml
python3 -m venv venv # create virtual environment
source venv/bin/activate # on Windows: venv\Scripts\activate
python3 -m pip install --upgrade pip setuptools wheel
Compute
Next, we'll define our compute configuration, which will specify our hardware dependencies (head and worker nodes) that we'll need for our workloads.
Your personal laptop (single machine) will act as the cluster, where one CPU will be the head node and some of the remaining CPUs will be the worker nodes (no GPUs required). All of the code in this course will work on any personal laptop, though it will be slower than executing the same workloads on a larger cluster.
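As a quick sanity check (just a sketch, not required for the course), you can see how many logical CPUs your laptop has from Python:
import os

print(os.cpu_count())  # number of logical CPUs available on this machine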
Workspaces
With our compute and environment defined, we're ready to create our cluster workspace. This is where we'll be developing our ML application on top of our compute, environment and storage.
💻 Local
Your personal laptop will need to have an integrated development environment (IDE) installed, such as VS Code. For bash commands in this course, you're welcome to use the terminal in VS Code or a separate one.
Git
With our development workspace all set up, we're ready to start developing. We'll start by following these instructions to create a repository:
- Create a new repository
- name it Made-With-ML
- Toggle Add a README file (very important as this creates a main branch)
- Scroll down and click Create repository
Now we're ready to clone the Made With ML repository's contents from GitHub inside our madewithml directory.
export GITHUB_USERNAME="YOUR_GITHUB_USERNAME" # <-- CHANGE THIS to your username
git clone https://github.com/GokuMohandas/Made-With-ML.git .
git remote set-url origin https://github.com/$GITHUB_USERNAME/Made-With-ML.git
git checkout -b dev
export PYTHONPATH=$PYTHONPATH:$PWD # so we can import modules from our scripts
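To verify that the PYTHONPATH change took effect (a quick check, assuming you launch Python from the same terminal and directory), you can confirm the repository root appears on Python's module search path:
import os
import sys

print(os.getcwd() in sys.path)  # should print True so we can import modules from our scripts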
💻 Local
Recall that we created our virtual environment earlier but haven't installed any Python dependencies yet. Now that the repository is cloned, we can install the packages from the requirements.txt file.
python3 -m pip install -r requirements.txt
Caution: make sure that we're installing our Python packages inside our virtual environment.
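One quick way to confirm you're inside the virtual environment (a sketch, assuming the venv created earlier) is to check which interpreter Python is using:
import sys

print(sys.prefix)  # should point to the madewithml/venv directory, not the system Python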
Notebook
Now we're ready to launch our Jupyter notebook to interactively develop our ML application.
💻 Local
We already installed jupyter through our requirements.txt file in the previous step, so we can just launch it.
jupyter lab notebooks/madewithml.ipynb
Ray
We'll be using Ray to scale and productionize our ML application. Ray consists of a core distributed runtime along with libraries for scaling ML workloads, and it's used by companies like OpenAI, Spotify, Netflix, Instacart, Doordash, and many more to develop their ML applications. We're going to start by initializing Ray inside our notebooks:
import ray

# Initialize Ray
if ray.is_initialized():
    ray.shutdown()
ray.init()
We can also check our cluster's available compute resources:
ray.cluster_resources()
💻 Local
If you are running this on a local laptop (no GPU), use the CPU count from ray.cluster_resources() to set your resources. For example, if your machine has 10 CPUs, you'll see something like:
{'CPU': 10.0,
'object_store_memory': 2147483648.0,
'node:127.0.0.1': 1.0}
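If you'd like to leave some CPUs free for other processes on your laptop, you can cap what Ray uses when initializing (an optional sketch; the number below is just an example):
import ray

if ray.is_initialized():
    ray.shutdown()
ray.init(num_cpus=8)  # e.g. reserve 2 of the 10 CPUs for other work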
Head on over to the next lesson, where we'll motivate the specific application that we're trying to build from a product and systems design perspective. And after that, we're ready to start developing!