2.3 The input and output interfaces
To fit a custom program into our platform, nothing is more important than the input / output interfaces. The custom materials you need to provide are:
- Dataset
- Training / Validation code
- Model weights (optional for FL, required for FV)
The dataset belongs to the Co-PI, while the code and model weights belong to the PI. If you are both Co-PI and PI, please separate the files into different folders. Here are some examples. Note that you do not have the orange part initially, so let's prepare them step by step.
Example-1
CoPI
|---dataset
|   |---img1.jpg
|   |---img2.jpg
|   |---labels.csv
|---dataset.zip # created by compressing dataset
PI
|---main.py # custom training / validation code
|---requirements.txt # the packages that main.py needs, plus FLaVor
|---Dockerfile # script for building an image that contains main.py and has the packages in requirements.txt installed
Example-2
CoPI
|---data.csv
|---data.csv.tar # created by compressing data.csv
PI
|---main.py # custom training / validation code
|---requirements.txt # the packages that main.py needs, plus FLaVor
|---Dockerfile # script for building an image that contains main.py and has the packages in requirements.txt installed
Dataset
Compress the dataset into a single file (.zip, .tar or .tar.gz)
- The dataset can be in any format and may consist of multiple files, but it must be uploaded as one compressed file with the extension ".zip", ".tar" or ".tar.gz".
- The platform decompresses the file with the command that matches its extension, inside a folder called INPUT_PATH (introduced later). For instance:
  - In Example-1, after the CoPI uploads "dataset.zip", the system runs "cd $INPUT_PATH && unzip dataset.zip", so the final path is "$INPUT_PATH/dataset/*.jpg".
  - In Example-2, after the CoPI uploads "data.csv.tar", the system runs "cd $INPUT_PATH && tar -xf data.csv.tar", so the final path is "$INPUT_PATH/data.csv".
- Different decompression tools can produce different file structures. The platform extracts with Linux commands, so please confirm the decompressed file structure in advance. For example, if a file "foo.csv" is compressed into "foo.zip", then:
  - Double-clicking "foo.zip" in macOS might generate "./foo/foo.csv".
  - Running "unzip foo.zip" in Ubuntu will generate "./foo.csv".
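One way to confirm the structure before uploading is to build the archive with a script and list its entries. The sketch below uses only the Python standard library; the helper names `zip_folder` and `list_archive` are illustrative and are not part of the platform or of FLaVor.

```python
import os
import tempfile
import zipfile


def zip_folder(folder: str, archive_path: str) -> None:
    """Zip `folder` so that every entry keeps the top-level folder name."""
    parent = os.path.dirname(os.path.abspath(folder))
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to the parent, e.g. "dataset/img1.jpg".
                zf.write(full, os.path.relpath(full, parent))


def list_archive(archive_path: str) -> list:
    """Return the entry names, i.e. the paths that extraction will create."""
    with zipfile.ZipFile(archive_path) as zf:
        return sorted(zf.namelist())


if __name__ == "__main__":
    # Demo in a throwaway directory that mimics Example-1.
    with tempfile.TemporaryDirectory() as tmp:
        dataset = os.path.join(tmp, "dataset")
        os.makedirs(dataset)
        for name in ("img1.jpg", "img2.jpg", "labels.csv"):
            open(os.path.join(dataset, name), "w").close()
        archive = os.path.join(tmp, "dataset.zip")
        zip_folder(dataset, archive)
        for entry in list_archive(archive):
            print(entry)
```

If the listed entries start with "dataset/", the files should end up under "$INPUT_PATH/dataset/" after the platform unzips the archive.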
Code
1. Convert .ipynb to .py
If your code is in Jupyter Notebook format (".ipynb"), please follow the instructions below to convert it into a Python script (".py"). Although ".ipynb" is convenient for development because it stores both the input code and the output results, Python scripts, which store the input code only, are more widely used for deployment because they are lightweight and stable. Please choose one of the following methods to convert the code:
- Method 1 - Copy and paste directly
  - Create an empty text file, e.g. xxx.txt
  - Change its extension to "py", e.g. xxx.py
  - Copy the code in each cell from the ".ipynb" to the ".py" sequentially.
- Method 2 - Use commands (recommended)
pip install jupyter nbconvert
jupyter nbconvert --to script xxx.ipynb
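If installing nbconvert is not an option, the copy-and-paste approach of Method 1 can be automated with the standard library. The sketch below is not a replacement for nbconvert; the function name `notebook_to_script` is hypothetical, and it assumes the common notebook layout in which the JSON file has a top-level "cells" list.

```python
import json


def notebook_to_script(ipynb_path: str, py_path: str) -> int:
    """Copy every code cell of a notebook into a .py file, in order.

    Returns the number of code cells written.
    """
    with open(ipynb_path, encoding="utf-8") as f:
        notebook = json.load(f)

    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # "source" may be a single string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))

    with open(py_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(chunks) + "\n")
    return len(chunks)


if __name__ == "__main__":
    count = notebook_to_script("xxx.ipynb", "xxx.py")
    print(f"wrote {count} code cells to xxx.py")
```

Lines that start with "%" or "!" (IPython magics and shell escapes) are copied unchanged by this sketch and need to be removed by hand before the script is run with plain Python.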
2. Standard machine learning processes
Please ensure that your training or validation code conforms to the standard machine learning process (rearrange it first if it does not) and make sure it runs properly.
- Training code (FL):
  - Load data
  - Initialize a model
  - (Optional) Load pre-trained weights into the model
  - Train # Please make sure the training converges within a certain number of epochs
- Validation code (FV):
  - Load data
  - Initialize a model
  - Load pre-trained weights into the model
  - Validate
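The two step lists above can be sketched as a small script. This is a toy illustration only: it fits y = w * x + b with plain gradient descent so that it runs without any ML framework, and every name in it (`load_data`, `init_model`, `run_training`, `run_validation`, the CSV columns "x" and "y", the JSON weights file) is an assumption made for the example, not part of the platform or of FLaVor.

```python
import csv
import json


def load_data(csv_path):
    """Load (x, y) pairs from a CSV file with header columns "x" and "y"."""
    with open(csv_path, newline="") as f:
        return [(float(r["x"]), float(r["y"])) for r in csv.DictReader(f)]


def init_model():
    """Initialize a model; here just the two parameters of y = w * x + b."""
    return {"w": 0.0, "b": 0.0}


def load_weights(model, weights_path):
    """Load pre-trained weights (optional for FL, required for FV)."""
    with open(weights_path) as f:
        model.update(json.load(f))
    return model


def mse(model, data):
    """Mean squared error of the model on the data."""
    return sum((model["w"] * x + model["b"] - y) ** 2 for x, y in data) / len(data)


def run_training(csv_path, weights_out, pretrained=None, epochs=200, lr=0.05):
    """FL order: load data, init model, optionally load weights, train."""
    data = load_data(csv_path)
    model = init_model()
    if pretrained is not None:
        load_weights(model, pretrained)
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (model["w"] * x + model["b"] - y) * x for x, y in data) / n
        grad_b = sum(2 * (model["w"] * x + model["b"] - y) for x, y in data) / n
        model["w"] -= lr * grad_w
        model["b"] -= lr * grad_b
    with open(weights_out, "w") as f:
        json.dump(model, f)
    return model


def run_validation(csv_path, weights_path):
    """FV order: load data, init model, load weights, validate."""
    data = load_data(csv_path)
    model = load_weights(init_model(), weights_path)
    return mse(model, data)
```

Keeping each step in its own function makes it easier to see where the input and output interfaces described in the next items have to be attached.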
3. Model inputs and outputs
There is NO limitation on model inputs and outputs. The only required interface concerns the code; see the next item.
4. Input and output interfaces (IMPORTANT)
There are three things we need to do:
- Modify the code to fit the input interface
- Modify the code to fit the output interface
- Pack the modified code into a Docker image
Here we provide a library called FLaVor to complete these requirements.
For FL, please follow 3.1 An overview of FLaVor FL to complete the code interface.
For FV, please follow 4.1 An overview of FLaVor FV to complete the code interface.
Weights
- PyTorch: xxx.ckpt
  - type(model): torch.nn.Module
  - Object to save: {"state_dict": model.state_dict()}
  - Save with: torch.save
- TensorFlow: xxx.ckpt
  - type(model): tensorflow.keras.models.Model
  - Object to save: {"state_dict": {str(key): value for key, value in enumerate(model.get_weights())}}
  - Save with: pickle.dump
- XGBoost: xxx.json
  - type(model): xgboost.Booster
  - Object to save: model
  - Save with: xgboost.Booster.save_model
Note that if you have weights for multiple models, you can upload several weight files to a project and choose one for a plan.
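As an illustration of the TensorFlow entry above, the following sketch writes the {"state_dict": ...} layout with pickle.dump and reads it back. The function names are hypothetical. The code only calls `get_weights()` and `set_weights()`, which Keras models provide, so it can be tried with any object that offers these two methods. How the platform itself reads the file is not described here; the loader is included only as a round-trip check.

```python
import pickle


def save_keras_style_weights(model, ckpt_path):
    """Pickle model.get_weights() in the {"state_dict": {...}} layout."""
    state_dict = {str(key): value for key, value in enumerate(model.get_weights())}
    with open(ckpt_path, "wb") as f:
        pickle.dump({"state_dict": state_dict}, f)


def load_keras_style_weights(model, ckpt_path):
    """Restore weights written by save_keras_style_weights."""
    with open(ckpt_path, "rb") as f:
        state_dict = pickle.load(f)["state_dict"]
    # Keys are "0", "1", ...; sort them numerically so that "10" follows "9".
    ordered = [state_dict[key] for key in sorted(state_dict, key=int)]
    model.set_weights(ordered)
    return model
```

With a real Keras model, a call such as save_keras_style_weights(model, "xxx.ckpt") produces the file to upload. Since pickle can execute code when loading, only load weight files from sources you trust.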