Note: The machine learning functions are not optimized for distributed
processing. Because the final training step runs on a single instance,
the size of the data sets that can be trained is limited.
Feature vector
To solve a problem with machine learning, especially a supervised learning problem, the data set must be represented as a sequence of pairs of labels and feature vectors. A label is the target value you want to predict for an unseen feature vector, and a feature vector is an N-dimensional vector whose elements are numerical values. In Trino, a feature vector is represented as a map-type value whose keys are the indexes of the features, so that it can express a sparse vector. Because classifiers and regressors recognize the map-type feature vector, the `features` function is provided to construct a feature vector from existing numerical values. For example, the values 1.0, 2.0, and 3.0 produce:
Features |
---|
{0=1.0, 1=2.0, 2=3.0} |
The output of `features` can be passed directly to ML functions.
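As a minimal sketch, the map shown above can be produced with a query like the following. The column alias `features` is chosen only to match the table header.

```sql
-- Build a feature vector from three numeric values.
-- Keys are zero-based feature indexes; values are the feature values.
SELECT features(1.0, 2.0, 3.0) AS features;
```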
Classification
Classification is a type of supervised learning problem that predicts a distinct label from a given feature vector. The interface is similar to the construction of an SVM model from a sequence of pairs of labels and features, as implemented in Teradata Aster or BigQuery ML. A classification model is trained with the `learn_classifier` aggregate function, which takes the labels and the feature vectors as arguments.

`classify` returns the predicted label by using the trained model. The trained model cannot be saved natively,
so it must be passed to `classify` from a nested query that trains it.
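The nested-query pattern can be sketched as follows. The table `iris` and the columns `species`, `sepal_length`, `sepal_width`, `petal_length`, and `petal_width` are assumptions made for illustration, as are the literal feature values; substitute your own table and columns.

```sql
-- Hypothetical table: iris(species, sepal_length, sepal_width, petal_length, petal_width)
SELECT
  classify(features(5.9, 3.0, 5.1, 1.8), model) AS predicted_label
FROM (
  -- Inner query: train the classifier over all rows of the table.
  SELECT
    learn_classifier(
      species,
      features(sepal_length, sepal_width, petal_length, petal_width)
    ) AS model
  FROM iris
) t;
```

Because the model exists only for the duration of the query, training runs again every time a prediction is requested.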
Use `learn_libsvm_classifier` to control the
internal parameters of the model.
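The following is a hedged sketch of passing parameters to `learn_libsvm_classifier`. This section does not describe the format of `params`; the string `'C=1'` below assumes a comma-separated list of `key=value` libsvm settings, and the `iris` table and its columns are likewise hypothetical. Check the accepted keys against your Trino version before relying on it.

```sql
-- ASSUMPTION: params is a comma-separated key=value string (here, the SVM cost parameter C).
-- Hypothetical table: iris(species, sepal_length, sepal_width, petal_length, petal_width)
SELECT
  classify(features(5.9, 3.0, 5.1, 1.8), model) AS predicted_label
FROM (
  SELECT
    learn_libsvm_classifier(
      species,
      features(sepal_length, sepal_width, petal_length, petal_width),
      'C=1'
    ) AS model
  FROM iris
) t;
```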
Regression
Regression is another type of supervised learning problem. Unlike classification, it predicts a continuous value. The target must be a numerical value that can be represented as a `double`.
For example, a regression model can be trained to predict `sepal_length`
from three other features, which together form the feature vector.
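A minimal sketch of such a regression, assuming a hypothetical `iris` table with `sepal_length`, `sepal_width`, `petal_length`, and `petal_width` columns. It uses `learn_regressor` and `regress` from the function list in this document; the literal feature values are made up for illustration.

```sql
-- Hypothetical table: iris(sepal_length, sepal_width, petal_length, petal_width)
SELECT
  regress(features(3.0, 5.1, 1.8), model) AS predicted_target
FROM (
  -- Inner query: train a regressor with sepal_length as the target.
  SELECT
    learn_regressor(
      sepal_length,
      features(sepal_width, petal_length, petal_width)
    ) AS model
  FROM iris
) t;
```

As with classification, the model is trained in the inner query and consumed by the outer one within the same statement.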
`learn_libsvm_regressor` provides a
way to control the training process.
Machine learning functions {#machine-learning-functions-1}
features()
features(double, ...)
→ map(bigint, double)
Returns the map representing the feature vector.
learn_classifier()
learn_classifier(label, features)
→ Classifier
Returns an SVM-based classifier model, trained with the given label and
feature data sets.
learn_libsvm_classifier()
learn_libsvm_classifier(label, features, params)
→ Classifier
Returns an SVM-based classifier model, trained with the given label and
feature data sets. You can control the training process with libsvm
parameters.
classify()
classify(features, model)
→ label
Returns the label predicted by the given SVM classifier model.
learn_regressor()
learn_regressor(target, features)
→ Regressor
Returns an SVM-based regressor model, trained with the given target and
feature data sets.
learn_libsvm_regressor()
learn_libsvm_regressor(target, features, params)
→ Regressor
Returns an SVM-based regressor model, trained with the given target and
feature data sets. You can control the training process with libsvm
parameters.
regress()
regress(features, model)
→ target
Returns the target value predicted by the given SVM regressor model.