RegressionTrainingConfig

Training config for the PREDICTIVE_MODELING problem type.

KEY	TYPE	DESCRIPTION
CUSTOM_METRICS	List[str]	Registered custom metrics available for selection.
BATCH_SIZE	BatchSize	Batch size.
OBJECTIVE	RegressionObjective	Ranking scheme used to select the final best model.
MONOTONICALLY_DECREASING_FEATURES	List[str]	Constrain the model so that the predicted target decreases monotonically with each of the selected features.
DROPOUT_RATE	int	Dropout rate, as a percentage.
K_FOLD_CROSS_VALIDATION	bool	Force k-fold cross-validation bagging on or off.
TEST_ROW_INDICATOR	str	Column indicating which rows to use for training (TRAIN) and testing (TEST). Validation (VAL) can also be specified.
TRAINING_ROWS_DOWNSAMPLE_RATIO	float	Ratio used to downsample the training rows, so that models are trained on a sample of the provided dataset.
CUSTOM_LOSS_FUNCTIONS	List[str]	Registered custom losses available for selection.
SAMPLING_UNIT_KEYS	List[str]	Columns that define the sampling unit for the train/test split: all rows sharing the same values in these columns are kept on the same side of the split.
PRETRAINED_LLM_NAME	str	Enable algorithms which process text using pretrained large language models.
IS_MULTILINGUAL	bool	Enable algorithms which process text using pretrained multilingual NLP models.
DATA_SPLIT_FEATURE_GROUP_TABLE_NAME	str	Table name of the feature group to which the training data is exported along with its fold column.
ACTIVE_LABELS_COLUMN	str	Column to use as the active labels in a multi-label setting.
AUGMENTATION_STRATEGY	RegressionAugmentationStrategy	Strategy to deal with class imbalance and data augmentation.
LOSS_PARAMETERS	str	Loss function parameters in the format <key>=<value>;<key>=<value>;...
RARE_CLASS_AUGMENTATION_THRESHOLD	float	Augments any rare class whose relative frequency with respect to the most frequent class is less than this threshold. Default = 0.1 for classification problems with rare classes.
MIN_CATEGORICAL_COUNT	int	Minimum number of occurrences a categorical value needs in order to be treated as a distinct value rather than mapped to the unknown placeholder.
NUM_CV_FOLDS	int	Specify the value of k in k-fold cross validation.
LOSS_FUNCTION	RegressionLossFunction	Loss function to be used as objective for model training.
MONOTONICALLY_INCREASING_FEATURES	List[str]	Constrain the model so that the predicted target increases monotonically with each of the selected features.
MAX_TOKENS_IN_SENTENCE	int	Maximum number of tokens to keep in a sentence; longer text is truncated according to TRUNCATION_STRATEGY.
DISABLE_TEST_VAL_FOLD	bool	Do not create a TEST_VAL set. All records that would otherwise be part of the TEST_VAL fold remain in the TEST fold.
TYPE_OF_SPLIT	RegressionTypeOfSplit	Type of split used to divide the data into train and test (and validation) sets.
REBALANCE_CLASSES	bool	Class weights are computed as the inverse of the class frequency from the training dataset when this option is selected as "Yes". It is useful when the classes in the dataset are unbalanced. Re-balancing classes generally boosts recall at the cost of precision on rare classes.
SORT_OBJECTIVE	RegressionObjective	Ranking scheme used to sort models on the metrics page.
SAMPLE_WEIGHT	str	Column to use as the per-sample weight during training and evaluation.
PERFORM_FEATURE_SELECTION	bool	If enabled, additional algorithms which support feature selection as a pretraining step will be trained separately with the selected subset of features. The details about their selected features can be found in their respective logs.
TEST_SPLIT	int	Percentage of the dataset to use as test data. Supported values range from 5% to 20%.
TREE_HPO_MODE	RegressionTreeHPOMode	Turning off Rapid Experimentation makes training take longer.
TEST_SPLITTING_TIMESTAMP	str	Rows with timestamp greater than this will be considered to be in the test set.
FULL_DATA_RETRAINING	bool	Train models separately with all the data.
NUMERIC_CLIPPING_PERCENTILE	float	Clip the top and bottom x percentile of numeric feature columns, where x is the value of this option.
DO_MASKED_LANGUAGE_MODEL_PRETRAINING	bool	Specify whether to run a masked language model unsupervised pretraining step before supervised training in certain supported algorithms which use BERT-like backbones.
IGNORE_DATETIME_FEATURES	bool	Remove all datetime features from the model. Useful while generalizing to different time periods.
PARTIAL_DEPENDENCE_ANALYSIS	PartialDependenceAnalysis	Specify whether to run partial dependence plots for all features or only some features.
DROP_ORIGINAL_CATEGORICALS	bool	Choose whether to also feed the original label-encoded categorical columns to the models along with their target-encoded versions.
MAX_TEXT_WORDS	int	Maximum number of words to use from text fields.
PRETRAINED_MODEL_NAME	str	Enable algorithms which process text using pretrained multilingual NLP models.
FEATURE_SELECTION_INTENSITY	int	Strictness with which features are filtered out, from 1 (very lenient; more features kept) to 100 (very strict).
TIMESTAMP_BASED_SPLITTING_METHOD	RegressionTimeSplitMethod	Method of selecting the TEST set: by top percentile or after a given timestamp.
TRUNCATION_STRATEGY	str	Strategy for handling text rows whose number of tokens exceeds MAX_TOKENS_IN_SENTENCE.
TARGET_TRANSFORM	RegressionTargetTransform	Specify a transform (e.g. log, quantile) to apply to the target variable.
TARGET_ENCODE_CATEGORICALS	bool	Turn target encoding of categorical features on or off.
TIMESTAMP_BASED_SPLITTING_COLUMN	str	Timestamp column selected for splitting into test and train.
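The table lists keys and types but few constraints. As a rough illustration, a config can be held as a plain Python dict keyed by the names above and checked against the ranges the table states explicitly (TEST_SPLIT from 5 to 20, FEATURE_SELECTION_INTENSITY from 1 to 100). The dict representation and the validate_regression_config helper are hypothetical and not part of any Abacus.AI SDK. The check that no feature appears in both monotonic lists is an assumption, based on those two constraints contradicting each other.

```python
from typing import Any, Dict, List


def validate_regression_config(config: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems found in a config dict.

    Only checks ranges stated in the table, plus one assumed consistency
    rule for the monotonic feature lists.
    """
    problems: List[str] = []

    # TEST_SPLIT: percent of the dataset used for testing, documented range 5-20.
    test_split = config.get("TEST_SPLIT")
    if test_split is not None and not 5 <= test_split <= 20:
        problems.append(f"TEST_SPLIT must be between 5 and 20, got {test_split}")

    # FEATURE_SELECTION_INTENSITY: 1 (very lenient) to 100 (very strict).
    intensity = config.get("FEATURE_SELECTION_INTENSITY")
    if intensity is not None and not 1 <= intensity <= 100:
        problems.append(
            f"FEATURE_SELECTION_INTENSITY must be between 1 and 100, got {intensity}"
        )

    # Assumption: a feature cannot be both monotonically increasing and decreasing.
    inc = set(config.get("MONOTONICALLY_INCREASING_FEATURES", []))
    dec = set(config.get("MONOTONICALLY_DECREASING_FEATURES", []))
    overlap = sorted(inc & dec)
    if overlap:
        problems.append(f"features in both monotonic lists: {overlap}")

    return problems


# Illustrative feature names; they do not come from the reference above.
config = {
    "TEST_SPLIT": 10,
    "FEATURE_SELECTION_INTENSITY": 50,
    "MONOTONICALLY_INCREASING_FEATURES": ["square_feet"],
    "MONOTONICALLY_DECREASING_FEATURES": ["distance_to_city"],
}
print(validate_regression_config(config))  # → []
```

Returning a list instead of raising on the first problem lets a caller report every issue in one pass.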
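LOSS_PARAMETERS packs loss-function parameters into a single string of <key>=<value> pairs separated by semicolons. The sketch below shows a minimal parser for that format. The parse_loss_parameters name is hypothetical, and the keys in the example (alpha, delta) are placeholders because this reference does not say which parameters each loss accepts. Values stay as strings since their types are not documented here.

```python
from typing import Dict


def parse_loss_parameters(spec: str) -> Dict[str, str]:
    """Hypothetical parser for a '<key>=<value>;<key>=<value>;...' string."""
    params: Dict[str, str] = {}
    for pair in spec.split(";"):
        pair = pair.strip()
        if not pair:
            # Skip empty segments, e.g. from a trailing semicolon.
            continue
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed pair: {pair!r}")
        params[key.strip()] = value.strip()
    return params


print(parse_loss_parameters("alpha=0.5;delta=1.0;"))
# → {'alpha': '0.5', 'delta': '1.0'}
```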
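NUMERIC_CLIPPING_PERCENTILE clips the top and bottom x percentile of each numeric feature column. The sketch below shows what that means for a single column. It assumes percentiles are computed with linear interpolation between ranks, which is the default in NumPy's percentile; the platform's actual percentile method is not documented here. The function names are hypothetical.

```python
import math
from typing import List


def percentile(sorted_vals: List[float], q: float) -> float:
    """Percentile q (0-100) of pre-sorted values, interpolating linearly between ranks."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def clip_percentile(values: List[float], x: float) -> List[float]:
    """Clip values below the x-th percentile and above the (100 - x)-th percentile."""
    s = sorted(values)
    low, high = percentile(s, x), percentile(s, 100 - x)
    return [min(max(v, low), high) for v in values]


column = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]
print(clip_percentile(column, 10))
```

With x = 10, the outlier 1000.0 is pulled down to the 90th percentile and the smallest value is raised to the 10th percentile. The values in between are unchanged.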
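Two of the class-related options above reduce to simple frequency arithmetic. REBALANCE_CLASSES weights each class by the inverse of its frequency in the training data. RARE_CLASS_AUGMENTATION_THRESHOLD flags a class as rare when its count relative to the most frequent class falls below the threshold (0.1 by default for classification problems with rare classes). The sketch assumes "frequency" means the relative frequency count / total, so each weight is total / count. The platform may normalize weights differently, and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def inverse_frequency_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by 1 / (count / total), i.e. total / count."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: total / n for cls, n in counts.items()}


def rare_classes(labels: Iterable[Hashable], threshold: float = 0.1) -> List[Hashable]:
    """Classes whose count divided by the most frequent class's count is below threshold."""
    counts = Counter(labels)
    most = max(counts.values())
    return [cls for cls, n in counts.items() if n / most < threshold]


labels = ["a"] * 90 + ["b"] * 8 + ["c"] * 2
print(inverse_frequency_weights(labels))
print(rare_classes(labels))  # → ['b', 'c']
```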
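When TIMESTAMP_BASED_SPLITTING_METHOD selects the timestamp option, rows whose TIMESTAMP_BASED_SPLITTING_COLUMN value is greater than TEST_SPLITTING_TIMESTAMP are placed in the test set. The sketch below applies that rule to in-memory rows. It assumes rows are dicts and timestamps are ISO 8601 strings. The split_by_timestamp helper and the event_time column name are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_timestamp(rows: List[Row], column: str, cutoff: str) -> Tuple[List[Row], List[Row]]:
    """Rows with timestamp strictly greater than cutoff go to test; the rest go to train."""
    cutoff_dt = datetime.fromisoformat(cutoff)
    train: List[Row] = []
    test: List[Row] = []
    for row in rows:
        ts = datetime.fromisoformat(row[column])
        # A row exactly at the cutoff is not "greater than" it, so it stays in train.
        (test if ts > cutoff_dt else train).append(row)
    return train, test


rows = [
    {"event_time": "2023-01-15T00:00:00", "y": 1.0},
    {"event_time": "2023-06-01T00:00:00", "y": 2.0},
    {"event_time": "2023-07-01T12:00:00", "y": 3.0},
]
train, test = split_by_timestamp(rows, "event_time", "2023-06-01T00:00:00")
print(len(train), len(test))  # → 2 1
```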
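SAMPLING_UNIT_KEYS keeps the train/test separation at the level of whole units, so every row that shares the same key values lands on the same side of the split. One common way to get that property is to hash each unit's key values and send the unit to test when its hash bucket falls below the test percentage. That technique is an assumption here, not the platform's documented method. The assign_split helper and the customer_id column are hypothetical.

```python
import hashlib
from typing import Any, Dict, List


def assign_split(row: Dict[str, Any], keys: List[str], test_percent: int) -> str:
    """Deterministically map a row's sampling unit to 'TRAIN' or 'TEST'."""
    # Join the key values into one string that identifies the unit.
    unit = "|".join(str(row[k]) for k in keys)
    # Hash the unit into one of 100 buckets; same unit -> same bucket every time.
    bucket = int(hashlib.sha256(unit.encode("utf-8")).hexdigest(), 16) % 100
    return "TEST" if bucket < test_percent else "TRAIN"


rows = [
    {"customer_id": 7, "y": 1.0},
    {"customer_id": 7, "y": 2.5},
    {"customer_id": 42, "y": 0.3},
]
splits = [assign_split(r, ["customer_id"], test_percent=20) for r in rows]
# Both rows for customer 7 always receive the same label.
print(splits)
```

Hashing avoids storing a unit-to-split lookup table. The trade-off is that the test fraction is only approximately test_percent when there are few units.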