DataGenerationConfig

Configuration for generating synthetic data with a model, for use in fine-tuning an LLM.

Key                     Type           Description
descriptionCol          str            Name of the description column.
verifyResponse          bool           Whether to verify the response.
examplesPerTarget       int            Number of examples per target.
idCol                   str            Name of the identifier column.
subsetSize              Optional[int]  Size of the subset to use for generation.
tokenBudget             int            Token budget for generation.
frequencyPenalty        float          Penalty for frequency of token appearance.
documentationCharLimit  int            Character limit for documentation.
concurrency             int            Number of concurrent processes.
model                   str            Model to use for data generation.
generationInstructions  str            Instructions for the data generation model.
temperature             float          Sampling temperature for the model.
fewshotExamples         int            Number of few-shot examples used to prompt the model.
completionCol           str            Name of the output completion column.
promptCol               str            Name of the input prompt column.
seed                    Optional[int]  Seed for random number generation.
oversample              bool           Whether to oversample the data.
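
The keys and types listed above can be checked before a configuration is submitted. The following is a minimal sketch, not part of the documented API: it assumes the configuration is held as a plain Python dict, that every key must be present (the table lists no defaults), and that only the two Optional[int] keys may be None. The helper check_config and all example values, including the model name "example-model", are hypothetical.

```python
from typing import Any, Dict, List

# Key names and types copied from the table above.
EXPECTED_TYPES: Dict[str, type] = {
    "descriptionCol": str,
    "verifyResponse": bool,
    "examplesPerTarget": int,
    "idCol": str,
    "subsetSize": int,          # Optional[int]
    "tokenBudget": int,
    "frequencyPenalty": float,
    "documentationCharLimit": int,
    "concurrency": int,
    "model": str,
    "generationInstructions": str,
    "temperature": float,
    "fewshotExamples": int,
    "completionCol": str,
    "promptCol": str,
    "seed": int,                # Optional[int]
    "oversample": bool,
}

# Keys typed Optional[int] in the table; this sketch lets them be None.
OPTIONAL_KEYS = {"subsetSize", "seed"}


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the dict matches the table."""
    problems: List[str] = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            # Assumption: all keys are required, since no defaults are documented.
            problems.append(f"{key}: missing")
            continue
        value = config[key]
        if value is None:
            if key not in OPTIONAL_KEYS:
                problems.append(f"{key}: must not be None")
            continue
        # bool is a subclass of int in Python, so reject it for int/float keys.
        if expected is not bool and isinstance(value, bool):
            problems.append(f"{key}: expected {expected.__name__}, got bool")
            continue
        # Accept whole numbers where a float is expected (e.g. temperature=1).
        if expected is float and isinstance(value, int):
            continue
        if not isinstance(value, expected):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    for key in config:
        if key not in EXPECTED_TYPES:
            problems.append(f"{key}: unknown key")
    return problems


# Hypothetical values chosen only to illustrate the shape of a configuration.
example_config: Dict[str, Any] = {
    "descriptionCol": "description",
    "verifyResponse": True,
    "examplesPerTarget": 5,
    "idCol": "id",
    "subsetSize": None,
    "tokenBudget": 2048,
    "frequencyPenalty": 0.0,
    "documentationCharLimit": 4000,
    "concurrency": 4,
    "model": "example-model",
    "generationInstructions": "Write one question per description.",
    "temperature": 0.7,
    "fewshotExamples": 3,
    "completionCol": "completion",
    "promptCol": "prompt",
    "seed": 42,
    "oversample": False,
}

if __name__ == "__main__":
    print(check_config(example_config))
```

Returning a list of problems rather than raising on the first one lets a caller report every mismatch in a single pass. Value ranges (for example, an upper bound on temperature) are not documented above, so the sketch checks types only.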