Last Update: August 22, 2024

Generation Parameters & Their Usage

When using Tromero’s inference server, various parameters can be passed to customise the behaviour and output of the AI models. This section covers how to pass generation parameters at runtime, along with a comprehensive list of the available generation parameters and their descriptions.

Usage

The generation parameters can be passed as keywords to the create function when calling the inference server through the Python and TypeScript libraries, or included inside the parameters object in the JSON body for HTTP calls.

For example, with the Python library:

from tromero import Tromero

client = Tromero(tromero_key="your-tromero-key")

prompt = "Write a haiku about the sea."

response = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.7,
    top_p=0.9,
    n=3,
)

This example passes the temperature, top_p, and n parameters.
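For raw HTTP calls, the same generation parameters go inside the parameters object in the JSON body, as described above. Below is a minimal sketch of such a request body; note that the exact endpoint URL and authentication headers are not covered on this page, so only the body shape is shown, and the placement of model and messages at the top level is an assumption mirroring the Python example.

```python
import json

# Hypothetical JSON body for an HTTP call to the inference server.
# Generation parameters are nested under "parameters"; model and
# messages are assumed to sit at the top level.
payload = {
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Write a haiku about the sea."}],
    "parameters": {
        "temperature": 0.7,
        "top_p": 0.9,
        "n": 3,
    },
}

body = json.dumps(payload)
print(body)
```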

Generation Parameters

Below is a list of all generation parameters accepted by Tromero's inference server, along with their descriptions.

Model Parameters

  • model (string): Name of the model to use.
  • messages (Message[]): A list of messages comprising the conversation so far.

Output Control Parameters

  • stream (boolean | null): If set, partial message deltas will be sent as they become available.
  • n (number): Number of output sequences to return for the given prompt.
  • best_of (number): Number of output sequences generated from the prompt. The top n sequences are returned. Default is equal to n.
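To illustrate how best_of relates to n: the server generates best_of candidate sequences and returns only the n highest-scoring ones. The sketch below mimics that selection locally with made-up scores; it is an illustration, not Tromero's server code.

```python
# Local illustration of best_of vs. n: generate `best_of` candidates,
# return the `n` with the highest scores. Scores here are invented;
# real scoring happens inside the model.
def select_top_n(candidates, n):
    """Return the n candidates with the highest score."""
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:n]

# Pretend the server generated best_of=4 sequences with these log-prob scores.
candidates = [
    {"text": "a", "score": -1.2},
    {"text": "b", "score": -0.3},
    {"text": "c", "score": -2.0},
    {"text": "d", "score": -0.7},
]
print(select_top_n(candidates, n=2))  # returns the two best of the four
```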

Penalties and Randomness

  • presence_penalty (number): Penalizes new tokens based on their presence in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage repetition.
  • frequency_penalty (number): Penalizes new tokens based on their frequency in the generated text. Values > 0 encourage new tokens, while values < 0 encourage repetition.
  • repetition_penalty (number): Penalizes new tokens based on whether they appear in the prompt and generated text. Values > 1 encourage new tokens, while values < 1 encourage repetition.
  • temperature (number): Controls the randomness of sampling. Lower values make the model more deterministic, higher values make it more random.
  • top_p (number): Controls the cumulative probability of top tokens to consider. Set to 1 to consider all tokens.
  • top_k (number): Controls the number of top tokens to consider. Set to -1 to consider all tokens.
  • min_p (number): Represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Set to 0 to disable.
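The top_k and top_p parameters above can be illustrated with a small, self-contained sketch of the standard filtering definitions over a toy probability distribution. This mirrors the usual textbook behaviour, not Tromero's internal implementation.

```python
# Toy illustration of top_k and top_p filtering over a token
# probability distribution (standard definitions, not server code).
def top_k_filter(probs, k):
    """Keep only the k most probable tokens (k=-1 keeps all)."""
    if k == -1:
        return dict(probs)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability >= p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        total += prob
        if total >= p:
            break
    return kept

probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
print(top_k_filter(probs, 2))    # keeps 'the' and 'a'
print(top_p_filter(probs, 0.9))  # keeps 'the', 'a', and 'cat'
```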

Generation Control Parameters

  • seed (number): Random seed for the generation.
  • use_beam_search (boolean): Whether to use beam search instead of sampling.
  • length_penalty (number): Penalizes sequences based on their length when using beam search.
  • early_stopping (boolean): Controls the stopping condition for beam search.
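As a rough sketch of what length_penalty does during beam search: candidate sequences are typically ranked by a length-normalised score, so the penalty controls whether longer or shorter sequences win. The normalisation below is one common convention; Tromero's exact formula is not documented on this page.

```python
# One common beam-search length normalisation: divide the summed
# log-probability by length ** length_penalty. Illustrative only;
# the server's exact formula may differ.
def length_normalised_score(log_prob_sum, length, length_penalty):
    return log_prob_sum / (length ** length_penalty)

# A higher penalty shrinks the magnitude of long sequences' (negative)
# scores, letting them compete better against short ones.
print(length_normalised_score(-6.0, 3, 1.0))  # plain per-token average
print(length_normalised_score(-6.0, 3, 2.0))  # long sequences penalised less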

Stopping Conditions

  • stop (string[]): List of strings that stop the generation when generated.
  • stop_token_ids (number[]): List of tokens that stop the generation when generated. The output includes the stop tokens unless they are special tokens.
  • include_stop_str_in_output (boolean): Whether to include the stop strings in output text. Defaults to False.
  • ignore_eos (boolean): Whether to ignore the EOS token and continue generating tokens after it is generated.
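The interaction between stop and include_stop_str_in_output can be sketched with a local truncation function. This is only an illustration of the documented behaviour; the server applies stopping during generation, not as post-processing.

```python
# Mimics the documented behaviour of `stop` and
# `include_stop_str_in_output`: truncate at the earliest stop string,
# optionally keeping the stop string itself.
def apply_stop(text, stop, include_stop_str_in_output=False):
    cut, matched = len(text), ""
    for s in stop:
        idx = text.find(s)
        if idx != -1 and idx < cut:
            cut, matched = idx, s
    if matched and include_stop_str_in_output:
        cut += len(matched)
    return text[:cut]

generated = "Answer: 42\nEND of transmission"
print(repr(apply_stop(generated, ["END"])))        # stop string excluded
print(repr(apply_stop(generated, ["END"], True)))  # stop string included
```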

Token Limits

  • max_tokens (number): Maximum number of tokens to generate per output sequence.
  • min_tokens (number): Minimum number of tokens to generate per output sequence before EOS or stop tokens are generated.

Output Formatting

  • tools (dict[]): List of tools that a model will use, similar to OpenAI's tools parameter. Currently, multiple tools are not supported. The model will ALWAYS use the provided tool if this parameter is set.
  • output_format (dict): Defines the format of the model's output. Supported values are:
    • {"type": "json_object"}: The output will be a JSON object.
    • {"type": "text"}: The output will be plain text.
  • guided_schema (object): A JSON schema that constrains the model's output to valid JSON conforming to the provided schema. This ensures the output adheres to a specific structure and format.
  • guided_regex (string): A valid regular expression that constrains the model's output to match a specific format. This helps in generating text that follows a particular pattern.
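To illustrate what guided_regex enforces, the snippet below checks candidate outputs against a pattern using Python's re module. The pattern is a hypothetical example; the server applies such a constraint token by token during generation, whereas this is just a local after-the-fact check.

```python
import re

# A hypothetical guided_regex constraining output to an ISO-style date.
guided_regex = r"\d{4}-\d{2}-\d{2}"

# Locally check which candidate outputs would satisfy the constraint.
for candidate in ["2024-08-22", "August 22, 2024"]:
    ok = re.fullmatch(guided_regex, candidate) is not None
    print(candidate, "->", ok)
```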

For further assistance, please contact support@tromero.ai.
