Last Update: August 6, 2024

Model Inference

Welcome to the Tromero inference documentation. Inference is the process of using a trained model to make predictions or generate text from input data. This page walks through installing the Tromero library, initialising a client, and making requests to a model, including streaming responses. Whether you're building a chatbot, generating text, or exploring what Tromero can do, it covers what you need to get started, with code examples for Tromero's Python and TypeScript libraries.

Installing Tromero

To install Tromero, use pip for Python or npm for TypeScript:

pip install tromero

First, users must import the library and initialise a client.

import os
from tromero import Tromero

client = Tromero(tromero_key=os.getenv("TROMERO_KEY"))
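
Because os.getenv returns None when a variable is unset, a missing key may only surface later, when the client is first used. The following is a minimal sketch of checking for the key up front. The helper name load_tromero_key is hypothetical and not part of the Tromero library; only the TROMERO_KEY variable name comes from the example above.

```python
import os


def load_tromero_key(env_var: str = "TROMERO_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of the Tromero library.
    """
    key = os.getenv(env_var)
    if not key:
        # Fail early with a clear message instead of passing None onward.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key
```

The returned value can then be passed as tromero_key when creating the client.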

If users have a preference for the location of the models, they can specify it when creating the client:

import os
from tromero import Tromero
client = Tromero(tromero_key=os.getenv("TROMERO_KEY"), location_preference="uk")

Note: model availability differs between regions, so selecting a region may limit the choice of base models. The client's location parameter takes priority over the settings on the Tromero platform. If it is not set, the location from the Tromero settings page is chosen.
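
The precedence described in the note can be pictured with a small sketch. This only illustrates the rule; it is not how the Tromero library implements it. The function resolve_location is hypothetical, and "other-region" is a placeholder rather than a known Tromero region.

```python
from typing import Optional


def resolve_location(client_preference: Optional[str],
                     platform_setting: Optional[str]) -> Optional[str]:
    """Illustrate the precedence rule (hypothetical helper)."""
    # A location passed to the client wins over the platform setting.
    if client_preference is not None:
        return client_preference
    # Otherwise fall back to the location from the settings page.
    return platform_setting


print(resolve_location("uk", "other-region"))   # uk
print(resolve_location(None, "other-region"))   # other-region
```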

Following client initialisation, users can call the model by making a request like this:

user_input = "Hello!"  # example text; replace with the user's message

response = client.chat.completions.create(
    model="your-model-name",
    messages=[
        {"role": "system", "content": "You are a friendly chatbot."},
        {"role": "user", "content": user_input},
    ],
)
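
The messages argument is plain Python data, so it can be assembled and inspected before any request is sent. A minimal sketch, assuming the two-message shape shown above; build_messages is a hypothetical helper, not part of the Tromero library.

```python
from typing import Dict, List


def build_messages(system_prompt: str, user_text: str) -> List[Dict[str, str]]:
    """Build a messages list with one system and one user message.

    Hypothetical helper for illustration; not part of the Tromero library.
    """
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


messages = build_messages("You are a friendly chatbot.", "Hello!")
print(messages[1]["role"])  # user
```

The resulting list can be passed as the messages argument of client.chat.completions.create.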

Tromero supports streaming responses, which allows you to receive and process data incrementally as it's generated.

To enable streaming, pass stream = True (Python) or stream: true (TypeScript) in your request. The API then returns the response incrementally rather than waiting for the complete response to be ready.

Here's an example of how to initiate a streaming request:

import os
from tromero import Tromero

client = Tromero(tromero_key=os.getenv("TROMERO_KEY"))

stream = client.chat.completions.create(
    model="model_name",
    messages=[{"role": "user", "content": "tell me a story"}],
    stream=True,
)

# Print each piece of the response as it arrives
for chunk in stream:
    print(chunk.choices[0].delta.content)
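
When the full text is needed after streaming finishes, the pieces can be accumulated as they arrive. The sketch below assumes only the chunk.choices[0].delta.content attribute path used in the loop above. It runs against stand-in objects rather than a live stream, and it assumes that some chunks may carry no text (None), which is not confirmed for Tromero. collect_stream and fake_chunk are hypothetical names.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Concatenate the text of each chunk into one string."""
    parts = []
    for chunk in stream:
        content = chunk.choices[0].delta.content
        # Assumption: a chunk may carry no text (None or empty); skip it.
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute path as a chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [fake_chunk("Once "), fake_chunk("upon a time"), fake_chunk(None)]
print(collect_stream(fake_stream))  # Once upon a time
```

With a real client, the object returned by a streaming request would be passed to collect_stream in place of fake_stream.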