Google Gemini API

The Google Gemini API uses a different parameter list than the OpenAI API.

Google Token Count

Google caches its LLM responses and delivers a handful of tokens in each sub-reply. This prevents the Synthonnel software from calculating comparable values for some performance metrics.

As a result, Tokens per Second and Token Count for Google responses are not directly comparable to those of other Inference Providers.

Time to 1st Token and Total Time remain accurate and are comparable to other Inference Providers.

Parameters

The exact parameter name must be used, followed by the equals sign ( = ), then the value:

maxOutputTokens          = 1024

Whitespace such as tabs and spaces is ignored.

topK = 40
temperature = 1.0

Comment lines begin with a hash ( # ) and are ignored:

#not_used          = null
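The parsing rules above are simple enough to sketch in a few lines of Python. This is a hypothetical illustration of the format, not the Synthonnel implementation:

```python
def parse_params(text):
    """Parse 'name = value' lines from a parameter file.

    Blank lines and lines beginning with '#' are ignored, as is
    whitespace around parameter names and values.
    """
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines are ignored
        name, sep, value = line.partition("=")
        if sep:  # only lines containing '=' define a parameter
            params[name.strip()] = value.strip()
    return params

sample = """\
#not_used          = null
maxOutputTokens    = 1024
temperature        = 1.0
"""
print(parse_params(sample))
# {'maxOutputTokens': '1024', 'temperature': '1.0'}
```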

Definitions

The official Google Gemini documentation has details about the parameters themselves.

https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini#parameters

Unused

  • Some parameters (such as role and parts) are handled automatically by the Synthonnel software.
  • Some parameters (such as inlineData) are complex data types and are not used by Synthonnel at this time.
  • Some parameters (such as fileUri and mimeType) do not make sense in Synthonnel usage and so are not included.

Supported

Parameter        Data Type        Range
temperature      float            0.0 - 1.0 (gemini-1.0-pro-001)
temperature      float            0.0 - 2.0 (gemini-1.0-pro-002)
maxOutputTokens  integer          1 to model dependent
topK             integer          1 to 40
topP             float            0 to 1
stopSequences    string or array  valid string or array
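A parameter file using the supported set might look like the following. The values are illustrative only, and the stopSequences value is assumed to follow the same name = value form as the other parameters:

```
maxOutputTokens = 1024
temperature     = 0.7
topK            = 40
topP            = 0.95
stopSequences   = END
```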

The Google Gemini API also has configurable guardrail parameters for responses. These parameters accept only specific values:

Parameter   Allowed Values
category    HARM_CATEGORY_SEXUALLY_EXPLICIT
            HARM_CATEGORY_HATE_SPEECH
            HARM_CATEGORY_HARASSMENT
            HARM_CATEGORY_DANGEROUS_CONTENT
threshold   BLOCK_NONE
            BLOCK_LOW_AND_ABOVE
            BLOCK_MED_AND_ABOVE
            BLOCK_ONLY_HIGH
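Assuming the guardrail parameters follow the same name = value format as the other parameters (an assumption; the pairing of categories to thresholds may differ in practice), a safety configuration could look like:

```
category  = HARM_CATEGORY_HARASSMENT
threshold = BLOCK_ONLY_HIGH
```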