Hugging Face

The Hugging Face API accepts different parameter lists depending on the model used; refer to the model card for the exact parameters. In general they are similar to the OpenAI interface, and those common parameters are shown below.

Hugging Face Free Token Count

Hugging Face fully caches LLM responses from its free inference service and delivers each response all at once. This prevents the Synthonnel software from calculating comparable values for some performance metrics.

Tokens per Second and Token Count for Hugging Face Free responses are not directly comparable to other Inference Providers.

Time to 1st Token will instead reflect the time for the entire response to be returned, and is not directly comparable to other Inference Providers.

Total Time remains accurate and is comparable to other Inference Providers.

Parameters

The exact parameter name must be used, followed by an equals sign ( = ), then the value:

max_tokens          = 1024

Whitespace such as tabs and spaces is ignored.

frequency_penalty = 0.5
temperature = 1.0

Comments have a hash ( # ) at the beginning of the line and are ignored:

#not_used          = null
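The rules above amount to a simple line-based format: strip whitespace, skip blank lines and lines starting with a hash, and split each remaining line at the equals sign. A minimal sketch of such a parser (this is an illustration only; `parse_parameters` is a hypothetical helper, not part of the Synthonnel software):

```python
def parse_parameters(text):
    """Parse 'name = value' lines; '#' comments and blank lines are skipped."""
    params = {}
    for line in text.splitlines():
        line = line.strip()                # tabs and spaces are ignored
        if not line or line.startswith("#"):  # comments begin with a hash
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params

example = """
max_tokens          = 1024
#not_used          = null
temperature = 1.0
"""
print(parse_parameters(example))
# → {'max_tokens': '1024', 'temperature': '1.0'}
```

Values are kept as strings here; a real implementation would also convert them to the data types listed under Supported.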

Definitions

The official Hugging Face documentation and specific model cards will have details about the parameters themselves.

https://huggingface.co/docs/api-inference/detailed_parameters

Unused

  • Some parameters (such as model and messages) are handled by the Synthonnel software.

Supported

Parameter           Data Type        Range
frequency_penalty   float            -2.0 to 2.0
logprobs            boolean          true or false
top_logprobs        integer          0 to 20
max_tokens          integer          1 to model dependent
presence_penalty    float            -2.0 to 2.0
seed                integer          integer max value
stop                string or array  valid string or array
temperature         float            0 to 2
top_p               float            0 to 1
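The numeric ranges in the table can be checked before a request is sent. A hypothetical sketch of such a check, with bounds copied from the table (`RANGES` and `in_range` are illustrative names, not part of the Synthonnel software; `max_tokens` and `seed` are omitted because their upper bounds are model or platform dependent):

```python
# Numeric bounds taken from the Supported parameter table above.
RANGES = {
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "top_logprobs": (0, 20),
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
}

def in_range(name, value):
    """Return True if value lies within the documented range for name."""
    lo, hi = RANGES[name]
    return lo <= value <= hi

print(in_range("temperature", 1.0))  # → True
print(in_range("top_p", 1.5))        # → False
```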