Hugging Face

The Hugging Face API accepts different parameter lists depending on the model used; refer to the model card for the exact parameters. In general they are similar to the OpenAI interface, and those common parameters are shown below.

Hugging Face Free Token Count

Hugging Face fully caches LLM responses from its free inference service and delivers each response all at once. This prevents the Synthonnel software from calculating comparable values for some performance metrics.

Tokens per Second and Token Count for Hugging Face Free responses are not directly comparable to other Inference Providers.

Time to 1st Token will instead reflect the time for the entire response to be returned, and is not directly comparable to other Inference Providers.

Total Time remains accurate and is comparable to other Inference Providers.

Parameters

The exact parameter name must be used, followed by an equals sign ( = ), then the value:

max_tokens          = 1024

Whitespace such as tabs and spaces is ignored.

frequency_penalty = 0.5
temperature = 1.0

Comments have a hash ( # ) at the beginning of the line and are ignored:

#not_used          = null
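The rules above amount to a simple line-based format: strip whitespace, skip blank lines and lines starting with a hash, and split each remaining line at the equals sign. A minimal sketch of such a parser (this is an illustration only; `parse_parameters` is a hypothetical helper, not part of the Synthonnel software):

```python
def parse_parameters(text):
    """Parse 'name = value' lines; '#' comments and blank lines are skipped."""
    params = {}
    for line in text.splitlines():
        line = line.strip()                # tabs and spaces are ignored
        if not line or line.startswith("#"):  # comments begin with a hash
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params

example = """
max_tokens          = 1024
#not_used          = null
temperature = 1.0
"""
print(parse_parameters(example))
# → {'max_tokens': '1024', 'temperature': '1.0'}
```

Values are kept as strings here; a real implementation would also convert them to the data types listed under Supported.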

Definitions

The official Hugging Face documentation and specific model cards will have details about the parameters themselves.

https://huggingface.co/docs/api-inference/detailed_parameters

Unused

  • Some parameters (such as model and messages) are handled by the Synthonnel software.

Supported

Parameter           Data Type        Range
frequency_penalty   float            -2.0 to 2.0
logprobs            boolean          true or false
top_logprobs        integer          0 to 20
max_tokens          integer          1 to model dependent
presence_penalty    float            -2.0 to 2.0
seed                integer          integer max value
stop                string or array  valid string or array
temperature         float            0 to 2
top_p               float            0 to 1
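The numeric ranges in the table can be checked before a request is sent. A hypothetical sketch of such a check, with bounds copied from the table (`RANGES` and `in_range` are illustrative names, not part of the Synthonnel software; `max_tokens` and `seed` are omitted because their upper bounds are model or platform dependent):

```python
# Numeric bounds taken from the Supported parameter table above.
RANGES = {
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "top_logprobs": (0, 20),
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
}

def in_range(name, value):
    """Return True if value lies within the documented range for name."""
    lo, hi = RANGES[name]
    return lo <= value <= hi

print(in_range("temperature", 1.0))  # → True
print(in_range("top_p", 1.5))        # → False
```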