Hugging Face
Hugging Face API parameter lists differ depending on the model used. Refer to the model card for the exact parameters. In general they are similar to the OpenAI interface, and those are shown below.
Hugging Face Free Token Count
Hugging Face caches LLM responses from its free inference tier and delivers the entire response at once. This prevents the Synthonnel software from calculating comparable values for some performance metrics:
- Tokens per Second and Token Count for Hugging Face Free responses are not directly comparable to other Inference Providers.
- Time to 1st Token will be the time for the entire response to be returned, and is not directly comparable to other Inference Providers.
- Total Time will be accurate and is comparable to other Inference Providers.
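To illustrate why all-at-once delivery skews these metrics, here is a small sketch with hypothetical timings (not Synthonnel's actual measurement code): when the whole cached response arrives in one chunk, the time to the first token collapses into the total time.

```python
# Sketch: how streaming vs. all-at-once delivery affects the metrics above.
# Timings and token counts are hypothetical examples, not measured values.

def metrics(chunk_times, token_count):
    """chunk_times: seconds after request start at which chunks arrived."""
    time_to_first_token = chunk_times[0]
    total_time = chunk_times[-1]
    tokens_per_second = token_count / total_time
    return time_to_first_token, total_time, tokens_per_second

# A streaming provider delivers tokens gradually:
streamed = metrics([0.2, 0.5, 1.0, 1.5, 2.0], token_count=100)

# Hugging Face Free delivers the cached response in one chunk,
# so the "first token" arrives only when the whole response does:
cached = metrics([2.0], token_count=100)

print(streamed)  # first token at 0.2 s, total 2.0 s
print(cached)    # first token at 2.0 s == total time
```

Total Time (the last value of `chunk_times`) is unaffected by chunking, which is why it remains comparable across providers.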
Parameters
The exact parameter name must be used, followed by the equals sign ( = ), then the value:
max_tokens = 1024
Whitespace such as tabs and spaces is ignored.
frequency_penalty = 0.5
temperature = 1.0
Comments have a hash ( # ) at the beginning of the line and are ignored:
#not_used = null
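The rules above (one `name = value` per line, whitespace ignored, `#` lines skipped) can be sketched as a minimal parser. This is illustrative only, not Synthonnel's actual implementation:

```python
# Minimal sketch of a parser for the parameter syntax described above:
# "name = value" lines, surrounding whitespace ignored, "#" lines are comments.

def parse_parameters(text):
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params

example = """
max_tokens = 1024
frequency_penalty = 0.5
#not_used = null
temperature \t= 1.0
"""

print(parse_parameters(example))
# {'max_tokens': '1024', 'frequency_penalty': '0.5', 'temperature': '1.0'}
```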
Definitions
The official Hugging Face documentation and specific model cards will have details about the parameters themselves.
https://huggingface.co/docs/api-inference/detailed_parameters
Unused
- Some parameters (such as model and messages ) are handled by the Synthonnel software.
Supported
Parameter | Data Type | Range |
---|---|---|
frequency_penalty | float | -2.0 to 2.0 |
logprobs | boolean | true or false |
top_logprobs | integer | 0 to 20 |
max_tokens | integer | 1 to model dependent |
presence_penalty | float | -2.0 to 2.0 |
seed | integer | integer max value |
stop | string or array | valid string or array |
temperature | float | 0 to 2 |
top_p | float | 0 to 1 |
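As a sketch, the numeric ranges in the table above can be checked before sending a request. The bounds below are copied from the table; the function itself is hypothetical, not part of Synthonnel:

```python
# Sketch: validating numeric parameter values against the ranges in the
# Supported table above. Illustrative only, not Synthonnel's actual code.

RANGES = {
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "top_logprobs": (0, 20),
}

def in_range(name, value):
    """Return True if value falls within the table's range for name."""
    low, high = RANGES[name]
    return low <= value <= high

print(in_range("temperature", 1.0))   # True
print(in_range("top_p", 1.5))         # False: above the 0 to 1 range
```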