Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
model name
"meta-llama/Llama-2-70b-chat-hf"
input prompt - a single string is currently supported
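Taken together, the header and fields above form a minimal completion request. A sketch in Python — the endpoint URL and token value are placeholders, not confirmed by this reference:

```python
import json

# Hypothetical endpoint URL; substitute the real completions endpoint.
API_URL = "https://api.example.com/v1/completions"
AUTH_TOKEN = "YOUR_AUTH_TOKEN"  # placeholder for your auth token

# Bearer authentication header of the form "Bearer <token>".
headers = {
    "Authorization": f"Bearer {AUTH_TOKEN}",
    "Content-Type": "application/json",
}

# Request body: model name plus a single input prompt string.
body = {
    "model": "meta-llama/Llama-2-70b-chat-hf",
    "prompt": "Once upon a time",
}

payload = json.dumps(body)
```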
The maximum number of tokens to generate in the completion.
The total length of input tokens and generated tokens is limited by the model's context length. If explicitly set to None, it will be the model's max context length minus input length or 16384, whichever is smaller.
Constraint: 0 < x <= 1000000.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
Constraint: 0 <= x <= 2.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
Constraint: x <= 1.
Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
Constraint: 0 <= x <= 1.
Sample only from the k most likely tokens. 0 means off.
Constraint: x >= 0.
number of sequences to return
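The sampling controls described above — temperature, top_p (nucleus), min_p, and top_k — can be illustrated as filters over a token distribution. A minimal sketch of the standard formulations; the server's actual implementation may differ:

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over temperature-scaled logits: lower values sharpen the
    distribution (more deterministic), higher values flatten it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Nucleus sampling: keep the smallest set of most-likely tokens
    whose cumulative probability mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return sorted(kept)

def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times that of the
    most likely token; min_p = 0 disables the filter."""
    if min_p == 0:
        return list(range(len(probs)))
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

def top_k_filter(probs, k):
    """Keep only the k most likely tokens; k = 0 means off."""
    if k == 0:
        return list(range(len(probs)))
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(order[:k])
```

For probs = [0.5, 0.3, 0.15, 0.05], top_p_filter(probs, 0.1) keeps only the first token — i.e. 0.1 restricts sampling to the top 10% probability mass, as described above.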
Constraint: 1 <= x <= 4.
whether to stream the output via SSE or return the full response
return top tokens and their log-probabilities
return the prompt as part of the response
up to 16 sequences where the API will stop generating further tokens
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Constraint: -2 <= x <= 2.
Positive values penalize new tokens based on how many times they appear in the text so far, increasing the model's likelihood to talk about new topics.
Constraint: -2 <= x <= 2.
The format of the response. Currently, only json is supported.
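The presence and frequency penalties described above are commonly implemented as additive logit adjustments. A hedged sketch of the widely used formulation — not necessarily this server's exact code:

```python
def penalize_logits(logits, counts, presence_penalty, frequency_penalty):
    """Adjust each token's logit using how often it has already appeared.

    frequency_penalty scales with the repeat count; presence_penalty is a
    flat one-time penalty applied once the token has been seen at all.
    Positive values discourage repetition; negative values encourage it.
    """
    adjusted = []
    for logit, count in zip(logits, counts):
        logit -= frequency_penalty * count            # grows with repeats
        logit -= presence_penalty * (1 if count > 0 else 0)  # flat, once seen
        adjusted.append(logit)
    return adjusted
```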
Alternative penalty for repetition, but multiplicative instead of additive (> 1 penalizes, < 1 encourages).
Constraint: 0.01 <= x <= 5.
A unique identifier representing your end-user, which can help monitor and detect abuse. Avoid sending us any identifying information. We recommend hashing user identifiers.
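The multiplicative repetition penalty described above can be sketched as follows (the CTRL-style formulation used by common inference engines; an assumption, not confirmed by this reference):

```python
def repetition_penalize(logits, seen, penalty):
    """Multiplicative repetition penalty over logits of previously seen
    tokens: positive logits are divided by `penalty` and negative logits
    multiplied by it, so penalty > 1 discourages repeats and penalty < 1
    encourages them.
    """
    out = []
    for i, logit in enumerate(logits):
        if i in seen:
            logit = logit / penalty if logit > 0 else logit * penalty
        out.append(logit)
    return out
```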
Seed for random number generator. If not provided, a random seed is used. Determinism is not guaranteed.
Constraint: -9223372036854776000 <= x < 18446744073709552000.
streaming options
List of token IDs that will stop generation when encountered
return tokens as token ids
A key to identify prompt cache for reuse across requests. If provided, the prompt will be cached and can be reused in subsequent requests with the same key.
Optional multi-modal data to pass alongside the prompt. Only supported for a small number of non-chat-native vision models. Images must be base64 data URIs (e.g. 'data:image/png;base64,...').
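Pulling the parameters above into one request body: a sketch where the field names follow common completions-API conventions (OpenAI/vLLM-style) and are assumptions — check the actual request schema before relying on them:

```python
import json

# Hypothetical field names for the parameters documented above; the
# comments restate each field's documented constraint.
request_body = {
    "model": "meta-llama/Llama-2-70b-chat-hf",
    "prompt": "Write a haiku about the sea.",  # a single string
    "max_tokens": 256,           # 0 < x <= 1000000
    "temperature": 0.8,          # 0 <= x <= 2
    "top_p": 0.9,                # x <= 1
    "min_p": 0.0,                # 0 disables
    "top_k": 40,                 # 0 means off
    "n": 1,                      # 1 <= x <= 4
    "stream": False,             # SSE streaming off
    "stop": ["\n\n"],            # up to 16 stop sequences
    "presence_penalty": 0.0,     # -2 <= x <= 2
    "frequency_penalty": 0.0,    # -2 <= x <= 2
    "repetition_penalty": 1.1,   # 0.01 <= x <= 5
    "seed": 42,                  # determinism not guaranteed
}

encoded = json.dumps(request_body)
```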
Successful Response