prompt_style
is an attribute used in the Axolotl codebase to specify the style or format of prompts that are generated for interacting with language models. It determines how the prompts are structured, which can significantly impact the model's performance and the relevance of its responses. The prompt_style
attribute is often used in classes that generate or manage prompts, allowing for customization of the interaction based on the specific needs of the task or the preferences for the model's output format.
Different values of prompt_style
correspond to different formatting strategies for the prompts. For example, in the Axolotl codebase, the PromptStyle
enumeration defines several styles such as INSTRUCT
, CHAT
, and CHATML
, each representing a unique way of structuring prompts:
class PromptStyle(Enum): """ Enum for prompt styles """ INSTRUCT = "instruct" CHAT = "chat" CHATML = "chatml"
These styles can be used to tailor the prompts for different types of interactions, such as instructional prompts, chat-like conversations, or machine learning-specific formats. Depending on the selected prompt_style
, the generated prompts will follow a specific template or structure, influencing how the language model interprets the input and generates its responses.
For instance, when a Prompter
class instance sets its prompt_style
to INSTRUCT
, it might generate prompts that are more directive or instructional in nature, guiding the model to perform a specific task. Conversely, setting prompt_style
to CHAT
might result in prompts that mimic a conversational exchange between a user and the model.
Here's an example of how a Prompter
class might use prompt_style
to generate different types of prompts:
class ReflectAlpacaPrompter: def __init__(self, prompt_style="instruct"): self.prompt_style = prompt_style self.match_prompt_style() def match_prompt_style(self): if self.prompt_style == PromptStyle.INSTRUCT.value: # Set up prompt template for instructional style ... elif self.prompt_style == PromptStyle.CHAT.value: # Set up prompt template for chat style ...
In this example, the ReflectAlpacaPrompter
class initializes with a prompt_style
and uses the match_prompt_style
method to configure its prompt generation strategy based on the selected style.
class PromptStyle(Enum): """ Enum for prompt styles """ INSTRUCT = "instruct" CHAT = "chat" CHATML = "chatml"
def match_prompt_style(self): self.turn_format = turn_format self.turn_no_input_format = turn_no_input_format self.system_format = system_format
def match_prompt_style(self): self.turn_format = "USER: {instruction}\n{input}\nASSISTANT:" self.turn_no_input_format = "USER: {instruction}\nASSISTANT:"
def match_prompt_style(self): # pylint: disable=duplicate-code self.turn_format = "{instruction}\n{input}" self.turn_no_input_format = "{instruction}" self.system_format = "{system}"
def match_prompt_style(self): if self.prompt_style == PromptStyle.INSTRUCT.value: self.prompt_input = ( self.system_prompt + "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n" ) self.prompt_no_input = ( self.system_no_input_prompt + "### Instruction:\n{instruction}\n\n### Response:\n" ) self.agent_label = "### Thought:\n{output}\n\n### Agent Reflection:\n{reflection}\n\n### Final Response:\n{corrected}" self.response_split = "### Final Response:" if self.prompt_style == PromptStyle.CHAT.value: self.prompt_input = ( self.system_prompt + "USER: {instruction}\n{input}\nASSISTANT:" ) self.prompt_no_input = ( self.system_no_input_prompt + "USER: {instruction}\nASSISTANT:" ) self.agent_label = ( "\nTHOUGHT: {output}\nASSISTANT REFLECTION: {reflection}\nASSISTANT:" ) self.response_split = "ASSISTANT:"
def match_prompt_style(self): # pylint: disable=duplicate-code if self.prompt_style == PromptStyle.INSTRUCT.value: self.turn_format = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n" self.turn_no_input_format = ( "### Instruction:\n{instruction}\n\n### Response:\n" ) self.system_format = "{system}\n\n" if self.prompt_style == PromptStyle.CHAT.value: self.turn_format = "USER: {instruction}\n{input}\nASSISTANT:" self.turn_no_input_format = "USER: {instruction}\nASSISTANT:" self.system_format = "SYSTEM: {system}\n" if self.prompt_style == PromptStyle.CHATML.value: self.turn_format = "<|im_start|>user\n{instruction}\n{input}<|im_end|>\n<|im_start|>assistant\n" self.turn_no_input_format = ( "<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n" ) self.system_format = "<|im_start|>system\n{system}<|im_end|>\n"
def match_prompt_style(self): self.turn_format = "USER: Summarize the following article as a TL;DR.\n{instruction}\n{input}\nASSISTANT:" self.turn_no_input_format = "USER: Summarize the following article as a TL;DR.\n{instruction}\nASSISTANT:"
def __init__(self, prompt_style="instruct"): self.prompt_style = prompt_style self.match_prompt_style()
def __init__(self, prompt_style=PromptStyle.INSTRUCT.value): self.prompt_style = prompt_style if prompt_style else PromptStyle.INSTRUCT.value self.match_prompt_style()
def match_prompt_style(self): # pylint: disable=duplicate-code if self.prompt_style == PromptStyle.INSTRUCT.value: self.turn_format = "### Human:\n{instruction}\n### Additional Context:\n{input}\n### Assistant:\n" self.turn_no_input_format = "### Human:\n{instruction}\n### Assistant:\n" self.system_format = "### System:\n{system}\n" if self.prompt_style == PromptStyle.CHAT.value: self.turn_format = "USER: {instruction}\n{input}\nASSISTANT:" self.turn_no_input_format = "USER: {instruction}\nASSISTANT:" self.system_format = "SYSTEM: {system}\n" if self.prompt_style == PromptStyle.CHATML.value: self.turn_format = "<|im_start|>user\n{instruction}\n{input}<|im_end|>\n<|im_start|>assistant\n" self.turn_no_input_format = ( "<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n" ) self.system_format = "<|im_start|>system\n{system}<|im_end|>\n"
Prompt tuning was developed for text classification tasks on T5 models, and all downstream tasks are cast as a text generation task. For example, sequence classification usually assigns a single class label to a sequence of text. By casting it as a text generation task, the tokens that make up the class label are generated. Prompts are added to the input as a series of tokens. Typically, the model parameters are fixed which means the prompt tokens are also fixed by the model parameters.
The key idea behind prompt tuning is that prompt tokens have their own parameters that are updated independently. This means you can keep the pretrained model's parameters frozen, and only update the gradients of the prompt token embeddings. The results are comparable to the traditional method of training the entire model, and prompt tuning performance scales as model size increases.
Take a look at Prompt tuning for causal language modeling for a step-by-step guide on how to train a model with prompt tuning.
DEFAULT_PROMPTS_REPO = "huggingface-tools/default-prompts" PROMPT_FILES = {"chat": "chat_prompt_template.txt", "run": "run_prompt_template.txt"}
Prompt tuning adds task-specific prompts to the input, and these prompt parameters are updated independently of the pretrained model parameters which are frozen.
The abstract from the paper is:
In this work, we explore "prompt tuning", a simple yet effective mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's "few-shot" learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method "closes the gap" and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed "prefix tuning" of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
[[autodoc]] tuners.prompt_tuning.config.PromptTuningConfig
[[autodoc]] tuners.prompt_tuning.model.PromptEmbedding
def prompt_process(sent_1, sent_2, prompt_1="", prompt_2="", prompt_3=""): sent_2 = sent_2.replace("####", "The final answer is") return prompt_1 + sent_1 + prompt_2 + sent_2 + prompt_3
P-tuning is designed for natural language understanding (NLU) tasks and all language models. It is another variation of a soft prompt method; P-tuning also adds a trainable embedding tensor that can be optimized to find better prompts, and it uses a prompt encoder (a bidirectional long-short term memory network or LSTM) to optimize the prompt parameters. Unlike prefix tuning though:
The results suggest that P-tuning is more efficient than manually crafting prompts, and it enables GPT-like models to compete with BERT-like models on NLU tasks.
Take a look at P-tuning for sequence classification for a step-by-step guide on how to train a model with P-tuning.
To give the user maximum flexibility, the whole prompt template as explained in above
can be overwritten by the user. In this case make sure that your custom prompt includes an introduction section,
a tool section, an example section, and an unfinished example section. If you want to overwrite the run
prompt template,
you can do as follows:
<Tip warning={true}>template = """ [...] """ agent = HfAgent(your_endpoint, run_prompt_template=template)
Please make sure to have the <<all_tools>>
string and the <<prompt>>
defined somewhere in the template
so that the agent can be aware
of the tools, it has available to it as well as correctly insert the user's prompt.
Similarly, one can overwrite the chat
prompt template. Note that the chat
mode always uses the following format for the exchanges:
Human: <<task>> Assistant:
Therefore it is important that the examples of the custom chat
prompt template also make use of this format.
You can overwrite the chat
template at instantiation as follows.
<Tip warning={true}>template = """ [...] """ agent = HfAgent(url_endpoint=your_endpoint, chat_prompt_template=template)
Please make sure to have the <<all_tools>>
string defined somewhere in the template
so that the agent can be aware
of the tools, it has available to it.
In both cases, you can pass a repo ID instead of the prompt template if you would like to use a template hosted by someone in the community. The default prompts live in this repo as an example.
To upload your custom prompt on a repo on the Hub and share it with the community just make sure:
run
command in a file named run_prompt_template.txt
chat
command in a file named chat_prompt_template.txt
Training large pretrained language models is very time-consuming and compute-intensive. As they continue to grow in size, there is increasing interest in more efficient training methods such as prompting. Prompting primes a frozen pretrained model for a specific downstream task by including a text prompt that describes the task or even demonstrates an example of the task. With prompting, you can avoid fully training a separate model for each downstream task, and use the same frozen pretrained model instead. This is a lot easier because you can use the same model for several different tasks, and it is significantly more efficient to train and store a smaller set of prompt parameters than to train all the model's parameters.
There are two categories of prompting methods:
This conceptual guide provides a brief overview of the soft prompt methods included in 🤗 PEFT: prompt tuning, prefix tuning, P-tuning, and multitask prompt tuning.
Prompt tuning was developed for text classification tasks on T5 models, and all downstream tasks are cast as a text generation task. For example, sequence classification usually assigns a single class label to a sequence of text. By casting it as a text generation task, the tokens that make up the class label are generated. Prompts are added to the input as a series of tokens. Typically, the model parameters are fixed which means the prompt tokens are also fixed by the model parameters.
The key idea behind prompt tuning is that prompt tokens have their own parameters that are updated independently. This means you can keep the pretrained model's parameters frozen, and only update the gradients of the prompt token embeddings. The results are comparable to the traditional method of training the entire model, and prompt tuning performance scales as model size increases.
Take a look at Prompt tuning for causal language modeling for a step-by-step guide on how to train a model with prompt tuning.
Prefix tuning was designed for natural language generation (NLG) tasks on GPT models. It is very similar to prompt tuning; prefix tuning also prepends a sequence of task-specific vectors to the input that can be trained and updated while keeping the rest of the pretrained model's parameters frozen.
The main difference is that the prefix parameters are inserted in all of the model layers, whereas prompt tuning only adds the prompt parameters to the model input embeddings. The prefix parameters are also optimized by a separate feed-forward network (FFN) instead of training directly on the soft prompts because it causes instability and hurts performance. The FFN is discarded after updating the soft prompts.
As a result, the authors found that prefix tuning demonstrates comparable performance to fully finetuning a model, despite having 1000x fewer parameters, and it performs even better in low-data settings.
Take a look at Prefix tuning for conditional generation for a step-by-step guide on how to train a model with prefix tuning.
P-tuning is designed for natural language understanding (NLU) tasks and all language models. It is another variation of a soft prompt method; P-tuning also adds a trainable embedding tensor that can be optimized to find better prompts, and it uses a prompt encoder (a bidirectional long-short term memory network or LSTM) to optimize the prompt parameters. Unlike prefix tuning though:
The results suggest that P-tuning is more efficient than manually crafting prompts, and it enables GPT-like models to compete with BERT-like models on NLU tasks.
Take a look at P-tuning for sequence classification for a step-by-step guide on how to train a model with P-tuning.
Multitask prompt tuning (MPT) learns a single prompt from data for multiple task types that can be shared for different target tasks. Other existing approaches learn a separate soft prompt for each task that need to be retrieved or aggregated for adaptation to target tasks. MPT consists of two stages:
P-tuning adds trainable prompt embeddings to the input that is optimized by a prompt encoder to find a better prompt, eliminating the need to manually design prompts. The prompt tokens can be added anywhere in the input sequence, and p-tuning also introduces anchor tokens for improving performance.
The abstract from the paper is:
While GPTs with traditional fine-tuning fail to achieve strong results on natural language understanding (NLU), we show that GPTs can be better than or comparable to similar-sized BERTs on NLU tasks with a novel method P-tuning -- which employs trainable continuous prompt embeddings. On the knowledge probing (LAMA) benchmark, the best GPT recovers 64% (P@1) of world knowledge without any additional text provided during test time, which substantially improves the previous best by 20+ percentage points. On the SuperGlue benchmark, GPTs achieve comparable and sometimes better performance to similar-sized BERTs in supervised learning. Importantly, we find that P-tuning also improves BERTs' performance in both few-shot and supervised settings while largely reducing the need for prompt engineering. Consequently, P-tuning outperforms the state-of-the-art approaches on the few-shot SuperGlue benchmark..
[[autodoc]] tuners.p_tuning.config.PromptEncoderConfig
[[autodoc]] tuners.p_tuning.model.PromptEncoder
class PromptTuningInit(str, enum.Enum): TEXT = "TEXT" RANDOM = "RANDOM"
Multitask prompt tuning (MPT) learns a single prompt from data for multiple task types that can be shared for different target tasks. Other existing approaches learn a separate soft prompt for each task that need to be retrieved or aggregated for adaptation to target tasks. MPT consists of two stages:
def format_prompt(self, task, chat_mode=False): description = "\n".join([f"- {name}: {tool.description}" for name, tool in self.toolbox.items()]) if chat_mode: if self.chat_history is None: prompt = self.chat_prompt_template.replace("<<all_tools>>", description) else: prompt = self.chat_history prompt += CHAT_MESSAGE_PROMPT.replace("<<task>>", task) else: prompt = self.run_prompt_template.replace("<<all_tools>>", description) prompt = prompt.replace("<<prompt>>", task) return prompt
Prompt lookup decoding is a variant of speculative decoding that is also compatible with greedy search and sampling. Prompt lookup works especially well for input-grounded tasks - such as summarization - where there is often overlapping words between the prompt and output. These overlapping n-grams are used as the LLM candidate tokens.
To enable prompt lookup decoding, specify the number of tokens that should be overlapping in the prompt_lookup_num_tokens
parameter. Then you can pass this parameter to the [~GenerationMixin.generate
] method.
</hfoption> <hfoption id="sampling">from transformers import AutoModelForCausalLM, AutoTokenizer import torch device = "cuda" if torch.cuda.is_available() else "cpu" tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b") inputs = tokenizer("The second law of thermodynamics states", return_tensors="pt").to(device) model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b").to(device) assistant_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m").to(device) outputs = model.generate(**inputs, prompt_lookup_num_tokens=3) print(tokenizer.batch_decode(outputs, skip_special_tokens=True)) ['The second law of thermodynamics states that entropy increases with temperature. ']
For prompt lookup decoding with sampling, add the do_sample
and temperature
parameters to the [~GenerationMixin.generate
] method.
</hfoption> </hfoptions>from transformers import AutoModelForCausalLM, AutoTokenizer import torch device = "cuda" if torch.cuda.is_available() else "cpu" tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b") inputs = tokenizer("The second law of thermodynamics states", return_tensors="pt").to(device) model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b").to(device) outputs = model.generate(**inputs, prompt_lookup_num_tokens=3, do_sample=True, temperature=0.7) print(tokenizer.batch_decode(outputs, skip_special_tokens=True)) ["The second law of thermodynamics states that energy cannot be created nor destroyed. It's not a"]
# fmt: off DEFAULT_SYSTEM_PROMPT = """You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your \ answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure\ that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not \ correct. If you don't know the answer to a question, please don't share false information."""
def __init__(self, prompt, num_samples): self.prompt = prompt self.num_samples = num_samples
def get_prompt(self, batch_size: int, task_ids: Optional[torch.Tensor] = None) -> torch.Tensor: """ Returns the virtual prompts to use for Peft. Only applicable when using a prompt learning method. """ peft_config = self.active_peft_config prompt_encoder = self.prompt_encoder[self.active_adapter] prompt_tokens = ( self.prompt_tokens[self.active_adapter] .unsqueeze(0) .expand(batch_size, -1) .to(prompt_encoder.embedding.weight.device) ) if peft_config.peft_type == PeftType.PREFIX_TUNING: prompt_tokens = prompt_tokens[:, : peft_config.num_virtual_tokens] if peft_config.inference_mode: past_key_values = prompt_encoder.embedding.weight.repeat(batch_size, 1, 1) else: past_key_values = prompt_encoder(prompt_tokens) if self.base_model_torch_dtype is not None: past_key_values = past_key_values.to(self.base_model_torch_dtype) past_key_values = past_key_values.view( batch_size, peft_config.num_virtual_tokens, peft_config.num_layers * 2, peft_config.num_attention_heads, peft_config.token_dim // peft_config.num_attention_heads, ) if peft_config.num_transformer_submodules == 2: past_key_values = torch.cat([past_key_values, past_key_values], dim=2) past_key_values = past_key_values.permute([2, 0, 3, 1, 4]).split( peft_config.num_transformer_submodules * 2 ) if TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING.get(self.config.model_type, None) is not None: post_process_fn = TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING[self.config.model_type] past_key_values = post_process_fn(past_key_values) return past_key_values else: if peft_config.peft_type == PeftType.MULTITASK_PROMPT_TUNING: prompts = prompt_encoder(prompt_tokens, task_ids) else: if peft_config.inference_mode: prompts = prompt_encoder.embedding.weight.repeat(batch_size, 1, 1) else: prompts = prompt_encoder(prompt_tokens) return prompts
"{{ '\n\n# System Preamble' }}" "{{ '\n## Basic Rules' }}" "{{ '\nYou are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user\\'s requests, you cite your sources in your answers, according to those instructions.' }}" "{{ '\n\n# User Preamble' }}" "{{ '\n' + system_message }}" "{{ '<|END_OF_TURN_TOKEN|>'}}" "{% for message in loop_messages %}" # Loop over all non-system messages "{% set content = message['content'] %}" "{% if message['role'] == 'user' %}" # After all of that, handle messages/roles in a fairly normal way "{{ '<|START_OF_TURN_TOKEN|><|USER_TOKEN|>' + content.strip() + '<|END_OF_TURN_TOKEN|>' }}" "{% elif message['role'] == 'system' %}" "{{ '<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>' + content.strip() + '<|END_OF_TURN_TOKEN|>' }}" "{% elif message['role'] == 'assistant' %}" "{{ '<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>' + content.strip() + '<|END_OF_TURN_TOKEN|>' }}" "{% endif %}" "{% endfor %}" "{{ '<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>'}}" "{{ '<results>' }}" "{% for document in documents %}" # Loop over all non-system messages "{{ '\nDocument: ' }}" "{{ loop.index0 }}\n" "{% for key, value in document.items() %}" "{{ key }}: {{value}}\n" "{% endfor %}" "{% endfor %}" "{{ '</results>'}}" "{{ '<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>' }}" "{{ 'Carefully perform the following instructions, in order, starting each with a new line.\n' }}" "{{ 'Firstly, Decide which of the retrieved documents are relevant to the user\\'s last input by writing \\'Relevant Documents:\\' followed by comma-separated list of document numbers. If none are relevant, you should instead write \\'None\\'.\n' }}" "{{ 'Secondly, Decide which of the retrieved documents contain facts that should be cited in a good answer to the user\\'s last input by writing \\'Cited Documents:\\' followed a comma-separated list of document numbers. If you dont want to cite any of them, you should instead write \\'None\\'.\n' }}" "{% if citation_mode=='accurate' %}" "{{ 'Thirdly, Write \\'Answer:\\' followed by a response to the user\\'s last input in high quality natural english. Use the retrieved documents to help you. Do not insert any citations or grounding markup.\n' }}" "{% endif %}" "{{ 'Finally, Write \\'Grounded answer:\\' followed by a response to the user\\'s last input in high quality natural english. Use the symbols <co: doc> and </co: doc> to indicate when a fact comes from a document in the search result, e.g <co: 0>my fact</co: 0> for a fact from document 0.' }}" "{{ '<|END_OF_TURN_TOKEN|>' }}" "{% if add_generation_prompt %}" "{{ '<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>' }}" "{% endif %}" ) default_rag_message = DEFAULT_RAG_PREAMBLE.replace("\n", "\\n").replace("'", "\\'") rag_template = rag_template.replace("DEFAULT_SYSTEM_MESSAGE", default_rag_message) return {"default": default_template, "tool_use": tool_use_template, "rag": rag_template} def apply_tool_use_template( self, conversation: Union[List[Dict[str, str]], "Conversation"], tools: List[Dict], **kwargs, ) -> Union[str, List[int]]: """Create a Command-R tool-use prompt. Once rendered, the prompt instructs the model to generate a list of actions to perform on a set of user supplied tools to help carry out the user's requests. Conceptually, this works in the same way as `apply_chat_format`, but takes an additional `tools` parameter. Converts a Conversation object or a list of dictionaries with `"role"` and `"content"` keys and a list of available tools for the model to use into a prompt string, or a list of token ids. This method will use the tokenizer's `default_tool_use_template` template specified at the class level. You can override the default template using the `tool_use_template` kwarg but the quality of your results may decrease. Args: conversation (Union[List[Dict[str, str]], "Conversation"]): A Conversation object or list of dicts with "role" and "content" keys, representing the chat history so far. tools (List[Dict]): a list of tools to render into the prompt for the model to choose from. See an example at the bottom of the docstring. The format should be: * name (str): The name of the tool to be called. Valid names contain only the characters a-z, A-Z, 0-9, _ and must not begin with a digit. * description (str): The description of what the tool does, the model uses the description to choose when and how to call the function. * parameter_definitions (List[Dict]): The input parameters of the tool. Accepts a dictionary where the key is the name of the parameter and the value is the parameter spec. Valid parameter names contain only the characters a-z, A-Z, 0-9, _ and must not begin with a digit. Parameter specs are as follows: * description (str): The description of the parameter. * type (str): the type of the parameter - most effective for python builtin data types, such as 'str', 'bool' * required: boolean: Denotes whether the parameter is always present (required) or not. Defaults to not required. add_generation_prompt (bool, *optional*): Whether to end the prompt with the token(s) that indicate the start of an assistant message. This is useful when you want to generate a response from the model. Note that this argument will be passed to the chat template, and so it must be supported in the template for this argument to have any effect. tokenize (`bool`, defaults to `True`): Whether to tokenize the output. If `False`, the output will be a string. padding (`bool`, defaults to `False`): Whether to pad sequences to the maximum length. Has no effect if tokenize is `False`. truncation (`bool`, defaults to `False`): Whether to truncate sequences at the maximum length. Has no effect if tokenize is `False`. max_length (`int`, *optional*): Maximum length (in tokens) to use for padding or truncation. Has no effect if tokenize is `False`. If not specified, the tokenizer's `max_length` attribute will be used as a default. return_tensors (`str` or [`~utils.TensorType`], *optional*): If set, will return tensors of a particular framework. Has no effect if tokenize is `False`. Acceptable values are: - `'tf'`: Return TensorFlow `tf.Tensor` objects. - `'pt'`: Return PyTorch `torch.Tensor` objects. - `'np'`: Return NumPy `np.ndarray` objects. - `'jax'`: Return JAX `jnp.ndarray` objects. return_dict (`bool`, *optional*, defaults to `False`): Whether to return a dictionary with named outputs. Has no effect if tokenize is `False`. **tokenizer_kwargs: Additional kwargs to pass to the tokenizer. Returns: `str`: A rendered prompt string. or if tokenize=True: `List[int]`: A list of token ids representing the tokenized chat so far, including control tokens. This output is ready to pass to the model, either directly or via methods like `generate()`. Examples: ```python >> tokenizer = CohereTokenizerFast.from_pretrained("CohereForAI/c4ai-command-r-v01") >> tools = [ { "name": "internet_search", "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet", "parameter_definitions": { "query": { "description": "Query to search the internet with", "type": "str", "required": True } } }, { "name': "directly_answer", "description": "Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history", "parameter_definitions": {} } ] >> conversation = [ {"role": "user", "content": "Whats the biggest penguin in the world?"} ] >> # render the prompt, ready for user to inspect, or for input into the model: >> prompt = tokenizer.apply_tool_use_template(conversation, tools=tools, tokenize=False, add_generation_prompt=True) >> print(prompt) <BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># Safety Preamble The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral. # System Preamble ## Basic Rules You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions. # User Preamble ## Task and Context You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging. ## Style Guide Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling. ## Available Tools Here is a list of tools that you have available to you: \\`\\`\\`python def internet_search(query: str) -> List[Dict]: \"\"\"Returns a list of relevant document snippets for a textual query retrieved from the internet Args: query (str): Query to search the internet with \"\"\" pass \\`\\`\\` \\`\\`\\`python def directly_answer() -> List[Dict]: \"\"\"Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history \"\"\" pass \\`\\`\\`<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Whats the biggest penguin in the world?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>Write 'Action:' followed by a json-formatted list of actions that you want to perform in order to produce a good response to the user's last input. You can use any of the supplied tools any number of times, but you should aim to execute the minimum number of necessary actions for the input. You should use the `directly-answer` tool if calling the other tools is unnecessary. The list of actions you want to call should be formatted as a list of json objects, for example: \\`\\`\\`json [ { "tool_name": title of the tool in the specification, "parameters": a dict of parameters to input into the tool as they are defined in the specs, or {} if it takes no parameters } ]\\`\\`\\`<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|> ``` >> inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors='pt') >> outputs = model.generate(inputs, max_new_tokens=128) >> print(tokenizer.decode(outputs[0])) Action: ```json [ { "tool_name": "internet_search", "parameters": { "query": "biggest penguin in the world" } } ] ``` """ return self.apply_chat_template( conversation, chat_template="tool_use", tools=tools, **kwargs, ) def apply_grounded_generation_template( self, conversation: Union[List[Dict[str, str]], "Conversation"], documents: List[Dict], citation_mode: Literal["fast", "accurate"] = "accurate", **kwargs, ) -> Union[str, List[int]]: """Create a Command-R grounded generation (aka RAG) prompt. Once rendered, the prompt instructs the model to generate a response with citations in, based on supplied documents. Conceptually, this works in the same way as `apply_chat_format`, but takes additional `documents` and parameter `citation_mode` parameters. Converts a Conversation object or a list of dictionaries with `"role"` and `"content"` keys and a list of documents for the model to ground its response on into a prompt string, or a list of token ids. This method will use the tokenizer's `grounded_generation_template` template specified at the class level. You can override the default template using the `grounded_generation_template` kwarg but the quality of your results may decrease. Args: conversation (Union[List[Dict[str, str]], "Conversation"]): A Conversation object or list of dicts with "role" and "content" keys, representing the chat history so far. documents (List[Dict[str, str]): A list of dicts, representing documents or tool outputs to ground your generation on. A document is a semistructured dict, wiht a string to string mapping. Common fields are `url`, `title`, `snippet` etc but should be descriptive of the key. They will get rendered into the prompt. citation_mode: either "accurate" (prompt the model to generate an answer first, then rewrite it with citation spans in) or "fast", where the prompt instructs the model to generate an answer with citations in directly. The former has higher quality citations, the latter requires fewer tokens to be generated. add_generation_prompt (bool, *optional*): Whether to end the prompt with the token(s) that indicate the start of an assistant message. This is useful when you want to generate a response from the model. Note that this argument will be passed to the chat template, and so it must be supported in the template for this argument to have any effect. tokenize (`bool`, defaults to `True`): Whether to tokenize the output. If `False`, the output will be a string. padding (`bool`, defaults to `False`): Whether to pad sequences to the maximum length. Has no effect if tokenize is `False`. truncation (`bool`, defaults to `False`): Whether to truncate sequences at the maximum length. Has no effect if tokenize is `False`. max_length (`int`, *optional*): Maximum length (in tokens) to use for padding or truncation. Has no effect if tokenize is `False`. If not specified, the tokenizer's `max_length` attribute will be used as a default. return_tensors (`str` or [`~utils.TensorType`], *optional*): If set, will return tensors of a particular framework. Has no effect if tokenize is `False`. Acceptable values are:
In this section of the guide we have compiled a list of best practices that tend to improve the prompt results:
def __init__(self, prompt, num_samples): self.prompt = prompt self.num_samples = num_samples
def __getitem__(self, index): example = {} example["prompt"] = self.prompt example["index"] = index return example
def __init__(self, prompt, num_samples): self.prompt = prompt self.num_samples = num_samples
def prompts(self): return [ "Today is a beautiful day and I want", "In the city of", "Paris is the capital of France and", "Computers and mobile phones have taken", ]
def prompts(self): return [ "Today is a beautiful day and I want", "In the city of", "Paris is the capital of France and", "Computers and mobile phones have taken", ]
PROMPTS = [ "Hello, my name is", "Are unicorns real? Unicorns are", "For the first time in several years,", "My name is Julien and I am", "The goal of life is", "Whenever I'm sad, I like to", ]
---
title: Template-Free
description: Construct prompts without a template.
order: 4
---
See [these docs](../input_output.qmd).
---
title: Instruction Tuning
description: Instruction tuning formats for supervised fine-tuning.
order: 2
---
## alpaca
instruction; input(optional)
```{.json filename="data.jsonl"}
{"instruction": "...", "input": "...", "output": "..."}
question and answer
{"question": "...", "category": "...", "answer": "..."}
instruction
{"INSTRUCTION": "...", "RESPONSE": "..."}
instruction; input(optional)
{"instruction": "...", "input": "...", "response": "..."}
instruction with reflect; input(optional)
{"instruction": "...", "input": "...", "output": "...", "reflection": "...", "corrected": "..."}
question, choices, (solution OR explanation)
{"question": "...", "choices": ["..."], "solution": "...", "explanation": "..."}
question, choices, (solution OR explanation)
{"question": "...", "choices": ["..."], "solution": "...", "explanation": "..."}
article and summary
{"article": "...", "summary": "..."}
basic instruct for alpaca chat
{"instruction": "...", "input": "...", "response": "..."}
question and answer for alpaca chat
{"question": "...", "answer": "..."}
question and answer for alpaca chat, for concise answers
{"instruction": "...", "input": "...", "response": "..."}
question and answer for alpaca chat, for load_camel_ai
{"message_1": "...", "message_2": "..."}
support for open orca datasets with included system prompts, instruct
{"system_prompt": "...", "question": "...", "response": "..."}
in context question answering from an article
{"article": "...", "question": "...", "answer": "..."}
in context question answering (alternate)
{"context": "...", "question": "...", "answer": "..."}
in context question answering from an article, with default response for no answer from context
{"article": "...", "unanswerable_question": "..."}
instruction and revision
{"instruction": "...", "revision": "..."}
critique
{"scores": "...", "critiques": "...", "instruction": "...", "answer": "..."}
critique and revise
{"scores": "...", "critiques": "...", "instruction": "...", "answer": "...", "revision": "..."}
instruction, adds additional eos tokens
{"prompt": "...", "generation": "..."}
For a dataset that is preprocessed for instruction purposes:
{"input": "...", "output": "..."}
You can use this example in your YAML config:
datasets:
- path: repo
type:
system_prompt: ""
field_system: system
field_instruction: input
field_output: output
format: "[INST] {instruction} [/INST]"
no_input_format: "[INST] {instruction} [/INST]"
See full config options under here.
prompts = ("I would like to", "I really like to", "The weather is pretty")
---
title: Template-free prompt construction
description: "Template-free prompt construction with the `input_output` format"
---
<!-- TOC -->
- [Background](#background)
- [Masking Inputs](#masking-inputs)
- [You may not want prompt templates](#you-may-not-want-prompt-templates)
- [The `input_output` format](#the-input_output-format)
- [Usage](#usage)
- [1. Prepare Data](#1-prepare-data)
- [2. Use `type: input_output`](#2-use-type-input_output)
- [3. Check the prompts](#3-check-the-prompts)
<!-- /TOC -->
<a id="markdown-background" name="background"></a>
## Background
<a id="markdown-masking-inputs" name="masking-inputs"></a>
### Masking Inputs
One of the most popular features of
[axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) is
setting the following configuration value:
```yaml
train_on_inputs: false
If you declare a dataset formats
such as alpaca
or chatml
, axolotl knows what is an input
(i.e. human) vs. an output (i.e. the assistant) and masks the input
labels so that your model can focus on predicting the outputs only.
<a id="markdown-you-may-not-want-prompt-templates" name="you-may-not-want-prompt-templates"></a>
However, there are many situations where you don't want to use one of these formats or templates. This is because they can:
<|im_start|>
that can
quickly become footguns if you don't include them correctly at
inference time.<a id="markdown-the-inputoutput-format" name="the-inputoutput-format"></a>
input_output
formatYou can construct your prompts without a template by using the
input_output
format, by setting type: input_output
in your
configuration file like this:
config.yml
train_on_inputs: false # Mask segments of your data datasets: - path: output.jsonl type: input_output # use template free prompt construction
Unlike type: completion
, which is also template-free,
type: input_output
allows you to mask segments of your text. More
details on how this works are described below.
<a id="markdown-usage" name="usage"></a>
This is how you can use the input_output
format:
<a id="markdown-1-prepare-data" name="1-prepare-data"></a>
To use the input_output
format, collect your data in the following
format into a jsonl file (below is the first row from the file
output
.jsonl` pretty printed):
$ head -n1 output.jsonl | python -m json.tool
:::{.cell-output .cell-output-stdout} { "segments": [ { "label": true, "text": "<s>Hello\n" }, { "label": true, "text": "hi there!. " }, { "label": false, "text": "goodbye " }, { "label": true, "text": "farewell</s>" } ] } :::
Set label:false
when you want to mask a segment of text so that the
model isn't trained on it. Some things to keep in mind:
[!IMPORTANT]
- EOS, BOS, spaces, newlines etc. are entirely up to you. Axolotl concatenates all the segments as-is. The tokenizer doesn't add anything additional. Notice how I added spaces, newlines,
<s>
(BOS), and</s>
(EOS) myself.- Make sure you check the materialized output to validate that the prompt is getting assembled how you like.
<a id="markdown-2-use-type-inputoutput" name="2-use-type-inputoutput"></a>
type: input_output
Let's materialize data with our output.jsonl
file by setting
type: input_output
in our axolotl config:
# training_config.yaml base_model: mistralai/Mistral-7B-v0.1 data_seed: 49 seed: 49 datasets: - path: output.jsonl type: input_output val_set_size: 0.1 sequence_len: 896 sample_packing: false micro_batch_size: 2 gradient_accumulation_steps: 3 eval_batch_size: 2 num_epochs: 1 learning_rate: 0.0002 train_on_inputs: false special_tokens: bos_token: "<s>" eos_token: "</s>" unk_token: "<unk>"
You can use the following command to materialize your data. The
--debug
flag will print the tokens, along with the labels so you can
verify that the correct items are being ignored:
$ python -m axolotl.cli.preprocess training_config.yaml --debug ... [2024-03-05 23:36:46,969] [INFO] [axolotl.check_example_labels:35] [PID:607731] [RANK:0] <s>(1, 1) Hello(22557, 22557) (13, 13) hi(12014, 12014) there(736, 736) !(28808, 28808) .(28723, 28723) (28705, 28705) good(-100, 1179) bye(-100, 17664) (-100, 28705) fare(19111, 19111) well(5458, 5458) </s>(2, 2)
The format is decoded_token
(label
, token_id
), for example,
<s>(1, 1)
means that the token is <s>
, the label is 1
and the
token_id is 1
. When the label is -100
then that token is ignored for
training.
<a id="markdown-3-check-the-prompts" name="3-check-the-prompts"></a>
Here is another way to check the materialized output:
from transformers import AutoTokenizer from datasets import load_from_disk import yaml directory = !ls last_run_prepared/ with open('training_config.yaml', 'r') as f: cfg = yaml.safe_load(f) model_id = cfg['base_model'] tok = AutoTokenizer.from_pretrained(model_id) ds = load_from_disk(f'last_run_prepared/{directory[0]}/')
>>> row = ds[0] >>> print(tok.decode(row['input_ids'])) <s> Hello hi there!. goodbye farewell</s>
We can check that the right tokens are ingored by comparing the labels to each token:
import pandas as pd pd.DataFrame([{'token': tok.decode(i), 'label': l, 'id':i} for i,l in zip(row['input_ids'], row['labels'])])
| token | label | id | |-------|-------|-------| | 0 | <s> | 1 | | 1 | Hello | 22557 | | 2 | \n | 13 | | 3 | hi | 12014 | | 4 | there | 736 | | 5 | ! | 28808 | | 6 | . | 28723 | | 7 | | 28705 | | 8 | good | -100 | | 9 | bye | -100 | | 10 | | -100 | | 11 | fare | 19111 | | 12 | well | 5458 | | 13 | </s>| 2 |
If we look at the input data, the above table seems correct! (The jsonl version is repeated below for reference):
$ head -n1 output.jsonl | python -m json.tool
:::{.cell-output .cell-output-stdout} { "segments": [ { "label": true, "text": "<s>Hello\n" }, { "label": true, "text": "hi there!. " }, { "label": false, "text": "goodbye " }, { "label": true, "text": "farewell</s>" } ] } :::
# Split into batches # We will get the following results: # [ ["I would like to", "hello how are you"], [ "what is going on", "roses are red and"], [ "welcome to the hotel"] ] formatted_prompts = [prompts[i : i + batch_size] for i in range(0, len(prompts), batch_size)] # Apply padding on the left since we are doing generation padding_side_default = tokenizer.padding_side tokenizer.padding_side = "left"
description = "Launches a series of prompts to create and save a `default_config.yaml` configuration file for your training system. Should always be ran first on your machine"
def __init__(self, prompt: str = None, choices: list = []): self.position = 0 self.choices = choices self.prompt = prompt if sys.platform == "win32": self.arrow_char = "*" else: self.arrow_char = "➔ "
# currently only supported on Llama and Mistral
neftune_noise_alpha:
# Whether to bettertransformers
flash_optimum:
# Whether to use xformers attention patch https://github.com/facebookresearch/xformers:
xformers_attention:
# Whether to use flash attention patch https://github.com/Dao-AILab/flash-attention:
flash_attention:
flash_attn_cross_entropy: # Whether to use flash-attention cross entropy implementation - advanced use only
flash_attn_rms_norm: # Whether to use flash-attention rms norm implementation - advanced use only
flash_attn_fuse_qkv: # Whether to fuse QKV into a single operation
flash_attn_fuse_mlp: # Whether to fuse part of the MLP into a single operation
# Whether to use scaled-dot-product attention
# https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
sdp_attention:
# Shifted-sparse attention (only llama) - https://arxiv.org/pdf/2309.12307.pdf
s2_attention:
# Resume from a specific checkpoint dir
resume_from_checkpoint:
# If resume_from_checkpoint isn't set and you simply want it to start where it left off.
# Be careful with this being turned on between different models.
auto_resume_from_checkpoints: false
# Don't mess with this, it's here for accelerate and torchrun
local_rank:
# Add or change special tokens.
# If you add tokens here, you don't need to add them to the `tokens` list.
special_tokens:
# bos_token: "<s>"
# eos_token: "</s>"
# unk_token: "<unk>"
# pad_token: "[PAD]"
# Add extra tokens.
tokens:
# FSDP
fsdp:
fsdp_config:
# Deepspeed config path. e.g., deepspeed_configs/zero3.json
deepspeed:
# Advanced DDP Arguments
ddp_timeout:
ddp_bucket_cap_mb:
ddp_broadcast_buffers:
# Path to torch distx for optim 'adamw_anyprecision'
torchdistx_path:
# Set to HF dataset for type: 'completion' for streaming instead of pre-tokenize
pretraining_dataset:
# Debug mode
debug:
# Seed
seed:
# Allow overwrite yml config using from cli
strict:
---
title: Pre-training
description: Data format for a pre-training completion task.
order: 1
---
For pretraining, there is no prompt template or roles. The only required field is `text`:
```{.json filename="data.jsonl"}
{"text": "first row"}
{"text": "second row"}
...
:::{.callout-note}
Axolotl usually loads the entire dataset into memory. This will be challenging for large datasets. Use the following config to enable streaming:
pretraining_dataset: # hf path only
...
:::
def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) self.titles = [ "Hardware Selection Arguments", "Resource Selection Arguments", "Training Paradigm Arguments", "positional arguments", "optional arguments", ]
---
title: Dataset Preprocessing
description: How datasets are processed
---
Dataset pre-processing is the step where Axolotl takes each dataset you've configured alongside
the (dataset format)[../dataset-formats/] and prompt strategies to:
- parse the dataset based on the *dataset format*
- transform the dataset to how you would interact with the model based on the *prompt strategy*
- tokenize the dataset based on the configured model & tokenizer
- shuffle and merge multiple datasets together if using more than one
The processing of the datasets can happen one of two ways:
1. Before kicking off training by calling `python -m axolotl.cli.preprocess /path/to/your.yaml --debug`
2. When training is started
What are the benefits of pre-processing? When training interactively or for sweeps
(e.g. you are restarting the trainer often), processing the datasets can oftentimes be frustratingly
slow. Pre-processing will cache the tokenized/formatted datasets according to a hash of dependent
training parameters so that it will intelligently pull from its cache when possible.
The path of the cache is controlled by `dataset_prepared_path:` and is often left blank in example
YAMLs as this leads to a more robust solution that prevents unexpectedly reusing cached data.
If `dataset_prepared_path:` is left empty, when training, the processed dataset will be cached in a
default path of `./last_run_prepared/`, but will ignore anything already cached there. By explicitly
setting `dataset_prepared_path: ./last_run_prepared`, the trainer will use whatever pre-processed
data is in the cache.
What are the edge cases? Let's say you are writing a custom prompt strategy or using a user-defined
prompt template. Because the trainer cannot readily detect these changes, we cannot change the
calculated hash value for the pre-processed dataset. If you have `dataset_prepared_path: ...` set
and change your prompt templating logic, it may not pick up the changes you made and you will be
training over the old prompt.
# Tokenize each batch tokenized_prompts = [ tokenizer(formatted_prompt, padding=True, pad_to_multiple_of=pad_to_multiple_of, return_tensors="pt") for formatted_prompt in formatted_prompts ] # Put back the original padding behavior tokenizer.padding_side = padding_side_default completions_per_process = []
class CustomHelpFormatter(argparse.HelpFormatter): """ This is a custom help formatter that will hide all arguments that are not used in the command line when the help is called. This is useful for the case where the user is using a specific platform and only wants to see the arguments for that platform. """ def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) self.titles = [ "Hardware Selection Arguments", "Resource Selection Arguments", "Training Paradigm Arguments", "positional arguments", "optional arguments", ] def add_argument(self, action: argparse.Action): if "accelerate" in sys.argv[0] and "launch" in sys.argv[1:]: args = sys.argv[2:] else: args = sys.argv[1:] if len(args) > 1: args = list(map(clean_option, args)) used_platforms = [arg for arg in args if arg in options_to_group.keys()] used_titles = [options_to_group[o] for o in used_platforms] if action.container.title not in self.titles + used_titles: action.help = argparse.SUPPRESS elif action.container.title == "Hardware Selection Arguments": if set(action.option_strings).isdisjoint(set(args)): action.help = argparse.SUPPRESS else: action.help = action.help + " (currently selected)" elif action.container.title == "Training Paradigm Arguments": if set(action.option_strings).isdisjoint(set(args)): action.help = argparse.SUPPRESS else: action.help = action.help + " (currently selected)" action.option_strings = [s for s in action.option_strings if "-" not in s[2:]] super().add_argument(action) def end_section(self): if len(self._current_section.items) < 2: self._current_section.items = [] self._current_section.heading = "" super().end_section()
---
title: Conversation
description: Conversation format for supervised fine-tuning.
order: 3
---
## sharegpt
conversations where `from` is `human`/`gpt`. (optional: first row with role `system` to override default system prompt)
```{.json filename="data.jsonl"}
{"conversations": [{"from": "...", "value": "..."}]}
Note: type: sharegpt
opens special configs:
conversation
: enables conversions to many Conversation types. Refer to the 'name' here for options.roles
: allows you to specify the roles for input and output. This is useful for datasets with custom roles such as tool
etc to support masking.field_human
: specify the key to use instead of human
in the conversation.field_model
: specify the key to use instead of gpt
in the conversation.datasets: path: ... type: sharegpt conversation: # Options (see Conversation 'name'): https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py field_human: # Optional[str]. Human key to use for conversation. field_model: # Optional[str]. Assistant key to use for conversation. # Add additional keys from your dataset as input or output roles roles: input: # Optional[List[str]]. These will be masked based on train_on_input output: # Optional[List[str]].
{"conversations": [{"role": "...", "value": "..."}]}
conversations where role
is used instead of from
{"conversations": [{"role": "...", "value": "..."}]}
conversations where from
is prompter
assistant
instead of default sharegpt
{"conversations": [{"from": "...", "value": "..."}]}
creates a chat where bot is asked to tell a joke, then explain why the joke is funny
{"conversations": [{"title": "...", "text": "...", "explanation": "..."}]}
---
title: Dataset Formats
description: Supported dataset formats.
listing:
fields: [title, description]
type: table
sort-ui: false
filter-ui: false
max-description-length: 250
---
Axolotl supports a variety of dataset formats. It is recommended to use a JSONL format. The schema of the JSONL depends upon the task and the prompt template you wish to use. Instead of a JSONL, you can also use a HuggingFace dataset with columns for each JSONL field.
Below are these various formats organized by task: