Fine-tuning LLMs for domain-specific tasks like classification (on some private dataset) is not easy. There are a lot of concepts to understand, and I struggled a lot to find a proper way to fine-tune models and then evaluate them on tasks like classification. Some of the questions I had to work through:
- How a dataset can help a model produce more refined results
- How to constrain the model's output
- How to construct a dataset with external prompts so that a language-completion problem can be cast as a classification-like problem
- How to fine-tune a 7B model on a consumer GPU
- Common problems when loading a locally saved PEFT model during inference
- How to build an overall analytical pipeline to assess an LLM's performance on quality, speed, and reliability
- Other insights and best practices
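To make the prompt-casting and output-constraint items in the list above a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions and not necessarily the approach from the blog: the label set, the prompt template, and the function names (`build_prompt`, `build_training_example`, `parse_label`) are all hypothetical.

```python
# Hypothetical label set; replace with the classes of your own dataset.
LABELS = ["positive", "negative", "neutral"]

# Hypothetical prompt template that frames classification as text completion.
PROMPT_TEMPLATE = (
    "Classify the text into one of: {labels}.\n"
    "Text: {text}\n"
    "Label:"
)


def build_prompt(text, labels=LABELS):
    """Wrap raw text in a prompt that the model should complete with a label."""
    return PROMPT_TEMPLATE.format(labels=", ".join(labels), text=text.strip())


def build_training_example(text, label, labels=LABELS):
    """Create one fine-tuning string: the prompt followed by the gold label."""
    if label not in labels:
        raise ValueError(f"unknown label: {label!r}")
    return build_prompt(text, labels) + " " + label


def parse_label(completion, labels=LABELS):
    """Map a raw model completion back to a known label, or None if no match.

    Longer labels are checked first so that a label which is a prefix of
    another label cannot shadow it.
    """
    cleaned = completion.strip().lower()
    for label in sorted(labels, key=len, reverse=True):
        if cleaned.startswith(label.lower()):
            return label
    return None
```

At inference time you would send `build_prompt(text)` to the fine-tuned model and pass the generated continuation through `parse_label`. Completions that come back as `None` can be counted as a rough reliability signal.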
My latest blog post covers all of this. It is part of a three-part series, so there is more to come. In this post I share common best practices for fine-tuning large language models with Hugging Face, and for using Weights & Biases to manage experiments and track models across different performance metrics.
Please do check it out here