Prof. 전광성 (Kwang-Sung Jun), Univ. of Arizona
LLMs can be harmful
e.g., they should not give biased answers
Alignment
The task of ensuring that an AI model's or agent's behavior aligns well with human values.
The last step of the LLM training pipeline:
pretraining → supervised fine-tuning → alignment
Supervised fine-tuning (SFT)
Teach the pretrained model to follow instructions and generate desired outputs.
Instruction-tuned → trained on (prompt, response) paired data; see the loss sketch below.
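A minimal sketch of the SFT objective, assuming a causal LM: standard next-token cross-entropy, but computed only on the response tokens of each (prompt, response) pair. The `sft_loss` signature and the `model` callable are illustrative, not any specific library's API.

```python
import torch
import torch.nn.functional as F

def sft_loss(model, prompt_ids, response_ids):
    # model: any causal LM mapping token ids (1, T) -> logits (1, T, vocab);
    # illustrative signature, not a specific library's API.
    input_ids = torch.cat([prompt_ids, response_ids]).unsqueeze(0)  # (1, T)
    logits = model(input_ids)                                       # (1, T, V)
    # Position t predicts token t+1, so shift logits/labels by one.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:].clone()
    # Mask the prompt: only response tokens contribute to the loss.
    shift_labels[:, : prompt_ids.numel() - 1] = -100  # -100 = ignore_index
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )

# Toy check with a random stand-in for the model:
vocab = 100
dummy = lambda ids: torch.randn(1, ids.size(1), vocab)
loss = sft_loss(dummy, torch.tensor([1, 2, 3]), torch.tensor([4, 5]))
```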
Q: Tell me how to rob a bank.
A1: First, find weapons …
A2: I'm not sure I can answer that.
Preference data: D = {(x, a1, a2, z)}, where z = 1 if the human prefers A1 over A2.
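The notes don't specify how this data is used, but one standard way to learn from binary preferences (e.g., reward modeling in RLHF) is the Bradley-Terry loss: model P(a1 preferred) = sigmoid(r(x, a1) − r(x, a2)) and minimize the negative log-likelihood of the human's choice. The `preference_loss` and `reward_model` names below are illustrative.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, x, a1, a2, z):
    # reward_model(prompt, answer) -> scalar score; illustrative signature.
    # Bradley-Terry: P(a1 preferred over a2) = sigmoid(r(x, a1) - r(x, a2)).
    r1 = reward_model(x, a1)
    r2 = reward_model(x, a2)
    # Negative log-likelihood of the human's binary choice (z = 1 iff a1 preferred).
    margin = r1 - r2 if z == 1 else r2 - r1
    return -F.logsigmoid(margin)

# Toy scorer for the bank-robbery example; a human would set z = 0 (prefer A2).
dummy_rm = lambda x, a: torch.tensor(float(len(a)))
loss = preference_loss(dummy_rm, "Tell me how to rob a bank.",
                       "First, find weapons ...",
                       "I'm not sure I can answer that.", z=0)
```

Training on such records pushes r(x, A2) above r(x, A1) for harmful prompts, so the reward model learns to score refusals higher.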
Q: Why binary preferences rather than scalar ratings? Binary comparisons are easier and more consistent for human annotators than assigning absolute scores.