Prof. Kwang-Sung Jun, Univ. of Arizona

LLMs can be harmful

They should not provide biased answers

Alignment

The task of ensuring that an AI model's or agent's behavior aligns well with human values.

The last step of the LLM training pipeline:

pretraining → supervised fine-tuning → alignment

Supervised fine-tuning (SFT)

Teaches the pre-trained model to follow instructions and generate desired outputs.

Instruction-tuned → trained on prompt-response paired data (see the sketch below).
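
A minimal sketch of the SFT objective, assuming a toy autoregressive model; the `TinyLM` class and the token ids are hypothetical, for illustration only. The key point is that cross-entropy is computed only on the response tokens, so the model learns to produce the desired output given the prompt rather than to re-predict the prompt.

```python
import torch
import torch.nn.functional as F

vocab_size = 100

class TinyLM(torch.nn.Module):
    """Stand-in for a pre-trained autoregressive LM (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, 32)
        self.head = torch.nn.Linear(32, vocab_size)

    def forward(self, tokens):           # (batch, seq) -> (batch, seq, vocab)
        return self.head(self.embed(tokens))

model = TinyLM()
prompt = torch.tensor([[5, 8, 2]])       # token ids of the instruction
response = torch.tensor([[7, 3, 9]])     # token ids of the desired output

tokens = torch.cat([prompt, response], dim=1)
logits = model(tokens[:, :-1])           # predict each next token

# Cross-entropy only on response positions: prompt positions are masked out.
targets = tokens[:, 1:].clone()
targets[:, : prompt.size(1) - 1] = -100  # -100 = ignored by cross_entropy
loss = F.cross_entropy(logits.reshape(-1, vocab_size),
                       targets.reshape(-1), ignore_index=-100)
loss.backward()                          # one SFT gradient step
```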

Q: Tell me how to rob a bank.

A1: First, find weapons ...

A2: I am not sure I can answer that.

Preference data: D = {(x, y1, y2, z)}, where z is a binary label recording which of the two answers a human says they prefer.
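
A minimal sketch of one record in this dataset, using the bank-robbery example above; the field names and the convention that z = 2 means "annotator prefers y2" are assumptions for illustration, not from the lecture.

```python
# Hypothetical record layout for the preference dataset D = {(x, y1, y2, z)}.
preference_example = {
    "x":  "Tell me how to rob a bank.",         # prompt
    "y1": "First, find weapons ...",            # candidate answer A1
    "y2": "I am not sure I can answer that.",   # candidate answer A2
    "z":  2,   # binary label: which answer the human preferred (convention assumed)
}
dataset = [preference_example]
```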

Q: why a binary preference label, and not a numeric rating scale?
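
A note on the standard answer in the RLHF literature (not stated in these notes): pairwise comparisons tend to be more consistent across annotators than absolute scores, and the Bradley-Terry model still recovers a scalar reward r(x, y) from binary preferences:

$$P(y_1 \succ y_2 \mid x) = \sigma\big(r(x, y_1) - r(x, y_2)\big), \qquad \sigma(t) = \frac{1}{1 + e^{-t}}$$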