- “How chat-based Large Language Models replicate the mechanisms of a psychic’s con”:
RLHF models in general are likely to reward responses that sound accurate. As the reward model is likely just another language model, it can’t reward based on facts or anything specific, so it can only reward output that has a tone, style, and structure that’s commonly associated with statements that have been rated as accurate. […] This is why I think that RLHF has effectively become a reward system that specifically optimises language models for generating validation statements: Forer statements, shotgunning, vanishing negatives, and statistical guesses. In trying to make the LLM sound more human, more confident, and more engaging, but without being able to edit specific details in its output, AI researchers seem to have created a mechanical mentalist. Instead of pretending to read minds through statistically plausible validation statements, it pretends to read and understand your text through statistically plausible validation statements.
(tags: ai chatgpt llms ml psychology cons mind-reading psychics)
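
The dynamic the quoted passage describes can be sketched in a few lines. This is a deliberately toy illustration, assuming a hypothetical reward model that can only score surface features of text: the marker lists and the `style_reward`/`pick_best` functions are invented for the example, not taken from any real RLHF pipeline. Under such a reward, best-of-n selection favours the confident-sounding answer over the hedged but correct one.

```python
# Toy illustration (hypothetical, not any real RLHF implementation):
# a "reward model" that can only see surface features of text, the way
# a style classifier would. Because it has no access to ground truth,
# selecting the highest-reward candidate favours confident-sounding
# phrasing regardless of whether the claim is correct.

CONFIDENT_MARKERS = ["certainly", "clearly", "in fact", "definitely"]
HEDGING_MARKERS = ["maybe", "i think", "not sure", "possibly"]

def style_reward(text: str) -> float:
    """Score text purely on tone/style cues, not factual content."""
    lowered = text.lower()
    score = 0.0
    score += sum(lowered.count(m) for m in CONFIDENT_MARKERS)
    score -= sum(lowered.count(m) for m in HEDGING_MARKERS)
    return score

def pick_best(candidates: list[str]) -> str:
    """Best-of-n selection: keep the candidate the reward model prefers."""
    return max(candidates, key=style_reward)

candidates = [
    "I'm not sure, but the answer is possibly 42.",       # hedged, true
    "The answer is clearly and definitely 41, in fact.",  # confident, false
]
print(pick_best(candidates))  # prints the confident-but-wrong response
```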