Posts
-
A Collection of Good LLM Analogies
-
The Curious Case of LLM Evaluations
Our modeling, scaling, and generalization techniques grew faster than our benchmarking abilities, which in turn has resulted in poor evaluations and hyped capabilities.
-
Gender Bias in GPT4: A Short Demo
A brief demonstration of gender bias in GPT4, observed across various downstream tasks, ft. Taylor Swift
-
Unstable Theory of Mind in Sparks of AGI
Discussing the prospect of deriving instinct and purpose from a prompt, creating example evaluation problems focusing on the Sally-Anne False-Belief Test, and summarizing when GPT4 and GPT3.5 pass or fail the test.
-
Prompt Cap | Making Sure Your Model Benchmarking is Cap or Not-Cap
Enhancing classification with a text annotation framework for more systematic prompt-based language model evaluation