Research
AidanBench | Co-First Author
Accepted to NeurIPS Language Gamification 2024
AidanBench is a novel benchmark for evaluating sustained, open-ended generation in large language models (LLMs).
It uses open-ended question prompts to assess a model's coherence, creativity, contextual attention, and instruction following through embedding-based dissimilarity metrics.
We performed comparative analyses across SOTA models, demonstrating that AidanBench scores correlate strongly with model size and moderately with LMSYS rankings.
The benchmark is non-saturating (it has no score ceiling) and aligns better with real-world open-ended use cases.
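The embedding-based dissimilarity idea above can be sketched as follows. This is a minimal illustration, not the benchmark's actual implementation: the `embed` function here is a toy hashed bag-of-words stand-in for a real sentence-embedding model, and the novelty threshold is an assumed value.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Toy stand-in for a real sentence-embedding model:
    # deterministic (within one run) hashed bag-of-words, L2-normalized.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def novelty(answer: str, previous: list[str]) -> float:
    # Novelty = 1 - max cosine similarity to any earlier answer.
    if not previous:
        return 1.0
    e = embed(answer)
    return 1.0 - max(float(e @ embed(p)) for p in previous)

# A model repeatedly answers the same open-ended question; generation
# stops once a new answer is too similar to the earlier ones.
THRESHOLD = 0.15  # assumed cutoff for illustration
answers: list[str] = []
for candidate in ["use a ladder", "climb a ladder", "build a ramp"]:
    if novelty(candidate, answers) < THRESHOLD:
        break
    answers.append(candidate)
```

Because the score rewards each *new* coherent answer, there is no fixed ceiling: a model that keeps producing valid, dissimilar responses keeps accumulating score, which is what makes the benchmark non-saturating.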
Creating a Cooperative AI Policymaking Platform | Co-First Author
with Humanity Unleashed
Leading research on frameworks that systematically identify and quantify human values across diverse populations, using Bayesian modeling to inform AI-driven policy development.
Developing methodologies to capture stakeholder preferences across demographic groups, so that AI recommendations reflect, domain by domain, a comprehensive range of societal perspectives.
Translating elicited values into actionable policy proposals, enabling transparent governance with human oversight of AI decision processes.
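One common Bayesian approach to this kind of preference elicitation can be sketched with a Beta-Binomial model; this is a hedged illustration with hypothetical survey counts, not the platform's actual methodology. Each demographic group's observed support for a policy updates a uniform Beta(1, 1) prior, and the group posteriors are then aggregated with equal weight so smaller groups are not drowned out.

```python
import math

def posterior_support(yes: int, n: int, a: float = 1.0, b: float = 1.0):
    """Beta(a, b) prior + Binomial data -> posterior mean and std of support."""
    a_post, b_post = a + yes, b + (n - yes)
    mean = a_post / (a_post + b_post)
    var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return mean, math.sqrt(var)

# Hypothetical survey counts (supporters, respondents) per demographic group.
survey = {"group_a": (72, 100), "group_b": (45, 80), "group_c": (12, 30)}

posteriors = {g: posterior_support(y, n) for g, (y, n) in survey.items()}

# Equal-weight aggregate across groups: each group's posterior mean
# contributes equally, regardless of sample size.
aggregate = sum(mean for mean, _ in posteriors.values()) / len(posteriors)
```

The posterior standard deviation gives a per-group uncertainty estimate, which is useful when deciding whether a group has been surveyed enough before its preferences feed into a policy recommendation.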
The platform I'm building is part of a larger mission: leveraging AI to enhance human cooperation and alignment, working toward responsible governance before more advanced AI systems emerge.