Brookhaven AIMS Series: Evaluating Large Language Models of Code

Description

The Brookhaven AI/ML Seminar (AIMS) series showcases research at Brookhaven National Laboratory (BNL) and elsewhere that uses AI and machine learning to enhance scientific discovery, and that uses domain-science questions to motivate new AI developments.


    • 12:00–13:00 (US/Eastern)
      Evaluating Large Language Models of Code (1h)

      Abstract: Large language models (LLMs) have become increasingly capable of generating code, and code generation models have the potential to make programming more accessible. To achieve this goal, models will need to perform well across a variety of programming languages and to understand instructions from a diverse community of users. In this talk, I present two recent projects that evaluate contemporary code generation models: MultiPL-E, a test-driven code generation benchmark covering 18+ programming languages; and StudentEval, a benchmark of diverse code generation prompts written by beginner programmers.
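
      For readers unfamiliar with "test-driven" benchmarks such as MultiPL-E, the idea is that a model's completion counts as correct only if it passes the problem's unit tests. The sketch below is a minimal illustration of that evaluation loop in Python, not the actual MultiPL-E or StudentEval harness; the passes_tests helper and the example problem are hypothetical.

```python
def passes_tests(completion: str, tests: str) -> bool:
    """Return True iff the candidate completion passes every unit test."""
    namespace: dict = {}
    try:
        exec(completion, namespace)  # define the candidate function(s)
        exec(tests, namespace)       # run the benchmark's assertions against them
        return True
    except Exception:                # any error or failed assert counts as a failure
        return False

# Hypothetical problem and tests, for illustration only:
completion = '''
def add(a, b):
    return a + b
'''
tests = '''
assert add(2, 3) == 5
assert add(-1, 1) == 0
'''

print(passes_tests(completion, tests))  # True: this sample counts as a pass
```

      A score such as pass@1 is then the fraction of benchmark problems for which a sampled completion passes its tests. In practice, evaluation harnesses run untrusted generated code in a sandbox rather than with a bare exec.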

      Bio: Carolyn Jane Anderson is an Assistant Professor of Computer Science at Wellesley College. Her research explores the intersection of computation and meaning using a variety of methods, including Bayesian modeling, human-subjects experiments, and deep learning. Her current projects focus on large language models of code and include MultiPL-E, a code generation benchmark for 18+ languages. She is also involved in the BigCode project, a collaboration to develop state-of-the-art open-source code generation models, which recently released the SantaCoder and StarCoder models. She received her PhD in Linguistics from the University of Massachusetts Amherst in 2021.

      Speaker: Prof. Carolyn J. Anderson (Wellesley College)