Thanh-Thiên Lê

If you need to refer to me by a single name, it's Thiên.

thienlt7 [at] gmail [dot] com

GitHub

Scholar

Resume

X/Twitter

Profile

NLP Research Resident at VinAI Research, advised by Assoc. Prof. Thien Huu Nguyen.
I work on LLMs (inference, truthfulness, trustworthiness, safety), Information Extraction, and Continual Learning.

Education

B.S. in Computer Science | GPA: 4/4 (rounded to the nearest integer 🚀)

Oct 2023

Hanoi University of Science and Technology, Hanoi, Vietnam

Notable courses taken:

Linear Algebra
Probability and Statistics
Machine Learning and Data Mining
Information Retrieval
Deep Learning and Its Applications
Data Science
Fundamentals of Optimization
Programming Techniques
Computer Architecture
Evolutionary Computation
Object Oriented Programming
Data Structures and Algorithms
Discrete Mathematics

Experience

Research Resident

VinAI Research, Hanoi, Vietnam

Feb 2023 - Present

Conducted research in Large Language Models
Conducted research in Continual Information Extraction
Participated in weekly meetings of the NLP reading group
Attended courses relevant to research interests

Publications

Asterisk (*) denotes equal contribution.

Realistic Evaluation of Toxicity in Large Language Models

Tinh Son Luong*, Thanh-Thien Le*, Linh Ngo Van, Thien Huu Nguyen

ACL 2024 | Code

We introduce a realistic dataset for evaluating the safety of LLMs, which is more effective at exposing toxicity in LLMs than state-of-the-art datasets.

ToVo: Toxicity Taxonomy via Voting

Tinh Son Luong*, Thanh-Thien Le*, Thang Viet Doan*, Linh Ngo Van, Thien Huu Nguyen, Diep Nguyen

NAACL 2025 | Model

We introduce a method for effortless, automatic creation of customized moderation tools, and an accompanying dataset named ToVo.

Continual Relation Extraction via Sequential Multi-Task Learning

Thanh-Thien Le*, Manh Nguyen*, Tung Thanh Nguyen*, Linh Ngo Van, Thien Huu Nguyen

AAAI 2024 | Code

We propose a novel multi-task learning framework to effectively handle catastrophic forgetting in Continual Relation Extraction.

SharpSeq: Empowering Continual Event Detection through Sharpness-Aware Sequential-task Learning

Thanh-Thien Le*, Viet Dao*, Linh Van Nguyen*, Thi-Nhung Nguyen, Linh Ngo Van, Thien Huu Nguyen

NAACL 2024 | Code

We design a Sharpness-Aware framework to tackle the Sequential-Task challenge of Continual Event Detection.

PhoWhisper: Automatic Speech Recognition for Vietnamese

Thanh-Thien Le, Linh The Nguyen, Dat Quoc Nguyen

Tiny Paper @ ICLR 2024 | Model

We fine-tune the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents.

Lifelong Event Detection via Optimal Transport

Viet Dao*, Van-Cuong Pham*, Quyen Tran*, Thanh-Thien Le, Linh Ngo Van, Thien Huu Nguyen

EMNLP 2024 | Preprint | Code

We propose aligning the classifier module to the pre-trained language modeling head via optimal transport to enhance Continual Event Detection.

Few-Shot, No Problem: Descriptive Continual Relation Extraction

Thanh Nguyen*, Anh Duc Le*, Quyen Tran*, Thanh-Thien Le*, Linh Ngo Van, Thien Huu Nguyen

AAAI 2025 | Code

We design a retrieval-based paradigm, leveraging LLM-generated descriptions to address Few-shot Continual Relation Extraction, boosting performance and reducing forgetting.

Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective

Minh Le*, Tien-Ngoc Luu*, An Nguyen The*, Thanh-Thien Le, Thien Trang Nguyen Vu, Thanh Tung Nguyen Van, Linh Ngo Van, Thien Huu Nguyen

AAAI 2025 | Code

We improve prompt-based Continual Relation Extraction by incorporating task-specific prompt pools that capture the variations within each task while maximizing cross-task variances.

Languages

Programming Languages: Python, C/C++
Libraries/Modules: PyTorch, NumPy/SciPy, pandas, matplotlib, seaborn, cvxpy
Language Proficiency: English (fluent), Vietnamese (native)