About Me
I am Sangah Lee 이상아[李尙娥], an assistant professor in the Department of Linguistics at Seoul National University.
I am a computational linguist interested in language modeling in multilingual and low-resource settings. Recently, I have been working on a few topics that bridge computational linguistics and various fields of theoretical linguistics.
Research Topics
- Various topics on Large Language Models, especially in Korean
- Multilingual, multicultural scenarios of NLP
- Linguistic probing of LLMs
- Low-resource languages and their morpheme-aware tokenization
- And many other topics that use methods from computational linguistics!
Experience
- [2022.9 - present] Assistant Professor @ Dept. of Linguistics, Seoul National University
- [2022.3 - 2022.8] Assistant Teaching Professor @ Faculty of Liberal Education, Seoul National University
- [2021.9 - 2022.2] Lecturer @ Dept. of Linguistics, Seoul National University
- [2021.3 - 2022.2] Postdoctoral Researcher @ Graduate School of Data Science, Seoul National University
Education
- [2021] Ph.D. in Linguistics @ Dept. of Linguistics, Seoul National University
- Dissertation: “The Construction of a Korean Pre-Trained Model and an Enhanced Application on Sentiment Analysis”
- Advisor: Hyopil Shin
- [2016] M.A. @ Dept. of Linguistics, Seoul National University
- Dissertation: “An Automatic Analysis of Argumentation Schemes of Korean Texts”
- Advisor: Hyopil Shin
- [2013] B.A. in Linguistics; B.Eng. in Computer Science and Engineering (double major) @ Seoul National University
Publications
- Jean Seo, Minha Kang, SungJoo Byun, and Sangah Lee (2024), ManWav: The First Manchu ASR Model, Proceedings of the 3rd Workshop on NLP Applications to Field Linguistics (Field Matters 2024).
- Seung Joo Yoo and Sangah Lee (2024), Large Language Models Show Human-Like Abstract Thinking Patterns: A Construal-Level Perspective, Proceedings of the Annual Meeting of the Cognitive Science Society.
- Sangah Lee, Sungjoo Byun, Jean Seo, and Minha Kang (2024), ManNER & ManPOS: Pioneering NLP for Endangered Manchu Language, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).
- Kyuhee Kim, Surin Lee, and Sangah Lee (2024), KoCoNovel: Annotated Dataset of Character Coreference in Korean Novels, arXiv:2404.01140.
- Kyuhee Kim, Surin Lee, and Sangah Lee (2024), K-Act2Emo: Korean Commonsense Knowledge Graph for Indirect Emotional Expression, arXiv:2403.14253.
- Dongjun Jang, Sangah Lee, Sungjoo Byun, Jinwoong Kim, Jean Seo, Minseok Kim, Soyeon Kim, Chaeyoung Oh, Jaeyoon Kim, Hyemi Jo, and Hyopil Shin (2023), DaG LLM ver 1.0: Pioneering Instruction-Tuned Language Modeling For Korean NLP, arXiv:2311.13784v1.
- Jean Seo, Sungjoo Byun, Minha Kang, and Sangah Lee (2023), Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data, 3rd Multilingual Representation Learning (MRL) Workshop.
- Sangah Lee (2023), Studies on Clauses in Computational Linguistics Focused on Korean Corpora, Journal of Korean Linguistics, No.107, pp. 445-468.
- Sangah Lee, Seokgi Kim, Eunjin Kim, Minji Kang, and Hyopil Shin (2022), Contract Eligibility Verification Enhanced by Keyword and Contextual Embeddings, KIISE Vol.49, No.10, pp.848-858.
- Sangah Lee and Hyopil Shin (2021), The Korean Morphologically Tight-Fitting Tokenizer for Noisy User-Generated Texts, The 7th Workshop on Noisy User-Generated Text (W-NUT 2021).
- Sangah Lee and Hyopil Shin (2021), Combining Sentiment-Combined Model with Pre-Trained BERT Models for Sentiment Analysis, Journal of KIISE, Vol.48, No.7, pp.815-824.
- Sangah Lee and Hyopil Shin (2021), Argument Facet Detection in Online Debates Based on Attention Weights and Clustering with Combined Similarity Matrices, Korean Journal of Linguistics, Vol.46, No.1, pp.107-134.
- Sangah Lee, Hansol Jang, Yunmee Baik, Suzi Park and Hyopil Shin (2020), A Small-Scale Korean-Specific BERT Language Model, Journal of KIISE, Vol.47, No.7, pp.682-692.
- Sangah Lee and Hyopil Shin (2018), An Analysis of Linear Argumentation Structure of Korean Debate Texts Using Sequential Modeling and Linguistic Features, Journal of KIISE, Vol.45, No.12, pp.1292-1301.
- Sangah Lee and Hyopil Shin (2016), Stance Classification of Online Debate Texts based on Discourse Relations, Language Research, Vol.52, No.3, pp.511-532.
Presentations
- Nayoung Park and Sangah Lee (2023), The Phonological Constraints on Korean Lexical Subclasses, The 9th International Conference on Phonology and Morphology (ICPM9).
- Sangah Lee (2022), Computational Linguistics and the Study of Korean Syntax, The Society of Korean Linguistics.
- Sangah Lee and Hyopil Shin (2020), A Method of Infusing Additional Features into Pre-Trained BERT Models for Sentiment Analysis, Korea Software Congress 2020.
- Sangah Lee and Suzi Park (2018), The Occurrence and Evolution of Feminist Twitterians, The Discourse and Cognitive Linguistics Society of Korea.
- Sangah Lee (2017), Automatic Prediction of ‘Anti-Search Variants’ of Twitter based on Word Embeddings and Phonetic Similarity, The 29th Annual Conference on Human & Cognitive Language Technology.
- Migyeong Kim, Suzi Park and Sangah Lee (2016), The POS Elderly: Semi-Automatic Annotation Tool for Historical Korean, The 28th Annual Conference on Human & Cognitive Language Technology.
- Sangah Lee and Hyopil Shin (2015), An Automatic Classification of Discourse Relations in the Arguing Structure of Korean Texts, The 27th Annual Conference on Human & Cognitive Language Technology.
Courses
- [Fall] Language and Computer (Undergraduate), Seminar on Computational Linguistics (Graduate)
- [Spring] Computational Linguistics (Undergraduate), Studies on Computational Linguistics II (Graduate)