Publications


  1. ARISE: Agentic Rubric-Guided Iterative Survey Engine for Automated Scholarly Paper Generation
    Zi Wang, Xingqiao Wang, Sangah Lee, and Xiaowei Xu (2026)
    To appear at HAXD 2026
  2. Do Korean-Adapted LLMs Think in Korean? Analyzing Latent Language and the Preservation of Korean-Specific Knowledge
    Sangah Lee (2025)
    Language and Information, Vol.29, No.3, pp.229-256.
  3. Nunchi-Bench: Benchmarking Language Models on Cultural Reasoning with a Focus on Korean Superstition
    Kyuhee Kim and Sangah Lee (2025)
    Findings of the Association for Computational Linguistics: ACL 2025
  4. KoBALT: Korean Benchmark For Advanced Linguistic Tasks
    Hyopil Shin, Sangah Lee, Dongjun Jang, Wooseok Song, Jaeyoon Kim, Chaeyoung Oh, Hyemi Jo, Youngchae Ahn, Sihyun Oh, Hyohyeong Chang, Sunkyoung Kim, and Jinsik Lee (2025)
    arXiv
  5. A Short Note on the Structural Priming in LLM: Focusing on Dative Constructions in Korean
    Semoon Hoe and Sangah Lee (2024)
    Language and Information, Vol.28, No.3, pp.111-142 (In Korean).
  6. ManWav: The First Manchu ASR Model
    Jean Seo, Minha Kang, Sungjoo Byun, and Sangah Lee (2024)
    Proceedings of the Third Workshop on NLP Applications to Field Linguistics
  7. Large Language Models Show Human-Like Abstract Thinking Patterns: A Construal-Level Perspective
    Seung Joo Yoo and Sangah Lee (2024)
    Proceedings of the Annual Meeting of the Cognitive Science Society
  8. ManNER & ManPOS: Pioneering NLP for Endangered Manchu Language
    Sangah Lee, Sungjoo Byun, Jean Seo, and Minha Kang (2024)
    Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
  9. KoCoNovel: Annotated Dataset of Character Coreference in Korean Novels
    Kyuhee Kim, Surin Lee, and Sangah Lee (2024)
    arXiv
  10. K-Act2Emo: Korean Commonsense Knowledge Graph for Indirect Emotional Expression
    Kyuhee Kim, Surin Lee, and Sangah Lee (2024)
    arXiv
  11. DaG LLM ver 1.0: Pioneering Instruction-Tuned Language Modeling For Korean NLP
    Dongjun Jang, Sangah Lee, Sungjoo Byun, Jinwoong Kim, Jean Seo, Minseok Kim, Soyeon Kim, Chaeyoung Oh, Jaeyoon Kim, Hyemi Jo, and Hyopil Shin (2023)
    arXiv
  12. Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data
    Jean Seo, Sungjoo Byun, Minha Kang, and Sangah Lee (2023)
    The 3rd Multilingual Representation Learning (MRL) Workshop
  13. Studies on Clauses in Computational Linguistics Focused on Korean Corpora
    Sangah Lee (2023)
    Journal of Korean Linguistics, No.107, pp.445-468 (In Korean).
  14. Contract Eligibility Verification Enhanced by Keyword and Contextual Embeddings
    Sangah Lee, Seokgi Kim, Eunjin Kim, Minji Kang, and Hyopil Shin (2022)
    KIISE Vol.49, No.10, pp.848-858 (In Korean).
  15. The Korean Morphologically Tight-Fitting Tokenizer for Noisy User-Generated Texts
    Sangah Lee and Hyopil Shin (2021)
    The 7th Workshop on Noisy User-Generated Text (W-NUT 2021)
  16. Combining Sentiment-Combined Model with Pre-Trained BERT Models for Sentiment Analysis
    Sangah Lee and Hyopil Shin (2021)
    KIISE Vol.49, No.10, pp.848-858 (In Korean).
  17. Argument Facet Detection in Online Debates Based on Attention Weights and Clustering with Combined Similarity Matrices
    Sangah Lee and Hyopil Shin (2021)
    Korean Journal of Linguistics, Vol.46, No.1, pp.107-134.
  18. KR-BERT: A Small-Scale Korean-Specific BERT Language Model
    Sangah Lee, Hansol Jang, Yunmee Baik, Suzi Park, and Hyopil Shin (2020)
    Journal of KIISE, Vol.47, No.7, pp.682-692.
  19. An Analysis of Linear Argumentation Structure of Korean Debate Texts Using Sequential Modeling and Linguistic Features
    Sangah Lee and Hyopil Shin (2021)
    Journal of KIISE, Vol.45, No.12, pp.1292-1301 (In Korean).
  20. Stance Classification of Online Debate Texts based on Discourse Relations
    Sangah Lee and Hyopil Shin (2021)
    Language Research, Vol.52, No.3, pp.511-532 (In Korean).



Presentations


  1. Cultural Assessment of Korean Language Generation in Large Language Models: Limitations of Machine-Translated Corpora
    Sangah Lee (2024)
    The 2024 Linguistic Society of Korea Winter Conference
  2. The Phonological Constraints on Korean Lexical Subclasses
    Nayoung Park and Sangah Lee (2023)
    The 9th International Conference on Phonology and Morphology (ICPM9)
  3. Computational Linguistics and the Study of Korean Syntax
    Sangah Lee (2022)
    The Society of Korean Linguistics
  4. A Method of Infusing Additional Features into Pre-Trained BERT Models for Sentiment Analysis
    Sangah Lee and Hyopil Shin (2020)
    Korea Software Congress 2020
  5. The Occurrence and Evolution of Feminist Twitterians
    Sangah Lee and Suzi Park (2018)
    The Discourse and Cognitive Linguistics Society of Korea
  6. Automatic Prediction of ‘Anti-Search Variants’ of Twitter based on Word Embeddings and Phonetic Similarity
    Sangah Lee (2017)
    The 29th Annual Conference on Human & Cognitive Language Technology
  7. The POS Elderly: Semi-Automatic Annotation Tool for Historical Korean
    Migyeong Kim, Suzi Park, and Sangah Lee (2016)
    The 28th Annual Conference on Human & Cognitive Language Technology
  8. An Automatic Classification of Discourse Relations in the Arguing Structure of Korean Texts
    Sangah Lee and Hyopil Shin (2015)
    The 27th Annual Conference on Human & Cognitive Language Technology