K U N M T

  1. Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts
    Seonmin Koo, Jinsung Kim, YoungJoon Jang, Chanjun Park (✝), Heuiseok Lim (✝)
    EMNLP 2024

  2. Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models
    Seonmin Koo, Jinsung Kim, Chanjun Park (✝), Heuiseok Lim (✝)
    EMNLP 2024 – Findings

  3. Translation of Multifaceted Data without Re-Training of Machine Translation Systems
    Hyeonseok Moon, Seungyoon Lee, Seongtae Hong, Seungjun Lee, Chanjun Park, Heuiseok Lim
    EMNLP 2024 – Findings

  4. Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark
    Chanjun Park, Hyeonwoo Kim, Dahyun Kim, SeongHwan Cho, Sanghoon Kim, Sukyung Lee, Yungi Kim, Hwalsuk Lee
    ACL 2024

  5. KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models
    Jaehyung Seo, Jaewook Lee, Chanjun Park, SeongTae Hong, Seungjun Lee, Heuiseok Lim
    ACL 2024 (Findings of ACL 2024)

  6. Length-aware Byte Pair Encoding for Mitigating Over-segmentation in Korean Machine Translation
    Jungseob Lee, Hyeonseok Moon, Seungjun Lee, Sugyeong Eo, Chanjun Park, Hyunwoong Ko, Jaehyung Seo, Seungyoon Lee, Heuiseok Lim
    ACL 2024 (Findings of ACL 2024)

  7. SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
    Sanghoon Kim (*), Dahyun Kim (*), Chanjun Park (*), Wonsung Lee (*), Wonho Song (*), Yunsu Kim (*), Hyeonwoo Kim (*), Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim
    NAACL 2024 Industry Track, 2024

  8. Exploring Inherent Biases in LLMs within Korean Social Context: A Comparative Analysis of ChatGPT and GPT-4
    Seungyoon Lee, Dongjun Kim, Dahyun Jung, Chanjun Park, Heuiseok Lim 
    NAACL 2024 Student Research Workshop, 2024

  9. Explainable CED: A Dataset for Explainable Critical Error Detection in Machine Translation
    Dahyun Jung, Sugyeong Eo, Chanjun Park, Heuiseok Lim 
    NAACL 2024 Student Research Workshop, 2024

  10. Model-Based Data-Centric AI: Bridging the Divide Between Academic Ideals and Industrial Pragmatism
    Chanjun Park, Minsoo Khang, Dahyun Kim
    ICLR 2024 – Data-centric Machine Learning Research (DMLR) Workshop, 2024

  11. Leveraging Pre-existing Resources for Data-Efficient Counter-Narrative Generation in Korean
    Seungyoon Lee, Chanjun Park (Corresponding Author), DaHyun Jung, Hyeonseok Moon, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim (Corresponding Author)
    LREC-COLING 2024, 2024

  12. KNOTICED: A Dataset for Critical Error Detection in English-Korean Machine Translation
    Sugyeong Eo, Jungwoo Lim, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
    LREC-COLING 2024, 2024

  13. Hyper-BTS Dataset: Scalability and Enhanced Analysis of Back TranScription (BTS) for ASR Post-Processing
    Chanjun Park, Jaehyung Seo, Seolhwa Lee, Junyoung Son, Hyeonseok Moon, Sugyeong Eo, Chanhee Lee, Heuiseok Lim
    EACL 2024 (Findings of EACL 2024), 2024

  14. Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation
    Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Jaehyung Seo, Heuiseok Lim
    EACL 2024 (Findings of EACL 2024), 2024

  15. KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing
    Seonmin Koo (*), Chanjun Park (*), Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    EMNLP 2023, 2023

  16. CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients
    Jaehyung Seo, Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Heuiseok Lim
    EMNLP 2023, 2023

  17. Proceedings of the Seventh Widening NLP Workshop (WiNLP 2023)
    Bonaventure F. P. Dossou, Isidora Tourni, Hatem Haddad, Shaily Bhatt, Fatemehsadat Mireshghallah, Sunipa Dev, Tanvi Anand, Weijia Xu, Atnafu Lambebo Tonja, Alfredo Gomez, Chanjun Park
    EMNLP 2023, 2023

  18. Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse
    Seungyoon Lee (*), DaHyun Jung (*), Chanjun Park (*), Seolhwa Lee, Heuiseok Lim
    ICDM 2023 – The First Workshop on Data-Centric AI, 2023

  19. Informative Evidence-guided Prompt-based Fine-tuning for English-Korean Critical Error Detection
    DaHyun Jung, Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
    IJCNLP-AACL 2023, 2023

  20. Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction
    Chanjun Park(*), Seonmin Koo(*), Seolhwa Lee(*), Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    ICML 2023 – Data-centric Machine Learning Research (DMLR) Workshop, 2023

  21. DMOps: Data Management Operation and Recipes
    Eujeong Choi(*), Chanjun Park(*)
    ICML 2023 – Data-centric Machine Learning Research (DMLR) Workshop, 2023

  22. Inter-Annotator Agreement in the Wild: Uncovering Its Emerging Roles and Considerations in Real-World Scenarios
    NamHyeok Kim(*), Chanjun Park(*) 
    ICML 2023 – Data-centric Machine Learning Research (DMLR) Workshop, 2023

  23. Transcending Traditional Boundaries: Leveraging Inter-Annotator Agreement (IAA) for Enhancing Data Management Operations (DMOps)
    Damrin Kim, NamHyeok Kim, Chanjun Park, Harksoo Kim
    ICML 2023 – Data-centric Machine Learning Research (DMLR) Workshop, 2023

  24. Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation
    Seungjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim
    ICML 2023 – Data-centric Machine Learning Research (DMLR) Workshop, 2023

  25. Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
    Seonmin Koo(*), Chanjun Park(*), Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    ICML 2023 – Data-centric Machine Learning Research (DMLR) Workshop, 2023

  26. Knowledge Graph-Augmented Korean Generative Commonsense Reasoning
    Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim
    ICML 2023 – Data-centric Machine Learning Research (DMLR) Workshop, 2023

  27. Improving Formality-Sensitive Machine Translation using Data-Centric Approaches and Prompt Engineering
    Seungjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim
    IWSLT 2023 – ACL 2023, 2023

  28. Towards Diverse and Effective Question-Answer Pair Generation from Children Storybooks
    Sugyeong Eo, Hyeonseok Moon, Jinsung Kim, Yuna Hur, Jeongwook Kim, SongEun Lee, Changwoo Chun, Sungsoo Park, Heuiseok Lim
    ACL 2023 – Findings, 2023

  29. PEEP-Talk: A Situational Dialogue-based Chatbot for English Education
    Seungjun Lee, Yoonna Jang, Chanjun Park, Jungseob Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo, Seounghoon Lee, Bernardo Nugroho Yahya, Heuiseok Lim
    ACL 2023 – Demo Track, 2023

  30. PicTalky: Augmentative and Alternative Communication for Language Developmental Disabilities
    Chanjun Park, Yoonna Jang, Seolhwa Lee, Jaehyung Seo, Kisu Yang, Heuiseok Lim
    AACL 2022 – Demo Track, 2022

  31. KU X Upstage’s submission for the WMT22 Quality Estimation: Critical Error Detection Shared Task
    Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
    WMT 2022 – EMNLP 2022, 2022

  32. QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation
    Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Gyeongmin Kim, Jungseob Lee, Heuiseok Lim
    COLING 2022, 2022

  33. Focus on FoCus: Is FoCus focused on Context, Knowledge and Persona?
    SeungYoon Lee, Jungseob Lee, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Jaehyung Seo, Jeongbae Park, Heuiseok Lim
    COLING 2022 – The 1st Workshop on Customized Chat Grounding Persona and Knowledge, 2022

  34. A Self-Supervised Automatic Post-Editing Data Generation Tool
    Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, Seungjun Lee, Heuiseok Lim
    ICML 2022 – DataPerf workshop, 2022

  35. A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
    Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
    NAACL 2022 – Findings, 2022

  36. Priming Ancient Korean Neural Machine Translation
    Chanjun Park, Seolhwa Lee, Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim
    LREC 2022, 2022

  37. FreeTalky: Don’t Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue
    Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park, Heuiseok Lim
    LREC 2022, 2022

  38. Empirical Analysis of Synthetic Data Generation Using Noising Strategies for Automatic Post-editing
    Hyeonseok Moon, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Jeongsub Lee, Sugyeong Eo, Heuiseok Lim
    LREC 2022, 2022

  39. FreeTalky: Don’t Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue
    Chanjun Park(*), Yoonna Jang(*), Seolhwa Lee(*), Sungjin Park(*), Heuiseok Lim
    AAAI 2022 – Artificial Intelligence for Education (AI4EDU), 2022

  40. How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus
    Chanjun Park, Seolhwa Lee, Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim
    NeurIPS 2021 – Data-centric AI (DCAI) workshop, 2021

  41. A New Tool for Efficiently Generating Quality Estimation Datasets
    Sugyeong Eo, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
    NeurIPS 2021 – Data-centric AI (DCAI) workshop, 2021

  42. Automatic Knowledge Augmentation for Generative Commonsense Reasoning
    Jaehyung Seo, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    NeurIPS 2021 – Data-centric AI (DCAI) workshop, 2021

  43. Syntax-enhanced Dialogue Summarization using Syntax-aware information
    Seolhwa Lee, Kisu Yang, Chanjun Park, João Sedoc, Heuiseok Lim
    NeurIPS 2021 – Women in Machine Learning (WiML 2021) workshop, 2021

  44. Towards Syntax-Aware Dialogue Summarization using Multi-task Learning
    Seolhwa Lee, Kisu Yang, Chanjun Park, João Sedoc, Heuiseok Lim
    EMNLP 2021 – Widening NLP (WiNLP 2021) workshop, 2021

  45. Two Heads are Better than One? Verification of Ensemble Effect in Neural Machine Translation
    Chanjun Park, Sungjin Park, Seolhwa Lee, Taesun Whang, Heuiseok Lim
    EMNLP 2021 – The Second Workshop on Insights from Negative Results in NLP, 2021 – (Oral presentation)

  46. BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text
    Chanjun Park, Jaehyung Seo, Seolhwa Lee, Chanhee Lee, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim
    ACL 2021 – WAT (Workshop on Asian Translation) 2021 Workshop, 2021 – (Oral presentation)

  47. Dealing with the Paradox of Quality Estimation
    Sugyeong Eo (*), Chanjun Park (*), Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim 
    MT Summit 2021 – LoResMT, 2021 – (Oral presentation)

  48. Should we find another model?: Improving Neural Machine Translation Performance with ONE-Piece Tokenization Method without Model Modification
    Chanjun Park (*), Sugyeong Eo (*), Hyeonseok Moon (*), Heuiseok Lim
    NAACL-HLT 2021 Industry Track, 2021 – (Poster/Oral presentation)

KU NMT Group.

School of Computer Science

College of Engineering, Korea University

© 2024 KU NMT GROUP.

Contact Us

  • Group Leader Email
    bcj1210@naver.com
  • Address
    #311 Aegineung Student Center, College of Informatics, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Korea