News
- Aug 23, 2024 - I'll be joining
AI4Research-Lab at Penn State U
this fall, pursuing my PhD degree and continuing my research.
- Aug 28, 2023 - Glad to announce that I'll be joining
CHATS-Lab
this fall as a researcher, working on a brand-new topic:
fine-grained control of interactive language models, with
Dr. Weiyan Shi from Stanford University!
- Jan 15, 2023 - Excited to share that I have been selected to join H2Lab at UW as a research student, supervised by
Dr. Min and
Prof. Hannaneh
- Nov 28, 2022 - A new paper based on CERT was published in TACL.
- Oct 12, 2022 - CERT gained over 200 citations! So glad that so many researchers found this paper valuable.
- May 8, 2022 - I will join Amazon as a Software Engineer Intern. This will be my first time working in industry; looking forward to it.
- Mar 16, 2020 - I will join the AI4H Lab at UCSD, mentored by Prof. Xie.
- Apr 9, 2019 - I will join the Edge Computing Lab at ICT, Chinese Academy of Sciences, mentored by Prof. Peng.
|
Research Interest
My research focuses on customizing Large Language Models along two directions: LLMs with domain knowledge and LLMs
with specific personalities. I'm currently interested in applying self-supervised learning to generation tasks to explore
large language models' ability to learn personality from domain dialogues.
|
Publications
|
An End-to-End Contrastive Self-Supervised Learning Framework for Language Understanding
Hongchao Fang,
Pengtao Xie
Transactions of the Association for Computational Linguistics, 2022; 10: 1324–1340
[paper]
We propose a four-level optimization framework that performs data augmentation and contrastive learning end-to-end,
to enable the augmented data to be tailored to the contrastive learning task.
|
|
CERT: Contrastive Self-Supervised Learning for Language Understanding
Hongchao Fang,
Pengtao Xie
arXiv preprint
[paper]
[code]
We propose CERT: Contrastive self-supervised Encoder Representations from Transformers,
which pretrains language representation models using contrastive self-supervised learning at the sentence level.
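The core idea, sentence-level contrastive learning, can be sketched as follows. This is a minimal NumPy illustration of a standard InfoNCE-style objective, not the paper's actual implementation (the function name and batch handling here are my own); the idea is that two augmentations of the same sentence should have similar embeddings, while other sentences in the batch serve as negatives.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss over a batch of sentence embeddings.

    anchors[i] and positives[i] are embeddings of two augmented views
    of the same sentence; every other row in the batch is a negative.
    """
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive for anchor i sits on the diagonal
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pulls each sentence toward its augmented view and pushes it away from the other sentences in the batch, which is what encourages sentence-level semantics to emerge in the encoder.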
|
|
MedDialog: Large-scale medical dialogue dataset
Guangtao Zeng, Wenmian Yang, Zeqian Ju, Yue Yang, Sicheng Wang, Ruisi Zhang,
Meng Zhou, Jiaqi Zeng, Xiangyu Dong, Ruoyu Zhang, Hongchao Fang, Penghui Zhu,
Shu Chen and Pengtao Xie
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
[paper][code]
We build MedDialog, the largest medical dialogue dataset to date.
We pre-train several dialogue generation models on the Chinese MedDialog dataset, including Transformer, GPT, and
BERT-GPT, and compare their performance. Models trained on MedDialog are able to generate clinically
correct and human-like medical dialogues. We also study the transferability of models trained on MedDialog to low-resource medical dialogue generation tasks:
fine-tuning models pre-trained on MedDialog greatly improves performance on tasks with small datasets,
as shown by both human and automatic evaluation.
|
Research Experiences
|
Providing an evaluation framework for multi-turn, multi-faceted, and multi-level personality control in language models, using the popular Myers-Briggs Type Indicator (MBTI) as the set of personality types.
Intern Project at Stanford.
|
|
Developing a better prompt-based fine-tuning approach that uses contrastive objectives for the nonparametric masked language model (NPM).
Intern Project at UW.
|
|
Proposing a four-level optimization framework that performs data augmentation and contrastive learning end-to-end, enabling the augmented data to be tailored to the contrastive learning task.
Intern Project at UCSD.
|
|
Building an automatic image-classification model training system based on cooperation between a server and its clients.
Intern Project at ICT, Chinese Academy of Sciences.
|
Work Experiences
Designed and implemented a Java API for sellers to fetch billing data and invoices, and optimized the front-end UI to show more detailed information.
Software Engineer Internship at Amazon.
|
Hobbies
- My English name, Cosimo, comes from the protagonist of Calvino's novel The Baron in the Trees, the story of a baron who lived in the trees all his life, maintaining his integrity and pursuing his ideals.
- I am a big rock fan with broad interests spanning punk, shoegaze, and jazz.
- I watch a lot of movies, especially French New Wave films. My favorite director is Jean-Luc Godard.
- Besides movies and rock, I am also an enthusiast of extreme sports, including climbing, snowboarding, and skateboarding.
|
Awards
- University Graduate Fellowship (UGF) of The Pennsylvania State University
- Scholarship in College of Engineering, The Pennsylvania State University
- Academic Excellence Award at Central University of Finance and Economics
|