Hongchao(Cosimo) Fang

My name is Hongchao Fang ( 方红超 ), and I am a first-year PhD student at Penn State University, where I am glad to be supervised by Prof. Wenpeng Yin. Before PSU, I received my Master's degree in Computer Science from Northeastern University and my B.Eng. from Central University of Finance and Economics, China.

I am fortunate to have worked under the supervision of Dr. Min and Prof. Hannaneh at UW, Weiyan Shi from Stanford University, and Prof. Xie at UCSD.

I am actively looking for research internships for summer 2025. Feel free to email me if you have openings.

Scholar  /  Email  /  LinkedIn  /  Github  /  Curriculum Vitae

News
  • Aug 23, 2024 - I'll be joining the AI4Research Lab at Penn State this fall to pursue my PhD and continue my research.
  • Aug 28, 2023 - Glad to announce that I'll be joining CHATS-Lab this fall as a researcher, working on a brand-new topic, fine-grained control of interactive language models, with Dr. Weiyan Shi from Stanford University!
  • Jan 15, 2023 - Excited to share that I have been selected to join H2Lab at UW as a research student, supervised by Dr. Min and Prof. Hannaneh.
  • Nov 28, 2022 - A new paper based on CERT was published in TACL.
  • Oct 12, 2022 - CERT has gained over 200 citations! So glad that this paper has resonated with so many researchers.
  • May 8, 2022 - I will join Amazon as a Software Engineer Intern. It's my first time working in industry, and I'm looking forward to it.
  • Mar 16, 2020 - I will join the AI4H Lab at UCSD, mentored by Prof. Xie.
  • Apr 9, 2019 - I will join the Edge Computing Lab at ICT, Chinese Academy of Sciences, mentored by Prof. Peng.
Research Interest

My research focuses on customizing Large Language Models along two directions: LLMs with domain knowledge and LLMs with specific personalities. I am currently interested in applying self-supervised learning to generation tasks to explore large language models' ability to learn personality from domain dialogues.

Publications
An End-to-End Contrastive Self-Supervised Learning Framework for Language Understanding
Hongchao Fang, Pengtao Xie
Transactions of the Association for Computational Linguistics, 2022; 10: 1324–1340
[paper]

We propose a four-level optimization framework that performs data augmentation and contrastive learning end-to-end, enabling the augmented data to be tailored to the contrastive learning task.


CERT: Contrastive Self-supervised Learning for Language Understanding
Hongchao Fang, Pengtao Xie
arXiv preprint
[paper] [code]

We propose CERT (Contrastive self-supervised Encoder Representations from Transformers), which pretrains language representation models using contrastive self-supervised learning at the sentence level.

MedDialog: Large-scale medical dialogue dataset
Guangtao Zeng, Wenmian Yang, Zeqian Ju, Yue Yang, Sicheng Wang, Ruisi Zhang, Meng Zhou, Jiaqi Zeng, Xiangyu Dong, Ruoyu Zhang, Hongchao Fang, Penghui Zhu, Shu Chen and Pengtao Xie
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
[paper][code]

We build MedDialog, the largest medical dialogue dataset to date. We pre-train several dialogue generation models on the Chinese MedDialog dataset, including Transformer, GPT, and BERT-GPT, and compare their performance. Models trained on MedDialog are able to generate clinically correct and human-like medical dialogues. We also study the transferability of models trained on MedDialog to low-resource medical dialogue generation tasks: fine-tuning models pre-trained on MedDialog greatly improves performance on medical dialogue generation with small datasets, as shown by both human and automatic evaluation.

Research Experiences
Providing an evaluation framework for multi-turn, multi-faceted, and multi-level personality control of language models, using the popular Myers-Briggs Type Indicator (MBTI) as the set of personality types.
Intern Project at Stanford.
Developing a better prompt-based fine-tuning approach that uses contrastive objectives for nonparametric masked language models (NPM).
Intern Project at UW.
Proposing a four-level optimization framework that performs data augmentation and contrastive learning end-to-end, enabling the augmented data to be tailored to the contrastive learning task.
Intern Project at UCSD.
Building an automatic image classification model training system based on cooperation between a server and its clients.
Intern Project at ICT, Chinese Academy of Sciences.
Work Experiences
Designed and implemented a Java API for sellers to fetch billing and invoice data, and optimized the front-end UI to show more detailed information.
Software Engineer Internship at Amazon.
Hobbies
  • My English name, Cosimo, comes from the protagonist of Calvino's novel The Baron in the Trees, the story of a baron who lived in the trees all his life, maintaining his integrity and pursuing his ideals.
  • I am a big rock fan with wide-ranging interests in punk, shoegaze, and jazz.
  • I watch a lot of movies, especially French New Wave films. My favorite director is Jean-Luc Godard.
  • Besides movies and rock, I am also an extreme sports enthusiast, with a love of climbing, snowboarding, and skateboarding.
Awards
  • University Graduate Fellowship (UGF) of The Pennsylvania State University
  • Scholarship in College of Engineering, The Pennsylvania State University
  • Academic Excellence Award at Central University of Finance and Economics

Website design from Jon Barron