<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Kan Jen Cheng</title>
<meta name="author" content="Kan Jen Cheng">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Lato:ital,wght@0,400;0,700;1,400;1,700&display=swap" rel="stylesheet">
<link href="https://fonts.googleapis.com/earlyaccess/cwtexkai.css" rel="stylesheet">
<link rel="stylesheet" type="text/css" href="stylesheet.css?v=2">
<link rel="icon" href="images/eecs_logo.jpg">
</head>
<body>
<main class="page">
<!-- ===== Bio ===== -->
<header class="bio">
<div class="bio-content">
<p class="name">Kan Jen Cheng <span class="name-sign" lang="zh-Hant">鄭堪任</span></p>
<p>
I am an incoming PhD student in Computer Science at the
University of Maryland, College Park (Fall 2026),
advised by Professor <a href="https://www.cs.umd.edu/~lin/">Ming C. Lin</a>.
Previously, I received my bachelor's degree from
UC Berkeley, where I worked on
audio-visual research in the
<a href="https://people.eecs.berkeley.edu/~gopala/">Berkeley Speech Group (BAIR)</a>,
advised by Professor <a href="https://people.eecs.berkeley.edu/~gopala/">Gopala Anumanchipalli</a>.
</p>
<p>
My research interests lie at the intersection of <b><i>audio-visual learning</i></b>,
<b><i>multimodal perception</i></b>, and <b><i>generative modeling</i></b>.
While recent advances in LLMs have mastered language,
I view text as only a reduction of reality. I aim instead to build multimodal systems that
perceive the world through the <b><i>synergy</i></b> of sight and sound, grounded in
<b><i>spatial awareness</i></b> and <b><i>physical understanding</i></b>.
</p>
<p>
If you would like to discuss my research or potential collaborations, feel free to
<a href="mailto:[email protected]">contact me</a>. I'm always open to connecting
and collaborating.
</p>
<p class="links">
<a href="mailto:[email protected]">Email</a> /
<a href="https://github.com/iftrush/">GitHub</a> /
<a href="https://www.linkedin.com/in/kan-jen-cheng-b34624193/">LinkedIn</a>
</p>
</div>
<div class="bio-photo">
<img class="profile-photo" alt="Kan Jen Cheng" src="images/profilepic.jpg">
</div>
</header>
<!-- ===== Research Philosophies ===== -->
<section class="section">
<h2>Research Philosophies</h2>
<p>
My work is driven by two core philosophies: <b><i>human-centered perception</i></b>,
where I model speech characteristics, affective dynamics, and joint cognitive attention
to capture how humans naturally experience the world; and <b><i>creative media</i></b>,
where I develop tools that offer precise, object-level control for content creation.
</p>
</section>
<!-- ===== Future Directions ===== -->
<section class="section">
<h2>Future Directions</h2>
<p>
As an audio-visual researcher, I have witnessed the power of <b><i>multimodal synergy</i></b>, yet I
realize that correlation alone is insufficient. True perception requires understanding the
physical laws, such as geometry, dynamics, and material interactions, that govern the spaces
where sight and sound coexist. Consequently, my future research will focus on spatial and
physical learning toward building a comprehensive world model. I aim to
move beyond surface-level alignment and construct digital twins that not only mimic the
appearance of an environment but also simulate its underlying physical reality. By grounding
audio-visual generation in these physical truths, we can enable agents to reason about the
world through a unified sensory experience.
</p>
</section>
<!-- ===== Research ===== -->
<section class="section">
<h2>Research</h2>
<!-- Paper: CAVE -->
<article class="paper">
<div class="paper-thumbnail">
<div class="paper-media" tabindex="0" role="button" aria-label="Toggle alternate preview for CAVE" aria-pressed="false">
<div class="paper-media-hover">
<img src="images/av_hf_flow.png" alt="" class="paper-media-asset paper-media-asset--large paper-img-cave-flow">
</div>
<img src="images/av_hf_teaser.png" alt="" class="paper-media-asset paper-media-asset--large paper-media-default paper-img-cave-teaser">
</div>
</div>
<div class="paper-details">
<a href="TODO">
<span class="papertitle">CAVE: Coherent Audio-Visual Emphasis via Schrödinger Bridge</span>
</a>
<br>
<a href="https://iftrush.github.io/"><b>Kan Jen Cheng*</b></a>,
<a href="https://wx83.github.io/">Weihan Xu*</a>,
Koichi Saito,
Nicholas Lee,
<a href="http://louis-liu.notion.site/">Yisi Liu</a>,
<a href="https://jlian2.github.io/">Jiachen Lian</a>,
<a href="https://tinglok.netlify.app/">Tingle Li</a>,
<a href="https://alexander-h-liu.github.io/">Alexander H. Liu</a>,
<a href="https://fnzhan.com/">Fangneng Zhan</a>,
Masato Ishii,
Takashi Shibuya,
<a href="https://pliang279.github.io/">Paul Pu Liang</a>,
<a href="https://people.eecs.berkeley.edu/~gopala/">Gopala Anumanchipalli</a>
<br>
<em>under review</em>
<br>
In collaboration with <i>Sony AI</i> &amp; <i>MIT Media Lab</i>
<div class="paper-links">
<a href="TODO">project page</a> / <a href="TODO">arXiv</a>
</div>
<p>
A realization of human audio-visual selective attention that jointly emphasizes a
selected object visually and acoustically via a flow-based Schrödinger bridge.
</p>
</div>
</article>
<!-- Paper: EMO-Reasoning -->
<article class="paper">
<div class="paper-thumbnail">
<div class="paper-media" tabindex="0" role="button" aria-label="Toggle alternate preview for EMO-Reasoning" aria-pressed="false">
<div class="paper-media-hover">
<img src="images/emo_score.png" alt="" class="paper-media-asset paper-media-asset--large">
</div>
<img src="images/emo_reason_teaser.png" alt="" class="paper-media-asset paper-media-asset--large paper-media-default">
</div>
</div>
<div class="paper-details">
<a href="https://berkeley-speech-group.github.io/emo-reasoning/">
<span class="papertitle">EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems</span>
</a>
<br>
Jingwen Liu*,
<a href="https://iftrush.github.io/"><b>Kan Jen Cheng*</b></a>,
<a href="https://jlian2.github.io/">Jiachen Lian</a>,
Akshay Anand,
Rishi Jain,
Faith Qiao,
<a href="https://www.stat.berkeley.edu/~yugroup/people/Robbie.html">Robin Netzorg</a>,
<a href="https://hcchou.wixsite.com/huangchengchou">Huang-Cheng Chou</a>,
<a href="https://tinglok.netlify.app/">Tingle Li</a>,
<a href="https://daniellin94144.github.io/">Guan-Ting Lin</a>,
<a href="https://people.eecs.berkeley.edu/~gopala/">Gopala Anumanchipalli</a>
<br>
<em>ASRU</em>, 2025
<div class="paper-links">
<a href="https://berkeley-speech-group.github.io/emo-reasoning/">project page</a> /
<a href="https://arxiv.org/abs/2508.17623">arXiv</a>
</div>
<p>
A holistic benchmark for assessing emotional coherence in spoken dialogue systems
through continuous, categorical, and perceptual metrics.
</p>
</div>
</article>
<!-- Paper: Audio Texture Manipulation -->
<article class="paper">
<div class="paper-thumbnail">
<div class="paper-media paper-media--video" tabindex="0" role="button" aria-label="Toggle alternate preview for Audio Texture Manipulation by Exemplar-Based Analogy" aria-pressed="false">
<div class="paper-media-hover paper-img-ate-video">
<video muted autoplay loop playsinline preload="metadata" class="paper-media-asset">
<source src="images/audio_texture_editing.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
</div>
<img src="images/audio_texture_editing_fix.png" alt="" class="paper-media-asset paper-media-asset--large paper-media-default paper-img-ate-teaser">
</div>
</div>
<div class="paper-details">
<a href="https://berkeley-speech-group.github.io/audio-texture-analogy/">
<span class="papertitle">Audio Texture Manipulation by Exemplar-Based Analogy</span>
</a>
<br>
<a href="https://iftrush.github.io/"><b>Kan Jen Cheng*</b></a>,
<a href="https://tinglok.netlify.app/">Tingle Li*</a>,
<a href="https://people.eecs.berkeley.edu/~gopala/">Gopala Anumanchipalli</a>
<br>
<em>ICASSP</em>, 2025
<div class="paper-links">
<a href="https://berkeley-speech-group.github.io/audio-texture-analogy/">project page</a> /
<a href="https://arxiv.org/abs/2501.12385">arXiv</a>
</div>
<p>
A latent-diffusion, exemplar-based analogy (in-context learning) model for
audio texture manipulation, i.e., editing the overall perceptual quality of
a sound and its interaction with various sound sources.
</p>
</div>
</article>
</section>
<!-- ===== Professional Experiences ===== -->
<section class="section">
<h2>Professional Experiences</h2>
<div class="entry-list">
<div class="entry">
<div class="entry-details">
<p><a href="https://people.eecs.berkeley.edu/~gopala/" class="entry-org">Berkeley AI Research (BAIR)</a>, CA, U.S.</p>
<p>Research Assistant • Spring 2024 - Spring 2026</p>
<p>With: <a href="https://people.eecs.berkeley.edu/~gopala/">Gopala Anumanchipalli</a></p>
</div>
<img class="entry-logo" src="images/bair_logo.png" alt="BAIR logo">
</div>
</div>
</section>
<!-- ===== Education ===== -->
<section class="section">
<h2>Education</h2>
<div class="entry-list">
<div class="entry">
<div class="entry-abbr">Ph.D.</div>
<div class="entry-details">
<p>Fall 2026 - Present</p>
<p>University of Maryland, College Park, MD, U.S.</p>
<p>Ph.D. in Computer Science</p>
</div>
<img class="entry-logo" src="images/umd_seal.svg" alt="University of Maryland seal">
</div>
<div class="entry">
<div class="entry-abbr">B.A.</div>
<div class="entry-details">
<p>Fall 2020, Spring 2022 - Fall 2023</p>
<p>University of California, Berkeley, CA, U.S.</p>
<p>B.A. in Computer Science</p>
</div>
<img class="entry-logo" src="images/ucberkeley_seal.svg" alt="UC Berkeley seal">
</div>
</div>
</section>
<!-- ===== Footer ===== -->
<footer class="footer">
<p class="footer-text">
Last updated: <span id="last-updated"></span>
</p>
<p class="footer-text">
Template from <a href="https://jonbarron.info/">Jon Barron</a>.
</p>
</footer>
</main>
<script src="main.js?v=2" defer></script>
</body>
</html>