
Sunday, July 9, 2006 (13:30 - 16:30, Carmichael)
Prof. Howard Leung, City University of Hong Kong
Prof. Tsuhan Chen, Carnegie Mellon University, USA
With the growing demand from audiences for a high-quality visual experience, people continue to explore multimedia technologies that can be adopted in the movie and animation industries. In the spatial domain, audiences would like to view a realistic scene, which may include synthetic objects, from various viewpoints. This can be achieved by using the concept of the plenoptic function, a seven-dimensional function that represents all the light rays in a dynamic scene. Research on sampling, storing, interpolating, and reconstructing the plenoptic function has been emerging at both academic and industrial research institutions and is commonly referred to as image-based rendering or multi-view image processing. In the temporal domain, audiences would like to see a human-like character or avatar move as realistically as a human. Motion capture systems facilitate this task by collecting the 3D information of actual human motions. Capturing accurate 3D data with less manual intervention and processing motion capture data have been hot research topics in recent years.
The recent convergence of image processing, computer vision, and computer graphics has resulted in significant progress in image-based rendering and motion capture techniques. Now widely used in applications ranging from special effects (remember the movies "The Matrix" and "The Polar Express"?) to virtual reality, image-based rendering and motion capture have become critical tools for creating visually exciting content. Image-based rendering helps to capture real-world scenes directly, without the computationally expensive modeling of 3D geometry or surface reflectance that is often required in traditional computer graphics. Motion capture, in turn, is employed to collect human motion data, which enhances realism compared with physically based modeling and reduces the manual labeling required in rotoscoping.
This tutorial will discuss interesting problems in image-based rendering and motion capture. The section on image-based rendering covers the capturing, sampling, representation, and streaming of the multi-view images needed to render realistic-looking scenes. For example, the light field, the lumigraph, and concentric mosaics are popular representations of multi-view data that use different sampling schemes. While studying the mechanism for sampling multi-view data, we will reveal the connections between image-based rendering, multidimensional multirate signal processing, and the Sampling Theorem discovered by Harry Nyquist 80 years ago! The section on motion capture will focus on optical motion capture systems, which are divided into two main categories: markerless and marker-based. Feature extraction, data representation, and the processing of motion capture data for classification and retrieval will be introduced. Applications of motion capture systems to dance performance, movie production, animation, and robot motion driving will be discussed.
Howard Leung is currently an Assistant Professor in the Department of Computer Science at City University of Hong Kong. He received the B.Eng. degree in Electrical Engineering from McGill University, Canada, in 1998, and the M.Sc. and Ph.D. degrees in Electrical and Computer Engineering from Carnegie Mellon University in 1999 and 2003, respectively. Throughout his graduate studies at CMU, he was with the Advanced Multimedia Processing Lab and worked on various projects such as trademark and sketch retrieval, 3D virtual environments for multi-user conferencing, and image-based rendering for 3D virtual environments. In addition, he spent several summers at the IBM T. J. Watson Research Center building an interactive seminar broadcast system, developing an immersive whiteboard, and creating a realistic video avatar.
Currently, Howard is working on several research projects in the area of multimedia signal processing and pattern recognition. He has been developing a web-based Chinese handwriting education system with an intelligent analysis tool that provides instant feedback to students. Moreover, he is working on re-synthesizing dynamic writing from static Chinese calligraphy images by applying image processing and novel model parameter estimation techniques. In addition, he is experimenting with novel approaches for processing, indexing, and retrieving 3D human motions captured by motion capture systems. Furthermore, he has been collaborating with industry on a project to develop a multi-user whiteboard system for mobile devices.
Howard has also been actively involved in various professional activities. He was the Local Arrangement Co-chair for the ACM Symposium on Virtual Reality Software and Technology (VRST 2004) and the Tutorial Chair for the 4th International Conference on Web-Based Learning (ICWL 2005). He served as a Technical Program Committee Member for the IEEE International Symposium on Intelligent Multimedia, Video & Speech Processing (ISIMP 2004), the Asia-Pacific Workshop on Visual Information Processing (VIP 2005), the Multimedia Communications and Home Networking Symposium, IEEE International Conference on Communications (ICC 2005), and the 11th International Conference on Parallel and Distributed Systems (ICPADS 2005). He is the Organization Chair of the 5th International Conference on Web-Based Learning (ICWL 2006) and the Finance Chair of the Tenth IEEE International EDOC Conference (EDOC 2006). He is currently a member of the IEEE Signal Processing Society and is the Treasurer of the Hong Kong Web Society.
Tsuhan Chen has been with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, since October 1997, where he is currently a Professor. He directs the Advanced Multimedia Processing Laboratory, striving to turn multimedia technologies from science fiction into reality. His research interests include multimedia signal processing and communication, implementation of multimedia systems, multimodal biometrics, audio-visual interaction, pattern recognition, computer vision and computer graphics, and bioinformatics. From August 1993 to October 1997, he worked in the Visual Communications Research Department, AT&T Bell Laboratories, Holmdel, New Jersey, and later at AT&T Labs-Research, Red Bank, New Jersey, as a senior technical staff member and then a principal technical staff member.
Tsuhan helped create the Technical Committee on Multimedia Signal Processing, as its founding chair, and the Multimedia Signal Processing Workshop, both in the IEEE Signal Processing Society. His endeavor later evolved into the founding of the IEEE Transactions on Multimedia and the IEEE International Conference on Multimedia and Expo, both joining the efforts of multiple IEEE societies. He was appointed the Editor-in-Chief of IEEE Transactions on Multimedia for 2002-2004.
Before serving as the Editor-in-Chief of IEEE Transactions on Multimedia, he served on the Editorial Board of IEEE Signal Processing Magazine and as an Associate Editor for IEEE Trans. on Circuits and Systems for Video Technology, IEEE Trans. on Image Processing, IEEE Trans. on Signal Processing, and IEEE Trans. on Multimedia. He has co-edited a book titled Advances in Multimedia: Systems, Standards, and Networks.
Tsuhan received the B.S. degree in electrical engineering from the National Taiwan University in 1987, and the M.S. and Ph.D. degrees in electrical engineering from the California Institute of Technology, Pasadena, California, in 1990 and 1993, respectively. He received the Charles Wilts Prize for outstanding independent research in Electrical Engineering leading to a Ph.D. degree at the California Institute of Technology. He has published more than a hundred technical papers and holds fifteen U.S. patents. He was a recipient of the National Science Foundation CAREER Award, titled "Multimodal and Multimedia Signal Processing," from 2000 to 2003.