Humans perceive their surroundings and act effectively and efficiently by drawing on multi-sensory information. We hope that machines equipped with multiple sensors can acquire the same ability to understand the world, which is an important foundation for general artificial intelligence. This project therefore focuses on multi-modal perception and understanding, motivated by cognitive neuroscience, and aims to address the following problems: