Mixed reality applications are expanding across education, entertainment, and manufacturing. We will study how multimodal data (skin response, pulse, facial expressions, eye gaze, speech, etc.) aids understanding of engagement levels in immersive environments with diverse users. We will develop and evaluate a flexible metric of engagement for optimizing user experience in immersive applications.
Mentors: Joe Geigel, Reynold Bailey, and Cecilia O. Alm
Magnifying technologies allow people with low vision to read and access digital content. However, they also require agile use of targeting devices. We combine everyday eye-tracking technologies with speech capture to create and evaluate magnifying interfaces that follow the reader's gaze and do not require external pointing devices such as a mouse.
Mentors: Kristen Shinohara, Cecilia O. Alm, and Reynold Bailey
Automation and machine generation of news content are changing newsroom practices, news consumers' understanding of what journalism is, and how news is selected and presented. We will use multimodal behavioral data (gaze, facial expressions, biophysical signals) to study perception of machine-generated content versus stories written by journalists. Influences of credibility, objectivity, and political bias will be investigated.
Mentors: Ammina Kothari, Reynold Bailey, and Cecilia O. Alm
Deaf or hard-of-hearing individuals may miss important visual content while reading the text captions on videos. We will build on team members' prior work on gaze-guidance and automatic-speech-recognition captioning to explore subtle gaze guidance strategies to draw attention to important regions of a video, with varying levels of caption text accuracy, and across four user groups: deaf/hard-of-hearing, non-native speakers, native speakers, and native speakers with autism spectrum disorder.
Mentors: Matt Huenerfauth, Reynold Bailey, and Cecilia O. Alm
Positive experiences boost human well-being and productivity. We will explore methods for sensing-based inference of human amusement using deep learning with modest data. Drawing on relevant base models and recent techniques in transfer learning and data augmentation, we will use verbal and nonverbal sensing data, elicited while diverse subjects view or read amusing content, to infer ranges of positive expressions and intensities of amusement.
Mentors: Raymond Ptucha and Cecilia O. Alm
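As one illustration of the "modest data" strategy above, data augmentation can multiply a small set of sensing recordings by applying small random perturbations to each signal. A minimal sketch in Python; the function name, parameters, and example values are illustrative, not part of the project:

```python
import random

def augment_signal(signal, noise_std=0.05, scale_range=(0.9, 1.1), seed=None):
    """Return a perturbed copy of a 1-D sensing signal (e.g., pulse samples):
    random amplitude scaling plus Gaussian jitter per sample."""
    rng = random.Random(seed)
    scale = rng.uniform(*scale_range)
    return [scale * x + rng.gauss(0, noise_std) for x in signal]

# each original recording can yield many distinct augmented training examples
original = [0.0, 0.5, 1.0, 0.5, 0.0]
augmented = [augment_signal(original, seed=i) for i in range(3)]
```

In practice the perturbation magnitudes would be tuned so that augmented signals remain physiologically plausible.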
This project will build on our work in virtual theatre, which we define as shared, live performance, experienced in a virtual space, with participants contributing from different physical locales. In particular, we will explore how immersive viewing of a virtual performance (using a head-mounted display such as an Oculus Rift) affects audience engagement compared to viewing the same performance in a non-immersive environment (such as on a screen or monitor). Galvanic skin response, facial expression capture, and possibly eye gaze will be used to gauge audience interest and engagement. In addition, speech capture (of audience descriptions and impressions of the performance) will enable emotional prosody analysis.
Mentors: Joe Geigel, Reynold Bailey, and Cecilia O. Alm
Systematically characterizing the semantics of humor is subjective and difficult. For language technologies, both the interpretation of humor and the expression of amusement remain challenging. A particular challenge is that in machine learning modeling, gold standard data collection tends to rely on third-party annotators interpreting and rating linguistic examples. Instead, for humor, people's spontaneous verbal and non-verbal reactions of joy and amusement may constitute more natural ground truth. It is important to better understand the relationship between traditional annotation and spontaneous measurement-based annotation, and how they impact machine modeling. This study will capture sensing data from observers (pulse, skin response, facial expressions, and spoken spontaneous reactions such as laughter, lip smacks, gasps, and breathing patterns) as they view humorous and non-humorous language-inclusive video clips. The observers will also be asked to label the clips. We will statistically compare amusement signals captured spontaneously against task-based annotation, and assess the effect on machine modeling. We will also explore whether reactions to humor have a sustained effect on annotation of consecutive video clips.
Mentors: Ray Ptucha, Chris Homan, and Cecilia O. Alm
While autism spectrum disorder (ASD) is primarily diagnosed through behavioral characteristics, behavioral studies alone are not sufficient to elucidate the mechanisms underpinning ASD-related behavioral difficulties. In previous research, behavioral difficulties associated with ASD have in part been attributed to atypical attention towards stimuli. Eye-tracking has previously been used to quantify atypical visual saliency in ASD during natural scene viewing. In this project, we will explore the potential of electroencephalography (EEG) to provide insight into the neurological correlates of information processing during natural scene viewing. We will integrate EEG and eye-tracking data in machine learning models, with the goal of uncovering the attentional and neurological correlates of viewing performance between typical and ASD subjects. These findings may also provide potential markers for early diagnosis of ASD. In this REU project, together with the faculty mentors, student researchers will extend the current state of the art, conduct experiments to collect necessary data, and utilize machine learning and statistical methods to analyze the collected data with the goal of uncovering the mechanism underlying atypical attention in ASD.
Mentors: Linwei Wang, Hadi Hosseini, and Catherine Qi Zhao
Smart spaces are typically augmented with devices that are capable of sensing, processing, and actuating. For example, RIT SmartSuite (https://www.rit.edu/fa/smartsuite/) is fitted with multiple sensors and actuators to make living spaces sustainable. These spaces are autonomous systems from which contextual information can be gathered, mined, and used to support autonomy while minimizing user intervention. However, accurately sensing, modeling, and understanding user context such as intents is difficult and often relies on intrusive techniques. This project will (1) create an immersive representation of some smart spaces, such as the RIT SmartSuite, using augmented and/or virtual reality and supporting user participation through a head-mounted display such as an Oculus Rift, and (2) use eye gaze, motion, and facial expression capture to derive interactions between participants and objects. This project will test the effectiveness of visual cues and feedback techniques to improve virtual presence and participation in technology-rich smart spaces. Multimodal sensor data will be analyzed with the goal of deriving user intents.
Mentors: Peizhao Hu, Ammina Kothari, Joe Geigel, and Reynold Bailey
Automated word alignment is a first step in traditional statistical machine translation. We have used alignment in a computational framework that establishes meaningful relationships between observers’ gaze and co-collected spoken language during descriptive image/scene inspection, for machine-annotation of image/scene regions. REU student researchers will extend the current framework to task-based dialogues with pairs of observers discussing visual content. We may also explore an alternative metric for assessing multimodal alignment quality when the alignment input consists of visual-linguistic data from more than one observer.
Mentors: Cecilia O. Alm, Preethi Vaidyanathan, Reynold Bailey, and Ernest Fokoué
In the United States, online learning enrollments are on the rise and competing in growth with traditional course offerings. How is this shift in STEM educational delivery impacting attention, behavior, and performance of students? This project explores how online vs. face-to-face learning contexts influence students. We will use various sensing devices that capture facial expressions, skin response, etc. We will examine what happens when students are given performance incentives, and also consider the role of content difficulty. Project outcomes will contribute to improving individualized learning experiences.
Employment and work circumstances have profound impacts on the well-being and prosperity of individuals and communities. Social media sites provide forums where individuals can divulge, tell stories about, and make sense of their experiences with employment. Students will build upon machine learning methods such as neural networks, social network analysis, and natural language processing techniques to study work-related discourse on social media. Deep architectures and recurrence have revolutionized the neural network community, and these techniques will be paired with semantic role labeling or frame semantics to efficiently extract information on employment and work from social media data. The project will explore quantifiable relationships between social media narratives and communities, and psychophysical signals such as reader skin response. These activities will contribute to our understanding of how people make sense of and react to job narratives.
This project will investigate models and algorithms to best leverage human expertise in building machine-learning systems. Active and interactive learning techniques will be explored where the former focuses on minimizing humans’ input to the system and the latter aims to elicit and monitor the most effective user feedback to improve the learning outcome. Students will research a number of interaction strategies between human users and a learning system and help enhance existing active and interactive learning models.
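A common active-learning strategy for minimizing human input is uncertainty sampling: query labels only for the items the current model is least sure about. A minimal sketch, assuming a binary classifier that outputs a probability per unlabeled item (the function and values are illustrative):

```python
def uncertainty_sample(probs, k=1):
    """Return indices of the k unlabeled items whose predicted
    probability is closest to 0.5, i.e., the most uncertain ones."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]

# the model is confident about items 0 and 2; item 1 is worth asking a human about
queries = uncertainty_sample([0.95, 0.52, 0.08], k=1)
```

Interactive learning, by contrast, would additionally shape *how* the human's feedback is elicited and monitored, not just which items are queried.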
Creative writing is a prominent feature of higher education. It remains very tradition-bound: 1) often taught in a workshop setting as a highly subjective process, with evaluation using aesthetic principles centered on professorial expertise; and 2) focused on the creative work as a fixed artifact (e.g., the story or poem). The model of creative writing instruction influences other forms of writing pedagogy, and the model of the work of the creative writer influences other forms of written work. Over recent decades, this model has been critiqued from several angles, with attention to collaborative or social approaches. At the same time, creative writing has changed as digital technologies become a tool and medium for the writer. What happens to the work and the model of instruction in environments increasingly characterized by developments in synthetic language, such as the Amazon Echo and artificial intelligence? The creative work becomes a transaction deeply tied to the sensations and interactions of listeners. Such technologies, which will play a growing role in our environments, allow us to explore new forms of creativity and response. This project will use sensing modalities (pulse, skin response, or natural language processing) to understand response to these new creative forms, with implications for creative writing instruction and for a broad range of interactions with the Amazon Echo and similar technologies, as they play a growing role in our daily lives.
In statistical machine translation, bilingual word alignment is a first step in translating text from one language to another. Such algorithms can also establish meaningful relationships between observers’ gaze data and their co-collected verbal descriptions, for machine-annotation of visual content. REU student researchers will extend the current visual-linguistic alignment framework to video, consider holistic and region-based annotation for visual content with positive or negative valence, and explore a novel approach to human assessment of alignment quality within the context of different applications.
In recent work, we used a microphone to record speech and a camera to capture facial expressions of individuals engaged in computer-based tasks. The nature of the tasks was manipulated to induce moderate stress. We were later able to fuse both sources of data to reliably predict when the user was stressed. Students involved in this project will build on these ideas and develop techniques for real-time monitoring of user stress, attention, and cognitive load from multisource data, towards monitoring in online learning systems.
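A real-time extension could, for instance, apply late fusion of the per-modality predictions with temporal smoothing. A minimal sketch, assuming each modality's model already emits a stress probability per time window; the weights, smoothing factor, and threshold are illustrative:

```python
def monitor_stress(prob_pairs, w_speech=0.5, alpha=0.5, threshold=0.6):
    """Fuse speech and face stress probabilities per time window, smooth the
    fused score with an exponential moving average, and flag stressed windows."""
    smoothed = 0.0
    flags = []
    for p_speech, p_face in prob_pairs:
        fused = w_speech * p_speech + (1 - w_speech) * p_face  # late fusion
        smoothed = alpha * fused + (1 - alpha) * smoothed       # EMA smoothing
        flags.append(smoothed >= threshold)
    return flags

# a stream of (speech, face) probabilities: calm at first, then stressed
stream = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.95, 0.9), (0.9, 0.85)]
flags = monitor_stress(stream)
```

The smoothing trades responsiveness for robustness: brief sensor noise is suppressed, at the cost of a short lag before sustained stress is flagged.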
Interpersonal violence (IPV) and abuse are prominent problems that impact individuals and macro-level community wellness. Our findings from large social media collections indicate that beliefs, values, social pressure, and fears play major roles in keeping people in abusive relationships. Semantic role labeling is a method that can help extract such information from text. However, our prior work also showed that existing tools are not well tailored to this data. Students will focus on: (1) inventorying and modifying prior semantic role labeling tools to better extract information on IPV from social media data, and (2) identifying quantifiable relations between signals in social media narratives, geospatial information, and demographic data sources.
We will extend techniques to systematically fuse multimodal data, representing experts’ domain knowledge, to improve image understanding. Students will enhance a prototype multimodal interactive user interface with interactive machine learning, to effectively collect and incorporate user feedback to enhance the data fusion results.
Post-apocalyptic storytelling in computer games tends to rely heavily on conveying, and on users experiencing, haptic, visual, and aural cues of danger. Students will study how creative narrative and computational design capture, convey, and produce sensory experience. Students will participate in developing and implementing a game prototype of a post-apocalyptic version of Rochester, the city they are in, and study creative reactions in users.
In statistical machine translation, bitext word alignment is the first step in translating text from one language to another. Alignment algorithms can establish meaningful relationships between observers' gaze data and their co-collected verbal image descriptions, for image annotation or classification. REU student researchers will (1) extend the bi-modality alignment framework to new modalities, (2) automate transcription with speech recognition, and (3) explore alignment quality considering factors such as concept concreteness and specificity, image domain, and observer background knowledge.
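The alignment step referenced above can be illustrated with IBM Model 1, the classic EM-based word-alignment algorithm from statistical machine translation; analogous machinery can pair gaze regions with spoken words. A minimal, self-contained sketch on toy bitext (the sentence pairs are illustrative):

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=10):
    """Estimate translation probabilities t(f | e) via EM (IBM Model 1)."""
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected pair counts (E-step)
        total = defaultdict(float)
        for fs, es in bitext:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():  # re-estimate (M-step)
            t[(f, e)] = c / total[e]
    return t

# toy parallel corpus; EM learns which words co-occur systematically
bitext = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = ibm_model1(bitext)
```

After a few EM iterations, consistently co-occurring pairs such as ("das", "the") dominate the probability mass, which is the basis for extracting alignments.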