Masters Thesis Announcement: Threshold Interval Indexing Techniques for = Complicated Uncertain Data by Andrew Knight Title: Threshold Interval Indexing Techniques for Complicated Uncertain = Data Candidate: Andrew Knight E-mail: AndyLPK247@gmail.com Defence Date: August 27, 2010 Time: 10:00 am Location: Breakout Room 3 URL: http://people.rit.edu/alk1234/pages/thesis.html Abstract: Uncertain data is an increasingly prevalent topic in database research, = given the advance of instruments which inherently generate uncertainty = in their data. In particular, the problem of indexing uncertain data for = range queries has received considerable attention. To efficiently = process range queries, existing approaches mainly focus on reducing the = number of disk I/Os. However, due to the inherent complexity of = uncertain data, processing a range query may incur high computational = cost in addition to the I/O cost. In this paper, I present a novel = indexing strategy focusing on one-dimensional uncertain continuous data, = called threshold interval indexing. Threshold interval indexing is able = to balance I/O cost and computational cost to achieve an optimal overall = query performance. A key ingredient of the proposed indexing structure = is a dynamic interval tree. The dynamic interval tree is much more = resistant to skew than R-trees, which are widely used in other indexing = structures. This interval tree optimizes pruning by storing x-bounds, or = pre-calculated probability boundaries, at each node. In addition to the = basic threshold interval index, I present two variants, called the = strong threshold interval index and the hyper threshold interval index, = which leverage x-bounds not only for pruning but also for accepting = results. Furthermore, I present a more efficient memory-loaded versions = of these indexes, which reduce the storage size so the primary interval = tree can be loaded into memory. Each index description includes methods = for querying, parallelizing, updating, bulk loading, and externalizing. = I perform an extensive set of experiments to demonstrate the = effectiveness and efficiency of the proposed indexing strategies. Chair: Manjeet Rege Reader: Qi Yu Observer: Zack Butler Masters Thesis Announcement: Dynamic Fault Tolerant Grid Workflow in the = Water Threat Management Project by Young Suk Moon Title: Dynamic Fault Tolerant Grid Workflow in the Water Threat = Management Project Candidate: Young Suk Moon E-mail: ysm906@gmail.com Defence Date: August 9, 2010 Time: 1:00 pm Location: ICL 1 (70-3520) URL: http://cyberaide.org/members/moon-young-suk/masters-thesis#MSThesis Abstract: Achieving fault tolerance is an inevitable problem in distributed = systems, with it becoming more challenging in decentralized, = heterogeneous, and dynamic-environment systems such as a Grid. When = deploying applications requires time-criticality, how to allocate = resources for jobs in a fault-tolerant manner is an important issue for = the delivery of the services. The Water Threat Management project is a research to find solutions for = the contamination incidents problems in urban water distribution = systems, and it involves the development of the cyberinfrastructure in a = Grid environment. To handle such urgent events properly, the deployment = of the system demands real-time processing without the failure. Our approach of integrating a fault-tolerant framework into a Water = Threat Management system provides fault tolerance at the "queuing stage" = rather than the "job-execution stage" by scheduling jobs in = fault-tolerant ways. This includes the development of the batch queuing = system in the Cyberaide Shell project. In addition, we present a dynamic = workflow in the Water Threat Management system that can reduce the queue = wait time in the changing environment. Chair: Dr. Hans-Peter Bischof Reader: Dr. Gregor von Laszewski Observer: Dr. Minseok Kwon Masters Thesis Announcement: An Analytic Investigation into Self = Organizing Maps and their Network Topologies by Renee Baltimore Title: An Analytic Investigation into Self Organizing Maps and their = Network Topologies Candidate: Renee Baltimore E-mail: rfb4636@cs.rit.edu Defence Date: 11/17/2010 Time: 10:00 am Location: 70-3400 URL: = http://www.cs.rit.edu/~rfb4636/Renee%20Baltimore%20MS%20Thesis%20Report.pd= f Abstract: This paper details master=92s thesis work involving research and = investigation into the approach of self-organizing maps for clustering = of data, more specifically, clustering of image data and how this can be = used in understanding image composition. This work will build upon ideas = which have previously been explored, such as using self organizing maps = for identifying and grouping different regions of an image which may = possess similar features. A large part of this research is based upon = experimentation with a variety of topological models of the = self-organizing map network and investigation into what advantages these = different topologies afford the network in terms of its clustering = capabilities. Chair: Dr. Roger Gaborski Reader: Dr. Peter Anderson Observer: TBA Masters Thesis Announcement: Segmentation of Slap Fingerprints by Derek = M Johnson Title: Segmentation of Slap Fingerprints Candidate: Derek M Johnson E-mail: dmj3538@cs.rit.edu Defence Date: 11/17/2010 Time: 11:00 am Location: Golisano College of Computing and Information Sciences URL: www.cs.rit.edu/~dmj3538/thesis Abstract: This thesis describes a novel algorithm that segments the individual = fingerprints in a multi-print image. The algorithm segments the distal = phalanx portion of the fingers that appear in the image and labels the = finger for each identified print. The accuracy of this algorithm is = compared to the publicly-available reference implementation, NFSEG, part = of the NIST Biometric Image Software (NBIS) suite developed at NIST = (National Institute of Standards and Technology). The comparison is = performed over large set of fingerprint images captured from unique = individuals. Chair: Dr. Roger S. Gaborski Reader: Dr. Peter G. Anderson Observer: TBA Masters Thesis Announcement: An Analysis of the Million Module March = Algorithm Applied to the ATRON Robotic Platform by James Phipps Title: An Analysis of the Million Module March Algorithm Applied to the = ATRON Robotic Platform Candidate: James Phipps E-mail: jnp3646@cs.rit.edu Defence Date: 12/16/10 Time: 10:00 am Location: 70-3672 URL: http://www.cs.rit.edu/~jnp3646 Abstract: The Million Module March algorithm is a locomotion planning = algorithm for self-reconfiguring robotic systems. It was first = introduced by Robert Fitch and Zack Butler. It has already been proven = to successfully plan movement for a kinematic abstraction whose traits = are very different from the kinematic traits of the ATRON system. In = this work we further examine this algorithm, and an adaptation of it to = the ATRON robotic system. We examine a two dimensional proof of the reachability of = connected configurations of sliding squares, and expand the proof to the = three dimensional SlidingCube model of a self-reconfiguring robot. Using = this proof, we explore in greater detail the theoretical basis of the = Million Module March algorithm. We then modify the simulator used in the original Million Module = March works to simulate the ATRON platform, and run a series of = experiments. Ultimately, it is determined that the algorithm does not = consistently perform as desired on the ATRON platform. We demonstrate = that this performance is due to the inability of ATRON's kinematics to = guarantee reachability of connected configurations, and that therefore = no similar algorithm of sublinear complexity can be guaranteed to = perform as desired. Chair: Dr. Zack Butler Reader: Dr. Richard Zanibbi Observer: Dr. James Heliotis Masters Thesis Announcement: HadoopT - Breaking the Scalability Limits = of Hadoop by Anup Talwalkar Title: HadoopT - Breaking the Scalability Limits of Hadoop Candidate: Anup Talwalkar E-mail: ast1207@rit.edu Defence Date: 02/02/2011 Time: 12:00 pm Location: 70-3000 URL: http://www.cs.rit.edu/~ast1207/thesis/ Abstract: Hadoop is primarily used for its large scale data storage and data = intensive computing using MapReduce. Hadoop follows the master/slave = architecture with a single master known as NameNode and multiple slaves = known as DataNodes. Recently, limitations of Hadoop such as single point of failure of the = NameNode, limitations on scalability and fault tolerance of the cluster = have become prominent. The entire namespace of the Hadoop cluster is = stored on a single NameNode which limits the growth and data storage = capacity. This thesis work introduces multiple NameNodes architecture that allows = the Hadoop cluster to scale up further using distributed metadata = storage. We propose a new architecture known as HadoopT which was = designed for distributed metadata storage. This architecture consists of = two main components: low level metadata storage servers as NameNodes and = a high level metadata storage node as SuperNode to access the cluster. The effectiveness of this architecture is demonstrated by simulating a = HadoopT cluster and comparing its performance with the default Hadoop = configuration. The comparison was performed for both simulations based = on data transfers, job executions and boot up times of the nodes. The = main outcome of this work is overcoming the scalability limits of Hadoop = and the single point of failure. Chair: Dr. Minseok Kwon Reader: Dr. Rajendra K. Raj Observer: Dr. Stanis=C5=82aw P. Radziszowski Masters Thesis Announcement: Cloth Simulation using Hardware = Tessellation by David Huynh Title: Cloth Simulation using Hardware Tessellation Candidate: David Huynh E-mail: dxh9178@rit.edu Defence Date: 2/18/2011 Time: 10:00 am Location: 70-3405 URL: http://davehuynh.com/en/project/cloth_sim Abstract: Cloth simulation has long been a topic of interest in computer graphics = since the early works of Terzopoulos et al. Over the years many techniques have been = developed to simulate cloth. Though the general concern has been on the physical = accuracy of the simulation. As the simulation gets closer to computational sciences the = complexity also increases which at times may come at the cost of real-time performance. = With newer and more powerful graphics hardware coming out each year, researchers are = starting to shy away from the traditional CPU implementation and turning towards the GPU = to offload work. As the parallel nature of the graphics hardware offer much better = performance, researcher can process many tasks, originally sequential tasks, = simultaneously on the GPU. I propose a solution that will map current industry standard=92s = position-based dynamics on to the new graphics pipeline. The focus is on performance and visual = realism rather than physical accuracy. By implementing such solutions on the graphics = hardware, more detailed cloth behavior can be simulated with real-time performance. In = this paper, the described cloth simulation solution will be done completely on the GPU = through the use of hardware tessellation on the new DirectX 11 graphics pipeline. The = solution though originally designed specifically for cloth may also be adapted for = generic deformable object (soft body dynamics). Chair: Joe Geigel Reader: Reynold Bailey Observer: Christopher Egert Masters Thesis Announcement: Influence Maximization on Families of = Graphs by Andrei Mouravski Title: Influence Maximization on Families of Graphs Candidate: Andrei Mouravski E-mail: amouravski@gmail.com Defence Date: February 24, 2011 Time: 11:00 am Location: 70-3000 URL: www.amouravski.com/thesis.pdf Abstract: We examine the feasibility of computing the maximum expected influence = on paths and trees using the independent cascade model. We created a = polynomial time method for determining the expected influence from any = initial state on the independent cascade model on acyclic influence = graphs in O$\Big(|V(G)|^2\cdot\max{(|V(G)|,|V(E)|)}\Big)$ time. We = created a polynomial time program that would solve for the maximum = expected influence and optimal initially selected nodes on arbitrary = paths in O$(|V(G)|^5)$ time. We have also proved a bound of = O$(|V(T)|^4\cdot ({\frac{2|V(T)|}{\varepsilon})}^2)$ on approximating = the maximum expected influence on arbitrary trees with error equal to = $\varepsilon\cdot2|V(T)|$. Chair: Christopher M. Homan Reader: Ivona Bez=C3=A1kov=C3=A1 Observer: Stanis=C5=82aw P. Radziszowski Masters Thesis Announcement: An Integrated Environment for Data = Acquisition with Dynamic Changes in Wireless Sensor Networks by Albert = Nurgaliev Title: An Integrated Environment for Data Acquisition with Dynamic = Changes in Wireless Sensor Networks Candidate: Albert Nurgaliev E-mail: arn3237@cs.rit.edu Defence Date: 5/27/11 Time: 3:00 pm Location: 70-3672 URL: http://www.cs.rit.edu/~arn3237 Abstract: The wireless sensor network (WSN) is an important technology with a wide = variety of diverse applications in such domains as healthcare, military = forces and environmental monitoring. Our research aims at developing methods and tools capable of addressing = typical WSN problems such as energy constraint, low memory, and = computation capability of a sensor node by implementing a new WSN design = concept, as well as improving existing and developing new protocols. Our = research goal is to develop novel generic methodologies supporting a = higher level of design flexibility and possible architectural = optimization against multiple criteria such as the quality of data = (QoD), quality of service (QoS), and lifetime extension. Application = requirements may vary in terms of abovementioned parameters and = consequently there is no single platform that can be applied to all = domains. Moreover, current methods do not provide opportunities for = dynamic changes of either protocols or their parameters, which might = improve WSN agility and survivability in a harsh environment. Therefore, = this problem can be solved by integrating various protocols at different = layers within a single framework and supporting their dynamic selection = in order to adapt the network to varying application requirements. This thesis develops a mechanism which facilitates structural design and = implementation of an Integrated Environment for Data Acquisition with = Dynamic Changes (IEDADC). It features adaptation and integration of = protocols, protocol switching and automatic or manual selection as well = as some other features such as implementation of quality assurance and = localization techniques. The design methodology is tested by implementing a SN prototype = consisting of a base station and sensor nodes. Sun SPOT (Small = Programmable Object Technology) is used as a hardware basis for this = work. The software has been developed in Java programming language = including the host and sensor nodes=92 applications. The conducted experiments have confirmed the higher level of design = flexibility and optimization of the following criteria: energy = consumption, QoD and QoS. Chair: Leon Reznik Reader: Minseok Kwon Observer: Hans-Peter Bischof Masters Thesis Announcement: Security of the SHA-3 Candidates Keccak and = Blue Midnight Wish: Zero-sum Property by Liliya Andreicheva Title: Security of the SHA-3 Candidates Keccak and Blue Midnight Wish: = Zero-sum Property Candidate: Liliya Andreicheva E-mail: lna5520@rit.edu Defence Date: 06/16/2011 Time: 10:00 am Location: Break out room 3 URL: http://www.cs.rit.edu/~lna5520/ Abstract: The SHA-3 competition for the new cryptographic standard was initiated = by National Institute of Standards and Technology (NIST) in 2007. In the = following years, the event grew to one of the top areas currently being = researched by the CS and cryptographic communities. The first objective = of this thesis is to overview, analyse and critique the SHA-3 = competition. The second one is to perform an in-depth study of the = security of two candidate hash functions, the finalist Keccak and the = second round candidate Blue Midnight Wish. The study shall primarily = focus on zero-sum distinguishers. First we attempt to attack reduced = versions of these hash functions and see if any vulnerabilities can be = detected. This is followed by attacks on their full versions. In the = process, a novel approach is utilized in the search of zero-sum = distinguishers by employing SAT solvers. We conclude that while such = complex attacks can theoretically uncover undesired properties of the = two hash functions presented, such attacks are still far from being = fully realized due to current limitations in computing power. Chair: Stanislaw P. Radziszowski Reader: Edith Hemaspaandra Observer: Ivona Bezakova Masters Thesis Announcement: Scientific Visualization Using Pixar's = RenderMan by John Lukasiewicz Title: Scientific Visualization Using Pixar's RenderMan Candidate: John Lukasiewicz E-mail: jxl6110@rit.edu Defence Date: 6/29/2011 Time: 11:00 am <---- modified Location: 70-3576 <-----updated URL: http://www.cs.rit.edu/~jxl6110/ Abstract: This thesis will attempt to visualize astrophysical data that is = proprocessed and formatted by the Spiegel software using Pixar=E2=80=99s Renderman. The = output will consist of a large set of points and data associated with each point. The goal = is to create images that are both informative and aesthetically pleasing to the = viewer. This has been done many times before with software rendering and APIs such as = OpenGL or JOGL. This thesis will use Pixar=E2=80=99s Photorealistic Renderman, or = PRMan for short, as a renderer. PRMan is an industry proven standard renderer that is = based on the Renderman Interface Speci=EF=AC=81cation which has been in = development since 1989. The original version was released in September of 1989 and the latest = speci=EF=AC=81cation, version 3.2 was published in 2005. Since aesthetics is a subjective quality based on the viewers=E2=80=99 = preference, the only way to determine if an image is aesthetically pleasing is to survey = a general population. The thesis includes an experiment to assess the quality of = the new renders. Chair: Professor Hans-Peter Bischof, Ph.D. Reader: Reader: Professor Reynold Bailey, Ph.D. Observer: Professor Joe Geigel, Ph.D. Masters Thesis Announcement: Evaluation of RSL History as a Tool for = Assistance in the Development and Evaluation of Computer Vision = Algorithms by Ben Holm Title: Evaluation of RSL History as a Tool for Assistance in the = Development and Evaluation of Computer Vision Algorithms Candidate: Ben Holm E-mail: benbelly@gmail.com Defence Date: 8th August 2011 Time: 11:00 am Location: TBA URL: http://holmbase.dynalias.net/ms/ Abstract: A revision of Recognition Strategy Language (RSL), a domain-specific = language for pattern recognition algorithm development, is in = development. This language provides several tools for pattern = recognition algorithm implementation and analysis, including composition = of operations and a detailed history of those operations and their = results. This research focuses on that history and shows that for some = problems it provides an improvement over traditional methods of = gathering information. When designing a pattern recognition algorithm, bookkeeping code in the = form of copious logging and tracing code must be written and analyzed in = order to test the effectiveness of procedures and parameters. The amount = of data grows when dealing with video streams; new organization and = searching tools need to be designed in order to manage the large volume = of data. General purpose languages have techniques like Aspect Oriented = Programming intended to address this prob- lem, but a general approach = is limited because it does not provide tools that are useful to only one = problem domain. By incorporating support for this bookkeeping work = directly into the language, RSL provides an improvement over the general = approach in both development time and ability to evaluate the algorithm = being designed for some problems. The utility of RSL is tested by evaluating the implementation process of = a computer vision algorithm for recognizing American Sign Language = (ASL). RSL history is examined in terms of its use in the development = and evaluation stages of the algorithm, and the usefulness of the = history is stated based on the benefit seen at each stage. RSL is found = to be valuable for a portion of the algorithm involving distinct steps = that provide opportunity for comparison. RSL was less beneficial for the = dynamic programming portion of the algorithm. Compromises were made for = performance reasons while implementing the dynamic programming solution = and the inspection at every step of what amounts to a brute-force search = was less informative. We suggest that this investigation could be = continued by testing with a larger data set and by comparing this ASL = recognition algorithm with another. Chair: Dr. Matthew Fluet, Dr. Richard Zanibbi Reader: N/A Observer: Dr. Hans-Peter Bischof Masters Thesis Announcement: Semantic Analysis of Facial Gestures from = Video Using a Bayesian Framework by Gati Vashi Title: Semantic Analysis of Facial Gestures from Video Using a Bayesian = Framework Candidate: Gati Vashi E-mail: gdv7368@cs.rit.edu Defence Date: 5th August 2011 Time: 10:00 am Location: Breakout Room 3576 URL: http://www.cs.rit.edu/~gdv7368/ Abstract: The continuous growth of video technology has resulted in increased = research into the semantic analysis of video. The multimodal property of = the video has made this task very complex. The objective of this thesis = was to research, implement and examine the underlying methods and = concepts of semantic analysis of videos and improve upon the state of = the art in automated emotion recognition by using semantic knowledge in = the form of Bayesian inference. The main domain of analysis is facial = emotion recognition from video, including both visual and vocal aspects = of facial gestures. The goal is to determine if an expression on a = person=92s face in a sequence of video frames is happy, sad, angry, = fearful or disgusted. A Bayesian network classification algorithm was = designed and used to identify and understand facial expressions in = video. The Bayesian network is an attractive choice because it provides = a probabilistic environment and gives information about uncertainty from = knowledge about the domain. This research contributes to current = knowledge in two ways: by providing a novel algorithm that uses edge = differences to extract keyframes in video and facial features from the = keyframe, and by testing the hypothesis that combining two modalities = (vision with speech) yields a better classification result (low false = positive rate and high true positive rate) than either modality used = alone. Chair: Dr. Roxanne Canosa Reader: Dr. Leon Reznik Observer: Dr. Zack Butler Masters Thesis Announcement: Scalable Cooperative Caching Algorithm = based on Bloom Filters by Nodirjon Siddikov Title: Scalable Cooperative Caching Algorithm based on Bloom Filters Candidate: Nodirjon Siddikov E-mail: nbs8816@rit.edu Defence Date: August 22, 2011 Time: 10:00 am Location: 70-3576 (Breakout room 3) URL: http://www.cs.rit.edu/~nbs8816/thesis.html Abstract: This thesis presents the design, implementation and evaluation of a = novel cooperative caching algorithm based on the bloom filter data = structure. The new algorithm uses a decentralized approach to resolve = the problems that prevent the existing solutions from being scalable. = The problems consist of an overloaded manager, a communication overhead = among clients, and a memory overhead on the global cache. The new = solution reduces the manager load and the communication overhead by = distributing the global cache information among cooperating clients. = Thus, the manager no longer maintains the global cache. Furthermore, the = memory overhead is decreased due to a bloom filter data structure. The = bloom filter saves memory space in the global cache and makes the new = algorithm scalable. The correctness of the research hypothesis is = verified by running experiments on the caching algorithms. The = experiment results demonstrate that the new caching algorithm maintains = a low block access time as existing algorithms. In addition, the new = algorithm decreases the manager load by the factor of nine. Moreover, = the communication overhead is reduced by nearly a factor of six as a = result of distributing the global cache to clients. Finally, the results = show a significant reduction in the memory overhead which also = contributes to the scalability of the new algorithm. Chair: Prof. Hans-Peter Bischof Reader: Prof. Minseok Kwon Observer: Prof. James Heliotis Masters Thesis Announcement: Balancing Truth Error and Manual Processing = in the PDQ System by Douglass Huang Title: Balancing Truth Error and Manual Processing in the PDQ System Candidate: Douglass Huang E-mail: douglasshuang@gmail.com Defence Date: Thursday, August 25, 2011 Time: 10:00 am Location: Laboratory for Computational Studies, 70-3400 URL: http://www.cs.rit.edu/~dxh9787/Thesis/MS_Thesis.html Abstract: Production Data Quality (PDQ) is a specialized pattern classifier whose = main purpose is to assess independently the data quality of a production = classifier. It accomplishes this by producing a high quality Truth from = the source input, and then using the Truth to identify errors in the = production classifier's output data. Previous studies have shown close = agreement between PDQ processing outcomes and a particular mathematical = model of the system. In this study we describe and analyze an expanded model that addresses = the potential tradeoff between Truth error and manual processing in PDQ, = with an eye towards informing operational decisions about precision and = efficiency. Using statistical data from the 2010 Census PDQ system as = input, we examine the predictions of the new model in order to = understand their potential usefulness. The outcomes show strong agreement between two methods for estimating = Projected Truth error rate, supporting the validity of both methods as = well as the static model. In addition, the new Projector model gives = tight bounds on the projected manual processing rate and reveals a = characteristic relationship between Projected Truth error and projected = manual processing. We explore a practical application of this model for = tuning PDQ, and we find an opportunity to achieve a 60% efficiency = increase for the selected sample, while maintaining an acceptable degree = of precision. Chair: Roger S. Gaborski, Ph.D. Reader: Peter G. Anderson, Ph.D. Observer: K. Bradley Paxton, Ph.D. Masters Thesis Announcement: Establishing Parameters for Problem = Difficulty in Permutation-Based Genetic Algorithms by Adam F. Nogaj Title: Establishing Parameters for Problem Difficulty in = Permutation-Based Genetic Algorithms Candidate: Adam F. Nogaj E-mail: adamfnogaj@gmail.com Defence Date: October 13, 2011 Time: 10:00 am Location: GOL 3455 URL: http://caffeinehangover.com/thesis-announcement/ Abstract: This thesis examines the performance of genetic algorithm (GA) crossover = techniques within two problems: n-queens with poison (NQWP) and = processor scheduling (PS), each at problem sizes of 32, 64, and 128. = The specific crossover techniques studied were cycle crossover, order = crossover, partially mapped crossover, merging crossover, and one-point, = two-point, and uniform signature representation crossover, in addition = to various greedy approaches. In conjunction with tests that vary = crossover techniques, experimentation was performed to determine what = percentage of problem constraints (poisoned squares or precedence = relationships between tasks, respectively) makes the problems most = difficult to solve, that is, the constraint densities at which optimal = solutions require the highest number of GA fitness evaluations. While = minor fluctuations in difficulty occur upon variations in fitness = function and problem size, the NQWP problem is most difficult around a = constraint density of 0.8 and the PS problem is most difficult around = constraint densities of 0.2 to 0.3. Even within an individual problem, = one crossover technique does not irreproachably outperform others, = however cycle crossover stands out in its performance in the processor = scheduling problem while merging crossover and uniform signature = crossovers most often perform well for n-queens with poison. Chair: Roger S. Gaborski Reader: Peter G. Anderson Observer: Stanislaw P. Radziszowski Masters Thesis Announcement: Ray-Traced Simulation of Radiation Pressure = for Optical Lift by Timothy J. Peterson Title: Ray-Traced Simulation of Radiation Pressure for Optical Lift Candidate: Timothy J. Peterson E-mail: tjp4434@rit.edu Defence Date: November 8, 2011 Time: 11:00 am Location: GOL-3400 URL: http://www.timpeterson.org/rit/thesis Abstract: When light refracts at a surface, it changes the direction and intensity = of the light rays. By Newton's 3rd law, this process imparts a small = momentum to the object. This effect can be exploited to manipulate very = small objects, by carefully selecting the shape and optical properties = of the object. This thesis describes a computational model based on the = laws of physics to determine the forces and torques resulting from light = interacting with a hemicylindrical particle. This model has been = successfully used to predict for the first time a phenomenon called = optical lift. Chair: Joe Geigel Reader: Grover Swartzlander Observer: Hans-Peter Bischof Masters Thesis Announcement: Layout-Based Substitution Tree Indexing and = Retrieval of Mathematical Expressions by Thomas Schellenberg Title: Layout-Based Substitution Tree Indexing and Retrieval of = Mathematical Expressions Candidate: Thomas Schellenberg E-mail: mtschellenberg@gmail.com Defence Date: Wednesday, November 16 Time: 11:00 am Location: 70-3000 URL: http://www.cs.rit.edu/~mts4386/Thesis-Info.html Abstract: We introduce a new system for layout-based indexing and retrieving = mathematical expressions using substitution trees. Substitution trees = can efficiently store and find hierarchically-structured data based on = similarity. Previously Kolhase and Sucan applied substitution trees to = indexing mathematical expressions in operator tree representation = (Content MathML) and query-by-expression retrieval. In this = investigation, we use substitution trees to index mathematical = expressions in symbol layout tree representation (\LaTeX) to group = expressions based on the similarity of their symbols, symbol layout, = sub-expressions and size. We describe our novel design and implementation of the substitution tree = indexing and retrieval algorithms and our many significant contributions = to the behavior of these algorithms, including: allowing substitution = trees to index and retrieve mathematical expressions instead of = predicates; introducing a bias in the insertion function that helps = group expressions in the index based on similarity; modifying the search = function to find expressions that are not identical yet still relevant = to a search query; and ranking search results based on their similarity = to the search query. We provide an experiment testing our system against the term = frequency-inverse document frequency (TF-IDF) keyword-based system of = Zanibbi and Yuan and demonstrate that: in many cases, the two systems = are comparable; our system excelled at finding expressions identical to = the search query and expressions containing relevant sub-expressions; = and our system experiences some limitations due to the insertion bias = and the presence of \LaTeX~formatting in expressions. Future work = includes: designing a different insertion bias that improves the quality = of search results; modifying the behavior of the search and ranking = functions; and extending the scope of the system so that it can index = websites or non-\LaTeX~expressions (Content MathML, images, etc.). Overall, we present a promising first attempt at layout-based = substitution tree indexing and retrieval for mathematical expressions. = We believe that this method will prove beneficial to the field as a = whole because storing and finding expressions based on similarity is = ideal for mathematical information retrieval and our experiment suggests = that our substitution tree system will perform much better than other = leading search engines upon further refinement. Chair: Richard Zanibbi Reader: Bo Yuan Observer: Paul Tymann