Multimodal Retrieval and Execution Monitoring Using Rich Recipe Representation


Recipes, such as those for preparing food or assembling furniture, exist as textual or image documents, which makes it difficult for machines to read them, reason about them, and handle ambiguity. We consider the problem of reasoning with recipe data held in multiple representations.

Our representation is used in a web-based decision support system that helps users perform constrained queries across multiple modalities and monitor an agent executing a recipe (for example, cooking). The approach overcomes the shortcomings above by enhancing the recipe representation with additional knowledge: outcomes such as allergen information, and possible failures and their solutions for each atomic cooking step.
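As a minimal sketch of what such an enriched representation could look like, the Python below attaches allergens to a recipe and failure-to-solution pairs to each atomic step, then runs one constrained query on the outcome and one on the process. The names `Step`, `Recipe`, `without_allergens`, `using_action`, and the sample data are all hypothetical and chosen for illustration; they are not the schema or API of the system described here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """One atomic step plus the extra knowledge attached to it (illustrative)."""
    action: str
    ingredients: list[str] = field(default_factory=list)
    # Maps a possible failure to a suggested solution.
    failures: dict[str, str] = field(default_factory=dict)


@dataclass
class Recipe:
    name: str
    steps: list[Step]
    # Outcome-level knowledge, e.g. allergens present in the finished dish.
    allergens: set[str] = field(default_factory=set)


def without_allergens(recipes: list[Recipe], banned: set[str]) -> list[Recipe]:
    """Query on the outcome: keep recipes that contain none of the banned allergens."""
    return [r for r in recipes if not (r.allergens & banned)]


def using_action(recipes: list[Recipe], action: str) -> list[Recipe]:
    """Query on the process: keep recipes that include the given action."""
    return [r for r in recipes if any(s.action == action for s in r.steps)]


# Made-up sample data, assumed purely for the example.
RECIPES = [
    Recipe(
        "pancakes",
        [
            Step("mix", ["flour", "egg", "milk"], {"lumpy batter": "whisk longer"}),
            Step("fry", ["batter"], {"burnt": "lower the heat"}),
        ],
        {"gluten", "egg", "dairy"},
    ),
    Recipe(
        "fruit salad",
        [Step("chop", ["apple", "banana"]), Step("mix", ["apple", "banana"])],
    ),
]

egg_free = without_allergens(RECIPES, {"egg"})
fried = using_action(RECIPES, "fry")
```

Keeping failures and solutions on the step, rather than on the recipe as a whole, is one way to make the same knowledge usable both for querying and for step-by-step monitoring.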

Complex instructions such as furniture assembly

Ability to perform expressive and constrained queries about

– the outcome of a recipe

– the process of recipe preparation

Ability to monitor the progress of an executor automatically following the instructions.
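The monitoring capability can be illustrated with a small sketch, assuming the executor reports one event per observation and that the plan is a fixed sequence of atomic steps. The `Monitor` class, its fields, and the event strings are hypothetical and are not taken from the system itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Monitor:
    """Tracks an executor against an ordered plan of atomic steps (illustrative)."""
    plan: list[str]
    # Known failures mapped to suggested solutions.
    fixes: dict[str, str] = field(default_factory=dict)
    done: int = 0  # number of plan steps completed so far

    def observe(self, event: str) -> str:
        """Classify one reported event as a known failure, progress, or a deviation."""
        if event in self.fixes:
            return f"failure '{event}': try {self.fixes[event]}"
        if self.done < len(self.plan) and event == self.plan[self.done]:
            self.done += 1
            return f"ok ({self.done}/{len(self.plan)})"
        if self.done < len(self.plan):
            return f"unexpected '{event}'; expected '{self.plan[self.done]}'"
        return f"unexpected '{event}'; plan already complete"


monitor = Monitor(["mix", "fry", "serve"], {"burnt": "lower the heat"})
log = [monitor.observe(e) for e in ["mix", "burnt", "serve", "fry"]]
```

Here a known failure does not advance the plan, and an out-of-order step is reported together with the step that was expected, so the executor can be redirected.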

