About
What is the SMART-101 dataset?
Recent times have witnessed an increasing number of applications of deep neural networks to tasks that require superior cognitive abilities, e.g., playing Go, generating art, ChatGPT, etc. Such dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children in the 6-8 age group. Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and its solution requires a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning, among others. To scale our dataset towards training deep neural networks, we programmatically generate entirely new instances for each puzzle while retaining its solution algorithm. To benchmark performance on SMART-101, we propose a vision and language meta-learning model using varied state-of-the-art backbones. Our experiments reveal that while powerful deep models perform reasonably on puzzles in a supervised setting, they do no better than random guessing when evaluated for generalization. We also evaluate the recent ChatGPT and other large language models on a part of SMART-101 and find that while these models show convincing reasoning abilities, their answers are often incorrect.
Puzzles
Example Puzzles from SMART-101 Dataset
We show below several example puzzles from the various skill set categories in the SMART-101 dataset. To see examples from all the 101 puzzles,
please see here. Most of the puzzles in SMART-101 have an image and a question. To solve the puzzle, a method must use the content
of the image and connect it with the details in the question to derive an algorithm -- usually a simple math algorithm. The method must then select the solution
from the five answer candidates to complete the puzzle.
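The task format described above (an image, a question, and five answer candidates) can be sketched as a small data structure. This is only an illustration: the class name `PuzzleInstance`, its field names, and the image file name are assumptions made here, not the dataset's actual schema. The parser simply reads an option line in the form shown on this page.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches "A: Flower", "B: Disk", ... in a single line of answer candidates.
OPTION_PATTERN = re.compile(r"([A-E]):\s*(.*?)(?=\s+[A-E]:|$)")


def parse_options(line: str) -> dict[str, str]:
    """Split a line such as 'A: 18 B: 21 C: 17 D: 20 E: 14' into a dict."""
    return {label: text.strip() for label, text in OPTION_PATTERN.findall(line)}


@dataclass
class PuzzleInstance:
    """Hypothetical container; field names are illustrative assumptions."""
    image_path: str                 # placeholder path, not a real dataset file
    question: str
    options: dict[str, str]         # keys "A".."E"
    answer: Optional[str] = None    # unknown here: this page does not list answers


def chance_accuracy(puzzle: PuzzleInstance) -> float:
    """Expected accuracy of guessing uniformly among the answer candidates."""
    return 1.0 / len(puzzle.options)


if __name__ == "__main__":
    puzzle = PuzzleInstance(
        image_path="path_tracing_example.png",  # assumed name for illustration
        question="Which object is linked to the hat?",
        options=parse_options("A: Flower B: Disk C: Book D: Drink E: Ball"),
    )
    print(puzzle.options)
    print(chance_accuracy(puzzle))
```

With five candidates per puzzle, uniform guessing is expected to be correct one time in five, which is a useful reference point when reading the generalization results.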
Path Tracing
Question: Which object is linked to the hat?
A: Flower B: Disk C: Book D: Drink E: Ball
Algebra
Question: The correct additions in the squares were performed according to the pattern shown in the table.
What number is covered by the question mark?
A: 18 B: 21 C: 17 D: 20 E: 14
Counting
Question: All the flowers outside the triangle and outside the rectangle simultaneously are picked up.
The number of flowers which are picked up is:
A: 10 B: 13 C: 14 D: 7 E: 11
Spatial Reasoning
Question: A community with 8 huts has 4 straight roads and 4 circular roads. The drawing shows 7 of the huts. On every straight road there are 2 huts.
On every circular road, there are also 2 huts. Where on the drawing should the 8th hut be added?
A: A B: B C: C D: D E: E
Pattern Finding
Question: Carl had some 5-ray slices as depicted in the picture. He glued them together as depicted in the picture on the right.
At minimum, how many slices did he use?
A: 8 B: 1 C: 4 D: 7 E: 6
Path Tracing
Question: As shown in the image, Minna can only jump from one circle to a neighboring circle connected by a line. She cannot jump into any circle more than once.
She starts at circle 1 and needs to make exactly 3 jumps to reach circle 3.
In how many different ways can Minna do this?
A: 2 B: 0 C: 5 D: 1 E: 4
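Once the graph has been read off the image, a puzzle of this kind reduces to counting simple paths of a fixed length. The sketch below shows one way to do that with a depth-first search. The edge list is a made-up graph used purely for illustration; the actual connections are given only in the puzzle image, so the printed count is not the answer to the puzzle above.

```python
from collections import defaultdict


def count_paths(edges, start, goal, jumps):
    """Count paths from start to goal that use exactly `jumps` edges
    and never visit any circle more than once."""
    adjacency = defaultdict(set)
    for a, b in edges:          # undirected: a jump works in both directions
        adjacency[a].add(b)
        adjacency[b].add(a)

    def walk(node, remaining, visited):
        if remaining == 0:
            return 1 if node == goal else 0
        total = 0
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                total += walk(neighbor, remaining - 1, visited | {neighbor})
        return total

    return walk(start, jumps, frozenset({start}))


# Hypothetical graph (NOT the one in the puzzle image).
EXAMPLE_EDGES = [(1, 2), (2, 3), (1, 4), (4, 5), (5, 3), (2, 5), (2, 4)]

if __name__ == "__main__":
    # Valid routes in this made-up graph: 1-2-5-3, 1-4-2-3, 1-4-5-3.
    print(count_paths(EXAMPLE_EDGES, start=1, goal=3, jumps=3))  # prints 3
```

Because a visited circle is never re-entered, any route that reaches the goal before the last jump is automatically excluded.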
Arithmetic
Question: A bird jumps on a fence from the post on one end to the other end.
He needs 1 second for each jump. He makes 9 jumps ahead and then 7 jumps back.
Then he again makes 9 jumps ahead and 7 jumps back, and so on.
In how many seconds can the bird get from one end to the other end?
A: 74 B: 71 C: 75 D: 73 E: 72
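The simple math algorithm behind this puzzle can be sketched as a direct simulation. The number of posts appears only in the puzzle image, so the sketch takes the number of jumps between the two ends as a parameter; the name `gaps` and the value used in the example are assumptions for illustration, not the puzzle's actual fence.

```python
def seconds_to_cross(gaps, forward=9, backward=7):
    """Simulate a bird that alternates `forward` jumps ahead and `backward`
    jumps back, one second per jump, and return the time at which it first
    reaches the far end, `gaps` jumps away from the start."""
    if gaps <= 0:
        return 0
    if forward <= backward and gaps > forward:
        raise ValueError("the bird never makes net progress to the far end")

    position = 0
    seconds = 0
    while True:
        for _ in range(forward):
            position += 1
            seconds += 1
            if position == gaps:      # arrives during a forward run
                return seconds
        for _ in range(backward):
            position -= 1
            seconds += 1


if __name__ == "__main__":
    # Made-up fence with 10 gaps: 9 ahead, 7 back (at gap 2 after 16 s),
    # then 8 more jumps ahead.
    print(seconds_to_cross(10))  # prints 24
```

The key observation the simulation makes explicit is that the bird finishes in the middle of a forward run, so the answer is not simply the distance divided by the net progress per cycle.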
Logic
Question: Gina encrypts words by applying the grid presented. For instance,
the word UJEV is encrypted as IO IU VU EG. What word did Gina encrypt as EO IU VG IG?
A: RLYE B: CJBL C: GJLF D: IXEL E: TLRH
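Even without the grid, the worked example in the question constrains the answer. The sketch below eliminates candidates that contradict the known pairs (U to IO, J to IU, E to VU, V to EG). It assumes the cipher is a one-to-one substitution of each letter by a two-letter code, which is an assumption about the grid rather than something stated in the text. The full grid from the image is still needed to pick the final answer.

```python
def is_consistent(candidate, codes, known_pairs):
    """Return True if `candidate` could encrypt to `codes` under a one-to-one
    letter-to-code mapping that agrees with `known_pairs`."""
    if len(candidate) != len(codes):
        return False
    letter_to_code = dict(known_pairs)
    code_to_letter = {code: letter for letter, code in known_pairs.items()}
    for letter, code in zip(candidate, codes):
        # A letter must always map to the same code ...
        if letter_to_code.setdefault(letter, code) != code:
            return False
        # ... and a code must always come from the same letter.
        if code_to_letter.setdefault(code, letter) != letter:
            return False
    return True


# Pairs taken from the example in the question: UJEV -> IO IU VU EG.
KNOWN_PAIRS = dict(zip("UJEV", "IO IU VU EG".split()))
CIPHERTEXT = "EO IU VG IG".split()
CANDIDATES = ["RLYE", "CJBL", "GJLF", "IXEL", "TLRH"]

if __name__ == "__main__":
    survivors = [w for w in CANDIDATES if is_consistent(w, CIPHERTEXT, KNOWN_PAIRS)]
    print(survivors)  # prints ['CJBL', 'GJLF']
```

The code IU is known to stand for J, so only candidates with J in second position survive; choosing between the remaining two requires reading the grid.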
Measurement
Question: Ariel had a few planks with a height of 2 units and a length of 4 units.
Making use of the planks, he created the decoration depicted. How wide is the decoration?
A: 44 B: 24 C: 32 D: 28 E: 20
Baseline Performances
Puzzle split Performances
Text-only subset Performances
License
The SMART-101 dataset is released under `CC-BY-SA-4.0`.
Created by Mitsubishi Electric Research Laboratories (MERL), 2022-2023
SPDX-License-Identifier: CC-BY-SA-4.0
Citation and Contact
If you use this dataset, please cite the following CVPR 2023 paper:
@article{cherian2022deep,
title={Are Deep Neural Networks SMARTer than Second Graders?},
author={Cherian, Anoop and Peng, Kuan-Chuan and Lohit, Suhas and Smith, Kevin and Tenenbaum, Joshua B},
journal={arXiv preprint arXiv:2212.09993},
year={2022}
}
For questions or issues, contact:
Anoop Cherian (cherian at merl.com), Kuan-Chuan Peng (kpeng at merl.com), Suhas Lohit (slohit at merl.com)
Acknowledgements: We thank Joanna Matthiesen (CEO of Math Kangaroo USA) for sharing with us the human performance statistics and permission to use the puzzle images from the Math Kangaroo USA Olympiad.