## About

### What is the SMART-101 dataset?

Recent times have witnessed an increasing number of applications of deep neural networks towards
solving tasks that require superior cognitive abilities, e.g., playing Go, generating art, ChatGPT, etc.
Such dramatic progress raises the question: *how generalizable are neural networks in solving problems
that demand broad skills?* To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task
and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities of neural
networks in solving visuo-linguistic puzzles designed specifically for children in the 6-8 age group.
Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and their solution
needs a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning, among others.
To scale our dataset towards training deep neural networks, we programmatically generate entirely new instances
for each puzzle, while retaining its solution algorithm. To benchmark performance on SMART-101, we propose
a vision and language meta-learning model using varied state-of-the-art backbones. __Our experiments reveal that
while powerful deep models offer reasonable performance on puzzles in a supervised setting, they do no
better than random accuracy when analyzed for generalization.__ We also evaluate the recent ChatGPT and other
large language models on a part of SMART-101 and find that while these models show convincing reasoning abilities,
the answers are often incorrect.
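
The idea of generating new instances while keeping the solution algorithm fixed can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the addition puzzle, the `make_instance` function, and the dictionary keys are hypothetical and are not the generators used to build SMART-101.

```python
import random

def make_instance(rng):
    """Create one new instance of a hypothetical addition puzzle."""
    a, b = rng.randint(2, 9), rng.randint(2, 9)
    answer = a + b  # the solution algorithm stays the same for every instance
    # Draw four distinct wrong answers close to the correct one.
    pool = [v for v in range(answer - 5, answer + 6) if v != answer and v > 0]
    options = rng.sample(pool, 4) + [answer]
    rng.shuffle(options)
    question = f"Tom has {a} apples and picks {b} more. How many apples does he have now?"
    return {
        "question": question,
        "options": options,               # five answer candidates
        "label": options.index(answer),   # index of the correct candidate
    }

rng = random.Random(0)  # fixed seed so the sketch is reproducible
instances = [make_instance(rng) for _ in range(3)]
```

Only the numbers in the question change between instances; the rule that maps them to the answer does not.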

## Puzzles

### Example Puzzles from SMART-101 Dataset

We show below several example puzzles from the various skill set categories in the SMART-101 dataset. To see examples from all the 101 puzzles,
please see here. Most of the puzzles in SMART-101 have an image and a question. To solve the puzzle, a method must use the content
of the image and connect it with the details in the question to derive an *algorithm* -- usually a simple math algorithm. The method must then select the solution
from the five answer candidates to complete the puzzle.
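
As a minimal sketch of the final selection step, the snippet below maps a numeric prediction to the closest of the five answer candidates. The `PuzzleInstance` class, its field names, the `select_option` helper, and the solver output of 19.6 are all hypothetical; this is not the baseline model's code or the dataset's file format.

```python
from dataclasses import dataclass

LETTERS = "ABCDE"

@dataclass
class PuzzleInstance:
    image_path: str   # hypothetical field name; placeholder path below
    question: str
    options: list     # five answer candidates, in A..E order

def select_option(instance, predicted):
    """Return the letter of the candidate closest to a numeric prediction."""
    distances = [abs(float(opt) - predicted) for opt in instance.options]
    return LETTERS[distances.index(min(distances))]

puzzle = PuzzleInstance(
    image_path="puzzle.png",
    question="What number is covered by the question mark?",
    options=[18, 21, 17, 20, 14],
)

# 19.6 is a made-up solver output, not the puzzle's answer.
choice = select_option(puzzle, predicted=19.6)

# With five candidates, uniform random guessing is right 1 time in 5.
chance_accuracy = 1 / len(puzzle.options)
```

This sketch only handles numeric candidates; puzzles whose candidates are words or image labels would need a different matching rule.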

#### Path Tracing

**Question:** Which object is linked to the hat?

**A:** Flower **B:** Disk **C:** Book **D:** Drink **E:** Ball

#### Algebra

**Question:** The correct additions in the squares were performed according to the pattern shown in the table.
What number is covered by the question mark?

**A:** 18 **B:** 21 **C:** 17 **D:** 20 **E:** 14

#### Counting

**Question:** All the flowers outside the triangle and outside the rectangle simultaneously are picked up.
The number of flowers which are picked up is:

**A:** 10 **B:** 13 **C:** 14 **D:** 7 **E:** 11

#### Spatial Reasoning

**Question:** A community with 8 huts has 4 straight roads and 4 circular roads. The drawing shows 7 of the huts. On every straight road there are 2 huts.
On every circular road, there are also 2 huts. Where on the drawing should the 8th hut be added?

**A:** A **B:** B **C:** C **D:** D **E:** E

#### Pattern Finding

**Question:** Carl had some 5-ray slices as depicted in the picture. He glued them together as depicted in the picture on the right.
At minimum, how many slices did he use?

**A:** 8 **B:** 1 **C:** 4 **D:** 7 **E:** 6

#### Path Tracing

**Question:** As shown in the image, Minna can only jump from one circle to a neighboring circle connected by a line. She cannot jump into any circle more than once.
She starts at circle 1 and needs to make exactly 3 jumps to reach circle 3.
In how many different ways can Minna do this?

**A:** 2 **B:** 0 **C:** 5 **D:** 1 **E:** 4
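
A puzzle of this kind reduces to counting simple paths with a fixed number of edges. The sketch below does this with a depth-first search. The graph is made up for illustration, since the puzzle's actual circles and lines are only in its image, so the count it produces is not the answer to the puzzle above. The names `EDGES` and `count_paths` are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical connections between circles; NOT the graph from the puzzle image.
EDGES = [(1, 2), (2, 3), (1, 4), (4, 5), (5, 3), (2, 5), (2, 4)]

graph = defaultdict(set)
for u, v in EDGES:
    graph[u].add(v)
    graph[v].add(u)

def count_paths(graph, start, goal, jumps):
    """Count paths from start to goal that use exactly `jumps` jumps
    and never enter the same circle twice."""
    def walk(node, remaining, visited):
        if remaining == 0:
            return 1 if node == goal else 0
        total = 0
        for nxt in graph[node]:
            if nxt not in visited:
                total += walk(nxt, remaining - 1, visited | {nxt})
        return total
    return walk(start, jumps, {start})

ways = count_paths(graph, start=1, goal=3, jumps=3)
# On this made-up graph the three paths are 1-2-5-3, 1-4-5-3 and 1-4-2-3.
```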

#### Arithmetic

**Question:** A bird jumps on a fence from the post on one end to the other end.
He needs 1 second for each jump. He makes 9 jumps ahead and then 7 jumps back.
Then he again makes 9 jumps ahead and 7 jumps back, and so on.
In how many seconds can the bird get from one end to the other end?

**A:** 74 **B:** 71 **C:** 75 **D:** 73 **E:** 72
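
The algorithm behind a puzzle like this can be written as a short jump-by-jump simulation. The number of jumps between the two end posts comes from the image, which is not reproduced here, so the value of 17 used below is an assumption, and `seconds_to_cross` is a hypothetical helper rather than code shipped with the dataset.

```python
def seconds_to_cross(gaps, ahead=9, back=7):
    """Simulate the bird one jump per second and return the time
    at which it first lands on the far end post.

    `gaps` is the number of jumps separating the two end posts.
    """
    if gaps < 1:
        raise ValueError("gaps must be at least 1")
    if ahead <= back and gaps > ahead:
        raise ValueError("the bird never reaches the far end")
    position = seconds = 0
    while True:
        # Forward phase: stop as soon as the far post is reached.
        for _ in range(ahead):
            position += 1
            seconds += 1
            if position == gaps:
                return seconds
        # Backward phase: the bird loses some of its progress.
        for _ in range(back):
            position -= 1
            seconds += 1

# Assumed fence length of 17 gaps; the real value must be read off the image.
print(seconds_to_cross(17))  # 73
```

Each full cycle of 9 jumps ahead and 7 back takes 16 seconds and gains 2 posts, so with 17 gaps the bird stands at post 8 after 64 seconds and covers the remaining 9 posts in its fifth forward phase.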

#### Logic

**Question:** Gina encrypts words applying the grid presented. For instance,
the word UJEV is encrypted as IO IU VU EG. What word did Gina encrypt as EO IU VG IG?

**A:** RLYE **B:** CJBL **C:** GJLF **D:** IXEL **E:** TLRH

#### Measurement

**Question:** Ariel had a few planks with a height of 2 units and a length of 4 units.
Making use of the planks, he created the decoration depicted. How wide is the decoration?

**A:** 44 **B:** 24 **C:** 32 **D:** 28 **E:** 20

## Baseline Performances

**Puzzle split performances:** *77.1%*, *21.6%*, *49.6%*, *23.4%*, *21.6%*, *24.1%*, *18.9%*, *25.3%*

**Text-only subset performances:** *60.4%*, *36.4%*, *26.4%*, *12.7%*

## License

The SMART-101 dataset is released under `CC-BY-SA-4.0`.

Created by Mitsubishi Electric Research Laboratories (MERL), 2022-2023

SPDX-License-Identifier: CC-BY-SA-4.0

## Citation and Contact

If you use this dataset, please cite the following CVPR 2023 paper:

    @article{cherian2022deep,
      title={Are Deep Neural Networks SMARTer than Second Graders?},
      author={Cherian, Anoop and Peng, Kuan-Chuan and Lohit, Suhas and Smith, Kevin and Tenenbaum, Joshua B},
      journal={arXiv preprint arXiv:2212.09993},
      year={2022}
    }

For questions or issues, contact:

**Anoop Cherian** (cherian at merl.com),

**Kuan-Chuan Peng** (kpeng at merl.com),

**Suhas Lohit** (slohit at merl.com)

**Acknowledgements:** We thank Joanna Matthiesen (CEO of Math Kangaroo USA) for sharing with us the human performance statistics and permission to use the puzzle images from the Math Kangaroo USA Olympiad.