
# Large-Scale Long-Tailed Recognition in an Open World

## Motivation and advantages of the model

Real-world data often follow a **long-tailed** and **open-ended** distribution.

Three common problems and their limitations:

* ***Imbalanced classification***: not sensitive to novelty
* ***Few shot learning***: cannot avoid forgetting
* ***Open set recognition*** (OOD detection): cannot transfer knowledge

A practical recognition system must classify among **majority and minority classes**, **generalize from a few known instances**, and recognize **novelty when it encounters a never-seen instance**.

This paper defines **Open Long-Tailed Recognition (OLTR)** as learning from such naturally distributed data and optimizing the classification accuracy **over a balanced test set** that includes **head, tail, and open classes**.
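A balanced test set gives every class the same number of samples, so overall accuracy equals the mean of per-class accuracies and a tail class weighs as much as a head class. The sketch below computes that mean on toy labels. Grouping all open-class samples under one hypothetical `OPEN` label is an assumption made for illustration, not necessarily how the paper scores open classes.

```python
from collections import defaultdict

OPEN = -1  # hypothetical label shared by all samples from open (never-seen) classes


def mean_per_class_accuracy(y_true, y_pred):
    """Average the accuracy of each ground-truth class so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy balanced test set: head class 0, tail class 1, and open-class samples (2 each).
y_true = [0, 0, 1, 1, OPEN, OPEN]
y_pred = [0, 0, 0, 1, OPEN, 1]
print(mean_per_class_accuracy(y_true, y_pred))  # 0.6666666666666666
```

On an unbalanced test set the two numbers would diverge, and plain accuracy would be dominated by the head classes.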

OLTR must handle ***imbalanced classification, few-shot learning, and open-set recognition*** **in one integrated algorithm**, whereas existing classification approaches focus on only one aspect and perform poorly over the entire class spectrum.

## OLTR (dynamic meta-embedding)

The key challenges are **how to share visual knowledge between head and tail classes** and **how to reduce confusion between tail and open classes**.

The proposed method aims to achieve ***knowledge transfer*** and ***sensitivity to novelty*** while ***avoiding forgetting***, all in a unified framework.

We develop an integrated OLTR algorithm that maps **an image** to **a feature space** such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world. Our so-called **dynamic meta-embedding** combines a direct image feature and an associated memory feature, with the feature norm indicating the familiarity to known classes.
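A compact summary of the construction in equation form, using the paper's notation: $c_1, \dots, c_K$ are the centroids of the $K$ training classes stored in the visual memory, $T_{\text{hal}}$ and $T_{\text{sel}}$ are small learned networks that produce the memory coefficients $o$ and the concept selector $e$, $\gamma$ is the reachability, and $\otimes$ is element-wise multiplication.

$$
v^{\text{memory}} = \sum_{i=1}^{K} o_i\, c_i, \qquad o = T_{\text{hal}}\big(v^{\text{direct}}\big)
$$

$$
e = \tanh\big(T_{\text{sel}}(v^{\text{direct}})\big), \qquad \gamma = \min_{i} \big\lVert v^{\text{direct}} - c_i \big\rVert_2
$$

$$
v^{\text{meta}} = \frac{1}{\gamma}\,\big(v^{\text{direct}} + e \otimes v^{\text{memory}}\big)
$$

Because $\gamma$ grows as $v^{\text{direct}}$ moves away from every centroid, an input unlike any training class ends up with a small-norm meta-embedding, which is how the norm signals familiarity.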

![](/files/-MILCgfgElvSJT6-yF0d)

![](/files/-MILCkNJptGxY-AeB_W5)

First, a **visual memory** is obtained by aggregating knowledge **from both head and tail classes**; it stores one centroid per training class, computed from the direct features.
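A minimal sketch of building the memory, assuming each entry is simply the mean direct feature of one training class. The paper refines these into learned discriminative centroids; plain means are a simplification here. Every class, however rare, contributes exactly one entry, which is what lets tail classes share the memory with head classes.

```python
from collections import defaultdict


def class_centroids(features, labels):
    """Visual memory sketch: one centroid per training class.

    Simplification: plain per-class means, not the paper's learned discriminative centroids.
    """
    sums = {}
    counts = defaultdict(int)
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
        sums[y] = [s + x for s, x in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


# Head class 0 has three samples, tail class 1 only one; both still get a centroid.
feats = [[1.0, 0.0], [3.0, 0.0], [2.0, 2.0], [0.0, 5.0]]
labels = [0, 0, 0, 1]
memory = class_centroids(feats, labels)
print(memory)  # {0: [2.0, 0.6666666666666666], 1: [0.0, 5.0]}
```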

Second, the **visual concepts** stored in the memory are **infused back** as an associated memory feature to enhance the original direct feature. This can be understood as using induced knowledge (the memory feature) to assist the direct observation (the direct feature).
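A sketch of how the memory feature can be read out: a weighted sum of the stored centroids, with one coefficient per centroid predicted from the direct feature. The linear "hallucinator" weights `W_hal` and the softmax normalization of the coefficients are assumptions made for this illustration, not details stated in these notes.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def memory_feature(v_direct, centroids, W_hal):
    """v_memory = sum_i o_i * c_i, with coefficients o predicted from the direct feature."""
    # Hypothetical hallucinator: a linear map (one row per centroid) followed by a softmax.
    logits = [sum(w * x for w, x in zip(row, v_direct)) for row in W_hal]
    o = softmax(logits)
    return [sum(o_i * c[d] for o_i, c in zip(o, centroids)) for d in range(len(v_direct))]


centroids = [[1.0, 0.0], [0.0, 1.0]]  # toy memory with K = 2 classes
W_hal = [[0.0, 0.0], [0.0, 0.0]]      # zero weights give uniform coefficients
print(memory_feature([0.3, 0.7], centroids, W_hal))  # [0.5, 0.5]
```

With trained weights the coefficients would concentrate on the centroids most relevant to the input, so the memory feature carries the visual concepts of related (often head) classes.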

We further learn a **concept selector** (e in the bottom figure) to control the amount and type of memory feature to be infused. Since head classes already have abundant direct observations, only a small amount of memory feature is infused for them. In contrast, tail classes have scarce observations, so the associated visual concepts in the memory feature are extremely beneficial.
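The selector can be sketched as an element-wise gate: `e = tanh(W_sel · v_direct)` scales each dimension of the memory feature before it is added to the direct feature. `W_sel` is a hypothetical weight matrix here; in training it would be learned so that the gate stays small where direct evidence already suffices.

```python
import math


def infuse(v_direct, v_memory, W_sel):
    """Add the memory feature to the direct feature through an element-wise tanh gate."""
    # Concept selector: one gate value in (-1, 1) per feature dimension, from the direct feature.
    e = [math.tanh(sum(w * x for w, x in zip(row, v_direct))) for row in W_sel]
    return [d + g * m for d, g, m in zip(v_direct, e, v_memory)]


v_direct, v_memory = [1.0, 2.0], [0.5, 0.5]
no_infusion = [[0.0, 0.0], [0.0, 0.0]]       # e = 0: keep the direct feature as-is
full_infusion = [[100.0, 0.0], [0.0, 100.0]]  # e is about 1: add the whole memory feature
print(infuse(v_direct, v_memory, no_infusion))    # [1.0, 2.0]
print(infuse(v_direct, v_memory, full_infusion))  # [1.5, 2.5]
```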

Finally, we calibrate the confidence of open classes by computing their **reachability** (No. 3 in the bottom figure) to the obtained visual memory, so that samples far from every stored centroid are treated as less familiar.
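A sketch of the reachability calibration: γ is the Euclidean distance from the direct feature to its nearest memory centroid, and the infused feature is divided by γ, so inputs far from all known classes get small-norm embeddings. The `eps` guard against γ = 0 is an assumption added here, and `math.dist` requires Python 3.8+.

```python
import math


def meta_embedding(v_direct, v_infused, centroids, eps=1e-8):
    """Divide the infused feature by the reachability gamma (distance to the nearest centroid)."""
    gamma = min(math.dist(v_direct, c) for c in centroids)
    gamma = max(gamma, eps)  # hypothetical guard: avoid dividing by zero on an exact centroid hit
    return [x / gamma for x in v_infused]


centroids = [[5.0, 0.0]]
known = [4.0, 0.0]  # 1 unit from the centroid
novel = [0.0, 4.0]  # same norm, but sqrt(41) units from the centroid
# For illustration, reuse the direct feature as the infused feature (no memory added).
print(math.hypot(*meta_embedding(known, known, centroids)))  # 4.0
print(math.hypot(*meta_embedding(novel, novel, centroids)))  # about 0.625
```

Both toy inputs start with the same norm, but the novel one shrinks to roughly a sixth of it, so a downstream classifier would see a weaker, less confident signal.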

![](/files/-MILCmrkUS0y_SOzPk89)

## References

* Paper: <https://arxiv.org/pdf/1904.05160.pdf>
* Slides: <https://liuziwei7.github.io/papers/longtail_slides.pdf>
* Video: <https://www.youtube.com/watch?v=A45wrs1g8VA>
* Project page: <https://liuziwei7.github.io/projects/LongTail.html>
