Screens are deeply embedded in how we live, learn, and connect—yet research often reduces digital experience to app-based metrics like time spent per app.
What App-Based Metrics Miss:
📱 What users actually see: An Instagram feed filled with makeup tutorials and one filled with cyberbullying are treated identically, despite their vastly different effects.
🔄 How content flows across apps: Users jump between apps and browsers for related content, but this cross-platform flow is often overlooked.
🌱 Emergent behaviors: App categories are rigid and predefined. They don’t adapt to the diverse ways users engage with content—or how those behaviors evolve over time.
To address these blind spots, recent research has turned to continuous screen data collection, which captures everything displayed on users' screens.
Despite these advances, there are still no scalable and flexible tools that allow researchers to interact with, explore, and analyze massive, sensitive, and unstructured multimodal data.
To design and evaluate Media Content Atlas, we used 1.12 million screenshots (10,000 per participant) collected from 112 participants over the course of one month. The data come from the Human Screenome Project, where a study-specific app captured a screenshot every 5 seconds whenever the screen was on, creating a continuous, moment-by-moment record of digital experiences.
Manually reviewing this dataset would take 190+ workdays, making AI-augmented exploration and analysis essential.
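For a rough sense of scale (an illustrative back-of-the-envelope calculation, not a figure reported in the paper), assume a reviewer spends about five seconds per screenshot and works eight-hour days:

```python
# Back-of-the-envelope estimate of manual review effort (illustrative assumptions).
n_screenshots = 1_120_000        # 112 participants x 10,000 screenshots each
seconds_per_screenshot = 5       # assumed review pace, not a number from the paper
review_hours = n_screenshots * seconds_per_screenshot / 3600
workdays = review_hours / 8      # assumed 8-hour workday
print(f"{review_hours:.0f} hours ≈ {workdays:.0f} workdays")  # ~1556 hours ≈ 194 workdays
```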
Figure. A short slice of a Screenome. This 15-minute segment, recorded from a consenting researcher’s real-world smartphone use, illustrates how digital content and media experiences shift even over brief periods of time. Colors indicate different app categories, while the size of each bar represents the amount of time spent on each app category.
MCA processes screen data scalably and securely through a four-step pipeline: embedding, clustering, retrieval, and interactive visualization. These steps allow researchers to transform raw screen data into structured, searchable, and explorable insights.
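For a concrete picture of the first three steps, the sketch below strings together off-the-shelf components: CLIP image and text embeddings via Hugging Face transformers, k-means clustering from scikit-learn, and cosine-similarity retrieval against a text query. The specific checkpoint, cluster count, and file names are illustrative assumptions, not MCA's exact configuration.

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans
from transformers import CLIPModel, CLIPProcessor

# Step 1: embed screenshots (and queries) into a shared space with CLIP.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_images(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs).detach().numpy()
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)  # unit-normalize

def embed_text(query):
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    feats = model.get_text_features(**inputs).detach().numpy()
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

screenshot_paths = ["shot_0001.png", "shot_0002.png", "shot_0003.png"]  # hypothetical files
image_embs = embed_images(screenshot_paths)

# Step 2: group screenshots into candidate topic clusters (k is a placeholder).
cluster_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(image_embs)

# Step 3: retrieve the screenshots most similar to a natural-language query.
query_emb = embed_text("a social media feed with makeup tutorials")
scores = (image_embs @ query_emb.T).ravel()   # cosine similarity on unit vectors
top_k = np.argsort(-scores)[:2]               # indices of the best-matching screenshots
print(cluster_labels, top_k)
```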
Figure. MCA's end-to-end pipeline includes embedding and description generation, clustering and topic modeling, semantic image retrieval, and interactive visualization of media content.
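The final step, interactive visualization, can be approximated by projecting the embeddings to 2D and rendering them as an explorable scatter plot. The UMAP + Plotly combination below is one reasonable choice for this sketch, not necessarily the stack MCA uses.

```python
# Illustrative 2-D map of screenshot embeddings for interactive browsing.
# umap-learn and plotly are assumed dependencies, not necessarily MCA's stack.
import numpy as np
import plotly.express as px
import umap

rng = np.random.default_rng(0)
image_embs = rng.normal(size=(500, 512))       # stand-in for real CLIP embeddings
cluster_labels = rng.integers(0, 8, size=500)  # stand-in for cluster assignments

coords = umap.UMAP(n_components=2, metric="cosine", random_state=0).fit_transform(image_embs)
fig = px.scatter(
    x=coords[:, 0], y=coords[:, 1],
    color=cluster_labels.astype(str),
    hover_name=[f"screenshot {i}" for i in range(500)],
)
fig.show()  # interactive scatter: zoom, pan, and hover to inspect individual screens
```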
Built on open-source multimodal models (e.g., CLIP, LLaVA-OneVision), MCA requires no manual annotation and ensures data security by running on in-house servers. Its model-agnostic architecture allows flexible customization via prompting and fine-tuning.
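As an illustration of prompt-driven description generation, the sketch below runs a small LLaVA-OneVision checkpoint through Hugging Face transformers on a single screenshot; the checkpoint, prompt, and decoding settings are assumptions for demonstration rather than MCA's published configuration.

```python
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

# Checkpoint, prompt, and decoding settings are illustrative assumptions.
model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(model_id)

image = Image.open("shot_0001.png").convert("RGB")  # hypothetical screenshot file
conversation = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe what is shown on this phone screen in one sentence."},
    ],
}]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(output[0], skip_special_tokens=True))
```

Because this step is driven entirely by the prompt, the same scaffolding can be re-targeted to a different model or research question without any manual annotation.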
Figure. Sample clusters from MCA. Each cluster is labeled with a topic name and description and contains a set of semantically similar images. The diversity of clusters illustrates the range of media content that users encounter and that MCA captures.
We conducted a structured evaluation of MCA with domain experts in communication, psychology, medicine, and education. Each expert participated in think-aloud sessions and three hours of structured ratings, evaluating MCA’s clustering, topic labeling, and retrieval features.
✅ Key Evaluation Outcomes
Figure. Expert ratings on cluster interpretability and relevance. Bar plots (left) show label-level response counts. Boxplots (right) illustrate score distributions across clusters and domains.
Figure. Expert evaluation of image retrieval performance. Experts rated the relevance of retrieved images to the intended search concept.
@inproceedings{cerit2025mca,
author = {Merve Cerit and Eric Zelikman and Mu-Jung Cho and Thomas N. Robinson and Byron Reeves and Nilam Ram and Nick Haber},
title = {Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs},
booktitle = {Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25)},
year = {2025},
month = {April},
location = {Yokohama, Japan},
publisher = {ACM},
address = {New York, NY, USA},
pages = {19},
doi = {10.1145/3706599.3720055}
}