Screens are deeply embedded in how we live, learn, and connect—yet research often reduces digital experience to app-based metrics like time spent per app.
What App-Based Metrics Miss:
📱 What users actually see: An Instagram feed filled with makeup tutorials and one filled with cyberbullying are treated identically, despite their vastly different effects.
🔄 How content flows across apps: Users jump between apps and browsers for related content, but this cross-platform flow is often overlooked.
🌱 Emergent behaviors: App categories are rigid and predefined. They don’t adapt to the diverse ways users engage with content—or how those behaviors evolve over time.
To address these blind spots, recent research has turned to continuous screen data collection, which captures everything displayed on users' screens.
Despite these advances, there are still no scalable and flexible tools that allow researchers to interact with, explore, and analyze massive, sensitive, and unstructured multimodal data.
To design and evaluate Media Content Atlas, we used 1.12 million screenshots (10,000 per participant) collected from 112 participants over the course of one month. The data come from the Human Screenome Project, where a study-specific app captured a screenshot every 5 seconds whenever the screen was on, creating a continuous, moment-by-moment record of digital experiences.
Manually reviewing this dataset would take 190+ workdays, making AI-augmented exploration and analysis essential.
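For a rough sense of scale (an illustrative back-of-the-envelope calculation, not a figure reported in the paper), assume a reviewer spends about five seconds per screenshot and works eight-hour days:

```python
# Back-of-the-envelope estimate of manual review effort (illustrative assumptions).
n_screenshots = 1_120_000        # 112 participants x 10,000 screenshots each
seconds_per_screenshot = 5       # assumed review pace, not a number from the paper
review_hours = n_screenshots * seconds_per_screenshot / 3600
workdays = review_hours / 8      # assumed 8-hour workday
print(f"{review_hours:.0f} hours ≈ {workdays:.0f} workdays")  # ~1556 hours ≈ 194 workdays
```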
Figure. A short slice of a Screenome. This 15-minute segment, recorded from a consenting researcher’s real-world smartphone use, illustrates how digital content and media experiences shift even over brief periods of time. Colors indicate different app categories, while the size of each bar represents the amount of time spent on each app category.
MCA processes screen data scalably and securely through a four-step pipeline: embedding, clustering, retrieval, and interactive visualization. These steps allow researchers to transform raw screen data into structured, searchable, and explorable insights.
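For a concrete picture of the first three steps, the sketch below strings together off-the-shelf components: CLIP image and text embeddings via Hugging Face transformers, k-means clustering from scikit-learn, and cosine-similarity retrieval against a text query. The specific checkpoint, cluster count, and file names are illustrative assumptions, not MCA's exact configuration.

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans
from transformers import CLIPModel, CLIPProcessor

# Step 1: embed screenshots (and queries) into a shared space with CLIP.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_images(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs).detach().numpy()
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)  # unit-normalize

def embed_text(query):
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    feats = model.get_text_features(**inputs).detach().numpy()
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

screenshot_paths = ["shot_0001.png", "shot_0002.png", "shot_0003.png"]  # hypothetical files
image_embs = embed_images(screenshot_paths)

# Step 2: group screenshots into candidate topic clusters (k is a placeholder).
cluster_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(image_embs)

# Step 3: retrieve the screenshots most similar to a natural-language query.
query_emb = embed_text("a social media feed with makeup tutorials")
scores = (image_embs @ query_emb.T).ravel()   # cosine similarity on unit vectors
top_k = np.argsort(-scores)[:2]               # indices of the best-matching screenshots
print(cluster_labels, top_k)
```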
Figure. MCA's end-to-end pipeline includes embedding and description generation, clustering and topic modeling, semantic image retrieval, and interactive visualization of media content.
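The final step, interactive visualization, can be approximated by projecting the embeddings to 2D and rendering them as an explorable scatter plot. The UMAP + Plotly combination below is one reasonable choice for this sketch, not necessarily the stack MCA uses.

```python
# Illustrative 2-D map of screenshot embeddings for interactive browsing.
# umap-learn and plotly are assumed dependencies, not necessarily MCA's stack.
import numpy as np
import plotly.express as px
import umap

rng = np.random.default_rng(0)
image_embs = rng.normal(size=(500, 512))       # stand-in for real CLIP embeddings
cluster_labels = rng.integers(0, 8, size=500)  # stand-in for cluster assignments

coords = umap.UMAP(n_components=2, metric="cosine", random_state=0).fit_transform(image_embs)
fig = px.scatter(
    x=coords[:, 0], y=coords[:, 1],
    color=cluster_labels.astype(str),
    hover_name=[f"screenshot {i}" for i in range(500)],
)
fig.show()  # interactive scatter: zoom, pan, and hover to inspect individual screens
```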
Built on open-source multimodal models (e.g., CLIP, LLaVA-OneVision), MCA requires no manual annotation and ensures data security by running on in-house servers. Its model-agnostic architecture allows flexible customization via prompting and fine-tuning.
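As an illustration of prompt-driven description generation, the sketch below runs a small LLaVA-OneVision checkpoint through Hugging Face transformers on a single screenshot; the checkpoint, prompt, and decoding settings are assumptions for demonstration rather than MCA's published configuration.

```python
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

# Checkpoint, prompt, and decoding settings are illustrative assumptions.
model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(model_id)

image = Image.open("shot_0001.png").convert("RGB")  # hypothetical screenshot file
conversation = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe what is shown on this phone screen in one sentence."},
    ],
}]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(output[0], skip_special_tokens=True))
```

Because this step is driven entirely by the prompt, the same scaffolding can be re-targeted to a different model or research question without any manual annotation.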
Figure. Sample clusters from MCA. Each cluster is labeled with a topic name and description and contains a set of semantically similar images. The diversity of clusters illustrates the range of media content that users encounter and that MCA captures.
We conducted a structured evaluation of MCA with domain experts in communication, psychology, medicine, and education. Each expert participated in think-aloud sessions and three hours of structured ratings, evaluating MCA’s clustering, topic labeling, and retrieval features.
✅ Key Evaluation Outcomes
Figure. Expert ratings on cluster interpretability and relevance. Bar plots (left) show label-level response counts. Boxplots (right) illustrate score distributions across clusters and domains.
Figure. Expert evaluation of image retrieval performance. Experts rated the relevance of retrieved images to the intended search concept.
@inproceedings{cerit2025mca,
author = {Merve Cerit and Eric Zelikman and Mu-Jung Cho and Thomas N. Robinson and Byron Reeves and Nilam Ram and Nick Haber},
title = {Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs},
booktitle = {Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25)},
year = {2025},
month = {April},
location = {Yokohama, Japan},
publisher = {ACM},
address = {New York, NY, USA},
pages = {19},
doi = {10.1145/3706599.3720055}
}