Compare To What, a web tool for data journalists
-
Overview
Compare To What helps data journalists by automatically identifying demographically similar counties and presenting their data in a way that makes comparison easy and simplifies the search for news stories.
-
The Problems
Journalists and visualization designers spend considerable effort finding comparable, related information that helps them tell their stories. With the right information, and an automatic comparison against an appropriate baseline, they can better understand the impact of a particular fact or trend.
-
Role / Duration
Product Manager & UX Designer
March - June 2021
-
Target Users
Data journalists / Journalists / The general public
-
Goals
· Allow users to identify comparable counties for a particular county based on demographic variables
· Allow users to easily compare and find newsworthy data points across the counties
-
Use Case
Find and present newsworthy and informative Covid-19 data about comparable counties
Meet The Team
Project Timeline
User Research: What Did We Learn?
Our team interviewed over 20 data journalists at major media organizations, including The Guardian and the Star Tribune. We also interviewed a number of readers to learn more about their needs as news consumers.
Overall, our research shows that data quality and cohesion are the core problems.
Why does this user research matter?
Before the research, the team had only a vague understanding of what the target users would find valuable. Everyone on the team assumed that finding the right data set was the biggest challenge for data journalists.
However, it turned out that their real pain points were knowing what to search for and whether they could get clean, high-quality data.
Challenges & Solutions
Artifacts: Navigation Structure
-
To use the application, the user first enters the name of the county to compare against, along with the state the county is in.
Clicking the “Go” button on the first page (first screenshot below) takes the user to a form page where they can enter the control demographic parameters for the comparison (second screenshot below).
After the user clicks “Go” again and scrolls down (screenshots below), two tables appear: “Similar Results” and “Outliers”. They show the Covid-19 data for the most and least similar counties to the input county, among those that fit the demographic parameters just entered.
Clicking the “Show all the Data” button (last screenshot below) displays every county that is similar to the input county and matches the control demographic parameters. Figures detailing the system are in the Appendix.
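The write-up does not say how matching counties are ranked into “Similar Results” and “Outliers”. A minimal sketch, assuming a hypothetical similarity score (the average relative gap between a county's demographics and the input county's), could rank the counties that pass the filters and take both ends of the ranking. The `County` class, its two demographic fields, and the `similarity_distance` and `split_results` functions are illustrative assumptions, not the project's code, and the county data is made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class County:
    # Hypothetical record: only two of the demographics are modeled here.
    name: str
    population: float
    median_income: float


def relative_gap(value: float, target: float) -> float:
    """Absolute difference as a fraction of the target value (assumes target > 0)."""
    return abs(value - target) / target


def similarity_distance(candidate: County, target: County) -> float:
    # Assumed score: mean relative gap across the modeled demographics (lower = more similar).
    gaps = [
        relative_gap(candidate.population, target.population),
        relative_gap(candidate.median_income, target.median_income),
    ]
    return sum(gaps) / len(gaps)


def split_results(matches: List[County], target: County, n: int = 3) -> Tuple[List[County], List[County]]:
    """Return (n most similar, n least similar) among counties that passed the filters."""
    ranked = sorted(matches, key=lambda c: similarity_distance(c, target))
    similar = ranked[:n]
    # Exclude counties already shown as similar, then take the n least similar, worst first.
    outliers = ranked[n:][-n:][::-1]
    return similar, outliers


target = County("Target", 100_000, 50_000)
matches = [
    County("A", 105_000, 52_000),
    County("B", 120_000, 45_000),
    County("C", 95_000, 50_000),
    County("D", 129_000, 64_000),
]
similar, outliers = split_results(matches, target, n=2)
print([c.name for c in similar])   # ['C', 'A']
print([c.name for c in outliers])  # ['D', 'B']
```

Removing the similar counties before picking outliers keeps the two tables from overlapping when only a few counties match.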
Deliver
System Architecture Figures
-
The system architecture was purposefully kept as lightweight as possible. Note that the application has not yet been deployed.
The Compare To What service is a Flask application; users interact with it through a frontend built with HTML and CSS.
User inputs are posted through the frontend API. Based on these inputs, the application queries two APIs, the US Census API and the Covid Act Now API, for the relevant information.
Once all the necessary data has been collected, it is processed and returned to the user through the frontend API.
The figure in the Appendix gives an overview of the user workflow and how the system architecture responds to it.
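As a rough illustration of the “process and serve back” step, the sketch below keeps Flask and network calls out of the picture. `fetch_census` and `fetch_covid` are hypothetical stand-ins for clients of the US Census and Covid-19 data APIs, each assumed to return a dictionary of values for one county FIPS code. `build_response` (also hypothetical) merges both sources per county and serializes the result as JSON for the frontend; in the real application, logic like this would sit behind a Flask route.

```python
import json
from typing import Callable, Dict, Iterable

Record = Dict[str, float]


def build_response(
    county_fips: Iterable[str],
    fetch_census: Callable[[str], Record],
    fetch_covid: Callable[[str], Record],
) -> str:
    """Query both data sources for each county and merge the results by FIPS code."""
    rows = []
    for fips in county_fips:
        row: Dict[str, object] = {"fips": fips}
        row.update(fetch_census(fips))  # demographics, e.g. population
        row.update(fetch_covid(fips))   # Covid-19 metrics, e.g. case counts
        rows.append(row)
    return json.dumps({"counties": rows})


# Stub fetchers standing in for real API calls (hypothetical values).
def fake_census(fips: str) -> Record:
    return {"population": 1000.0}


def fake_covid(fips: str) -> Record:
    return {"cases": 10.0}


print(build_response(["00001"], fake_census, fake_covid))
# {"counties": [{"fips": "00001", "population": 1000.0, "cases": 10.0}]}
```

Injecting the fetchers keeps the merge logic testable without network access. If both sources ever returned the same key, the Covid-19 value would win, since it is applied last.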
Next Steps
-
Demographics
We currently support seven demographic filters, selected based on our user research. However, these do not cover every use case, and we hope to add more demographics in the future.
One feature we would like to implement is letting users filter by any attribute the US Census supports. This would give users more flexibility and cover cases where our application does not support the demographic they are interested in.
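One way to support arbitrary Census attributes is to pass the user's chosen variable codes straight through to the Census Data API. The sketch below only builds the request URL. It assumes the American Community Survey 5-year dataset (`acs/acs5`) and a 2019 vintage as defaults; the example variable `B01003_001E` (total population) and state FIPS code `27` (Minnesota) are illustrative choices. `census_county_query` is a hypothetical helper, and a production version would likely also add an API key and parse the response.

```python
from typing import Iterable
from urllib.parse import urlencode

CENSUS_BASE = "https://api.census.gov/data"


def census_county_query(
    variables: Iterable[str], state_fips: str, year: int = 2019, dataset: str = "acs/acs5"
) -> str:
    """Build a Census Data API URL requesting `variables` for every county in one state."""
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": "county:*",
        "in": f"state:{state_fips}",
    }
    # Leave ',', ':' and '*' unescaped so the URL reads like the Census API's own examples.
    return f"{CENSUS_BASE}/{year}/{dataset}?{urlencode(params, safe=',:*')}"


print(census_county_query(["B01003_001E"], "27"))
# https://api.census.gov/data/2019/acs/acs5?get=NAME,B01003_001E&for=county:*&in=state:27
```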
-
Comparison Model
Currently, the system uses a threshold entered by the user to find similar counties. For example, if the user enters 30% for the population threshold, the system finds counties whose population is within 30% of the given county's population. However, this method is somewhat arbitrary, because it is hard for users to know what threshold is appropriate for judging counties as similar.
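Concretely, a 30% population threshold around a county of 100,000 people keeps counties with populations from 70,000 to 130,000. A minimal sketch of that rule, using made-up county names and populations, also shows the problem raised next: the number of matches swings sharply with the threshold. `within_threshold` and `similar_counties` are hypothetical names.

```python
from typing import Dict, List


def within_threshold(value: float, target: float, threshold: float) -> bool:
    """True if value lies within `threshold` (a fraction, e.g. 0.30) of target."""
    return abs(value - target) <= threshold * target


def similar_counties(populations: Dict[str, float], target_name: str, threshold: float) -> List[str]:
    """Names of counties (excluding the target) whose population is within the threshold."""
    target = populations[target_name]
    return sorted(
        name
        for name, pop in populations.items()
        if name != target_name and within_threshold(pop, target, threshold)
    )


# Hypothetical populations for illustration only.
populations = {"Target": 100_000, "A": 75_000, "B": 125_000, "C": 140_000, "D": 60_000}
print(similar_counties(populations, "Target", 0.30))  # ['A', 'B']
print(similar_counties(populations, "Target", 0.10))  # []
print(similar_counties(populations, "Target", 0.50))  # ['A', 'B', 'C', 'D']
```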
Moreover, depending on the user inputs, there may be 100 similar counties or none at all. To address this, we propose using a k-nearest neighbors (KNN) model to find similar counties. KNN takes a set of attributes and finds the entities that are closest across all attribute dimensions, which removes the need for users to enter thresholds. KNN can also use more data when finding similar counties: instead of a single data point for income, such as median income, it can use multiple income data points broken down by age and race. Finally, KNN guarantees that the user always receives a fixed number of results (k).
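A minimal sketch of the proposed KNN approach, written in plain Python rather than with a library: each demographic is z-score standardized so that large-valued attributes such as population do not dominate the distance (a design choice assumed here, not stated in the proposal), and the k counties with the smallest Euclidean distance to the target are returned. `standardize` and `k_nearest` are hypothetical names, and the county data is made up.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List

Features = Dict[str, List[float]]


def standardize(rows: Features) -> Features:
    """Z-score each feature column so no single attribute dominates the distance."""
    columns = list(zip(*rows.values()))
    # Fall back to 1.0 when a column has zero spread, to avoid dividing by zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return {
        name: [(v - m) / s for v, (m, s) in zip(values, stats)]
        for name, values in rows.items()
    }


def k_nearest(rows: Features, target: str, k: int) -> List[str]:
    """Return the k counties closest to `target` in standardized feature space."""
    z = standardize(rows)
    others = [name for name in z if name != target]
    others.sort(key=lambda name: math.dist(z[name], z[target]))
    return others[:k]


# Hypothetical [population, median income] rows for illustration only.
rows = {
    "Target": [100_000, 50_000],
    "A": [105_000, 52_000],
    "B": [300_000, 48_000],
    "C": [98_000, 70_000],
    "D": [110_000, 51_000],
}
print(k_nearest(rows, "Target", 2))  # ['D', 'A']
```

Without standardization, raw Euclidean distance on this data would rank A ahead of D, because the population column dominates. A brute-force scan is adequate for a few thousand US counties; scikit-learn's `NearestNeighbors` could replace it if the feature set grows.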
-
Generalization
For this project, we focused on a specific use case: finding Covid-19 data for US counties similar to a target county. However, there are many use cases where users may want other types of data for similar counties.
We hope to generalize this platform to address use cases beyond Covid-19 data for US counties.