Participants must be current or former data professionals (Data Analysts, Analytics Engineers, Data Engineers, Data Scientists, etc.).
Solo participation only (no teams).
Must have hands-on experience with SQL, dbt™, and Git.
Participants must use the following tools, and may use others alongside them:
Paradime for SQL & dbt™ development.
MotherDuck for data storage and compute.
Hex for data analysis and visualizations.
GitHub repository for version control.
Must be able to explain their code and insights. You can use ChatGPT, but you'd better understand what it produces! 🤣
Use Paradime, MotherDuck, and Hex to uncover compelling insights from social media data. Aim for accurate, relevant, and engaging discoveries.
Participants are expected to submit:
Judges will score each submission based on:
Value of insights (1-10):
Are the insights interesting and relevant?
Get creative! Uncover something fun and accurate, the kind of finding you'd stop to read if you saw it on social media.
Complexity of insights (1-10):
Are you creating relationships between datasets and providing in-depth analytical conclusions?
Complexity ≠ value, but you should use multiple datasets to generate valuable insights.
Quality of materials (1-10):
Is your code of professional quality? Are your data visualizations well-designed? Are your insights' conclusions clear to the reader?
If your submission isn't good enough to share with peers, it won't be good enough for the judges.
Integration of new data (1-10):
How effectively have you integrated new, relevant data to enhance your project?
Incorporating additional datasets can also raise your scores in the other categories: value of insights, complexity of insights, and quality of materials.
For all technical support and challenge inquiries, use Paradime's #social-media-data-challenge Slack channel. You can also find support in the MotherDuck Slack Community.
Yes. While your submission must be your own work, you can use your network, online resources, and even ChatGPT for inspiration and learning. However, all actual work must be done by you alone.
Yes. Beyond the required tools (Paradime, MotherDuck, Hex, and GitHub), you are encouraged to use additional tools and technologies that enhance your project.
Not at all! Paradime provides the following resources:
Paradime for SQL & dbt™ development.
MotherDuck for data storage and compute.
Hex for data analysis and visualizations.
GitHub for version control.
Yes, incorporating additional data is required. The sample data we provide is optional. Any data you bring in must be user-generated social media data or relevant supplementary data.
Not exactly. This is a 6-week, asynchronous competition. Participants work on their own time and submit by the deadline.
Aim to generate insights that are accurate and interesting. They should be scroll stoppers! For inspiration, here are some intriguing insights to explore:
COVID-19 Sentiment Analysis
Analysis Question: How has the sentiment around COVID-19 on Reddit changed over time? Why?
Required Social Media Data: Reddit posts and comments related to COVID-19, or similar dataset.
Optional/Supplementary Data: Key dates, news, events, and/or anything that points to why sentiment has changed over time.
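As a rough illustration of the "changed over time" part of this question, the sketch below averages sentiment by calendar month. It assumes each post has already been given a sentiment score between -1 and 1 by some upstream step; the function name `monthly_mean_sentiment` and every value in `posts` are hypothetical placeholders, not challenge data.

```python
from collections import defaultdict
from datetime import date


def monthly_mean_sentiment(posts):
    """Map (date, score) pairs to {"YYYY-MM": mean score}, in chronological order."""
    buckets = defaultdict(list)
    for posted_on, score in posts:
        # Group scores under a year-month key such as "2020-03".
        buckets[posted_on.strftime("%Y-%m")].append(score)
    # Sorting the keys keeps the months in chronological order.
    return {month: sum(scores) / len(scores) for month, scores in sorted(buckets.items())}


# Made-up scores for illustration only; real scores would come from your own model.
posts = [
    (date(2020, 3, 5), -0.6),
    (date(2020, 3, 20), -0.2),
    (date(2020, 4, 2), 0.1),
    (date(2020, 4, 18), 0.3),
]

trend = monthly_mean_sentiment(posts)
for month, mean_score in trend.items():
    print(f"{month}: {mean_score:+.2f}")
```

A monthly series like this can then be lined up against the key dates and events from the supplementary data to look for possible explanations of each shift.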
Donald Trump Popularity Trends
Analysis Question: How has Donald Trump's popularity changed over time?
Required Social Media Data: A sample of Twitter posts, mentions, and engagement, containing the words "Donald Trump" over the last 10 years.
Optional/Supplementary Data: Key dates, news, events, and/or anything that points to why popularity has changed over time.
Top YouTube Creators Study
Analysis Question: Who are the biggest YouTube creators, and why?
Required Social Media Data: YouTube comments, engagement metrics, etc.
Optional/Supplementary Data: Trending YouTube Video statistics, or similar datasets.
2022 NFL Super Bowl Commercial Impact
Analysis Question: Which commercials were most popular during the 2022 NFL Super Bowl?
Required Social Media Data: Twitter and/or Reddit posts, mentions, and engagement during the 4-hour time block of the NFL Super Bowl. Only pull data that contains information about brands that had Super Bowl commercials.
Optional/Supplementary Data:
For public companies that advertised, pull stock market data to see if there's any correlation between Super Bowl commercial success and stock price.
Using Super Bowl advertisement cost data, identify which brands had the highest social engagement per dollar spent.
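The engagement-per-dollar idea comes down to a simple ratio: total social engagement divided by ad spend, ranked from highest to lowest. Here is a minimal sketch of that arithmetic, assuming engagement has already been summed per brand. The `BrandAd` class, the brand names, and all figures are hypothetical placeholders, not real advertisement or engagement data.

```python
from dataclasses import dataclass


@dataclass
class BrandAd:
    brand: str
    engagement: int     # e.g. likes + replies + shares in the game window (placeholder)
    ad_cost_usd: float  # what the brand paid for its commercial (placeholder)


def engagement_per_dollar(ads):
    """Return (brand, engagement per dollar) pairs, highest ratio first."""
    scored = [
        (ad.brand, ad.engagement / ad.ad_cost_usd)
        for ad in ads
        if ad.ad_cost_usd > 0  # skip rows with missing or zero cost
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up values for illustration only.
ads = [
    BrandAd("brand_a", 120_000, 6_000_000.0),
    BrandAd("brand_b", 90_000, 3_000_000.0),
    BrandAd("brand_c", 30_000, 6_000_000.0),
]

ranking = engagement_per_dollar(ads)
for brand, score in ranking:
    print(f"{brand}: {score:.4f} engagements per dollar")
```

In the challenge itself, the same ratio could be expressed as a dbt™ model in SQL; Python is used here only to keep the sketch self-contained.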