AWS and the University of San Francisco (USF) Data Institute organized a datathon as part of the 2023 Data Science Conference (DSCO 23). High school and undergraduate students competed on data science projects focused on air quality and sustainability. The USF Data Institute aims to support interdisciplinary research and education in data science, combining academic research with the entrepreneurial culture of the San Francisco Bay Area’s technology industry.
The students used Amazon SageMaker Studio Lab, a free service that provides a JupyterLab environment with compute and storage. To introduce the students to machine learning (ML), a tutorial walked them through setting up an ML pipeline: exploratory data analysis, feature engineering, model building and evaluation, and inference and monitoring. The tutorial used datasets from the Amazon Sustainability Data Initiative (ASDI), sourced from the National Oceanic and Atmospheric Administration (NOAA) and OpenAQ, and trained a binary classification model with AutoGluon to predict air quality levels from weather data. After the tutorial, the students worked in teams on projects of their own choosing. The winning teams, led by Peter Ma, Ben Welner, and Ei Coltin, received prizes at the opening ceremony of the Data Science Conference at USF.
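The tutorial's own code is not reproduced in this post. As a minimal sketch of the feature-engineering step, assuming daily NOAA weather records and daily OpenAQ PM2.5 averages keyed by date, and a hypothetical cutoff of 35.0 µg/m³ separating acceptable from poor air, the binary label a classifier would learn could be built by joining the two sources on date. The function name, column names, threshold, and sample values below are all illustrative assumptions.

```python
from datetime import date

# Hypothetical cutoff in µg/m³; the tutorial's actual threshold is not stated in the post.
PM25_THRESHOLD = 35.0


def build_training_rows(weather, pm25, threshold=PM25_THRESHOLD):
    """Inner-join weather features with PM2.5 readings by date and add a binary label.

    weather: dict mapping date -> dict of weather features
    pm25: dict mapping date -> daily mean PM2.5 concentration
    Returns one dict per date present in both sources, sorted by date.
    """
    rows = []
    for day in sorted(weather.keys() & pm25.keys()):
        row = {"date": day, **weather[day]}
        # 1 = poor air quality (at or above the threshold), 0 = acceptable
        row["poor_air_quality"] = int(pm25[day] >= threshold)
        rows.append(row)
    return rows


# Illustrative sample data, not taken from NOAA or OpenAQ.
weather = {
    date(2022, 9, 1): {"temp_c": 24.1, "wind_ms": 1.2, "precip_mm": 0.0},
    date(2022, 9, 2): {"temp_c": 19.5, "wind_ms": 5.8, "precip_mm": 3.4},
    date(2022, 9, 3): {"temp_c": 27.3, "wind_ms": 0.9, "precip_mm": 0.0},
}
pm25 = {date(2022, 9, 1): 41.2, date(2022, 9, 2): 8.7}

rows = build_training_rows(weather, pm25)
for r in rows:
    print(r["date"], r["poor_air_quality"])
```

Dates missing from either source (here, September 3) are dropped rather than imputed. The resulting rows could then be loaded into a DataFrame, with the `date` column removed, and passed to AutoGluon, for example with `TabularPredictor(label="poor_air_quality", problem_type="binary").fit(train_data)`, which handles model selection and ensembling.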
Anay Pant, one of the winners, described the event as an enjoyable and practical opportunity to apply Python coding skills learned in class. Anay’s team researched how well various ML models detect atmospheric toxicity under specific weather conditions, using an Air Quality Index (AQI) dataset from NOAA, and developed a gradient boosting classifier that predicts air quality from weather statistics.
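The team's code is not included in this post. To show what a gradient boosting classifier does, here is a minimal from-scratch sketch on a small hypothetical dataset of weather features (temperature, wind speed, humidity) with binary air-quality labels. It starts from the log-odds of the positive class, then repeatedly fits a depth-1 regression tree (a stump) to the negative gradient of the log loss, which is simply `y - p`, and adds it with a learning rate. For simplicity each leaf predicts the mean residual rather than the Newton-step leaf value used in Friedman's formulation; a real project would more likely use a library such as scikit-learn's `GradientBoostingClassifier`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimizes squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            if not left or not right:
                continue
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lmean, rmean)
    if best is None:
        # No valid split (every feature is constant): fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, j, t, lmean, rmean = best
    return lambda x: lmean if x[j] <= t else rmean


class GradientBoostingSketch:
    def __init__(self, n_rounds=50, learning_rate=0.3):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.stumps = []
        self.base = 0.0

    def _raw(self, x):
        # Raw score F(x): base log-odds plus the scaled sum of all stumps
        return self.base + self.learning_rate * sum(s(x) for s in self.stumps)

    def fit(self, X, y):
        p = sum(y) / len(y)
        self.base = math.log(p / (1 - p))  # log-odds of the positive class
        for _ in range(self.n_rounds):
            # Negative gradient of the log loss w.r.t. F(x) is y - sigmoid(F(x))
            residuals = [yi - sigmoid(self._raw(x)) for x, yi in zip(X, y)]
            self.stumps.append(fit_stump(X, residuals))
        return self

    def predict_proba(self, x):
        return sigmoid(self._raw(x))

    def predict(self, x):
        return int(self.predict_proba(x) >= 0.5)


# Hypothetical data: [temp_c, wind_ms, humidity_pct]; label 1 = poor air quality
X = [
    [30, 0.5, 40], [28, 1.0, 35], [32, 0.8, 30], [27, 1.5, 45],
    [18, 6.0, 70], [20, 5.5, 65], [15, 7.2, 80], [22, 4.8, 60],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = GradientBoostingSketch().fit(X, y)
accuracy = sum(model.predict(x) == yi for x, yi in zip(X, y)) / len(y)
print(accuracy)
print(model.predict([31, 0.7, 38]), model.predict([17, 6.5, 75]))
```

Shallow trees combined additively are what distinguish boosting from a single deep decision tree: each stump corrects the errors the current ensemble still makes, and the learning rate keeps any one stump from dominating.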
Sherry Marcus, Director of the AWS ML Solutions Lab, emphasized the growing importance of AI in the workplace and the need for employees with ML skills. She said AWS is excited to help the next generation explore and experiment with ML, and hopes the students will continue to expand their ML knowledge. She added that she would personally like to use an app built by one of the datathon students.
Diane Woodbridge of the USF Data Institute praised SageMaker Studio Lab for being easy to use for both the participating students and the graduate student mentors.
Anyone who missed the datathon can still register for their own Studio Lab account and work on their own projects. Those interested in running their own hackathon can ask their AWS representative for a Studio Lab referral code, which grants participants immediate access to the service. Information about next year’s challenge is available on the USF Data Institute website.
About the authors: Neha Narwal is a Machine Learning Engineer and Vidya Sagar Ravipati is an Applied Science Manager at AWS. Neha focuses on developing large language models for generative AI applications. Vidya draws on his experience in large-scale distributed systems and his passion for machine learning to help AWS customers accelerate their AI and cloud adoption.