Data Science for Social Good

Today I want to write a short article on some of the phenomenal projects out there that use machine learning for highly effective altruism, an area also known as data science for social good (DSSG). Thanks to some of my favorite podcasts, including TWiMLAI, Talking Machines, Data Skeptic, Practical AI, and the much-missed Partially Derivative, I’m able to keep abreast of all this awesome work, and this article is a summary of the great projects and endeavors that inspire me.


One of the largest and most ambitious endeavors that I know of is the United Nations’ Global Pulse initiative. It bills itself as “harnessing big data and artificial intelligence for sustainable development and humanitarian action,” and its projects live up to that mission. Two of my favorites are predicting food shortages with computer vision at the edge, and diagnostics without advanced laboratory equipment. The first uses machine learning models deployed on-device to predict rot from images of manioc (cassava) leaves; manioc is an important source of nutrients in sub-Saharan Africa. The second is a mobile-based microscopy platform. Given the relatively low price of smartphones and the short supply of microscopy experts, it makes sense that such a platform would be a great help to low-income rural communities.

Microsoft’s AI for Good

Microsoft may be at the forefront of the tech giants in terms of corporate social responsibility thanks to its AI for Good initiative. Jennifer Marsman’s interview on Practical AI is one of the most inspiring talks I’ve heard in a while. Her enthusiasm is infectious, and the grants that Microsoft has funded show incredible creativity. “Project Premonition” uses drones to detect mosquito hotspots and then places robotic mosquito traps in those hotspots. The captured mosquitoes feed a cloud-based genomics platform that aims to predict disease outbreaks before they happen! Another project, “FarmBeats,” supports water conservation and crop yield efforts with a low-cost approach that combines machine vision and remote sensors. Networking these sensors is normally cost-prohibitive because of the limited range of ordinary routers, but by transmitting over longer-range radio (unused TV broadcast spectrum, known as TV white spaces), FarmBeats provides a low-power way to gather training data on soil moisture across different areas. Microsoft has published an article describing these projects, and more.

Dedicated DSSG Organizations

Both the University of Washington and the University of Chicago run fellowships that assist groups who want to make a positive impact using data. At UChicago, projects range from reducing waste and preventing the harassment of tenants in gentrified areas to helping fix labor shortages and improving doctor-patient matching. Their project page lists the full range. Of particular note is their Hitchhiker’s Guide to Data Science for Social Good, which covers basic database, machine learning, and data exploration problems common to this type of data.

UW’s eScience Institute takes a broader approach, identifying traditional scientific endeavors that can benefit from contemporary data science methods. Several of its projects in oceanography and biomedicine are of particular note. On the oceanography side, UW has built an underwater observatory near a volcano that gathers a deluge of sonar data, and eScience built the infrastructure to handle it. On the biomedical side, one team is building models to identify candidates for new Alzheimer’s therapeutics. Finding a cure, prevention, or treatment for Alzheimer’s is, in my opinion, one of the most important medical research projects today.

Outside of the university environment, there is also DataKind, which hosts weekend-long data dives and community events to help mission-driven organizations. Their approach is to designate a “data ambassador” who acts as the point of contact between an external mission-driven organization and DataKind’s data experts and project managers, ensuring that the right problems are being addressed.

In addition to these more general programs, there are also some highly tailored organizations that put their full weight behind a single mission. One example is Thorn, an organization that seeks to combat child sexual abuse and human trafficking. Another is Data for Democracy, which uses data to inform policy decisions in the service of civic good.

Crowdsourcing Approaches

Kaggle, the best-known competition platform for data science, has historically hosted many competitions in this vein, including identifying humpback whales and detecting lung cancer early. There is also a competition platform called DrivenData that focuses entirely on DSSG competitions. One neat feature is that organizations with data they want insight from can submit it as a competition to crowdsource the analysis, as long as the spirit of the competition is in line with the site’s mission.


If you have a data-based skill set that you want to put to good use, there is no shortage of ways to get out there and help. Often, the hardest part of any programming project is finding a problem to solve, and that is exactly where organizations like DrivenData help. Hopefully, some of the projects in this post will give you some ideas as well!