An MIT experiment shows that AI tools like ChatGPT can significantly increase the productivity and quality of work of educated professionals, especially those with weaker skills.
The study, conducted by Shakked Noy and Whitney Zhang of the Massachusetts Institute of Technology’s Department of Economics, involved 453 college-educated professionals in fields such as marketing, data analytics, and human resources who were given incentivized writing tasks.
Among those assigned to use ChatGPT with GPT-3.5, productivity increased significantly: the average time taken fell by 40% and the quality of output improved by 18%.
College-educated professionals performing midlevel professional writing tasks substantially increased their productivity when given access to ChatGPT. The generative writing tool increased the output quality of low-ability workers and reduced time spent on tasks for workers of all ability levels.
From the paper
Participants assigned to use ChatGPT reported greater enjoyment of their tasks and were twice as likely to continue using the AI tool in their real jobs two weeks after the experiment, and 1.6 times as likely to continue using it two months later. This suggests that AI tools like ChatGPT can be a valuable asset for professionals in a variety of fields.
ChatGPT excels in clear and persuasive writing
The tasks given to the participants, such as writing press releases or short analytical reports, were designed to resemble actual tasks they might encounter in their daily work. These tasks played to ChatGPT’s strengths, which include clear and persuasive writing.
Half of the participants were randomly assigned to a group that was allowed to use the AI tool for their second task. The other half formed the control group and were instructed to sign up for a different (non-AI) tool, but weren’t encouraged to use it for their tasks.
The work of these professionals was then evaluated by other professionals in the same field. The evaluators were asked to judge the work as they would if they encountered it in their usual professional environment, and rated its quality on a scale of 1 to 7.
The researchers collected data on how much time the participants spent on the tasks and on the quality of their work. They also monitored the participants’ work progress throughout the task to measure activity levels and verify that they were using the AI tool.
In addition, the researchers recorded some contextual variables about the participants, such as dropout rate and employment status, and tracked whether they were HR professionals. The idea was to see whether these external factors could affect the outcomes; the researchers found that they didn’t have a significant impact on the results.
However, the study did find some limitations: the AI tool was less suited to tasks that required contextual knowledge or precise factual accuracy, and to vaguer prompts that required a lot of initiative.
Despite these limitations, many participants reported finding ChatGPT helpful in their real-world work, suggesting that newer versions of the tool could be even more useful. For example, ChatGPT can be taught to use contextual factors in its responses, potentially making it useful in a growing number of professions.
While AI tools like ChatGPT may not make human experts obsolete, the evidence from this study suggests that they can complement human skills, increasing productivity while reducing inequality among workers.
One contradictory finding is that participants who used ChatGPT mostly submitted lightly edited or unedited output from the AI, suggesting that the technology in this experiment replaced rather than augmented human effort.
In any case, the results of Noy and Zhang’s study provide only a glimpse into the immediate productivity effects of AI tools like ChatGPT; they do not address the potential longer-term effects on labor markets and organizational structures. The extent to which GPT can be improved is one of the key questions for its impact.
The real world is more complex than a study setup
The researchers note many limitations to their work. The study focused primarily on specific tasks that required generic writing – clear, persuasive, and not significantly context-dependent.
Although ChatGPT excelled at such tasks, most real-world professional projects require context-specific knowledge and a high degree of factual accuracy, areas where the current iteration of AI writing tools like ChatGPT may fall short, as noted above.
Many of the tasks faced by professionals may also involve more ambiguous goals and instructions, where human creativity, adaptability, and decision-making skills could be critical. Unlike human workers, AI tools cannot fully understand the subtle nuances and complexities of tasks or adapt on their own when instructions are not precise.
Another important limitation is the experiment’s reliance on direct performance-based incentives in the form of bonus payments for output quality. On top of a base payment of $10, participants could earn a bonus of up to $14 based on the quality of their work, with the goal of encouraging high performance. These short-term bonuses may not accurately reflect real-world incentives, which typically include longer-term promotion prospects, professional development goals, and the importance of maintaining a personal style or brand, the researchers write. Such long-term incentives could actually reduce the usefulness of AI writing tools like ChatGPT.
The most glaring omission, it seems to me, is that the study did not consider the time and effort required to fact-check the output of ChatGPT.
Overall, we speculate that, relative to our experimental findings, the direct productivity effects of ChatGPT in the real economy will be somewhat lower and the technology will be more strongly complementary to human workers. To what extent either of these is true remains an open question.
From the paper
While AI tools can speed up the process and even improve the quality of writing, the fact-checking required could offset these time-saving benefits, especially for tasks that require precise factual accuracy. Future advances in AI technology would need to significantly improve in this area to be more reliably useful.
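To make that trade-off concrete, here is a minimal back-of-the-envelope sketch. It reads the study's 40% figure as a 40% reduction in time per task. The 30-minute baseline and the fact-checking times are hypothetical values chosen for illustration, not numbers from the paper.

```python
def net_minutes_saved(baseline_minutes, time_reduction, fact_check_minutes):
    """Minutes saved per task once fact-checking time is added back in."""
    ai_minutes = baseline_minutes * (1 - time_reduction)
    return baseline_minutes - (ai_minutes + fact_check_minutes)


def break_even_fact_check_minutes(baseline_minutes, time_reduction):
    """Fact-checking time at which the AI time saving is fully used up."""
    return baseline_minutes * time_reduction


if __name__ == "__main__":
    baseline = 30      # hypothetical minutes per task without AI (assumption)
    reduction = 0.40   # time reduction, read from the study's 40% figure

    limit = break_even_fact_check_minutes(baseline, reduction)
    print(f"Break-even fact-checking time: {limit:.1f} min")

    # Hypothetical fact-checking times, for illustration only
    for check in (5, 10, 15):
        saved = net_minutes_saved(baseline, reduction, check)
        print(f"{check} min of fact-checking -> net saving {saved:.1f} min")
```

Under these assumptions, any fact-checking that takes longer than the time saved on drafting turns the net saving negative. How long fact-checking takes in practice is exactly what the study did not measure.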
AI’s potential impact on the job market
Whether demand for services rises as AI-enhanced productivity makes them cheaper is a critical factor that will mediate the productivity effects of AI technologies like ChatGPT, the researchers speculate.
For example, if the price of programming services were to drop significantly as a result of AI efficiency, this could potentially lead to a surge in demand and, consequently, an increase in employment in the sector. However, this may not be the case for professions such as advertising or communications. Demand could remain flat and employment could decline as fewer workers are needed.
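The logic behind these two scenarios can be sketched with a simple toy model. The sketch below is not the researchers' model. It assumes constant-elasticity demand, that cost savings are passed on fully to prices, and that labor is the only input. The elasticity values are hypothetical, and the productivity gain is derived by reading the study's 40% figure as a 40% reduction in time per task.

```python
def employment_multiplier(productivity_gain, demand_elasticity):
    """Relative change in labor needed after a productivity gain.

    Toy model (assumptions): the price falls in proportion to the
    productivity gain, demand follows Q ~ P ** (-elasticity), and
    labor needed is quantity divided by productivity.
    """
    price_ratio = 1.0 / productivity_gain
    quantity_ratio = price_ratio ** (-demand_elasticity)
    return quantity_ratio / productivity_gain


if __name__ == "__main__":
    # 40% less time per task means 1 / 0.6, or about 1.67x output per hour
    gain = 1 / (1 - 0.40)

    # Hypothetical elasticities, not estimates from the paper
    scenarios = [("elastic demand", 1.5), ("inelastic demand", 0.5)]
    for label, elasticity in scenarios:
        multiplier = employment_multiplier(gain, elasticity)
        print(f"{label} (elasticity {elasticity}): "
              f"labor needed x{multiplier:.2f}")
```

In this toy model, employment rises when the elasticity of demand is above 1 and falls when it is below 1. With the hypothetical values used here, labor needed is multiplied by roughly 1.29 in the elastic case and roughly 0.77 in the inelastic case.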
The introduction of AI may also affect the composition of the workforce. Roles within organizations may shift, given the creativity and judgment required in professions such as advertising. The question is whether, for example, junior roles could be dedicated to fine-tuning AI outputs, while senior roles would provide conceptual input. This could significantly reshape how industries operate and how they employ people.
The role of AI in fields such as programming could have different implications for wages. If the AI tool requires an expert programmer to review its output and refine prompts, wages could rise due to the increased demand for expert professionals. However, if only basic programming skills are required to work with AI, this could lead to an oversupply of skilled programmers, potentially resulting in lower wages despite increased productivity.
Traditional promotion and hiring practices could also be affected by the widespread adoption of AI technologies such as ChatGPT. For example, AI tools could change the way performance is evaluated by recording and observing employees. This process could affect pay scales and hiring practices, in addition to changing organizational culture around effort and performance.