Supporting Block Party & NYC Community Boards
by Breanna Green
The Challenge
Key to local democracy, New York City community boards are meant to act as representative bodies. The 59 community boards throughout NYC each have up to 50 unsalaried members, whose main responsibilities (among many others) are to receive complaints from community residents, address items of concern to the community, and conduct monthly community board meetings. Much discourse relevant to a local community occurs during these community board meetings, which anyone can attend! Unfortunately, many New Yorkers are unaware of how community boards shape their local living experience, have never attended a community board meeting, or do not know that community boards exist at all.
Block Party, a New York City-centered, volunteer-supported organization, is working to change this. During the COVID-19 pandemic, when videos of many community board meetings were posted to YouTube, Block Party founder Sarah Sachs built a tool that automatically summarizes the text of community board meetings using YouTube's speech-to-text transcription, Artificial Intelligence (AI), and Natural Language Processing (NLP), and automatically distributes the resulting summary through the free Block Party newsletter mailing list for each community board. Over time, the Block Party team built up an archive of community board meeting transcript data dating back to March 2020.
Automatic text summarization is the task of producing a concise and fluent summary while preserving key content and overall meaning. While the full transcripts are often quite long and difficult to digest, Block Party extracts the most representative sentences and produces a general summary that is more informative. However, extracting key sentences does not always produce a summary that is fluent and flows well conceptually. With the recent advent of large language models (LLMs), Block Party wanted to improve the meeting transcript summarization process to help bring more succinct, human-readable summaries to their newsletter mailing list subscribers. It was my pleasure to support their cause this summer as a Siegel Family Endowment PiTech PhD Impact Fellow.
The Project
During my tenure as a PiTech Impact Fellow, I was able to leverage my research and expertise in NLP in order to support the meeting summarization process. I worked closely with Sarah, brainstorming ways we might envision a new summarization process utilizing LLMs.
Where much of Block Party’s previous summarization work was extractive, we were looking towards a more abstractive process (Allahyari et al., 2017). LLMs have become the state-of-the-art method for this task. For example, ChatGPT's recent introduction and its advances in summarization have attracted significant interest even beyond the NLP community. However, concerns regarding factuality and faithfulness have hindered its practical application in summarization systems (Zhang et al., 2023). Therefore, we began with experimentation, using various established Hugging Face Transformers models such as Pegasus, BART, and T5 to see whether abstractive summarization with LLMs could yield accurate results. Additionally, we held multiple stakeholder interviews with local newsrooms, community board members, and tech experts who might be interested in this task and its implementation.
As we iterated through each language model and set of sample transcripts, we found instances of misrepresented text or full hallucinations. Given the nature of community board meetings and Block Party’s goal to remain true to the content of the meeting, this result was concerning. We took a step back and reimagined the summarization process completely. Where we originally hoped to rely fully on LLMs for this task, we recognized that this might not work in our use case, which involves messy conversational text generated by YouTube's speech-to-text transcription. In some situations, the LLMs produced fluent and concise summaries, but in others we found the extractive process to be more informative and useful.
Proposed Solution
Thus, our ultimate proposed solution involves a mixture of both extractive and abstractive summarization techniques. We begin by chunking the full transcript and extracting a larger set of informative key sentences from each chunk than was previously done. Once extracted, we concatenate the key sentences from all chunks back into a single larger document, which we then pass into an LLM. The LLM essentially shortens and paraphrases this document. Finally, we clean and edit the text and extract informative sentences once more. Not only do the newer summaries stay true to the content, but they also read more fluently. There is still much room for improvement, but we are excited about this new process and the possibilities it opens up.
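The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Block Party's actual implementation: the word-frequency scoring is a simple stand-in for whatever extractive method is really used, and the function names, the `rewrite` hook, and the default parameter values are all hypothetical. The `rewrite` argument marks the place where an LLM call (for example, a Hugging Face summarization model) would go; by default it returns the text unchanged so the sketch runs without any model.

```python
import re
from collections import Counter
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk(sentences: List[str], size: int) -> List[List[str]]:
    """Group consecutive sentences into chunks of at most `size`."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def extract_key_sentences(sentences: List[str], k: int) -> List[str]:
    """Keep the k highest-scoring sentences, in their original order.

    Assumption: a sentence's score is the average frequency of its words
    (longer than 3 letters) within the given sentences. This is a stand-in
    for a real extractive method.
    """
    words = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    freq = Counter(w for ws in words for w in ws if len(w) > 3)
    scores = [sum(freq[w] for w in ws) / (len(ws) or 1) for ws in words]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def hybrid_summary(
    transcript: str,
    rewrite: Callable[[str], str] = lambda text: text,  # hypothetical LLM hook
    chunk_size: int = 20,
    per_chunk: int = 3,
    final_k: int = 5,
) -> str:
    # 1. Chunk the transcript and extract key sentences from each chunk.
    extracted: List[str] = []
    for part in chunk(split_sentences(transcript), chunk_size):
        extracted.extend(extract_key_sentences(part, per_chunk))
    # 2. Concatenate the extracted sentences into one condensed document.
    condensed = " ".join(extracted)
    # 3. Abstractive step: let the LLM shorten and paraphrase.
    rewritten = rewrite(condensed)
    # 4. Clean up whitespace, then extract informative sentences once more.
    cleaned = re.sub(r"\s+", " ", rewritten).strip()
    return " ".join(extract_key_sentences(split_sentences(cleaned), final_k))
```

Because the extractive pass runs both before and after the abstractive step, the LLM only ever sees sentences that were already selected from the transcript, which is one way to limit how far a paraphrase can drift from what was actually said.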
Impact and Path Forward
While my role as a Cornell Tech PiTech Impact Fellow has come to an end, the connection and community gained from working with Block Party are immeasurable. I am thrilled to note that I will be joining the Block Party team beyond the summer and fellowship! We plan to continue working together to explore new ways to expand the summarization work. As community boards across NYC move from online platforms back to in-person meetings, we seek to identify and support ways to continue summarizing information in a centralized location, to the benefit of New Yorkers and local communities.
Citations & Resources
Allahyari, M., Pouriyeh, S., Assefi, M., Safaei, S., Trippe, E. D., Gutierrez, J. B., & Kochut, K. (2017). Text summarization techniques: a brief survey. arXiv preprint arXiv:1707.02268.
Zhang, H., Liu, X., & Zhang, J. (2023). Extractive summarization via ChatGPT for faithful summary generation. arXiv preprint arXiv:2304.04193.
Block Party: A Platform to Explore NYC Community Board Meetings