Algorithms Are Not Enough: Lessons Bringing Computer Science to Journalism

By Jonathan Stray
January 21, 2014

There are some amazing algorithms coming out of the computer science community which promise to revolutionize how journalists deal with large quantities of information. But building a tool that journalists can use to get stories done takes a lot more than algorithms. Closing this gap has been one of the most challenging and rewarding aspects of building Overview, and I really think we’ve learned something.

Overview is an open-source tool to help journalists sort through vast troves of documents obtained through open government programs, leaks, and Freedom of Information requests. Such document sets can include hundreds of thousands of pages, and you can’t find what you don’t know to search for. To solve this problem, Overview applies natural language processing algorithms to automatically sort documents according to topic and produce an explorable visualization of the complete contents of a document set.

I want to get into the process of going from algorithm to application here because, somewhat to my surprise, I don’t think this process is widely understood. The computer science research community is going full speed ahead developing exciting new algorithms, but seems a bit disconnected from what it takes to get its work used. This is doubly disappointing, because understanding the needs of users often shows that you need a different algorithm.

The development of Overview is a story about text analysis algorithms applied to journalism, but the principles might apply to any sort of data analysis system. One definition says data science is the intersection of computer science, statistics, and subject matter expertise. This post is about connecting computer science with subject matter expertise.