Fragmentation in Computational Quantum Chemistry: Developing PyFragment

Student: Syed Sharique Ahmed ’22
Research Mentor: John Herbert (The Ohio State University Department of Chemistry and Biochemistry)

Computational quantum chemistry (that’s a mouthful!) is an expensive business. Calculating the properties of large molecules (bunches of atoms, you know, the building blocks of matter) can take a long time, even on a supercomputer (an extremely high-performing computer on steroids). Fragmentation is like breaking a big task into small pieces so that different people can work on them at the same time, making the whole job faster and easier. My project was to help develop software that performs fragmentation and has other cool features that can help research in the field and, eventually, help students study as well.

Computational quantum chemistry calculations scale nonlinearly with system size N: reaching an accuracy of 1 kcal/mol costs O(N^7) in time. To put this in perspective, doubling the system size, from one molecule to two molecules, increases the computational time by a factor of 2^7 (the calculation takes 128 times longer). The same methods also carry a storage cost of O(N^4), so doubling the system size multiplies the storage needed by 2^4 = 16. A possible solution for calculating the properties of large molecules and systems such as protein chains is fragmentation, which breaks a large molecule down into smaller subunits. The computation can then be parallelized: a property of each fragment, such as its energy, can be calculated on a different node, reducing wall time. Different levels of theory can also be applied in layers to trade accuracy against computational time.

Over the summer, the Herbert group's software, PyFragment, was interfaced with MOPAC, a quantum chemistry package for semiempirical methods. Distance-based screening was also implemented. The work involved coding in Python and YAML and drew on knowledge of regular expressions, GitLab (collaborative development with version control), SQLite (to navigate the new database implementation), and command-line tools. Upcoming tasks include energy screening, extracting matrix-based properties from text files, and eventually gradients and dependency trees.
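The scaling figures above are simple power-law arithmetic. The short Python sketch below reproduces them and adds a deliberately idealized estimate of what fragmentation buys: if a system of size N is cut into k equal, non-interacting pieces, an O(N^7) cost drops from N^7 to k·(N/k)^7 = N^7/k^6. The function names are illustrative only, and real fragmentation methods also pay for interactions between fragments, so actual savings are smaller than this best case.

```python
def cost_ratio(size_factor: float, exponent: int) -> float:
    """How many times more a calculation costs when the system size is
    multiplied by `size_factor`, assuming cost grows as N**exponent."""
    return size_factor ** exponent


def fragmented_cost(n: float, n_fragments: int, exponent: int) -> float:
    """Idealized cost (arbitrary units) of splitting a system of size `n` into
    `n_fragments` equal, independent pieces. Interactions between fragments
    are ignored, so this is a best-case estimate."""
    return n_fragments * (n / n_fragments) ** exponent


if __name__ == "__main__":
    print(cost_ratio(2, 7))  # time, O(N^7): 128
    print(cost_ratio(2, 4))  # storage, O(N^4): 16
    whole = 100 ** 7                      # one O(N^7) calculation with N = 100
    pieces = fragmented_cost(100, 10, 7)  # ten fragments with N = 10 each
    print(whole / pieces)                 # 1000000.0
```

In this toy estimate the savings grow as k^(p-1) for cost exponent p, which is why steeply scaling methods benefit most from being broken into pieces.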
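Distance-based screening, in general terms, skips interactions between fragments that are far apart. The sketch below is one plausible way to do it, not PyFragment's implementation: it keeps a pair of fragments only if their closest atoms lie within a cutoff. The fragment names, coordinates (assumed to be in angstroms), the 5.0 cutoff, and the minimum atom-to-atom distance criterion are all assumptions for illustration; a centroid-to-centroid distance would be an alternative criterion.

```python
import itertools
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]  # Cartesian coordinates, assumed in angstroms


def min_distance(frag_a: List[Coord], frag_b: List[Coord]) -> float:
    """Shortest atom-to-atom distance between two fragments."""
    return min(math.dist(a, b) for a in frag_a for b in frag_b)


def screen_pairs(fragments: Dict[str, List[Coord]], cutoff: float) -> List[Tuple[str, str]]:
    """Return the fragment pairs whose closest atoms lie within `cutoff`.

    Pairs farther apart are skipped on the assumption that their
    interaction is too small to matter."""
    kept = []
    for (name_a, atoms_a), (name_b, atoms_b) in itertools.combinations(fragments.items(), 2):
        if min_distance(atoms_a, atoms_b) <= cutoff:
            kept.append((name_a, name_b))
    return kept


if __name__ == "__main__":
    # Hypothetical toy system: three fragments along the x axis.
    fragments = {
        "A": [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0)],
        "B": [(3.0, 0.0, 0.0)],
        "C": [(20.0, 0.0, 0.0)],
    }
    print(screen_pairs(fragments, cutoff=5.0))  # [('A', 'B')]
```

Only the A–B pair survives, so only that pair would need a calculation; the brute-force double loop is fine for a sketch, while large systems would typically use a spatial data structure to avoid checking every atom pair.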
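Part of interfacing with a program like MOPAC is reading numbers out of its text output with regular expressions and saving them somewhere queryable. The sketch below pulls a heat of formation from a made-up output excerpt and stores it in an in-memory SQLite table using only Python's standard library (`re`, `sqlite3`). The output layout, the numbers, the `results` table, and the function names are assumptions for illustration, not PyFragment's actual parser or database schema.

```python
import re
import sqlite3

# Made-up output excerpt for illustration; the exact layout of a real
# MOPAC output file is an assumption here and should be checked.
SAMPLE_OUTPUT = """
          FINAL HEAT OF FORMATION =        -12.34567 KCAL/MOL =    -51.65428 KJ/MOL
"""

# Capture the number after "FINAL HEAT OF FORMATION =" that precedes "KCAL/MOL".
HEAT_RE = re.compile(r"FINAL HEAT OF FORMATION\s*=\s*(-?\d+\.\d+)\s*KCAL/MOL")


def parse_heat_of_formation(text: str) -> float:
    """Return the heat of formation in kcal/mol, or raise if it is missing."""
    match = HEAT_RE.search(text)
    if match is None:
        raise ValueError("heat of formation not found in output")
    return float(match.group(1))


def store_result(conn: sqlite3.Connection, fragment_id: str, heat_kcal_mol: float) -> None:
    """Save one fragment's value, replacing any earlier value for that fragment."""
    # Hypothetical one-table schema, not PyFragment's actual database layout.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "fragment_id TEXT PRIMARY KEY, heat_kcal_mol REAL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO results (fragment_id, heat_kcal_mol) VALUES (?, ?)",
        (fragment_id, heat_kcal_mol),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_result(conn, "frag_1", parse_heat_of_formation(SAMPLE_OUTPUT))
    row = conn.execute(
        "SELECT heat_kcal_mol FROM results WHERE fragment_id = ?", ("frag_1",)
    ).fetchone()
    print(row[0])  # -12.34567
```

Using parameterized queries (`?` placeholders) keeps fragment identifiers from being spliced into SQL text, and keying the table on the fragment ID lets a rerun overwrite a stale result instead of duplicating it.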