Data analytics and mining
Coupling a data server to a computational notebook such as Jupyter makes it possible to perform analytics on the data generated and compiled within the platform. The platform allows us to import classes of compounds, run various tools through the batch queuing system, and parse the important information from their outputs. Once a set of results is available, the notebook environment of the platform and its binding to ChemML allow users to analyze and mine the data in its entirety, rather than only referring to individual results. This statistical workup yields benchmarking insights into the performance (accuracy and timing) of different simulation techniques, algorithms, and codes. More importantly, the data sets can also be mined to generate data-derived machine learning prediction models that can serve as surrogates for the underlying physics-based models, as discussed above. The notebook platform also provides plotting functionality and access to other graphical representations. An example of this strategy is shown in Fig. \ref{466490}, where ChemML results for a list of SMILES are compared in a plot.
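As a minimal sketch of the statistical workup described above, the following notebook cell aggregates parsed results per method and reports accuracy (mean absolute error against a reference) and timing (mean wall time). The record layout, the method names, the function name \texttt{benchmark}, and all numerical values are hypothetical placeholders chosen for illustration; they are not the platform's or ChemML's actual data model or API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parsed results, one tuple per calculation:
# (method, SMILES, reference value, computed value, wall time in seconds).
# All numbers are invented for illustration only.
records = [
    ("method_A", "CCO", 1.00, 1.10, 12.0),
    ("method_A", "c1ccccc1", 2.00, 1.80, 30.0),
    ("method_B", "CCO", 1.00, 1.02, 95.0),
    ("method_B", "c1ccccc1", 2.00, 2.04, 210.0),
]


def benchmark(records):
    """Return {method: (mean absolute error, mean wall time)}."""
    errors = defaultdict(list)
    timings = defaultdict(list)
    for method, _smiles, reference, computed, seconds in records:
        errors[method].append(abs(computed - reference))
        timings[method].append(seconds)
    return {m: (mean(errors[m]), mean(timings[m])) for m in errors}


summary = benchmark(records)
for method, (mae, seconds) in sorted(summary.items()):
    print(f"{method}: MAE = {mae:.3f}, mean wall time = {seconds:.1f} s")
```

Working on the full set of records rather than on individual outputs is what makes this kind of accuracy versus cost comparison possible; in practice the same table could feed a plotting call or serve as training data for a surrogate model.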