In Silico Identification of Protein Targets for Chemical Neurotoxins Using ToxCast in Vitro Data and Read-Across within the QSAR Toolbox
Yaroslav G. Chushak, Jeffery M. Gearhart, and Heather A. Pangburn
Wright-Patterson Air Force Base
Hannah W. Shows
Wright State University
Introduction
Every day, humans are exposed to thousands of manufactured chemicals. Some of these chemicals, such as organic solvents or pesticides, can interact with neurological proteins in the brain and cause neurotoxic effects, including headaches, altered sensation or motor skills, impaired memory and cognitive function, behavioral problems, and even paralysis and death. The neurotoxicity of chemicals greatly depends on their interactions with neurological targets. The recently introduced adverse outcome pathway (AOP) framework links these molecular interactions (the molecular initiating event) with a series of key events at different biological levels that result in an adverse outcome (Vinken 2013). Within the AOP framework, neurotoxicity can be defined as an adverse effect on the functioning of the nervous system (Bal-Price et al. 2015).
With the recent advances made in the field of in vitro high-throughput screening (HTS), it is now possible to screen the biological activity of large chemical libraries in a cost-efficient and timely manner. In 2006, the US Environmental Protection Agency (EPA) initiated the ToxCast program to develop and evaluate in vitro biochemical and cell-based assays for screening thousands of chemicals at multiple concentrations in the high-throughput mode (Dix et al. 2007). In Phase II of the program, approximately 1,800 compounds were tested in ~900 HTS assays. The chemicals in the library included pesticides, commercial compounds, and some failed pharmaceuticals. In 2008, the ToxCast program was merged with a large multiagency Tox21 collaboration. Under this new program, ~8,400 chemicals were screened in ~70 HTS assays (Tice et al. 2013). These in vitro screenings generated an enormous volume of data which is publicly available at https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data. Although ToxCast and other in vitro screening programs provide a significant amount of information about the biological activities for thousands of chemicals, a great deal of information for millions of chemicals remains missing. Computational methods together with the HTS data offer a great opportunity to partially address this data gap and identify molecular targets and other endpoints for chemical toxins of interest.
According to the European Chemical Agency (ECHA) guidance on information requirements and chemical safety assessment, two computational methods, (quantitative) structure-activity relationship [(Q)SAR] and grouping of chemicals with read-across, can be used for evaluating intrinsic properties of chemicals (European Chemical Agency 2016). Both methods are based on the similarity principle, i.e., that similar molecules have similar properties and that biological activities are defined by molecular structure (Patlewicz and Fitzpatrick 2016). (Q)SAR methods are statistical in nature, as they try to correlate the molecular descriptors of chemicals with their properties. Furthermore, these methods are global in scope, as they build models for all chemicals in the training dataset and make predictions for a wide range of chemicals within the applicability domain. QSAR modeling has been applied to develop predictive models based on ToxCast HTS data. Some of the models were successful (Liu et al. 2015; Mansouri and Judson 2016; Mansouri et al. 2016), while other QSAR models yielded low predictive performance (Novotarskyi et al. 2016; Thomas et al. 2012).
Grouping of chemicals into a category and read-across is another important technique for data gap filling in chemical hazard assessment. This approach is local in scope, as its predictions are based on the properties of a small set of similar chemicals. The OECD Guidance on Grouping of Chemicals defines a chemical category as a group of chemicals whose physicochemical and toxicological properties are similar or follow a regular pattern as a result of structural similarity (Organization for Economic Co-operation and Development 2014). The similarities may be based on common functional groups, common modes or mechanisms of action, common constituents or chemical classes, etc. Read-across is a technique to predict the unknown properties of chemicals of interest based on the known properties of chemicals in the same chemical group (European Chemical Agency 2016). Grouping of chemicals and the read-across technique are implemented in several freely available tools such as QSAR Toolbox (Dimitrov et al. 2016), Toxmatch (Gallegos-Saliner et al. 2008), and ToxRead (Gini et al. 2014). QSAR Toolbox is a software platform developed by the Organisation for Economic Co-operation and Development (OECD) in collaboration with the ECHA and is intended to fill data gaps in the hazard assessment of chemicals. QSAR Toolbox v.3.5 has a database with about 200,000 chemicals provided by governmental and commercial institutions. Furthermore, it allows users to import custom databases and use their data for hazard assessment. The main aim of the present study was to explore the application of the QSAR Toolbox and data from ToxCast HTS assays to identify and predict molecular interactions of chemical neurotoxins with their targets.
Recently, the activities of 86 compounds from the ToxCast library were tested in neuronal cultures on multi-well microelectrode arrays (MEAs) (Valdivia et al. 2014). The activities of these compounds on MEAs were compared with their activities in 20 ToxCast binding assays that measured the interaction of chemicals with 8 different ion channels. In our approach, we identified 123 proteins from ToxCast HTS assays that are related to neurological functions. This set of proteins includes ion channels, G protein-coupled receptors, nuclear receptors, transporters, and enzymes as potential neurological targets. The developed approach was evaluated by predicting neurological targets for pyrethroids and comparing the predicted results with ToxCast screening data.
Materials and Methods
ToxCast Compound Dataset
The Tox21/ToxCast dataset released in October 2015 consists of 9,076 chemicals tested in 1,193 cellular and biochemical assays (US Environmental Protection Agency 2016). These assays were developed across multiple human and animal cell lines by several providers, including Attagene, Inc. (marked as ATG), BioSeek (BSK), NIH Chemical Genomics Center (Tox21), and NovaScreen (NVS), among others. However, not all chemicals were tested in all of the assays. The majority of biochemical assays related to the activity of neurological proteins, such as ligand-gated ion channels and G protein-coupled receptors, were screened in the NovaScreen assay platform. Therefore, for our analysis, we selected a subset of 1,077 chemicals that were all screened in NVS assays. Furthermore, we reduced this subset by eliminating mixtures and compounds without the molecular description of their structure in the SMILES (Simplified Molecular Input Line-Entry System) format. As a result, the final subset contained 1,050 chemicals that were screened in 656 ToxCast HTS assays.
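The subset selection described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names, column names, the `is_mixture` flag, the `NVS_` name prefix, and all rows are hypothetical and do not reflect the actual schema or data used in the study.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chemical (
        chid INTEGER PRIMARY KEY,
        smiles TEXT,          -- NULL when no structure is available
        is_mixture INTEGER    -- 1 for mixtures, 0 otherwise
    );
    CREATE TABLE screening (chid INTEGER, assay TEXT);
""")
conn.executemany(
    "INSERT INTO chemical VALUES (?, ?, ?)",
    [(1, "CCO", 0), (2, None, 0), (3, "c1ccccc1", 0),
     (4, "CCO", 1), (5, "CC(=O)O", 0)],
)
conn.executemany(
    "INSERT INTO screening VALUES (?, ?)",
    [(1, "NVS_assay_1"), (1, "ATG_assay_1"), (2, "NVS_assay_1"),
     (3, "ATG_assay_1"), (4, "NVS_assay_2"), (5, "NVS_assay_2")],
)

# Keep chemicals screened in at least one NVS assay that have a SMILES
# string and are not flagged as mixtures.
rows = conn.execute("""
    SELECT DISTINCT c.chid
    FROM chemical AS c
    JOIN screening AS s ON s.chid = c.chid
    WHERE substr(s.assay, 1, 4) = 'NVS_'
      AND c.smiles IS NOT NULL
      AND c.is_mixture = 0
    ORDER BY c.chid
""").fetchall()
subset = [chid for (chid,) in rows]

# Count the distinct assays (from any provider) run on the retained chemicals.
placeholders = ",".join("?" * len(subset))
n_assays = conn.execute(
    f"SELECT COUNT(DISTINCT assay) FROM screening WHERE chid IN ({placeholders})",
    subset,
).fetchone()[0]
```

In this toy example, chemical 2 is dropped for lacking a structure, chemical 3 for never being screened in an NVS assay, and chemical 4 for being a mixture.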
Bioactivity Data Associated with Neurotoxicity
The Tox21/ToxCast HTS assays targeted 342 different proteins. To identify the proteins related to neuronal functions, we searched the Gene Ontology (GO) database using three terms: “neurological,” “synapse,” and “axon.” This search identified 2,499 unique proteins related to neurological functions, and 123 of these proteins were screened in 216 ToxCast assays. Data from these assays were imported into the QSAR Toolbox and used in further analysis. The chemical concentration at half-maximal efficacy, AC50 (in μM), was used to identify chemical-assay activities. Two sets of data were generated: one set, coded with a “1” for active compounds and a “0” for inactive compounds, was used for classification, while a second dataset, containing AC50s for active compounds only, was used for prediction of AC50 values for unknown chemicals of interest.
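The two steps in this subsection, selecting neurology-related targets and encoding the screening results, can be sketched as follows. The protein names, GO annotations, AC50 values, and the substring-matching rule are all assumptions made for illustration; the actual GO query and data import may have worked differently.

```python
KEYWORDS = ("neurological", "synapse", "axon")

# Hypothetical GO annotations: protein -> GO term names.
go_terms = {
    "protein_A": ["synapse", "plasma membrane"],
    "protein_B": ["lipid metabolic process"],
    "protein_C": ["axon guidance"],
    "protein_D": ["neurological system process"],
}
# Hypothetical set of proteins targeted by the screening assays.
toxcast_targets = {"protein_A", "protein_B", "protein_C"}


def is_neuro(terms):
    """True if any GO term name contains one of the search keywords."""
    return any(k in t.lower() for t in terms for k in KEYWORDS)


go_neuro = {p for p, terms in go_terms.items() if is_neuro(terms)}
neuro_targets = go_neuro & toxcast_targets

# Hypothetical results: (chemical, target, AC50), with None meaning inactive.
results = [
    ("chem_1", "protein_A", 12.5),
    ("chem_1", "protein_C", None),
    ("chem_2", "protein_A", None),
    ("chem_2", "protein_B", 3.0),   # target is not neurology-related
    ("chem_2", "protein_C", 0.8),
]
neuro_results = [r for r in results if r[1] in neuro_targets]

# Dataset 1: binary activity calls (1 = active, 0 = inactive).
activity = {(c, p): int(ac50 is not None) for c, p, ac50 in neuro_results}
# Dataset 2: AC50 values for active chemical-target pairs only.
potency = {(c, p): ac50 for c, p, ac50 in neuro_results if ac50 is not None}
```

Keeping the binary calls and the potency values in separate structures mirrors the split between the classification dataset and the AC50 prediction dataset.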
Performance Evaluation
To evaluate the performance of ToxCast HTS assays on chemical neurotoxins, compounds with known protein interactions from two databases were used: DrugBank (DB) (https://www.drugbank.ca/) and Ki database from the Psychoactive Drug Screening Program (PDSP) (https://kidbdev.med.unc.edu/databases/kidb.php).
The DB database combines detailed drug data with comprehensive drug-target information containing 8,261 drugs and 4,338 nonredundant proteins that are linked to these drug entries (Wishart et al. 2008). Twenty-nine chemicals from the DB database were screened in selected ToxCast assays and were used to evaluate the activity of neurological proteins in ToxCast screening.
The PDSP Ki database, which is funded by the US National Institute of Mental Health Psychoactive Drug Screening Program, serves as a data warehouse for published and internally derived Ki, or affinity, values for a large number of drugs and drug candidates at an expanding number of G protein-coupled receptors, ion channels, transporters, and enzymes (Roth et al. 2000). Currently, it has ~60,000 Ki values. Seventeen chemicals from that database were tested against their targets in selected ToxCast HTS assays and were used to evaluate the performance of ToxCast assays.
Another database, the Toxin and Toxin Target Database (http://www.t3db.ca/), also provides mechanisms of toxicity and target proteins for toxins. However, ToxCast HTS data are already included in this database for chemical-protein associations. Therefore, this database information was not used for evaluation of ToxCast screening assays to avoid bias.
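One simple way to summarize such a comparison is the fraction of known chemical-target interactions that the screening assays also reported as active. The sketch below assumes this recall-style metric purely for illustration; the text does not specify which metric was used, and all chemical and target names are hypothetical.

```python
# Hypothetical known interactions from a reference database.
known = {
    ("drug_1", "target_A"),
    ("drug_2", "target_B"),
    ("drug_3", "target_C"),
}
# Hypothetical active calls from the screening assays.
screen_active = {
    ("drug_1", "target_A"),
    ("drug_3", "target_C"),
    ("drug_4", "target_A"),   # active, but not in the reference set
}

confirmed = known & screen_active   # known interactions recovered by screening
missed = known - screen_active      # known interactions not detected
recall = len(confirmed) / len(known)
```

Restricting the calculation to pairs present in the reference set means the result describes how often known interactions are recovered and says nothing about false positives.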
Software
Data processing and management were performed using the SQLite v.3 SQL database engine (https://www.sqlite.org/). Grouping of chemicals into a category and read-across were performed within OECD QSAR Toolbox v.3.5. The compounds were grouped by organic functional groups and by structural similarity.
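The general idea of read-across by structural similarity can be illustrated with a minimal nearest-analogue sketch. This is not the algorithm implemented in QSAR Toolbox: the feature sets, the Tanimoto coefficient on sets, the choice of k = 3, and the 0.5 decision threshold are all assumptions chosen for illustration.

```python
from statistics import mean


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def read_across(target, analogues, k=3):
    """Average the known values of the k analogues most similar to the target."""
    ranked = sorted(
        analogues.values(),
        key=lambda item: tanimoto(target, item[0]),
        reverse=True,
    )
    return mean(value for _, value in ranked[:k])


# Hypothetical category members: name -> (feature set, known activity call).
category = {
    "analogue_A": ({"ester", "cyclopropane", "phenoxy"}, 1),
    "analogue_B": ({"ester", "cyclopropane", "nitrile"}, 1),
    "analogue_C": ({"ester", "phenoxy"}, 0),
    "analogue_D": ({"amine", "chloro"}, 0),
}
query = {"ester", "cyclopropane", "phenoxy", "nitrile"}

score = read_across(query, category, k=3)   # mean activity of the 3 nearest analogues
prediction = 1 if score >= 0.5 else 0       # assumed majority-style threshold
```

Here the three nearest analogues are A, B, and C, two of which are active, so the query chemical is predicted active. The same averaging could in principle be applied to AC50 values of active analogues instead of binary calls.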