\documentclass[twocolumn,twoside]{article}
\makeatletter\if@twocolumn\PassOptionsToPackage{switch}{lineno}\else\fi\makeatother
\usepackage{amsfonts,amssymb,amsbsy,latexsym,amsmath,tabulary,graphicx,times,xcolor}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The following additional macros provide functionality
% that is not available in the document class used.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage{url,multirow,morefloats,floatflt,cancel,tfrupee}
\makeatletter
\AtBeginDocument{\@ifpackageloaded{textcomp}{}{\usepackage{textcomp}}}
\makeatother
\usepackage{colortbl}
\usepackage{xcolor}
\usepackage{pifont}
\usepackage[nointegrals]{wasysym}
\urlstyle{rm}
\makeatletter
%%%For Table column width calculation.
\def\mcWidth#1{\csname TY@F#1\endcsname+\tabcolsep}
%%Hacking center and right align for table
\def\cAlignHack{\rightskip\@flushglue\leftskip\@flushglue\parindent\z@\parfillskip\z@skip}
\def\rAlignHack{\rightskip\z@skip\leftskip\@flushglue \parindent\z@\parfillskip\z@skip}
%Etal definition in references
\@ifundefined{etal}{\def\etal{\textit{et~al}}}{}
%\if@twocolumn\usepackage{dblfloatfix}\fi
\usepackage{ifxetex}
\ifxetex\else\if@twocolumn\@ifpackageloaded{stfloats}{}{\usepackage{dblfloatfix}}\fi\fi
\AtBeginDocument{
\expandafter\ifx\csname eqalign\endcsname\relax
\def\eqalign#1{\null\vcenter{\def\\{\cr}\openup\jot\m@th
\ialign{\strut$\displaystyle{##}$\hfil&$\displaystyle{{}##}$\hfil
\crcr#1\crcr}}\,}
\fi
}
%For fixing hardfail when unicode letters appear inside table with endfloat
\AtBeginDocument{%
\@ifpackageloaded{endfloat}%
{\renewcommand\efloat@iwrite[1]{\immediate\expandafter\protected@write\csname efloat@post#1\endcsname{}}}{\newif\ifefloat@tables}%
}%
\def\BreakURLText#1{\@tfor\brk@tempa:=#1\do{\brk@tempa\hskip0pt}}
\let\lt=<
\let\gt=>
\def\processVert{\ifmmode|\else\textbar\fi}
\let\processvert\processVert
\@ifundefined{subparagraph}{
\def\subparagraph{\@startsection{paragraph}{5}{2\parindent}{0ex plus 0.1ex minus 0.1ex}%
{0ex}{\normalfont\small\itshape}}%
}{}
% These are now gobbled, so won't appear in the PDF.
\newcommand\role[1]{\unskip}
\newcommand\aucollab[1]{\unskip}
\@ifundefined{tsGraphicsScaleX}{\gdef\tsGraphicsScaleX{1}}{}
\@ifundefined{tsGraphicsScaleY}{\gdef\tsGraphicsScaleY{.9}}{}
% To automatically resize figures to fit inside the text area
\def\checkGraphicsWidth{\ifdim\Gin@nat@width>\linewidth
\tsGraphicsScaleX\linewidth\else\Gin@nat@width\fi}
\def\checkGraphicsHeight{\ifdim\Gin@nat@height>.9\textheight
\tsGraphicsScaleY\textheight\else\Gin@nat@height\fi}
\def\fixFloatSize#1{}%\@ifundefined{processdelayedfloats}{\setbox0=\hbox{\includegraphics{#1}}\ifnum\wd0<\columnwidth\relax\renewenvironment{figure*}{\begin{figure}}{\end{figure}}\fi}{}}
\let\ts@includegraphics\includegraphics
\def\inlinegraphic[#1]#2{{\edef\@tempa{#1}\edef\baseline@shift{\ifx\@tempa\@empty0\else#1\fi}\edef\tempZ{\the\numexpr(\numexpr(\baseline@shift*\f@size/100))}\protect\raisebox{\tempZ pt}{\ts@includegraphics{#2}}}}
%\renewcommand{\includegraphics}[1]{\ts@includegraphics[width=\checkGraphicsWidth]{#1}}
\AtBeginDocument{\def\includegraphics{\@ifnextchar[{\ts@includegraphics}{\ts@includegraphics[width=\checkGraphicsWidth,height=\checkGraphicsHeight,keepaspectratio]}}}
\DeclareMathAlphabet{\mathpzc}{OT1}{pzc}{m}{it}
\def\URL#1#2{\@ifundefined{href}{#2}{\href{#1}{#2}}}
%%For url break
\def\UrlOrds{\do\*\do\-\do\~\do\'\do\"\do\-}%
\g@addto@macro{\UrlBreaks}{\UrlOrds}
\edef\fntEncoding{\f@encoding}
\def\EUoneEnc{EU1}
\makeatother
\def\floatpagefraction{0.8}
\def\dblfloatpagefraction{0.8}
\def\style#1#2{#2}
\def\xxxguillemotleft{\fontencoding{T1}\selectfont\guillemotleft}
\def\xxxguillemotright{\fontencoding{T1}\selectfont\guillemotright}
\newif\ifmultipleabstract\multipleabstractfalse%
\newenvironment{typesetAbstractGroup}{}{}%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage[authoryear]{natbib}
\newif\ifnokeywords\nokeywordsfalse
\makeatletter\input{size10-pointfive.clo}\makeatother%
\definecolor{kwdboxcolor}{RGB}{242,242,242}
\usepackage[hidelinks,colorlinks=true,allcolors=blue]{hyperref}
\linespread{1}
\def\floatpagefraction{0.8}
\usepackage[paperheight=11.69in,paperwidth=8.26in,top=1in,bottom=1in,left=1in,right=.75in,headsep=24pt]{geometry}
\usepackage{multirow-custom}
\makeatletter
\def\hlinewd#1{%
\noalign{\ifnum0=`}\fi\hrule \@height #1%
\futurelet\reserved@a\@xhline}
\def\tbltoprule{\hlinewd{1pt}\\[-14pt]}
\def\tblbottomrule{\noalign{\vspace*{6pt}}\hline\noalign{\vspace*{2pt}}}
\def\tblmidrule{\hline\noalign{\vspace*{2pt}}}
\let\@articleType\@empty
\let\@journalDoi\@empty
\let\@journalVolume\@empty
\let\@journalIssue\@empty
\let\@crossMarkLink\@empty
\let\@receivedDate\@empty
\let\@acceptedDate\@empty
\let\@revisedDate\@empty
\let\@copyrightYear\@empty
\let\@firstPage\@empty
\def\articleType#1{\gdef\@articleType{#1}}
\def\journalDoi#1{\gdef\@journalDoi{#1}}
\def\crossMarkLink#1{\gdef\@crossMarkLink{#1}}
\def\receivedDate#1{\gdef\@receivedDate{#1}}
\def\acceptedDate#1{\gdef\@acceptedDate{#1}}
\def\revisedDate#1{\gdef\@revisedDate{#1}}
\def\copyrightYear#1{\gdef\@copyrightYear{#1}}
\def\journalVolume#1{\gdef\@journalVolume{#1}}
\def\journalIssue#1{\gdef\@journalIssue{#1}}
\def\firstPage#1{\gdef\@firstPage{#1}}
\def\author#1{%
\gdef\@author{%
\hskip-\dimexpr(\tabcolsep)\hskip5pt%
\parbox{\dimexpr\textwidth-1pt}%
{\fontsize{11}{13}\selectfont\raggedright #1}%
}%
}
\usepackage{pharmascope-abs}
\usepackage{caption}
\usepackage{lastpage}
\usepackage{fancyhdr}
\usepackage[noindentafter,explicit]{titlesec}
\usepackage{fontspec}
\setmainfont[%
BoldFont=cambriab.otf,%
ItalicFont=CAMBRIAI.otf,%
BoldItalicFont=CAMBRIAZ.otf]{Cambria.otf}
\lefthyphenmin = 3
\def\title#1{%
\gdef\@title{%
\vspace*{-40pt}%
\ifx\@articleType\@empty\else{\fontsize{10}{12}\scshape\selectfont\hspace{8pt}\@articleType\hfill\mbox{}\par\vspace{2pt}}\fi%
\minipage{\linewidth}
\hrulefill\\[-0.7pt]%
\mbox{~}\hspace{5pt}\parbox{.1\linewidth}{\includegraphics[width=75pt,height=50pt]{ijrps_logo.png}}\hfill
\fcolorbox{kwdboxcolor}{kwdboxcolor}{\parbox{.792\linewidth}{%
\begin{center}\fontsize{17}{17}\selectfont\scshape\vskip-7pt International Journal of Research in Pharmaceutical Sciences\hfill\end{center}%
\vspace*{-10pt}\hspace*{4pt}{\fontsize{8}{9}\selectfont Published by JK Welfare \& Pharmascope Foundation\hfill Journal Home Page: \href{http://www.pharmascope.org/ijrps}{\color{blue}\underline{\smash{www.pharmascope.org/ijrps}}}}\hspace*{4pt}\mbox{}}}%
\par\vspace*{-1pt}\rule{\linewidth}{1.3pt}%
\endminipage%
\par\vspace*{9.2pt}\parbox{.98\linewidth}{\linespread{.9}\raggedright\fontsize{14}{17}\selectfont #1}%
\vspace*{-8pt}%
}
}
\setlength{\parindent}{0pt}
\setlength{\parskip}{0.4pc plus 1pt minus 1pt}
\def\abbrvJournalTitle{Int. J. Res. Pharm. Sci.}
\fancypagestyle{headings}{%
\renewcommand{\headrulewidth}{0pt}%
\renewcommand{\footrulewidth}{0.3pt}
\fancyhf{}%
\fancyhead[R]{%
\fontsize{9.12}{11}\selectfont\RunningAuthor,\ \abbrvJournalTitle,\ \ifx\@journalVolume\@empty X\else\@journalVolume\fi%
\ifx\@journalIssue\@empty\else(\@journalIssue)\fi%
,\ \ifx\@firstPage\@empty 1\else\@firstPage\fi-\pageref*{LastPage}%
}%
\fancyfoot[LO,RE]{\fontsize{9.12}{11}\selectfont\textcopyright\ International Journal of Research in Pharmaceutical Sciences}%
\fancyfoot[RO,LE]{\fontsize{9.12}{11}\selectfont\thepage}
}\pagestyle{headings}
\fancypagestyle{plain}{%
\renewcommand{\headrulewidth}{0pt}%
\renewcommand{\footrulewidth}{0.3pt}%
\fancyhf{}%
\fancyhead[R]{%
\fontsize{9.12}{11}\selectfont\RunningAuthor,\ \abbrvJournalTitle,\ \ifx\@journalVolume\@empty X\else\@journalVolume\fi%
\ifx\@journalIssue\@empty\else(\@journalIssue)\fi%
,\ \ifx\@firstPage\@empty 1\else\@firstPage\fi-\pageref*{LastPage}%
}%
\fancyfoot[LO,RE]{\fontsize{9.12}{11}\selectfont\textcopyright\ International Journal of Research in Pharmaceutical Sciences}%
\fancyfoot[RO,LE]{\fontsize{9.12}{11}\selectfont\thepage}
\ifx\@firstPage\@empty\else\setcounter{page}{\@firstPage}\fi
}
\def\NormalBaseline{\def\baselinestretch{1.1}}
\usepackage{textcase}
\setcounter{secnumdepth}{0}
\titleformat{\section}[block]{\bfseries\boldmath\NormalBaseline\filright\fontsize{10.5}{13}\selectfont}
{\thesection}
{6pt}
{\MakeTextUppercase{#1}}
[]
\titleformat{\subsection}[block]{\bfseries\boldmath\NormalBaseline\filright\fontsize{10.5}{12}\selectfont}
{\thesubsection}
{6pt}
{#1}
[]
\titleformat{\subsubsection}[block]{\NormalBaseline\filright\fontsize{10.5}{12}\selectfont}
{\thesubsubsection}
{6pt}
{#1}
[]
\titleformat{\paragraph}[block]{\NormalBaseline\filright\fontsize{10.5}{10}\selectfont}
{\theparagraph}
{6pt}
{#1}
[]
\titleformat{\subparagraph}[block]{\NormalBaseline\filright\fontsize{10.5}{12}\selectfont}
{\thesubparagraph}
{6pt}
{#1}
[]
\titlespacing{\section}{0pt}{.5\baselineskip}{.5\baselineskip}
\titlespacing{\subsection}{0pt}{.5\baselineskip}{.5\baselineskip}
\titlespacing{\subsubsection}{0pt}{.5\baselineskip}{.5\baselineskip}
\titlespacing{\paragraph}{0pt}{.5\baselineskip}{.5\baselineskip}
\titlespacing{\subparagraph}{0pt}{.5\baselineskip}{.5\baselineskip}
\captionsetup[figure]{skip=1.4pt,font=bf,labelsep=colon,justification=raggedright,singlelinecheck=false}
\captionsetup[table]{skip=1.4pt,font=bf,labelsep=colon,justification=raggedright,singlelinecheck=false}
\def\bibyear#1{#1}
%\def\bibjtitle#1{#1} %%Publisher request
\def\bibauand{}
\setlength\bibsep{3pt}
\setlength\bibhang{8pt}
\makeatother
\date{}
\usepackage{float}
\begin{document}
\def\RunningAuthor{Ani R et al.}
\firstPage{6273}
\articleType{Original Article}
\receivedDate{22 Jul 2020}
\acceptedDate{22 Aug 2020}
\revisedDate{21 Aug 2020}
\journalVolume{2020, 11}
\journalIssue{4}
\journalDoi{ijrps.v11i4.3310}
\copyrightYear{2020}
\def\authorCount{4}
\def\affCount{2}
\def\journalTitle{International Journal of Research in Pharmaceutical Sciences}
\title{\textbf{In Silico Prediction Tool for Drug-likeness of Compounds based on Ligand Based Screening}}
\author{Ani~R\textsuperscript{*}\textsuperscript{1},
Anand~P~S\textsuperscript{1},
Sreenath~B\textsuperscript{1},
Deepa~O~S\textsuperscript{2}~\\[5pt]{\textsuperscript{1}Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam-690525, India}~\\{\textsuperscript{2}Department of Mathematics, Amrita Vishwa Vidyapeetham, Coimbatore-641112, Tamil Nadu, India}}
\begin{abstract}
Drug-likeness prediction is a time-consuming and tedious process. In-vitro drug development takes a long time to reach the market, and its failure rate is a further concern. Many in-silico methods are available, and more are being developed, to support the drug discovery and development process. Several online tools predict and classify a candidate as a drug by analysing the drug-likeness properties of compounds, but each tool has its own advantages and disadvantages. In this study, a tool is developed to predict the drug-likeness of compounds given as input. It may help chemists analyse a compound before actually synthesising it for the drug discovery process. The tool includes both descriptor-based and fingerprint-based calculations for a given compound. The descriptor calculation applies a set of rules and filters such as Lipinski's rule, the Ghose filter, the Veber filter and BBB likeness. Previous studies showed that fingerprint-based prediction is more accurate than descriptor-based prediction, so the tool combines molecular descriptors with fingerprint-based calculations over five different fingerprint types. Five machine learning algorithms were evaluated for drug-likeness prediction, and the one with the highest accuracy was selected. When a chemist inputs a compound in SMILES format, the tool predicts whether the given candidate is a drug or a non-drug.
\end{abstract}\def\keywordstitle{Keywords}
\begin{keywords}Drug likeness prediction,\newline ligand-based virtual screening,\newline QSAR,\newline molecular fingerprints,\newline machine learning,\newline prediction accuracy
\end{keywords}
\twocolumn[ \maketitle {\printKwdAbsBox}]
\makeatletter\textsuperscript{*}Corresponding Author\par Name:\ Ani~R~\\ Phone:\ -~\\ Email:\ anir@am.amrita.edu
\par\vspace*{-11pt}\hrulefill\par{\fontsize{12}{14}\selectfont ISSN: 0975-7538}\par%
\textsc{DOI:}\ \href{https://doi.org/10.26452/\@journalDoi}{\textcolor{blue}{\underline{\smash{https://doi.org/10.26452/\@journalDoi}}}}\par%
\vspace*{-11pt}\hrulefill\\{\fontsize{9.12}{10.12}\selectfont Production and Hosted by}\par{\fontsize{12}{14}\selectfont Pharmascope.org}\par%
\vspace*{-7pt}{\fontsize{9.12}{10.12}\selectfont\textcopyright\ \@copyrightYear\ $|$ All rights reserved.}\par%
\vspace*{-11pt}\rule{\linewidth}{1.2pt}
\makeatother
\section{Introduction}
Many software tools for predicting drug-likeness are available on the web. The problem with these tools is that each is typically based on a single method, even though several methods exist for drug-likeness prediction. In general, they focus only on descriptor-based calculation, fingerprint-based calculation, or docking. In our studies, we found the importance of a tool that incorporates all of these methodologies: such a tool can reach a higher prediction accuracy and give the chemist, who is eagerly waiting for the results, a more reliable conclusion. To our knowledge, no existing tool incorporates all of these methods, which increases the importance of our project. DruLiTo, SwissADME and DrugMint are well-known drug-likeness prediction tools on the web; their limitations are discussed below.
DruLiTo is an open-source virtual screening tool for drug-likeness prediction. Its problem is that it freezes when processing some compounds, and it only contains the rules and filters for descriptor-based prediction. SwissADME has the same limitation: it focuses only on descriptor-based calculation. Various studies have demonstrated the accuracy of fingerprint-based prediction, so for more accurate results the two methodologies need to be combined. DrugMint is a web server that integrates both, but it supports only a limited number of fingerprint types.
\begin{table}[!htbp]
\caption{\boldmath {Lipinski Rule} }
\label{tw-9a3b331852c8}
\def\arraystretch{1.1}
\ignorespaces
\centering
\begin{tabulary}{\linewidth}{p{\dimexpr.51870000000000005\linewidth-2\tabcolsep}p{\dimexpr.48129999999999995\linewidth-2\tabcolsep}}
\tbltoprule \rowcolor{kwdboxcolor}\multicolumn{2}{p{\dimexpr(1\linewidth-2\tabcolsep)}}{\cAlignHack Lipinski Rule}\\
\tblmidrule
\cAlignHack logP &
\cAlignHack {\textless}= 5\\
\cAlignHack Molecular Weight &
\cAlignHack {\textless}= 500\\
\cAlignHack H-Bond Donor &
\cAlignHack {\textless}= 5\\
\cAlignHack H-Bond Acceptor &
\cAlignHack {\textless}= 10\\
\tblbottomrule
\end{tabulary}\par
\end{table}
To our knowledge, no existing tool incorporates all these methods effectively, which increases the importance of our Drug Likeness Prediction Tool. The tool combines several filters and rules for descriptor-based prediction, because no single filter or rule gives a reliable prediction on its own. For example, when we used Lipinski's rule alone, 50\%{\textendash}80\% of the output was wrong. By combining these filters and rules, we can reduce the wrong information and derive much better results. For fingerprint-based prediction, we use five different fingerprint types with five different algorithms, so the tool can identify the most accurate algorithm across the five fingerprints and present the best-predicted output to the chemist.
\textbf{QSAR}
QSAR is a technique that tries to predict the activity, reactivity and properties of a set of molecules, such as the binding affinity of hypothetical molecules or the toxic potential of existing ones, with the help of effective molecular descriptors. The idea behind QSAR modelling is that candidate molecules with the same molecular features will trigger the same biological responses. This area of research tries to establish a relation between the structural and electronic characteristics of candidate molecules. The well-known ADMET properties are mandatory features of a drug compound: absorption, distribution, metabolism, excretion and toxicity. Nowadays, through QSAR modelling, we can also predict whether a candidate molecule is a drug or a non-drug \unskip~\citep{867653:20160943}.
\begin{table}[!htbp]
\caption{\boldmath {Ghose Filter} }
\label{tw-598a4ee4cda4}
\def\arraystretch{1.1}
\ignorespaces
\centering
\begin{tabulary}{\linewidth}{p{\dimexpr.5042\linewidth-2\tabcolsep}p{\dimexpr.4958\linewidth-2\tabcolsep}}
\tbltoprule \rowcolor{kwdboxcolor}\multicolumn{2}{p{\dimexpr(1\linewidth-2\tabcolsep)}}{\cAlignHack Ghose Filter}\\
\tblmidrule
\cAlignHack Atom Count &
\cAlignHack 20 to 70\\
\cAlignHack Molecular Weight &
\cAlignHack 160 to 480\\
\cAlignHack Molecular Refractivity &
\cAlignHack 40 to 130\\
\cAlignHack logP &
\cAlignHack -0.4 to 5.6\\
\tblbottomrule
\end{tabulary}\par
\end{table}
The initial phases of the QSAR technique focused on a single feature of a molecule, from which the biological response of that molecule can be estimated. This initial approach is called 1D-QSAR. Researchers in this field extended their studies by considering more than one molecular feature at a time, which led to the 2D-QSAR technique \unskip~\citep{867653:20160924}.
An improved version of QSAR, known as the 3D-QSAR technique, maps the interactions of a compound with biological and chemical features into a 3-D vector space. The main drawback of 3D-QSAR is that we cannot anticipate the exact location of the corresponding molecule, because the molecule is plotted into a 3-D vector space that also includes other replacement molecules; without structural information, it is very hard to locate the target molecule. This problem can be solved by using alignment descriptors \unskip~\citep{867653:20160932}.
\begin{table}[!htbp]
\caption{\boldmath {CMC-50 Likeness} }
\label{tw-9415c513acd2}
\def\arraystretch{1.1}
\ignorespaces
\centering
\begin{tabulary}{\linewidth}{p{\dimexpr.5083\linewidth-2\tabcolsep}p{\dimexpr.4917\linewidth-2\tabcolsep}}
\tbltoprule \rowcolor{kwdboxcolor}\multicolumn{2}{p{\dimexpr(1\linewidth-2\tabcolsep)}}{\cAlignHack CMC-50 Likeness}\\
\tblmidrule
\cAlignHack Atom Count &
\cAlignHack 30 to 55\\
\cAlignHack Molecular Weight &
\cAlignHack 230 to 390\\
\cAlignHack Molecular Refractivity &
\cAlignHack 70 to 110\\
\cAlignHack logP &
\cAlignHack 1.3 to 4.1\\
\tblbottomrule
\end{tabulary}\par
\end{table}
The 4D-QSAR technique, an extension of 3D-QSAR, was developed to address the alignment issue in 3D-QSAR. It solves the alignment problem by representing each molecule in non-identical conformations, orientations and protonation states. The underlying QSAR algorithm then locates the particular molecule among the molecules represented in these different positions. Compared to other QSAR models, 4D-QSAR is more practicable across different binding targets, and it also solves the alignment issue \unskip~\citep{867653:20160921,867653:20160940}.
\textbf{Descriptors}
We discussed how candidate molecules are screened in the drug discovery process. We now discuss descriptors, which identify or categorise a given compound as drug or non-drug. QSAR (quantitative structure-activity relationship) models came to play a major role in drug discovery because of the cost of methods such as high-throughput screening. These models require good molecular descriptors that provide information about the molecular features of the target candidate molecule. Molecular descriptors differ in the underlying algorithm used for their calculation and in the type of molecular representation. Geometrical descriptors, topological indices, and physicochemical and constitutional descriptors are some of the well-known descriptor types.
\begin{table}[!htbp]
\caption{\boldmath {Veber Rule} }
\label{tw-0ce72683ab1e}
\def\arraystretch{1.1}
\ignorespaces
\centering
\begin{tabulary}{\linewidth}{p{\dimexpr.52900000000000006\linewidth-2\tabcolsep}p{\dimexpr.4710000000000001\linewidth-2\tabcolsep}}
\tbltoprule \rowcolor{kwdboxcolor}\multicolumn{2}{p{\dimexpr(1\linewidth-2\tabcolsep)}}{\cAlignHack Veber Rule}\\
\tblmidrule
\cAlignHack Rotatable Bond &
\cAlignHack {\textless}= 10\\
\cAlignHack TPSA &
\cAlignHack {\textless}= 140\\
\tblbottomrule
\end{tabulary}\par
\end{table}
\textbf{Topological Descriptors}
These are 2D descriptors mainly concerned with the internal connectivity of molecular compounds. They are considered structure-explicit descriptors because they are derived from the topological representation of compounds. In numerical form, they encode features such as shape, the presence of heteroatoms, multiple bonds, size and branching, and they generally represent the connectivity of atoms or bonds. Because they capture these features, they play a major role in describing the biological activity, pharmacokinetic properties and physicochemical properties of the respective compounds. Graph-based calculations are essential for computing topological descriptors, since graphs hold the non-numeric form of a compound's molecular substructure. Commonly used topological descriptors are the connectivity indices, the Balaban J index, the Zagreb indices, the Wiener index and the Kier shape indices. These descriptors categorise compound molecules based on shape, size, branching and flexibility.
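As an illustration, the Wiener index named above is simply the sum of shortest-path bond distances over all atom pairs in the hydrogen-suppressed molecular graph. A minimal Python sketch, with the graph supplied as a hand-built adjacency list (a cheminformatics toolkit would normally derive it from the structure):

```python
from collections import deque

def wiener_index(adjacency):
    """Sum of shortest-path bond distances over all unordered atom pairs."""
    n = len(adjacency)
    total = 0
    for source in range(n):
        # breadth-first search gives shortest path lengths from each atom
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # count each pair once by only summing targets beyond the source
        total += sum(d for node, d in dist.items() if node > source)
    return total

# n-butane as a hydrogen-suppressed chain: C0-C1-C2-C3
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(wiener_index(butane))  # 1+2+3+1+2+1 = 10
```

For the four-carbon chain, the six pairwise distances (1, 2, 3, 1, 2, 1 bonds) sum to 10, the textbook Wiener index of n-butane.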
\textbf{Geometrical Descriptors}
Geometric descriptors are calculated from the 3D coordinates of the atoms in a specific compound molecule. Compared with topological descriptors, they provide more information and better discrimination when comparing compounds with the same structure. Their use requires a geometry-optimisation overhead, which can reveal new information about flexible molecules with several molecular conformations, but it also increases complexity. For this reason, these descriptors need alignment rules for comparing candidate molecules.
\begin{table}[!htbp]
\caption{\boldmath {MDDR Likeness} }
\label{tw-3a65d9abdf8a}
\def\arraystretch{1.1}
\ignorespaces
\centering
\begin{tabulary}{\linewidth}{p{\dimexpr.5104\linewidth-2\tabcolsep}p{\dimexpr.4896\linewidth-2\tabcolsep}}
\tbltoprule \rowcolor{kwdboxcolor}\multicolumn{2}{p{\dimexpr(1\linewidth-2\tabcolsep)}}{\cAlignHack MDDR Likeness}\\
\tblmidrule
\cAlignHack No of Rings &
\cAlignHack {\textgreater}= 3\\
\cAlignHack No of Rigid Bonds &
\cAlignHack {\textgreater}= 18\\
\cAlignHack No of Rotatable Bonds &
\cAlignHack {\textgreater}= 6\\
\tblbottomrule
\end{tabulary}\par
\end{table}
\textbf{Physicochemical Descriptors}
From the 2D structure, we can analyse different physical and chemical properties. These properties influence drug activity in the body, and favourable properties can increase market demand. So, by evaluating the chemical and physical properties, we can also assist the drug discovery process in selecting compounds. This indicates that we need to pay attention to physicochemical properties such as solubility and permeability, which decide the optimal potency \unskip~\citep{867653:20160933,867653:20160936}.
\textbf{Descriptor Rules}
Several descriptor-based prediction rules are available; for our Drug Likeness Prediction Tool, we chose six of them. The conditions a compound must satisfy for Lipinski's rule and the Ghose filter are shown in Tables~\ref{tw-9a3b331852c8} and~\ref{tw-598a4ee4cda4}, respectively. Tables~\ref{tw-9415c513acd2} and~\ref{tw-0ce72683ab1e} show the requirements for the CMC-50 rule and the Veber rule, respectively. The conditions for the other two descriptor rules, MDDR likeness and BBB likeness, are given in Tables~\ref{tw-3a65d9abdf8a} and~\ref{tw-b9e8b3246857}.
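Once the descriptor values are computed, each rule reduces to a set of threshold checks. A minimal sketch for Lipinski's rule and the Veber rule, assuming the descriptor values have already been calculated by a package such as RDKit (the dictionary key names and the example values below are illustrative, not part of any real API):

```python
def passes_lipinski(d):
    """Lipinski's rule of five: logP <= 5, molecular weight <= 500,
    H-bond donors <= 5, H-bond acceptors <= 10."""
    return (d["logP"] <= 5 and d["mol_weight"] <= 500
            and d["h_donors"] <= 5 and d["h_acceptors"] <= 10)

def passes_veber(d):
    """Veber rule: rotatable bonds <= 10, TPSA <= 140."""
    return d["rot_bonds"] <= 10 and d["tpsa"] <= 140

# illustrative aspirin-like descriptor values (hand-entered, not computed here)
aspirin_like = {"logP": 1.2, "mol_weight": 180.16, "h_donors": 1,
                "h_acceptors": 4, "rot_bonds": 3, "tpsa": 63.6}
print(passes_lipinski(aspirin_like), passes_veber(aspirin_like))  # True True
```

Combining several such checks, as the tool does, simply means requiring agreement across the rule functions rather than trusting any single one.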
\begin{table}[!htbp]
\caption{\boldmath {BBB Likeness} }
\label{tw-b9e8b3246857}
\def\arraystretch{1.1}
\ignorespaces
\centering
\begin{tabulary}{\linewidth}{p{\dimexpr.5125\linewidth-2\tabcolsep}p{\dimexpr.4875\linewidth-2\tabcolsep}}
\tbltoprule
\cAlignHack Molecular Weight &
\cAlignHack {\textless}= 400\\
\cAlignHack H-Bonds &
\cAlignHack {\textless}= 8\\
\cAlignHack No of Acids &
\cAlignHack 0\\
\tblbottomrule
\end{tabulary}\par
\end{table}
\textbf{Fingerprints}
We have discussed the descriptors used for the descriptor-based calculation. We now take an in-depth look at the fingerprint-based prediction part of our tool.
The most common problem when measuring the similarity between two molecules is the complexity of their molecular representation. To measure similarity in a computationally straightforward manner, we have to minimise this complexity. The most commonly used simple representation is the molecular fingerprint, which converts a molecule into a sequence of bits; with this representation, we can easily compare the similarity of two molecules.
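Once molecules are reduced to bit strings, comparing them is cheap. The standard measure is the Tanimoto coefficient: shared on-bits divided by total distinct on-bits. A minimal sketch on hand-written bit lists (real fingerprints are much longer):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as 0/1 lists:
    |intersection of on-bits| / |union of on-bits|."""
    on_a = {i for i, bit in enumerate(fp_a) if bit}
    on_b = {i for i, bit in enumerate(fp_b) if bit}
    union = on_a | on_b
    return len(on_a & on_b) / len(union) if union else 0.0

fp1 = [1, 0, 1, 1, 0, 0, 1, 0]
fp2 = [1, 0, 1, 0, 0, 1, 1, 0]
print(tanimoto(fp1, fp2))  # 3 shared of 5 distinct on-bits -> 0.6
```

A coefficient of 1.0 means identical bit patterns; values near 0 mean the molecules share almost no encoded features.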
\begin{table}[!htbp]
\caption{\boldmath {Data Sources} }
\label{tw-b0294cb36ea6}
\def\arraystretch{1.1}
\ignorespaces
\centering
\begin{tabulary}{\linewidth}{p{\dimexpr.5498\linewidth-2\tabcolsep}p{\dimexpr.4502\linewidth-2\tabcolsep}}
\tbltoprule \rowcolor{kwdboxcolor}\textbf{Database} & \textbf{Type}\\
\tblmidrule
ChemBL &
Public\\
DrugBank &
Public\\
NCI Set &
Public\\
PubChem &
Public\\
ZINC &
Public\\
ChemSpider &
Public\\
Superdrug &
Public\\
CoCoCo &
Public\\
TCM &
Public\\
ACD &
Commercial\\
Life Chemicals &
Commercial\\
IBS Database &
Commercial\\
Chembase &
Commercial\\
\tblbottomrule
\end{tabulary}\par
\end{table}
\textbf{Substructure Key Based Fingerprints}
As the name denotes, substructure-key-based fingerprints build the bit string according to the presence of predefined keys. The fingerprint checks the substructure keys against the candidate compound and sets each bit depending on the presence of the corresponding feature. This is helpful when a list of structural keys is given, but not when the structural keys are absent. MACCS is one of the most commonly used fingerprint types: it is short, and it covers most of the features relevant to drug discovery. It is available in both a 166-bit and a 960-bit format, and its structural keys are expressed as SMARTS patterns \unskip~\citep{867653:20160942,867653:20160929}.
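The key-based idea can be shown with a deliberately simplified toy: each bit records whether one key from a fixed list occurs in the molecule. Here the "keys" are plain substring tests on a SMILES string, which is only a stand-in for the SMARTS substructure matching that a real MACCS implementation performs with a cheminformatics toolkit; the key list itself is invented for illustration:

```python
# hypothetical mini key set; real MACCS keys are SMARTS patterns,
# not substring tests, and there are 166 (or 960) of them
KEYS = ["N", "O", "c1ccccc1", "C(=O)O", "Cl"]

def key_fingerprint(smiles):
    """Set bit i when structural key i is found in the molecule
    (toy: substring containment on the SMILES text)."""
    return [1 if key in smiles else 0 for key in KEYS]

print(key_fingerprint("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin: [0, 1, 1, 1, 0]
```

The bit positions are fixed by the key list, so the same key always maps to the same bit; this is what makes key-based fingerprints interpretable, in contrast to the hashed fingerprints below.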
\textbf{Topological Fingerprints}
Topological fingerprints enumerate the fragments of a molecule that arise at each stage of bond extension, up to a certain number of bonds, and then apply a hash function to every enumerated fragment to generate the fingerprint. Because of this working principle, any candidate molecule can be converted into a fingerprint, and the hashing mechanism lets us adjust the length of the bit string. These features make topological fingerprints fast for substructure searching and for filtering candidate molecules. One disadvantage of this fingerprint type is that a single bit cannot be traced back to a specific feature, which can lead to bit collisions. Atom pairs and topological torsions are two fingerprint types in this family. Atom-pair fingerprints come in two versions, a hashed one with 2048 bits and an ordinary one with more than 16000 bits; in topological torsion fingerprints, each fragment of the bit-string conversion is a path of four atom types \unskip~\citep{867653:20160944,867653:20160927}.
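The enumerate-and-hash procedure can be sketched as follows: linear atom paths are grown bond by bond up to a limit, and each path label is hashed to one bit of a fixed-length string, so distinct fragments can collide on the same bit exactly as noted above. The graph input is a hand-built toy, not real bond perception, and a deterministic CRC hash stands in for whatever hash a production fingerprint uses:

```python
import zlib

def path_fingerprint(adjacency, atoms, max_bonds=3, n_bits=64):
    """Enumerate linear atom paths up to max_bonds bonds and hash each
    path label into a fixed-length bit string (toy topological fingerprint)."""
    bits = [0] * n_bits

    def walk(path):
        label = "-".join(atoms[i] for i in path)
        # deterministic hash maps each fragment to one bit; collisions possible
        bits[zlib.crc32(label.encode()) % n_bits] = 1
        if len(path) - 1 < max_bonds:          # path with k atoms has k-1 bonds
            for nbr in adjacency[path[-1]]:
                if nbr not in path:            # keep paths linear, no revisits
                    walk(path + [nbr])

    for start in range(len(atoms)):
        walk([start])
    return bits

# ethanol, hydrogen-suppressed: C0-C1-O2 (illustrative toy input)
fp = path_fingerprint({0: [1], 1: [0, 2], 2: [1]}, ["C", "C", "O"])
print(sum(fp))  # number of bits set by the enumerated fragments
```

Because only the hash values survive, the same `n_bits` length works for any molecule size, which is the adjustability the text describes, at the cost of losing bit interpretability.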
\textbf{Circular Fingerprints}
Circular fingerprints are an extension of topological fingerprints. Instead of following fragments and paths along individual bonds, this fingerprint type evaluates the environment of each atom within a circle of a given radius. For this reason, it is not meant for substructure searching or substructure queries; it is mainly used for structure-similarity searching. ECFP and FCFP are well-known circular fingerprints. ECFP (Extended-Connectivity Fingerprint) is based on the Morgan algorithm and comes in two common variants, ECFP4 and ECFP6, which differ only in the size of the environment considered (a diameter of four versus six bonds). FCFP (Functional-Class Fingerprint) is a variation of ECFP: instead of indexing the environment of a particular atom, it indexes the atom's functional role in the compound \unskip~\citep{867653:20160946,867653:20160922}.
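The idea of hashing growing atom environments can be sketched in a few lines. This is a much-simplified illustration of the Morgan/ECFP scheme, not the real algorithm, which uses atom invariants, bond information and duplicate-environment removal; here each atom's identifier starts as its element symbol and is repeatedly extended with its neighbours' identifiers, and every intermediate identifier is hashed to a bit:

```python
import zlib

def circular_fingerprint(adjacency, atoms, radius=2, n_bits=64):
    """Hash each atom's neighbourhood at radius 0..radius into a bit string
    (simplified sketch of the Morgan/ECFP idea)."""
    env = {a: atoms[a] for a in range(len(atoms))}  # radius-0: element symbols
    bits = [0] * n_bits
    for _ in range(radius + 1):
        for identifier in env.values():
            bits[zlib.crc32(identifier.encode()) % n_bits] = 1
        # grow every identifier by appending its sorted neighbour identifiers,
        # so the next round encodes a one-bond-larger environment
        env = {a: env[a] + "(" + ",".join(sorted(env[n] for n in adjacency[a]))
                  + ")" for a in env}
    return bits

# acetic-acid-like toy graph: C0-C1 with O2 and O3 on C1
fp = circular_fingerprint({0: [1], 1: [0, 2, 3], 2: [1], 3: [1]},
                          ["C", "C", "O", "O"])
```

Increasing `radius` corresponds to moving from an ECFP4-like to an ECFP6-like description: the same atoms, seen through progressively larger circles.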
\begin{table}[!htbp]
\caption{\boldmath {AtomPairs Analysis} }
\label{tw-f267ae02f185}
\def\arraystretch{1.1}
\ignorespaces
\centering
\begin{tabulary}{\linewidth}{p{\dimexpr.49589999999999996\linewidth-2\tabcolsep}p{\dimexpr.50410000000000004\linewidth-2\tabcolsep}}
\tbltoprule \rowcolor{kwdboxcolor}\cAlignHack \textbf{\textbf{Algorithm}} & \cAlignHack \textbf{\textbf{Accuracy}}\\
\tblmidrule
\cAlignHack Na{\"{\i}}ve-Bayes &
\cAlignHack 0.74\\
\cAlignHack KNN &
\cAlignHack 0.83\\
\cAlignHack Random Forest &
\cAlignHack 0.88\\
\cAlignHack MLP &
\cAlignHack 0.87\\
\cAlignHack CNN &
\cAlignHack 0.87\\
\tblbottomrule
\end{tabulary}\par
\end{table}
\section{Materials and Methods}
\textbf{Naive Bayes}
The Naive Bayes algorithm is a classification model based on the well-known Bayes theorem. The Naive Bayes classifier presumes that the presence of a specific feature in a class or category is independent of the presence of any other feature. Along with its simplicity, this classification model works well on sizeable data sets and is easy to build. Because of these factors, Naive Bayes can outperform even highly advanced classification techniques.
Bayes' theorem gives the posterior probability of a category $c$ given a predictor $x$:
\[ P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}, \]
that is, the posterior $P(c \mid x)$ is obtained by multiplying the prior probability of the category $P(c)$ by the likelihood $P(x \mid c)$, the probability of the predictor given the category, and dividing by the prior probability of the predictor $P(x)$. The Naive Bayes algorithm works as follows. First, the input data set is converted into a frequency table. From the frequency table, the algorithm creates a likelihood table by computing the probabilities. By substituting these probabilities into the Bayes equation, the algorithm finds the posterior probability for each class, and the predicted outcome is the category with the highest posterior probability.
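The frequency-table procedure just described can be sketched end to end on binary features. The "drug"/"non-drug" toy data below is purely illustrative, and add-one (Laplace) smoothing is included in the likelihood table, a common remedy for the zero-frequency situation:

```python
from collections import Counter

def train_nb(samples, labels):
    """Build class priors and, from a per-class frequency table of each
    binary feature, a smoothed likelihood table."""
    classes = Counter(labels)
    n_features = len(samples[0])
    priors = {c: classes[c] / len(labels) for c in classes}
    freq = {c: [0] * n_features for c in classes}   # counts of feature == 1
    for x, label in zip(samples, labels):
        for i, v in enumerate(x):
            freq[label][i] += v
    # add-one smoothing: (count + 1) / (class size + 2) for a binary feature
    likelihood = {c: [(freq[c][i] + 1) / (classes[c] + 2)
                      for i in range(n_features)] for c in classes}
    return priors, likelihood

def predict_nb(x, priors, likelihood):
    """Posterior is proportional to prior times the product of per-feature
    likelihoods; the evidence P(x) cancels when comparing classes."""
    scores = {}
    for c in priors:
        p = priors[c]
        for i, v in enumerate(x):
            p *= likelihood[c][i] if v else 1 - likelihood[c][i]
        scores[c] = p
    return max(scores, key=scores.get)

# toy data: 3 binary fingerprint bits per compound (illustrative only)
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y = ["drug", "drug", "drug", "non-drug", "non-drug", "non-drug"]
priors, like = train_nb(X, y)
print(predict_nb([1, 1, 0], priors, like))  # drug
```

With equal priors of 0.5, the query `[1, 1, 0]` scores 0.5 x 0.8 x 0.6 x 0.4 = 0.096 for "drug" versus 0.024 for "non-drug", so the higher posterior wins, exactly the final step described above.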
\begin{table}[!htbp]
\caption{\boldmath {MACCS-166 Analysis} }
\label{tw-c29d6df75a0a}
\def\arraystretch{1.1}
\ignorespaces
\centering
\begin{tabulary}{\linewidth}{p{\dimexpr.4959\linewidth-2\tabcolsep}p{\dimexpr.5041\linewidth-2\tabcolsep}}
\tbltoprule \rowcolor{kwdboxcolor}\cAlignHack \textbf{Algorithm} & \cAlignHack \textbf{Accuracy}\\
\tblmidrule
\cAlignHack Na{\"{\i}}ve-Bayes &
\cAlignHack 0.74\\
\cAlignHack KNN &
\cAlignHack 0.90\\
\cAlignHack Random Forest &
\cAlignHack 0.91\\
\cAlignHack MLP &
\cAlignHack 0.90\\
\cAlignHack CNN &
\cAlignHack 0.89\\
\tblbottomrule
\end{tabulary}\par
\end{table}
The advantages of the Naive Bayes classifier are: compared to other classification methods such as logistic regression, it is fast and easy to implement; it needs less training data; and it is scalable, expanding linearly with the number of data points and predictors. The Naive Bayes classifier handles both discrete and continuous data and can make accurate probabilistic predictions. This well-known classification algorithm is used for both binary and multi-class classification problems.
\begin{table}[!htbp]
\caption{\boldmath {RDK Fingerprint Analysis} }
\label{tw-0c4921a681f5}
\def\arraystretch{1.1}
\ignorespaces
\centering
\begin{tabulary}{\linewidth}{p{\dimexpr.5166\linewidth-2\tabcolsep}p{\dimexpr.4834\linewidth-2\tabcolsep}}
\tbltoprule \rowcolor{kwdboxcolor}\cAlignHack \textbf{Algorithm} & \cAlignHack \textbf{Accuracy}\\
\tblmidrule
\cAlignHack Na{\"{\i}}ve-Bayes &
\cAlignHack 0.79\\
\cAlignHack KNN &
\cAlignHack 0.86\\
\cAlignHack Random Forest &
\cAlignHack 0.90\\
\cAlignHack MLP &
\cAlignHack 0.90\\
\cAlignHack CNN &
\cAlignHack 0.89\\
\tblbottomrule
\end{tabulary}\par
\end{table}
Disadvantages of the Naive Bayes classifier: the classifier depends heavily on its feature-independence assumption, which is also its most significant drawback, because in real-life scenarios it is hard to find a set of features that are entirely independent of each other. Furthermore, if a categorical variable takes a value that was never observed in the training data, the Naive Bayes classifier assigns it zero probability and is then unable to make a prediction. This problem is generally known as ``zero frequency'' in Naive Bayes classification \unskip~\citep{867653:20160931,867653:20160938}.
\textbf{K Nearest Neighbours}
K Nearest Neighbours (KNN) is a simple, supervised ML algorithm suitable for both classification and regression problems, although industry relies on it mainly for classification. KNN is called a lazy algorithm because it has no dedicated training phase and uses all the input data at classification time. The algorithm assumes nothing about the underlying data, which is why it is also known as a non-parametric algorithm.
Working of the KNN algorithm: KNN predicts the value of a new data point based on feature similarity, i.e. on how closely the new point matches points in the training data set. The algorithm works as follows: we feed in the training and test data, then choose the value of K, the number of nearest data points to consider; K can be any positive integer.
\begin{table}[!htbp]
\caption{\boldmath {Morgan-Circular Analysis} }
\label{tw-d7a140884962}
\def\arraystretch{1.1}
\ignorespaces
\centering
\begin{tabulary}{\linewidth}{p{\dimexpr.5124999999999999\linewidth-2\tabcolsep}p{\dimexpr.4875000000000001\linewidth-2\tabcolsep}}
\tbltoprule \rowcolor{kwdboxcolor}\cAlignHack \textbf{Algorithm} & \cAlignHack \textbf{Accuracy}\\
\tblmidrule
\cAlignHack Na{\"{\i}}ve-Bayes &
\cAlignHack 0.84\\
\cAlignHack KNN &
\cAlignHack 0.82\\
\cAlignHack Random Forest &
\cAlignHack 0.91\\
\cAlignHack MLP &
\cAlignHack 0.87\\
\cAlignHack CNN &
\cAlignHack 0.88\\
\tblbottomrule
\end{tabulary}\par
\end{table}
For each data point in the test data, the algorithm does the following: using the Euclidean metric, it calculates the distance between the test point and every training point; sorts the training points in ascending order of that distance; selects the top K rows of the sorted array; and finally assigns the test point the most recurrent class among those rows.
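These steps translate almost directly into Python. The sketch below uses hypothetical 2-D points purely for illustration:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, test_point, k=3):
    """Classify test_point by majority vote of its k nearest neighbours."""
    # 1. Euclidean distance from the test point to every training point
    dists = [(math.dist(p, test_point), label)
             for p, label in zip(train_X, train_y)]
    # 2. Sort ascending by distance
    dists.sort(key=lambda t: t[0])
    # 3. Keep the top k rows
    top_k = [label for _, label in dists[:k]]
    # 4. The most recurrent class among those rows wins
    return Counter(top_k).most_common(1)[0][0]

X = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (2, 2)))  # "a"
```

Note that all training data is kept in memory and scanned at prediction time, which is exactly the cost discussed below.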
Advantages of the KNN algorithm: along with its simplicity, KNN is beneficial for non-linear data classification because it assumes nothing about the data. It can be used for both regression and classification problems with high precision.
Disadvantages of the KNN algorithm: prediction is slow when N is large, and the method is very sensitive to irrelevant features. As mentioned earlier, KNN stores all the training data, so the algorithm is computationally somewhat expensive and requires more memory than other models \unskip~\citep{867653:20160941,867653:20160923}.
\textbf{Random Forest}
The Random Forest algorithm is an ensemble learning method generally suggested for classification and regression problems. Just as a forest is made up of many trees, and more trees make the forest more robust, the algorithm builds decision trees on data samples and then obtains a prediction from each of them.
\begin{table}[!htbp]
\caption{\boldmath {Topological Torsion Analysis} }
\label{tw-bc7312278581}
\def\arraystretch{1.1}
\ignorespaces
\centering
\begin{tabulary}{\linewidth}{p{\dimexpr.5166\linewidth-2\tabcolsep}p{\dimexpr.4834\linewidth-2\tabcolsep}}
\tbltoprule \rowcolor{kwdboxcolor}\cAlignHack \textbf{Algorithm} & \cAlignHack \textbf{Accuracy}\\
\tblmidrule
\cAlignHack Na{\"{\i}}ve-Bayes &
\cAlignHack 0.80\\
\cAlignHack KNN &
\cAlignHack 0.80\\
\cAlignHack Random Forest &
\cAlignHack 0.89\\
\cAlignHack MLP &
\cAlignHack 0.88\\
\cAlignHack CNN &
\cAlignHack 0.87\\
\tblbottomrule
\end{tabulary}\par
\end{table}
Voting then selects the best solution. Random Forest is also a supervised learning model, more nuanced than a single decision tree because combining the outputs reduces overfitting. The algorithm works as follows: first, it selects random samples from the input data set; it then builds a decision tree for every sample; after that, it collects the prediction from every decision tree; the decision trees vote on every predicted result; and finally the algorithm selects the most-voted output as the prediction.
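The sampling-and-voting procedure can be illustrated with a toy sketch in which each ``tree'' is reduced to a depth-1 decision stump. This is a simplification for clarity only: real random forests grow full trees and also subsample features at each split.

```python
import random
from collections import Counter

def train_stump(X, y):
    """A depth-1 'tree': the (feature, threshold) split with the best
    training accuracy on this bootstrap sample (stand-in for a full tree)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            lm = Counter(left).most_common(1)[0][0]
            rm = Counter(right).most_common(1)[0][0]
            acc = (left.count(lm) + right.count(rm)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, f, t, lm, rm)
    if best is None:                      # bootstrap sample was one class
        maj = Counter(y).most_common(1)[0][0]
        return lambda row: maj
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm

def random_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # 1. bootstrap sample
        trees.append(train_stump([X[i] for i in idx],
                                 [y[i] for i in idx]))  # 2. fit a tree
    # 3-5. every tree votes; the most-voted class is the final prediction
    return lambda row: Counter(t(row) for t in trees).most_common(1)[0][0]

# Hypothetical 2-feature data with two well-separated classes
X = [[0, 1], [1, 1], [0, 0], [5, 4], [6, 5], [5, 5]]
y = ["non-drug"] * 3 + ["drug"] * 3
predict = random_forest(X, y)
print(predict([6, 4]))  # "drug"
```

Averaging many such weak voters is what gives the ensemble its lower variance compared with a single tree.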
Advantages of the Random Forest algorithm: by combining the results of different decision trees, the algorithm overcomes the problem of overfitting. Compared with a single decision tree, a random forest has lower variance and works well on large data samples. The algorithm is very flexible: the data does not need to be scaled, and it gives good accuracy even on unscaled inputs. It can also maintain good accuracy when a large percentage of the input data is missing.
Disadvantages of the Random Forest algorithm: complexity is considered one of its most significant disadvantages. Compared with regular decision trees, building a random forest is time-consuming and requires more computational resources. Prediction is also slow in comparison with other algorithms, and for an extensive collection of decision trees the prediction is less intuitive \unskip~\citep{867653:20160930,867653:20160935}.
\textbf{Multi-layer perceptron }
The Multi-layer perceptron (MLP) is a class of feed-forward artificial neural network; an MLP with a single hidden layer is sometimes called a vanilla artificial neural network. An MLP contains at least three layers: an input layer, a hidden layer, and an output layer. Every node except the input nodes is a neuron with an activation function. For training, the MLP uses an algorithm called backpropagation. The noticeable differences from the linear perceptron are that the multi-layer perceptron has non-linear activation functions and multiple layers, so it can also distinguish data that is not linearly separable. How does a multi-layer perceptron work? Like the perceptron, the MLP has fully connected input and output layers; the difference is that the MLP contains one or more hidden layers between them \unskip~\citep{867653:20160925}.
\bgroup
\fixFloatSize{images/8287ff62-220d-4aec-b776-0f7e735e4986-upicture1.png}
\begin{figure}[!htbp]
\centering \makeatletter\IfFileExists{images/8287ff62-220d-4aec-b776-0f7e735e4986-upicture1.png}{\includegraphics{images/8287ff62-220d-4aec-b776-0f7e735e4986-upicture1.png}}{}
\makeatother
\caption{\boldmath {UI of Insilico Prediction Tool}}
\label{f-922fbda4e181}
\end{figure}
\egroup
Let us look at the multi-layer perceptron algorithm. Inputs are forwarded through the network by taking the dot product of the inputs with the weights assigned between the input layer and the hidden layer, almost exactly as in the perceptron. At the hidden layer the algorithm computes this dot product, but the value is not pushed forward directly: as mentioned, the MLP applies an activation function at every layer except the input layer. Sigmoid functions and Rectified Linear Units (ReLU) are examples of such activation functions. The algorithm passes the value computed at the hidden layer through the activation function and forwards it to the next layer, again carrying the dot product with the corresponding weights. These steps are repeated up to the final output layer, whose result is used for decision-making.
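The forward pass just described can be written compactly with NumPy. This is a sketch with arbitrary random weights; the 2-3-1 layer sizes are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    # Non-linear activation applied at every layer except the input
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases):
    """Forward pass of an MLP: at each layer, take the dot product of the
    incoming activations with that layer's weights, add the bias, and push
    the result through the activation before passing it on."""
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # dot product, then non-linearity
    return a                     # output-layer result, used for the decision

# A 2-3-1 network with hypothetical random weights (untrained)
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases = [np.zeros(3), np.zeros(1)]
out = mlp_forward([0.5, -1.0], weights, biases)
print(out.shape)  # (1,)
```

Training would then adjust `weights` and `biases` by backpropagation, which is not shown here.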
Advantages of the MLP: the multi-layer perceptron builds the basis for all neural networks, and it increases computational power when dealing with regression and classification problems. With a multi-layer perceptron we can model large, complex problems, and the network does not suffer from the XOR limitation of the single-layer perceptron \unskip~\citep{867653:20160945,867653:20160939}.
\bgroup
\fixFloatSize{images/be75b408-be48-487d-befc-da97bb91bc8f-upicture2.png}
\begin{figure}[!htbp]
\centering \makeatletter\IfFileExists{images/be75b408-be48-487d-befc-da97bb91bc8f-upicture2.png}{\includegraphics{images/be75b408-be48-487d-befc-da97bb91bc8f-upicture2.png}}{}
\makeatother
\caption{\boldmath {Non-drug prediction in insilico prediction tool}}
\label{f-7a41d3c5019b}
\end{figure}
\egroup
\textbf{Convolutional Neural Networks}
Convolutional Neural Networks (CNNs) are a kind of neural network generally used for identifying and classifying images and other visual data. The algorithm is also used effectively in areas such as commercial recommendation applications and natural language processing. There are different types of CNNs, namely 1D, 2D and 3D. Regardless of type, the algorithm has the same traits and works on the same approach; the difference lies in the type of input data. For example, 1D CNNs are used for natural language processing and 2D CNNs for image classification, the inputs typically being a sentence and the pixels of an image, respectively. The way the filters operate also distinguishes the three types.

In the architecture of convolutional neural networks, shared weights and sparse connectivity are two essential features. Compared with the multi-layer perceptron, a CNN has a different connectivity pattern between neurons, in which we can set the filter width. If the width is three, then each neuron in layer m connects to three neurons in the layer below, and so on down to the last layer. Neurons in the same layer that share common boundaries can be grouped together; each such group is called a filter. With, say, five neurons in layer m-1 and three different filters, these filters let us extract better results from an input dataset. A non-linear function is used with the filters; in our tool we applied the sigmoid function for the non-linearities. The most significant advantage of this architectural pattern is that we can capture connections between neurons in neighbouring layers \unskip~\citep{867653:20160934}.
Shared weights are another feature of CNNs: the same parameters are shared within each filter. Referring again to layers m and m-1, each neuron in layer m-1 shares the same weights with the neurons in the layer m above. Using filters in this way helps identify features regardless of their location in the input dataset. Shared weights also reduce the number of parameters, which increases the efficiency of the algorithm and speeds up processing. CNNs also use ideas such as max-pooling and average pooling to reduce overfitting on the input data set \unskip~\citep{867653:20160926,867653:20160937}.
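The shared-weights and sparse-connectivity ideas can be shown with a width-3 filter sliding over a 1-D input. This is a minimal NumPy sketch; the edge-detecting kernel values are made up for illustration and are not part of the tool.

```python
import numpy as np

def conv1d(signal, kernel):
    """1-D 'valid' filtering: the same width-3 kernel (shared weights)
    slides over the input, so each output neuron connects to only
    len(kernel) neighbouring inputs (sparse connectivity)."""
    k = len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel)
                     for i in range(len(signal) - k + 1)])

x = np.array([0., 0., 1., 1., 1., 0., 0.])
edge_filter = np.array([-1., 0., 1.])   # responds to rising/falling edges
# Apply the sigmoid non-linearity to the filter responses, as in the tool
feature_map = 1.0 / (1.0 + np.exp(-conv1d(x, edge_filter)))
print(feature_map.round(2))
```

Because the same three weights are reused at every position, the filter detects its feature wherever it occurs, with only three parameters instead of one weight per connection.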
\textbf{Database}
Dataset sources from which data are taken for the training and testing purposes are given in Table~\ref{tw-b0294cb36ea6}.
\textbf{Dataset}
The training and test data sets contain a total of 11449 compounds, of which 6449 drug compounds were taken from the ChEMBL database and 5000 non-drug compounds from the NCI database \unskip~\citep{867653:20160928}.
\bgroup
\fixFloatSize{images/80b54d13-6db8-40ee-a218-913ba440521b-upicture3.png}
\begin{figure}[!htbp]
\centering \makeatletter\IfFileExists{images/80b54d13-6db8-40ee-a218-913ba440521b-upicture3.png}{\includegraphics{images/80b54d13-6db8-40ee-a218-913ba440521b-upicture3.png}}{}
\makeatother
\caption{\boldmath {Drug prediction in insilico prediction tool}}
\label{f-6b0f6a1626b0}
\end{figure}
\egroup
\textbf{Tool Operation}
The tool's front end is built on the Anvil platform, and the back end runs in a Jupyter notebook on Python 3. Anvil is a cloud-based web-app development platform for the easy creation of web apps. The tool takes input in SMILES format and offers the user two methods for drug-likeness prediction: descriptor prediction and fingerprint-based prediction. Descriptor prediction uses six descriptor rules. For fingerprint prediction, the user can choose among five fingerprint types: MACCS-166, RDK, AtomPairs, Topological Torsion, and Morgan fingerprints. The tool uses the Random Forest algorithm for prediction, as analyzed earlier. The UI of our tool is shown in Figure~\ref{f-922fbda4e181}.
The descriptor output shows pass or fail results, and the fingerprint output shows whether the compound can become a drug or not. Images of the tool predicting a drug and a non-drug are shown in Figures~\ref{f-6b0f6a1626b0} and~\ref{f-7a41d3c5019b}, respectively.
\section{Results and Discussion}
The result set contains the accuracy of various fingerprints on five different algorithms. The analysis of the basic MACCS-166-bit fingerprint is shown in Table~\ref{tw-c29d6df75a0a}. The RDK fingerprint provided by the RDKit package was also analysed, with results given in Table~\ref{tw-0c4921a681f5}. Tables~\ref{tw-f267ae02f185} and~\ref{tw-bc7312278581} contain the results for the two topological fingerprints, AtomPairs and Topological Torsion, respectively. The only circular fingerprint taken for analysis is the Morgan fingerprint, whose results are shown in Table~\ref{tw-d7a140884962}.
\section{Conclusions}
From the analysis, we found that the Random Forest algorithm gave better results with all the fingerprints used for prediction; it outperformed the other algorithms in similarity-based drug-likeness prediction. We therefore implemented the Random Forest algorithm in our tool for drug-likeness prediction with the five fingerprint types. Our tool also reports results based on descriptor prediction, which has a total of six rules. By incorporating both fingerprint and descriptor prediction, the tool gives better results than other available tools.
\textbf{Funding Support}
The authors declare that they have no funding support for this study.
\textbf{Conflict of Interest}
The authors declare that they have no conflict of interest for this study.
\bibliographystyle{pharmascope_apa-custom}
\bibliography{\jobname}
\end{document}