Tony Frudakis | Freelancer Portfolio Item #401720

My work is always covered by a confidentiality agreement. As a result, I do not publish or post the code underlying the results in this deck on GitHub or elsewhere, and gene names are redacted in this slide deck.

Life Sciences Project: Use an innovative computational biology approach to identify growth factors and other ligands useful for ex-vivo (outside the body) red blood cell development and culture.

First, I built an app that uses random forest and support vector machine models to classify cellular subtype in single-cell genomics (RNAseq) datasets, using data scraped from the literature and public gene repositories. The figure below shows a composite dataset I created from data provided on Gene Expression Omnibus, corresponding to dozens of bone marrow aspirate samples. It shows the developmental progression from stem cells (HSC) to mature red blood cells (OrthoE-late), with developmental time running counterclockwise from the left. Each spot is the full transcriptome of a single cell, corresponding to roughly 3,000 bases for each of about 30,000 genes.

This dataset was used for regulon analysis, in which I identified all of the transcription factors activated during the developmental process. I did this by finding factors that were differentially expressed among the developmental stages and whose expression correlated, cell by cell, with that of their known gene targets. Each target was confirmed bioinformatically to contain canonical binding sites for the factor upstream of its transcription start site. Discovered factors are redacted below.

I built a number of custom algorithms for this project. One determined whether discovered regulons were randomly distributed across the HSCs or co-expressed in subsets of HSCs (regulon gene names redacted). Another identified which of the genes most differentially expressed between cells of successive developmental stages encode transmembrane proteins.
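As a rough illustration of the transmembrane step, the sketch below filters a ranked list of differentially expressed genes down to those annotated as transmembrane. It is not the project code, which remains confidential. The gene symbols, fold changes, and annotation table are hypothetical, and the keyword lookup is only an assumption about how such annotations might be stored.

```python
from typing import Dict, List, Set, Tuple


def transmembrane_hits(
    ranked_genes: List[Tuple[str, float]],
    keywords_by_gene: Dict[str, Set[str]],
    top_n: int = 100,
    tag: str = "Transmembrane",
) -> List[Tuple[str, float]]:
    """Return the top-N differentially expressed genes that carry `tag`."""
    # Rank by absolute effect size so strong up- and down-regulation both count.
    top = sorted(ranked_genes, key=lambda g: abs(g[1]), reverse=True)[:top_n]
    # Keep only genes whose annotation set contains the tag.
    return [(gene, score) for gene, score in top
            if tag in keywords_by_gene.get(gene, set())]


# Hypothetical log2 fold changes between two adjacent developmental stages.
ranked = [("GENE_A", 3.2), ("GENE_B", -2.7), ("GENE_C", 0.4), ("GENE_D", 1.9)]

# Hypothetical annotation table: gene symbol -> set of annotation keywords.
annotations = {
    "GENE_A": {"Transmembrane", "Receptor"},
    "GENE_B": {"Nucleus"},
    "GENE_D": {"Transmembrane"},
}

hits = transmembrane_hits(ranked, annotations, top_n=3)
print(hits)
```

Genes that pass this kind of filter are candidates for surface markers. In practice the annotation table would be built from a curated protein database rather than typed in by hand.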
For example, to mark the MEP-to-BFU transition, it helps to know which transmembrane markers, if any, are available for antibody staining. This was accomplished by parsing the STRING and UniProt (Swiss-Prot) databases for transmembrane tags. Another algorithm associated the regulons whose differential expression marks specific developmental stage progressions with specific activated genetic circuits. This was done by using the STRING DB API to retrieve KEGG diagrams. Scanning these diagrams reveals the cell surface receptors driving each circuit, and hence the activating ligand. Doing this for all of the developmental stages allowed us to create a defined medium for ex-vivo RBC growth, which contained both previously known and entirely new, unexpected growth factors and organic ligands.

Other client projects have focused on scanning the peer-reviewed literature to document the expression state of particular genes in certain patient sets. In one project, this was necessary because the literature was incomplete or inadequate for the gene in question, which served as a target for a potentially lucrative new chemical entity (drug). Both bulk (older style) and single-cell genomics data were used in this project. I developed a custom algorithm for candidate-gene-based regulon analysis. It involves computing the canonical binding sites for all known human transcription factors, identifying genes whose cell-by-cell expression in a single-cell RNAseq dataset correlates with a differentially expressed transcription factor, parsing these genes for the relevant canonical binding sites, and identifying activated regulons (genetic circuits). This app found several important circuits missed by open-source libraries for regulon analysis, such as pySCENIC.

Another client operated in the financial services industry and wanted a proprietary method of selecting value stock investments.
I developed an app to scan the entire list of S&P 500 and QQQ stock symbols for those whose historical prices indicated present long-term value (6 months to 1 year). The Relative Strength Index (RSI), position with respect to moving averages (Bollinger Bands), and a number of other features (e.g., MACD) were used to rank the stocks. When the most attractive stocks were held as an investment portfolio, returns were significantly greater than general market returns, indicating that we were successful in identifying oversold and overbought value stocks. Most of the stocks in the portfolio turned into positive investments almost immediately.

Another client wanted a proprietary app for identifying stocks that represented exceptional short-term value. To do this, I combined machine learning (GARCH volatility prediction) with various statistical features to identify stocks poised for mean reversion within the next week. Over time, weekly investing results for stocks identified in this way proved exceptionally profitable. Cumulative returns for four particular stocks repeatedly identified in successive weekly analyses are shown below.
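For readers unfamiliar with the indicators named above, here is a minimal sketch of how RSI and Bollinger Band position can be computed from closing prices and combined into a ranking. It is illustrative only and is not the client code. The RSI here uses simple averages of gains and losses rather than Wilder's smoothing, the composite score is a made-up stand-in for the actual ranking, and MACD and GARCH are left out. The price series are synthetic.

```python
from statistics import mean, pstdev
from typing import Dict, List


def rsi(prices: List[float], period: int = 14) -> float:
    """RSI over the last `period` price changes, using simple averages."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    changes = [b - a for a, b in zip(prices[:-1], prices[1:])][-period:]
    avg_gain = sum(c for c in changes if c > 0) / len(changes)
    avg_loss = sum(-c for c in changes if c < 0) / len(changes)
    if avg_gain == 0 and avg_loss == 0:
        return 50.0   # flat window: treat as neutral
    if avg_loss == 0:
        return 100.0  # only gains in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def percent_b(prices: List[float], window: int = 20, k: float = 2.0) -> float:
    """Position of the last price within the bands (0 = lower, 1 = upper)."""
    recent = prices[-window:]
    mid, sd = mean(recent), pstdev(recent)
    if sd == 0:
        return 0.5  # bands collapse on a flat series; treat as mid-band
    lower, upper = mid - k * sd, mid + k * sd
    return (prices[-1] - lower) / (upper - lower)


def rank_oversold(history: Dict[str, List[float]]) -> List[str]:
    """Order symbols from most to least oversold by a naive composite score."""
    def score(symbol: str) -> float:
        p = history[symbol]
        return rsi(p) / 100.0 + percent_b(p)  # lower means more oversold
    return sorted(history, key=score)


# Synthetic closing prices for three hypothetical symbols.
history = {
    "UP": [100.0 + i for i in range(30)],    # steadily rising
    "DOWN": [130.0 - i for i in range(30)],  # steadily falling
    "FLAT": [100.0, 101.0] * 15,             # oscillating sideways
}
ranking = rank_oversold(history)
print(ranking)
```

A low RSI together with a price near or below the lower band is the usual reading of "oversold", which is why the falling series ranks first here. A real screen would weight and validate such features against historical returns before trusting the ranking.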