Gayatri Sahu

$50/hr
AI Prompt Trainer, Data Scientist, ML Engineer in NLP and deep learning
Reply rate:
-
Availability:
Hourly ($/hour)
Location:
Colorado Springs, CO, United States
Experience:
8 years
Gayatri Manjari Sahu
Colorado Springs, CO 80920, USA
LinkedIn: https://www.linkedin.com/in/gayatrimsahu

Summary
- 8+ years of IT experience across several domains (Banking, Manufacturing, Biochemistry, Sales, Retail, Logistics and Transportation) with a comprehensive and evolving skill set.
- 5+ years of experience as a Data Scientist applying advanced analytics, machine learning, and statistical techniques to derive actionable insights and drive business growth.
- Proficient in data manipulation, model development, and deployment, with a proven track record of delivering impactful solutions across various industries.
- Strong background in data governance and quality control, particularly within healthcare data standards.
- Passionate about using data to solve complex problems and committed to continuous learning and innovation in data science.
- Hands-on experience building regression, classification, and recommendation systems with large datasets in distributed systems (HDFS), Databricks, and constrained environments.
- Strong background in MLOps techniques and CI/CD pipelines, developing scripts for build, deployment, maintenance, and related tasks using Docker and the Python web frameworks Flask, FastAPI, Django, and Streamlit.
- Familiar with GitHub, Linux, and cloud environments: Amazon Redshift, Amazon S3, EC2, AWS EMR, Lambda, SageMaker, Databricks, Prometheus, and Grafana.
- Knowledge of Natural Language Processing (text preprocessing, sentiment analysis, named entity recognition (NER), topic modeling); deep learning architectures: LSTM and Transformers (BERT) using Hugging Face; LLMs (GPT-3, GPT-4, Llama 2); prompt engineering; and PromptPerfect.
- Hands-on experience programming in Python and Scala, with strong knowledge of object-oriented and functional programming concepts, plus experience with Scikit-learn, MLlib, Keras, TensorFlow, SQL, and Bash/shell scripting.
- Experience with Power BI, data analysis, data manipulation, data extraction, data visualization, and data transformation from various data sources.
- Experience working with C++ using OOP concepts and OOAD design patterns (using UML), with strong development and analytical skills.
- Experienced with different flavors of UNIX (DEC Alpha, UNICOS, Digital Unix, IRIX 6.0, Origin, and AIX) and POSIX, and involved in the QMS process.
- Experienced as a technical writer for various Cisco products (such as the media gateway node manager, Catalyst switch, and gigabit router) using FrameMaker.

Skills
Reporting and Visualization: Power BI, Excel, GitHub, PowerPoint, Jupyter Notebook (Seaborn, AutoViz, Matplotlib), data visualization
Languages: Python (IDEs: Jupyter, Spyder, Google Colab, Anaconda, PyCharm, VS Code), C++, Java 2, Perl 5, R
Databases: Oracle 8, MySQL, SQL Server, PGSQL, DataStax, Apache Cassandra, Astra DB, MongoDB, Pinecone (vector database), Snowflake
Operating Systems: Windows, XP, Linux, Unix (Sun Solaris 5.6/5.8, IRIX, AIX), Mac
Development Technologies: Scikit-learn, Keras, TensorFlow, Python, NLTK, spaCy, Spark MLlib, PySpark, Spark, SQL, shell scripting, DataFrame, Dataset, RDDs, Pandas, NumPy, Seaborn, Matplotlib, Databricks, MLflow, DVC, AutoScraper, Hadoop (HDFS), Hive
Cloud Services: AWS (S3, EC2, Lambda, SageMaker), Azure, Prometheus, Grafana
Machine/Deep Learning: Linear Regression, Logistic Regression, Decision Tree, Random Forest, Clustering (K-Means), SVM, ANN/CNN/RNN using TensorFlow (Keras), NLP, LSTM, Transformers (BERT), statistical analysis
GenAI Frameworks/Design: LangChain, LlamaIndex, LLMs (GPT-3, GPT-4, Llama), prompt engineering, PromptPerfect
Data Management: Data governance, data quality control, healthcare data standards

Work Experience
Employer: GuruConsulting LLC, Sep 2020 – Present
GuruConsulting is one of the leading IT consulting and outsourcing partners to Fortune 500 clientele, specializing in Healthcare and Pharmaceuticals, Aviation, Retail, Banking, Energy, Insurance, Finance, Oil & Gas, Logistics and Transportation, etc.

Outlier, SFO, CA
Data Scientist/AI Prompt Trainer, Nov 2024 – Present
Outlier is a platform that connects subject matter experts to help build the world’s most advanced Generative AI. Here, I work on various projects, from generating training data in each discipline to advance these models to evaluating model performance.
- As a contributor, rate and rank multimodal image tasks, which involve providing images and prompts for an AI model to interpret. Generating and evaluating data from these multimodal tasks helps advance the model’s capabilities, making it more versatile and capable of handling complex, real-world requests.
- Write complex prompts that cause AI models to make mistakes, then identify and fix those mistakes. This role is crucial in helping the model handle a wide range of instructions and topics while ensuring responses are accurate, clear, and safe.
- Help train an AI chatbot by reading a conversation history and evaluating the model's response to the user’s latest prompt.
- Rate the prompt against the criteria: Accuracy, Citation Correctness, Instruction Following, Refusal, Grammar/Presentation, Relevance, Tone/Style, Language Consistency, Fluency, and Comprehensiveness.
- Modify the prompt, write the rubrics, and rank responses to voice recordings on tone, style, recognition, emotion, and many other parameters.
- Optimize the data by task type and rewrite a good prompt that captures all the scenarios where the model fails and how to improve it.

Capacity LLC, North Brunswick, NJ
Data Scientist, Dec 2023 – Nov 2024
Capacity Logistics helps formulate the best plan for moving freight around the globe. The project aims to optimize various aspects of Capacity's operations, such as package routing, delivery scheduling, and customer service, and to predict delivery times accordingly.
- Gained a deep understanding of Capacity's logistics services and data sources to analyze large volumes of operational data and customer preferences, identifying trends and patterns.
- Developed a Cassandra schema to store transactional data in a distributed manner, ensuring low-latency querying in high-availability, scalable solutions.
- Provided actionable insights to logistics managers regarding underperforming routes and packaging inefficiencies using statistical analysis.
- Experimented with machine learning techniques and neural networks to predict delivery times, reducing total processing time and optimizing routing solutions with real-time traffic data using algorithms and APIs.
- Used Cassandra's partitioning and replication features to handle massive, geographically distributed datasets, ensuring high data availability.
- Integrated real-time traffic data and package weight into a route optimization algorithm using Dijkstra's algorithm and the Google Maps API.
- Developed a packaging recommendation system to suggest the most efficient packaging materials and dimensions for each delivery based on item characteristics and destination.

Sakra World Hospital, Bangalore
Data Scientist, Feb 2022 – Mar 2023
The project aims to develop a machine learning model to screen children for Autism Spectrum Disorder (ASD) based on behavioral and clinical data. The model should assist healthcare professionals in identifying potential ASD cases early for timely intervention.
- Developed a machine learning model for early screening of Autism Spectrum Disorder (ASD) using Python, focusing on behavioral and clinical data.
- Collaborated with healthcare professionals to identify core features relevant to ASD screening and performed thorough feature extraction and engineering from clinical data.
- Facilitated data cleaning and management following healthcare data governance standards to guarantee patient privacy and data protection, enabling robust model training.
- Investigated multiple machine learning models (Random Forest, SVM, Neural Networks), refining and optimizing them through precision, recall, and F1-score analysis.
- Established a secure API using Docker, AWS Lambda, and AWS API Gateway for accessible, real-time deployment of the screening tool.
- Designed continuous monitoring tools leveraging Prometheus and Grafana to interpret model metrics, including data quality control and adherence to healthcare data regulations such as HIPAA/GDPR.

Hewlett Packard Enterprise, Colorado Springs, CO
Data Consultant, Mar 2021 – Feb 2022
A large storage seller and data center providing optimized storage solutions to third-party business establishments such as Lukoil, ENG bank, Nestle, etc. As part of maintaining the data center, an initiative was taken to add resilience through regression analysis by predicting the redundant standby servers required to avoid shortages and by producing a weekly forecast of events generated in the data center.
- Worked on creating an ETL pipeline using Spark and Hive, including custom analytical functions.
- Developed and hyperparameter-tuned various regression models (Ridge, Lasso, Elastic Net, Random Forest Regression) to predict the storage required based on historical data.
- Evaluated the models using RMSE, R-squared, explained variance, and max error.
- Worked on app and visual creation in Arcadia Data to enable data visualization and descriptive analytics based on the events generated by the 42K servers each hour.
- Stored and retrieved data from HDFS in different formats (text, JSON, Sequence, Avro, Parquet, ORC) and in compressed formats.
- Tuned Spark RDD parallelism techniques to improve the performance of Spark machine learning jobs on the Hadoop cluster.

Marine Biochemistry Laboratory, MA
Research Analyst, Sep 2020 – Mar 2021
The Marine Biological Laboratory (MBL) is an international center for research and education in biological and environmental science.
The project aims to predict the organic and inorganic elements in a class of fish and the mortality rate of marine life when exposed to toxic metals.
- Designed two experiments involving the spectrophotometric determination of the presence of inorganic and organic Arsenic, Beryllium, Lead, and other toxic metals, as well as the effects of Vitamin C, in Sargassum, Zebra Fish, Mahi-Mahi, and South Florida ocean water.
- Modeled the toxicity threshold of metals in water, marine life, and drinking water.
- Calibrated standards by standardizing samples of each toxic metal and ascorbic acid and recording the spectrophotometric absorbance values.
- Developed unsupervised clustering models, supervised predictive and prescriptive machine learning models, A/B tests, and regressions with Python and statistics to analyze, visualize, and predict the concentrations of toxic metals in drinking water.

Heritage Foods, Bangalore
Data Analyst, Jan 2019 – Sep 2020
Heritage Foods, India, is a long-standing leader in the dairy food and renewable energy sectors. The project aims to measure price elasticity and predict the volume of cartons arriving at the warehouse 10 days in advance.
- Quantified product quality, volume, and demand, consumers' favorite foods, contributing partners' share, price cuts, promotions, etc. using analytical statistics.
- Presented business insights based on the elasticity factor, on which Heritage made selective and cautious price cuts for those licensing categories.
- Collected structured and semi-structured data, then cleaned and preprocessed it for dashboard display.
- Designed the Logistics ML Carton Prediction algorithm to predict cartons arriving in a warehouse 10 days in advance. These predictions can be used in warehouses to allocate and plan staffing days in advance.
- Tracked changes in consumer behavior before and after campaign launch and developed Power BI dashboards to convey the findings; the ROI measurements helped the superstore strategically extend the campaigns to other potential markets.

NCMRWF, Delhi
Project Associate, Nov 2016 – Jan 2019
The main purpose of the project is to examine the different modules of the NCMRWF operational data assimilation and forecast system.
- Developed a unified UNIX shell script for the forecast procedure across different UNIX platforms; it also supports a GUI-based, menu-driven prediction suite.
- Designed the data for NCMRWF using GrADS and MATLAB.
- Analyzed and implemented the GUI-based feature using the script.
- Analyzed data and corrected corrupt or missing data.
- Visualized the distributed data in MATLAB and carried out performance analysis by region, based on the statistical values of the specified data.

Guru Consulting LLC, North Brunswick, NJ
Feb 2024 – Jun 2024
Assignment project on prompt engineering and optimization for Large Language Model (LLM) tasks, including text summarization, customer service automation, and conversational AI systems.
Responsibilities:
- Designed, tested, and iterated over 50 prompt configurations for zero-shot, few-shot, and chain-of-thought prompting techniques.
- Developed specialized prompt structures for various NLP tasks, including:
  - Text Summarization: Created prompts that enhanced the relevance and coherence of AI-generated summaries for long-form documents.
  - Question Answering: Fine-tuned prompts for open-domain and closed-domain question-answering systems to improve answer precision and accuracy.
  - Customer Support Chatbots: Engineered domain-specific prompts for chatbot workflows, improving response accuracy and reducing irrelevant answers.
Key Learning: Developed a deep understanding of LLM behavior, prompt-response optimization, and the impact of prompt structuring on model outputs.

Education
Utkal University: Master's, Computer Science and Application
Sambalpur University: Bachelor's, Science, Honors (Math)

Certifications
- Certificate diploma in Python, Data Science
- Diploma in Machine/Deep Learning
- Generative AI with Vertex, Prompt Design, and Prompt Engineering: Google Cloud Skills Boost, LLMs
- Natural Language Processing with Transformers | Hugging Face
- CCNA Certified from Cisco