Lucas Carvalho
AI Engineer
Brasília, Distrito Federal
https://www.linkedin.com/in/lucas-carvalho-mle
PROFESSIONAL SUMMARY
AI Architect with 8+ years pioneering enterprise-grade LLM platforms, specializing in dynamic model routing, secure RAG systems, and resilient agentic workflows. At Cognii (award-winning EdTech AI leader), engineered core components of their conversational tutoring infrastructure, which processes 10K+ student interactions weekly with 95% precision. Built multi-LLM pipelines to optimize cost/latency for educational open-response assessments. Expertise spans cross-cloud deployment (AWS/Azure/GCP), GDPR/SOC2 compliance, and n8n/Zapier automation for failover routing.
SKILLS
• LLM & RAG: Prompt engineering (zero/few‑shot), multi‑LLM orchestration (OpenAI, Azure, Hugging Face), vector‑based retrieval pipelines, Zapier, n8n
• Agentic Workflows: Intent classification, retrieval, LLM synthesis, fallback routing
• Backend & APIs: FastAPI, Flask, gRPC; Docker & Kubernetes deployment; CI/CD with Terraform/Jenkins
• Data & Security: Secure connectors (APIs, SharePoint) with AES‑256/TLS; GDPR/SOC2 compliance; FAISS/Pinecone vector stores
• Cloud & DevOps: AWS, GCP, Azure for scalable inference and data pipelines
• ML & Evaluation: Model fine‑tuning, synthetic data generation, performance metrics (precision, BLEU/ROUGE)
EXPERIENCE
AI Engineer (Lead LLM Architect) 09/2021 - 04/2025
Cognii (San Francisco, CA, USA)
Built AI infrastructure for Cognii’s Virtual Learning Assistant—used by 30K+ students globally for personalized tutoring and open-response assessment
• Multi-LLM Inference Engine: Architected a model-agnostic pipeline dynamically routing queries between OpenAI, Azure, and Hugging Face based on real-time cost ($0.02/token threshold), latency (<500ms SLA), and accuracy (BLEU-4 >0.85). Integrated fallback to lighter models during traffic spikes, maintaining 99.9% uptime during 5x demand surges (e.g., exam periods).
• Multilingual RAG Moderation: Scaled content moderation to 30+ languages using Pinecone vector retrieval and semantic hierarchies (syntax → concept mapping). Achieved 95% precision (F1-score) via few-shot prompt chaining, reducing manual review workload by 70% ($250K annual savings).
• n8n-Automated Agentic Tutoring: Engineered an intent-driven workflow: student query → concept retrieval → LLM synthesis → feedback analytics. Used n8n to trigger Slack alerts for latency breaches, accelerating incident response by 40%. System processed 10K+ cycles/week with sub-500ms P99 latency.
• SOC2-Compliant Data Ingestion: Secured academic data ingestion from SharePoint/APIs using AES-256 encryption and TLS 1.3, aligning with NSF grant requirements and GDPR.
AI Engineer 08/2019 - 04/2021
DataMind Analytics (London, England, United Kingdom)
Developed ML automation for manufacturing/logistics clients prior to mainstream LLMs.
• RL-Driven LLM Agents: Created agents that integrated policy-based LLM chains with API calls to automate manufacturing RPA, reducing defects by 28% and boosting throughput by 15%.
• High-Scale Sentiment Analysis: Deployed Scikit-learn/TensorFlow pipeline processing 1M+ social posts/day (94% accuracy), avoiding vendor API costs.
• LLM Summarisation Engine: Fine-tuned models for medical report summarisation, accelerating read-out times by 35% in clinical operations.
• Multi-Modal LLM Perception: Combined LiDAR/camera data with an LLM-based annotation and reasoning agent; delivered 99.98% detection accuracy under harsh weather.
• Zapier-Integrated Anomaly Detection: Connected Keras-based monitoring to Jira/Slack via Zapier, cutting incident response time by 50%.
Machine Learning Developer 06/2017 - 03/2019
Kis Solutions (São Paulo, Brazil)
• NLP Customer Service Chatbot: Architected intent recognition pipeline using spaCy entity extraction and BERT embeddings (pre-LLM era), reducing customer resolution latency by 1.7 hours through dynamic FAQ routing and automated ticket classification – handling 5K+ daily queries with 92% accuracy.
• High-Frequency Ad Optimization: Developed distributed GridSearchCV framework (Dask/Scikit-learn) automating hyperparameter tuning for real-time bidding models; boosted AUC-ROC by 12% and increased client ad spend ROI by 18% through latency-optimized feature engineering.
• Analytics API Revolution: Replaced monolithic Django backend with asynchronous FastAPI microservice using Redis caching and WebSocket streaming, achieving 27% higher throughput (tested at 12K RPM) and cutting dashboard load times from 3.2s → 800ms under production load.
• Mission-Critical Anomaly Detection: Designed LSTM autoencoder (Keras/TensorFlow) with adaptive thresholding for AWS EC2 monitoring, reducing false positives by 36% and slashing incident MTTR to <8 minutes through Slack-integrated alerting.
COURSES / CERTIFICATIONS
• Professional Machine Learning Engineer (PME) 04/2023
Google Cloud
• Deep Learning with PyTorch 03/2021
Google
PROJECTS
• https://console.vectara.com
Deployed hybrid keyword/vector search for clinical trial RAG pipelines, leveraging their BM25-neural fusion to boost recall 28% while maintaining SOC2-compliant data ingestion via TLS 1.3-encrypted connectors.
• https://app.credal.ai
Integrated their policy-enforced RAG framework into financial agent workflows, automatically redacting PII from loan documents using Azure AD permission syncing and reducing compliance review latency by 40% under GDPR audits.
• https://www.cognii.com
Deployed their Virtual Learning Assistant's NLP assessment engine for university open-response grading, leveraging syntax/semantic analysis to achieve 96% scoring accuracy while reducing instructor workload 70% through automated feedback generation and GDPR-compliant analytics.
• https://studio.graphlit.com
Integrated their multimodal RAG platform to ingest and process 50K+ clinical trial documents via automated OCR/transcription pipelines, cutting healthcare chatbot development time by 80% while maintaining HIPAA-compliant data handling with RBAC controls.
EDUCATION
• Bachelor of Computer Science 04/2013 - 03/2017
Universidade de Brasília (UnB)