Data Scientist & Data Engineer
People Analytics | NASA
Aug. 2021 - Present
Full Time
APIs
Python
Airflow
SQL
Databricks
AWS
Linux
Ollama
Memgraph
Cypher
Weaviate
Git
Posit
Tableau
Automating data pipelines, modernizing analytics infrastructure, applying data science principles to artificial intelligence R&D projects, and exploring graph analytics for workforce insights are a few of the things we have in the works.
I currently lead the implementation of a cloud-based data and analytics infrastructure that underpins the Human Capital organization’s data science initiatives. Achievements include linking siloed cloud environments by removing firewall and authentication barriers, expanding how the agency accesses and shares data, implementing storage and processing solutions for analytics-ready data, transitioning a data pipeline to the cloud for automation and scalability (a pipeline that saves 40+ hours of work per week and earned an Early Career Achievement Medal), and establishing a platform for analytics, LLMs, and specialized databases. Through this work, I have learned about cloud architecture and engineering, the infrastructure components required to support high-performing, fast-moving data and analytics teams, and strategic considerations when designing for long-term scaling and cost optimization. Specific technical components of this work include AWS (S3, RDS, EC2, IAM, DNS, VPCs, etc.), SAP, APIs (REST, SAP, and otherwise), Databricks, Airflow, Python, SQL, R, Posit, SAML authentication, Ollama, Weaviate, Memgraph, dbt, Quarto, Linux, Git, and more.
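The analytics-ready storage step described above can be sketched in miniature. The table, columns, and cleaning rules below are illustrative assumptions, not the actual schema; an in-memory SQLite database stands in for the cloud store:

```python
import sqlite3

# Hedged sketch of an "analytics-ready" load step: raw records are
# cleaned in Python, then written to a relational store via SQL.
# Table and column names here are hypothetical, not a real schema.
def load_headcount(rows):
    conn = sqlite3.connect(":memory:")  # stands in for a cloud database
    conn.execute(
        "CREATE TABLE headcount (org TEXT, fiscal_year INTEGER, count INTEGER)"
    )
    cleaned = [
        (r["org"].strip().upper(), int(r["fy"]), int(r["count"]))
        for r in rows
        if r.get("count") is not None  # drop incomplete records
    ]
    conn.executemany("INSERT INTO headcount VALUES (?, ?, ?)", cleaned)
    conn.commit()
    return conn

conn = load_headcount([
    {"org": " jsc ", "fy": "2023", "count": "3100"},
    {"org": "hq", "fy": "2023", "count": None},  # filtered out
])
total = conn.execute("SELECT COUNT(*) FROM headcount").fetchone()[0]
# total is 1: the incomplete record was dropped during cleaning
```

The same clean-then-load pattern scales up in Airflow or Databricks; only the storage target and orchestration change.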
Currently, I am also mentoring interns and pursuing R&D projects on skills analysis with graph databases and on LLM applications in human capital.
In the past, I have had the opportunity to work with teams across the Human Capital Office and the agency to provide insights using people data. I have produced workforce-at-a-glance metrics in Tableau dashboards that help mission areas see their people data in a new way, and collaborated with supervisors and telework coordinators to collect, process, and visualize data in applications that assist with return-to-office decision making. Through this work, I've demonstrated my ability to synthesize complex, disparate datasets into data products that enable non-technical partners, even beyond the Human Capital Office, to take action.
Data Science Intern
People Analytics | NASA
Aug. 2020 - Aug. 2021
Full Time
Python
Web Scraping
NLP Techniques
APIs
Neo4j
Cypher
R
Shiny
This internship expanded a graph-driven skills analysis project and entailed:
- Collecting and processing unstructured and structured data with web scraping, available APIs, Python, and NLP techniques to produce clean, structured datasets.
- Designing a graph model based on collected data and implementing it in a Neo4j graph database by importing data with optimized Cypher queries in Python scripts.
- Conducting analyses with graph algorithms and graph data science methodologies, as well as NLP techniques including Doc2Vec, topic modeling, and text similarity.
- Developing a beta NLP “Text-to-Cypher” pipeline that converts a natural language question to a Cypher query, employing custom named entity recognition, entity linking, and relationship extraction techniques.
- Visualizing results of analyses and offering a platform to interact with the “Text-to-Cypher” pipeline by developing an R Shiny application.
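The batched Cypher import mentioned above can be sketched as a standard parameterized `UNWIND` load; the `Person`/`Skill` labels and batch size below are illustrative assumptions, not the project's exact model:

```python
from itertools import islice

def batches(records, size=1000):
    """Yield fixed-size chunks so each write transaction stays small."""
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk

# Parameterized UNWIND query: one round trip imports a whole batch,
# which is far faster than issuing one CREATE per record.
IMPORT_QUERY = """
UNWIND $rows AS row
MERGE (p:Person {id: row.id})
MERGE (s:Skill {name: row.skill})
MERGE (p)-[:HAS_SKILL]->(s)
"""

def import_rows(session, rows):
    # session would be a neo4j.Session from the official Python driver;
    # labels and properties here are hypothetical.
    for chunk in batches(rows):
        session.run(IMPORT_QUERY, rows=chunk)
```

`MERGE` keeps the load idempotent, so re-running an import does not duplicate nodes or relationships.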
Data Science Consultant
i2k Connect
Sep. 2022 - Sep. 2023
Freelance | Part Time
Python
Neo4j
OpenAI API
Cypher
As a data science consultant for this artificial intelligence company, I focused on improving the representation of and access to a proprietary dataset through automated data processing, graph databases, and large language models. By automating the processing and validation of tabular data, I improved data integrity and reduced the manual time spent reviewing records. After enhancing data quality, I established a knowledge graph that better modeled the relationship-rich data previously maintained in a tabular format. By creating a SME-informed graph data model and importing the tabular data into a Neo4j graph database using Python and Cypher, I provided more intuitive ways to analyze, interact with, and visualize the data. To further improve the experience of querying the graph database, I researched and tested how best to use ChatGPT’s large language model to convert natural language into dynamic Cypher queries that retrieve information from the knowledge graph without directly interacting with the graph or writing code. The research paper written to share these findings was published on OnePetro (under my maiden name, Gipson).
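One way to frame the natural-language-to-Cypher step is as prompt construction: give the model the graph schema and ask it to return a single query. The schema string, model name, and message layout below are a minimal sketch under assumptions, not the consulting project's actual prompts:

```python
# Hypothetical schema description handed to the model so it knows
# which labels, relationships, and properties exist.
SCHEMA = "(:Person {name})-[:HAS_SKILL]->(:Skill {name})"

def text_to_cypher_messages(question: str) -> list:
    """Build a chat-completion payload that asks for one Cypher query."""
    system = (
        "You translate questions into Cypher for this Neo4j schema:\n"
        f"{SCHEMA}\n"
        "Return only a single read-only Cypher query."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

# The payload would then be sent with the OpenAI client, e.g.
# client.chat.completions.create(model=..., messages=...),
# and the returned query executed against the graph.
```

Constraining the model to read-only queries is one guardrail against a generated query mutating the knowledge graph.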