Data Science Intern
People Analytics | NASA
Aug. 2020 - Aug. 2021
Full Time
Python
Web Scraping
NLP Techniques
APIs
Neo4j
Cypher
R
Shiny
This internship expanded a graph-driven skills analysis project and entailed:
- Collecting and processing unstructured and structured data with web scraping, available APIs, Python, and NLP techniques to produce clean, structured datasets.
- Designing a graph model based on collected data and implementing it in a Neo4j graph database by importing data with optimized Cypher queries in Python scripts.
- Conducting analyses with graph algorithms and graph data science methodologies, as well as NLP techniques including Doc2Vec, topic modeling, and text similarity.
- Developing a beta NLP “Text-to-Cypher” pipeline that converts a natural language question to a Cypher query, employing custom named entity recognition, entity linking, and relationship extraction techniques.
- Developing an R Shiny application to visualize analysis results and provide an interface for interacting with the “Text-to-Cypher” pipeline.
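
As a rough illustration of the “Text-to-Cypher” idea described above, the sketch below maps a question to a parameterized Cypher query in three toy steps: dictionary-based entity recognition and linking, keyword-based relationship extraction, and template filling. It is not the pipeline built during the internship. The `Role` and `Skill` labels, the `REQUIRES` relationship, the lookup tables, and all function names are hypothetical and chosen only for this example.

```python
import re

# Hypothetical gazetteer: surface form -> (node label, canonical name).
# A real pipeline would use a trained NER model and an entity linker instead.
ENTITIES = {
    "python": ("Skill", "Python"),
    "machine learning": ("Skill", "Machine Learning"),
    "data scientist": ("Role", "Data Scientist"),
}

# Hypothetical relation cues: keyword -> (relationship type, source label, target label).
RELATIONS = {
    "require": ("REQUIRES", "Role", "Skill"),
    "need": ("REQUIRES", "Role", "Skill"),
}


def extract_entities(question):
    """Return (label, canonical name) pairs found by whole-word dictionary lookup."""
    text = question.lower()
    found = []
    for surface, (label, name) in ENTITIES.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", text):
            found.append((label, name))
    return found


def extract_relation(question):
    """Return the first relation whose cue keyword occurs in the question, else None."""
    text = question.lower()
    for cue, relation in RELATIONS.items():
        if cue in text:
            return relation
    return None


def text_to_cypher(question):
    """Return (query, params) for the question, or None if nothing matches."""
    relation = extract_relation(question)
    entities = extract_entities(question)
    if relation is None or not entities:
        return None
    rel_type, src_label, dst_label = relation
    label, name = entities[0]
    # The entity value is passed as a query parameter, not spliced into the string.
    if label == src_label:
        query = (f"MATCH (a:{src_label} {{name: $name}})-[:{rel_type}]->"
                 f"(b:{dst_label}) RETURN b.name")
    elif label == dst_label:
        query = (f"MATCH (a:{src_label})-[:{rel_type}]->"
                 f"(b:{dst_label} {{name: $name}}) RETURN a.name")
    else:
        return None
    return query, {"name": name}


if __name__ == "__main__":
    print(text_to_cypher("Which roles require Python?"))
    print(text_to_cypher("What skills does a data scientist need?"))
```

Keeping the entity value in a parameter map rather than in the query text means the same query string can be reused for different entities and avoids injecting user input into Cypher.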