2022/07/30, TR310-2
Language: Japanese
We build a simple, OSS-based infrastructure for visualizing and analyzing vehicle and people location information to realize digital twin computing. We present our architecture, showing how the following technologies are combined, and demonstrate the system with a simple use case during the session.
1. A distributed processing platform to process large amounts of data quickly: Apache Hadoop, Apache Spark, and Apache Kafka
2. Multi-user ad-hoc analysis services: JupyterLab and JupyterHub
As an extended topic, the talk will also cover spatial-temporal databases and related technologies.
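As a rough illustration of the kind of location analysis involved, the sketch below buckets position reports into grid cells and fixed time windows, then counts distinct objects per bucket. The record layout (`LocationRecord`), the 0.01-degree cell size, and the 60-second window are hypothetical choices, not taken from the talk. On the actual platform this grouping would more likely run as a Spark job over the cluster's data than in plain Python.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LocationRecord:
    """One hypothetical vehicle/person position report."""
    object_id: str
    timestamp: datetime
    lat: float
    lon: float


def spatio_temporal_key(rec: LocationRecord, cell_deg: float = 0.01, window_s: int = 60) -> tuple:
    """Map a record to (window start in epoch seconds, grid row, grid column)."""
    epoch = int(rec.timestamp.timestamp())
    window_start = epoch - epoch % window_s
    # Floor division keeps cells aligned on a fixed lat/lon grid, including negative coordinates.
    row = int(rec.lat // cell_deg)
    col = int(rec.lon // cell_deg)
    return (window_start, row, col)


def count_objects_per_cell(records, cell_deg: float = 0.01, window_s: int = 60) -> Counter:
    """Count distinct objects seen in each (window, cell) bucket."""
    # Deduplicate (bucket, object) pairs so repeated reports from one object count once.
    seen = {(spatio_temporal_key(r, cell_deg, window_s), r.object_id) for r in records}
    return Counter(key for key, _ in seen)
```

Counting distinct objects rather than raw reports keeps a vehicle that reports every few seconds from dominating its cell, which is usually what a density view of a digital twin needs.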
In my role promoting NTT's IOWN (Innovative Optical and Wireless Network), I aim to develop an infrastructure that enables people from various companies and industries to share and analyze each other's data.
We focus on infrastructure that lets multiple users analyze large amounts of data, and we built a simple analysis infrastructure using open source products. We tackled three challenges throughout the project and took the following actions:
(1) Use JupyterHub to achieve multi-user analysis
(2) Build with Apache Bigtop to deploy Hadoop/Spark/Kafka quickly
(3) Choose PySpark and configure a Jupyter kernel for it to handle data on the Hadoop cluster
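Step (3) can be sketched as a Jupyter kernel spec that launches an ordinary IPython kernel with PySpark on its path. This is a minimal sketch, not necessarily the configuration used in the project: it assumes Spark is installed under `SPARK_HOME` (defaulting here to `/usr/lib/spark`, an assumed Bigtop-style location), that jobs are submitted to YARN, and the helpers `build_pyspark_kernel_spec` and `write_kernel_spec` are hypothetical names.

```python
import glob
import json
import os
import sys
from pathlib import Path


def build_pyspark_kernel_spec(spark_home: str, master: str = "yarn") -> dict:
    """Return a kernel.json dict for an IPython kernel with PySpark preloaded (hypothetical helper)."""
    python_dir = os.path.join(spark_home, "python")
    # py4j ships as a versioned zip under $SPARK_HOME/python/lib; use whichever is present.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return {
        # Standard ipykernel launch command; Jupyter substitutes {connection_file}.
        "argv": [sys.executable, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": f"PySpark ({master})",
        "language": "python",
        "env": {
            "SPARK_HOME": spark_home,
            "PYTHONPATH": os.pathsep.join([python_dir, *py4j_zips]),
            # IPython executes this file at start-up; it defines the `spark` session and `sc` context.
            "PYTHONSTARTUP": os.path.join(python_dir, "pyspark", "shell.py"),
            # Send the shell's jobs to the chosen cluster manager (YARN assumed).
            "PYSPARK_SUBMIT_ARGS": f"--master {master} pyspark-shell",
        },
    }


def write_kernel_spec(spec: dict, kernels_dir: Path, name: str = "pyspark") -> Path:
    """Write spec to <kernels_dir>/<name>/kernel.json and return the file path."""
    target = kernels_dir / name
    target.mkdir(parents=True, exist_ok=True)
    path = target / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path


if __name__ == "__main__":
    # /usr/lib/spark is an assumed default; override with the SPARK_HOME environment variable.
    spark_home = os.environ.get("SPARK_HOME", "/usr/lib/spark")
    print(json.dumps(build_pyspark_kernel_spec(spark_home), indent=2))
```

The directory produced by `write_kernel_spec` can then be registered system-wide, for example with `jupyter kernelspec install <dir> --name pyspark`, so every JupyterHub user sees the same PySpark kernel without configuring Spark individually.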
As a result, we built a multi-user service that provides Jupyter notebooks for analyzing large amounts of data on a Hadoop cluster. In the future, we plan to extend the service to address the following challenges:
(1) Deploy JupyterHub on a Kubernetes container cluster for scalability
(2) Provide a multi-tenant service
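Challenge (1) could start from a `jupyterhub_config.py` along the lines of the fragment below. It is a sketch under stated assumptions, not the project's configuration: the user names and container image are hypothetical, and it assumes the `jupyterhub-kubespawner` package is installed. In practice the Zero to JupyterHub Helm chart generates most Kubernetes-related settings.

```python
# jupyterhub_config.py (sketch): JupyterHub injects get_config() when it loads this file.
c = get_config()  # noqa: F821

# Open JupyterLab rather than the classic notebook UI for every user.
c.Spawner.default_url = "/lab"

# Hypothetical user list and admin.
c.Authenticator.allowed_users = {"alice", "bob"}
c.Authenticator.admin_users = {"alice"}

# Run each user's single-user server as a Kubernetes pod instead of a local process.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"
c.KubeSpawner.image = "registry.example.com/pyspark-notebook:latest"  # hypothetical image
# Per-user resource limits keep one notebook from starving others on shared nodes.
c.KubeSpawner.cpu_limit = 2
c.KubeSpawner.mem_limit = "4G"
```

Moving per-user servers into pods lets the hub scale out with the cluster, while the notebook image can bundle the same PySpark kernel used on the current setup.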
I'd also like to share further open source products related to our work in the field of spatial-temporal data processing.
Beginner
Difficulty – Introductory
Talk Length (30/45/90 mins) – 30 minutes
Proposal Type – Talk (30 mins)
Way to participate – Recorded participation
Shizuka Yasukouchi is an infrastructure engineer specializing in big data analysis and cloud platforms. She has been involved in a connected car project, for which she built a data processing infrastructure. She also gave a lecture on an introduction to Hadoop/Spark/Kafka infrastructure at Open Source Conference 2021 Online/Nagoya.