COSCUP 2022

Digital Twin workbench with Jupyter hub/lab and Hadoop/Spark/Kafka for geospatial-temporal applications
2022-07-30 , TR310-2
Language: Japanese

We build a simple, OSS-based infrastructure for visualizing and analyzing vehicle and people location information to realize digital twin computation. In this session we present our architecture, explain how the technologies are combined, and demonstrate the system with a simple case.
1. Distributed processing platform for processing large amounts of data quickly: Apache Hadoop, Apache Spark, and Apache Kafka
2. Multi-user ad-hoc analysis services: JupyterLab and JupyterHub
As an extended topic, the talk will also cover spatial-temporal databases and related technologies.


In my role promoting NTT's IOWN (Innovative Optical and Wireless Network*1), I aim to develop an infrastructure that lets people from different companies and industries use and analyze each other's data.
We focus on infrastructure that enables multiple users to analyze large amounts of data, and we built a simple analysis infrastructure using open source products. We tackled three challenges during the project and took the following actions.
(1) Use JupyterHub to achieve multi-user analysis
(2) Build with Apache Bigtop for quick Hadoop/Spark/Kafka deployment
(3) Pick PySpark and configure a Jupyter execution kernel for it to handle data on Hadoop clusters
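As an illustration of action (3), the sketch below generates a Jupyter kernel spec (kernel.json) that starts an IPython kernel with the environment PySpark needs to reach a YARN-managed Hadoop cluster. It is a minimal sketch, not the configuration used in the talk: the install paths, the kernel name "pyspark-yarn", and the py4j archive file name are all assumptions and must be adjusted to the actual deployment.

```python
import json
from pathlib import Path


def build_pyspark_kernel_spec(
    spark_home="/usr/lib/spark",         # assumed Spark install path
    hadoop_conf_dir="/etc/hadoop/conf",  # assumed Hadoop client config path
    python_bin="/usr/bin/python3",       # assumed interpreter with ipykernel installed
    py4j_zip="py4j-src.zip",             # hypothetical name; the real file name is version-specific
):
    """Return a kernel spec dict for a PySpark-on-YARN notebook kernel."""
    python_path = [
        f"{spark_home}/python",
        f"{spark_home}/python/lib/{py4j_zip}",
    ]
    return {
        "display_name": "PySpark (YARN)",
        "language": "python",
        # Jupyter substitutes {connection_file} when it launches the kernel.
        "argv": [python_bin, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "env": {
            "SPARK_HOME": spark_home,
            "HADOOP_CONF_DIR": hadoop_conf_dir,
            "PYSPARK_PYTHON": python_bin,
            "PYTHONPATH": ":".join(python_path),
            # Read by PySpark when a SparkSession is created inside the notebook.
            "PYSPARK_SUBMIT_ARGS": "--master yarn pyspark-shell",
        },
    }


def write_kernel_spec(spec, kernels_dir, name="pyspark-yarn"):
    """Write <kernels_dir>/<name>/kernel.json and return its path."""
    kernel_dir = Path(kernels_dir) / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    path = kernel_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path
```

Writing the spec into a system-wide kernels directory (for example /usr/local/share/jupyter/kernels) would make the kernel selectable by every user whose notebook server is spawned by JupyterHub on that host.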
As a result, we realized a multi-user service that provides Jupyter notebooks for analyzing large amounts of data on a Hadoop cluster. In the future, we will extend the service to address the following challenges.
(1) Deploy JupyterHub on a Kubernetes container cluster for scalability
(2) Provide a multi-tenant service
I would also like to share other open source products related to our work in the field of spatial-temporal data processing.

*1. https://www.rd.ntt/e/iown/


Target Audience

Beginner

Difficulty

Introductory

Talk Length (30/45/90 mins)

30 minutes

Proposal Type

Talk (30 mins)

Way to participate

Record participation

youtube_link

https://www.youtube.com/watch?v=NTKCqOdLdq8

Shizuka Yasukouchi is an infrastructure engineer specializing in big data analysis and cloud platforms. She has been involved in a connected car project, where she built a data processing infrastructure for connected cars. She also gave a lecture, "Introduction to Hadoop/Spark/Kafka infrastructure", at Open Source Conference 2021 Online/Nagoya.