Please describe one project or research effort that you focused on. Include the role you played, difficulties, learnings, and reflections. (About 800 words)
The research I am most proud of is a 10-month study on topological improvements to the semantic segmentation of structural objects, conducted while I was a Research Assistant at the Tongji Deep Learning Lab. Guided by Prof. Yin Wang and two Facebook Research Scientists, Dr. Saikat Basu and Dr. Guan Pang, we concentrated on designing loss functions that improve the connectivity of semantic segmentation for linear structures. I faced three main difficulties: theoretical, evaluation, and engineering challenges. It was my responsibility to design and justify the technique on the principles of statistical learning theory, to find alternative metrics suited to our scenario, and to manage a large number of experiment configurations and a large volume of data.
After we reached the finals of the CVPR 2018 DeepGlobe Workshop with methods that improved the semantic segmentation of roads from remote sensing imagery, I noticed an interesting problem: the road segmentation results suffer from poor connectivity, and the same defect appears in many other scenarios where objects exhibit linear structure, such as cell membranes, lanes in autonomous driving, and blood vessels. Connectivity matters for the topology of linear structures, since even a single small gap can change the entire topology. I hypothesized that the cause was the widely used pixel-wise loss functions such as cross-entropy. Pixel-level losses compute and optimize empirical risk under an independence assumption among adjacent pixels, and therefore cannot specifically penalize small prediction gaps in structural objects. Introducing topological priors into the loss function seemed appropriate. Fusing the ideas of edge-detection kernels and total variation regularization, I therefore proposed a novel loss term that penalizes gaps and fragments on the prediction map. The term is differentiable, so it can be used to train any existing deep neural network, and it adds no overhead at inference time. I devised two mathematical forms and three spatial forms of the regularization to adapt to regions of different dimensions: 1D polylines for curvilinear objects, 2D squares for polygonal objects, and 3D cubes for polyhedral objects. Confronting these theoretical challenges and diving into fundamental machine learning research not only consolidated my image processing knowledge but also deepened my understanding of optimization and learning theory.
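To illustrate the intuition only (this is a minimal toy sketch, not the actual loss from our paper; the window-mean formulation here is an assumption for illustration):

```python
import numpy as np

def gap_penalty(pred, gt, k=3):
    """Toy gap penalty: on ground-truth foreground pixels, penalize a
    prediction that falls below the local mean of the prediction map.
    A pixel predicted as background while its neighborhood is predicted
    as foreground is exactly a small gap in a linear structure.  The
    term is differentiable in `pred`, so it could be added on top of a
    pixel-wise loss such as cross-entropy."""
    h, w = pred.shape
    r = k // 2
    penalty = 0.0
    for i in range(r, h - r):
        for j in range(r, w - r):
            if gt[i, j] == 1:
                # mean prediction over the k x k window around (i, j)
                local_mean = pred[i - r:i + r + 1, j - r:j + r + 1].mean()
                penalty += max(local_mean - pred[i, j], 0.0)
    return penalty
```

A solid predicted line incurs zero penalty, while the same line with a one-pixel gap is penalized, which is the behavior a plain pixel-wise loss cannot express.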
Throughout this theoretical exploration, I was struck by the importance of notation as a tool of thought and communication, an ability that would support my further research at Sony.
We needed to evaluate the effectiveness of the proposed loss functions. I conducted experiments on four delineation datasets, including road datasets, a cell membrane dataset, and a 3D hepatic vessel dataset, and tested models of several architectures: U-Net, D-LinkNet, and V-Net. Their performance was compared using a range of metrics. Inspired by the SpaceNet competition, we used Average Path Length Similarity (APLS) as the connectivity metric for curvilinear structures and introduced IoU for polygonal objects. In addition, we calculated the Hausdorff distance and counted the predicted polygons to measure fragmentation. The search for evaluation metrics taught me to conclude every experiment with systematic, quantitative results.
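For reference, the Hausdorff distance between two point sets (in our case, boundary pixels of the predicted and ground-truth masks) can be sketched in a few lines; this is a generic textbook definition, not our experiment code:

```python
import numpy as np

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets given as
    (n, 2) and (m, 2) arrays.  For every point in one set, take the
    distance to its nearest point in the other set; the Hausdorff
    distance is the worst such case in either direction, so a single
    stray fragment far from the ground truth inflates it sharply."""
    # pairwise Euclidean distances, shape (n, m), via broadcasting
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

Its sensitivity to outlying fragments is exactly why it complements pixel-level metrics when measuring fragmentation.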
The experiments posed significant engineering challenges. To handle over 300 experiment configurations and a drawn-out hyper-parameter tuning process, I needed code that was highly reusable, parameterized, parallelized, and performant. The code was built on Fastai v1, a flexible deep learning framework whose development I followed closely. I repeatedly refactored the data loader, the model and loss-function loaders, and the training loop, and parameterized experiments through a script interface. My prior knowledge of Linux and my software engineering skills proved tremendously beneficial: without the time spent refactoring, collaboration and rapid experimentation without code changes would not have been possible. These skills have repeatedly helped me in both research and work.
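The shape of that script interface can be sketched as follows (flag and option names here are hypothetical, chosen only to mirror the datasets and architectures mentioned above):

```python
import argparse

def build_parser():
    """Sketch of a parameterized experiment runner: every choice that
    would otherwise require editing source code becomes a command-line
    flag, so hundreds of configurations can be launched (and run in
    parallel) from simple shell loops."""
    p = argparse.ArgumentParser(description="segmentation experiment runner")
    p.add_argument("--dataset", choices=["roads", "membrane", "vessels"],
                   required=True)
    p.add_argument("--arch", choices=["unet", "dlinknet", "vnet"],
                   default="unet")
    p.add_argument("--loss", default="ce",
                   help="loss spec, e.g. ce, ce+gap1d, ce+gap2d")
    p.add_argument("--lr", type=float, default=1e-3)
    p.add_argument("--epochs", type=int, default=50)
    return p

# example invocation, as a shell script would supply it
args = build_parser().parse_args(["--dataset", "roads", "--loss", "ce+gap2d"])
```

With such an interface, a grid of configurations is just a nested loop in a launch script, and every run is fully described by its command line.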
On the polygon metrics, our method shows 2.5–4.3% connectivity improvements and 7.0–14.6% fragment reduction while maintaining pixel-level metrics. In contrast, existing methods such as CRFs and GANs, which I implemented for comparison, usually perform worse than the baseline because they rely too heavily on shallow appearance features. We fused these ideas into another paper, titled Whole-Object Segmentation Using Regional Variance Losses, which we submitted to CVPR 2020 with me as a co-author. This research experience in remote sensing and computer vision gave me a deeper understanding of the field and equipped me with essential research methodologies. I also gained experience in collaborative research: I learned to cooperate effectively with my co-author, who provided valuable inspiration on quantifying topological correctness and performed a theoretical analysis on synthetic data, and I greatly appreciated the guidance on research direction and paper writing from my professor and the Facebook research scientists. This experience has honed essential skills that will help me contribute to Sony as a Machine Learning practitioner.
2. Please describe what you would like to contribute through the position you are applying for at Sony, in terms of knowledge, skills, or experience. (About 500 words)
Sony is internationally known for emotion-provoking products full of creativity and technology. Personally, the PlayStation series filled the childhood of my brother and me with joy and strengthened friendships throughout our lives. I aspire to stay with Sony for the long term, contributing by applying three abilities to challenging problems: data science skills, computer vision knowledge, and the ability to use machine learning to solve domain-specific problems.
Working as a VP/Machine Learning Engineer/Solution Architect at a startup (Alfasommet) honed two aspects of my data science skills, which I seek to contribute to “priority 1, internship 2.” First, tailoring algorithms to business needs consolidated my statistical knowledge and sharpened my software engineering skills, as proven when I delivered two machine learning-based features. The first is a recommendation system that matches users based on tags; I built it end to end, from data collection, model selection, and feature engineering to real-time inference, improving recall to 60%, compared with 30% for content-based approaches. The second uses word embeddings to proactively recommend tags, improving the experience for thousands of users. Second, intensive communication improved my data visualization and communication skills: I worked closely with the CEO to analyze business objectives and optimization targets, creating visualization reports that gave us insight into user behavior and communicated the business benefits of the data product. I am delighted that this data product helped us raise 5 million Chinese yuan in the angel round and a further 12 million in the pre-A round, and along the way I became an AWS Certified Solutions Architect. In the future, I seek to become a researcher in data science for food, combining the computer vision skills described in the previous essay with my NLP, data science, and software engineering skills.
For priority 2, I would like to apply my computer vision knowledge and C++ skills. The loss function term I devised, described in the previous essay, involves Gaussian and edge-detection kernels, and working with them deepened my understanding of image processing techniques. To compare performance, I implemented an approximated CRF using convolutions (from the equations in Section 6.2 of Gated CRF Loss for Weakly Supervised Semantic Image Segmentation). As an example of implementing algorithms in C++, I built a chess AI that won the final contest of my Artificial Intelligence class, using Monte Carlo tree search and the bitboard trick. As a photography enthusiast, I would be thrilled to contribute to the R&D of image/video enhancement algorithms.
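The bitboard idea can be shown in a few lines (a generic illustration of the standard technique, rendered here in Python rather than the original C++, and not the contest code itself): the 8x8 board is a single 64-bit integer, one bit per square, so move generation becomes a handful of shifts and masks.

```python
def knight_attacks(sq):
    """Bitboard of squares attacked by a knight on square `sq`
    (0 = a1, 63 = h8).  Each knight offset is a shift; the file masks
    remove moves that would wrap around the board edge."""
    bb = 1 << sq
    not_a  = 0xFEFEFEFEFEFEFEFE   # landing square may not be on file a
    not_ab = 0xFCFCFCFCFCFCFCFC   # ... nor on files a or b
    not_h  = 0x7F7F7F7F7F7F7F7F   # ... nor on file h
    not_gh = 0x3F3F3F3F3F3F3F3F   # ... nor on files g or h
    att  = ((bb << 17) & not_a)  | ((bb << 15) & not_h)
    att |= ((bb << 10) & not_ab) | ((bb << 6)  & not_gh)
    att |= ((bb >> 15) & not_a)  | ((bb >> 17) & not_h)
    att |= ((bb >> 6)  & not_ab) | ((bb >> 10) & not_gh)
    return att & 0xFFFFFFFFFFFFFFFF
```

Because the whole board fits in one machine word, operations like "all squares any knight attacks" collapse to a few bitwise instructions, which is what makes the trick so effective inside a Monte Carlo tree search inner loop.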
Regarding priority 3, I want to use machine learning to solve domain-specific problems. I demonstrated this ability in a competition organized by the National Basic Research Program of China (973 Program) together with the influential Civil Engineering Department of Tongji University, where I transferred my NLP knowledge to reformulate a time-series regression problem as a language-modeling problem. Experiments implemented with Fastai, a deep learning library written in Python, showed that the AWD-LSTM model outperformed the traditional random forest model by a large margin of 13%. A paper titled Tunnel Boring Machine (TBM) parameter and rock mass prediction using multi-input and output AWD-LSTM has been submitted to the Chinese Journal of Rock Mechanics and Engineering. I would like to contribute these interdisciplinary modeling skills to advance Sony's vital sensing and emotion estimation technologies.
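The core of that reformulation can be sketched in one step (an illustrative toy, not the paper's actual pipeline, which regresses multiple inputs and outputs with AWD-LSTM): quantize a continuous sensor reading into a small vocabulary of bins, so predicting the next time step becomes predicting the next token, exactly as a language model does.

```python
import numpy as np

def discretise(series, n_bins=32):
    """Map a continuous sensor series to integer "token" ids in
    [0, n_bins - 1] by uniform binning over the observed range.  The
    resulting token sequence can be fed to a language model, which
    predicts the next token instead of a real-valued target."""
    lo, hi = series.min(), series.max()
    tokens = np.floor((series - lo) / (hi - lo + 1e-9) * n_bins).astype(int)
    return np.clip(tokens, 0, n_bins - 1)
```

The binning is the simplest possible tokenizer; its resolution trades off vocabulary size against regression precision.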