Research

Publications

J. Wang and E. Uchibe. Regularized Reward-Punishment Reinforcement Learning. arXiv preprint, 2026
[website] [researchgate] [pdf]
J. Wang and E. Uchibe. Reward-punishment reinforcement learning with maximum entropy.
In 2024 International Joint Conference on Neural Networks. IEEE. Yokohama, Japan. 2024 (IJCNN with WCCI)
[website] [researchgate] [pdf]
J. Wang, S. Elfwing and E. Uchibe. Modular deep reinforcement learning from reward and punishment for robot navigation.
Neural Networks. DOI 10.1016/j.neunet.2020.12.001. 2020
[website] [researchgate] [pdf]
J. Wang, S. Elfwing and E. Uchibe. Deep reinforcement learning by parallelizing reward and punishment using MaxPain architecture.
In Proceedings of the 8th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics. Waseda University, Tokyo, Japan. 2018 (ICDL)
[website] [researchgate] [pdf]
E. Uchibe and J. Wang. EM-based policy search for learning foraging and mating behaviors.
In Proceedings of the 30th Robotics and Mechatronics Conference. Kitakyushu, Japan. 2018 (ROBOMECH)
[website] [researchgate] [pdf]
E. Uchibe and J. Wang. Deterministic policy search method for real robot control.
The Brain & Neural Networks. 24(4). DOI 10.3902/jnns.24.195. 2017
[website] [researchgate] [pdf]
J. Wang, E. Uchibe, and K. Doya. Adaptive baseline enhances EM-based policy search: Validation in a view-based positioning task of a smartphone balancer.
Frontiers in Neurorobotics. 11:1. DOI 10.3389/fnbot.2017.00001. 2017
[website] [researchgate] [pdf]
J. Wang, E. Uchibe, and K. Doya. EM-based policy hyper parameter exploration: Application to standing and balancing of a two-wheeled smartphone robot.
Artificial Life and Robotics. 21: 125. DOI 10.1007/s10015-015-0260-7. 2016
[website] [researchgate] [pdf]
J. Wang, E. Uchibe, and K. Doya. Two-wheeled smartphone robot learns to stand up and balance by EM-based policy hyper parameter exploration.
In Proceedings of the 20th International Symposium on Artificial Life and Robotics. Beppu, Japan. 2015 (AROB)
[website]
J. Wang, E. Uchibe, and K. Doya. Control of Two-wheel balancing and standing-up behaviors by an Android phone robot.
In Proceedings of the 32nd Annual Conference of the Robotics Society of Japan. Sangyo University, Fukuoka, Japan. 2014 (RSJ)
[website]
J. Wang, E. Uchibe, and K. Doya. Standing-up and balancing behaviors of Android phone robot: Control of spring-attached wheeled inverted pendulum.
IEICE Technical Report. Nonlinear Problems. 113(341), 49-54. Hong Kong, China. 2013 (NLP)
[website]

PhD Thesis

Policy hyperparameter exploration for behavioral learning of smartphone robots.
Kyoto University. DOI 10.14989/doctor.k20519. 2017
[website] [pdf]

Master's Thesis

An exploration of developing learning robots based on Android platform.
Kyoto University. 2013

Talks

J. Wang. Modular deep reinforcement learning from reward and punishment for robot navigation.
OIST Integrated Open System Unit Seminar. Okinawa, Japan. 2019
[website]
J. Wang. Multiple deep reinforcement learners from reward and punishment for robot navigation.
Neural Computation Workshop. Okinawa, Japan. 2019
[website]
J. Wang. Deep reinforcement learning by parallelizing reward and punishment using the MaxPain Architecture.
OIST Neural Computation Unit Seminar. Okinawa, Japan. 2018
[website]
J. Wang, E. Uchibe, and K. Doya. EM-based policy hyper parameter exploration for a Two-wheeled smartphone robot learning to balancing and standing-up.
The 10th Annual Women in Machine Learning Workshop. Montreal, Canada. 2015 (WiML with NIPS)
[website]
J. Wang, E. Uchibe, and K. Doya. Smartphone robot learns to stand up and balance.
Machine Learning Summer School. Reykjavik, Iceland. 2014 (MLSS with AISTATS)
[website]

Reviewing Service

CoRL 2018, IEEE T-RO 2018, ICDL 2019, IEEE T-RO 2019, ICRA 2020, Neural Networks 2020, ICRA 2021, IROS 2021, Scientific Reports 2021, ICRA 2022, IROS 2022, ICRA 2023, IROS 2023, ICRA 2024, IROS 2024, ICRA 2025, IROS 2025, Neural Networks 2025, ICRA 2026