RAS Nano & IT: Микроэлектроника / Russian Microelectronics

  • ISSN (Print) 0544-1269
  • ISSN (Online) 3034-5480

REINFORCEMENT LEARNING OF SPIKING NEURAL NETWORKS USING TRACE VARIABLES FOR SYNAPTIC WEIGHTS WITH MEMRISTIVE PLASTICITY

PII
S0544126925030033-1
DOI
10.31857/S0544126925030033
Publication type
Article
Status
Published
Authors
Volume / Issue
Volume 54 / Issue 3
Pages
213-223
Abstract
Spiking neural networks suited to memristor-based hardware implementation are highly promising for robotics due to their energy efficiency. However, reinforcement learning algorithms for such networks remain poorly understood. Beyond energy efficiency, a key motivation for using memristors as network weights is their ability to learn (change conductance) in real time through the superposition of voltage pulses from pre- and postsynaptic signals. The article presents the results of numerical modeling of a spiking neural network (SNN) with memristive synaptic connections that approximately solves an optimal control problem using trace variables for weight changes, which brings reinforcement learning closer to operating on a true time scale. The fundamental feasibility of such training is demonstrated on the task of balancing a pole on a moving platform, various reward functions are compared, and possible ways to increase the effectiveness of this approach are suggested.
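The trace-variable mechanism summarized in the abstract can be illustrated with a minimal sketch of reward-modulated STDP: pairwise spike-timing terms accumulate in an eligibility trace rather than in the weight itself, and a later reward signal gates the actual weight (conductance) change, much as a programming pulse would in a memristive synapse. All names and parameter values below are illustrative assumptions, not the model from the article.

```python
import math

class TraceSynapse:
    """Single synapse with spike traces and an eligibility trace.

    Sketch of reward-modulated STDP: STDP increments accumulate in an
    eligibility trace e instead of the weight, and the weight w (an
    analog of memristor conductance) changes only when a reward signal
    arrives: dw = lr * reward * e. Parameters are illustrative.
    """

    def __init__(self, w=0.5, tau_pre=20.0, tau_post=20.0, tau_e=200.0,
                 a_plus=0.01, a_minus=0.012, w_min=0.0, w_max=1.0):
        self.w = w
        self.tau_pre, self.tau_post, self.tau_e = tau_pre, tau_post, tau_e
        self.a_plus, self.a_minus = a_plus, a_minus
        self.w_min, self.w_max = w_min, w_max
        self.x_pre = 0.0   # presynaptic spike trace
        self.x_post = 0.0  # postsynaptic spike trace
        self.e = 0.0       # eligibility trace

    def step(self, dt, pre_spike, post_spike, reward, lr=0.1):
        # Exponential decay of all trace variables over the time step dt (ms).
        self.x_pre *= math.exp(-dt / self.tau_pre)
        self.x_post *= math.exp(-dt / self.tau_post)
        self.e *= math.exp(-dt / self.tau_e)
        # STDP terms update the eligibility trace, not the weight itself.
        if pre_spike:
            self.e -= self.a_minus * self.x_post  # post-before-pre: depression
            self.x_pre += 1.0
        if post_spike:
            self.e += self.a_plus * self.x_pre    # pre-before-post: potentiation
            self.x_post += 1.0
        # Reward gates the conductance change, like a write pulse would.
        self.w = min(self.w_max, max(self.w_min, self.w + lr * reward * self.e))
        return self.w
```

In a cart-pole control loop, `reward` would come from the environment at each step; with zero reward the eligibility simply decays without touching the weight, so only spike pairings followed by reward within roughly `tau_e` are consolidated into a conductance change.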
Keywords
spiking neural networks; memristors; resistive switching; reinforcement learning; continual learning; STDP; neuromorphic systems
Date of publication
16.09.2025
Year of publication
2025

References

  1. Black K., Brown N., Driess D., et al. π0: A Vision-Language-Action Flow Model for General Robot Control. Physical Intelligence, San Francisco, California. 2024. https://www.physicalintelligence.company/download/pi0.pdf
  2. Kalashnikov D., Varley J., Chebotar Y., et al. MT-Opt: Continuous multi-task robotic reinforcement learning at scale. arXiv preprint arXiv:2104.08212. 2021.
  3. Khetarpal K., Riemer M., Rish I., Precup D. Towards continual reinforcement learning: A review and perspectives. arXiv preprint arXiv:2012.13490. 2020.
  4. Ielmini D., Menzel S. Universal switching behavior. In: Resistive Switching: From Fundamentals of Nanoionic Redox Processes to Memristive Device Applications. Weinheim: Wiley-VCH. 2016. P. 317.
  5. Pershin Y.V., Di Ventra M. Experimental demonstration of associative memory with memristive neural networks. Neural Networks, 2010. V. 23. № 7. P. 881–886. http://dx.doi.org/10.1016/j.neunet.2010.05.001
  6. Zhu J., Zhang T., Yang Y., Huang R. A comprehensive review on emerging artificial neuromorphic devices. Applied Physics Reviews, 2020. V. 7. № 1. Article 011312. http://dx.doi.org/10.1063/1.5118217
  7. Berggren K., Xia Q., Likharev K.K., Strukov D.B., Jiang H., Mikolajick T., et al. Roadmap on emerging hardware and technology for machine learning. Nanotechnology, 2020. V. 32. № 1. Article 012002. http://dx.doi.org/10.1088/1361-6528/aba70f
  8. Mnih V. et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602. 2013.
  9. Kharlanov O.G., Shvetsov B.S., Rylkov V.V., Minnekhanov A.A. Stability of quantized conductance levels in memristors with copper filaments: Toward understanding the mechanisms of resistive switching. Physical Review Applied, 2022. V. 17. Article 054035. http://dx.doi.org/10.1103/PhysRevApplied.17.054035
  10. Minnekhanov A.A., Shvetsov B.S., Martyshov M.M. et al. On the resistive switching mechanism of parylene-based memristive devices. Organic Electronics, 2019. V. 74. P. 89–95. http://dx.doi.org/10.1016/j.orgel.2019.06.052
  11. Matsukatova A.N., Emelyanov A.V., Kulagin V.A. et al. Nanocomposite parylene-C memristors with embedded Ag nanoparticles for biomedical data processing. Organic Electronics, 2022. V. 102. Article 106455. http://dx.doi.org/10.1016/j.orgel.2022.106455
  12. Minnekhanov A.A., Emelyanov A.V., Lapkin D.A. et al. Parylene-based memristive devices with multilevel resistive switching for neuromorphic applications. Scientific Reports, 2019. V. 9. № 1. P. 10800. http://dx.doi.org/10.1038/s41598-019-47263-9
  13. Kvatinsky S. et al. VTEAM: A general model for voltage-controlled memristors. IEEE Transactions on Circuits and Systems II: Express Briefs, 2015. V. 62. № 8.
  14. Emelyanov A.V., Lapkin D.A., Demin V.A. et al. First steps towards the realization of a double layer perceptron based on organic memristive devices. AIP Advances, 2016. V. 6. № 11. Article 111301. http://dx.doi.org/10.1063/1.4966257
  15. Sboev A., Serenko A., Rybka R., Vlasov D. Solving a classification task by spiking neural network with STDP based on rate and temporal input encoding. Mathematical Methods in the Applied Sciences, 2020. V. 43. № 13. P. 7802–7814. http://dx.doi.org/10.1002/mma.6241
  16. Gütig R., Sompolinsky H. The tempotron: a neuron that learns spike timing–based decisions. Nature Neuroscience, 2006. V. 9. № 3. P. 420–428.
  17. Wang X., Hou Z.-G., Lv F., Tan M., Wang Y. Mobile robots' modular navigation controller using spiking neural networks. Neurocomputing, 2014. V. 134. P. 230–238. http://dx.doi.org/10.1016/j.neucom.2013.07.055
  18. Yu Q., Tang H., Tan K.C., Yu H. A brain-inspired spiking neural network model with temporal encoding and learning. Neurocomputing, 2014. V. 138. P. 3–13. http://dx.doi.org/10.1016/j.neucom.2013.06.052
  19. Vlasov D., Minnekhanov A., Rybka R., et al. Memristor-based spiking neural network with online reinforcement learning. Neural Networks, 2023. V. 166. https://doi.org/10.1016/j.neunet.2023.07.031
  20. Hazan H., Saunders D.J., Khan H., Patel D. BindsNET: A machine learning-oriented spiking neural networks library in Python. Frontiers in Neuroinformatics, 2018. V. 12. P. 89.
  21. Sboev A., Serenko A., Rybka R., Vlasov D. Solving a classification task by spiking neural network with STDP based on rate and temporal input encoding. Mathematical Methods in the Applied Sciences, 2020. V. 43. № 13. P. 7802–7814. http://dx.doi.org/10.1002/mma.6241
  22. Sboev A., Vlasov D., Rybka R., Davydov Y., Serenko A., Demin V. Modeling the dynamics of spiking networks with memristor-based STDP to solve classification tasks. Mathematics, 2021. V. 9. № 24. P. 3237. http://dx.doi.org/10.3390/math9243237
  23. Sutton R.S., Barto A.G. Reinforcement Learning: An Introduction. 2nd ed. The MIT Press, 2018. ISBN 978-0-262-19398-6. P. 329.
  24. Mnih V. et al. Asynchronous methods for deep reinforcement learning. International Conference on Machine Learning. PMLR, 2016. P. 1928–1937.
  25. Frémaux N., Sprekeler H., Gerstner W. Reinforcement learning using a continuous time actor-critic framework with spiking neurons. PLoS Computational Biology, 2013. V. 9. № 4. P. e1003024. https://doi.org/10.1371/journal.pcbi.1003024

Indexing

Scopus, Crossref, Higher Attestation Commission (at the Ministry of Education and Science of the Russian Federation), Scientific Electronic Library