Danny's tech notebook | 丹尼技術手札

A personal technical notebook (天道酬勤: "Heaven rewards the diligent")

Monday, December 17, 2018

[Reinforcement Learning] Get started to learn Sarsa(lambda λ) for reinforcement learning

Once you understand the Sarsa algorithm, you can move on to the Sarsa(λ) algorithm.
I mainly refer to these tutorials (written in Chinese):
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/3-3-A-sarsa-lambda/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/3-3-tabular-sarsa-lambda/
https://zhuanlan.zhihu.com/p/28108498

The Sarsa(λ) algorithm looks like this:
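As a rough illustration of the update rule, here is a minimal tabular sketch in plain Python. It is not the code from the tutorials above: the function names, the constants and the two-step toy episode are assumptions made for this example, and it uses replacing eligibility traces (the row of the visited state is cleared and the taken action is set to 1).

```python
ALPHA = 0.1     # learning rate (assumed value)
GAMMA = 0.9     # discount factor (assumed value)
LAMBDA = 0.9    # trace-decay rate; 0 reduces this to one-step Sarsa


def make_table(n_states, n_actions):
    """Create an n_states x n_actions table filled with zeros."""
    return [[0.0] * n_actions for _ in range(n_states)]


def sarsa_lambda_update(q, trace, s, a, reward, s_next, a_next, done):
    """Apply one Sarsa(lambda) step to q and trace in place."""
    # On-policy TD error: bootstrap from the action actually chosen next.
    target = reward if done else reward + GAMMA * q[s_next][a_next]
    td_error = target - q[s][a]

    # Replacing trace: clear the row of the visited state, mark the taken action.
    trace[s] = [0.0] * len(trace[s])
    trace[s][a] = 1.0

    # Every recently visited pair shares the TD error, then its trace decays.
    for state in range(len(q)):
        for action in range(len(q[state])):
            q[state][action] += ALPHA * td_error * trace[state][action]
            trace[state][action] *= GAMMA * LAMBDA


# Hypothetical two-step episode: state 0 -> state 1 -> goal,
# with a reward of 1.0 only on the last step.
q = make_table(n_states=2, n_actions=2)
trace = make_table(n_states=2, n_actions=2)   # reset at the start of each episode
sarsa_lambda_update(q, trace, s=0, a=1, reward=0.0, s_next=1, a_next=1, done=False)
sarsa_lambda_update(q, trace, s=1, a=1, reward=1.0, s_next=1, a_next=1, done=True)
print(q)
```

After the second call, the pair (state 0, action 1) has also received part of the final reward, scaled by its decayed trace. With one-step Sarsa that pair would only start to learn in a later episode.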


Labels: Reinforcement Learning, Tensorflow

Friday, December 14, 2018

[Reinforcement Learning] Get started to learn Sarsa for reinforcement learning

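If you take a look at the Sarsa algorithm, you will find that it is very similar to Q-Learning. The main difference is that Sarsa is on-policy: it updates Q(s, a) with the value of the action it actually takes next, whereas Q-Learning uses the maximum value over the next actions.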
For my previous post about Q-Learning, please refer to this link:
https://danny270degree.blogspot.com/2018/11/reinforcement-learning-get-started-to_21.html

Here is the Sarsa algorithm:
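For readers who prefer code, here is a minimal tabular sketch of Sarsa in plain Python. The 6-cell corridor environment, the constants and all function names are assumptions made for this illustration and are not taken from the referenced posts.

```python
import random

N_STATES = 6                 # hypothetical corridor: cells 0..5, goal in cell 5
ACTIONS = (-1, +1)           # index 0 = left, index 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1


def step(state, action_index):
    """Move in the corridor; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, rng):
    """Epsilon-greedy choice, breaking ties between equal values at random."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q[state])
    return rng.choice([i for i, value in enumerate(q[state]) if value == best])


def sarsa_update(q, s, a, reward, s_next, a_next, done):
    """Sarsa target uses Q(s', a') for the action that will really be taken."""
    target = reward if done else reward + GAMMA * q[s_next][a_next]
    q[s][a] += ALPHA * (target - q[s][a])


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        action = choose_action(q, state, rng)
        while not done:
            next_state, reward, done = step(state, action)
            # The next action is chosen BEFORE the update and then executed.
            next_action = choose_action(q, next_state, rng)
            sarsa_update(q, state, action, reward, next_state, next_action, done)
            state, action = next_state, next_action
    return q


q_table = train()
for state, values in enumerate(q_table):
    print(state, [round(v, 3) for v in values])
```

The name S-A-R-S-A comes from the five values used in each update: state, action, reward, next state and next action.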


Labels: Reinforcement Learning

Thursday, December 13, 2018

[Reinforcement Learning] Using dynamic programming to solve a simple 4x4 GridWorld

I borrowed the example and its source code from here. It uses dynamic programming to solve a simple 4x4 GridWorld, and I added my own explanation of how the value function is calculated. I hope this helps you understand dynamic programming and the Markov Reward Process (MRP) more quickly.
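To make the value-function calculation concrete, here is a minimal sketch of iterative policy evaluation in plain Python. Since the borrowed source is not reproduced here, the setup is an assumption: the classic textbook layout with terminal states in the top-left and bottom-right corners, a reward of -1 per move, no discounting, and a uniform random policy. Under a fixed policy the problem behaves like an MRP, which is why a single expectation backup per state is enough.

```python
GRID = 4
TERMINALS = {0, GRID * GRID - 1}              # assumed: two opposite corners
MOVES = ((-1, 0), (1, 0), (0, -1), (0, 1))    # up, down, left, right
REWARD, GAMMA, THETA = -1.0, 1.0, 1e-6        # assumed reward, discount, tolerance


def next_state(state, move):
    """Deterministic transition; bumping into a wall leaves the state unchanged."""
    row, col = divmod(state, GRID)
    new_row, new_col = row + move[0], col + move[1]
    if 0 <= new_row < GRID and 0 <= new_col < GRID:
        return new_row * GRID + new_col
    return state


def evaluate_random_policy():
    """Iterative policy evaluation for the uniform random policy."""
    values = [0.0] * (GRID * GRID)
    while True:
        new_values = values[:]
        for state in range(GRID * GRID):
            if state in TERMINALS:
                continue
            # Bellman expectation backup: average over four equally likely moves.
            new_values[state] = sum(
                0.25 * (REWARD + GAMMA * values[next_state(state, move)])
                for move in MOVES
            )
        delta = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if delta < THETA:
            return values


values = evaluate_random_policy()
for row in range(GRID):
    print([round(v, 1) for v in values[row * GRID:(row + 1) * GRID]])
```

Under these assumptions the top row converges to roughly 0, -14, -20 and -22. Each value is the negative of the expected number of random moves needed to reach a terminal corner.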


Labels: Reinforcement Learning

Wednesday, November 28, 2018

[Reinforcement Learning] Get started to learn DQN for reinforcement learning

The previous post about Q-Learning is here:
[Reinforcement Learning] Get started to learn Q-Learning for reinforcement learning

Basically, Deep Q-Learning (DQN) is an upgrade of the Q-Learning algorithm in which the Q-table is replaced by a neural network. For the DQN tutorial, I refer to the following (sorry, they are written in Chinese):
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/4-1-A-DQN/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/4-1-DQN1/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/4-2-DQN2/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/4-3-DQN3/
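To show what "replacing the Q-table with a neural network" means in practice, here is a small sketch in plain Python. It only covers the forward pass, the replay memory and the TD target computed from a separate target network. The gradient step that trains the network is deliberately left out, and the class name, layer sizes and sample transitions are hypothetical rather than taken from the tutorials.

```python
import random
from collections import deque


class TinyQNetwork:
    """One-hidden-layer network mapping a state vector to one Q-value per action."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_actions)]

    def q_values(self, state):
        """Forward pass: ReLU hidden layer, then a linear output per action."""
        hidden = [max(0.0, sum(w * x for w, x in zip(row, state)))
                  for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]

    def copy_from(self, other):
        """Synchronise weights, as done periodically for the target network."""
        self.w1 = [row[:] for row in other.w1]
        self.w2 = [row[:] for row in other.w2]


def td_target(reward, next_state, done, target_net, gamma=0.9):
    """Q-Learning target, with max Q(s', a') read from the frozen target network."""
    if done:
        return reward
    return reward + gamma * max(target_net.q_values(next_state))


eval_net = TinyQNetwork(n_inputs=4, n_hidden=8, n_actions=2, seed=1)
target_net = TinyQNetwork(n_inputs=4, n_hidden=8, n_actions=2, seed=2)
target_net.copy_from(eval_net)

# Replay memory of (state, action, reward, next_state, done) transitions.
memory = deque(maxlen=1000)
memory.append(([1.0, 0.0, 0.0, 0.0], 1, 0.0, [0.0, 1.0, 0.0, 0.0], False))
memory.append(([0.0, 1.0, 0.0, 0.0], 1, 1.0, [0.0, 0.0, 1.0, 0.0], True))

# Instead of q_table[state][action], the value is read from the network output.
for state, action, reward, next_state, done in random.sample(list(memory), k=2):
    predicted = eval_net.q_values(state)[action]
    target = td_target(reward, next_state, done, target_net)
    print(round(predicted, 4), round(target, 4))
```

A full DQN would minimise the squared difference between `predicted` and `target` over each sampled batch, which is what a framework such as TensorFlow takes care of.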


Labels: Reinforcement Learning, Tensorflow

Thursday, November 22, 2018

[Reinforcement Learning] Get started to learn Q-Learning for reinforcement learning

The previous post about reinforcement learning:
[Reinforcement Learning] Get started to learn gradient method for reinforcement learning

For the Q-Learning tutorial, I refer to the following (sorry, they are written in Chinese):
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/2-2-A-q-learning/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/2-2-tabular-q1/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/2-3-tabular-q2/
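As a quick companion to those tutorials, here is a minimal tabular Q-Learning sketch in plain Python. The 6-cell corridor, the constants and the function names are assumptions chosen for this illustration, so treat it as a sketch rather than the tutorials' implementation.

```python
import random

N_STATES = 6                 # hypothetical corridor: cells 0..5, goal in cell 5
ACTIONS = (-1, +1)           # index 0 = left, index 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1


def step(state, action_index):
    """Move in the corridor; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, rng):
    """Epsilon-greedy choice, breaking ties between equal values at random."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q[state])
    return rng.choice([i for i, value in enumerate(q[state]) if value == best])


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Off-policy target: best next value, whatever action is taken next.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = train()
greedy = ["left" if values[0] > values[1] else "right" for values in q_table[:-1]]
print(greedy)
```

In this toy corridor the greedy policy read from the table moves right in every non-terminal cell, because moving left only delays the reward.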


Labels: Reinforcement Learning, Tensorflow

Wednesday, November 21, 2018

[Reinforcement Learning] Get started to learn policy gradient method for reinforcement learning

This post records my first attempt at learning the policy gradient method for reinforcement learning. There is already a lot of material on the internet, but this time I only want to focus on the following tutorial (sorry, it is written in Chinese):
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/5-1-policy-gradient-softmax1/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/5-2-policy-gradient-softmax2/
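To give a feel for the idea before reading the tutorial, here is a deliberately tiny REINFORCE sketch with a softmax policy in plain Python. The two-action, one-step task and all names are assumptions made for this example. The tutorial itself works with a neural network policy, which this sketch replaces with one preference value per action.

```python
import math
import random

REWARDS = (0.0, 1.0)     # hypothetical task: action 1 is the better one
LEARNING_RATE = 0.1


def softmax(preferences):
    """Turn action preferences into probabilities (shifted for stability)."""
    top = max(preferences)
    exps = [math.exp(p - top) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]                     # one preference per action
    for _ in range(episodes):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        episode_return = REWARDS[action]   # one-step episode: return == reward
        # REINFORCE: theta += lr * return * grad log pi(action).
        # For a softmax policy, d log pi(action) / d theta_i = 1[i == action] - pi_i.
        for i in range(len(theta)):
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            theta[i] += LEARNING_RATE * episode_return * grad_log_pi
    return softmax(theta)


final_probs = train()
print([round(p, 3) for p in final_probs])
```

Unlike Q-Learning, nothing here estimates action values. The policy parameters are pushed directly toward actions that were followed by a high return, so the probability of the better action grows over time.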


Labels: Reinforcement Learning, Tensorflow

Thursday, November 15, 2018

[RNN] What are the differences in input and output tensor shapes between dynamic_rnn and static_rnn in TensorFlow

When studying RNNs, the first issue I ran into in my program was the shape of the input and output tensors. Shape is very important information when connecting layers. Here I directly point out the differences in input/output shape between the static RNN and the dynamic RNN.
P.S. If you use Keras to write your RNN model, you won't need to deal with these details.
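In TensorFlow 1.x, `tf.nn.static_rnn` takes a Python list with one `[batch_size, input_size]` tensor per time step and returns a list of per-step outputs. `tf.nn.dynamic_rnn`, with its default `time_major=False`, takes a single `[batch_size, max_time, input_size]` tensor and returns a single `[batch_size, max_time, num_units]` tensor. The sketch below mimics only the shape conversion with nested Python lists, roughly what `tf.unstack(x, axis=1)` and `tf.stack(outputs, axis=1)` do. The helper names and sizes are hypothetical, and no RNN is actually run.

```python
def shape(nested):
    """Return the shape of a regular nested list, e.g. (batch, time, features)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def unstack_time(batch_major):
    """[batch, time, features] -> list of `time` items shaped [batch, features]."""
    n_steps = len(batch_major[0])
    return [[sequence[t] for sequence in batch_major] for t in range(n_steps)]


def stack_time(per_step):
    """List of `time` items shaped [batch, units] -> [batch, time, units]."""
    batch_size = len(per_step[0])
    return [[step[b] for step in per_step] for b in range(batch_size)]


BATCH, TIME, FEATURES = 2, 3, 4

# Layout expected by dynamic_rnn: one block shaped [batch, time, features].
dynamic_input = [[[float(b * 100 + t * 10 + f) for f in range(FEATURES)]
                  for t in range(TIME)] for b in range(BATCH)]

# Layout expected by static_rnn: a list with one [batch, features] item per step.
static_input = unstack_time(dynamic_input)

print(shape(dynamic_input))                        # (2, 3, 4)
print(len(static_input), shape(static_input[0]))   # 3 (2, 4)
print(stack_time(static_input) == dynamic_input)   # True
```

The same conversion applies to the outputs: the per-step list returned by the static version can be stacked along the time axis to obtain the layout of the dynamic version.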


Labels: RNN, Tensorflow

Tuesday, November 13, 2018

[TensorFlow] The explanation of average gradients by example in data parallelism

When studying examples of training a model on multiple GPUs (with data parallelism), you will almost always find an average-gradients function in one form or another. Here is a simple version:
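The sketch below reproduces the logic of such a function in plain Python so that it can be run without a GPU. It follows the structure commonly seen in TensorFlow multi-GPU examples, where each tower (GPU) contributes a list of (gradient, variable) pairs. Here the gradients are plain lists of floats and the variables are just names, which is an assumption made for illustration.

```python
def average_gradients(tower_grads):
    """Average gradients across towers.

    tower_grads: one list per tower, each holding (gradient, variable) pairs
    in the same variable order.
    Returns a single list of (averaged_gradient, variable) pairs.
    """
    averaged = []
    # zip(*tower_grads) regroups the pairs per variable:
    # ((grad_gpu0, var), (grad_gpu1, var), ...)
    for grad_and_vars in zip(*tower_grads):
        grads = [grad for grad, _ in grad_and_vars]
        n_towers = len(grads)
        # Element-wise mean over the tower dimension.
        mean_grad = [sum(parts) / n_towers for parts in zip(*grads)]
        # Variables are shared across towers, so the first tower's one is enough.
        variable = grad_and_vars[0][1]
        averaged.append((mean_grad, variable))
    return averaged


# Two hypothetical towers, each with gradients for variables "w" and "b".
tower_grads = [
    [([1.0, 2.0], "w"), ([0.5], "b")],   # gradients computed on GPU 0
    [([3.0, 4.0], "w"), ([1.5], "b")],   # gradients computed on GPU 1
]
print(average_gradients(tower_grads))
# [([2.0, 3.0], 'w'), ([1.0], 'b')]
```

Because every tower processes a different slice of the batch with the same weights, applying the averaged gradients once is equivalent to one update computed on the combined batch, provided all towers see equally sized slices.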


Labels: Tensorflow


About Me

TeYen (Danny)
