diff --git a/_posts/2024-09-07-robotics-reviews.md b/_posts/2024-09-07-robotics-reviews.md
new file mode 100644
index 0000000..84fedde
--- /dev/null
+++ b/_posts/2024-09-07-robotics-reviews.md
@@ -0,0 +1,49 @@
+---
+title: Embodied AI Paper Reviews
+tags: robotics
+date: 2024-09-07
+categories: [Notes, Robotics, MachineLearning]
+math: true
+mermaid: true
+---
+
+# Imitation Learning
+
+## BC-Z
+
+## Gato
+
+## RoboCat
+
+## RT-1
+
+## PaLM-E
+
+## RT-2
+
+# Offline Reinforcement Learning
+
+## Q-Transformer
+
+## Conservative Q-Learning
+
+## Decision Transformer
+
+Limitations of the original paper:
+
+- It seems hard for the model to generalize to returns it did not see during training.
+- The entire observation vector is converted to a single embedding.
+  - The embedding conversion is just a linear transformation, which feels too weak to project complicated states to the right place.
+  - There is a paper that ran an ablation and found that discretizing the values works better than feeding them in as continuous inputs.
+  - My guess is that each bin gets its own entry in an embedding lookup table, which makes the mapping more nonlinear.
+- At inference time, the latest return-condition prompt is the previous return expectation minus the last reward, i.e. $\hat{R}_{t+1} = \hat{R}_t - r_t$, where $\hat{R}_t$ is the return prompt and $r_t$ the observed reward. This can push the return prompt into a range the model never saw in the training data (a minimal bookkeeping sketch is appended at the end of this post).
+
+  - So in general, the offline dataset basically needs to cover the entire return space for the model to perform well.
+
+- Another thing I noticed in my experiments: the model randomly stitches together different expert demos, but this did not result in a better solution.
+
+# Prompting-based
+
+## Code-as-Policy
+
+## Language to Rewards
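+
+# Appendix: Decision Transformer Return-to-go Sketch
+
+To make the return-prompt drift from the Decision Transformer section concrete, below is a minimal sketch of the inference-time bookkeeping. It is only an illustration under my own assumptions: `env` is assumed to follow the classic Gym `reset`/`step` interface, and `model.predict_action` is a hypothetical placeholder for the return-conditioned policy, not the reference implementation from the paper.
+
+```python
+def rollout(env, model, target_return, max_steps=1000):
+    """Roll out a return-conditioned policy while tracking the return-to-go prompt."""
+    obs = env.reset()
+    return_to_go = target_return  # initial return condition R_0
+    states, actions, rtgs = [obs], [], [return_to_go]
+
+    for _ in range(max_steps):
+        # Hypothetical call: the model conditions on the (return-to-go, state, action)
+        # history and predicts the next action.
+        action = model.predict_action(states, actions, rtgs)
+        obs, reward, done, _ = env.step(action)
+
+        # The update discussed above: R_{t+1} = R_t - r_t. Nothing constrains the new
+        # prompt to the range of returns covered by the offline dataset, so it can
+        # drift out of distribution when the realized rewards differ from the target.
+        return_to_go -= reward
+
+        states.append(obs)
+        actions.append(action)
+        rtgs.append(return_to_go)
+        if done:
+            break
+
+    return states, actions, rtgs
+```
+
+The sketch keeps the whole trajectory in plain Python lists; a real implementation would truncate the history to the model's context length before each forward pass.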