MOMDP representation? #67

Open
Hororohoruru opened this issue Apr 4, 2024 · 5 comments

Comments

@Hororohoruru
Contributor

Hello,

I would like to know whether it is currently possible to create a problem with fully observable state variables and solve it with SARSOP using a .pomdpx file.

@zkytony
Collaborator

zkytony commented Apr 4, 2024

Yes, it is possible. Check out this documentation page.

Regarding fully observable state variables, you can achieve that by having an observation model that simply returns the state.

@Hororohoruru
Contributor Author

Hororohoruru commented Apr 5, 2024

Sorry, I did not formulate my question well. What I meant to ask is how I should write my code so that, when converted to .pomdpx, it represents some state variables as fully observable.

Right now I am working with a model that has access to the time elapsed since the beginning of the operation and has a maximum number of time steps in which to act (finite horizon). My approach is to create states that, on top of their ID (either an int or 'term' for the terminal state), also have a property t:

class TDState(pomdp_py.State):
    def __init__(self, state_id, time_step):
        self.id = state_id
        self.t = time_step
        self.name = f"s_{state_id}-t_{time_step}"

The methods for __hash__, __eq__, __str__ and __repr__ are defined similarly to the Tiger example in the documentation. However, observations only have an id property. When the observation depends on t, I retrieve it directly from the TDState object in my ObservationModel:

class TDObservationModel(pomdp_py.ObservationModel):
    def __init__(self, conf_matrix):
        # conf_matrix is indexed as [time_step][state_id][observation_id]
        self.observation_matrix = conf_matrix
        self.n_steps, self.n_states, self.n_obs = self.observation_matrix.shape

    def probability(self, observation, next_state, action):
        obs_idx = observation.id
        state_idx = next_state.id
        state_step = next_state.t

        # Probability of this observation given the state ID at this time step
        return self.observation_matrix[state_step][state_idx][obs_idx]

The transition model includes the parameter t_max in order to generate a list of all possible states considering the maximum t. Explaining in Tiger terms, Action('listen') uses the ObservationModel to provide an observation, and the state transitions deterministically such that T(s, a_listen, s') = 1 if s.id == s'.id and s'.t == s.t + 1. If s.t == t_max, the state transitions to the terminal state no matter which action is selected (and provides the corresponding terminal observation deterministically). If any action other than listen is selected, the model also transitions to the terminal state. As in the Tiger problem, the state stays the same (here, s.id), but the time advances until a horizon t_max.
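
The transition rule described above can be sketched in plain Python. This is only an illustration, not the code from the issue: the function name `transition_prob`, the `(id, t)` tuple encoding, the `TERM` constant, and the choice to treat the terminal state as absorbing are all assumptions made for the example. The argument order mirrors the `probability(next_state, state, action)` convention of a pomdp_py transition model.

```python
TERM = ("term", None)  # hypothetical encoding of the terminal state


def transition_prob(next_state, state, action, t_max):
    """Return T(s, a, s') for the time-dependent, Tiger-like model."""
    # Assumption: once terminal, the model stays terminal.
    if state == TERM:
        return 1.0 if next_state == TERM else 0.0

    state_id, t = state
    # Any action other than "listen", or reaching the horizon, ends the episode.
    if action != "listen" or t == t_max:
        return 1.0 if next_state == TERM else 0.0

    # "listen": the id is unchanged and the time step advances by exactly one.
    return 1.0 if next_state == (state_id, t + 1) else 0.0
```

For example, with `t_max = 7`, listening in state `(2, 3)` leads to `(2, 4)` with probability 1, while listening in `(2, 7)` leads to `TERM`.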

I would like the time to be fully observable in the produced .pomdpx file, but since you commented:

> Regarding fully observable state variables, you can achieve that by having an observation model that simply returns the state.

I think the way I am handling it would not produce a MOMDP representation. How should I do it instead?

@Hororohoruru
Contributor Author

Follow-up: I tried converting my current problem definition to .pomdpx, and the resulting file contains only one state variable, whose number of values equals the number of possible state IDs times the number of possible values of t. With 5 targets and 8 time steps, I get 41 states for a single state variable (the extra state is the terminal state).
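
The count of 41 is what one would expect when both variables are flattened into a single one: every (ID, time) pair becomes its own state, plus one terminal state. A quick check, assuming IDs 0 to 4 and time steps 0 to 7 (the exact labels are not given in the issue) and reusing the `s_{id}-t_{t}` naming from the state class above:

```python
from itertools import product

n_targets, n_steps = 5, 8

# One flat state per (id, time step) pair, named like TDState.name.
flat_states = [
    f"s_{i}-t_{t}" for i, t in product(range(n_targets), range(n_steps))
]
flat_states.append("term")  # the single terminal state

print(len(flat_states))  # 5 * 8 + 1 = 41
```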

I would like to know how to define my model to have a state variable with 5 values (ID), which is not fully observable, and another state variable with 8 values (time), which would be fully observable.

@zkytony
Collaborator

zkytony commented Apr 9, 2024

I will provide a sketch for the idea.

class State(pomdp_py.SimpleState):
    def __init__(self, target, time_step):
        # data is a (target, time_step) tuple
        super().__init__(data=(target, time_step))

class ObservationModel(pomdp_py.ObservationModel):
    def sample(self, next_state, action):
        # Expose only the time step; the target stays hidden
        time_step = next_state.data[1]
        return pomdp_py.SimpleObservation(data=time_step)

This makes time_step observable, but not the target.
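
A solver or file converter that enumerates the model would typically also need the observation probability, not just `sample`. Below is a minimal, library-free sketch of the same idea; `TimedState`, `TimeObservation`, and the free functions are hypothetical stand-ins for the pomdp_py classes above, used so the snippet runs without the library installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedState:
    target: int     # hidden part of the state
    time_step: int  # part the agent should observe exactly


@dataclass(frozen=True)
class TimeObservation:
    time_step: int


def sample(next_state, action=None):
    """Deterministically emit the time step of the next state."""
    return TimeObservation(next_state.time_step)


def probability(observation, next_state, action=None):
    """O(o | s', a): 1 if the observation equals the true time step, else 0."""
    return 1.0 if observation.time_step == next_state.time_step else 0.0
```

Two states that differ only in `target` yield the same observation, which is why the target remains unobserved under this model.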

@Hororohoruru
Contributor Author

Hororohoruru commented Apr 10, 2024

That makes it clear, thank you! I imagine the ObservationModel needs to know t_max in order to create a list of all the possible observations.

What I mean is that the target needs to be part of the observation as well.
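
One possible reading of the two points above is that each observation pairs a noisy target reading with the exact time step, so the observation model needs t_max to enumerate them. A sketch under that assumption follows; the function name `all_observations`, the 0-based time steps, and the tuple encoding are illustrative, and the terminal observation mentioned earlier in the thread is left out.

```python
from itertools import product


def all_observations(n_obs, t_max):
    """List every (target_reading, time_step) pair for time steps 0..t_max."""
    return list(product(range(n_obs), range(t_max + 1)))


obs = all_observations(n_obs=5, t_max=7)
print(len(obs))  # 5 readings * 8 time steps = 40
```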

2 participants