To provide indoor position tracking using a set of WiFi signals and Machine Learning.
User has a set of wifi signal strengths (RSSI) and would like to know in what room he is in.
- input: a set of wifi signals (ex: [-73, -32, -67, ...])
- output: a single value that will be mapped to a label (ex: 3, 3='Living Room')
- Data Collection
- Collect training data using a sensor (Android Device)
- Data Cleaning
- Filter the training data to include only a subset of stable or frequent features (only use the access points that are consistently there)
- Split up the data into two sets:
- train data used for training
- validation data to test after training
- Training
- Use the filtered training data to train a Neural Network (using TensorFlow) model
- Testing
- Test the trained model with a validation set collected at the time of training
- Test the model in real world live data (Using an Android app)
- Notes
I created an app(AndroidWifiDataCollector) to collect wifi data. It captures the BSSID to use as a unique identifier,the level (RSSI) and a label (int)
In this phase we are collecting training samples the label must be specified by someone (Supervised Learning!).
User Interaction:
- Start Button
- Start collecting data using the label specified in the Spinner
- Stop Button
- Stop collecting data
- Label Spinner
- label corresponding to a number that will be used in the training data gathered
There is no in-app filtering of the source of data by design. There are many access points that only show up in a few training examples but instead of filtering in-app and losing that data. I wanted to save it in case I wanted to work on a separate machine learning project that would look into working with an input feature set that has missing data (0's for some inputs).
Here is an example of the training data that would come from the app. (Not real data)
+---------------+---------------------------------+
| Training Rows | Access Points |
+ +---------------------------------+
| | A1 | A2 | A3 | A4 | A5 | A6 |
+---------------+-----+-----+-----+-----+----+----+
| Row 1 | -98 | -54 | 0 | -6 | -9 | -5 |
+---------------+-----+-----+-----+-----+----+----+
| Row 2 | -45 | -64 | -45 | -8 | 0 | 0 |
+---------------+-----+-----+-----+-----+----+----+
| Row 3 | -67 | -78 | -34 | -26 | -8 | 0 |
+---------------+-----+-----+-----+-----+----+----+
We want to only work with a smaller subset of access points that show up always or very frequently. So in the above example we would only select A1, A2, A4 because they had values in all the training samples (3 rows in example
This filtering will be done in the next step .
The goal in this section is to end up with 2 sets and 4 files.
Sets:
- Training data (80% of the data collected)
- features: train_data_features
- labels: train_data_labels
- Test Data (20%)
- features: test_data_features
- labels: test_data_labels
+---------------+---------------------------------+
| Training Rows | Access Points |
+ +---------------------------------+
| | A1 | A2 | A3 | A4 | A5 | A6 |
+---------------+-----+-----+-----+-----+----+----+
| Row 1 | -98 | -54 | 0 | -6 | -9 | -5 |
+---------------+-----+-----+-----+-----+----+----+
| Row 2 | -45 | -64 | -45 | -8 | 0 | 0 |
+---------------+-----+-----+-----+-----+----+----+
| Row 3 | -67 | -78 | -34 | -26 | -8 | 0 |
+---------------+-----+-----+-----+-----+----+----+
The data collected in the last phase collected a bunch of wifi strengths from some access points that are not always there (0 values).
I decided to handpick the the access points based on the number of times the access points shows up in the data.
I created a map that mapped the access point and the number of times they showed up:
A1: 900
A2: 850
A3: 450
A4: 800
In the example above I would select A1, A2, A4 because they show up the most.
After handpicking the access points I filtered out the data that did not have a value for each.
Sample Output:
-56, -23, -89
-46, -25, -68
1, 0, 0
0, 1, 0
planning on testing different architectures (svm, deep/hidden layers)
The base model is a simple logistic regression model which uses cross entropy as the cost function, gradient descent as its optimizer and softmax for multi class classification. Trained using stochastic batching.
- Total examples: 40,109
- Train: 32,088
- Test: 8,021
- Training epochs: 25,000
- accuracy: 97%