- Initialize an empty record whose values are all np.nan
- Insert the empty record into the existing DataFrame at each missing index (time slot)
- Within the DataFrame, fill each np.nan with the value from the previous record (forward fill, method='ffill')
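Note: Before filling in missing values with forward fill, sort the DataFrame by its index. New rows are appended at the end of the DataFrame, and forward fill copies values from the row physically above, so an unsorted DataFrame would fill each slot from the wrong record.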
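The three steps can be packaged as one helper. This is a minimal sketch, not code from the project: `fill_missing_slots` is a hypothetical name (it is not part of the `transform` module), and it assumes the DataFrame is indexed by integer time slots. The toy data reuses slots 45 and 67 from the table further down.

```python
import numpy as np
import pandas as pd


def fill_missing_slots(df, slots):
    """Hypothetical helper: insert an all-NaN row for every slot in `slots`
    that is not already in df.index, sort by index, then forward-fill."""
    out = df.copy()  # leave the caller's DataFrame untouched
    empty = pd.Series(np.nan, index=out.columns)  # step 1: empty record
    for slot in slots:
        if slot not in out.index:
            out.loc[slot] = empty  # step 2: setting with enlargement appends the row
    out = out.sort_index()  # appended rows go to the end, so restore slot order
    return out.ffill()  # step 3: copy values down from the previous record


# Toy data shaped like weatherSlotDf (slots 45 and 67)
df = pd.DataFrame(
    {"Weather": [1, 1], "temperature": [-2.0, 3.0], "PM25": [60, 65]},
    index=pd.Index([45, 67], name="time_slot"),
)
filled = fill_missing_slots(df, [46, 58])
print(filled)
```

Working on a copy keeps the helper free of side effects, unlike the in-place notebook version below.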
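```python
import pandas as pd
import numpy as np
# project-local helpers
from transform import transform_trafficData, transform_weatherData, get_y, create_matrix
# weather records for 2016-01-26 from the testing folder, indexed by time_slot
weatherSlotDf = transform_weatherData('2016-01-26', folder='testing')
```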
The records for time slots range(46, 146, 12), i.e. slots 46, 58, 70, ..., 142, are missing.
We first append empty records for these slots and then fill in each one's values from its previous record.
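Which target slots are absent can be checked with `Index.difference`. A small sketch: the `present` values are the time_slot labels copied from the table below, and `wanted` assumes the target slots are the same range(46, 146, 12) that the insertion loop further down uses.

```python
import pandas as pd

# time_slot labels present in weatherSlotDf for 2016-01-26 (from the table below)
present = pd.Index([43, 44, 45, 67, 69, 79, 80, 81, 91, 92, 93, 104, 105,
                    115, 116, 117, 127, 128, 139, 140, 141], name="time_slot")

# Slots a weather record is needed for: 46, 58, ..., 142
wanted = pd.Index(range(46, 146, 12), name="time_slot")

# difference() returns the sorted labels of `wanted` that do not occur in `present`
missing = wanted.difference(present)
print(missing.tolist())
```

Every target slot turns out to be missing, so each one needs an inserted record.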
```python
weatherSlotDf
```
time_slot | date | slot | Weather | temperature | PM25 |
---|---|---|---|---|---|
43 | 2016-01-26 | 43 | 1 | -2.0 | 60 |
44 | 2016-01-26 | 44 | 1 | -2.0 | 60 |
45 | 2016-01-26 | 45 | 1 | -2.0 | 60 |
67 | 2016-01-26 | 67 | 1 | 3.0 | 65 |
69 | 2016-01-26 | 69 | 1 | 3.0 | 65 |
79 | 2016-01-26 | 79 | 1 | 5.0 | 66 |
80 | 2016-01-26 | 80 | 1 | 5.0 | 66 |
81 | 2016-01-26 | 81 | 1 | 5.0 | 66 |
91 | 2016-01-26 | 91 | 1 | 7.0 | 59 |
92 | 2016-01-26 | 92 | 1 | 7.0 | 59 |
93 | 2016-01-26 | 93 | 1 | 7.0 | 59 |
104 | 2016-01-26 | 104 | 1 | 6.0 | 58 |
105 | 2016-01-26 | 105 | 1 | 6.0 | 58 |
115 | 2016-01-26 | 115 | 2 | 5.0 | 65 |
116 | 2016-01-26 | 116 | 2 | 5.0 | 65 |
117 | 2016-01-26 | 117 | 2 | 4.0 | 65 |
127 | 2016-01-26 | 127 | 9 | 4.0 | 89 |
128 | 2016-01-26 | 128 | 9 | 4.0 | 89 |
139 | 2016-01-26 | 139 | 3 | 3.0 | 101 |
140 | 2016-01-26 | 140 | 3 | 4.0 | 101 |
141 | 2016-01-26 | 141 | 3 | 4.0 | 101 |
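```python
# Initialize an empty record: one np.nan per column of weatherSlotDf.
# A fresh Series avoids writing into a row taken from weatherSlotDf.iloc[0],
# which is what raises pandas' SettingWithCopyWarning.
rowTemp = pd.Series(np.nan, index=weatherSlotDf.columns)

# Insert the empty record at every missing slot: 46, 58, ..., 142.
# .loc on a new label (setting with enlargement) appends the row at the end.
for i in range(46, 146, 12):
    if i not in weatherSlotDf.index:
        weatherSlotDf.loc[i] = rowTemp

# Put the new rows back in slot order before forward-filling
weatherSlotDf.sort_index(inplace=True)
```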
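An alternative to the loop plus sort_index is `DataFrame.reindex` on the union of the existing index and the target slots. `Index.union` returns a sorted index by default, so the new all-NaN rows land in slot order in a single step. A sketch on toy data copied from slots 45, 67 and 69 of the table above; the names `new_slots`, `expanded` and `filled` are illustrative.

```python
import pandas as pd

# Toy stand-in for weatherSlotDf: three rows indexed by time_slot
df = pd.DataFrame(
    {"Weather": [1, 1, 1], "temperature": [-2.0, 3.0, 3.0], "PM25": [60, 65, 65]},
    index=pd.Index([45, 67, 69], name="time_slot"),
)

# Give the new labels the same index name so union() keeps "time_slot"
new_slots = pd.Index(range(46, 146, 12), name=df.index.name)

# union() merges and sorts the labels; reindex() adds an all-NaN row per new label
expanded = df.reindex(df.index.union(new_slots))
filled = expanded.ffill()
print(filled.head())
```

Unlike the loop, reindex never overwrites an existing row, so no membership check is needed.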
```python
weatherSlotDf
```
time_slot | date | slot | Weather | temperature | PM25 |
---|---|---|---|---|---|
43 | 2016-01-26 | 43.0 | 1.0 | -2.0 | 60.0 |
44 | 2016-01-26 | 44.0 | 1.0 | -2.0 | 60.0 |
45 | 2016-01-26 | 45.0 | 1.0 | -2.0 | 60.0 |
46 | NaN | NaN | NaN | NaN | NaN |
58 | NaN | NaN | NaN | NaN | NaN |
67 | 2016-01-26 | 67.0 | 1.0 | 3.0 | 65.0 |
69 | 2016-01-26 | 69.0 | 1.0 | 3.0 | 65.0 |
70 | NaN | NaN | NaN | NaN | NaN |
79 | 2016-01-26 | 79.0 | 1.0 | 5.0 | 66.0 |
80 | 2016-01-26 | 80.0 | 1.0 | 5.0 | 66.0 |
81 | 2016-01-26 | 81.0 | 1.0 | 5.0 | 66.0 |
82 | NaN | NaN | NaN | NaN | NaN |
91 | 2016-01-26 | 91.0 | 1.0 | 7.0 | 59.0 |
92 | 2016-01-26 | 92.0 | 1.0 | 7.0 | 59.0 |
93 | 2016-01-26 | 93.0 | 1.0 | 7.0 | 59.0 |
94 | NaN | NaN | NaN | NaN | NaN |
104 | 2016-01-26 | 104.0 | 1.0 | 6.0 | 58.0 |
105 | 2016-01-26 | 105.0 | 1.0 | 6.0 | 58.0 |
106 | NaN | NaN | NaN | NaN | NaN |
115 | 2016-01-26 | 115.0 | 2.0 | 5.0 | 65.0 |
116 | 2016-01-26 | 116.0 | 2.0 | 5.0 | 65.0 |
117 | 2016-01-26 | 117.0 | 2.0 | 4.0 | 65.0 |
118 | NaN | NaN | NaN | NaN | NaN |
127 | 2016-01-26 | 127.0 | 9.0 | 4.0 | 89.0 |
128 | 2016-01-26 | 128.0 | 9.0 | 4.0 | 89.0 |
130 | NaN | NaN | NaN | NaN | NaN |
139 | 2016-01-26 | 139.0 | 3.0 | 3.0 | 101.0 |
140 | 2016-01-26 | 140.0 | 3.0 | 4.0 | 101.0 |
141 | 2016-01-26 | 141.0 | 3.0 | 4.0 | 101.0 |
142 | NaN | NaN | NaN | NaN | NaN |
Done! The missing slots are now in place as NaN rows.
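```python
# Forward fill: each NaN takes the value from the nearest earlier row.
# .ffill() is equivalent to fillna(method='ffill'); the method argument is deprecated in recent pandas.
weatherSlotDf = weatherSlotDf.ffill()
weatherSlotDf
```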
time_slot | date | slot | Weather | temperature | PM25 |
---|---|---|---|---|---|
43 | 2016-01-26 | 43.0 | 1.0 | -2.0 | 60.0 |
44 | 2016-01-26 | 44.0 | 1.0 | -2.0 | 60.0 |
45 | 2016-01-26 | 45.0 | 1.0 | -2.0 | 60.0 |
46 | 2016-01-26 | 45.0 | 1.0 | -2.0 | 60.0 |
58 | 2016-01-26 | 45.0 | 1.0 | -2.0 | 60.0 |
67 | 2016-01-26 | 67.0 | 1.0 | 3.0 | 65.0 |
69 | 2016-01-26 | 69.0 | 1.0 | 3.0 | 65.0 |
70 | 2016-01-26 | 69.0 | 1.0 | 3.0 | 65.0 |
79 | 2016-01-26 | 79.0 | 1.0 | 5.0 | 66.0 |
80 | 2016-01-26 | 80.0 | 1.0 | 5.0 | 66.0 |
81 | 2016-01-26 | 81.0 | 1.0 | 5.0 | 66.0 |
82 | 2016-01-26 | 81.0 | 1.0 | 5.0 | 66.0 |
91 | 2016-01-26 | 91.0 | 1.0 | 7.0 | 59.0 |
92 | 2016-01-26 | 92.0 | 1.0 | 7.0 | 59.0 |
93 | 2016-01-26 | 93.0 | 1.0 | 7.0 | 59.0 |
94 | 2016-01-26 | 93.0 | 1.0 | 7.0 | 59.0 |
104 | 2016-01-26 | 104.0 | 1.0 | 6.0 | 58.0 |
105 | 2016-01-26 | 105.0 | 1.0 | 6.0 | 58.0 |
106 | 2016-01-26 | 105.0 | 1.0 | 6.0 | 58.0 |
115 | 2016-01-26 | 115.0 | 2.0 | 5.0 | 65.0 |
116 | 2016-01-26 | 116.0 | 2.0 | 5.0 | 65.0 |
117 | 2016-01-26 | 117.0 | 2.0 | 4.0 | 65.0 |
118 | 2016-01-26 | 117.0 | 2.0 | 4.0 | 65.0 |
127 | 2016-01-26 | 127.0 | 9.0 | 4.0 | 89.0 |
128 | 2016-01-26 | 128.0 | 9.0 | 4.0 | 89.0 |
130 | 2016-01-26 | 128.0 | 9.0 | 4.0 | 89.0 |
139 | 2016-01-26 | 139.0 | 3.0 | 3.0 | 101.0 |
140 | 2016-01-26 | 140.0 | 3.0 | 4.0 | 101.0 |
141 | 2016-01-26 | 141.0 | 3.0 | 4.0 | 101.0 |
142 | 2016-01-26 | 141.0 | 3.0 | 4.0 | 101.0 |
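The sorting note at the top matters because forward fill follows row order, not index values. A small sketch with toy data (temperatures from slots 45 and 67 above) shows what happens if sort_index is skipped after appending slot 46:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"temperature": [-2.0, 3.0]},
                  index=pd.Index([45, 67], name="time_slot"))
df.loc[46] = np.nan  # appended after slot 67, not between 45 and 67

# Without sorting, slot 46 sits below slot 67 and inherits 67's value
unsorted_fill = df.ffill()
# After sorting, slot 46 follows slot 45 and inherits 45's value
sorted_fill = df.sort_index().ffill()

print(unsorted_fill.loc[46, "temperature"])  # 3.0
print(sorted_fill.loc[46, "temperature"])    # -2.0
```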