Add new datasets

haotian 2025-04-07 16:44:50 +08:00
parent 941820113e
commit bbb7d7ff35
20 changed files with 287090 additions and 1 deletions


@ -0,0 +1,100 @@
112
98
69
82
91
93
91
95
111
96
97
124
95
107
83
84
50
28
87
16
57
111
113
20
145
119
66
97
90
115
8
48
106
7
11
19
21
50
142
28
18
10
59
109
114
47
135
92
21
79
114
29
26
97
137
15
103
37
114
100
21
54
72
28
128
14
77
8
121
94
118
50
131
126
113
10
34
107
63
90
8
9
137
58
118
89
116
115
136
28
38
20
85
55
128
137
82
59
117
20


@ -0,0 +1,259 @@
18
79
106
110
15
155
6
90
11
79
6
73
30
11
37
67
68
99
22
54
97
10
142
77
88
163
126
138
83
78
75
11
53
173
63
100
151
55
48
37
44
27
18
6
15
112
131
13
122
13
98
53
52
106
103
152
123
26
178
73
169
39
39
14
11
121
86
56
115
17
148
104
78
86
98
36
94
52
91
15
141
74
146
17
47
194
21
79
97
8
9
73
183
97
73
49
31
97
9
14
106
8
8
106
116
120
61
168
35
80
9
50
151
78
91
7
181
150
106
15
67
145
180
7
179
124
82
108
79
121
120
39
38
9
167
87
88
7
51
55
155
47
81
43
98
10
92
11
165
34
115
59
99
103
108
83
171
15
9
42
13
41
88
14
155
188
96
82
135
182
36
107
14
95
142
23
6
144
35
97
68
14
67
191
19
10
158
183
43
12
148
13
37
122
80
93
132
32
103
174
111
68
192
121
134
48
85
8
23
8
6
57
83
172
101
81
86
165
73
121
139
75
151
145
11
108
14
126
61
85
8
101
153
89
190
12
62
134
101
121
167
17
161
181
16
152
148
56
111
23
84
12
43
48
122
191
56
131
51


@ -0,0 +1,100 @@
44
51
27
120
101
99
71
55
55
66
77
115
115
31
108
56
136
132
85
56
18
119
78
9
58
11
88
144
124
89
79
55
71
65
87
137
145
22
8
41
131
115
128
69
111
7
137
55
135
11
78
120
87
87
55
93
88
40
49
128
129
58
117
28
115
87
92
103
100
63
35
45
99
117
45
27
86
20
18
133
15
6
145
104
56
25
68
144
41
51
81
14
67
10
127
113
123
17
8
28


@ -0,0 +1,248 @@
22
39
107
75
149
78
94
14
99
162
143
7
71
105
12
160
162
104
194
82
91
11
26
142
39
92
76
124
64
118
6
22
147
126
36
73
89
11
151
10
97
30
42
60
85
134
34
45
24
86
119
151
142
176
157
67
97
8
154
139
51
33
184
46
12
133
46
46
12
33
15
176
23
89
124
163
25
74
78
114
96
10
172
166
115
70
94
56
86
96
50
73
154
129
171
71
105
113
37
7
13
22
9
120
100
107
41
153
126
59
18
66
13
14
139
13
75
8
109
137
41
192
23
86
184
15
195
126
120
165
101
116
126
36
7
122
159
88
173
146
130
108
53
162
59
100
56
145
76
57
31
88
173
34
7
133
172
6
22
83
82
84
95
174
111
72
109
87
179
158
126
12
8
10
123
103
12
106
12
32
37
116
15
10
46
142
24
135
56
43
178
71
104
15
166
89
36
11
92
96
59
13
167
151
154
109
116
91
11
88
108
76
14
89
145
17
66
154
41
182
73
39
58
14
145
88
162
189
120
98
33
184
110
68
24
75
18
16
166
98
176
81
118
35
131
194
112
26


@ -0,0 +1,45 @@
Data Set: FD001
Train trajectories: 100
Test trajectories: 100
Conditions: ONE (Sea Level)
Fault Modes: ONE (HPC Degradation)
Data Set: FD002
Train trajectories: 260
Test trajectories: 259
Conditions: SIX
Fault Modes: ONE (HPC Degradation)
Data Set: FD003
Train trajectories: 100
Test trajectories: 100
Conditions: ONE (Sea Level)
Fault Modes: TWO (HPC Degradation, Fan Degradation)
Data Set: FD004
Train trajectories: 248
Test trajectories: 249
Conditions: SIX
Fault Modes: TWO (HPC Degradation, Fan Degradation)
Experimental Scenario
The data sets consist of multiple multivariate time series. Each data set is further divided into training and test subsets. Each time series is from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with a different degree of initial wear and manufacturing variation, which is unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. There are three operational settings that have a substantial effect on engine performance. These settings are also included in the data. The data is contaminated with sensor noise.
The engine is operating normally at the start of each time series and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective of the competition is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate. Also provided is a vector of true Remaining Useful Life (RUL) values for the test data.
The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle; each column is a different variable. The columns correspond to:
1) unit number
2) time, in cycles
3) operational setting 1
4) operational setting 2
5) operational setting 3
6) sensor measurement 1
7) sensor measurement 2
...
26) sensor measurement 21
Reference: A. Saxena, K. Goebel, D. Simon, and N. Eklund, “Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation”, in the Proceedings of the 1st International Conference on Prognostics and Health Management (PHM08), Denver CO, Oct 2008.
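A minimal sketch of reading one row of this format, assuming the 26 columns break down as unit number, cycle, three operational settings, and 21 sensor measurements. The column names (`op_setting_1`, `sensor_1`, ...) and the `parse_row` helper are illustrative and not part of the data set; the sample line is synthetic.

```python
from typing import Dict, List, Union

# Assumed layout: 2 index columns, 3 operational settings, 21 sensors = 26 columns.
# The names below are hypothetical labels chosen for this sketch.
COLUMNS: List[str] = (
    ["unit", "cycle"]
    + [f"op_setting_{i}" for i in range(1, 4)]
    + [f"sensor_{i}" for i in range(1, 22)]
)


def parse_row(line: str) -> Dict[str, Union[int, float]]:
    """Parse one space-separated snapshot line into a name -> value mapping."""
    fields = line.split()  # split() with no argument tolerates repeated or trailing spaces
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row: Dict[str, Union[int, float]] = {
        "unit": int(fields[0]),
        "cycle": int(fields[1]),
    }
    for name, value in zip(COLUMNS[2:], fields[2:]):
        row[name] = float(value)
    return row


# Synthetic line with 26 values and trailing spaces (not taken from the data files).
sample = "1 1 " + " ".join(["0.5"] * 24) + "  "
parsed = parse_row(sample)
print(parsed["unit"], parsed["cycle"], len(parsed))
```

Raising on a wrong field count is a deliberate choice here: a silently misaligned row would shift every sensor value into the wrong column.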

File diff suppressed because it is too large (9 files)


@ -0,0 +1,218 @@
18
79
106
110
15
155
6
90
11
79
6
73
30
11
37
67
68
99
22
54
97
10
142
77
88
163
126
138
83
78
75
11
53
173
63
100
151
55
48
37
44
27
18
6
15
112
131
13
122
13
98
53
52
106
103
152
123
26
178
73
169
39
39
14
11
121
86
56
115
17
148
104
78
86
98
36
94
52
91
15
141
74
146
17
47
194
21
79
97
8
9
73
183
97
73
49
31
97
9
14
106
8
8
106
116
120
61
168
35
80
9
50
151
78
91
7
181
150
106
15
67
145
180
7
179
124
82
108
79
121
120
39
38
9
167
87
88
7
51
55
155
47
81
43
98
10
92
11
165
34
115
59
99
103
108
83
171
15
9
42
13
41
88
14
155
188
96
82
135
182
36
107
14
95
142
23
6
144
35
97
68
14
67
191
19
10
158
183
43
12
148
13
37
122
80
93
132
32
103
174
111
68
192
121
134
48
85
8
23
8
6
57
83
172
101
81
86
165


@ -0,0 +1,54 @@
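import csv
'''
Process the dataset:
Find the maximum value of the second column (cycle count) for each unit number.
Append (value from txt_path_true + maximum - second-column value) to the end of every row of the corresponding unit.
Drop the first and second columns of txt_path and save the result as a CSV file.
'''
txt_path = "train_FD001.txt"
csv_path = txt_path.split(".")[0] + ".csv"
# Note: per the data set readme, RUL_FD00x.txt holds the true RUL values of the test trajectories.
txt_path_true = "RUL_FD001.txt"
# Field delimiter (change as needed, e.g. '\t' for tab, ',' for comma)
delimiter = ' '
with open(txt_path, 'r', encoding='utf-8') as txt_file, \
        open(csv_path, 'w', encoding='utf-8', newline='') as csv_file, \
        open(txt_path_true, 'r', encoding='utf-8') as txt_file_true:
    csv_writer = csv.writer(csv_file)
    n = 0
    true_value = list()
    for line in txt_file_true:
        true_value.append(int(line.strip().split(delimiter)[0]))
        n += 1
    max_number = [0] * n
    print("---------------------------------------------------------------------------------------------")
    # First pass: record the largest cycle count seen for each unit
    for line in txt_file:
        row = line.strip().split(delimiter)
        max_number[int(row[0])-1] = max(max_number[int(row[0])-1], int(row[1]))
    # Rewind to the start of the file; otherwise the next read returns nothing
    txt_file.seek(0)
    for line in txt_file:
        # Strip the trailing newline and split the fields on the delimiter
        row = line.strip().split(delimiter)
        # print(row)
        # Write the remaining fields plus the computed RUL to the CSV
        csv_writer.writerow(row[2: ] + [true_value[int(row[0])-1] - int(row[1]) + max_number[int(row[0])-1]])
print(f"Conversion complete. CSV file saved to {csv_path}")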
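The labeling rule used by the script can be sketched in memory without touching any files. This is an illustrative helper, not the author's code: the name `rul_labels` and its `final_rul` parameter are assumptions. It treats a unit with no entry in `final_rul` as a run-to-failure series (0 cycles remaining after its last row), which is how the readme describes the training trajectories.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def rul_labels(
    rows: Sequence[Tuple[int, int]],
    final_rul: Optional[Dict[int, int]] = None,
) -> List[int]:
    """Return one RUL label per (unit, cycle) row.

    final_rul maps unit -> cycles remaining after the unit's last recorded
    row. Units missing from it are assumed to have run to failure.
    """
    # First pass: largest cycle count observed for each unit.
    max_cycle: Dict[int, int] = {}
    for unit, cycle in rows:
        max_cycle[unit] = max(max_cycle.get(unit, 0), cycle)
    offsets = final_rul or {}
    # Second pass: label = remaining life at last row + cycles left until the last row.
    return [offsets.get(unit, 0) + max_cycle[unit] - cycle for unit, cycle in rows]


rows = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
print(rul_labels(rows))                  # [2, 1, 0, 1, 0]
print(rul_labels(rows, {1: 10, 2: 5}))   # [12, 11, 10, 6, 5]
```

Keying the maxima by unit number in a dictionary, rather than a list sized from the RUL file, avoids an index error when the two files contain different numbers of units.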

100
doc/test.md Normal file

@ -0,0 +1,100 @@
112
98
69
82
91
93
91
95
111
96
97
124
95
107
83
84
50
28
87
16
57
111
113
20
145
119
66
97
90
115
8
48
106
7
11
19
21
50
142
28
18
10
59
109
114
47
135
92
21
79
114
29
26
97
137
15
103
37
114
100
21
54
72
28
128
14
77
8
121
94
118
50
131
126
113
10
34
107
63
90
8
9
137
58
118
89
116
115
136
28
38
20
85
55
128
137
82
59
117
20


@ -0,0 +1,71 @@
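1. Equipment maintenance - fault prediction and remaining-life estimation
(1) Predict the time to failure or remaining useful life (RUL) of industrial equipment
(2) LSTM, random forest, survival analysis models
(3) NASA Turbofan Engine Degradation Simulation
① https://www.kaggle.com/datasets/behrad3d/nasa-cmaps
Dataset notes:
Dataset path: dataset/dataset_raw/CMaps
Training set:
train_FD00x.txt
Columns:
1) unit number
2) time, in cycles
3) operational setting 1
4) operational setting 2
5) operational setting 3
6) sensor measurement 1
7) sensor measurement 2
...
26) sensor measurement 21
Notes:
The first column is the unit (product) number.
The second column is the number of cycles the unit has run so far. Rows with the same first column and different second columns are the same unit at different points in time.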
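Training-set ground truth:
No separate file. Each training series runs until failure, so the RUL of a row is the unit's maximum cycle count minus the row's second-column value.
Test set:
test_FD00x.txt
Test-set ground truth:
RUL_FD00x.txt
Note: each line is the remaining life of one unit after its last recorded cycle.
The ground truth for each test-set row is: final remaining life + the unit's maximum cycle count - the row's second-column value.
Note: each FD00x file set corresponds to different operating conditions; do not mix them.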
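2. Energy - wind power forecasting
(1) Forecast wind power generation from weather and historical data
(2) XGBoost, LSTM, Prophet time-series model
(3) Wind Turbine Scada Dataset
3. Chemical industry - process anomaly detection
(1) Classify abnormal states in chemical production processes (leaks, temperature runaway)
(2) SVM, isolation forest, autoencoder
(3) Tennessee Eastman Process (TEP): multivariate time-series data simulating chemical plant faults.
4. Logistics and supply chain - demand forecasting
(1) Forecast future product demand to optimize inventory
(2) ARIMA, LightGBM, Transformer
(3) Retail Sales Forecasting: historical retail sales data (Kaggle)
① https://www.kaggle.com/competitions/m5-forecasting-accuracy/data
5. Power systems - electrical load forecasting
(1) Forecast the short-term or long-term electricity load of a regional grid.
(2) GRU, TCN
(3) ISO-NE Public Load Data: real-time electricity load data for the New England region of the United States.
① https://www.iso-ne.com/isoexpress/
6. Oil and gas - oil well production forecasting
(1) Forecast oil well production from geological and extraction data
(2) Random forest, gradient-boosted trees
(3) Volve Field Data: drilling and production data from an oil field in the Norwegian North Sea (download requires an application)
① https://www.equinor.com/energy/volve-data-sharing
7. Semiconductor manufacturing - wafer defect classification
(1) Detect defect patterns in semiconductor wafer manufacturing
(2) Image segmentation, object detection
(3) WM-811K Wafer Map: contains more than 800,000 wafer maps and defect class labels.
8. Transportation - freight delay prediction
(1) Predict whether a shipment will be delayed (binary classification)
(2) Logistic regression, random forest
(3) Freight Transport Delay Data: transport records from European freight companies (requires preprocessing)
9. Steel industry - blast furnace gas prediction
(1) Predict the CO/CO2 concentration in blast furnace gas pipelines
(2) Multiple regression, LSTM
(3) Blast Furnace Gas Dataset: sensor time-series data from a steel plant.
10. Manufacturing - product quality defect classification
(1) Detect product surface defects using image classification
(2) CNN, transfer learning
(3) NEU-DET: images of 6 classes of steel surface defects (roll marks, cracks, etc.), from Northeastern University.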


@ -6,17 +6,22 @@
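│ ├── __init__.py
│ ├── data_api.py # Data processing APIs
│ ├── model_api.py # Model APIs
│ ├── optimize_api.py # Model optimization APIs
│ └── system_api.py # System monitoring APIs
├── function/ # Function implementation layer
│ ├── data_manager.py # Data processing class
│ ├── model_manager.py # Model management class
│ ├── system_monitor.py # System monitoring class
│ ├── optimize_manager.py # Model optimization management class
│ └── utils/ # Utility functions
├── config/ # Configuration files
│ └── config.yaml # System configuration
├── dataset/ # Datasets
│ ├── dataset_raw/ # Raw data
│ └── dataset_processed/ # Processed data
├── optimize/ # YAML files for model optimization methods
├── optimize_models/ # YAML files for model optimization methods
│
├── .log/ # Log files
├── doc/ # Documentation
└── main.py # Main program entry point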
@ -39,5 +44,7 @@
## 3. Startup commands
- Start MLflow
- mlflow server --host 10.0.0.202 --port 5000
- mlflow server --host 127.0.0.1 --port 5000
- Start the program
- python main.py