Add new datasets

2025-04-07 16:44:50 +08:00 · 2025-04-07 16:44:50 +08:00 · bbb7d7ff35
commit bbb7d7ff35
parent 941820113e
20 changed files with 287090 additions and 1 deletion
--- a/dataset/dataset_raw/CMaps/Damage
+++ b/dataset/dataset_raw/CMaps/Damage
--- a/dataset/dataset_raw/CMaps/RUL_FD001.txt
+++ b/dataset/dataset_raw/CMaps/RUL_FD001.txt
@@ -0,0 +1,100 @@
+112 
+98 
+69 
+82 
+91 
+93 
+91 
+95 
+111 
+96 
+97 
+124 
+95 
+107 
+83 
+84 
+50 
+28 
+87 
+16 
+57 
+111 
+113 
+20 
+145 
+119 
+66 
+97 
+90 
+115 
+8 
+48 
+106 
+7 
+11 
+19 
+21 
+50 
+142 
+28 
+18 
+10 
+59 
+109 
+114 
+47 
+135 
+92 
+21 
+79 
+114 
+29 
+26 
+97 
+137 
+15 
+103 
+37 
+114 
+100 
+21 
+54 
+72 
+28 
+128 
+14 
+77 
+8 
+121 
+94 
+118 
+50 
+131 
+126 
+113 
+10 
+34 
+107 
+63 
+90 
+8 
+9 
+137 
+58 
+118 
+89 
+116 
+115 
+136 
+28 
+38 
+20 
+85 
+55 
+128 
+137 
+82 
+59 
+117 
+20 
--- a/dataset/dataset_raw/CMaps/RUL_FD002.txt
+++ b/dataset/dataset_raw/CMaps/RUL_FD002.txt
@@ -0,0 +1,259 @@
+18 
+79 
+106 
+110 
+15 
+155 
+6 
+90 
+11 
+79 
+6 
+73 
+30 
+11 
+37 
+67 
+68 
+99 
+22 
+54 
+97 
+10 
+142 
+77 
+88 
+163 
+126 
+138 
+83 
+78 
+75 
+11 
+53 
+173 
+63 
+100 
+151 
+55 
+48 
+37 
+44 
+27 
+18 
+6 
+15 
+112 
+131 
+13 
+122 
+13 
+98 
+53 
+52 
+106 
+103 
+152 
+123 
+26 
+178 
+73 
+169 
+39 
+39 
+14 
+11 
+121 
+86 
+56 
+115 
+17 
+148 
+104 
+78 
+86 
+98 
+36 
+94 
+52 
+91 
+15 
+141 
+74 
+146 
+17 
+47 
+194 
+21 
+79 
+97 
+8 
+9 
+73 
+183 
+97 
+73 
+49 
+31 
+97 
+9 
+14 
+106 
+8 
+8 
+106 
+116 
+120 
+61 
+168 
+35 
+80 
+9 
+50 
+151 
+78 
+91 
+7 
+181 
+150 
+106 
+15 
+67 
+145 
+180 
+7 
+179 
+124 
+82 
+108 
+79 
+121 
+120 
+39 
+38 
+9 
+167 
+87 
+88 
+7 
+51 
+55 
+155 
+47 
+81 
+43 
+98 
+10 
+92 
+11 
+165 
+34 
+115 
+59 
+99 
+103 
+108 
+83 
+171 
+15 
+9 
+42 
+13 
+41 
+88 
+14 
+155 
+188 
+96 
+82 
+135 
+182 
+36 
+107 
+14 
+95 
+142 
+23 
+6 
+144 
+35 
+97 
+68 
+14 
+67 
+191 
+19 
+10 
+158 
+183 
+43 
+12 
+148 
+13 
+37 
+122 
+80 
+93 
+132 
+32 
+103 
+174 
+111 
+68 
+192 
+121 
+134 
+48 
+85 
+8 
+23 
+8 
+6 
+57 
+83 
+172 
+101 
+81 
+86 
+165 
+73 
+121 
+139 
+75 
+151 
+145 
+11 
+108 
+14 
+126 
+61 
+85 
+8 
+101 
+153 
+89 
+190 
+12 
+62 
+134 
+101 
+121 
+167 
+17 
+161 
+181 
+16 
+152 
+148 
+56 
+111 
+23 
+84 
+12 
+43 
+48 
+122 
+191 
+56 
+131 
+51 
--- a/dataset/dataset_raw/CMaps/RUL_FD003.txt
+++ b/dataset/dataset_raw/CMaps/RUL_FD003.txt
@@ -0,0 +1,100 @@
+44 
+51 
+27 
+120 
+101 
+99 
+71 
+55 
+55 
+66 
+77 
+115 
+115 
+31 
+108 
+56 
+136 
+132 
+85 
+56 
+18 
+119 
+78 
+9 
+58 
+11 
+88 
+144 
+124 
+89 
+79 
+55 
+71 
+65 
+87 
+137 
+145 
+22 
+8 
+41 
+131 
+115 
+128 
+69 
+111 
+7 
+137 
+55 
+135 
+11 
+78 
+120 
+87 
+87 
+55 
+93 
+88 
+40 
+49 
+128 
+129 
+58 
+117 
+28 
+115 
+87 
+92 
+103 
+100 
+63 
+35 
+45 
+99 
+117 
+45 
+27 
+86 
+20 
+18 
+133 
+15 
+6 
+145 
+104 
+56 
+25 
+68 
+144 
+41 
+51 
+81 
+14 
+67 
+10 
+127 
+113 
+123 
+17 
+8 
+28 
--- a/dataset/dataset_raw/CMaps/RUL_FD004.txt
+++ b/dataset/dataset_raw/CMaps/RUL_FD004.txt
@@ -0,0 +1,248 @@
+22 
+39 
+107 
+75 
+149 
+78 
+94 
+14 
+99 
+162 
+143 
+7 
+71 
+105 
+12 
+160 
+162 
+104 
+194 
+82 
+91 
+11 
+26 
+142 
+39 
+92 
+76 
+124 
+64 
+118 
+6 
+22 
+147 
+126 
+36 
+73 
+89 
+11 
+151 
+10 
+97 
+30 
+42 
+60 
+85 
+134 
+34 
+45 
+24 
+86 
+119 
+151 
+142 
+176 
+157 
+67 
+97 
+8 
+154 
+139 
+51 
+33 
+184 
+46 
+12 
+133 
+46 
+46 
+12 
+33 
+15 
+176 
+23 
+89 
+124 
+163 
+25 
+74 
+78 
+114 
+96 
+10 
+172 
+166 
+115 
+70 
+94 
+56 
+86 
+96 
+50 
+73 
+154 
+129 
+171 
+71 
+105 
+113 
+37 
+7 
+13 
+22 
+9 
+120 
+100 
+107 
+41 
+153 
+126 
+59 
+18 
+66 
+13 
+14 
+139 
+13 
+75 
+8 
+109 
+137 
+41 
+192 
+23 
+86 
+184 
+15 
+195 
+126 
+120 
+165 
+101 
+116 
+126 
+36 
+7 
+122 
+159 
+88 
+173 
+146 
+130 
+108 
+53 
+162 
+59 
+100 
+56 
+145 
+76 
+57 
+31 
+88 
+173 
+34 
+7 
+133 
+172 
+6 
+22 
+83 
+82 
+84 
+95 
+174 
+111 
+72 
+109 
+87 
+179 
+158 
+126 
+12 
+8 
+10 
+123 
+103 
+12 
+106 
+12 
+32 
+37 
+116 
+15 
+10 
+46 
+142 
+24 
+135 
+56 
+43 
+178 
+71 
+104 
+15 
+166 
+89 
+36 
+11 
+92 
+96 
+59 
+13 
+167 
+151 
+154 
+109 
+116 
+91 
+11 
+88 
+108 
+76 
+14 
+89 
+145 
+17 
+66 
+154 
+41 
+182 
+73 
+39 
+58 
+14 
+145 
+88 
+162 
+189 
+120 
+98 
+33 
+184 
+110 
+68 
+24 
+75 
+18 
+16 
+166 
+98 
+176 
+81 
+118 
+35 
+131 
+194 
+112 
+26 
--- a/dataset/dataset_raw/CMaps/readme.txt
+++ b/dataset/dataset_raw/CMaps/readme.txt
@@ -0,0 +1,45 @@
+Data Set: FD001
+Train trajectories: 100
+Test trajectories: 100
+Conditions: ONE (Sea Level)
+Fault Modes: ONE (HPC Degradation)
+
+Data Set: FD002
+Train trajectories: 260
+Test trajectories: 259
+Conditions: SIX 
+Fault Modes: ONE (HPC Degradation)
+
+Data Set: FD003
+Train trajectories: 100
+Test trajectories: 100
+Conditions: ONE (Sea Level)
+Fault Modes: TWO (HPC Degradation, Fan Degradation)
+
+Data Set: FD004
+Train trajectories: 249
+Test trajectories: 248
+Conditions: SIX 
+Fault Modes: TWO (HPC Degradation, Fan Degradation)
+
+
+
+Experimental Scenario
+
+The data sets consist of multiple multivariate time series. Each data set is further divided into training and test subsets. Each time series is from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with different degrees of initial wear and manufacturing variation, which is unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. There are three operational settings that have a substantial effect on engine performance. These settings are also included in the data. The data is contaminated with sensor noise.
+
+The engine operates normally at the start of each time series and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective of the competition is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate. A vector of true Remaining Useful Life (RUL) values for the test data is also provided.
+
+The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle, each column is a different variable. The columns correspond to:
+1)	unit number
+2)	time, in cycles
+3)	operational setting 1
+4)	operational setting 2
+5)	operational setting 3
+6)	sensor measurement  1
+7)	sensor measurement  2
+...
+26)	sensor measurement  21
+
+
+Reference: A. Saxena, K. Goebel, D. Simon, and N. Eklund, “Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation”, in the Proceedings of the 1st International Conference on Prognostics and Health Management (PHM08), Denver CO, Oct 2008.
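The column layout listed in this readme can be parsed with the Python standard library alone. The following is a minimal sketch, assuming each row holds the 26 whitespace-separated numbers described above (the raw files also end each line with a space); the function `load_cmapss` and the column labels `unit`, `cycle`, `setting_1` to `setting_3` and `sensor_1` to `sensor_21` are hypothetical names chosen for illustration, not part of the dataset.

```python
from collections import defaultdict

# Hypothetical column labels (not defined by the dataset itself):
# unit + cycle + 3 operational settings + 21 sensor measurements = 26 columns
COLUMNS = (["unit", "cycle"]
           + [f"setting_{i}" for i in range(1, 4)]
           + [f"sensor_{i}" for i in range(1, 22)])


def load_cmapss(lines):
    """Parse C-MAPSS text lines into {unit: [row_dict, ...]} sorted by cycle."""
    units = defaultdict(list)
    for line in lines:
        fields = line.split()  # splits on any whitespace run, ignores trailing spaces
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        row = dict(zip(COLUMNS, map(float, fields)))
        row["unit"] = int(row["unit"])    # unit number is an integer id
        row["cycle"] = int(row["cycle"])  # time, in cycles
        units[row["unit"]].append(row)
    # Each engine's trajectory is ordered by cycle
    for rows in units.values():
        rows.sort(key=lambda r: r["cycle"])
    return dict(units)


if __name__ == "__main__":
    # Two synthetic rows for one engine (values are made up for the example)
    sample = [
        "1 1 " + " ".join(["0.5"] * 24) + " ",
        "1 2 " + " ".join(["0.6"] * 24) + " ",
    ]
    data = load_cmapss(sample)
    print(len(data[1]), data[1][-1]["cycle"], data[1][0]["sensor_21"])
```

Passing an open file also works, because iterating a file yields its lines, e.g. `with open("train_FD001.txt", encoding="utf-8") as f: data = load_cmapss(f)`. Grouping by unit keeps each engine's trajectory separate, since the readme treats every time series as a different engine.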
--- a/dataset/dataset_raw/CMaps/test_FD001.txt
+++ b/dataset/dataset_raw/CMaps/test_FD001.txt
--- a/dataset/dataset_raw/CMaps/test_FD002.txt
+++ b/dataset/dataset_raw/CMaps/test_FD002.txt
--- a/dataset/dataset_raw/CMaps/test_FD003.txt
+++ b/dataset/dataset_raw/CMaps/test_FD003.txt
--- a/dataset/dataset_raw/CMaps/test_FD004.txt
+++ b/dataset/dataset_raw/CMaps/test_FD004.txt
--- a/dataset/dataset_raw/CMaps/train_FD001.csv
+++ b/dataset/dataset_raw/CMaps/train_FD001.csv
--- a/dataset/dataset_raw/CMaps/train_FD001.txt
+++ b/dataset/dataset_raw/CMaps/train_FD001.txt
--- a/dataset/dataset_raw/CMaps/train_FD002.txt
+++ b/dataset/dataset_raw/CMaps/train_FD002.txt
--- a/dataset/dataset_raw/CMaps/train_FD003.txt
+++ b/dataset/dataset_raw/CMaps/train_FD003.txt
--- a/dataset/dataset_raw/CMaps/train_FD004.txt
+++ b/dataset/dataset_raw/CMaps/train_FD004.txt
--- a/dataset/dataset_raw/CMaps/x.txt
+++ b/dataset/dataset_raw/CMaps/x.txt
@@ -0,0 +1,218 @@
+18 
+79 
+106 
+110 
+15 
+155 
+6 
+90 
+11 
+79 
+6 
+73 
+30 
+11 
+37 
+67 
+68 
+99 
+22 
+54 
+97 
+10 
+142 
+77 
+88 
+163 
+126 
+138 
+83 
+78 
+75 
+11 
+53 
+173 
+63 
+100 
+151 
+55 
+48 
+37 
+44 
+27 
+18 
+6 
+15 
+112 
+131 
+13 
+122 
+13 
+98 
+53 
+52 
+106 
+103 
+152 
+123 
+26 
+178 
+73 
+169 
+39 
+39 
+14 
+11 
+121 
+86 
+56 
+115 
+17 
+148 
+104 
+78 
+86 
+98 
+36 
+94 
+52 
+91 
+15 
+141 
+74 
+146 
+17 
+47 
+194 
+21 
+79 
+97 
+8 
+9 
+73 
+183 
+97 
+73 
+49 
+31 
+97 
+9 
+14 
+106 
+8 
+8 
+106 
+116 
+120 
+61 
+168 
+35 
+80 
+9 
+50 
+151 
+78 
+91 
+7 
+181 
+150 
+106 
+15 
+67 
+145 
+180 
+7 
+179 
+124 
+82 
+108 
+79 
+121 
+120 
+39 
+38 
+9 
+167 
+87 
+88 
+7 
+51 
+55 
+155 
+47 
+81 
+43 
+98 
+10 
+92 
+11 
+165 
+34 
+115 
+59 
+99 
+103 
+108 
+83 
+171 
+15 
+9 
+42 
+13 
+41 
+88 
+14 
+155 
+188 
+96 
+82 
+135 
+182 
+36 
+107 
+14 
+95 
+142 
+23 
+6 
+144 
+35 
+97 
+68 
+14 
+67 
+191 
+19 
+10 
+158 
+183 
+43 
+12 
+148 
+13 
+37 
+122 
+80 
+93 
+132 
+32 
+103 
+174 
+111 
+68 
+192 
+121 
+134 
+48 
+85 
+8 
+23 
+8 
+6 
+57 
+83 
+172 
+101 
+81 
+86 
+165
--- a/dataset/dataset_raw/CMaps/将txt文件转换成csv文件.py
+++ b/dataset/dataset_raw/CMaps/将txt文件转换成csv文件.py
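@@ -0,0 +1,50 @@
+
+import csv
+
+'''
+    Process a C-MAPSS data file:
+        For each unit number, find the maximum value of column 2 (time, in cycles).
+        Append the remaining useful life (RUL) to the end of every row:
+            training file (units run to failure):   RUL = max cycle - column 2 value
+            test file (series ends before failure): RUL = RUL_FD00x value + max cycle - column 2 value
+        Drop columns 1 and 2 of txt_path and save the result as a CSV file.
+'''
+
+
+txt_path = "train_FD001.txt"
+csv_path = txt_path.split(".")[0] + ".csv"
+
+# RUL_FD00x.txt holds the true RUL of each *test* unit after its last cycle,
+# so set it only when converting a test file, e.g.
+#     txt_path = "test_FD001.txt" and txt_path_true = "RUL_FD001.txt"
+txt_path_true = None
+
+
+with open(txt_path, 'r', encoding='utf-8') as txt_file:
+    # split() with no argument splits on any run of whitespace and ignores
+    # the trailing space at the end of each line of the raw files
+    rows = [line.split() for line in txt_file if line.strip()]
+
+# Maximum cycle (column 2) of each unit number (column 1)
+max_number = {}
+for row in rows:
+    unit, cycle = int(row[0]), int(row[1])
+    max_number[unit] = max(max_number.get(unit, 0), cycle)
+
+# RUL of each unit after its last cycle: 0 for training units (they fail at that cycle),
+# line i of the RUL file for test unit i
+true_value = {unit: 0 for unit in max_number}
+if txt_path_true is not None:
+    with open(txt_path_true, 'r', encoding='utf-8') as txt_file_true:
+        values = [int(line.split()[0]) for line in txt_file_true if line.strip()]
+    true_value = {i + 1: value for i, value in enumerate(values)}
+
+# newline='' stops the csv module from writing blank lines on Windows
+with open(csv_path, 'w', encoding='utf-8', newline='') as csv_file:
+    csv_writer = csv.writer(csv_file)
+    for row in rows:
+        unit, cycle = int(row[0]), int(row[1])
+        # Keep settings and sensors (columns 3-26) and append the RUL label
+        csv_writer.writerow(row[2:] + [true_value[unit] + max_number[unit] - cycle])
+
+print(f"Conversion complete! CSV file saved to: {csv_path}")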
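The labelling rule from the dataset readme (training units run until failure, while RUL_FD00x.txt gives each test unit's RUL after its last recorded cycle) can be isolated as a pure function and checked by hand. This is a sketch under that assumption; `rul_labels` is a hypothetical helper, not part of the conversion script.

```python
def rul_labels(unit_cycles, final_rul=None):
    """Return one RUL label per (unit, cycle) pair, in input order.

    unit_cycles: list of (unit, cycle) pairs as read from a data file.
    final_rul:   None for a training file (units run to failure), or a list
                 whose entry i - 1 is the true RUL of test unit i after its
                 last recorded cycle (one line of RUL_FD00x.txt per unit).
    """
    # Last recorded cycle of each unit
    max_cycle = {}
    for unit, cycle in unit_cycles:
        max_cycle[unit] = max(max_cycle.get(unit, 0), cycle)

    labels = []
    for unit, cycle in unit_cycles:
        # Training units fail at their last cycle, so the offset is 0
        offset = 0 if final_rul is None else final_rul[unit - 1]
        labels.append(offset + max_cycle[unit] - cycle)
    return labels


if __name__ == "__main__":
    rows = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
    print(rul_labels(rows))           # training file: [2, 1, 0, 1, 0]
    print(rul_labels(rows, [10, 5]))  # test file: [12, 11, 10, 6, 5]
```

For a test unit, the last row gets exactly the RUL_FD00x value and each earlier row adds the number of cycles still to go before that last row.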
--- a/doc/test.md
+++ b/doc/test.md
@@ -0,0 +1,100 @@
+112 
+98 
+69 
+82 
+91 
+93 
+91 
+95 
+111 
+96 
+97 
+124 
+95 
+107 
+83 
+84 
+50 
+28 
+87 
+16 
+57 
+111 
+113 
+20 
+145 
+119 
+66 
+97 
+90 
+115 
+8 
+48 
+106 
+7 
+11 
+19 
+21 
+50 
+142 
+28 
+18 
+10 
+59 
+109 
+114 
+47 
+135 
+92 
+21 
+79 
+114 
+29 
+26 
+97 
+137 
+15 
+103 
+37 
+114 
+100 
+21 
+54 
+72 
+28 
+128 
+14 
+77 
+8 
+121 
+94 
+118 
+50 
+131 
+126 
+113 
+10 
+34 
+107 
+63 
+90 
+8 
+9 
+137 
+58 
+118 
+89 
+116 
+115 
+136 
+28 
+38 
+20 
+85 
+55 
+128 
+137 
+82 
+59 
+117 
+20 
--- a/doc/各领域应用(附带数据集).txt
+++ b/doc/各领域应用(附带数据集).txt
@@ -0,0 +1,71 @@
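+1. Equipment maintenance - fault prediction and remaining useful life estimation
+    (1) Predict the failure time or remaining useful life (RUL) of industrial equipment
+    (2) LSTM, random forest, survival analysis models
+    (3) NASA Turbofan Engine Degradation Simulation
+        ①https://www.kaggle.com/datasets/behrad3d/nasa-cmaps
+    Dataset notes:
+        Dataset path: dataset/dataset_raw/CMaps
+            Training set:
+                train_FD00x.txt
+                Columns: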
+                    1)	unit number
+                    2)	time, in cycles
+                    3)	operational setting 1
+                    4)	operational setting 2
+                    5)	operational setting 3
+                    6)	sensor measurement  1
+                    7)	sensor measurement  2
+                    ...
+                    26)	sensor measurement  21
+
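+                    Notes:
+                        Column 1 is the unit number.
+                        Column 2 is the number of cycles the unit has run. Rows with the same column 1 value but different column 2 values are the same unit at different points in time.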
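+            Training set ground truth:
+                No separate file: each training unit runs until failure, so
+                    RUL of each row = maximum column 2 value of that unit - column 2 value.
+            Test set:
+                test_FD00x.txt
+            Test set ground truth:
+                RUL_FD00x.txt: line i gives the remaining cycles of test unit i after its last recorded cycle.
+                    RUL of each row = RUL_FD00x value + maximum column 2 value of that unit - column 2 value.
+            Note: different training set files correspond to different field conditions; do not mix them.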
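+2. Energy - wind power forecasting
+    (1) Forecast wind power generation from weather and historical data
+    (2) XGBoost, LSTM, Prophet time-series models
+    (3) Wind Turbine Scada Dataset
+3. Chemical industry - process anomaly detection
+    (1) Classify abnormal states in chemical production processes (leaks, temperature runaway)
+    (2) SVM, isolation forest, autoencoder
+    (3) Tennessee Eastman Process (TEP): multivariate time-series data simulating chemical plant faults.
+4. Logistics and supply chain - demand forecasting
+    (1) Forecast future product demand to optimize inventory
+    (2) ARIMA, LightGBM, Transformer
+    (3) Retail Sales Forecasting: historical retail sales data (Kaggle).
+        ①https://www.kaggle.com/competitions/m5-forecasting-accuracy/data
+5. Power systems - electric load forecasting
+    (1) Forecast the short- or long-term electricity load of a regional grid.
+    (2) GRU, TCN
+    (3) ISO-NE Public Load Data: real-time electricity load data for the New England region of the US.
+        ①https://www.iso-ne.com/isoexpress/
+6. Oil and gas - oil well production forecasting
+    (1) Forecast oil well production from geological and extraction data
+    (2) Random forest, gradient boosted trees
+    (3) Volve Field Data: drilling and production data from a Norwegian North Sea oil field (access requires an application).
+        ①https://www.equinor.com/energy/volve-data-sharing
+7. Semiconductor manufacturing - wafer defect classification
+    (1) Detect defect patterns in semiconductor wafer manufacturing
+    (2) Image segmentation, object detection
+    (3) WM-811K Wafer Map: 80,000+ wafer defect images with class labels.
+8. Transportation - freight delay prediction
+    (1) Predict whether a freight shipment will be delayed (binary classification)
+    (2) Logistic regression, random forest
+    (3) Freight Transport Delay Data: shipment records of European freight companies (requires preprocessing).
+9. Steel industry - blast furnace gas prediction
+    (1) Predict CO/CO2 concentrations in blast furnace gas pipelines
+    (2) Multiple regression, LSTM
+    (3) Blast Furnace Gas Dataset: sensor time-series data from a steel plant.
+10. Manufacturing - product quality defect classification
+    (1) Detect product surface defects with image classification
+    (2) CNN, transfer learning
+    (3) NEU-DET: images of 6 classes of steel surface defects (rolled-in scale, crazing, etc.) from Northeastern University (China).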
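For the RUL task in item 1, predictions on a test file are usually compared with the RUL_FD00x.txt vector, one value per test unit at its last cycle. The sketch below computes RMSE and the asymmetric scoring function from the PHM08 paper by Saxena et al. (cited in the CMaps readme), in which late predictions (predicted RUL above the true RUL, d >= 0) are penalized more than early ones. The function names `rmse` and `phm08_score` are hypothetical and the sample predictions are made up.

```python
import math


def rmse(pred, true):
    """Root-mean-square error between predicted and true RUL values."""
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def phm08_score(pred, true):
    """Asymmetric PHM08 score with d = pred - true; lower is better.

    Early predictions (d < 0) add exp(-d / 13) - 1,
    late predictions (d >= 0) add exp(d / 10) - 1.
    """
    if len(pred) != len(true):
        raise ValueError("pred and true must have equal length")
    score = 0.0
    for p, t in zip(pred, true):
        d = p - t
        if d < 0:
            score += math.exp(-d / 13) - 1
        else:
            score += math.exp(d / 10) - 1
    return score


if __name__ == "__main__":
    y_true = [112, 98, 69]  # first three lines of RUL_FD001.txt
    y_pred = [100, 98, 80]  # made-up predictions
    print(rmse(y_pred, y_true), phm08_score(y_pred, y_true))
```

The score grows exponentially with the error, so a single badly late prediction can dominate it, which is why it is usually reported alongside RMSE rather than instead of it.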
--- a/readme.md
+++ b/readme.md
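@@ -6,17 +6,22 @@
    │   ├── __init__.py
    │   ├── data_api.py        # Data processing interfaces
    │   ├── model_api.py       # Model interfaces
+    │   ├── optimize_api.py    # Model optimization interfaces
    │   └── system_api.py      # System monitoring interfaces
    ├── function/              # Functional implementation layer
    │   ├── data_manager.py  # Data processing class
    │   ├── model_manager.py   # Model management class
    │   ├── system_monitor.py  # System monitoring class
+    │   ├── optimize_manager.py  # Model optimization management class
    │   └── utils/            # Utility functions
    ├── config/               # Configuration files
    │   └── config.yaml      # System configuration
    ├── dataset/             # Datasets
    │   ├── dataset_raw/     # Raw data
    │   └── dataset_processed/ # Processed data
+    ├── optimize/ # YAML files for model optimization methods
+    ├── optimize_models/ # YAML files for model optimization methods
+    │
    ├── .log/                # Log files
    ├── doc/                 # Documentation
    └── main.py             # Main program entry point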
@@ -39,5 +44,7 @@

 ## 3. Startup commands
    - Start MLflow
-        - mlflow server --host 10.0.0.202 --port 5000
+        - mlflow server --host 127.0.0.1 --port 5000
+    - Start the program
+        - python main.py