Week 1. Introduction to data mining Test 1
1. Which of the following is not a description of data mining?
Answer: Appropriate statistical analysis methods to analyze the data collected
2. Which of the following describes the correct process of knowledge discovery?
Answer: Selection - Preprocessing - Transformation - Data mining - Interpretation/Evaluation
3. Which of the following does not belong to the KDD process?
Answer: Data description
4. Which of the following is not a valid alternative name for data mining?
Answer: Data harvesting
5. Which of the following is not a nominal variable?
Answer: Age
6. Which statement about classification and regression is wrong?
Answer: We can construct classification models (functions) without any training examples.
7. Which statement about clustering and outliers is wrong?
Answer: Clustering belongs to supervised learning.
8. Which statement about data processing is wrong?
Answer: When performing data classification, we predict categorical labels, excluding unordered ones.
9. Outlier mining, such as the density-based method, belongs to supervised learning.
Answer: False
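Density-based outlier mining needs no class labels, which is why it is unsupervised. The minimal sketch below illustrates the idea with a simple stand-in: each point is scored by its distance to its k-th nearest neighbor, so points in sparse regions get high scores. This k-NN distance score, the function name `knn_outlier_scores`, and the toy data are assumptions for illustration; it is not a full density-based algorithm such as LOF.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    A large score means the point lies in a sparse (low-density) region.
    No class labels are used anywhere, so the method is unsupervised.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


data = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]
scores = knn_outlier_scores(data, k=2)
outlier = max(range(len(data)), key=lambda i: scores[i])
print(data[outlier])  # (8.0, 8.0)
```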
10. Support vector machines can be used for classification and regression.
Answer: True
Week 2. Data pre-processing Test 2
1. Which of the following is not a reason to preprocess data?
Answer: To make the results meet our hypothesis
2. Which of the following is not a major task in data preprocessing?
Answer: Transition
3. How does PCA construct a new feature space?
Answer: PCA constructs the new feature space by eliminating the weak components, which reduces the size of the data.
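A minimal sketch of this idea for two-dimensional data, using only the Python standard library: center the data, compute the covariance matrix, take its eigenvectors (the principal components), and keep only the strongest one so each point is reduced to a single coordinate. The function name `pca_2d_to_1d` and the sample points are hypothetical; PCA on more dimensions would normally rely on a linear-algebra library.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    The weak (second) component is dropped, so each point becomes
    a single coordinate in the new feature space.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first
    half_trace = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b * b)
    lam1, lam2 = half_trace + disc, half_trace - disc
    # Unit eigenvector for lam1 (the strong component)
    if b != 0:
        vx, vy = lam1 - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    scores = [x * vx + y * vy for x, y in centered]
    return scores, (lam1, lam2), (vx, vy)


points = [(1, 2), (2, 4), (3, 6), (4, 8)]
scores, (lam1, lam2), v = pca_2d_to_1d(points)
print(lam1, lam2)  # variance along the strong and weak components
print(scores)      # one coordinate per point in the new feature space
```

Because these sample points lie exactly on the line y = 2x, the second eigenvalue is numerically zero: the weak component carries no variance, so dropping it loses no information.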
4. Which statement about discretization methods is wrong?
Answer: Clustering analysis can only be used as a top-down split.
5. Which statement about equal-width (distance) partitioning and equal-depth (frequency) partitioning is wrong?
Answer: The intervals produced by the former are not of equal width.
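The difference is easy to see on a small example. This sketch, with hypothetical function names and a small hypothetical price list, computes the interval edges for equal-width partitioning and the bins for equal-depth partitioning.

```python
def equal_width_bins(values, k):
    """Return the k + 1 edges of k intervals that all have the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


def equal_depth_bins(values, k):
    """Split the sorted values into k bins holding (roughly) equal counts."""
    s = sorted(values)
    n = len(s)
    return [s[i * n // k:(i + 1) * n // k] for i in range(k)]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(equal_width_bins(prices, 3))  # [4.0, 14.0, 24.0, 34.0]
print(equal_depth_bins(prices, 3))  # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
```

Equal-width intervals all have the same width (10 here) but hold different numbers of values; equal-depth bins all hold the same number of values (3 here) but span intervals of different widths.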
6. Which of the following is not a valid way to normalize data?
Answer: Simple scaling
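For contrast with the rejected option, here is a minimal sketch of three normalization methods that are commonly taught: min-max normalization, z-score normalization, and normalization by decimal scaling. The function names and sample values are assumptions for illustration.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map [min, max] onto [new_min, new_max] (values must not all be equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer making every |v'| < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


incomes = [200, 300, 400, 600, 1000]
print(min_max(incomes))          # [0.0, 0.125, 0.25, 0.5, 1.0]
print(z_score(incomes))
print(decimal_scaling(incomes))  # [0.02, 0.03, 0.04, 0.06, 0.1]
```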
7. Which of the following are valid ways to fill in missing values?
Answer: Smart mean;
Probable value;
Ignore
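A minimal sketch of these strategies on rows stored as dictionaries. We assume "smart mean" refers to filling a gap with the attribute mean of the samples in the same class (a smarter variant of the plain attribute mean, also shown); "probable value" would need an inference model such as regression or a decision tree and is not sketched. The field names `income` and `cls`, the helper names, and all values are hypothetical.

```python
import statistics


def fill_with_mean(rows, attr):
    """Replace missing values (None) of attr with the attribute mean over all rows."""
    mean = statistics.mean([r[attr] for r in rows if r[attr] is not None])
    return [{**r, attr: mean if r[attr] is None else r[attr]} for r in rows]


def fill_with_class_mean(rows, attr, label):
    """'Smart mean': use the mean of attr among rows with the same class label.

    Assumes every class has at least one known value of attr.
    """
    means = {}
    for cls in {r[label] for r in rows}:
        known = [r[attr] for r in rows if r[label] == cls and r[attr] is not None]
        means[cls] = statistics.mean(known)
    return [{**r, attr: means[r[label]] if r[attr] is None else r[attr]} for r in rows]


def drop_incomplete(rows, attr):
    """'Ignore': discard the tuples whose value of attr is missing."""
    return [r for r in rows if r[attr] is not None]


rows = [
    {"income": 30, "cls": "low"},
    {"income": None, "cls": "low"},
    {"income": 34, "cls": "low"},
    {"income": 90, "cls": "high"},
    {"income": 110, "cls": "high"},
]
print(fill_with_mean(rows, "income")[1])
print(fill_with_class_mean(rows, "income", "cls")[1])
print(len(drop_incomplete(rows, "income")))
```

The plain mean (66) is pulled up by the "high" class, while the class-conditional mean (32) stays close to the other "low" incomes, which is why it is considered the smarter choice.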
8. Which of the following are valid ways to handle noisy data?
Answer: Regression;
Cluster;
WT (wavelet transform);
Manual
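Regression smoothing, the first option listed, fits a function to the data and replaces each noisy value by the fitted one. The sketch below assumes a single predictor and a straight-line least-squares fit; the function name and data are hypothetical.

```python
def smooth_by_regression(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return the fitted values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    # Each noisy y is replaced by its value on the fitted line
    return [a + b * x for x in xs]


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]  # roughly y = 2x plus noise
smoothed = smooth_by_regression(xs, ys)
print([round(v, 2) for v in smoothed])  # [2.06, 4.03, 6.0, 7.97, 9.94]
```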
9. Which of the following statements about wavelet transforms are correct?
Answer: The DWT decomposes each segment of a time series via the successive use of low-pass and high-pass filtering at appropriate levels;
Wavelet transforms can be used for reducing data and smoothing data.
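A minimal sketch of a multi-level discrete wavelet transform with the Haar wavelet: at each level a low-pass step (pairwise averages) and a high-pass step (pairwise half-differences) are applied, and the next level works on the low-pass output. This unnormalized averaging/differencing form is an assumption chosen for readable numbers (the orthonormal Haar transform scales by 1/√2 instead), and the function names are hypothetical. Zeroing detail coefficients before inverting illustrates the smoothing and reduction uses.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (low-pass) and half-differences (high-pass)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def haar_dwt(signal, levels):
    """Apply haar_step repeatedly to the low-pass output.

    len(signal) must be divisible by 2**levels.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        approx, high = haar_step(approx)
        details.append(high)  # details[0] is the finest level
    return approx, details


def haar_inverse(approx, details):
    """Undo haar_dwt: each (a, d) pair expands back to (a + d, a - d)."""
    signal = list(approx)
    for high in reversed(details):
        signal = [v for a, d in zip(signal, high) for v in (a + d, a - d)]
    return signal


series = [2, 2, 0, 2, 3, 5, 4, 4]
approx, details = haar_dwt(series, levels=3)
print(approx, details)  # [2.75] [[0.0, -1.0, -1.0, 0.0], [0.5, 0.0], [-1.25]]

# Smoothing / reduction: drop the finest detail level, then reconstruct
smoothed = haar_inverse(approx, [[0.0] * 4] + details[1:])
print(smoothed)  # [2.0, 2.0, 1.0, 1.0, 4.0, 4.0, 4.0, 4.0]
```

Keeping only 4 of the 8 coefficients (the approximation plus the two coarser detail levels) still reconstructs a smoothed version of the series.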
10. Which of the following are commonly used sampling methods?
Answer: Simple random sample without replacement;
Simple random sample with replacement;
Stratified sample;
Cluster sample
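The four sampling schemes can be sketched with Python's `random` module. The helper names, the 80/20 split into `young` and `senior` strata, and the grouping of records into clusters of 10 are assumptions for illustration; a seeded `random.Random` keeps the run reproducible.

```python
import random


def srswor(data, n, rng):
    """Simple random sample without replacement: no item is drawn twice."""
    return rng.sample(data, n)


def srswr(data, n, rng):
    """Simple random sample with replacement: an item may be drawn repeatedly."""
    return [rng.choice(data) for _ in range(n)]


def stratified_sample(data, key, frac, rng):
    """Draw the same fraction from every stratum, so small groups stay represented."""
    strata = {}
    for item in data:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * frac))
        sample.extend(rng.sample(group, k))
    return sample


def cluster_sample(clusters, m, rng):
    """Pick m whole clusters at random and keep every item in them."""
    chosen = rng.sample(clusters, m)
    return [item for cluster in chosen for item in cluster]


rng = random.Random(42)
data = [("young", i) for i in range(80)] + [("senior", i) for i in range(20)]
clusters = [data[i:i + 10] for i in range(0, len(data), 10)]  # e.g. 10 records per page

s1 = srswor(data, 10, rng)
s2 = srswr(data, 10, rng)
st = stratified_sample(data, key=lambda r: r[0], frac=0.1, rng=rng)
cs = cluster_sample(clusters, 2, rng)
print(len(s1), len(s2), len(st), len(cs))  # 10 10 10 20
```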
11. Discretization means dividing the range of a continuous attribute into intervals.
Answer: True
Week 3. Instance-based learning Test 3
1. What is the difference between an eager learner and a lazy learner?
Answer: Eager learners generate a model for classification, while lazy learners do not.
2. How should the optimal value of K be chosen?
Answer: Cross-validation can be used to determine a good value, using an independent dataset to validate the candidate K values;
Low values of K (such as K = 1 or K = 2) can be noisy and sensitive to outliers;
Historically, the optimal K for most datasets has been between 3 and 10.
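A minimal sketch of choosing K by cross-validation, here leave-one-out: each sample is held out once, classified by KNN on the remaining samples, and the candidate K with the highest accuracy wins. The toy one-dimensional data deliberately contain one mislabeled point at 1.6; the function names and data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training samples nearest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # On a tie, most_common returns the label encountered first, i.e. the nearer one
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loocv_accuracy(data, k):
    """Leave-one-out cross-validation: each sample is held out exactly once."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, x, k) == y:
            correct += 1
    return correct / len(data)


def choose_k(data, candidates):
    """Best LOOCV accuracy wins; ties go to the smaller K."""
    return max(candidates, key=lambda k: (loocv_accuracy(data, k), -k))


data = [((0.0,), "A"), ((1.0,), "A"), ((2.0,), "A"), ((3.0,), "A"),
        ((1.6,), "B"),  # deliberately mislabeled (noisy) point
        ((10.0,), "B"), ((11.0,), "B"), ((12.0,), "B"), ((13.0,), "B")]
for k in (1, 3, 5, 7):
    print(k, round(loocv_accuracy(data, k), 3))
print("chosen K:", choose_k(data, [1, 3, 5, 7]))  # chosen K: 3
```

With K = 1 the mislabeled point misleads its neighbors, while K = 3 outvotes it, matching the warning that very small K values are sensitive to noise and outliers.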
3. What are the major components of KNN?
Answer: How to measure similarity;
How to choose K;
How to assign class labels
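The three components map directly onto a small KNN classifier. In this sketch (the class and method names are hypothetical), similarity is Euclidean distance, K is a constructor parameter, and labels are assigned by majority vote. Note that `fit` only stores the training data, which is what makes KNN a lazy learner.

```python
import math
from collections import Counter


class KNNClassifier:
    """A lazy learner: fit() only stores the data; all work happens in predict()."""

    def __init__(self, k=3, distance=math.dist):
        self.k = k                # component 2: how many neighbors to consult
        self.distance = distance  # component 1: how similarity is measured
        self.train = []

    def fit(self, X, y):
        # No model is built here; the training samples are simply memorized
        self.train = list(zip(X, y))
        return self

    def predict_one(self, x):
        nearest = sorted(self.train, key=lambda item: self.distance(item[0], x))[: self.k]
        # Component 3: assign the label by majority vote among the K neighbors
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
y = ["red", "red", "red", "blue", "blue", "blue"]
clf = KNNClassifier(k=3).fit(X, y)
print(clf.predict([(1.5, 1.5), (6.5, 6.5)]))  # ['red', 'blue']
```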
4. Which of the following can be used to obtain attribute weights for attribute-weighted KNN?
Answer: Prior knowledge / experience;
PCA, FA (factor analysis);
Information gain;
Gradient descent, simplex methods, and genetic algorithms
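Once weights are obtained by any of these methods, a common way to use them is a weighted Euclidean distance, sqrt(sum_i w_i * (a_i - b_i)^2). The sketch below assumes the weights are already given and does not compute them; the data (the second attribute is deliberately noisy and large-scale), the weight values, and the function names are hypothetical.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (a_i - b_i)**2)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def weighted_knn_predict(train, query, k, weights):
    """Majority vote among the k neighbors nearest under the weighted distance."""
    nearest = sorted(train, key=lambda item: weighted_distance(item[0], query, weights))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Attribute 0 separates the classes; attribute 1 is large-scale noise
train = [((1.0, 50.0), "A"), ((1.2, 90.0), "A"),
         ((5.0, 10.0), "B"), ((5.2, 55.0), "B")]
query = (1.1, 56.0)
print(weighted_knn_predict(train, query, 1, (1.0, 1.0)))   # B
print(weighted_knn_predict(train, query, 1, (1.0, 0.01)))  # A
```

With equal weights the noisy second attribute dominates and the query is labeled B; shrinking its weight lets the relevant first attribute decide, giving A.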
5. At the learning stage, KNN finds the K closest neighbors and then classifies the sample according to the labels of those K nearest neighbors.
Answer: False