
Date: 2023-12-09  Source: hfw.cc  Author: hfw.cc



DAT 560M – Big Data and Cloud Computing 2023 – Homework #4
DAT 560M: Big Data and Cloud Computing
Fall 2023, Mini B
Homework #4
INSTRUCTIONS
1. This is an individual assignment. You may not discuss your approach to solving these
questions with anyone, other than the instructor or TA.
2. Please include only your Student ID on the submission.
3. The only allowed material is:
a. Class notes
b. Content posted on Canvas
c. Use ONLY the code we practiced in class. Anything beyond that will not receive any points!
4. You are not permitted to use other online resources.
5. The physical submission is due by the next lab.
6. There will be TA office hours. See the schedule on Canvas.
ASSIGNMENT
In this assignment, we are going to practice Spark on a file named loans.csv, which is located
in our database. If you don’t have the file, you can get it from the dataset folder on the server:
http://server-ip/dataset/loans.csv
This dataset has information about loans distributed to poor and financially excluded people
around the world by a company called Kiva. The dataset has only a few columns, and we would
like to analyze it with PySpark. Please answer each question and provide a screenshot.
Part 1- Initialize Spark (5 pts)
1- Start the PySpark engine and load the file. This homework is a little complex, so it is
better to assign more resources. When initializing your engine, you can assign all
available CPU cores on your machine to Spark so that it runs faster. To do that,
simply put local[*] instead of local (look at the following screenshot). If it crashes or
doesn’t work properly, you are more than welcome to go back to the normal initialization
process. (2 pts)
2- Get to know the dataset and do a preliminary examination (for example, column types,
summary, …). (2 pts)
3- There are two identifiers for the loan receiver's country, country and country_code,
and one is enough. Please drop country_code. (1 pt)
Part 2- Data analysis (50 pts)
4- Find the three sectors that were awarded the most loans, considering only loans with an
amount larger than 1000. (5 pts)
5- For the top sector you found in Q4, list the 6 most common activities. (5 pts)
6- Find the number of loans given per year. For that, use the year from posted_time. You
may add a new column called “year”. (5 pts)
7- Using SQL syntax, list the number of loans per sector in decreasing order, restricted to
the top 3 countries by number of loans received. (10 pts)
8- Find the top 20 countries in terms of the total loan amount they have received, considering
only loans whose use includes the word “stock”. You may use SQL. (5 pts)
9- Do a word count on the “use” column. For that, convert all text to lowercase. If you can,
it’s great to remove stopwords before doing the word count. It’s OK if you don’t know how
to do so. (10 pts)
10- Group the loans into 5 categories. If the loan amount is greater than or equal to 50000,
call it “super large”. If it is less than that but greater than or equal to 10000, call it
“large”. If it is less than that but greater than or equal to 5000, call it “medium”. If it is
less than that but greater than or equal to 1000, call it “small”. Otherwise, call it “tiny”.
Then, find the number of loans given in each category per gender. For gender, only
consider “male” or “female”. (10 pts)
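The thresholds in Q10 can be sketched in plain Python to check the boundary cases before writing the Spark version. This is only an illustration of the rule, not the PySpark code practiced in class. The function names and the sample rows are hypothetical, and the sketch assumes that "only consider male or female" means skipping any row whose gender value is not exactly one of those two strings.

```python
from collections import Counter


def loan_size_category(loan_amount: float) -> str:
    """Map a loan amount to one of the five size labels described in Q10."""
    if loan_amount >= 50000:
        return "super large"
    if loan_amount >= 10000:
        return "large"
    if loan_amount >= 5000:
        return "medium"
    if loan_amount >= 1000:
        return "small"
    return "tiny"


def count_by_category_and_gender(rows):
    """Count loans per (category, gender) for (loan_amount, gender) pairs.

    Assumption: rows whose gender is not exactly "male" or "female"
    (for example, multi-borrower values) are skipped.
    """
    counts = Counter()
    for loan_amount, gender in rows:
        if gender in ("male", "female"):
            counts[(loan_size_category(loan_amount), gender)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from loans.csv.
    sample = [(500, "male"), (1500, "female"), (60000, "male")]
    for key, n in sorted(count_by_category_and_gender(sample).items()):
        print(key, n)
```

Checking the conditions from the largest threshold down means each test only needs a lower bound, which mirrors the "if less but larger or equal to" wording of the question.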
Part 3- Feature engineering (10 pts)
11- Let’s find how many people are involved in each loan application. To find out, look at
the gender column. Sometimes it holds one value, and sometimes more than one.
Count the values for each loan and add the count to the dataframe as a new column. (10 pts)
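The counting idea in Q11 can be sketched in plain Python as follows. The handout does not say how multiple values are separated inside the gender column, so the comma separator here is an assumption to verify against the data, and `count_borrowers` is a hypothetical helper name rather than anything from the course material.

```python
def count_borrowers(gender_field, sep=","):
    """Return the number of non-empty values in a multi-valued gender string.

    Assumption: values are separated by `sep` (a comma by default).
    A missing or empty field counts as zero borrowers.
    """
    if gender_field is None:
        return 0
    return sum(1 for part in gender_field.split(sep) if part.strip())


if __name__ == "__main__":
    # Hypothetical example values, not taken from loans.csv.
    for value in ("female", "female, male, female", ""):
        print(repr(value), count_borrowers(value))
```

Stripping each part before counting avoids treating stray whitespace or a trailing separator as an extra borrower.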
Part 4- Machine learning (35 pts)
12- Now let’s focus only on the Retail, Agriculture, and Food sectors and remove the rest of
the rows. (10 pts)
13- We would like to predict loan_amount based on sector, country, term_in_months, year,
and the new attribute you added in Q11. Drop the rest of the columns. (5 pts)
14- Prepare your data to do a prediction task. We are interested in predicting the loan amount
based on the rest of the features. (10 pts)
15- Perform a regression task and find the Mean Squared Error and R-squared of the model
(80% training, 20% testing). (10 pts)
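As a reminder of what the two metrics in Q15 measure, here is a minimal plain-Python sketch of their standard definitions. It is independent of Spark and is not a substitute for the evaluator used in class. The function names and sample numbers are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot: the share of variance in `actual` explained by the model."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical loan amounts and predictions for the 20% test split.
    y_true = [100.0, 200.0, 300.0]
    y_pred = [110.0, 190.0, 300.0]
    print(mean_squared_error(y_true, y_pred), r_squared(y_true, y_pred))
```

Both metrics should be computed on the held-out 20% test split, since values computed on the training split tend to overstate how well the model generalizes.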