International Student College Experience Enhancement Program

1 International Student College Experience Enhancement Program
Team Members: Alice Zhang, Florence Liao, Huan Guo, Jake Magner, Li Shubin, Viraj Mohan, Zahin Ali
Sections: Introduction, DP Summaries, Queries, Normalization, Forms

2 Project Background
Project Objective: To design a database for a website that helps international students with various aspects of “settling in” by providing a platform for interaction between students, local communities, cultural organizations, and employers.
Client: XiYiRen, a start-up social utility website, will use a small part of our expansive project, focusing on Chinese students.

3 DP I Summary
Progress:
- Project background: objective and client description
- Summary of entities involved
- Database capabilities
- Simplified EER diagram with 10 entities, 3 weak entities/relationships, and superclass/subclass division

4 DP II Summary
Progress:
- Revised simplified EER diagram, including more entities and 30 relationships
- Implementation of queries in relational algebra
- Realized the need for more complex queries utilizing IEOR methods: forecasting, optimal event locating, etc.

5 DP III Summary
Progress:
- Revised simplified EER diagram
- Relational schema
- Five queries implemented in SQL and Access
- Focused on client-centric queries

6 EER

7 Relational Schema
(A number in brackets after an attribute is the number of the relation it references.)
1. Person(Pid, Fname, Lname, MI, Birth_date, Profile[5])
2. Student(Pid[1], Housing[7], University[14], Pickup_Person[3], Flight, Country[11], price_preference, year, sleep, wakeup, study, friends, outgoing)
3. Community_Member(Pid[1], occupation)
4. Alumni(Pid[1], Class, Occupation, Donation_Amount)
5. Profile(Profile_id, Pic, Email, Phone)
6. Location(Street, City, State, Apt_Suite, Zip, x, y)
7. Housing(Hid, offered_by_person[1], Street[6], Apt_Suite[6], Zip[6], offered_by_org[8], org_profile[5], price, availability_date, furnished, number_rooms, number_bathrooms, water, electricity, garbage, gas, internet, move_in_special)
8. Organization(OrgName, Profile_id[5], Street[6], Apt_Suite[6], Zip[6], type, description)
9. Department(DepName, University[14])
10. Event(EventName, Profile_id[5], Street[6], Apt_Suite[6], Zip[6], description, attendance, date, time)
11. Country(Name, Capital, Population)
12. Language(Name, Countries_spoken_in)
13. Resource(Rid, Owner[1], Price, Quantity)
14. University(Name, student_population, ranking)
15. Donation(Did, Amount, Time, Date, Pid[1])

8 Relational Schema (contd)
16. Mentors(Mentor[1], Mentee[2])
17. Student_University(Student[2], University[14])
18. Person_in_Org(Person[1], OrgName[8], OrgProfile[5])
19. RSVP(Person[1], EventName[10], EventProfile[5], SurveyScore)
20. Student_in_Department(Student[2], DepName[9], UniName[14])
21. Person_speaks_language(Person[1], Language[12])
22. Housing_near_Uni(Housing[7], UniName[14])
23. Organization_University(OrgName[8], OrgProfile[5], UniName[14])
24. Org_holds_event(OrgName[8], OrgProfile[5], EventName[10], EventProfile[5])
25. Org_speaks_Language(OrgName[8], OrgProfile[5], Language[12])
26. Org_Country(OrgName[8], OrgProfile[5], Country[11])
27. Dep_sponsors_event(DepName[9], UniName[14], EventName[10], EventProfile[5])
28. Event_speaks_language(EventName[10], EventProfile[5], Language[12])
29. Event_country(EventName[10], EventProfile[5], Country[11])
30. Country_Language(Country[11], Language[12])
31. Alumni_Uni(Pid[4], UniName[14], class_of)
32. Alumni_Dept(Pid[4], DepName[9])
33. Person_gives_donation(Pid[1], Did[15])
34. Roommates(Pid1[1], Pid2[1])

9 Relational Design

10 Query 1: Roommate Matching
Description: Shows all possible roommate combinations ordered by MatchRating. A dorm or off-campus housing facility can use it to pair up students interested in its housing.
Description of Attributes:
Sleep     Early to late sleep time (scale of 1-5)
Wakeup    Early to late wake-up time (1-5)
Outgoing  Outgoingness level (1-5)
Study     In room (1) to library (5)
Friends   Having friends in the room: never (1) to always (5)

11 Query 1: Roommate Matching
SQL Code:
    SELECT P.Fname, P.Lname, Q.Fname, Q.Lname,
           Min(0.2*Abs(S.sleep-R.sleep) + 0.2*Abs(S.wakeup-R.wakeup)
             + 0.2*Abs(S.outgoing-R.outgoing) + 0.2*Abs(S.study-R.study)
             + 0.2*Abs(S.friends-R.friends)) AS Matchrating
    FROM Student AS S, Student AS R, Person AS P, Person AS Q
    WHERE S.pid = P.pid AND Q.pid = R.pid AND Q.pid < P.pid
    GROUP BY P.Fname, P.Lname, Q.Fname, Q.Lname
    HAVING ((P.Fname = Q.Fname) AND (P.Lname = Q.Lname)) = False
    ORDER BY Min(0.2*(20-Abs(S.sleep-R.sleep)) + 0.2*(20-Abs(S.wakeup-R.wakeup))
               + 0.2*(20-Abs(S.outgoing-R.outgoing)) + 0.2*(20-Abs(S.study-R.study))
               + 0.2*(20-Abs(S.friends-R.friends)));
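The rating in the query is an equally weighted sum of absolute differences over the five 1-5 preference attributes. A minimal Python sketch of the same arithmetic follows, assuming each student is a plain dictionary (the `name` key and the sample data layout are hypothetical). Sorting ascending, so that the most similar pairs come first, is an assumption about the intended reading of "ordered by MatchRating".

```python
from itertools import combinations

# The five preference attributes used by the query, each on a 1-5 scale.
TRAITS = ("sleep", "wakeup", "outgoing", "study", "friends")


def match_rating(a, b, weight=0.2):
    """Weighted sum of absolute differences; 0 means identical preferences."""
    return sum(weight * abs(a[t] - b[t]) for t in TRAITS)


def rank_pairs(students):
    """Return (rating, name_a, name_b) for every unordered pair, lowest rating first."""
    pairs = [
        (match_rating(a, b), a["name"], b["name"])
        for a, b in combinations(students, 2)
    ]
    return sorted(pairs)
```

Using `combinations` plays the role of the `Q.pid < P.pid` condition: each pair is produced once and no student is paired with themselves.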

12 Query 1: Roommate Matching

13 Query 2: New Student Forecasting
Description: Extracts how many new students come each year, which can then be used to forecast the future number of students. The year table is a one-attribute table containing a list of years.
Uses the regression equation y = a + bx with slope b = (N∑XY - (∑X)(∑Y)) / (N∑X² - (∑X)²) and intercept a = (∑Y - b(∑X)) / N, where N = number of tuples, X = year, and Y = number of students.
SQL Code:
    SELECT y.year AS [Year], Count(s.pid) AS Number_Of_Students, u.name AS University
    FROM [year] AS y, student AS s, university AS u
    WHERE s.year = y.year AND s.university = u.name
    GROUP BY y.year, u.name
    ORDER BY y.year;
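The regression step happens outside the SQL query. A short Python sketch of the slope and intercept formulas above follows, assuming the query output for one university has been read into two parallel lists (the function names `fit_line` and `forecast` are hypothetical).

```python
def fit_line(years, counts):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(years)
    sum_x = sum(years)
    sum_y = sum(counts)
    sum_xy = sum(x * y for x, y in zip(years, counts))
    sum_xx = sum(x * x for x in years)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def forecast(years, counts, target_year):
    """Predicted number of new students in target_year."""
    a, b = fit_line(years, counts)
    return a + b * target_year
```

At least two distinct years are needed, otherwise the denominator of the slope is zero.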

14 Query 3: Event Interest
Description: Outputs a list of all events along with their computed attendance rate, the average level of student interest, and a metric combining surveyed interest with actual attendance. Organizations throwing events with low attendance but high survey scores may need to look into changing venues or increasing advertising.
SQL Code:
    SELECT e.EventName,
           e.Attendance/Count(r.person) AS Attendance_Rate,
           Avg(r.SurveyScore) AS Surveyed_Interest,
           Avg(r.SurveyScore)*e.Attendance/Count(r.person) AS Interest_Metric
    FROM Event AS e, RSVP AS r
    WHERE r.EventProfile = e.Profile_id
    GROUP BY e.EventName, e.Profile_id, e.Attendance
    ORDER BY Avg(r.SurveyScore)*e.Attendance/Count(r.person) DESC;
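The three computed columns can be checked by hand for a single event. The Python sketch below mirrors the query's arithmetic, assuming the event's recorded attendance and the list of survey scores from its RSVPs are already available (the function name `event_metrics` is hypothetical).

```python
def event_metrics(attendance, survey_scores):
    """Return (attendance_rate, surveyed_interest, interest_metric) for one event.

    attendance    -- recorded attendance of the event
    survey_scores -- one survey score per RSVP
    """
    rsvps = len(survey_scores)
    if rsvps == 0:
        raise ValueError("an event needs at least one RSVP")
    attendance_rate = attendance / rsvps
    surveyed_interest = sum(survey_scores) / rsvps
    return attendance_rate, surveyed_interest, surveyed_interest * attendance_rate
```

An event with a high surveyed interest but a low attendance rate is the case the description flags for a venue or advertising change.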

15 Query 3: Event Interest

16 Query 4: Optimal Event Location
Description: Selects the optimal potential event location on the UC Berkeley campus in relation to attendee housing locations. Uses the P-median approach to event location, which minimizes the total demand-weighted distance. Assume P = 1 and calculate D_ij using the Euclidean distance formula: D_ij = ((x_i - x_j)² + (y_i - y_j)²)^0.5

17 Query 4: Optimal Event Location
SQL Code:
    SELECT e.EventName, l2.street AS Potential_Location,
           Sum(((l.x-l2.x)^2 + (l.y-l2.y)^2)^0.5) AS distance,
           Avg(s.EventInterest) AS Demand
    FROM Student AS s, RSVP AS p, Housing AS h, location AS l, location AS l2, Event AS e
    WHERE s.PID = p.person AND p.EventName = e.EventName AND s.housing = h.hid
      AND h.street = l.street AND h.state = l.state AND h.city = l.city
      AND h.apt_suite = l.apt_suite AND h.zip = l.zip
    GROUP BY e.EventName, l2.street
    ORDER BY e.EventName, Sum(((l.x-l2.x)^2 + (l.y-l2.y)^2)^0.5);
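With P = 1, the P-median choice reduces to picking the candidate location with the smallest total (optionally demand-weighted) Euclidean distance to the attendees' housing. A minimal Python sketch follows, assuming coordinates are given as (x, y) tuples; the function names, the candidate dictionary, and the optional `weights` argument are illustrative assumptions rather than part of the original query.

```python
import math


def total_weighted_distance(site, homes, weights=None):
    """Sum of Euclidean distances from site to each home, scaled by optional demand weights."""
    if weights is None:
        weights = [1] * len(homes)
    sx, sy = site
    return sum(
        w * math.hypot(hx - sx, hy - sy)
        for (hx, hy), w in zip(homes, weights)
    )


def best_site(candidates, homes, weights=None):
    """Return the name of the candidate site with the smallest total weighted distance."""
    return min(
        candidates,
        key=lambda name: total_weighted_distance(candidates[name], homes, weights),
    )
```

Here `candidates` maps a location name to its coordinates, in the same way the query ranks every row of `location AS l2` for each event.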

18 Query 4: Optimal Event Location

19 Query 5: Min Airport Pick-up Cost
Description:
Assumptions:
(1) Only students who arrive at the airport between 8:00 am and 7:59 pm are taken into account.
(2) Buses leave the airport on the hour.
(3) The opportunity cost of each student waiting an hour for a bus is $10.
(4) Each type I bus has a total of 5 seats and each type II bus has a total of 10 seats.
(5) Only the arrival hour of each student is considered: a student arriving at 1:01 pm is treated the same as a student arriving at 1:59 pm in this query implementation.
Sending a five-seat vehicle and a ten-seat vehicle to the airport and back costs $50 and $100, respectively.
For a given date and airport, extract the number of students arriving in each time interval, C_i, for A ≤ i ≤ L. C_i is interpreted as the number of students arriving at the airport no earlier than (i-1) o’clock but prior to i o’clock.

20 Query 5: Min Airport Pick-up Cost
Formulation:
Decision variables: t_ij = 1 if a type j bus is arranged to pick up students at i o’clock; t_ij = 0 otherwise (for A ≤ i ≤ L, 1 ≤ j ≤ 2).
Objective function (cost minimization):
Subject to: People_constrain {Z in A,B,C,D,E,F,G,H,I,J,K,L}:
SQL Code:
    SELECT s.airport AS Airport, s.arr_date AS Arr_Date,
           s.flight_arr_hour AS Arr_Time, Count(*) AS Number_of_Students
    FROM student AS s
    GROUP BY s.flight_arr_hour, s.arr_date, s.airport;
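The objective function and constraints are not reproduced in this transcript, so the following Python sketch is only one plausible reading of the stated assumptions, not the team's formulation. It assumes that at most one bus of each type can be sent per hour (from the binary t_ij), that a student left behind at a departure pays $10 for the extra hour, that a type I or type II round trip costs $50 or $100, and that everyone must be picked up by the last departure. The function name `min_pickup_cost` and the dynamic-programming approach are assumptions.

```python
from functools import lru_cache

# (seats, cost) for each hourly choice: no bus, type I, type II, or both types.
BUS_OPTIONS = [(0, 0), (5, 50), (10, 100), (15, 150)]
WAIT_COST = 10  # dollars per student per extra hour of waiting


def min_pickup_cost(arrivals):
    """Minimum total cost (vehicles plus waiting) for one date and airport.

    arrivals[k] is the number of students ready for the k-th hourly departure,
    i.e. the C_i values produced by the SQL query, in time order.
    Returns float('inf') if the last departure cannot clear everyone.
    """
    n = len(arrivals)

    @lru_cache(maxsize=None)
    def best(hour, waiting):
        if hour == n:
            return 0 if waiting == 0 else float("inf")
        ready = waiting + arrivals[hour]
        costs = []
        for seats, bus_cost in BUS_OPTIONS:
            left = max(0, ready - seats)  # students who miss this departure
            costs.append(bus_cost + WAIT_COST * left + best(hour + 1, left))
        return min(costs)

    return best(0, 0)
```

For example, with two students ready at the first departure and three at the second, letting the first two wait an hour and sending a single type I bus is cheaper than sending two buses.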

21 Query 5: Min Airport Pick-up Cost

22 Normalization Analysis: 1NF
R is in 1NF if the domain of every attribute includes only atomic (simple, indivisible) values and the value of any attribute in a tuple is a single value from the domain of that attribute.
Profile(Profile_id, Pic, Emails, Phones)
→ Pic(Profile_id, Pic)
  Email(Profile_id, Email)
  Phone(Profile_id, Phone)
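The decomposition turns each multi-valued attribute into its own relation with one atomic value per tuple. A small Python sketch of that step follows, assuming a profile is held as a dictionary whose `pics`, `emails`, and `phones` entries are lists (this in-memory layout and the function name `split_profile` are hypothetical).

```python
def split_profile(profile):
    """Split one profile with list-valued fields into atomic (profile_id, value) rows."""
    pid = profile["profile_id"]
    return {
        "Pic": [(pid, pic) for pic in profile["pics"]],
        "Email": [(pid, email) for email in profile["emails"]],
        "Phone": [(pid, phone) for phone in profile["phones"]],
    }
```

Each resulting list corresponds to the rows contributed by this profile to the Pic, Email, and Phone relations.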

23 Normalization Analysis: 2NF
R is in 2NF if R is in 1NF and every nonprime attribute A in R is fully functionally dependent on the primary key of R.
Location(Street, City, State, Apt_Suite, Zip, x, y)
Assumption: ZIP_CODE determines CITY and STATE.
→ Location1(Street, Apt_Suite, Zip, x, y)
  Zip(Zip, City, State)
Organization(OrgName, Profile_id[5], Street[6], Apt_Suite[6], Zip[6], type, description)
Assumption: The name of an organization determines its type.
→ OrgName(OrgName, Type)
  Organization1(OrgName, Profile_id[5], Street[6], Apt_Suite[6], Zip[6], description)
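Whether an assumed dependency such as Zip → City, State actually holds in sample data can be checked mechanically before decomposing. The Python sketch below assumes rows are dictionaries keyed by lowercase attribute name; `fd_holds` and `project` are hypothetical helper names.

```python
def fd_holds(rows, lhs, rhs):
    """True if every combination of lhs values maps to exactly one combination of rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Relational projection with duplicate elimination, keeping first-seen order."""
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in result:
            result.append(t)
    return result
```

If `fd_holds(rows, ["zip"], ["city", "state"])` is true, `project(rows, ["zip", "city", "state"])` yields the tuples of the Zip relation and projecting onto the remaining attributes yields Location1.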

24 Normalization Analysis: 3NF
R is in 3NF if R is in 2NF and no nonprime attribute of R is transitively dependent on the primary key.
Housing(Hid, offered_by_person[1], Street[6], Apt_Suite[6], Zip[6], offered_by_org[8], org_profile[5], price, availability_date, furnished, number_rooms, number_bathrooms, water, electricity, garbage, gas, internet, move_in_special, ready_to_move_in)
Assumption: For a housing place to be “ready to move in”, it has to have Internet, water, electricity, gas, and garbage.
→ Housing1(Hid, offered_by_person[1], Street[6], Apt_Suite[6], Zip[6], offered_by_org[8], org_profile[5], price, availability_date, furnished, number_rooms, number_bathrooms, move_in_special, Water, Electricity, Garbage, Gas, Internet)
  Ready_to_move_in(ready_to_move_in, Water, Electricity, Garbage, Gas, Internet)

25 Normalization Analysis: BCNF
R is in BCNF if, whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R.
Student(Pid[1], Housing[7], University[14], Pickup_Person[3], Flight, Country[11], price_preference, year, sleep, wakeup, study, friends, outgoing)

26 Organization Form

27 Person Form

28 Student Report

29 Questions?

