אינדקסינג והשינג (indexing & hashing)

אינדקסינג והשינג (indexing & hashing)
אמצעים לאיתור רשומות בקובץ, למשל לצורך ביצוע שאילתא. קובץ האינדקס מכיל רשומות (כניסות אינדקס) בעלות מבנה: (מפתח חיפוש, פוינטר) כאשר: מפתח החיפוש הוא השדה על פיו מחפשים מידע בקובץ פוינטר הוא מצביע לרשומה בקובץ הנתונים שני סוגים עיקריים של אינדקסים: אינדקסים מסודרים (ordered indices) אינדקסי הש (hash indices)

אינדקסים סדורים (orderd indices)
שיטה לגישה מהירה: סדרתית ואקראית. כל אינדקס מאופיין על ידי מפתח חיפוש. נניח כי כל הקבצים בבסיס הנתונים הם קבצים ממוינים. אינדקס ראשי (primary index): אינדקס שמפתח החיפוש שלו מגדיר את סדר הרשומות בקובץ. אינדקס משני (secondary index): כל אינדקס שאינו ראשי.

קובץ סדרתי לרשומות היחס deposit
מתאים הן לגישה סדרתית והן לגישה אקראית.

אינדקס צפוף (dense index)
אינדקס צפוף: אינדקס הכולל רשומת אינדקס אחת לכל ערך של מפתח החיפוש.

אינדקס דליל (sparse index)
אינדקס דליל: אינדקס הכולל רשומת אינדקס רק לחלק מהערכים של מפתח החיפוש.

אינדקס דליל בעל שתי רמות
כאשר קבצי האינדקס גדלים יעילותם קטנה. במקרה זה ניתן לבצע אינדקס על קובץ האינדקס. block 1 block 2 block 65 level 2 index (outer index) block 66 level 1 index (inner index) data file

אינדקס משני (secondary index)
הקובץ אינו ממוין לפי מפתח החיפוש של האינדקס. Secondary index with buckets block 1 block 2 Block 3 block 4 block 5 Block 6 secondary index Block 7 buckets data file

שיטות למימוש אינדקס משני
אינדקס משני צפוף המכיל כתובות לכל רשומות קובץ הנתונים. (כפילות של ערכי מפתח החיפוש באינדקס). רשומת אינדקס אחת לכל ערך של מפתח חיפוש עם פוינטרים אחדים לכל רשומה. רשומת אינדקס אחת לכל ערך של מפתח חיפוש עם הפנייה ל”דלי” (bucket) של פוינטרים לקובץ הנתונים..

עץ (B+ tree index) B+ יעילותו של מבנה אינדקס סדרתי פוחתת עם גודל הקובץ. מבנה נתונים של עץ מאוזן שומר על יעילות גבוהה, במחיר של תקורת זיכרון. צומת אופיינית של העץ: K1, K2,… Kn-1 הם ערכים של מפתח החיפוש P1, P2,… Pn הם מצביעים. P1 K1 P2 K2 Pn-1 Kn-1 Pn

הערכים של מפתח החיפוש מסודרים בכל צומת כך ש K1< K2<…< Kn-1
המצביעים של צמתי הביניים (לא העלים) מורים על תתי עצים כדלקמן: P1 Pj Kj Pj+1 Pn K <Kj K  Kj

המצביעים P1,… Pn-1 בצמתי העלים מצביעים על רשומות בקובץ הנתונים
K1 Pj Kj Pn-1 Kn-1 Pn רשומות המתאימות ל K1 רשומות המתאימות ל Kj רשומות המתאימות ל Kn-1

כללים החלים על מבנה עץ B+
כללים החלים על מבנה עץ B+ אם השורש אינו עלה, חייבים להיות לו לפחות שני בנים. לעץ מסדר n , לכל צומת (פרט לשורש ולעלים) בין n/2 ל n בנים. לעץ מסדר n , לכל עלה בין (n-1)/2 ל n-1 בנים.(רשומות בקובץ הנתונים). מספר ערכי מפתח החיפוש בצומת שאינו עלה קטן ב 1 ממספר המצביעים. על העץ להיות מאוזן. העלים מקושרים זה לזה בסדר מפתח החיפוש.

דוגמה למבנה עץ B+ עץ עם n=3 לקובץ deposit . pointers to file records
Merkazi Eilat Pinkas Aviv Darom Eilat Merkazi Pinkas Tsafon pointers to file records

B+ tree insertions Find the leaf node where an entry belongs and insert it there. Use the search algorithm to find the appropriate position. If the entry fits in the node (it is not full) the process is finished. If the node is already full split it. Half of the entries go in the old node and half in the new node. With the creation of a new node an entry has to be inserted in the penultimate level of the tree. Adding an entry to this level may also result in a split if the node is full. A new node is created by splitting the old node. One of the values is pushed up to the next level of the tree (removed from the current level). This, in turn may produce a new node. This process is repeated until no new nodes are created or a new root node has been created. Penultimate = next to last

B+ Tree Deletions Find the entry in the leaf node and delete it. This may result in too few entries (less than (n–1)/2) in which case either the predecessor or successor of the node is also affected (pick one) and one of two things occur. If the values from the two nodes fit in one node the nodes are coalesced. Keep track of the value in the parent between the pointers to the two nodes. Combine the two nodes into one. If the node is a leaf node re-connect the links between leaves. Delete (using this procedure) the value in the parent. If the values from the two nodes don’t fit in one node the values are rearranged. Take one value and pointer from the other node. This will create a new initial value in one node. Replace the appropriate parent value with this. If, after recursive deletions, the root node has only one child, make this child the new root and delete the old root.

הגדרת אינדקס ב SQL create [unique] index index-name on relation-name (attribute-list) כדוגמה: create index br_indx on branch (branch-name) drop index index-name השמטת אינדקס:

אינדקסי הש (hash indices)
פונקצית השינג מחשבת כתובת של בלוק. שדה/שדות ההש (hash field): שדה/שדות שעליהם פועלת פונקצית ההשינג לחישוב הכתובת.

דוגמת השינג h(x)= (…) mod 5 hash table for deposit . key= branch-name
Pinkas h(x)= (…) mod 5 Bucket 0 Tsafon Darom Darom Bucket 1 Darom Darom Eilat Bucket 2 Merkazi Bucket3 Merkazi Merkazi Aviv Bucket 4 hash table for deposit . key= branch-name

השינג דינמי (dynamic hashing)
השינג סטטי (static hashing): השינג עם מספר קבוע של דליים. השינג דינמי (dynamic hashing): השינג שבו פונקצית ההשינג ומספר הדליים משתנים בהתאם לגודל קובץ הנתונים.

השינג בר-הרחבה (extendabable hashing)
bucket 1 hash prefix i i2 bucket 2 i3 bucket address table bucket 3

דוגמה: הוספת רשומות לקובץ deposit תוך בניית השינג בר-הרחבה
Aviv Carmel Darom Eilat Merkazi Pinkas Tsafon 32 bit hash function for branch-name

מבנה סופי לאחר הזנת הרשומות לעיל
1 Aviv Tsafon 3 3 000 Eilat 001 010 011 3 100 101 Darom Pinkas 110 111 bucket address table 2 Merkazi Carmel

אינדקסינג והשינג (indexing & hashing)

Similar presentations

Presentation on theme: "אינדקסינג והשינג (indexing & hashing)"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

אינדקסינג והשינג (indexing & hashing)

Similar presentations

Presentation on theme: "אינדקסינג והשינג (indexing & hashing)"— Presentation transcript:

Similar presentations

About project

Feedback