17:35:16 Okay, we're continuing in our series about the needs of an operating system. Oh, excuse me, of a database:
17:35:28 Its needs in terms of systems and operating systems. This time I want to talk about some issues of storage and backup. A lot of the conceptual issues
17:35:41 Are discussed in general in courses like operating systems, and in other courses too, I'm pretty sure. But I want to take a look at how they
17:35:54 Interface with the database's needs. First off,
17:35:59 A database has too much information
17:36:12 To store in any significant way in the CPU. So that means
17:36:22 We're going to have to go to secondary
17:36:28 Storage areas,
17:36:32 You know, like hard drives, external disks, solid-state drives, optical disks, magnetic tape, flash drives, that kind of stuff.
17:36:41 Now, one of the problems is that
17:36:49 Sequential access does not work well for databases,
17:37:00 Except for certain ways that backups work, because the database doesn't go through things in sequential order.
17:37:15 It doesn't even go through them in alphabetical order. They really have different orderings. So
17:37:21 That makes a difference in how the storage works. Now, of course, as a review,
17:37:30 There's main memory,
17:37:34 And there's cache, and then there's disk, and, you know, other levels of the hierarchy. And,
17:37:46 As you know, there are trade-offs: the faster it is, the more expensive it is, etc.
17:37:53 But
17:37:59 A database ultimately wants non-volatile,
17:38:05 But also
17:38:13 Random-access, fast memory.
17:38:18 Now,
17:38:23 Large amounts of non-volatile, random-access, fast memory would be
17:38:29 Fairly expensive. You know, archives and things,
17:38:37 That can be done fairly cheaply, because archives and things you don't
17:38:48 Need to access randomly.
17:39:02 The trade-off is: the higher the speed and the higher the capacity, the higher the cost; the lower the speed and the lower the capacity, the lower the cost. We want high speed and high capacity, and we'd rather not have high cost.
17:39:22 And of course, if a database is small enough, we could use disks or disk packs, but disks are generally
17:39:44 Sequential access, originally.
17:39:50 Remember that on disk drives
17:39:58 You have a spinning platter and an arm
17:40:05 That reaches in or out. So even though you can access a particular block,
17:40:13 Any particular block, it takes time for the disk to spin around and give you access to it.
17:40:28 Well, one of the things that saves
17:40:34 Time is buffering. To save time, you can often read into a buffer
17:40:45 And go do something else while you wait until the buffer has been read.
17:40:52 When you open a file, it usually operates through a buffer.
17:41:10 Sometimes it helps
17:41:14 To organize data logically.
17:41:26 Now,
17:41:35 In the database world, you can organize the data so that contiguous blocks are related. For example, you would put
17:41:46 All the rows in a table
17:41:50 Together on the disk.
17:41:57 And to save time,
17:42:02 You often read large blocks, you know, huge chunks of tables, all at once. And there are many times
17:42:19 When you can anticipate what you're going to read next, so you can do some advance reading.
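To make the point about reading large blocks concrete, here is a minimal Python sketch. It is an illustration only: the 64 KiB block size, the scratch file, and the function names `read_in_blocks` and `count_reads` are assumptions made for this example, not anything specified in the lecture. It counts how many read requests are needed to scan the same file with small blocks and with large ones.

```python
import os
import tempfile

BLOCK_SIZE = 64 * 1024  # hypothetical block size for this sketch: 64 KiB


def read_in_blocks(path, block_size=BLOCK_SIZE):
    """Yield the file's contents one block at a time."""
    # buffering=0 disables Python's own buffer, so each read() call
    # corresponds roughly to one request to the operating system.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield block


def count_reads(path, block_size):
    """Count how many read requests it takes to scan the whole file."""
    return sum(1 for _ in read_in_blocks(path, block_size))


# A 1 MiB scratch file stands in for a table stored contiguously on disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * (1024 * 1024))
    table_path = tmp.name

small_reads = count_reads(table_path, 512)         # small blocks
large_reads = count_reads(table_path, BLOCK_SIZE)  # huge chunks at once
os.remove(table_path)
print(small_reads, large_reads)
```

The larger block size needs far fewer requests to cover the same data, which is the saving the lecture is describing.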
17:42:31 And you can schedule
17:42:39 Input and output requests.
17:42:44 And you often hold material to be written
17:42:53 Until it is convenient to update it in the more permanent form. I mean, these are all
17:43:05 Possible things to work with. And flash drives and magnetic tape are
17:43:16 Often used for recovery. Now, when we use buffering,
17:43:30 We often need to worry about buffer replacement strategies: which records to keep. Even if we have a gig in our computer,
17:43:42 We can't hold everything in memory.
17:43:52 You know, which records are held in memory, and for how long? Now, in a program, there's a tendency toward locality of reference. That is, when you refer to code, you're probably going to be looking at
17:44:07 Code in that same area right away. In a database, you're more likely to be skipping around.
17:44:19 But we still have the same question.
17:44:28 Some of the strategies include: swap out whatever was used the longest time ago without being referenced since, which is least recently used;
17:44:37 First in, first out;
17:44:45 And then there are other strategies that involve, like, the clock policy:
17:44:55 If it wasn't used in the last several seconds, don't mark it, and if it's not used again,
17:45:03 We'll pull it out. There are all kinds of strategies that were discussed in operating systems.
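As a rough illustration of the least recently used strategy, here is a small Python sketch. The class name `LRUBufferPool`, the two-frame capacity, and the fake `read_block` function are all assumptions for the example; a real database buffer manager is considerably more involved (pinning, dirty pages, and so on).

```python
from collections import OrderedDict


class LRUBufferPool:
    """A tiny buffer pool that evicts the least recently used block."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block   # function that fetches a block from "disk"
        self.frames = OrderedDict()    # block_id -> data, least recently used first
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)  # hit: now the most recently used
            return self.frames[block_id]
        self.disk_reads += 1                   # miss: we have to go to disk
        data = self.read_block(block_id)
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)    # evict the least recently used
        self.frames[block_id] = data
        return data


# Hypothetical "disk": just builds a string for each block number.
pool = LRUBufferPool(capacity=2, read_block=lambda b: f"block-{b}")
for b in [1, 2, 1, 3, 2]:
    pool.get(b)
print(pool.disk_reads, list(pool.frames))
```

In this access pattern, block 2 is evicted when block 3 arrives because block 1 was touched more recently, so the later request for block 2 has to go back to disk.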
17:45:12 A lot of times in a database
17:45:23 You can have files of, you know, essentially unordered records.
17:45:31 That is, they don't even have to be in any order within the file structure.
17:45:47 Keeping track of such records is fun.
17:45:54 Hash methods
17:45:58 Can be used. For example, if you stored each table as one file, which is an oversimplification,
17:46:07 You could use a hash based on the name of the table to decide where you would store it. And then every time you wanted to refer to something in a table,
17:46:17 You could go right to it. Database management systems sometimes bypass or supplement the operating system. Now, in general,
17:46:33 A lot of people like the idea of having
17:46:37 Files sorted. And in fact,
17:46:46 Sorting a file by a primary key
17:46:53 Makes intuitive sense.
17:46:59 But if you do that,
17:47:10 Then operations that are not by the primary key come out slower.
17:47:20 So we need another approach, because, you know, really, how often in a database are we looking only by the primary key?
17:47:34 So another approach we have is indexed files. That is, for each
17:47:42 Attribute
17:47:45 By which you might want to search,
17:47:54 You have a tree-like structure
17:48:03 To allow searching.
17:48:07 So instead of just physically having the records in order by the primary key, you have them in whatever order you want,
17:48:18 But you have an index structure, which you can think of as like a binary linked tree,
17:48:25 That you can look up by name, or by other fields. The advantage of indexed files is that it's faster
17:48:43 To
17:48:46 Search, if the field has actually been indexed. The disadvantage is that
17:49:05 There's overhead needed to build and store the indexes.
17:49:11 And generally,
17:49:21 (We're back to the Zoom.)
17:49:24 Searches that are not through the index
17:49:30 Are even slower.
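A minimal Python sketch of the indexed-file idea follows. To keep it short, it uses a sorted list searched with `bisect` in place of the tree-like structure described in the lecture, so it is a stand-in, not the structure a real database would use. The records, field names, and the functions `build_index` and `index_lookup` are made up for illustration.

```python
import bisect

# Records stored in arbitrary (unsorted) order. The position in the list
# stands in for a block address on disk. The data is invented for the example.
records = [
    {"id": 17, "name": "Rivera"},
    {"id": 4, "name": "Chen"},
    {"id": 42, "name": "Abbot"},
    {"id": 8, "name": "Moreau"},
]


def build_index(rows, attribute):
    """Return a sorted list of (attribute value, record position) pairs."""
    return sorted((row[attribute], pos) for pos, row in enumerate(rows))


def index_lookup(index, rows, value):
    """Binary-search the index, then fetch matching records by position."""
    i = bisect.bisect_left(index, (value,))
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(rows[index[i][1]])
        i += 1
    return matches


# One index per attribute we expect to search by. This extra storage, and
# the work of keeping it up to date, is the overhead of indexing.
name_index = build_index(records, "name")
id_index = build_index(records, "id")

found = index_lookup(name_index, records, "Moreau")
print(found)
```

The records themselves never move; only the small index is kept in order, and a search on an attribute without an index would still have to scan every record.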
17:49:36 I guess I
17:49:41 Mentioned hashing a little bit.
17:49:48 Some variations:
17:49:57 Hashing can make it
17:50:05 Faster, certainly faster than a linear search. You know, linear search is way too slow.
17:50:13 Binary search could even be slow.
17:50:18 If you have indexed files, you can do different kinds of binary search. So
17:50:29 I'd like to say:
17:50:34 If you're using a hashing approach, you don't want to have your table full, and you do have to handle collisions.
17:50:40 Hashing is not a major topic of this course, but if you haven't heard about it elsewhere, please feel free to ask.
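For anyone who has not seen hashing before, here is a small Python sketch of the two concerns just mentioned: keeping the table from filling up and handling collisions. It uses linear probing, which is only one of several collision strategies, and the 0.7 load threshold and the class name `OpenAddressingTable` are assumptions for this example.

```python
class OpenAddressingTable:
    """Hash table using linear probing; it grows before it gets too full."""

    MAX_LOAD = 0.7  # assumed threshold for this sketch

    def __init__(self, slots=8):
        self.slots = [None] * slots
        self.count = 0

    def _probe(self, key):
        """Yield slot numbers, starting at the key's home slot."""
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def put(self, key, value):
        if (self.count + 1) / len(self.slots) > self.MAX_LOAD:
            self._grow()
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                if self.slots[i] is None:
                    self.count += 1
                self.slots[i] = (key, value)
                return

    def get(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                return None          # empty slot: the key is not present
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None

    def _grow(self):
        """Double the table and re-insert every entry."""
        old = [entry for entry in self.slots if entry is not None]
        self.slots = [None] * (2 * len(self.slots))
        self.count = 0
        for key, value in old:
            self.put(key, value)


table = OpenAddressingTable()
# With 8 slots, the integer keys 1, 9 and 17 all land on home slot 1,
# so the second and third inserts collide and probe forward.
for key, value in [(1, "a"), (9, "b"), (17, "c")]:
    table.put(key, value)
print(table.get(9), table.get(25))
```

Because the table never goes above the load threshold, a probe sequence always reaches an empty slot, which is why a full table is something to avoid.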
17:50:46 In databases, you might use external hashing.
17:50:55 It basically means the hashing goes to a disk.
17:51:02 And there are forms of extendible
17:51:12 Hashing that are available. Now, in operating systems, most of you probably discussed
17:51:23 Redundant arrays of independent, or inexpensive, disks. One of the reasons for doing that is for
17:51:39 Essentially avoiding data loss on disk failure.
17:51:49 But it can also have more
17:51:56 Reading take place
17:52:02 Simultaneously, because we have multiple disk drives that can be reading at the same time rather than one disk drive.
17:52:09 And again, the redundancy in the redundant array of disks is a backup redundancy.
17:52:17 That's not a bad thing. As a reminder, you know, if you had 4 disk drives
17:52:26 And you simply put one quarter of your data on each of the 4 disk drives, that effectively cuts your reading and writing time
17:52:36 To a fourth compared to having only one disk drive. If you had 9 disk drives, with one bit from each byte on each of 8 of them
17:52:45 And then one bit for the parity on the ninth, you could cut access time to one eighth.
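The nine-drive arrangement can be sketched in Python as follows. This is a toy model under stated assumptions: each "drive" is just a list of bits, the parity is even parity, and the function names `stripe`, `rebuild`, and `unstripe` are invented for the example. It shows that any single lost drive can be recomputed from the other eight.

```python
from functools import reduce

DATA_DRIVES = 8  # one drive per bit of each byte; a ninth drive holds parity


def stripe(data):
    """Split bytes into 8 bit-streams plus one even-parity stream."""
    drives = [[] for _ in range(DATA_DRIVES + 1)]
    for byte in data:
        bits = [(byte >> i) & 1 for i in range(DATA_DRIVES)]
        for i, bit in enumerate(bits):
            drives[i].append(bit)
        drives[DATA_DRIVES].append(sum(bits) % 2)  # parity bit for this byte
    return drives


def rebuild(drives, failed):
    """Recompute a lost drive as the XOR of all surviving drives."""
    survivors = [d for i, d in enumerate(drives) if i != failed]
    return [reduce(lambda a, b: a ^ b, column) for column in zip(*survivors)]


def unstripe(drives):
    """Reassemble the original bytes from the 8 data drives."""
    return bytes(
        sum(drives[i][pos] << i for i in range(DATA_DRIVES))
        for pos in range(len(drives[0]))
    )


drives = stripe(b"database")
lost = drives[3]
drives[3] = None                       # simulate a failed drive
drives[3] = rebuild(drives, failed=3)  # recover it from the other eight
print(unstripe(drives))
```

Each byte needs only one bit from each drive, which is where the one-eighth access time comes from, and the parity drive supplies the redundancy.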
17:53:04 So that could save us something.
17:53:14 Now, but
17:53:20 I mean, we are talking,
17:53:24 There's a lot of data that goes in your database.
17:53:30 And you'll be asking some of these questions in a lab coming up.
17:53:40 How many pieces of data are in your database?
17:53:49 You know, it's like the number of tables
17:53:55 Times the attributes
17:54:01 In a table, times the number of rows in a table.
17:54:12 If you have a large company or government organization,
17:54:19 That can be really large.
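As a quick worked example of that estimate, here is a short Python sketch. The schema is entirely hypothetical, and because tables usually differ in size, the sketch sums attributes times rows per table instead of multiplying three single numbers together.

```python
# Hypothetical schema: table name -> (number of attributes, number of rows).
schema = {
    "customers": (12, 2_000_000),
    "orders": (9, 15_000_000),
    "products": (20, 50_000),
}


def count_data_items(tables):
    """Sum attributes x rows over every table."""
    return sum(attrs * rows for attrs, rows in tables.values())


total_items = count_data_items(schema)
print(f"{total_items:,} individual data items")
```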
17:54:26 Yeah, with a binary tree, you know, like if we were doing binary search,
17:54:43 1,000 items
17:54:47 Would take 10 disk operations to find the one thing you're looking for. And remember, you're not looking for one thing; you're looking for a whole table.
17:54:57 And 1 million items
17:55:05 Would take 20 disk operations. So
17:55:11 In database storage and retrieval, we don't really care that much about what's going on in core.
17:55:19 We're trying to reduce the number of disk operations.
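The figures of 10 and 20 can be checked with a short Python sketch. It assumes, as the lecture does, that every probe of the binary search costs one disk operation; the function names are made up for the example.

```python
def worst_case_probes(n_items):
    """Worst-case probes for binary search: floor(log2(n)) + 1."""
    return n_items.bit_length()


def probes_for(sorted_keys, target):
    """Run a binary search and count how many keys it had to look at."""
    lo, hi, probes = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1                      # one probe = one assumed disk operation
        if sorted_keys[mid] == target:
            break
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


keys = list(range(1000))
worst_seen = max(probes_for(keys, k) for k in keys)
print(worst_seen, worst_case_probes(1_000), worst_case_probes(1_000_000))
```

The count grows slowly with the number of items, but each probe is a separate trip to the disk, which is why the goal is to cut the number of probes further.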
17:55:28 In the lab that's coming up, we'll be looking at a couple of approaches.
17:55:41 A binary tree
17:55:45 Is nice, but there are other options. Most of them
17:55:56 Are versions of a tree.
17:55:59 And you'll be hearing about B-trees, K-D-B-trees, B+ trees, things like that.
17:56:05 Sometimes those go faster than a simple binary search, based on the fact that we're looking
17:56:14 On a disk drive.
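One way to see why these trees can beat a binary tree on disk is to count levels. The Python sketch below assumes that each node is read in one disk operation and that every node is completely full, which real B-trees do not guarantee. The fanout of 100 is a hypothetical figure chosen for the example.

```python
def levels_needed(n_items, fanout):
    """Smallest number of levels so that fanout ** levels >= n_items."""
    levels, reach = 0, 1
    while reach < n_items:
        reach *= fanout
        levels += 1
    return levels


binary_levels = levels_needed(1_000_000, fanout=2)   # binary tree
wide_levels = levels_needed(1_000_000, fanout=100)   # hypothetical wide node
print(binary_levels, wide_levels)
```

Packing many keys into each node means each disk read narrows the search much more than a single binary comparison does, so far fewer reads are needed to reach the same record.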
17:56:19 I'll reflect on this a little, but you're going to be working on it, to understand the concept,
17:56:26 In a lab coming up. So
17:56:33 I think that finishes our series on the needed things. A future topic will be data mining.