Summary
Keywords
Full Transcript
Data is stored in a variety of media types including disk (magnetic or optical), RAM (random-access memory), and magnetic tape. When storing data on a media, some considerations are the speed (transfer rate of the data being read or written, or access time to get data), storage capacity and the cost per GB of memory, and whether it is volatile (in-memory) or non-volatile (stored on disk). The 3 most important types of media for a database are: RAM (random-access memory or main memory) – fastest, most expensive, smaller capacity, volatile. SSD (solid-state drive or flash memory) – fast, not too expensive, little bit larger capacity, non-volatile HHD (hard-disk drive or magnetic disk) - slow, cheap, large capacity, non-volatile. A sector is a physical track or slice on a disk that holds data, and is typically 512 bytes per sector, or up to 4 kb (kilobytes) with newer disk formats. Data is grouped in pages with SSD flash memory, and is typically between 2 kb to 16 kb per page. A block is a uniform size of data that is used by both databases and file systems. It is used when transferring data between main memory and storage media, and is typically 2 kb to 64 kb. Row-oriented databases are databases that organize data by record, keeping all of the data associated with a record next to each other in memory. Row oriented databases are the most common way of organizing data store data quickly. Relational databases commonly use row-oriented storage. They are optimized for reading and writing rows efficiently. MySQL and Postgres are examples of a row-oriented database. Column-oriented databases are databases that organize data by field, keeping all of the data associated with a field next to each other in memory. Column oriented databases are getting more and more popular and provide performance advantages to querying data. They are optimized for reading and computing on columns efficiently. Redshift and Snowflake are examples of column-oriented databases. A table structure is a scheme that organizes rows in blocks for storage. Besides row or column oriented table structures, there are a few alternative table structures such as heap tables, sorted tables, hash tables, and table clusters. A heap table is a table that is stored without any specific order imposed on the rows. When rows are inserted into a heap, there is no way to ensure where the data will be written or remain in the same order. Heaps can be used as staging tables for large, unordered insert operations. Because data is inserted without enforcing a strict order, the insert operation is pretty fast. A sorted table is a table sorted by a specified column, typically the primary key, which determines the row order. The table’s rows are assigned to blocks based on the value of the sort column. Each block contains a range of sorted rows, based on the column they are sorted by. Using sorted tables are great for queries that read in order of the column that it is sorted by. A hash table is a table where the rows are assigned to buckets, which are blocks containing rows. Each bucket starts with one block, and as the table grows, buckets may fill up with rows and new blocks are allocated by the database and linked to the initial block, creating a chain of linked blocks. The bucket containing each row is identified by a hash function and a hash key which is typically the primary key, or another column or even a group of columns. The hash function uses the hash key to compute which bucket contains the row. A table cluster interleaves rows of two or more tables in the same storage area. A cluster key is a column that is available in all interleaved tables, and it determines the row order in which they are interleaved. Rows with the same cluster key value are stored together, and the cluster key is typically the primary key of one table and the corresponding foreign key of another table. Subscribe to Appficial for more programming videos coming soon. Also, don't forget to click LIKE and comment on the video if it helped you out!
