WELCOME, GUEST
Search in Topic Titles
Welcome to Knowledge Xpert for Oracle
Knowledge Xpert for Oracle Administration
Oracle Architecture
Database Administration
Database Tuning
Network Management
SQL Reference
Regular Expressions in Oracle
Quote Character Assignment
Built-in Packages
SQL Functions
SQL *Plus
SQL Statements
DDL - Data Definition Language
ANALYZE
ASSOCIATE STATISTICS
AUDIT
CLUSTER
COMMENT
CONTEXT
CONTROLFILE
DATABASE
DATABASE LINK
DIMENSION
DIRECTORY
DISASSOCIATE STATISTICS
DISKGROUP
FLASHBACK ARCHIVE
FUNCTION
GRANT
INDEX
INDEXTYPE
JAVA
LIBRARY
MATERIALIZED VIEW
MATERIALIZED VIEW LOG
NOAUDIT
OPERATOR
OUTLINE
PACKAGE
PFILE
PROCEDURE
PROFILE
RENAME
RESOURCE COST
RESTORE POINT
REDO LOG
REVOKE
ROLE
ROLLBACK SEGMENT
SCHEMA
SEQUENCE
SET CONSTRAINTS
SNAPSHOT
SNAPSHOT LOG
SPFILE
SYNONYM
TABLE
Oracle 10g Collection Enhancements
Cascade Update
Creating
Altering
Dropping
Truncating
Deleting Data
Estimating Size
Renaming
Types of Tables
Rebuilding
Monitoring
Tables - Random Sampling
TABLESPACE
TEMPORARY TABLESPACE
TRIGGER
TRUNCATE
TYPE
USER
VIEW
DML - Data Manipulation Language
Embedded SQL Statements
Transaction Control Statements
Session Control
System Control
Keyword Reserved Words
ANSI Reserved Words
SQL Reserved Words
SQL Coding Best Practices
Instant Scripts
Disclaimer
Knowledge Xpert for PL/SQL Development
Knowledge Xpert Feedback

Tables - Random Sampling

A valuable data extraction technique is to present a random subset of records from a table. Selecting the first 1000 rows of a table is hardly random and may not be indicative of the cross section of data within a large table. Selecting records that match one or more predefined key values is even less useful.

Selecting records from the entire table based on a random row identifier within each Oracle data block will produce a more appropriate representation of the data. The following example demonstrates the SQL required to select the tenth row from each Oracle block.

SELECT . . .
  FROM emp_history
 WHERE SUBSTR(ROWID,6,4) = '0010'

This technique can be taken one step further to introduce a level of randomness into the actual row selection. Tying the record identifier to the current time allows us to repeat the exercise with varying results.

SELECT . . .
  FROM emp_history
 WHERE SUBSTR(ROWID,10,4) = TO_CHAR(SYSDATE,'"00"ss')

Note that as of Oracle 8, you must use the DBMS_ROWID package to translate the new extended rowid into the old format. For example:

SELECT 'INSERT INTO temp_size_table'||&&cr||
         'SELECT '||&&qt||segment_name||&&qt||&&cr||
         ',COUNT( DISTINCT( SUBSTR( ROWID,1,8))) blocks'||&&cr||
        'FROM &&owner..'||segment_name, ';'
  FROM dba_segments
 WHERE segment_type ='TABLE' AND owner = upper('&owner');

becomes

SELECT 'INSERT INTO temp_size_table'||&&cr||
         'SELECT '||&&qt||segment_name||&&qt||&&cr||
         ',COUNT( DISTINCT(
          SUBSTR( dbms_rowid.rowid_to_restricted(ROWID,1),1,8))) blocks'||&&cr||
        'FROM &&owner..'||segment_name, ';'
  FROM dba_segments
 WHERE segment_type ='TABLE' AND owner = upper('&owner');

In Oracle 8i there is also a sample clause for use with SQL SELECT to do randomized sampling which simplifies the process further. In the example below, the BLOCK keyword instructs Oracle to perform random block sampling instead of random row sampling:

SELECT COUNT(*) * 100 FROM emp SAMPLE BLOCK (1);

Data sampling is a very important issue for very large databases. The accuracy of your results can be directly dependant on the precision of your data samples.

Rating (Votes: 0)

Note: Only Registered Users may rate topics.