AI Meets Databases: Exploring Vector Search in Oracle 23c

AI Meets Databases: Exploring Vector Search in Oracle 23c

To implement AI Vector Search functionality in Oracle Database 23c (23ai), this guide goes through the steps of loading a pre-trained model, generating vector embeddings for data, searching with vectors, and creating vector indexes. Examples and corresponding SQL are provided to elucidate the implementation. Choosing the Model The first step in vector search is obtaining a model capable of generating vectors from textual data. Oracle provides a pre-trained model called `all-MiniLM-L12-v2`, which is used in this example. It generates vectors directly from plain text input. To begin, download the ONNX model from Oracle and unpack it into a designated directory using shell commands:

mkdir -p /u01/models
cd /u01/models
wget https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip
unzip -oq all_MiniLM_L12_v2_augmented.zip

Loading the Model Next, connect to the database to create a test user, grant necessary privileges, and define a directory object pointing to the folder containing the ONNX model:

conn sys/SysPassword1@//localhost:1521/freepdb1 as sysdba
create user if not exists testuser1 identified by testuser1 quota unlimited on users;
grant create session, db_developer_role, create mining model to testuser1;
create or replace directory model_dir as '/u01/models';
grant read, write on directory model_dir to testuser1;

Load the model into the database using the `DBMS_VECTOR` package:

begin
  dbms_vector.drop_onnx_model (
    model_name => 'ALL_MINILM_L12_V2',
    force => true);
  dbms_vector.load_onnx_model (
    directory  => 'model_dir',
    file_name  => 'all_MiniLM_L12_v2.onnx',
    model_name => 'ALL_MINILM_L12_V2');
end;
/

Verify the loaded model by checking the `USER_MINING_MODELS` view:

column model_name format a30
column algorithm format a10
column mining_function format a15
select model_name, algorithm, mining_function
from   user_mining_models
where  model_name = 'ALL_MINILM_L12_V2';

Generating Vectors (VECTOR Data Type) The `VECTOR_EMBEDDING` function generates vector representations. Here is an example of generating a vector for the text `Quick test`:

select 
vector_embedding(all_minilm_l12_v2 using 'Quick test' as data) AS my_vector;

The result is a high-dimensional vector returned as output. To work with real data, download a dataset of movie quotes:

cd /u01/models
wget https://huggingface.co/datasets/ygorgeurts/movie-quotes/resolve/main/movie_quotes.csv?download=true -O movie_quotes.csv

Create a table based on the dataset and prepare it for vector embeddings:

conn testuser1/testuser1@//localhost:1521/freepdb1
drop table if exists movie_quotes purge;

create table movie_quotes as
select movie_quote, movie, movie_type, movie_year
from   external (
         (
           movie_quote  varchar2(400),
           movie        varchar2(200),
           movie_type   varchar2(50),
           movie_year   number(4)
         )
         type oracle_loader
         default directory model_dir
         access parameters (
           records delimited by newline
           skip 1
           badfile model_dir
           logfile model_dir:'moview_quotes_ext_tab_%a_%p.log'
           discardfile model_dir
           fields csv with embedded terminated by ',' optionally enclosed by '"'
           missing field values are null
           (
             movie_quote char(400),
             movie,
             movie_type,
             movie_year
           )
        )
        location ('movie_quotes.csv')
        reject limit unlimited
      );

alter table movie_quotes add (
  movie_quote_vector vector
);

update movie_quotes
set    movie_quote_vector = vector_embedding(all_minilm_l12_v2 using movie_quote as data);
commit;

Vector Search Using VECTOR_DISTANCE Vector search leverages `VECTOR_DISTANCE` to measure the similarity between vectors. Here’s an example query to find “Films with motivational speaking in them”:

variable search_text varchar2(100);
exec :search_text := 'Films with motivational speaking in them';

SELECT vector_distance(movie_quote_vector, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance,
       movie,
       movie_quote
FROM   movie_quotes
order by 1
fetch approximate first 5 rows only;

Another example query for “Films about war”:

variable search_text varchar2(100);
exec :search_text := 'Films about war';

SELECT vector_distance(movie_quote_vector, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance,
       movie,
       movie_quote
FROM   movie_quotes
order by 1
fetch approximate first 5 rows only;

Creating a Vector Index (Optional) Enhance search performance by creating vector indexes. First, allocate sufficient memory by adjusting the `VECTOR_MEMORY_SIZE` parameter:

conn / as sysdba
alter system set vector_memory_size = 1G scope=spfile;
shutdown immediate;
startup;

Create indexes for optimized vector search:

create vector index movie_quotes_vector_idx on movie_quotes(movie_quote_vector) organization neighbor partitions
distance cosine with target accuracy 95;

drop index if exists movie_quotes_vector_idx;

create vector index movie_quotes_vector_idx on movie_quotes(movie_quote_vector) organization inmemory neighbor graph
distance cosine with target accuracy 95;

These steps bring structure and scalability to utilizing AI Vector Search within Oracle Database 23ai, showcasing its potential to integrate AI-powered search into database systems effectively.