Different types of data—structured, unstructured, and semi-structured—pose unique challenges in scaling AI programs. How are AI techniques applied to different data types?

Unstructured data: The Wild West of data management

Unstructured data, as the term implies, lacks a predefined format or structure. Traditionally, a human had to read, watch, or listen to it before it made any sense. It includes formats like audio files, videos, and emails, which are inherently complex and cannot readily be queried with traditional database query languages like SQL.

To derive meaning from an audio file, you have to listen to it and interpret the content. To make sense of an email, you have to read it. There's no straightforward way for machines to query this type of data and extract insights directly.

AI and unstructured data

AI technologies, particularly those leveraging Natural Language Processing (NLP) and machine learning, are crucial in managing unstructured data. AI can transcribe speech from audio files, extract key phrases, or determine sentiment from textual data. Additionally, AI-powered image recognition and video analysis enable automatic categorization and tagging of visual content.

However, applying AI to unstructured data is fraught with challenges. These include the high variability of data formats and the need for extensive training data to teach AI models how to interpret and process this information correctly.

Structured data: The orderly realm

Structured data is highly organized and easily searchable because it follows a rigid schema. It is divided into rows and columns, nodes and edges, or objects and properties, and can be efficiently queried using languages like SQL for relational databases, or SPARQL and Cypher for graph databases. This data type is exemplified by relational and graph databases, where entries are stored in predictable and standardized formats.
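To make the point about schemas concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are hypothetical and exist only for illustration. Because the schema is declared up front, a single SQL statement is enough to aggregate the data.

```python
import sqlite3

def total_by_customer(rows):
    """Load (customer, amount) rows into an in-memory table and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")
    # The schema is fixed up front: every row has exactly these two typed columns.
    conn.execute("CREATE TABLE orders (customer TEXT NOT NULL, amount REAL NOT NULL)")
    conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample data.
totals = total_by_customer([("acme", 10.0), ("globex", 5.0), ("acme", 2.5)])
print(totals)
```

Nothing comparable exists for an audio file or an email body: there are no columns to group by until something, whether a person or a model, extracts them.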

AI and structured data

In the realm of structured data, AI enhances the efficiency and speed of data processing. AI can automate complex query operations, predict trends based on historical data, and perform real-time anomaly detection to prevent fraud. AI systems can also optimize database performance by learning query patterns and adjusting resource allocations dynamically.
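As a rough illustration of anomaly detection on structured data, the sketch below flags a new value that sits far from the historical mean. This is a plain z-score check using Python's statistics module, standing in for the learned models a real fraud system would use. The function name, the threshold of 3.0, and the sample amounts are all assumptions made for the example.

```python
from statistics import mean, stdev

def is_anomalous(history, value, threshold=3.0):
    """Return True if value lies more than `threshold` standard deviations
    from the mean of the historical values (a simple z-score test)."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least 2 points
    if sigma == 0:
        # All historical values are identical, so any different value stands out.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical transaction amounts for one account.
history = [100, 102, 98, 101, 99]
print(is_anomalous(history, 150))  # far outside the usual range
print(is_anomalous(history, 101))  # within the usual range
```

The check only works because the data is structured: the amount is a typed numeric column, so the statistics can be computed without any interpretation step.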

The right way to use AI with structured data is to teach it about the metadata, that is, the schema of the database. The best form of schema is an ontology, which incorporates both the structure of the data and its logical meaning. That combination is what makes an ontology the best way for AI to interact with structured data.

You can "teach" the AI about this in a couple of different ways. Fine-tuning (further training) a model can work. But often, a form of RAG (Retrieval-Augmented Generation) against a knowledge graph of the ontology is a superior approach, and it is the one we focus on with our AI Context Engine.
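The general shape of retrieval against an ontology can be sketched in a few lines. This is a toy illustration, not a description of how the AI Context Engine works: the ontology triples, the keyword-overlap retrieval, and the prompt layout are all assumptions, and a real system would query a knowledge graph and send the prompt to a model.

```python
import re

# Hypothetical ontology, stored as (subject, predicate, object) triples.
# In practice these would come from a knowledge graph, not a Python list.
ONTOLOGY = [
    ("Customer", "has_property", "customer_id"),
    ("Customer", "places", "Order"),
    ("Order", "has_property", "order_total"),
    ("Order", "contains", "Product"),
    ("Product", "has_property", "unit_price"),
]

def tokens(text):
    """Lowercase a string and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, triples):
    """Keep triples whose subject or object shares a word with the question."""
    question_words = tokens(question)
    return [t for t in triples if question_words & tokens(t[0] + " " + t[2])]

def build_prompt(question, triples):
    """Prepend the retrieved ontology facts to the question as context."""
    facts = "\n".join(f"{s} {p} {o}" for s, p, o in triples)
    return f"Ontology facts:\n{facts}\n\nQuestion: {question}"

question = "What is the total for each customer?"
relevant = retrieve(question, ONTOLOGY)
prompt = build_prompt(question, relevant)
print(prompt)
```

The model never has to memorize the schema. Only the facts relevant to the question are retrieved and placed in front of it, which is the main practical difference from fine-tuning.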

Semi-structured data: Bridging the gap

Semi-structured data does not conform to a strict schema. It includes XML and JSON documents, which contain tags or keys that provide some level of hierarchy and order, but not to the extent of fully structured databases. These types of data are machine-readable, but they often lack a schema definition that provides a "roadmap" to what the fields mean, so they are harder for machines to process than fully structured data.

AI in semi-structured data

Many of the same approaches and strategies for structured data apply to semi-structured data. Essentially, all of them boil down to adding or inferring some of the structure that is only "semi-" there in semi-structured data.
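One simple way to infer the missing structure is to scan a batch of records and note which keys appear and which value types they carry. The sketch below does this for JSON using only the standard library. The function name and the sample records are hypothetical, and it only inspects top-level keys.

```python
import json

def infer_schema(records):
    """Map each top-level key to the sorted list of Python type names seen for it."""
    schema = {}
    for record in records:
        for key, value in record.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(types) for key, types in schema.items()}

# Hypothetical records: keys are optional and types are inconsistent.
raw = '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin", "email": null}, {"id": "3"}]'
schema = infer_schema(json.loads(raw))
print(schema)
```

The result shows exactly where the data falls short of a strict schema: a field that is sometimes a number and sometimes a string, and fields that appear in only some records. That is the information a person or an AI needs before the data can be treated as structured.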

Future trends in AI

Looking forward, AI is set to play an increasingly central role in data management across all data types. It's exciting to witness the development of AI models that can move between unstructured, structured, and semi-structured data. Such capabilities will enable businesses to derive insights more quickly and with greater accuracy, irrespective of the data type.

That said, we've already seen disillusionment set in for generative AI. Even some professors of AI are calling it a failure on the grounds that it doesn't have any knowledge, so LLM output can't be trusted on its own. Rather than making knowledge and data obsolete, AI will make data accessible to a much wider audience. As we come to understand the limitations and capabilities of AI and of our data systems, we will come to expect knowledgeable, intelligent access to data.