SQL functions to evaluate similarity #3

Open
wants to merge 1 commit into
base: master
101 changes: 53 additions & 48 deletions ImageHashing/ImageHashing.csproj
@@ -1,55 +1,60 @@
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup>
<Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
<Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
<ProductVersion>8.0.30703</ProductVersion>
<SchemaVersion>2.0</SchemaVersion>
<ProjectGuid>{5B189394-0B91-4BFF-B4FB-5CEA51174C09}</ProjectGuid>
<OutputType>Library</OutputType>
<AppDesignerFolder>Properties</AppDesignerFolder>
<RootNamespace>ImageHashing</RootNamespace>
<AssemblyName>ImageHashing</AssemblyName>
<TargetFrameworkVersion>v4.0</TargetFrameworkVersion>
<FileAlignment>512</FileAlignment>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
<DebugSymbols>true</DebugSymbols>
<DebugType>full</DebugType>
<Optimize>false</Optimize>
<OutputPath>bin\Debug\</OutputPath>
<DefineConstants>DEBUG;TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
<DebugType>pdbonly</DebugType>
<Optimize>true</Optimize>
<OutputPath>bin\Release\</OutputPath>
<DefineConstants>TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup>
<ItemGroup>
<Reference Include="System" />
<Reference Include="System.Core" />
<Reference Include="System.Drawing" />
<Reference Include="System.Xml.Linq" />
<Reference Include="System.Data.DataSetExtensions" />
<Reference Include="Microsoft.CSharp" />
<Reference Include="System.Data" />
<Reference Include="System.Xml" />
</ItemGroup>
<ItemGroup>
<Compile Include="ImageHashing.cs" />
<Compile Include="Properties\AssemblyInfo.cs" />
</ItemGroup>
<ItemGroup>
<Content Include="sql\GetSimilarImages.sql" />
<Content Include="sql\EvalSimilarity.sql" />
<Content Include="sql\CountBits.sql" />
</ItemGroup>
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
<!-- To modify your build process, add your task inside one of the targets below and uncomment it.
Other similar extension points exist, see Microsoft.Common.targets.
<Target Name="BeforeBuild">
</Target>
<Target Name="AfterBuild">
</Target>
-->
</Project>
144 changes: 144 additions & 0 deletions ImageHashing/sql/CountBits.sql
@@ -0,0 +1,144 @@
/*

T-SQL version of ImageHashing.BitCount(ulong num) method.
Returns the number of 1 bits in the binary representation of the 64-bit integer @var, including the sign bit.

*/
CREATE FUNCTION [dbo].[CountBits]
(@var BIGINT)
RETURNS INT
AS
BEGIN
DECLARE @counter AS INT = 0;
SET @counter = 0 + CASE
WHEN @var & CAST (1 AS BIGINT) = 1 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (2 AS BIGINT) = 2 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (4 AS BIGINT) = 4 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (8 AS BIGINT) = 8 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (16 AS BIGINT) = 16 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (32 AS BIGINT) = 32 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (64 AS BIGINT) = 64 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (128 AS BIGINT) = 128 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (256 AS BIGINT) = 256 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (512 AS BIGINT) = 512 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (1024 AS BIGINT) = 1024 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (2048 AS BIGINT) = 2048 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (4096 AS BIGINT) = 4096 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (8192 AS BIGINT) = 8192 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (16384 AS BIGINT) = 16384 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (32768 AS BIGINT) = 32768 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (65536 AS BIGINT) = 65536 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (131072 AS BIGINT) = 131072 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (262144 AS BIGINT) = 262144 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (524288 AS BIGINT) = 524288 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (1048576 AS BIGINT) = 1048576 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (2097152 AS BIGINT) = 2097152 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (4194304 AS BIGINT) = 4194304 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (8388608 AS BIGINT) = 8388608 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (16777216 AS BIGINT) = 16777216 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (33554432 AS BIGINT) = 33554432 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (67108864 AS BIGINT) = 67108864 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (134217728 AS BIGINT) = 134217728 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (268435456 AS BIGINT) = 268435456 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (536870912 AS BIGINT) = 536870912 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (1073741824 AS BIGINT) = 1073741824 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (2147483648 AS BIGINT) = 2147483648 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (4294967296 AS BIGINT) = 4294967296 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (8589934592 AS BIGINT) = 8589934592 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (17179869184 AS BIGINT) = 17179869184 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (34359738368 AS BIGINT) = 34359738368 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (68719476736 AS BIGINT) = 68719476736 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (137438953472 AS BIGINT) = 137438953472 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (274877906944 AS BIGINT) = 274877906944 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (549755813888 AS BIGINT) = 549755813888 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (1099511627776 AS BIGINT) = 1099511627776 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (2199023255552 AS BIGINT) = 2199023255552 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (4398046511104 AS BIGINT) = 4398046511104 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (8796093022208 AS BIGINT) = 8796093022208 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (17592186044416 AS BIGINT) = 17592186044416 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (35184372088832 AS BIGINT) = 35184372088832 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (70368744177664 AS BIGINT) = 70368744177664 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (140737488355328 AS BIGINT) = 140737488355328 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (281474976710656 AS BIGINT) = 281474976710656 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (562949953421312 AS BIGINT) = 562949953421312 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (1125899906842624 AS BIGINT) = 1125899906842624 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (2251799813685248 AS BIGINT) = 2251799813685248 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (4503599627370496 AS BIGINT) = 4503599627370496 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (9007199254740992 AS BIGINT) = 9007199254740992 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (18014398509481984 AS BIGINT) = 18014398509481984 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (36028797018963968 AS BIGINT) = 36028797018963968 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (72057594037927936 AS BIGINT) = 72057594037927936 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (144115188075855872 AS BIGINT) = 144115188075855872 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (288230376151711744 AS BIGINT) = 288230376151711744 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (576460752303423488 AS BIGINT) = 576460752303423488 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (1152921504606846976 AS BIGINT) = 1152921504606846976 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (2305843009213693952 AS BIGINT) = 2305843009213693952 THEN 1 ELSE 0
END + CASE
WHEN @var & CAST (4611686018427387904 AS BIGINT) = 4611686018427387904 THEN 1 ELSE 0
END + CASE
-- 2^63 does not fit in a signed BIGINT, so the most significant (sign) bit is tested with the negative mask whose only set bit is that one
WHEN @var & CAST (-9223372036854775808 AS BIGINT) = -9223372036854775808 THEN 1 ELSE 0
END;
RETURN (@counter);
END;
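
A few sanity checks can make the behaviour described in the header comment concrete. This is only a sketch, not part of the commit: it assumes the function has already been created in the `dbo` schema and that the checks run in their own batch, and the column aliases are made up for illustration.

```sql
-- Hypothetical checks; run in a separate batch after dbo.CountBits exists.
SELECT dbo.CountBits(0)   AS zero_bits,   -- 0: no bits set
       dbo.CountBits(255) AS low_byte,    -- 8: the lowest eight bits are set
       dbo.CountBits(-1)  AS all_bits,    -- 64: every bit set, including the sign bit
       dbo.CountBits(CAST(-9223372036854775808 AS BIGINT)) AS sign_only;  -- 1: only the sign bit
```

The last two rows exercise the sign-bit branch, which is the one case the positive power-of-two masks cannot cover.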
14 changes: 14 additions & 0 deletions ImageHashing/sql/EvalSimilarity.sql
@@ -0,0 +1,14 @@
/*

T-SQL version of ImageHashing.Similarity(ulong hash1, ulong hash2) method.
Returns a percentage-based similarity value between the two given hashes. The higher the percentage, the closer the hashes are to being identical.

*/

CREATE FUNCTION [dbo].[EvalSimilarity]
(@a BIGINT, @b BIGINT)
RETURNS FLOAT
AS
BEGIN
RETURN (((64 - dbo.CountBits(@a ^ @b)) * 100) / 64.0);
END;
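
The return expression amounts to `(64 - d) * 100 / 64`, where `d` is the number of bit positions in which the two hashes differ. A small worked example, assuming both `dbo.CountBits` and `dbo.EvalSimilarity` are already created (the aliases are illustrative only):

```sql
-- Hypothetical worked example; run in a separate batch after both functions exist.
SELECT dbo.EvalSimilarity(0, 0)  AS identical,  -- 100:     0 differing bits
       dbo.EvalSimilarity(0, 15) AS four_bits,  -- 93.75:   (64 - 4) * 100 / 64
       dbo.EvalSimilarity(0, 31) AS five_bits,  -- 92.1875: (64 - 5) * 100 / 64
       dbo.EvalSimilarity(0, -1) AS all_bits;   -- 0:       all 64 bits differ
```

Read this way, a threshold of 93 accepts hashes that differ in at most four bits.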
21 changes: 21 additions & 0 deletions ImageHashing/sql/GetSimilarImages.sql
@@ -0,0 +1,21 @@
/*

This is just an example of how the functions can be used in a SELECT statement.

Select similar images using the ImageHashing functions and a threshold value.
When storing ulong hashes in the database as BIGINT, be aware that values above 9223372036854775807
do not fit as positive numbers, because ulong has a greater range of positive values. They can still
be stored safely: both are 64-bit data types, so the binary representation does not change after conversion.

*/

CREATE PROCEDURE [dbo].[GetSimilarImages]
@phash BIGINT
AS
BEGIN
DECLARE @threshold AS FLOAT = 93;
SELECT *
FROM Images AS i
WHERE i.PHash IS NOT NULL
AND dbo.EvalSimilarity(i.PHash, @phash) >= @threshold;
END
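
The point about ulong and BIGINT sharing a bit pattern can be checked directly in T-SQL. The sketch below is illustrative and not part of the commit; it uses binary literals to stand in for ulong values coming from the C# side.

```sql
-- 0xFFFFFFFFFFFFFFFF is ulong.MaxValue (18446744073709551615); as BIGINT it reads back as -1.
SELECT CAST(0xFFFFFFFFFFFFFFFF AS BIGINT)    AS max_ulong_as_bigint,   -- -1
       CAST(0x8000000000000000 AS BIGINT)    AS high_bit_as_bigint,    -- -9223372036854775808
       CAST(CAST(-1 AS BIGINT) AS BINARY(8)) AS bigint_back_to_bytes;  -- 0xFFFFFFFFFFFFFFFF
```

Because the 64 bits survive the round trip, the XOR and bit count in the functions above give the same result as they would on the unsigned values.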