A B C D E F G H I K L M N O P Q R S T U V W 

A

AbstractIpAddressBasedQueue - Class in org.dice_research.squirrel.queue
This abstract class manages two important aspects of an IpAddressBasedQueue.
AbstractIpAddressBasedQueue() - Constructor for class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
 
AbstractKnownUriFilterDecorator - Class in org.dice_research.squirrel.data.uri.filter
 
AbstractKnownUriFilterDecorator() - Constructor for class org.dice_research.squirrel.data.uri.filter.AbstractKnownUriFilterDecorator
 
AbstractKnownUriFilterDecorator(KnownUriFilter) - Constructor for class org.dice_research.squirrel.data.uri.filter.AbstractKnownUriFilterDecorator
 
accessURL - Static variable in class org.dice_research.squirrel.vocab.DCAT
 
Activity - Static variable in class org.dice_research.squirrel.vocab.PROV_O
 
activityUri - Variable in class org.dice_research.squirrel.metadata.CrawlingActivity
A unique id.
ActivityUtil - Class in org.dice_research.squirrel.metadata
A simple utilities class for working with the CrawlingActivity objects.
ActivityUtil() - Constructor for class org.dice_research.squirrel.metadata.ActivityUtil
 
add(CrawleableUri, long) - Method in class org.dice_research.squirrel.data.uri.filter.AbstractKnownUriFilterDecorator
 
add(CrawleableUri, long, long) - Method in class org.dice_research.squirrel.data.uri.filter.AbstractKnownUriFilterDecorator
 
add(CrawleableUri, long) - Method in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
 
add(CrawleableUri, long, long) - Method in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
 
add(CrawleableUri, long) - Method in interface org.dice_research.squirrel.data.uri.filter.KnownUriFilter
Adds the given URI to the list of already known URIs.
add(CrawleableUri, long, long) - Method in interface org.dice_research.squirrel.data.uri.filter.KnownUriFilter
Adds the given URI to the list of already known URIs together with the time at which it has been crawled.
addData(String, Object) - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
 
addData(CrawleableUri, byte[]) - Method in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
 
addData(CrawleableUri, InputStream) - Method in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
 
addData(CrawleableUri, String) - Method in interface org.dice_research.squirrel.sink.UnstructuredDataSink
Stores the given data for the given URI.
addData(CrawleableUri, byte[]) - Method in interface org.dice_research.squirrel.sink.UnstructuredDataSink
Stores the given data for the given URI.
addData(CrawleableUri, InputStream) - Method in interface org.dice_research.squirrel.sink.UnstructuredDataSink
Stores the data from the given stream for the given URI.
addHashValuesForUris(List<CrawleableUri>) - Method in interface org.dice_research.squirrel.deduplication.hashing.UriHashCustodian
Adds the given hash values for the given URIs.
addMetaData(Model) - Method in interface org.dice_research.squirrel.sink.Sink
 
addNewUri(CrawleableUri, CrawleableUri) - Method in class org.dice_research.squirrel.collect.SimpleUriCollector
 
addNewUri(CrawleableUri, CrawleableUri) - Method in interface org.dice_research.squirrel.collect.UriCollector
Adds the given new URI to the list of URIs collected for the given URI.
addNewUri(CrawleableUri, Node) - Method in interface org.dice_research.squirrel.collect.UriCollector
Adds the given new URI to the list of URIs collected for the given URI.
addNewUri(CrawleableUri, String) - Method in interface org.dice_research.squirrel.collect.UriCollector
Adds the given new URI to the list of URIs collected for the given URI.
addNewUri(CrawleableUri) - Method in interface org.dice_research.squirrel.frontier.Frontier
Adds the given URI to the Frontier's internal queue if the internal rules of the Frontier allow it.
addNewUris(List<CrawleableUri>) - Method in interface org.dice_research.squirrel.frontier.Frontier
Adds the given list of URIs to the Frontier.
addOutputResource(String, Resource) - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
 
ADDRESS_HOST_KEY - Static variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
ADDRESS_IP_KEY - Static variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
ADDRESS_KEY - Static variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
addStep(CrawleableUri, Class<?>, String...) - Static method in class org.dice_research.squirrel.metadata.ActivityUtil
A simple method which attaches a step with the given Class and the given actions to the CrawlingActivity of the given URI if it exists.
addStep(CrawleableUri, Class<?>) - Static method in class org.dice_research.squirrel.metadata.ActivityUtil
A simple method which attaches a step with the given Class to the CrawlingActivity of the given URI if it exists.
addStep(Class<?>, String...) - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
 
addToQueue(CrawleableUri) - Method in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
 
addToQueue(CrawleableUri) - Method in class org.dice_research.squirrel.queue.InMemoryQueue
 
addTriple(CrawleableUri, Triple) - Method in interface org.dice_research.squirrel.collect.UriCollector
Adds the given triple to the list of URIs collected from the given URI.
addTriple(CrawleableUri, Triple) - Method in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
 
addTriple(CrawleableUri, Triple) - Method in interface org.dice_research.squirrel.sink.tripleBased.TripleBasedSink
Add a triple for the given uri.
addUri(CrawleableUri) - Method in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
 
addUri(CrawleableUri) - Method in interface org.dice_research.squirrel.queue.UriQueue
Adds the given CrawleableUri instance to the queue.
AdvancedTripleBasedSink - Interface in org.dice_research.squirrel.sink.tripleBased
A specialization of TripleBasedSink which has the capability to give back all Triples stored behind a given CrawleableUri.
agent - Static variable in class org.dice_research.squirrel.vocab.PROV_O
 
AliveMessage - Class in org.dice_research.squirrel.worker
 
AliveMessage(int) - Constructor for class org.dice_research.squirrel.worker.AliveMessage
Creates an AliveMessage for the worker with the given id.
approxNumberOfTriples - Static variable in class org.dice_research.squirrel.vocab.Squirrel
 
ArrayHashValue - Class in org.dice_research.squirrel.deduplication.hashing.impl
A hash value represented as an array of integers.
ArrayHashValue() - Constructor for class org.dice_research.squirrel.deduplication.hashing.impl.ArrayHashValue
Constructor.
ArrayHashValue(Integer[]) - Constructor for class org.dice_research.squirrel.deduplication.hashing.impl.ArrayHashValue
Constructor.
Association - Static variable in class org.dice_research.squirrel.vocab.PROV_O
 

B

blockedIps - Variable in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
 
build() - Method in class org.dice_research.squirrel.rabbit.RPCServer.Builder
Builds the DataReceiverImpl instance with the previously given information.
builder - Variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer
 
builder() - Static method in class org.dice_research.squirrel.rabbit.RPCServer
Returns a newly created RPCServer.Builder.
Builder() - Constructor for class org.dice_research.squirrel.rabbit.RPCServer.Builder
 
buildMsgProcessingTask(QueueingConsumer.Delivery) - Method in class org.dice_research.squirrel.rabbit.RPCServer
 
byteSize - Static variable in class org.dice_research.squirrel.vocab.DCAT
 

C

CHARSET_NAME - Static variable in class org.dice_research.squirrel.data.uri.CrawleableUri
 
close() - Method in class org.dice_research.squirrel.data.uri.filter.AbstractKnownUriFilterDecorator
 
close() - Method in class org.dice_research.squirrel.queue.InMemoryQueue
 
close() - Method in interface org.dice_research.squirrel.queue.UriQueue
Close RDB connection, destroy the database.
close() - Method in interface org.dice_research.squirrel.sink.Sink
 
close(Closeable) - Static method in class org.dice_research.squirrel.utils.Closer
Closes the given Closeable and logs occurring exceptions with the Logger of this utility class.
close(Closeable, Logger) - Static method in class org.dice_research.squirrel.utils.Closer
Closes the given Closeable and logs occurring exceptions with the given Logger.
close(Closeable, Logger, boolean) - Static method in class org.dice_research.squirrel.utils.Closer
Closes the given Closeable and logs occurring exceptions with the given Logger.
closedSinks - Variable in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
Set of URIs for which the sink has already been closed.
closeQuietly(Closeable) - Static method in class org.dice_research.squirrel.utils.Closer
Closes the given Closeable while ignoring all exceptions that may occur.
Closer - Class in org.dice_research.squirrel.utils
A simple class offering methods to close other classes either quietly or while logging errors.
Closer() - Constructor for class org.dice_research.squirrel.utils.Closer
 
closeSinkForUri(CrawleableUri) - Method in class org.dice_research.squirrel.collect.SimpleUriCollector
 
closeSinkForUri(CrawleableUri) - Method in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
 
closeSinkForUri(CrawleableUri) - Method in interface org.dice_research.squirrel.sink.SinkBase
Closes the resources necessary for storing the data of the given URI.
compareTo(IpUriTypePair) - Method in class org.dice_research.squirrel.queue.IpUriTypePair
 
Configuration - Class in org.dice_research.squirrel.configurator
 
Configuration() - Constructor for class org.dice_research.squirrel.configurator.Configuration
 
Constants - Class in org.dice_research.squirrel
This class contains constant values of the Squirrel project.
Constants() - Constructor for class org.dice_research.squirrel.Constants
 
consumed - Variable in class org.dice_research.squirrel.iterators.SqlBasedIterator
 
contactPoint - Static variable in class org.dice_research.squirrel.vocab.DCAT
 
containsDataOf - Static variable in class org.dice_research.squirrel.vocab.Squirrel
 
count() - Method in class org.dice_research.squirrel.data.uri.filter.AbstractKnownUriFilterDecorator
 
count() - Method in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
 
count() - Method in interface org.dice_research.squirrel.data.uri.filter.KnownUriFilter
Counts the number of known URIs.
crawl(List<CrawleableUri>) - Method in interface org.dice_research.squirrel.worker.Worker
Crawls the given URIs and sends URIs that have been found while crawling to the frontier.
CrawleableUri - Class in org.dice_research.squirrel.data.uri
This class represents a URI and additional meta data that is helpful for crawling it.
CrawleableUri(URI) - Constructor for class org.dice_research.squirrel.data.uri.CrawleableUri
 
CrawleableUri(URI, InetAddress) - Constructor for class org.dice_research.squirrel.data.uri.CrawleableUri
 
CrawleableUri(URI, InetAddress, UriType) - Constructor for class org.dice_research.squirrel.data.uri.CrawleableUri
Deprecated.
CrawleableUriAdapter() - Constructor for class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
CrawleableUriFactory - Interface in org.dice_research.squirrel.data.uri
This factory generates CrawleableUri instances.
CrawleableUriFactory4Tests - Class in org.dice_research.squirrel.data.uri
 
CrawleableUriFactory4Tests() - Constructor for class org.dice_research.squirrel.data.uri.CrawleableUriFactory4Tests
 
CrawleableUriFactoryImpl - Class in org.dice_research.squirrel.data.uri
A simple implementation of a CrawleableUriFactory that is expandable by UriFilter instances.
CrawleableUriFactoryImpl() - Constructor for class org.dice_research.squirrel.data.uri.CrawleableUriFactoryImpl
The default constructor that imposes no additional requirements.
CrawleableUriFactoryImpl(UriFilter...) - Constructor for class org.dice_research.squirrel.data.uri.CrawleableUriFactoryImpl
Constructor taking additional filters that are used to check URIs during the creation.
crawled - Static variable in class org.dice_research.squirrel.vocab.Squirrel
 
CrawlingActivity - Class in org.dice_research.squirrel.metadata
Representation of a crawling activity.
CrawlingActivity(CrawleableUri, String) - Constructor for class org.dice_research.squirrel.metadata.CrawlingActivity
Constructor.
CrawlingActivity.CrawlingURIState - Enum in org.dice_research.squirrel.metadata
 
crawlingDone(List<CrawleableUri>) - Method in interface org.dice_research.squirrel.frontier.Frontier
This method should be called after a list of URIs has been requested using the Frontier.getNextUris() method and the crawling has been finished.
crawlingInProcess - Variable in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter.UriInfo
 
CrawlingResult - Class in org.dice_research.squirrel.rabbit.msgs
 
CrawlingResult(List<CrawleableUri>, String) - Constructor for class org.dice_research.squirrel.rabbit.msgs.CrawlingResult
 
CrawlingResult(List<CrawleableUri>) - Constructor for class org.dice_research.squirrel.rabbit.msgs.CrawlingResult
 
CrawlingURIState() - Constructor for enum org.dice_research.squirrel.metadata.CrawlingActivity.CrawlingURIState
 
create(String) - Method in interface org.dice_research.squirrel.data.uri.CrawleableUriFactory
Creates a CrawleableUri from the given URI String.
create(URI) - Method in interface org.dice_research.squirrel.data.uri.CrawleableUriFactory
Creates a CrawleableUri from the given URI instance.
create(URI, UriType) - Method in interface org.dice_research.squirrel.data.uri.CrawleableUriFactory
Creates a CrawleableUri from the given URI instance and the given UriType.
create(URI, InetAddress, UriType) - Method in class org.dice_research.squirrel.data.uri.CrawleableUriFactory4Tests
 
create(String) - Method in class org.dice_research.squirrel.data.uri.CrawleableUriFactoryImpl
 
create(URI) - Method in class org.dice_research.squirrel.data.uri.CrawleableUriFactoryImpl
 
create(URI, UriType) - Method in class org.dice_research.squirrel.data.uri.CrawleableUriFactoryImpl
 
create(String) - Method in class org.dice_research.squirrel.data.uri.DefaultCrawleableUriFactory
 
create(URI) - Method in class org.dice_research.squirrel.data.uri.DefaultCrawleableUriFactory
 
create(URI, UriType) - Method in class org.dice_research.squirrel.data.uri.DefaultCrawleableUriFactory
 
createCrawleableUriList(String[]) - Static method in class org.dice_research.squirrel.data.uri.UriUtils
 
createCrawleableUriList(ArrayList, UriType) - Static method in class org.dice_research.squirrel.data.uri.UriUtils
Deprecated.
createCrawleableUriList(Collection<String>) - Static method in class org.dice_research.squirrel.data.uri.UriUtils
 
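The Closer entries in this section describe three behaviours: closing with the utility's own logger, closing with a supplied logger, and closing while ignoring exceptions. The following is a minimal sketch of that pattern, not the Squirrel source. The class name CloserSketch, the boolean return values, and the use of java.util.logging instead of the project's Logger type are assumptions made to keep the example self-contained.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for a Closer-style utility class (not the Squirrel implementation). */
final class CloserSketch {
    private static final Logger LOGGER = Logger.getLogger(CloserSketch.class.getName());

    private CloserSketch() {
        // utility class, no instances
    }

    /** Closes the resource and logs an IOException with the logger of this class. */
    static boolean close(Closeable closeable) {
        return close(closeable, LOGGER);
    }

    /**
     * Closes the resource and logs an IOException with the given logger.
     * Returns true if closing succeeded (assumption: the real methods may return nothing).
     */
    static boolean close(Closeable closeable, Logger logger) {
        if (closeable == null) {
            return true;
        }
        try {
            closeable.close();
            return true;
        } catch (IOException e) {
            logger.log(Level.WARNING, "Exception while closing a resource.", e);
            return false;
        }
    }

    /** Closes the resource and swallows every exception thrown by close(). */
    static void closeQuietly(Closeable closeable) {
        if (closeable == null) {
            return;
        }
        try {
            closeable.close();
        } catch (Exception e) {
            // intentionally ignored
        }
    }
}
```

A call such as CloserSketch.closeQuietly(stream) is meant for cleanup paths where a failure to close must not mask an earlier error; the logging variants suit places where the failure should at least be recorded.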

D

data - Variable in class org.dice_research.squirrel.data.uri.CrawleableUri
 
DATA_KEY - Static variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
DATA_NAME_KEY - Static variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
DATA_VALUE_KEY - Static variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
DATA_VALUE_TYPE_KEY - Static variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
dataHandler(DataHandler) - Method in class org.dice_research.squirrel.rabbit.RPCServer.Builder
Sets the handler that is called if data is incoming.
Dataset - Static variable in class org.dice_research.squirrel.vocab.DCAT
 
dateEnded - Variable in class org.dice_research.squirrel.metadata.CrawlingActivity
When the activity has ended.
dateLastAlive - Variable in class org.dice_research.squirrel.worker.WorkerInfo
The date of the last AliveMessage from the Worker.
dateStarted - Variable in class org.dice_research.squirrel.metadata.CrawlingActivity
When the activity has started.
DCAT - Class in org.dice_research.squirrel.vocab
 
DCAT() - Constructor for class org.dice_research.squirrel.vocab.DCAT
 
decodeFromString(String) - Method in interface org.dice_research.squirrel.deduplication.hashing.HashValue
Decode a HashValue from the given String.
decodeFromString(String) - Method in class org.dice_research.squirrel.deduplication.hashing.impl.ArrayHashValue
 
decorated - Variable in class org.dice_research.squirrel.data.uri.filter.AbstractKnownUriFilterDecorator
 
DEDUPLICATION_ACTIVE_KEY - Static variable in class org.dice_research.squirrel.Constants
 
DEDUPLICATOR_QUEUE_NAME - Static variable in class org.dice_research.squirrel.Constants
 
DEFAULT_ACTIVITY_URI_PREFIX - Static variable in class org.dice_research.squirrel.Constants
 
DEFAULT_CHARSET - Static variable in class org.dice_research.squirrel.Constants
 
DEFAULT_DEDUPLICATION_ACTIVE - Static variable in class org.dice_research.squirrel.Constants
 
DEFAULT_META_DATA_GRAPH_URI - Static variable in class org.dice_research.squirrel.Constants
 
DEFAULT_RESULT_GRAPH_URI_PREFIX - Static variable in class org.dice_research.squirrel.Constants
 
DEFAULT_STATUS_URI_PREFIX - Static variable in class org.dice_research.squirrel.Constants
 
DEFAULT_USER_AGENT - Static variable in class org.dice_research.squirrel.Constants
 
DEFAULT_WORKER_URI_PREFIX - Static variable in class org.dice_research.squirrel.Constants
 
DefaultCrawleableUriFactory - Class in org.dice_research.squirrel.data.uri
 
DefaultCrawleableUriFactory() - Constructor for class org.dice_research.squirrel.data.uri.DefaultCrawleableUriFactory
 
defaultRecrawlTime - Variable in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
 
DELIMETER - Static variable in class org.dice_research.squirrel.deduplication.hashing.impl.ArrayHashValue
The delimiter between the individual HashValues.
delivery - Variable in class org.dice_research.squirrel.rabbit.RPCServer.MsgProcessingTask
 
deserialize(byte[]) - Method in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer
 
deserialize(byte[]) - Method in class org.dice_research.squirrel.data.uri.serialize.java.GzipJavaUriSerializer
 
deserialize(byte[]) - Method in class org.dice_research.squirrel.data.uri.serialize.java.SnappyJavaUriSerializer
 
deserialize(byte[]) - Method in interface org.dice_research.squirrel.data.uri.serialize.Serializer
 
deserializeSafely(byte[]) - Method in interface org.dice_research.squirrel.data.uri.serialize.Serializer
 
Distribution - Static variable in class org.dice_research.squirrel.vocab.DCAT
 
distribution - Static variable in class org.dice_research.squirrel.vocab.DCAT
 
doesRecrawling() - Method in interface org.dice_research.squirrel.frontier.Frontier
Indicates whether this frontier does recrawling.
downloadURL - Static variable in class org.dice_research.squirrel.vocab.DCAT
 

E

encodeToString() - Method in interface org.dice_research.squirrel.deduplication.hashing.HashValue
Encode to String in order to easily store in a database.
encodeToString() - Method in class org.dice_research.squirrel.deduplication.hashing.impl.ArrayHashValue
 
ENCODING_CHARSET - Static variable in class org.dice_research.squirrel.data.uri.CrawleableUri
 
endedAtTime - Static variable in class org.dice_research.squirrel.vocab.PROV_O
 
env - Static variable in class org.dice_research.squirrel.configurator.Configuration
 
equals(Object) - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
 
equals(Object) - Method in class org.dice_research.squirrel.deduplication.hashing.impl.ArrayHashValue
 
equals(Object) - Method in class org.dice_research.squirrel.queue.IpUriTypePair
 
equals(Object) - Method in class org.dice_research.squirrel.rabbit.msgs.CrawlingResult
 
equals(Object) - Method in class org.dice_research.squirrel.rabbit.msgs.UriSet
 
equals(Object) - Method in class org.dice_research.squirrel.rabbit.msgs.UriSetRequest
 
ExtendedFrontier - Interface in org.dice_research.squirrel.frontier
 

F

filter(CrawleableUri) - Method in class org.dice_research.squirrel.data.uri.CrawleableUriFactoryImpl
Returns the given CrawleableUri instance if all local CrawleableUriFactoryImpl.filters marked it as a good URI.
filters - Variable in class org.dice_research.squirrel.data.uri.CrawleableUriFactoryImpl
URI filters applied to the URIs during the generation.
finishActivity(Sink) - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
Finishes the crawling activity and sends its data to the sink.
fn - Static variable in class org.dice_research.squirrel.vocab.VCard
 
fromByteArray(byte[]) - Static method in class org.dice_research.squirrel.data.uri.CrawleableUri
Deprecated.
Use the JSON deserialization instead.
fromByteBuffer(ByteBuffer) - Static method in class org.dice_research.squirrel.data.uri.CrawleableUri
Deprecated.
Use the JSON deserialization instead.
fromString(byte[]) - Static method in class org.dice_research.squirrel.data.uri.serialize.java.GzipJavaUriSerializer
 
fromString(byte[]) - Static method in class org.dice_research.squirrel.data.uri.serialize.java.SnappyJavaUriSerializer
 
Frontier - Interface in org.dice_research.squirrel.frontier
A Frontier is a central class of the crawler managing a queue of URIs that should be crawled in the future.
FRONTIER_QUEUE_NAME - Static variable in class org.dice_research.squirrel.Constants
 
frontierDoesRecrawling - Variable in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
Indicates whether the Frontier using this filter does recrawling.
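The Frontier entries in this index describe a request and acknowledge cycle: URIs are added, a chunk is handed out by getNextUris() (or null if nothing is left), and crawlingDone(...) reports the chunk as finished. The sketch below illustrates that cycle with a hypothetical, simplified class. FrontierSketch, its chunkSize parameter, the use of plain strings instead of CrawleableUri, and the "accept every URI only once" rule are all assumptions, not the behaviour of a real Squirrel Frontier.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory frontier that stores URIs as plain strings. */
final class FrontierSketch {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> known = new HashSet<>();
    private final int chunkSize;
    private int pending = 0;

    FrontierSketch(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    /** Queues every URI that has not been seen before (assumed internal rule). */
    void addNewUris(List<String> uris) {
        for (String uri : uris) {
            if (known.add(uri)) {
                queue.add(uri);
            }
        }
    }

    /** Returns the next chunk of URIs, or null if the queue is empty. */
    List<String> getNextUris() {
        if (queue.isEmpty()) {
            return null;
        }
        List<String> chunk = new ArrayList<>();
        while (chunk.size() < chunkSize && !queue.isEmpty()) {
            chunk.add(queue.poll());
        }
        pending += chunk.size();
        return chunk;
    }

    /** Marks the given URIs as crawled so they no longer count as pending. */
    void crawlingDone(List<String> uris) {
        pending -= uris.size();
    }

    /** Number of URIs handed out but not yet reported as crawled. */
    int getNumberOfPendingUris() {
        return pending;
    }
}
```

A worker loop built on this sketch would call getNextUris() until it returns null and call crawlingDone(...) after each chunk, which keeps getNumberOfPendingUris() at zero between chunks.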

G

generateFileName(String, boolean) - Static method in class org.dice_research.squirrel.data.uri.UriUtils
generateFileName(String, String) - Static method in class org.dice_research.squirrel.data.uri.UriUtils
Deprecated.
generateFileName(CrawleableUri, String) - Static method in class org.dice_research.squirrel.data.uri.UriUtils
 
generatePrefixMap() - Static method in class org.dice_research.squirrel.vocab.Prefixes
 
getCrawleableUri() - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
 
getCrawleableUriList() - Static method in class org.dice_research.squirrel.data.uri.UriUtils
 
getCrawledRdfData() - Method in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
Returns the data written to the sink as a map with the crawled URI as key and the RDF data as value.
getCrawledUnstructuredData() - Method in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
Returns the data written to the sink as a map with the crawled URI as key and the unstructured data as value.
getData(String) - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
 
getData() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
 
getDateLastAlive() - Method in class org.dice_research.squirrel.worker.WorkerInfo
 
getDecorated() - Method in class org.dice_research.squirrel.data.uri.filter.AbstractKnownUriFilterDecorator
 
getDecorated() - Method in interface org.dice_research.squirrel.data.uri.filter.KnownUriFilterDecorator
 
getDomainName(String) - Static method in class org.dice_research.squirrel.data.uri.UriUtils
 
getEnv(String, Logger) - Static method in class org.dice_research.squirrel.configurator.Configuration
 
getEnvBoolean(String, Logger) - Static method in class org.dice_research.squirrel.configurator.Configuration
 
getEnvInteger(String, Logger) - Static method in class org.dice_research.squirrel.configurator.Configuration
 
getEnvLong(String, Logger) - Static method in class org.dice_research.squirrel.configurator.Configuration
 
getId() - Method in interface org.dice_research.squirrel.worker.Worker
Gives the unique id of the worker.
getIdOfWorker() - Method in class org.dice_research.squirrel.rabbit.msgs.UriSetRequest
 
getIdOfWorker() - Method in class org.dice_research.squirrel.worker.AliveMessage
Get the id of the worker.
getIpAddress() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
 
getIPURIIterator() - Method in class org.dice_research.squirrel.queue.InMemoryQueue
 
getIPURIIterator() - Method in interface org.dice_research.squirrel.queue.IpAddressBasedQueue
Goes through the queue and collects all IP addresses with their URIs.
getIterator() - Method in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
 
getIterator() - Method in class org.dice_research.squirrel.queue.InMemoryQueue
 
getNextUris() - Method in interface org.dice_research.squirrel.frontier.Frontier
Returns the next chunk of URIs that should be crawled or null.
getNextUris() - Method in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
 
getNextUris() - Method in interface org.dice_research.squirrel.queue.UriQueue
Returns the next chunk of URIs that should be crawled or null.
getNumberOfBlockedIps() - Method in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
 
getNumberOfBlockedIps() - Method in interface org.dice_research.squirrel.queue.IpAddressBasedQueue
Returns the number of IP addresses that are currently blocked.
getNumberOfPendingUris() - Method in interface org.dice_research.squirrel.frontier.Frontier
(optional) Returns the number of URIs that have been requested from the Frontier using Frontier.getNextUris() and have not been marked as crawled using Frontier#crawlingDone(Map).
getOutdatedUris() - Method in class org.dice_research.squirrel.data.uri.filter.AbstractKnownUriFilterDecorator
 
getOutdatedUris() - Method in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
 
getOutdatedUris() - Method in interface org.dice_research.squirrel.data.uri.filter.KnownUriFilter
Returns all CrawleableUris which have to be recrawled.
getSize() - Method in class org.dice_research.squirrel.collect.SimpleUriCollector
 
getSize(CrawleableUri) - Method in class org.dice_research.squirrel.collect.SimpleUriCollector
 
getSize(CrawleableUri) - Method in interface org.dice_research.squirrel.collect.UriCollector
Returns the total number of URIs that have been collected.
getState() - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
 
getStepsAsString() - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
 
getTempDir(String, String) - Static method in class org.dice_research.squirrel.utils.TempFileHelper
Creates a temporary directory that can be used for tests.
getTimestampNextCrawl() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
 
getTriplesForGraph(CrawleableUri) - Method in interface org.dice_research.squirrel.sink.tripleBased.AdvancedTripleBasedSink
Get all Triples behind the given uri.
getType() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
Deprecated.
getUri() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
 
getUri() - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
 
getURI() - Static method in class org.dice_research.squirrel.vocab.DCAT
Returns the URI for this schema.
getURI() - Static method in class org.dice_research.squirrel.vocab.PROV_O
Returns the URI for this schema.
getURI() - Static method in class org.dice_research.squirrel.vocab.Squirrel
Returns the URI for this schema.
getURI() - Static method in class org.dice_research.squirrel.vocab.VCard
Returns the URI for this schema.
getUri() - Method in interface org.dice_research.squirrel.worker.Worker
Gives the unique URI of the worker.
getUris(CrawleableUri) - Method in class org.dice_research.squirrel.collect.SimpleUriCollector
 
getUris(CrawleableUri) - Method in interface org.dice_research.squirrel.collect.UriCollector
Returns a list of serialized CrawleableUri instances that have been collected for the given URI.
getUris(IpUriTypePair) - Method in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
 
getUris(IpUriTypePair) - Method in class org.dice_research.squirrel.queue.InMemoryQueue
 
getUrisCrawling() - Method in class org.dice_research.squirrel.worker.WorkerInfo
 
getUrisWithSameHashValues(Set<HashValue>) - Method in interface org.dice_research.squirrel.deduplication.hashing.UriHashCustodian
Gets all URIs that have a common hash value with one of the hash values of the given set.
gson - Variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer
 
gson - Variable in class org.dice_research.squirrel.rabbit.RabbitMQHelper
Deprecated.
 
GsonUriSerializer - Class in org.dice_research.squirrel.data.uri.serialize.gson
A serializer that uses Gson to serialize URIs.
GsonUriSerializer() - Constructor for class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer
 
GsonUriSerializer(GsonBuilder) - Constructor for class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer
 
GsonUriSerializer.CrawleableUriAdapter - Class in org.dice_research.squirrel.data.uri.serialize.gson
 
GzipJavaUriSerializer - Class in org.dice_research.squirrel.data.uri.serialize.java
 
GzipJavaUriSerializer() - Constructor for class org.dice_research.squirrel.data.uri.serialize.java.GzipJavaUriSerializer
 

H

hadPlan - Static variable in class org.dice_research.squirrel.vocab.PROV_O
 
handleData(byte[], ResponseHandler, String, String) - Method in interface org.dice_research.squirrel.rabbit.RespondingDataHandler
 
hasEmail - Static variable in class org.dice_research.squirrel.vocab.VCard
 
hashCode() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
 
hashCode() - Method in class org.dice_research.squirrel.queue.IpUriTypePair
 
hashCode() - Method in class org.dice_research.squirrel.rabbit.msgs.CrawlingResult
 
hashCode() - Method in class org.dice_research.squirrel.rabbit.msgs.UriSet
 
hashCode() - Method in class org.dice_research.squirrel.rabbit.msgs.UriSetRequest
 
HashValue - Interface in org.dice_research.squirrel.deduplication.hashing
An abstract representation of a hash value computed by a TripleSetHashFunction.
hashValues - Variable in class org.dice_research.squirrel.deduplication.hashing.impl.ArrayHashValue
The Array of HashValues.
hasNext - Variable in class org.dice_research.squirrel.iterators.SqlBasedIterator
 
hasNext() - Method in class org.dice_research.squirrel.iterators.SqlBasedIterator
 
hasNext_unsecured() - Method in class org.dice_research.squirrel.iterators.SqlBasedIterator
 
healthyness - Variable in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
The health flag of the sink; it is set to false if an error is encountered.
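The HashValue entries describe a value that can be encoded to a String for storage and decoded again, and ArrayHashValue holds an array of integers separated by a delimiter. A minimal round-trip sketch of that idea follows. The class ArrayHashSketch and the comma delimiter are assumptions; this index does not state the actual value of ArrayHashValue.DELIMETER or how the real class handles edge cases.

```java
import java.util.StringJoiner;
import java.util.regex.Pattern;

/** Hypothetical round-trip encoder for an array-based hash value. */
final class ArrayHashSketch {
    /** Assumed delimiter; the real constant's value is not given in this index. */
    static final String DELIMITER = ",";

    private ArrayHashSketch() {
        // static helpers only
    }

    /** Joins the individual hash values into one delimiter-separated String. */
    static String encodeToString(Integer[] hashValues) {
        StringJoiner joiner = new StringJoiner(DELIMITER);
        for (Integer value : hashValues) {
            joiner.add(Integer.toString(value));
        }
        return joiner.toString();
    }

    /** Splits an encoded String back into the individual hash values. */
    static Integer[] decodeFromString(String encoded) {
        if (encoded.isEmpty()) {
            return new Integer[0];
        }
        // Pattern.quote keeps the delimiter literal even if it is a regex metacharacter.
        String[] parts = encoded.split(Pattern.quote(DELIMITER));
        Integer[] values = new Integer[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.valueOf(parts[i]);
        }
        return values;
    }
}
```

With the assumed delimiter, encodeToString(new Integer[] {3, -7, 42}) yields "3,-7,42", and decodeFromString returns an equal array, so the encoded form can serve as a database column value.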

I

idOfWorker - Variable in class org.dice_research.squirrel.rabbit.msgs.CrawlingResult
 
idOfWorker - Variable in class org.dice_research.squirrel.rabbit.msgs.UriSetRequest
The id of the Worker that sent this request.
idOfWorker - Variable in class org.dice_research.squirrel.worker.AliveMessage
The id of the worker that sends the alive message.
informAboutDeadWorker(int, List<CrawleableUri>) - Method in interface org.dice_research.squirrel.frontier.ExtendedFrontier
Informs the frontier that a worker has died so that the frontier can react accordingly.
InMemoryKnownUriFilter - Class in org.dice_research.squirrel.data.uri.filter
A simple in-memory implementation of the KnownUriFilter interface.
InMemoryKnownUriFilter(boolean, long) - Constructor for class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
Constructor.
InMemoryKnownUriFilter() - Constructor for class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
Constructor.
InMemoryKnownUriFilter(Hashtable<CrawleableUri, InMemoryKnownUriFilter.UriInfo>, boolean) - Constructor for class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
Constructor.
InMemoryKnownUriFilter.UriInfo - Class in org.dice_research.squirrel.data.uri.filter
 
InMemoryQueue - Class in org.dice_research.squirrel.queue
 
InMemoryQueue() - Constructor for class org.dice_research.squirrel.queue.InMemoryQueue
 
InMemoryQueue(Comparator<IpUriTypePair>) - Constructor for class org.dice_research.squirrel.queue.InMemoryQueue
 
InMemorySink - Class in org.dice_research.squirrel.sink.impl.mem
This is a simple in-memory implementation of a sink that can be used for testing purposes.
InMemorySink() - Constructor for class org.dice_research.squirrel.sink.impl.mem.InMemorySink
 
ip - Variable in class org.dice_research.squirrel.queue.IpUriTypePair
 
ipAddress - Variable in class org.dice_research.squirrel.data.uri.CrawleableUri
 
IpAddressBasedQueue - Interface in org.dice_research.squirrel.queue
This extension of the UriQueue interface defines additional methods enabling the queue to manage the retrieving of chunks of URIs based on IP addresses.
IpUriTypePair - Class in org.dice_research.squirrel.queue
 
IpUriTypePair(InetAddress, UriType) - Constructor for class org.dice_research.squirrel.queue.IpUriTypePair
 
isEmpty() - Method in class org.dice_research.squirrel.queue.InMemoryQueue
 
isEmpty() - Method in interface org.dice_research.squirrel.queue.UriQueue
Returns true if the queue is empty.
isSinkHealthy() - Method in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
Returns the status of the sink.
isStringMatchRegexps(String, String[]) - Static method in class org.dice_research.squirrel.data.uri.UriUtils
 
isStringMatchRegexps(String, String[]) - Method in class org.dice_research.squirrel.uri.processing.UriProcessor
 
isUriGood(CrawleableUri) - Method in class org.dice_research.squirrel.data.uri.filter.AbstractKnownUriFilterDecorator
 
isUriGood(CrawleableUri) - Method in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
 
isUriGood(CrawleableUri) - Method in interface org.dice_research.squirrel.data.uri.filter.UriFilter
Returns true if the given CrawleableUri object fulfills the requirements imposed by this filter.

K

keyword - Static variable in class org.dice_research.squirrel.vocab.DCAT
 
Kind - Static variable in class org.dice_research.squirrel.vocab.VCard
 
KnownUriFilter - Interface in org.dice_research.squirrel.data.uri.filter
A UriFilter that works like a blacklist filter whose blacklist contains only those URIs that the crawler has already seen.
KnownUriFilterDecorator - Interface in org.dice_research.squirrel.data.uri.filter
 

L

landingPage - Static variable in class org.dice_research.squirrel.vocab.DCAT
 
lastCrawlTimestamp - Variable in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter.UriInfo
 
LIMITFORITERATOR - Static variable in class org.dice_research.squirrel.queue.InMemoryQueue
 
LOGGER - Static variable in class org.dice_research.squirrel.collect.SimpleUriCollector
 
LOGGER - Static variable in class org.dice_research.squirrel.data.uri.CrawleableUri
 
LOGGER - Static variable in class org.dice_research.squirrel.data.uri.CrawleableUriFactoryImpl
 
LOGGER - Static variable in class org.dice_research.squirrel.data.uri.DefaultCrawleableUriFactory
 
LOGGER - Static variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer
 
LOGGER - Static variable in class org.dice_research.squirrel.iterators.SqlBasedIterator
 
LOGGER - Static variable in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
 
LOGGER - Static variable in class org.dice_research.squirrel.rabbit.RabbitMQHelper
Deprecated.
 
LOGGER - Static variable in class org.dice_research.squirrel.rabbit.RPCServer
 
LOGGER - Static variable in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
 
LOGGER - Static variable in class org.dice_research.squirrel.uri.processing.UriProcessor
 
LOGGER - Static variable in class org.dice_research.squirrel.utils.Closer
 

M

markIpAddressAsAccessible(InetAddress) - Method in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
 
markIpAddressAsAccessible(InetAddress) - Method in interface org.dice_research.squirrel.queue.IpAddressBasedQueue
Marks the given IP address as accessible.
maxParallelProcessedMsgs(int) - Method in class org.dice_research.squirrel.rabbit.RPCServer.Builder
Sets the maximum number of incoming messages that are processed in parallel.
mediaType - Static variable in class org.dice_research.squirrel.vocab.DCAT
 
MsgProcessingTask(QueueingConsumer.Delivery, ResponseHandler) - Constructor for class org.dice_research.squirrel.rabbit.RPCServer.MsgProcessingTask
 

N

next - Variable in class org.dice_research.squirrel.iterators.SqlBasedIterator
 
next() - Method in class org.dice_research.squirrel.iterators.SqlBasedIterator
 
nextCrawlTimestamp - Variable in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter.UriInfo
 
numberOfTriples - Variable in class org.dice_research.squirrel.metadata.CrawlingActivity
 

O

open() - Method in class org.dice_research.squirrel.queue.InMemoryQueue
 
open() - Method in interface org.dice_research.squirrel.queue.UriQueue
Opens the RDB connection and initializes the database.
openSinkForUri(CrawleableUri) - Method in class org.dice_research.squirrel.collect.SimpleUriCollector
 
openSinkForUri(CrawleableUri) - Method in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
 
openSinkForUri(CrawleableUri) - Method in interface org.dice_research.squirrel.sink.SinkBase
Opens the sink to process data for the given URI.
org.dice_research.squirrel - package org.dice_research.squirrel
 
org.dice_research.squirrel.collect - package org.dice_research.squirrel.collect
 
org.dice_research.squirrel.configurator - package org.dice_research.squirrel.configurator
 
org.dice_research.squirrel.data.uri - package org.dice_research.squirrel.data.uri
 
org.dice_research.squirrel.data.uri.filter - package org.dice_research.squirrel.data.uri.filter
 
org.dice_research.squirrel.data.uri.serialize - package org.dice_research.squirrel.data.uri.serialize
 
org.dice_research.squirrel.data.uri.serialize.gson - package org.dice_research.squirrel.data.uri.serialize.gson
 
org.dice_research.squirrel.data.uri.serialize.java - package org.dice_research.squirrel.data.uri.serialize.java
 
org.dice_research.squirrel.deduplication.hashing - package org.dice_research.squirrel.deduplication.hashing
 
org.dice_research.squirrel.deduplication.hashing.impl - package org.dice_research.squirrel.deduplication.hashing.impl
 
org.dice_research.squirrel.frontier - package org.dice_research.squirrel.frontier
 
org.dice_research.squirrel.iterators - package org.dice_research.squirrel.iterators
 
org.dice_research.squirrel.metadata - package org.dice_research.squirrel.metadata
 
org.dice_research.squirrel.queue - package org.dice_research.squirrel.queue
 
org.dice_research.squirrel.rabbit - package org.dice_research.squirrel.rabbit
 
org.dice_research.squirrel.rabbit.msgs - package org.dice_research.squirrel.rabbit.msgs
 
org.dice_research.squirrel.sink - package org.dice_research.squirrel.sink
 
org.dice_research.squirrel.sink.impl.mem - package org.dice_research.squirrel.sink.impl.mem
 
org.dice_research.squirrel.sink.tripleBased - package org.dice_research.squirrel.sink.tripleBased
 
org.dice_research.squirrel.uri.processing - package org.dice_research.squirrel.uri.processing
 
org.dice_research.squirrel.utils - package org.dice_research.squirrel.utils
 
org.dice_research.squirrel.vocab - package org.dice_research.squirrel.vocab
 
org.dice_research.squirrel.worker - package org.dice_research.squirrel.worker
 
outputResource - Variable in class org.dice_research.squirrel.metadata.CrawlingActivity
The URIs of the resources generated by this activity as well as their type as RDF Resource.

P

page - Variable in class org.dice_research.squirrel.iterators.SqlBasedIterator
 
parseObject(byte[]) - Method in class org.dice_research.squirrel.rabbit.RabbitMQHelper
Deprecated.
A method for deserializing an object that has been serialized using the RabbitMQHelper.writeObject(Object) method.
performCrawling(CrawleableUri) - Method in interface org.dice_research.squirrel.worker.Worker
Crawls the given URI and adds new URIs that have been found while crawling to the given list of new URIs.
Plan - Static variable in class org.dice_research.squirrel.vocab.PROV_O
 
PREFIX_TO_URI - Static variable in class org.dice_research.squirrel.vocab.Prefixes
 
Prefixes - Class in org.dice_research.squirrel.vocab
A simple utility class in which we collected our predefined prefixes.
Prefixes() - Constructor for class org.dice_research.squirrel.vocab.Prefixes
 
prepareMetadataModel() - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
Prepares the metadata model and returns it.
property(String) - Static method in class org.dice_research.squirrel.vocab.DCAT
 
property(String) - Static method in class org.dice_research.squirrel.vocab.PROV_O
 
property(String) - Static method in class org.dice_research.squirrel.vocab.Squirrel
 
property(String) - Static method in class org.dice_research.squirrel.vocab.VCard
 
PROV_O - Class in org.dice_research.squirrel.vocab
 
PROV_O() - Constructor for class org.dice_research.squirrel.vocab.PROV_O
 
ps - Variable in class org.dice_research.squirrel.iterators.SqlBasedIterator
 

Q

qualifiedAssociation - Static variable in class org.dice_research.squirrel.vocab.PROV_O
 
queue - Variable in class org.dice_research.squirrel.queue.InMemoryQueue
 
queueMutex - Variable in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
 

R

RabbitMQHelper - Class in org.dice_research.squirrel.rabbit
Deprecated.
Use one of the Serializer implementations instead.
RabbitMQHelper() - Constructor for class org.dice_research.squirrel.rabbit.RabbitMQHelper
Deprecated.
 
RabbitMQHelper(Gson) - Constructor for class org.dice_research.squirrel.rabbit.RabbitMQHelper
Deprecated.
 
RDB_HOST_NAME_KEY - Static variable in class org.dice_research.squirrel.Constants
 
RDB_PORT_KEY - Static variable in class org.dice_research.squirrel.Constants
 
rdfData - Variable in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
In-memory map used to store the RDF data that is written to the sink.
read(JsonReader) - Method in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
readDataObject(JsonReader, Map<String, Object>) - Method in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
readInetAddress(JsonReader) - Method in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
recognizeInetAddress(CrawleableUri) - Method in class org.dice_research.squirrel.uri.processing.UriProcessor
 
recognizeInetAddress(CrawleableUri) - Method in interface org.dice_research.squirrel.uri.processing.UriProcessorInterface
Recognizes the IP address of the given CrawleableUri.
recognizeUriType(CrawleableUri) - Method in class org.dice_research.squirrel.uri.processing.UriProcessor
 
recognizeUriType(CrawleableUri) - Method in interface org.dice_research.squirrel.uri.processing.UriProcessorInterface
Recognizes the type of the given CrawleableUri.
resource(String) - Static method in class org.dice_research.squirrel.vocab.DCAT
 
resource(String) - Static method in class org.dice_research.squirrel.vocab.PROV_O
 
resource(String) - Static method in class org.dice_research.squirrel.vocab.Squirrel
 
resource(String) - Static method in class org.dice_research.squirrel.vocab.VCard
 
RespondingDataHandler - Interface in org.dice_research.squirrel.rabbit
 
responseChannel - Variable in class org.dice_research.squirrel.rabbit.RPCServer
 
responseFactory - Variable in class org.dice_research.squirrel.rabbit.RPCServer.Builder
 
ResponseHandler - Interface in org.dice_research.squirrel.rabbit
 
responseHandler - Variable in class org.dice_research.squirrel.rabbit.RPCServer.MsgProcessingTask
 
responseQueueFactory(RabbitQueueFactory) - Method in class org.dice_research.squirrel.rabbit.RPCServer.Builder
Method for providing the necessary information to connect to the queue to which responses should be sent.
ResultFile - Static variable in class org.dice_research.squirrel.vocab.Squirrel
 
ResultGraph - Static variable in class org.dice_research.squirrel.vocab.Squirrel
 
RPCServer - Class in org.dice_research.squirrel.rabbit
 
RPCServer(RabbitQueue, RespondingDataHandler, int, Channel) - Constructor for class org.dice_research.squirrel.rabbit.RPCServer
 
RPCServer.Builder - Class in org.dice_research.squirrel.rabbit
 
RPCServer.MsgProcessingTask - Class in org.dice_research.squirrel.rabbit
 
rs - Variable in class org.dice_research.squirrel.iterators.SqlBasedIterator
 
run() - Method in class org.dice_research.squirrel.rabbit.RPCServer.MsgProcessingTask
 

S

searchPath4Files(File) - Static method in class org.dice_research.squirrel.utils.TempPathUtils
 
sendResponse(byte[], String, String) - Method in interface org.dice_research.squirrel.rabbit.ResponseHandler
 
sendResponse(byte[], String, String) - Method in class org.dice_research.squirrel.rabbit.RPCServer
 
sendsAliveMessages() - Method in interface org.dice_research.squirrel.worker.Worker
Indicates whether the worker sends alive messages in order to convince the Frontier that it is still alive.
serialize(T) - Method in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer
 
serialize(T) - Method in class org.dice_research.squirrel.data.uri.serialize.java.GzipJavaUriSerializer
 
serialize(T) - Method in class org.dice_research.squirrel.data.uri.serialize.java.SnappyJavaUriSerializer
 
serialize(T) - Method in interface org.dice_research.squirrel.data.uri.serialize.Serializer
 
serializer - Variable in class org.dice_research.squirrel.collect.SimpleUriCollector
 
Serializer - Interface in org.dice_research.squirrel.data.uri.serialize
 
serializeSafely(T) - Method in interface org.dice_research.squirrel.data.uri.serialize.Serializer
 
serialVersionUID - Static variable in class org.dice_research.squirrel.data.uri.CrawleableUri
 
serialVersionUID - Static variable in class org.dice_research.squirrel.deduplication.hashing.impl.ArrayHashValue
 
serialVersionUID - Static variable in class org.dice_research.squirrel.metadata.CrawlingActivity
 
serialVersionUID - Static variable in class org.dice_research.squirrel.rabbit.msgs.CrawlingResult
 
serialVersionUID - Static variable in class org.dice_research.squirrel.rabbit.msgs.UriSet
 
serialVersionUID - Static variable in class org.dice_research.squirrel.rabbit.msgs.UriSetRequest
 
serialVersionUID - Static variable in class org.dice_research.squirrel.worker.AliveMessage
 
setData(Map<String, Object>) - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
 
setDateLastAlive(Date) - Method in class org.dice_research.squirrel.worker.WorkerInfo
 
setIpAddress(InetAddress) - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
 
setNumberOfTriples(long) - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
 
setState(CrawlingActivity.CrawlingURIState) - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
 
setTerminateFlag(boolean) - Method in interface org.dice_research.squirrel.worker.Worker
 
setTimestampNextCrawl(long) - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
 
setType(UriType) - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
Deprecated.
SimpleUriCollector - Class in org.dice_research.squirrel.collect
 
SimpleUriCollector(Serializer) - Constructor for class org.dice_research.squirrel.collect.SimpleUriCollector
 
Sink - Interface in org.dice_research.squirrel.sink
The interface of a sink used by a worker.
SinkBase - Interface in org.dice_research.squirrel.sink
This interface defines the basic functionality of all sinks, i.e., they can be opened and closed for a given URI.
SnappyJavaUriSerializer - Class in org.dice_research.squirrel.data.uri.serialize.java
 
SnappyJavaUriSerializer() - Constructor for class org.dice_research.squirrel.data.uri.serialize.java.SnappyJavaUriSerializer
 
SqlBasedIterator - Class in org.dice_research.squirrel.iterators
 
SqlBasedIterator(PreparedStatement) - Constructor for class org.dice_research.squirrel.iterators.SqlBasedIterator
 
Squirrel - Class in org.dice_research.squirrel.vocab
 
Squirrel() - Constructor for class org.dice_research.squirrel.vocab.Squirrel
 
SQUIRREL_URI_PREFIX - Static variable in class org.dice_research.squirrel.Constants
 
start - Variable in class org.dice_research.squirrel.iterators.SqlBasedIterator
 
startedAtTime - Static variable in class org.dice_research.squirrel.vocab.PROV_O
 
state - Variable in class org.dice_research.squirrel.metadata.CrawlingActivity
The crawling state of the URI.
status - Static variable in class org.dice_research.squirrel.vocab.Squirrel
 
steps - Variable in class org.dice_research.squirrel.metadata.CrawlingActivity
 

T

TempFileHelper - Class in org.dice_research.squirrel.utils
 
TempFileHelper() - Constructor for class org.dice_research.squirrel.utils.TempFileHelper
 
TempPathUtils - Class in org.dice_research.squirrel.utils
 
TempPathUtils() - Constructor for class org.dice_research.squirrel.utils.TempPathUtils
 
theme - Static variable in class org.dice_research.squirrel.vocab.DCAT
 
timestampNextCrawl - Variable in class org.dice_research.squirrel.data.uri.CrawleableUri
 
toByteArray() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
Deprecated.
Use the JSON serialization instead.
toByteBuffer() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
Deprecated.
Use the JSON serialization instead.
toString() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
 
toString(Serializable) - Static method in class org.dice_research.squirrel.data.uri.serialize.java.GzipJavaUriSerializer
 
toString(Serializable) - Static method in class org.dice_research.squirrel.data.uri.serialize.java.SnappyJavaUriSerializer
 
toString() - Method in class org.dice_research.squirrel.deduplication.hashing.impl.ArrayHashValue
 
toString() - Method in class org.dice_research.squirrel.queue.IpUriTypePair
 
toString() - Method in class org.dice_research.squirrel.rabbit.msgs.CrawlingResult
 
toString() - Method in class org.dice_research.squirrel.rabbit.msgs.UriSet
 
total_uris - Variable in class org.dice_research.squirrel.collect.SimpleUriCollector
 
TripleBasedSink - Interface in org.dice_research.squirrel.sink.tripleBased
A sink that can handle triples.
type - Variable in class org.dice_research.squirrel.data.uri.CrawleableUri
Deprecated.
type - Variable in class org.dice_research.squirrel.queue.IpUriTypePair
 

U

unstrcuturedData - Variable in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
In-memory map used to store the unstructured data that is written to the sink.
UnstructuredDataSink - Interface in org.dice_research.squirrel.sink
A sink that can handle unstructured data.
uri - Variable in class org.dice_research.squirrel.data.uri.CrawleableUri
 
uri - Variable in enum org.dice_research.squirrel.metadata.CrawlingActivity.CrawlingURIState
 
uri - Variable in class org.dice_research.squirrel.metadata.CrawlingActivity
The uri for the crawling activity.
uri - Static variable in class org.dice_research.squirrel.vocab.DCAT
The namespace of the vocabulary as a string
uri - Static variable in class org.dice_research.squirrel.vocab.PROV_O
The namespace of the vocabulary as a string
uri - Static variable in class org.dice_research.squirrel.vocab.Squirrel
The namespace of the vocabulary as a string
uri - Static variable in class org.dice_research.squirrel.vocab.VCard
The namespace of the vocabulary as a string
URI_CRAWLING_ACTIVITY - Static variable in class org.dice_research.squirrel.Constants
 
URI_CRAWLING_ACTIVITY_URI - Static variable in class org.dice_research.squirrel.Constants
 
URI_DATA_FILE_NAME - Static variable in class org.dice_research.squirrel.Constants
 
URI_HASH_KEY - Static variable in class org.dice_research.squirrel.Constants
 
URI_HTTP_ACCEPT_CHARSET_HEADER - Static variable in class org.dice_research.squirrel.Constants
 
URI_HTTP_ACCEPT_HEADER - Static variable in class org.dice_research.squirrel.Constants
 
URI_HTTP_CHARSET_KEY - Static variable in class org.dice_research.squirrel.Constants
 
URI_HTTP_MIME_TYPE_KEY - Static variable in class org.dice_research.squirrel.Constants
 
URI_HTTP_STATUS_CODE - Static variable in class org.dice_research.squirrel.Constants
 
URI_KEY - Static variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
URI_PREFERRED_RECRAWL_ON - Static variable in class org.dice_research.squirrel.Constants
The preferred date for recrawling a URI is assumed to be a timestamp (in ms from 1st January 1970).
URI_START_INDEX - Static variable in class org.dice_research.squirrel.data.uri.CrawleableUri
 
URI_TYPE_KEY - Static variable in class org.dice_research.squirrel.Constants
 
URI_TYPE_KEY - Static variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
URI_TYPE_VALUE_CSV - Static variable in class org.dice_research.squirrel.Constants
 
URI_TYPE_VALUE_DEREF - Static variable in class org.dice_research.squirrel.Constants
 
URI_TYPE_VALUE_DUMP - Static variable in class org.dice_research.squirrel.Constants
 
URI_TYPE_VALUE_HTML - Static variable in class org.dice_research.squirrel.Constants
 
URI_TYPE_VALUE_SPARQL - Static variable in class org.dice_research.squirrel.Constants
 
UriCollector - Interface in org.dice_research.squirrel.collect
A URI collector stores the URIs that have been found by a worker while crawling/processing a certain URI.
UriFilter - Interface in org.dice_research.squirrel.data.uri.filter
A simple filter that can decide whether a given CrawleableUri object fulfills a certain requirement or not.
UriHashCustodian - Interface in org.dice_research.squirrel.deduplication.hashing
This component maintains HashValues for URIs.
uriHostedOn - Static variable in class org.dice_research.squirrel.vocab.Squirrel
 
UriInfo(long, long, boolean) - Constructor for class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter.UriInfo
 
UriProcessor - Class in org.dice_research.squirrel.uri.processing
Uri Processor implementation.
UriProcessor() - Constructor for class org.dice_research.squirrel.uri.processing.UriProcessor
 
UriProcessorInterface - Interface in org.dice_research.squirrel.uri.processing
Interface for Uri Processor, defines main methods for processing.
UriQueue - Interface in org.dice_research.squirrel.queue
Interface of a URI queue managing the URIs that should be crawled next.
uris - Variable in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
Key: the crawled (known) URI. Value: the info about the URI (see InMemoryKnownUriFilter.UriInfo), including the reference list.
uris - Variable in class org.dice_research.squirrel.rabbit.msgs.CrawlingResult
 
uris - Variable in class org.dice_research.squirrel.rabbit.msgs.UriSet
 
urisCrawling - Variable in class org.dice_research.squirrel.worker.WorkerInfo
List of all URIs that the worker is currently crawling.
UriSet - Class in org.dice_research.squirrel.rabbit.msgs
 
UriSet(List<CrawleableUri>) - Constructor for class org.dice_research.squirrel.rabbit.msgs.UriSet
 
UriSet() - Constructor for class org.dice_research.squirrel.rabbit.msgs.UriSet
 
UriSetRequest - Class in org.dice_research.squirrel.rabbit.msgs
 
UriSetRequest() - Constructor for class org.dice_research.squirrel.rabbit.msgs.UriSetRequest
Standard constructor setting just default values.
UriSetRequest(int, boolean) - Constructor for class org.dice_research.squirrel.rabbit.msgs.UriSetRequest
Parametrized Constructor.
urisOfUris - Variable in class org.dice_research.squirrel.collect.SimpleUriCollector
 
UriType - Enum in org.dice_research.squirrel.data.uri
Deprecated.
UriType() - Constructor for enum org.dice_research.squirrel.data.uri.UriType
Deprecated.
 
UriUtils - Class in org.dice_research.squirrel.data.uri
Created by ivan on 29.02.16.
UriUtils() - Constructor for class org.dice_research.squirrel.data.uri.UriUtils
 
UUID_KEY - Static variable in class org.dice_research.squirrel.Constants
 

V

valueOf(String) - Static method in enum org.dice_research.squirrel.data.uri.UriType
Deprecated.
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.dice_research.squirrel.metadata.CrawlingActivity.CrawlingURIState
Returns the enum constant of this type with the specified name.
values() - Static method in enum org.dice_research.squirrel.data.uri.UriType
Deprecated.
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.dice_research.squirrel.metadata.CrawlingActivity.CrawlingURIState
Returns an array containing the constants of this enum type, in the order they are declared.
VCard - Class in org.dice_research.squirrel.vocab
 
VCard() - Constructor for class org.dice_research.squirrel.vocab.VCard
 

W

wasAssociatedWith - Static variable in class org.dice_research.squirrel.vocab.PROV_O
 
wasGeneratedBy - Static variable in class org.dice_research.squirrel.vocab.PROV_O
 
Worker - Interface in org.dice_research.squirrel.worker
 
WorkerInfo - Class in org.dice_research.squirrel.worker
This class is used to exchange information about objects of Worker over the network.
WorkerInfo(boolean, List<CrawleableUri>, Date) - Constructor for class org.dice_research.squirrel.worker.WorkerInfo
 
workerSendsAliveMessages - Variable in class org.dice_research.squirrel.rabbit.msgs.UriSetRequest
Indicates whether the worker (see UriSetRequest.idOfWorker) sends org.dice_research.squirrel.worker.impl.AliveMessage.
workerSendsAliveMessages() - Method in class org.dice_research.squirrel.rabbit.msgs.UriSetRequest
 
workerSendsAliveMessages - Variable in class org.dice_research.squirrel.worker.WorkerInfo
Indicates whether the Worker sends objects of AliveMessage.
workerSendsAliveMessages() - Method in class org.dice_research.squirrel.worker.WorkerInfo
 
workerUri - Variable in class org.dice_research.squirrel.metadata.CrawlingActivity
URI of the worker assigned to carry out this activity.
write(JsonWriter, CrawleableUri) - Method in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
writeDataEntry(JsonWriter, String, Object) - Method in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
writeInetAddress(JsonWriter, InetAddress) - Method in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
 
writeObject(Object) - Method in class org.dice_research.squirrel.rabbit.RabbitMQHelper
Deprecated.
Serializes the given object by creating a byte array with the following content: (1) the length of the class name, (2) the class name of the given object, (3) the length of the JSON representation, (4) the given object as JSON.
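
The byte layout described for RabbitMQHelper.writeObject(Object) and read back by RabbitMQHelper.parseObject(byte[]) can be illustrated with a small, self-contained sketch. It is not the library's implementation: the class LengthPrefixedLayoutSketch and its pack/unpack helpers are hypothetical, and the 4-byte int length fields and the UTF-8 charset are assumptions, since this index does not state how the lengths or strings are encoded. The JSON is passed in as a plain string, so no JSON library is needed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a length-prefixed layout:
// [length of class name][class name][length of JSON][JSON]
class LengthPrefixedLayoutSketch {

    // Packs the class name and the JSON text, each preceded by its byte length.
    // Assumption: lengths are 4-byte ints and strings are UTF-8 encoded.
    static byte[] pack(String className, String json) {
        byte[] nameBytes = className.getBytes(StandardCharsets.UTF_8);
        byte[] jsonBytes = json.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(4 + nameBytes.length + 4 + jsonBytes.length);
        buffer.putInt(nameBytes.length);
        buffer.put(nameBytes);
        buffer.putInt(jsonBytes.length);
        buffer.put(jsonBytes);
        return buffer.array();
    }

    // Reverses pack(): returns { className, json }.
    static String[] unpack(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        byte[] nameBytes = new byte[buffer.getInt()];
        buffer.get(nameBytes);
        byte[] jsonBytes = new byte[buffer.getInt()];
        buffer.get(jsonBytes);
        return new String[] {
            new String(nameBytes, StandardCharsets.UTF_8),
            new String(jsonBytes, StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) {
        byte[] data = pack("java.lang.String", "\"hi\"");
        String[] parts = unpack(data);
        System.out.println(parts[0] + " -> " + parts[1]);
    }
}
```

Storing the class name ahead of the JSON lets a reader determine the target type before it parses the payload, which is presumably why the two fields appear in this order.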

Copyright © 2017–2019. All rights reserved.