- generateFileName(String, boolean) - Static method in class org.dice_research.squirrel.data.uri.UriUtils
-
- generateFileName(String, String) - Static method in class org.dice_research.squirrel.data.uri.UriUtils
-
Deprecated.
- generateFileName(CrawleableUri, String) - Static method in class org.dice_research.squirrel.data.uri.UriUtils
-
- generatePrefixMap() - Static method in class org.dice_research.squirrel.vocab.Prefixes
-
- getCrawleableUri() - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
-
- getCrawleableUriList() - Static method in class org.dice_research.squirrel.data.uri.UriUtils
-
- getCrawledRdfData() - Method in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
-
Returns the data written to the sink as a map with the crawled URI as key and
the RDF data as value.
- getCrawledUnstructuredData() - Method in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
-
Returns the data written to the sink as a map with the crawled URI as key and
the unstructured data as value.
- getData(String) - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
-
- getData() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
-
- getDateLastAlive() - Method in class org.dice_research.squirrel.worker.WorkerInfo
-
- getDecorated() - Method in class org.dice_research.squirrel.data.uri.filter.AbstractKnownUriFilterDecorator
-
- getDecorated() - Method in interface org.dice_research.squirrel.data.uri.filter.KnownUriFilterDecorator
-
- getDomainName(String) - Static method in class org.dice_research.squirrel.data.uri.UriUtils
-
- getEnv(String, Logger) - Static method in class org.dice_research.squirrel.configurator.Configuration
-
- getEnvBoolean(String, Logger) - Static method in class org.dice_research.squirrel.configurator.Configuration
-
- getEnvInteger(String, Logger) - Static method in class org.dice_research.squirrel.configurator.Configuration
-
- getEnvLong(String, Logger) - Static method in class org.dice_research.squirrel.configurator.Configuration
-
- getId() - Method in interface org.dice_research.squirrel.worker.Worker
-
Gives the unique id of the worker.
- getIdOfWorker() - Method in class org.dice_research.squirrel.rabbit.msgs.UriSetRequest
-
- getIdOfWorker() - Method in class org.dice_research.squirrel.worker.AliveMessage
-
Get the id of the worker.
- getIpAddress() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
-
- getIPURIIterator() - Method in class org.dice_research.squirrel.queue.InMemoryQueue
-
- getIPURIIterator() - Method in interface org.dice_research.squirrel.queue.IpAddressBasedQueue
-
Goes through the queue and collects all IP addresses with their URIs.
- getIterator() - Method in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
-
- getIterator() - Method in class org.dice_research.squirrel.queue.InMemoryQueue
-
- getNextUris() - Method in interface org.dice_research.squirrel.frontier.Frontier
-
Returns the next chunk of URIs that should be crawled or null.
- getNextUris() - Method in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
-
- getNextUris() - Method in interface org.dice_research.squirrel.queue.UriQueue
-
Returns the next chunk of URIs that should be crawled or null.
- getNumberOfBlockedIps() - Method in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
-
- getNumberOfBlockedIps() - Method in interface org.dice_research.squirrel.queue.IpAddressBasedQueue
-
Returns the number of IP addresses that are currently blocked.
- getNumberOfPendingUris() - Method in interface org.dice_research.squirrel.frontier.Frontier
-
(optional) Returns the number of URIs that have been requested from the
Frontier using Frontier.getNextUris() and have not been marked as crawled
using Frontier#crawlingDone(Map).
- getOutdatedUris() - Method in class org.dice_research.squirrel.data.uri.filter.AbstractKnownUriFilterDecorator
-
- getOutdatedUris() - Method in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
-
- getOutdatedUris() - Method in interface org.dice_research.squirrel.data.uri.filter.KnownUriFilter
-
- getSize() - Method in class org.dice_research.squirrel.collect.SimpleUriCollector
-
- getSize(CrawleableUri) - Method in class org.dice_research.squirrel.collect.SimpleUriCollector
-
- getSize(CrawleableUri) - Method in interface org.dice_research.squirrel.collect.UriCollector
-
Returns the total number of URIs that have been collected.
- getState() - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
-
- getStepsAsString() - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
-
- getTempDir(String, String) - Static method in class org.dice_research.squirrel.utils.TempFileHelper
-
Creates a temporary directory that can be used for tests.
- getTimestampNextCrawl() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
-
- getTriplesForGraph(CrawleableUri) - Method in interface org.dice_research.squirrel.sink.tripleBased.AdvancedTripleBasedSink
-
Gets all triples behind the given URI.
- getType() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
-
Deprecated.
- getUri() - Method in class org.dice_research.squirrel.data.uri.CrawleableUri
-
- getUri() - Method in class org.dice_research.squirrel.metadata.CrawlingActivity
-
- getURI() - Static method in class org.dice_research.squirrel.vocab.DCAT
-
Returns the URI for this schema.
- getURI() - Static method in class org.dice_research.squirrel.vocab.PROV_O
-
Returns the URI for this schema.
- getURI() - Static method in class org.dice_research.squirrel.vocab.Squirrel
-
Returns the URI for this schema.
- getURI() - Static method in class org.dice_research.squirrel.vocab.VCard
-
Returns the URI for this schema.
- getUri() - Method in interface org.dice_research.squirrel.worker.Worker
-
Gives the unique URI of the worker.
- getUris(CrawleableUri) - Method in class org.dice_research.squirrel.collect.SimpleUriCollector
-
- getUris(CrawleableUri) - Method in interface org.dice_research.squirrel.collect.UriCollector
-
Returns a list of serialized CrawleableUri instances that have been
collected for the given URI.
- getUris(IpUriTypePair) - Method in class org.dice_research.squirrel.queue.AbstractIpAddressBasedQueue
-
- getUris(IpUriTypePair) - Method in class org.dice_research.squirrel.queue.InMemoryQueue
-
- getUrisCrawling() - Method in class org.dice_research.squirrel.worker.WorkerInfo
-
- getUrisWithSameHashValues(Set<HashValue>) - Method in interface org.dice_research.squirrel.deduplication.hashing.UriHashCustodian
-
Gets all URIs that share a hash value with one of the hash values in the given set.
- gson - Variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer
-
- gson - Variable in class org.dice_research.squirrel.rabbit.RabbitMQHelper
-
Deprecated.
- GsonUriSerializer - Class in org.dice_research.squirrel.data.uri.serialize.gson
-
A serializer that uses Gson to serialize URIs.
- GsonUriSerializer() - Constructor for class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer
-
- GsonUriSerializer(GsonBuilder) - Constructor for class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer
-
- GsonUriSerializer.CrawleableUriAdapter - Class in org.dice_research.squirrel.data.uri.serialize.gson
-
- GzipJavaUriSerializer - Class in org.dice_research.squirrel.data.uri.serialize.java
-
- GzipJavaUriSerializer() - Constructor for class org.dice_research.squirrel.data.uri.serialize.java.GzipJavaUriSerializer
-
- unstrcuturedData - Variable in class org.dice_research.squirrel.sink.impl.mem.InMemorySink
-
In-memory map used to store the unstructured data that is written to the
sink.
- UnstructuredDataSink - Interface in org.dice_research.squirrel.sink
-
A sink that can handle unstructured data.
- uri - Variable in class org.dice_research.squirrel.data.uri.CrawleableUri
-
- uri - Variable in enum org.dice_research.squirrel.metadata.CrawlingActivity.CrawlingURIState
-
- uri - Variable in class org.dice_research.squirrel.metadata.CrawlingActivity
-
The uri for the crawling activity.
- uri - Static variable in class org.dice_research.squirrel.vocab.DCAT
-
The namespace of the vocabulary as a string
- uri - Static variable in class org.dice_research.squirrel.vocab.PROV_O
-
The namespace of the vocabulary as a string
- uri - Static variable in class org.dice_research.squirrel.vocab.Squirrel
-
The namespace of the vocabulary as a string
- uri - Static variable in class org.dice_research.squirrel.vocab.VCard
-
The namespace of the vocabulary as a string
- URI_CRAWLING_ACTIVITY - Static variable in class org.dice_research.squirrel.Constants
-
- URI_CRAWLING_ACTIVITY_URI - Static variable in class org.dice_research.squirrel.Constants
-
- URI_DATA_FILE_NAME - Static variable in class org.dice_research.squirrel.Constants
-
- URI_HASH_KEY - Static variable in class org.dice_research.squirrel.Constants
-
- URI_HTTP_ACCEPT_CHARSET_HEADER - Static variable in class org.dice_research.squirrel.Constants
-
- URI_HTTP_ACCEPT_HEADER - Static variable in class org.dice_research.squirrel.Constants
-
- URI_HTTP_CHARSET_KEY - Static variable in class org.dice_research.squirrel.Constants
-
- URI_HTTP_MIME_TYPE_KEY - Static variable in class org.dice_research.squirrel.Constants
-
- URI_HTTP_STATUS_CODE - Static variable in class org.dice_research.squirrel.Constants
-
- URI_KEY - Static variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
-
- URI_PREFERRED_RECRAWL_ON - Static variable in class org.dice_research.squirrel.Constants
-
The preferred date for recrawling a URI is assumed to be a timestamp (in ms
from 1st January 1970).
- URI_START_INDEX - Static variable in class org.dice_research.squirrel.data.uri.CrawleableUri
-
- URI_TYPE_KEY - Static variable in class org.dice_research.squirrel.Constants
-
- URI_TYPE_KEY - Static variable in class org.dice_research.squirrel.data.uri.serialize.gson.GsonUriSerializer.CrawleableUriAdapter
-
- URI_TYPE_VALUE_CSV - Static variable in class org.dice_research.squirrel.Constants
-
- URI_TYPE_VALUE_DEREF - Static variable in class org.dice_research.squirrel.Constants
-
- URI_TYPE_VALUE_DUMP - Static variable in class org.dice_research.squirrel.Constants
-
- URI_TYPE_VALUE_HTML - Static variable in class org.dice_research.squirrel.Constants
-
- URI_TYPE_VALUE_SPARQL - Static variable in class org.dice_research.squirrel.Constants
-
- UriCollector - Interface in org.dice_research.squirrel.collect
-
A URI collector stores the URIs that have been found by a worker while
crawling/processing a certain URI.
- UriFilter - Interface in org.dice_research.squirrel.data.uri.filter
-
A simple filter that can decide whether a given CrawleableUri object
meets a certain requirement or not.
- UriHashCustodian - Interface in org.dice_research.squirrel.deduplication.hashing
-
This component maintains HashValues for URIs.
- uriHostedOn - Static variable in class org.dice_research.squirrel.vocab.Squirrel
-
- UriInfo(long, long, boolean) - Constructor for class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter.UriInfo
-
- UriProcessor - Class in org.dice_research.squirrel.uri.processing
-
URI processor implementation.
- UriProcessor() - Constructor for class org.dice_research.squirrel.uri.processing.UriProcessor
-
- UriProcessorInterface - Interface in org.dice_research.squirrel.uri.processing
-
Interface for the URI processor; defines the main methods for processing.
- UriQueue - Interface in org.dice_research.squirrel.queue
-
Interface of a URI queue managing the URIs that should be crawled next.
- uris - Variable in class org.dice_research.squirrel.data.uri.filter.InMemoryKnownUriFilter
-
- uris - Variable in class org.dice_research.squirrel.rabbit.msgs.CrawlingResult
-
- uris - Variable in class org.dice_research.squirrel.rabbit.msgs.UriSet
-
- urisCrawling - Variable in class org.dice_research.squirrel.worker.WorkerInfo
-
List containing all URIs that the worker is currently crawling.
- UriSet - Class in org.dice_research.squirrel.rabbit.msgs
-
- UriSet(List<CrawleableUri>) - Constructor for class org.dice_research.squirrel.rabbit.msgs.UriSet
-
- UriSet() - Constructor for class org.dice_research.squirrel.rabbit.msgs.UriSet
-
- UriSetRequest - Class in org.dice_research.squirrel.rabbit.msgs
-
- UriSetRequest() - Constructor for class org.dice_research.squirrel.rabbit.msgs.UriSetRequest
-
Standard constructor setting just default values.
- UriSetRequest(int, boolean) - Constructor for class org.dice_research.squirrel.rabbit.msgs.UriSetRequest
-
Parameterized constructor.
- urisOfUris - Variable in class org.dice_research.squirrel.collect.SimpleUriCollector
-
- UriType - Enum in org.dice_research.squirrel.data.uri
-
Deprecated.
- UriType() - Constructor for enum org.dice_research.squirrel.data.uri.UriType
-
Deprecated.
- UriUtils - Class in org.dice_research.squirrel.data.uri
-
Created by ivan on 29.02.16.
- UriUtils() - Constructor for class org.dice_research.squirrel.data.uri.UriUtils
-
- UUID_KEY - Static variable in class org.dice_research.squirrel.Constants
-