public class SimpleUriCollector extends Object implements UriCollector
| Modifier and Type | Field and Description |
|---|---|
| private static org.slf4j.Logger | LOGGER |
| protected Serializer | serializer |
| private long | total_uris |
| protected Map<String,Map<String,byte[]>> | urisOfUris |
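The nested type of urisOfUris suggests a two-level lookup. The following is a minimal sketch of one plausible reading, not the class's actual implementation: it assumes the outer key is the string form of the crawled URI and the inner key is the string form of a collected URI, with the serialized bytes as the value. The class name UrisOfUrisLayoutSketch and the helper put are hypothetical, and UTF-8 bytes stand in for whatever the real Serializer produces.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class UrisOfUrisLayoutSketch {

    // Assumed layout: source URI string -> (collected URI string -> serialized bytes).
    // Returns true if newUri had not been stored for sourceUri before.
    static boolean put(Map<String, Map<String, byte[]>> urisOfUris,
                       String sourceUri, String newUri) {
        // Create the inner map for this source URI on first use.
        Map<String, byte[]> collected =
                urisOfUris.computeIfAbsent(sourceUri, k -> new HashMap<>());
        // Assumption: UTF-8 bytes stand in for the output of the real Serializer.
        byte[] serialized = newUri.getBytes(StandardCharsets.UTF_8);
        // Keying the inner map by the URI string drops duplicates.
        return collected.putIfAbsent(newUri, serialized) == null;
    }

    public static void main(String[] args) {
        Map<String, Map<String, byte[]>> urisOfUris = new HashMap<>();
        put(urisOfUris, "http://example.org/a", "http://example.org/b");
        put(urisOfUris, "http://example.org/a", "http://example.org/b"); // duplicate
        put(urisOfUris, "http://example.org/a", "http://example.org/c");
        System.out.println(urisOfUris.get("http://example.org/a").size()); // prints 2
    }
}
```

Under this reading, a URI collected twice from the same source occupies a single entry. Whether SimpleUriCollector deduplicates this way is not stated in the documentation above.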
| Constructor and Description |
|---|
| SimpleUriCollector(Serializer serializer) |
| Modifier and Type | Method | Description |
|---|---|---|
| void | addNewUri(CrawleableUri uri, CrawleableUri newUri) | Adds the given new URI to the list of URIs collected for the given URI. |
| void | closeSinkForUri(CrawleableUri uri) | Closes the resources necessary for storing the data of the given URI. |
| long | getSize() | |
| long | getSize(CrawleableUri uri) | Returns the total number of URIs that have been collected. |
| Iterator<byte[]> | getUris(CrawleableUri uri) | Returns the serialized CrawleableUri instances that have been collected for the given URI. |
| void | openSinkForUri(CrawleableUri uri) | Opens the sink to process data for the given URI. |
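The method names above imply a call order: open a sink for a URI, add the URIs found while processing it, read them back, then close the sink. The sketch below illustrates that order with a hypothetical in-memory stand-in, InMemoryCollector, that uses plain strings instead of CrawleableUri and UTF-8 bytes instead of a Serializer. The error on an unopened sink, the duplicate handling, and the running total are assumptions made for illustration, not documented behavior of SimpleUriCollector.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class CollectorLifecycleSketch {

    // Hypothetical stand-in; not the real SimpleUriCollector.
    static class InMemoryCollector {
        private final Map<String, Map<String, byte[]>> urisOfUris = new HashMap<>();
        private long totalUris = 0;

        void openSinkForUri(String uri) {
            // LinkedHashMap keeps the collected URIs in insertion order.
            urisOfUris.put(uri, new LinkedHashMap<>());
        }

        void addNewUri(String uri, String newUri) {
            Map<String, byte[]> collected = urisOfUris.get(uri);
            if (collected == null) {
                // Assumption: adding to an unopened sink is treated as an error.
                throw new IllegalStateException("Sink for " + uri + " has not been opened");
            }
            byte[] serialized = newUri.getBytes(StandardCharsets.UTF_8);
            if (collected.putIfAbsent(newUri, serialized) == null) {
                totalUris++; // count only URIs not seen before for this source
            }
        }

        Iterator<byte[]> getUris(String uri) {
            Map<String, byte[]> collected = urisOfUris.get(uri);
            return collected == null
                    ? Collections.<byte[]>emptyIterator()
                    : collected.values().iterator();
        }

        long getSize(String uri) {
            Map<String, byte[]> collected = urisOfUris.get(uri);
            return collected == null ? 0 : collected.size();
        }

        long getSize() {
            return totalUris; // running total across all sinks
        }

        void closeSinkForUri(String uri) {
            urisOfUris.remove(uri); // free the per-URI storage
        }
    }

    public static void main(String[] args) {
        InMemoryCollector collector = new InMemoryCollector();
        String source = "http://example.org/a";

        collector.openSinkForUri(source);
        collector.addNewUri(source, "http://example.org/b");
        collector.addNewUri(source, "http://example.org/c");

        // Read the serialized URIs back and decode them.
        Iterator<byte[]> it = collector.getUris(source);
        while (it.hasNext()) {
            System.out.println(new String(it.next(), StandardCharsets.UTF_8));
        }
        System.out.println(collector.getSize(source));

        collector.closeSinkForUri(source);
    }
}
```

In this sketch the collected URIs must be read before closeSinkForUri is called, because closing discards them. The documentation above says only that closing frees the resources for the URI, so the same ordering is a reasonable precaution with the real class.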
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Inherited interface methods: addNewUri, addNewUri, addTriple

private static final org.slf4j.Logger LOGGER

private long total_uris

protected Serializer serializer

public SimpleUriCollector(Serializer serializer)

public void openSinkForUri(CrawleableUri uri)
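Description copied from interface: SinkBase
Opens the sink to process data for the given URI.
Specified by: openSinkForUri in interface SinkBase
Parameters:
uri - the URI for which data should be stored.

public Iterator<byte[]> getUris(CrawleableUri uri)
Description copied from interface: UriCollector
Returns the serialized CrawleableUri instances that have been collected for the given URI.
Specified by: getUris in interface UriCollector
Parameters:
uri - The URI from which the returned serialized URIs have been collected.
Returns:
An Iterator that iterates over the already serialized URIs that have been collected for the given URI.

public void addNewUri(CrawleableUri uri, CrawleableUri newUri)
Description copied from interface: UriCollector
Adds the given new URI to the list of URIs collected for the given URI.
Specified by: addNewUri in interface UriCollector
Parameters:
uri - The URI from which the given new URI has been collected.
newUri - The new URI that has been collected.

public long getSize()

public void closeSinkForUri(CrawleableUri uri)
Description copied from interface: SinkBase
Closes the resources necessary for storing the data of the given URI.
Specified by: closeSinkForUri in interface SinkBase
Parameters:
uri - the URI for which data has been stored and for which the resources should be freed.

public long getSize(CrawleableUri uri)
Description copied from interface: UriCollector
Returns the total number of URIs that have been collected.
Specified by: getSize in interface UriCollector
Parameters:
uri - The URI from which the returned serialized URIs have been collected.

Copyright © 2017–2019. All rights reserved.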